Query Apache Iceberg™ tables with an external engine through Snowflake Horizon Catalog

Query Snowflake-managed Apache Iceberg™ tables by using an external query engine through Snowflake Horizon Catalog. To enable this interoperability with external engines, Apache Polaris™ (incubating) is integrated into Horizon Catalog, and Horizon Catalog exposes the Apache Iceberg™ REST API (the Horizon Iceberg REST Catalog API). External query engines use this API to read the tables.

To query Snowflake-managed Iceberg tables with an external query engine, you can use this feature instead of syncing Snowflake-managed Iceberg tables with Snowflake Open Catalog. For more information about Open Catalog, see Snowflake Open Catalog overview.

By connecting an external query engine to Iceberg tables through Horizon Catalog, you can perform the following tasks:

  • Use any external query engine that supports the open Iceberg REST protocol, such as Apache Spark™, to query these tables.

  • Query any existing and new Snowflake-managed Iceberg tables in a new or existing Snowflake account by using a single Horizon Catalog endpoint.

  • Query the tables by using your existing users, roles, policies, and authentication in Snowflake.

  • Use vended credentials.

For more information about Snowflake Horizon Catalog, see Snowflake Horizon Catalog.

The following diagram shows external query engines reading Snowflake-managed Iceberg tables through Horizon Catalog and Snowflake reading and writing to these tables:


Billing

  • The Horizon Iceberg REST Catalog API is available in all Snowflake editions.

  • API requests are billed at 0.5 credits per million calls and charged as Cloud Services. For example, 10 million API calls are billed as 5 credits.

  • For cross-region data access, standard cross-region data egress charges, as stated in the Snowflake Service Consumption Table, apply.

Note

Billing for this feature is scheduled to begin in mid-2026, subject to change.

Supported external engines and catalogs

The following lists, although not exhaustive, show many external engines and catalogs that integrate with the Horizon Iceberg REST Catalog API. This integration enables access to Snowflake-managed Iceberg tables through external systems.

Supported external engines

The following external query engines integrate with the Horizon Iceberg REST Catalog API and can access Snowflake-managed Iceberg tables through Horizon Catalog:

  • Apache Doris™

  • Apache Flink™

  • Apache Spark™

  • Dremio

  • DuckDB

  • PyIceberg

  • StarRocks

  • Trino

Supported external catalogs

The following external catalogs integrate with the Horizon Iceberg REST Catalog API:

  • Apache Polaris™: supported.

  • AWS Glue: supported. For instructions on how to configure this integration, see Access Snowflake Horizon Catalog data using catalog federation in the AWS Glue Data Catalog in the AWS Big Data Blog.

  • Palantir Foundry: supported. For instructions on how to configure this integration, see Iceberg tables (virtual tables only) in the Palantir documentation.

  • Databricks Unity Catalog: not announced.

  • Google BigLake Metastore: in development.

  • Microsoft Fabric / Synapse: in development.

Prerequisites

Retrieve the account identifier for your Snowflake account that contains the Iceberg tables that you want to query. For instructions, see Account identifiers. You specify this identifier when you connect an external query engine to your Iceberg tables.

Tip

To get your account identifier by using SQL, you can run the following command:

SELECT CURRENT_ORGANIZATION_NAME() || '-' || CURRENT_ACCOUNT_NAME();

(Optional) Private connectivity

For secure connectivity, consider configuring inbound and outbound private connectivity for your Snowflake account when you access the Horizon Catalog endpoint.

Note

Private connectivity is only supported for Snowflake-managed Iceberg tables stored on Amazon S3 or Azure Storage (ADLS).

Workflow for querying Iceberg tables by using an external query engine

To query Iceberg tables by using an external query engine, complete the following steps:

  1. Create Iceberg tables

  2. Configure access control

  3. Obtain an access token for authentication

  4. Verify access token permissions

  5. (Optional) Configure data protection policies

  6. Connect an external query engine to Iceberg tables through Horizon Catalog

  7. Query Iceberg tables

Step 1: Create Iceberg tables

Important

If you already have Snowflake-managed Iceberg tables you want to query, you can skip this step.

In this step, you create Snowflake-managed Iceberg tables that use Snowflake as the catalog, so you can query them with an external query engine. For instructions, see the Snowflake documentation on creating Apache Iceberg™ tables.

Step 2: Configure access control

Important

If you already have roles that are configured with access to the Iceberg tables that you want to query, you can skip this step.

In this step, you configure access control for the Snowflake-managed Iceberg tables that you want to query with an external query engine. For example, you can set up the following roles in Snowflake:

  • DATA_ENGINEER role, which has access to all schemas and all Snowflake-managed Iceberg tables in a database.

  • DATA_ANALYST role, which has access to one schema in the database and only access to two Snowflake-managed Iceberg tables within that schema.

For instructions, see Configuring access control.

The following example sets up a service account user in Snowflake with read-only access to an Iceberg table. It creates a DATA_ENGINEER role, grants the role usage and monitor privileges on the ICEBERG_TEST_DB database and its PUBLIC schema, grants SELECT privileges on the TEST_TABLE Iceberg table, creates a service user named HORIZON_REST_SRV_ACCOUNT_USER, and assigns the DATA_ENGINEER role to that user.

CREATE OR REPLACE ROLE DATA_ENGINEER;

GRANT USAGE, MONITOR ON DATABASE ICEBERG_TEST_DB TO ROLE DATA_ENGINEER;
GRANT USAGE, MONITOR ON SCHEMA ICEBERG_TEST_DB.PUBLIC TO ROLE DATA_ENGINEER;

GRANT SELECT ON TABLE ICEBERG_TEST_DB.PUBLIC.TEST_TABLE TO ROLE DATA_ENGINEER;

CREATE OR REPLACE USER HORIZON_REST_SRV_ACCOUNT_USER TYPE=SERVICE DEFAULT_ROLE=DATA_ENGINEER;

GRANT ROLE DATA_ENGINEER TO USER HORIZON_REST_SRV_ACCOUNT_USER;

For more information about access control in Snowflake, see Overview of Access Control.

Step 3: Obtain an access token for authentication

In this step, you obtain an access token, which you must have to authenticate to the Horizon Catalog endpoint for your Snowflake account. You need to obtain an access token for each user (service or human) and role that is configured with access to Snowflake-managed Iceberg tables. For example, you need one access token for a user with the DATA_ENGINEER role and another for a user with the DATA_ANALYST role.

You specify this access token later when you connect an external query engine to Iceberg tables through Horizon Catalog.

You can obtain an access token by using one of the following authentication options:

External OAuth

If you’re using External OAuth, generate an access token for your identity provider. For instructions, see External OAuth overview.

Note

Alternatively, for External OAuth, you can configure your connection to the engine with automatic token refresh instead of specifying an access token.

Key-pair authentication

If you use key-pair authentication, you obtain an access token by signing a JSON Web Token (JWT) with your private key and then exchanging the JWT for an access token.

The following steps cover how to generate an access token for key-pair authentication:

  1. Configure key-pair authentication

  2. Grant a role to the user

  3. Generate a JSON Web Token (JWT)

  4. Generate an access token

Step 1: Configure key-pair authentication

In this step, you perform the following tasks:

  • Generate a private key

  • Generate a public key

  • Store the private and public keys securely

  • Grant the privilege to assign a public key to a Snowflake user

  • Assign the public key to a Snowflake user

  • Verify the user’s public key fingerprint

For instructions, see Configuring key-pair authentication.

Step 2: Grant a role to the user

Run the GRANT ROLE command to grant the Snowflake role that has privileges on the tables you want to query to the user configured for key-pair authentication. For example, to grant the ENGINEER role to the my_service_user user, run the following command:

GRANT ROLE ENGINEER TO USER my_service_user;

Step 3: Generate a JSON Web Token (JWT)

In this step, you use SnowSQL to generate a JSON Web Token (JWT) for key-pair authentication:

snowsql --private-key-path "<private_key_file>" \
  --generate-jwt \
  -h "<account_identifier>.snowflakecomputing.com" \
  -a "<account_locator>" \
  -u "<user_name>"

Where:

  • <private_key_file> is the path to your private key file that corresponds to the public key assigned to your Snowflake user. For example: /Users/jsmith/.ssh/rsa_key.p8.

  • <account_identifier> is the account identifier for your Snowflake account, in the format <organization_name>-<account_name>. To find the account identifier, see Prerequisites. An example of an account identifier is myorg-myaccount.

  • <account_locator> is the account locator for your Snowflake account.

    To find your account locator, see Locate your Snowflake account information in Snowsight and view the Account locator in the Account Details dialog.

  • <user_name> is the name of the Snowflake user that the public key is assigned to.
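If you prefer to generate the JWT programmatically instead of with SnowSQL, the following is a minimal Python sketch, assuming the PyJWT and cryptography packages are installed. It follows Snowflake's documented key-pair JWT format, in which the issuer combines the uppercase account locator and user name with the SHA-256 fingerprint of the public key; treat it as a sketch, not an official client.

import base64
import hashlib
from datetime import datetime, timedelta, timezone

import jwt  # PyJWT
from cryptography.hazmat.primitives import serialization

def generate_snowflake_jwt(private_key_path, account_locator, user_name):
    # Load the private key that corresponds to the public key assigned to the user.
    with open(private_key_path, "rb") as key_file:
        private_key = serialization.load_pem_private_key(key_file.read(), password=None)

    # Compute the SHA-256 fingerprint of the public key, as Snowflake expects.
    public_key_der = private_key.public_key().public_bytes(
        serialization.Encoding.DER, serialization.PublicFormat.SubjectPublicKeyInfo
    )
    fingerprint = "SHA256:" + base64.b64encode(hashlib.sha256(public_key_der).digest()).decode()

    qualified_user = f"{account_locator.upper()}.{user_name.upper()}"
    now = datetime.now(timezone.utc)
    payload = {
        "iss": f"{qualified_user}.{fingerprint}",
        "sub": qualified_user,
        "iat": now,
        "exp": now + timedelta(minutes=59),  # JWT lifetime must be at most one hour
    }
    return jwt.encode(payload, private_key, algorithm="RS256")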

Step 4: Generate an access token

Important

To generate an access token, you must first generate a JWT, because the JWT serves as the client secret in the token request.

Use a curl command to generate an access token:

curl -i --fail -X POST "https://<account_identifier>.snowflakecomputing.com/polaris/api/catalog/v1/oauth/tokens" \
 --header 'Content-Type: application/x-www-form-urlencoded' \
 --data-urlencode 'grant_type=client_credentials' \
 --data-urlencode 'scope=session:role:<role>' \
 --data-urlencode 'client_secret=<JWT_token>'

Where:

  • <account_identifier> is the account identifier for your Snowflake account, in the format <organization_name>-<account_name>. To find the account identifier, see Prerequisites. An example of an account identifier is myorg-myaccount.

  • <role> is the Snowflake role that is granted access to Iceberg tables, such as ENGINEER.

  • <JWT_token> is the JWT that you generated in the previous step.
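As an equivalent to the curl command, the following minimal Python sketch performs the same token request by using the requests library; it assumes a standard OAuth token response that contains an access_token field.

import requests

# Exchange the JWT for a Horizon Catalog access token (equivalent to the curl command above).
response = requests.post(
    "https://<account_identifier>.snowflakecomputing.com/polaris/api/catalog/v1/oauth/tokens",
    headers={"Content-Type": "application/x-www-form-urlencoded"},
    data={
        "grant_type": "client_credentials",
        "scope": "session:role:<role>",
        "client_secret": "<JWT_token>",
    },
)
response.raise_for_status()
access_token = response.json()["access_token"]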

Programmatic access token (PAT)

If you use programmatic access tokens (PATs), first generate a PAT, which you use to connect an external query engine to Iceberg tables. Then, generate an access token, which you use only to verify the permissions for your PAT.

Step 1: Generate a PAT

For instructions on how to configure and generate a PAT, see Using programmatic access tokens for authentication.

The following example creates a programmatic access token (PAT) for the service account user that you created in the previous step by using the ALTER USER … ADD PROGRAMMATIC ACCESS TOKEN (PAT) command:

ALTER USER IF EXISTS HORIZON_REST_SRV_ACCOUNT_USER
ADD PROGRAMMATIC ACCESS TOKEN HORIZON_REST_SRV_ACCOUNT_USER_PAT
  DAYS_TO_EXPIRY = 7
  ROLE_RESTRICTION = 'DATA_ENGINEER'
  COMMENT = 'HORIZON REST API PAT FOR SERVICE ACCOUNT';

Step 2: Generate an access token for your PAT

In this step, you generate an access token for your PAT.

Attention

You only specify the access token that you generate in this step when you verify the permissions for your PAT. When you connect an external query engine to Iceberg tables, you must specify your PAT that you generated in the previous step, not the access token that you generate in this step.

Use a curl command to generate an access token for your PAT:

curl -i --fail -X POST "https://<account_identifier>.snowflakecomputing.com/polaris/api/catalog/v1/oauth/tokens" \
 --header 'Content-Type: application/x-www-form-urlencoded' \
 --data-urlencode 'grant_type=client_credentials' \
 --data-urlencode 'scope=session:role:<role>' \
 --data-urlencode 'client_secret=<PAT_token>'

Where:

  • <account_identifier> is the account identifier for your Snowflake account, in the format <organization_name>-<account_name>. To find the account identifier, see Prerequisites. An example of an account identifier is myorg-myaccount.

  • <role> is the Snowflake role that is granted to your PAT and has access to the Iceberg tables you want to query, such as ENGINEER.

  • <PAT_token> is the value for the PAT token that you generated in the previous step.

Step 4: Verify access token permissions

In this step, you verify the permissions for the access token that you obtained in the previous step.

Verify access to the Horizon IRC endpoint

Use a curl command to verify that you have permission to access your Horizon IRC endpoint:

curl -i --fail -X GET "https://<account_identifier>.snowflakecomputing.com/polaris/api/catalog/v1/config?warehouse=<database_name>" \
-H "Authorization: Bearer <access_token>" \
-H "Content-Type: application/json"

Where:

  • <account_identifier> is the account identifier for your Snowflake account, in the format <organization_name>-<account_name>. To find the account identifier, see Prerequisites. An example of an account identifier is myorg-myaccount.

  • <access_token> is the access token that you generated. If you’re using a PAT, this value is the access token that you generated from your PAT, not the PAT itself.

  • <database_name> is the name of the database you want to query.

    Important

    You must specify the database name in all capital letters, even if it was created with lowercase letters.

Example return value:

{
  "defaults": {
    "default-base-location": ""
  },
  "overrides": {
    "prefix": "MY-DATABASE"
  }
}

Retrieve the metadata for a table

You can also make a GET request to retrieve the metadata for a table. This request corresponds to the loadTable operation of the Iceberg REST API, which external engines use to load table metadata from the catalog.

curl -i --fail -X GET "https://<account_identifier>.snowflakecomputing.com/polaris/api/catalog/v1/<database_name>/namespaces/<namespace_name>/tables/<table_name>" \
 -H "Authorization: Bearer <access_token>" \
 -H "Content-Type: application/json"

Where:

  • <account_identifier> is the account identifier for your Snowflake account, in the format <organization_name>-<account_name>. To find the account identifier, see Prerequisites. An example of an account identifier is myorg-myaccount.

  • <database_name> is the database of the table whose metadata you want to retrieve.

  • <namespace_name> is the namespace of the table whose metadata you want to retrieve.

  • <table_name> is the table whose metadata you want to retrieve.

  • <access_token> is the access token that you generated. If you’re using a PAT, this value is the access token that you generated from your PAT, not the PAT itself.

Important

You must specify the database, namespaces, and table names in all capital letters, even if the object was created with lowercase letters.
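The Iceberg REST API also defines a listNamespaces operation. The following minimal Python sketch, which assumes the requests library and the same placeholder values as above, lists the namespaces that your access token can see:

import requests

BASE_URL = "https://<account_identifier>.snowflakecomputing.com/polaris/api/catalog"
headers = {"Authorization": "Bearer <access_token>"}

# List the namespaces in the database (the Iceberg REST listNamespaces operation).
response = requests.get(f"{BASE_URL}/v1/<database_name>/namespaces", headers=headers)
response.raise_for_status()
for namespace in response.json()["namespaces"]:
    print(".".join(namespace))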

(Optional) Step 5: Configure data protection policies

In this step, you configure data protection policies for Iceberg tables. If you don’t have tables that you need to protect with Snowflake data policies, you can proceed to the next step.

Note

Tables protected by data protection policies can be accessed over the Horizon Iceberg REST API and by using Apache Spark™.

For instructions on how to configure data protection policies, see Configure data protection policies on Iceberg tables accessed over Horizon Iceberg REST API and using Apache Spark™.

Step 6: Connect an external query engine to Iceberg tables through Horizon Catalog

In this step, you connect an external query engine to Iceberg tables through Horizon Catalog. With this connection, you can query the tables by using the external query engine.

The external engines use the Apache Iceberg™ REST endpoint exposed by Snowflake. For your Snowflake account, this endpoint is in the following format:

https://<account_identifier>.snowflakecomputing.com/polaris/api/catalog

The example code in this step shows how to set up a connection in Spark; the examples are written in PySpark. For more information, see the following sections:

Connect by using External OAuth or key-pair authentication

Use one of the following configurations to connect:

Connect an external query engine without enforcing data policies

  • To connect the external query engine to Iceberg tables by using External OAuth or key-pair authentication, use the following example code. This code doesn’t enforce data protection policies:

from pyspark.sql import SparkSession

# Snowflake Horizon Catalog configuration; change as per your environment
CATALOG_URI = "https://<account_identifier>.snowflakecomputing.com/polaris/api/catalog"
HORIZON_SESSION_ROLE = "session:role:<role>"
CATALOG_NAME = "<database_name>"  # provide in UPPER CASE

# Cloud service provider region configuration (where the Iceberg data is stored)
REGION = "eastus2"

# Paste the External OAuth access token that you generated in Snowflake here
ACCESS_TOKEN = "<your_access_token>"

# Iceberg version
ICEBERG_VERSION = "1.9.1"

def create_spark_session():
  """Create and configure a Spark session for Snowflake Iceberg access."""
  spark = (
      SparkSession.builder
      .appName("SnowflakeIcebergReader")
      .master("local[*]")

      # JAR dependencies for Iceberg and cloud storage
      .config(
          "spark.jars.packages",
          f"org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:{ICEBERG_VERSION},"
          f"org.apache.iceberg:iceberg-aws-bundle:{ICEBERG_VERSION}"
          # For Azure storage, use the package below and comment out the AWS bundle above
          # f"org.apache.iceberg:iceberg-azure-bundle:{ICEBERG_VERSION}"
      )

      # Iceberg SQL extensions
      .config("spark.sql.extensions", "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
      .config("spark.sql.defaultCatalog", CATALOG_NAME)

      # Horizon REST catalog configuration
      .config(f"spark.sql.catalog.{CATALOG_NAME}", "org.apache.iceberg.spark.SparkCatalog")
      .config(f"spark.sql.catalog.{CATALOG_NAME}.type", "rest")
      .config(f"spark.sql.catalog.{CATALOG_NAME}.uri", CATALOG_URI)
      .config(f"spark.sql.catalog.{CATALOG_NAME}.warehouse", CATALOG_NAME)
      .config(f"spark.sql.catalog.{CATALOG_NAME}.token", ACCESS_TOKEN)
      .config(f"spark.sql.catalog.{CATALOG_NAME}.scope", HORIZON_SESSION_ROLE)
      .config(f"spark.sql.catalog.{CATALOG_NAME}.client.region", REGION)

      # Required for vended credentials
      .config(f"spark.sql.catalog.{CATALOG_NAME}.header.X-Iceberg-Access-Delegation", "vended-credentials")
      .config("spark.sql.iceberg.vectorization.enabled", "false")
      .getOrCreate()
  )
  spark.sparkContext.setLogLevel("ERROR")
  return spark

Where:

  • <account_identifier> is the account identifier for the Snowflake account that contains the Iceberg tables that you want to query. To find this identifier, see Prerequisites.

  • <your_access_token> is the access token that you obtained. To obtain it, see Step 3: Obtain an access token for authentication.

    Note

    Alternatively, for External OAuth, you can configure your connection to the engine with automatic token refresh instead of specifying an access token.

  • <database_name> is the name of the database in your Snowflake account that contains Snowflake-managed Iceberg tables that you want to query.

    Note

    The .warehouse property in Spark expects your Snowflake database name, not your Snowflake warehouse name.

  • <role> is the role in Snowflake that is configured with access to the Iceberg tables that you want to query. For example: DATA_ENGINEER.

Important

By default, the code example is set up for Apache Iceberg™ tables stored on Amazon S3. If your Iceberg tables are stored on Azure Storage (ADLS), perform the following steps:

  1. Comment out the following line: f"org.apache.iceberg:iceberg-aws-bundle:{ICEBERG_VERSION}"

  2. Uncomment the following line: # f"org.apache.iceberg:iceberg-azure-bundle:{ICEBERG_VERSION}"
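A short usage sketch follows; it creates the session by using the helper above and lists namespaces as a quick check that the catalog connection works.

# Create the session and verify that the Horizon Catalog connection works.
spark = create_spark_session()
spark.sql("SHOW NAMESPACES").show()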

Connect an external query engine with data policies enforced

Connect by using a programmatic access token (PAT)

Use one of the following configurations to connect:

Connect an external query engine without enforcing data policies

  • To connect the external query engine to Iceberg tables by using a programmatic access token (PAT), use the following example code. This code doesn’t enforce data protection policies:

from pyspark.sql import SparkSession

# Snowflake Horizon Catalog configuration; change as per your environment
CATALOG_URI = "https://<account_identifier>.snowflakecomputing.com/polaris/api/catalog"
HORIZON_SESSION_ROLE = "session:role:<role>"
CATALOG_NAME = "<database_name>"  # provide in UPPER CASE

# Cloud service provider region configuration (where the Iceberg data is stored)
REGION = "eastus2"

# Paste the PAT that you generated in Snowflake here
PAT_TOKEN = "<your_PAT_token>"

# Iceberg version
ICEBERG_VERSION = "1.9.1"

def create_spark_session():
  """Create and configure a Spark session for Snowflake Iceberg access."""
  spark = (
      SparkSession.builder
      .appName("SnowflakeIcebergReader")
      .master("local[*]")

      # JAR dependencies for Iceberg and cloud storage
      .config(
          "spark.jars.packages",
          f"org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:{ICEBERG_VERSION},"
          f"org.apache.iceberg:iceberg-aws-bundle:{ICEBERG_VERSION}"
          # For Azure storage, use the package below and comment out the AWS bundle above
          # f"org.apache.iceberg:iceberg-azure-bundle:{ICEBERG_VERSION}"
      )

      # Iceberg SQL extensions
      .config("spark.sql.extensions", "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
      .config("spark.sql.defaultCatalog", CATALOG_NAME)

      # Horizon REST catalog configuration
      .config(f"spark.sql.catalog.{CATALOG_NAME}", "org.apache.iceberg.spark.SparkCatalog")
      .config(f"spark.sql.catalog.{CATALOG_NAME}.type", "rest")
      .config(f"spark.sql.catalog.{CATALOG_NAME}.uri", CATALOG_URI)
      .config(f"spark.sql.catalog.{CATALOG_NAME}.warehouse", CATALOG_NAME)
      .config(f"spark.sql.catalog.{CATALOG_NAME}.credential", PAT_TOKEN)
      .config(f"spark.sql.catalog.{CATALOG_NAME}.scope", HORIZON_SESSION_ROLE)
      .config(f"spark.sql.catalog.{CATALOG_NAME}.client.region", REGION)

      # Required for vended credentials
      .config(f"spark.sql.catalog.{CATALOG_NAME}.header.X-Iceberg-Access-Delegation", "vended-credentials")
      .config("spark.sql.iceberg.vectorization.enabled", "false")
      .getOrCreate()
  )
  spark.sparkContext.setLogLevel("ERROR")
  return spark

Where:

  • <account_identifier> is the account identifier for the Snowflake account that contains the Iceberg tables that you want to query. To find this identifier, see Prerequisites.

  • <your_PAT_token> is the PAT that you generated. To obtain it, see Step 3: Obtain an access token for authentication.

  • <role> is the role in Snowflake that is configured with access to the Iceberg tables that you want to query. For example: DATA_ENGINEER.

  • <database_name> is the name of the database in your Snowflake account that contains Snowflake-managed Iceberg tables that you want to query.

    Note

    The .warehouse property in Spark expects your Snowflake database name, not your Snowflake warehouse name.

Important

By default, the code example is set up for Apache Iceberg™ tables stored on Amazon S3. If your Iceberg tables are stored on Azure Storage (ADLS), perform the following steps:

  1. Comment out the following line: f"org.apache.iceberg:iceberg-aws-bundle:{ICEBERG_VERSION}"

  2. Uncomment the following line: # f"org.apache.iceberg:iceberg-azure-bundle:{ICEBERG_VERSION}"

Connect an external query engine with data policies enforced

Step 7: Query Iceberg tables

This step provides the following code examples for using Apache Spark™ to query Iceberg tables:

  • Show namespaces

  • Use a namespace

  • Show tables

  • Query a table

Show namespaces

spark.sql("show namespaces").show()

Use a namespace

spark.sql("use namespace <your_schema_name_in_snowflake>")

Show tables

spark.sql("show tables").show()

Query a table

spark.sql("use namespace spark_demo")
spark.sql("select * from <your_table_name_in_snowflake>").show()
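You can also read a table through the Spark DataFrame API instead of SQL. The following sketch assumes the session and placeholder names from the previous steps:

# Equivalent read through the DataFrame API.
df = spark.table("spark_demo.<your_table_name_in_snowflake>")
df.show()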

Considerations for querying Iceberg tables with an external query engine

Consider the following items when you query Iceberg tables with an external query engine:

  • Iceberg

    • For tables in Snowflake:

      • Only Snowflake-managed Iceberg tables are supported.

      • Querying the following tables isn’t supported:

        • Remote tables

        • Snowflake native tables

        • Externally managed Iceberg tables, including Delta-based Iceberg tables, and Snowflake-managed Iceberg tables that you loaded with data from Iceberg-compatible Parquet data files by using the COPY INTO <table> command

    • You can query Iceberg tables but can’t write to them.

    • External reads are supported only for Iceberg format version 2 or earlier.

  • Access control:

  • Network and private connectivity:

    • Using network policies that are set at the user level isn’t supported with this feature.

    • For Snowflake-managed network rules, static egress IP addresses aren’t supported.

    • Explicitly granting the Horizon Catalog endpoint access to your storage accounts isn’t supported. We recommend that you use private connectivity for secure connectivity from external engines to Horizon Catalog and from Horizon Catalog to your storage account.

  • Listings:

    • Iceberg tables that you share through auto-fulfillment for listings aren’t accessible through the consumer account’s Horizon Iceberg REST Catalog API.

  • Clouds:

    • This feature is only supported for Snowflake-managed Iceberg tables that are stored on Amazon S3, Google Cloud, or Microsoft Azure, in all public cloud regions. S3-compatible non-AWS storage isn’t yet supported.

    • For Iceberg tables stored on Amazon S3:

      • If you want to use SSE-KMS encryption, contact customer support or your account team for assistance with enabling access.

    • For Iceberg tables stored on Azure:

      • Azure Virtual Network (VNet) isn’t supported.

  • Authentication:

    • For key-pair authentication, key-pair rotation isn’t supported.

    • Workload identity federation isn’t supported with this feature.