Apache Iceberg™ tables¶

Apache Iceberg™ tables for Snowflake combine the performance and query semantics of typical Snowflake tables with external cloud storage that you manage. They are ideal for existing data lakes that you cannot, or choose not to, store in Snowflake.

Iceberg tables use the Apache Iceberg™ open table format specification, which provides an abstraction layer on data files stored in open formats and supports features such as:

ACID (atomicity, consistency, isolation, durability) transactions
Schema evolution
Hidden partitioning
Table snapshots

Snowflake supports Iceberg tables that use the Apache Parquet™ file format.

Getting started¶

To get started with Iceberg tables, see Tutorial: Create your first Apache Iceberg™ table.

How it works¶

This section provides information specific to working with Iceberg tables in Snowflake. To learn more about the Iceberg table format specification, see the official Apache Iceberg documentation and the Iceberg Table Spec.

Data storage
Catalog
Metadata and snapshots
Cross-cloud/cross-region support
Billing

Data storage¶

Iceberg tables store their data and metadata files in an external cloud storage location (Amazon S3, Google Cloud Storage, or Azure Storage). The external storage is not part of Snowflake. You are responsible for all management of the external cloud storage location, including the configuration of data protection and recovery. Snowflake does not provide Fail-safe storage for Iceberg tables.

Snowflake connects to your storage location using an external volume, and Iceberg tables incur no Snowflake storage costs. For more information, see Billing.

To learn more about storage for Iceberg tables, see Storage for Apache Iceberg™ tables.

External volume¶

An external volume is a named, account-level Snowflake object that you use to connect Snowflake to your external cloud storage for Iceberg tables. An external volume stores an identity and access management (IAM) entity for your storage location. Snowflake uses the IAM entity to securely connect to your storage for accessing table data, Iceberg metadata, and manifest files that store the table schema, partitions, and other metadata.

A single external volume can support one or more Iceberg tables.

To set up an external volume for Iceberg tables, see Configure an external volume.
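
As a rough illustration, an external volume for an Amazon S3 location might be created as follows. This is a minimal sketch: the volume name, bucket URL, and role ARN are placeholders, and the parameters differ for Google Cloud Storage and Azure Storage. Treat Configure an external volume as the authoritative reference.

```sql
-- Hypothetical names and ARNs; replace with values from your own cloud account.
CREATE EXTERNAL VOLUME my_ext_vol
  STORAGE_LOCATIONS = (
    (
      NAME = 'my-s3-location'
      STORAGE_PROVIDER = 'S3'
      STORAGE_BASE_URL = 's3://my-bucket/iceberg/'
      STORAGE_AWS_ROLE_ARN = 'arn:aws:iam::123456789012:role/my-iceberg-role'
    )
  );

-- Inspect the volume, for example to retrieve the values needed
-- when you configure the trust relationship for the IAM role.
DESC EXTERNAL VOLUME my_ext_vol;
```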

Catalog¶

An Iceberg catalog enables a compute engine to manage and load Iceberg tables. The catalog forms the first architectural layer in the Iceberg table specification and must support:

Storing the current metadata pointer for one or more Iceberg tables. A metadata pointer maps a table name to the location of that table’s current metadata file.
Performing atomic operations so that you can update the current metadata pointer for a table.

To learn more about Iceberg catalogs, see the Apache Iceberg documentation.

Snowflake supports different catalog options. For example, you can use Snowflake as the Iceberg catalog, or use a catalog integration to connect Snowflake to an external Iceberg catalog.

Catalog integration¶

A catalog integration is a named, account-level Snowflake object that stores information about how your table metadata is organized for the following scenarios:

When you don’t use Snowflake as the Iceberg catalog. For example, you need a catalog integration if your table is managed by AWS Glue.
When you want to integrate with Snowflake Open Catalog to:
- Query an Iceberg table in Snowflake Open Catalog using Snowflake.
- Sync a Snowflake-managed Iceberg table with Snowflake Open Catalog so that third-party compute engines can query the table.

A single catalog integration can support one or more Iceberg tables that use the same external catalog.

To set up a catalog integration, see Configure a catalog integration.
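
For example, a catalog integration for a table managed by AWS Glue might look like the following sketch. The integration name, namespace, role ARN, catalog ID, and region are all placeholder assumptions; other catalog types use different parameters.

```sql
-- Hypothetical values throughout; adjust to your AWS Glue Data Catalog.
CREATE CATALOG INTEGRATION glue_catalog_int
  CATALOG_SOURCE = GLUE
  CATALOG_NAMESPACE = 'my_glue_database'
  TABLE_FORMAT = ICEBERG
  GLUE_AWS_ROLE_ARN = 'arn:aws:iam::123456789012:role/my-glue-role'
  GLUE_CATALOG_ID = '123456789012'
  GLUE_REGION = 'us-east-2'
  ENABLED = TRUE;
```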

Metadata and snapshots¶

Iceberg uses a snapshot-based querying model, where data files are mapped using manifest and metadata files. A snapshot represents the state of a table at a point in time and is used to access the complete set of data files in the table.

To learn about table metadata and Time Travel support, see Metadata and retention for Apache Iceberg™ tables.

Cross-cloud/cross-region support¶

Snowflake supports using an external volume storage location with a different cloud provider (in a different region) from the one that hosts your Snowflake account.

Table type	Cross-cloud/cross-region support	Notes
Tables that use an external catalog with a catalog integration	✔	If your Snowflake account and external volume are in different regions, your external cloud storage account incurs egress costs when you query the table.
Tables that use Snowflake as the catalog	✔	If your Snowflake account and external volume are in different regions, your external cloud storage account incurs egress costs when you query the table. These tables incur costs for cross-region data transfer usage. For more information, see Billing.

Billing¶

Snowflake bills your account for virtual warehouse (compute) usage and cloud services when you work with Iceberg tables. Snowflake also bills your account if you use automated refresh.

If a Snowflake-managed Iceberg table is cross-cloud/cross-region, Snowflake bills your cross-region data transfer usage under the TRANSFER_TYPE of DATA_LAKE. To learn more, see:

DATA_TRANSFER_HISTORY view in the ORGANIZATION_USAGE schema.
DATA_TRANSFER_HISTORY view in the ACCOUNT_USAGE schema.
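
To see what this transfer usage looks like in practice, a query along the following lines can help. It is a sketch that assumes your role can read the SNOWFLAKE.ACCOUNT_USAGE schema; the 30-day window is an arbitrary choice.

```sql
-- Cross-region transfer attributed to Iceberg tables over the last 30 days.
SELECT
  start_time,
  source_cloud,
  source_region,
  target_cloud,
  target_region,
  bytes_transferred
FROM snowflake.account_usage.data_transfer_history
WHERE transfer_type = 'DATA_LAKE'
  AND start_time >= DATEADD('day', -30, CURRENT_TIMESTAMP())
ORDER BY start_time DESC;
```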

Snowflake does not bill your account for the following:

Iceberg table storage costs. Your cloud storage provider bills you directly for data storage usage.
Active bytes used by Iceberg tables. However, the INFORMATION_SCHEMA.TABLE_STORAGE_METRICS and ACCOUNT_USAGE.TABLE_STORAGE_METRICS views display ACTIVE_BYTES for Iceberg tables to help you track how much storage a table occupies. To view an example, see Retrieve storage metrics.

Note

If your Snowflake account and external volume are in different regions, your external cloud storage account incurs egress costs when you query the table.

Catalog options¶

Snowflake supports the following Iceberg catalog options:

Use Snowflake as the Iceberg catalog
Use an external Iceberg catalog

The following table summarizes the differences between these catalog options.

	Use Snowflake as the catalog	Use an external catalog
Read access	✔	✔
Write access	✔	✔ Write support for externally managed tables is in preview.
Write access across regions	✔	✔ with Write support for externally managed tables
Data and metadata storage	External volume (cloud storage)	External volume (cloud storage)
Snowflake platform support	✔ Full	Limited
Integrates with Snowflake Open Catalog	✔ You can sync a Snowflake-managed table with Open Catalog to query a table using other compute engines.	✔ You can use Snowflake to query or write to Iceberg tables managed by Open Catalog.
Works with the Snowflake Catalog SDK	✔	✔

Use Snowflake as the catalog¶

An Iceberg table that uses Snowflake as the Iceberg catalog (Snowflake-managed Iceberg table) provides full Snowflake platform support with read and write access. The table data and metadata are stored in external cloud storage, which Snowflake accesses using an external volume. Snowflake handles all life-cycle maintenance, such as compaction, for the table.

[Diagram: How Iceberg tables that use Snowflake as the Iceberg catalog work]
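
A Snowflake-managed Iceberg table can be sketched as follows. The table name, columns, external volume name, and base location are illustrative assumptions; the statement presumes an external volume already exists.

```sql
-- Hypothetical table; my_ext_vol is assumed to be an existing external volume.
CREATE ICEBERG TABLE my_managed_table (
  id INT,
  name STRING
)
  CATALOG = 'SNOWFLAKE'
  EXTERNAL_VOLUME = 'my_ext_vol'
  BASE_LOCATION = 'my_managed_table/';

-- Snowflake-managed tables support both reads and writes.
INSERT INTO my_managed_table VALUES (1, 'first row');
SELECT * FROM my_managed_table;
```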

Use an external catalog¶

An Iceberg table that uses an external catalog provides limited Snowflake platform support.

With this table type, Snowflake uses a catalog integration to retrieve information about your Iceberg metadata and schema.

You can use this option to create an Iceberg table for the following sources:

Remote Iceberg REST catalog, including AWS Glue and Snowflake Open Catalog. Snowflake supports writes to externally managed tables that use a remote Iceberg REST catalog.
Delta table files in object storage
Iceberg metadata files in object storage

Snowflake does not assume any life-cycle management on the table.

The table data and metadata are stored in external cloud storage, which Snowflake accesses using an external volume.

Note

If you want full Snowflake platform support for an Iceberg table that uses an external catalog, you can convert it to use Snowflake as the catalog. For more information, see Convert an Apache Iceberg™ table to use Snowflake as the catalog.

The following diagram shows how an Iceberg table uses a catalog integration with an external Iceberg catalog.

[Diagram: How Iceberg tables that use a catalog integration work]
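
As a sketch of this table type, the following statements create an Iceberg table in Snowflake from a table registered in AWS Glue, and then refresh it. The table, external volume, and catalog integration names are assumptions; tables based on other sources, such as Delta files or metadata files in object storage, use different parameters.

```sql
-- Hypothetical names; the external volume and catalog integration must already exist.
CREATE ICEBERG TABLE my_glue_table
  EXTERNAL_VOLUME = 'my_ext_vol'
  CATALOG = 'glue_catalog_int'
  CATALOG_TABLE_NAME = 'my_glue_table';

-- Pick up changes that were committed through the external catalog.
ALTER ICEBERG TABLE my_glue_table REFRESH;
```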

Considerations and limitations¶

The following considerations and limitations apply to Iceberg tables, and are subject to change:

Clouds and regions

Iceberg tables are available for all Snowflake accounts, on all cloud platforms and in all regions.

Cross-cloud/cross-region tables are supported. For more information, see Cross-cloud/cross-region support.

Iceberg

Versions 1 and 2 of the Apache Iceberg specification are supported, excluding the following features:

Row-level equality deletes. However, tables that use Snowflake as the catalog support Snowflake DELETE statements.

Using the history.expire.min-snapshots-to-keep table property to specify the default minimum number of snapshots to keep. For more information, see Metadata and snapshots.

Iceberg partitioning with the bucket transform function impacts performance for queries that use conditional clauses to filter results.

For Iceberg tables that aren’t managed by Snowflake, be aware of the following:

Time Travel to any snapshot generated after table creation is supported as long as you periodically refresh the table before the snapshot expires.

Converting a table that has an un-materialized identity partition column isn’t supported. An un-materialized identity partition column is created when a table defines an identity transform using a source column that doesn’t exist in a Parquet file.

For row-level deletes:

Snowflake supports position deletes only.

For the best read performance when you use row-level deletes, perform regular compaction and table maintenance to remove old delete files. For information, see Maintain tables that use an external catalog.

Automated refresh isn’t currently supported when you use position deletes.

Excessive position deletes, especially dangling position deletes, might prevent table creation and refresh operations. To avoid this issue, perform table maintenance to remove extra position deletes.

The table maintenance method to use depends on your external Iceberg engine. For example, you can use the rewrite_data_files method for Spark with the delete-file-threshold or rewrite-all options. For more information, see rewrite_data_files in the Apache Iceberg™ documentation.
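
For Spark, the maintenance call might look like the following sketch. It is Spark SQL, run in your Spark environment rather than in Snowflake, and the catalog, database, and table names are placeholders. A threshold of 1 is only an example value.

```sql
-- Spark SQL (not Snowflake). Rewrites data files that have at least one
-- associated delete file, which removes the need for those delete files.
CALL my_catalog.system.rewrite_data_files(
  table => 'my_db.my_table',
  options => map('delete-file-threshold', '1')
);

-- Alternative: rewrite every data file regardless of delete files.
CALL my_catalog.system.rewrite_data_files(
  table => 'my_db.my_table',
  options => map('rewrite-all', 'true')
);
```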

File formats

Iceberg tables support Apache Parquet files.

Parquet files that use the unsigned integer logical type aren’t supported.

For Parquet files that use the LIST logical type, be aware of the following:

The three-level annotation structure with the element keyword is supported. For more information, see Parquet Logical Type Definitions. If your Parquet file uses an obsolete format with the array keyword, you must regenerate your data based on the supported format.

External volumes

You can’t access the cloud storage locations in external volumes using a storage integration.

You must configure a separate trust relationship for each external volume that you create.

You can use outbound private connectivity to access Snowflake-managed Iceberg tables and Iceberg tables that use a catalog integration for object storage, but cannot use it to access Iceberg tables that use other catalog integrations.

After you create a Snowflake-managed table, the path to its files in external storage does not change, even if you rename the table.

Snowflake doesn’t support external volumes with S3 bucket names that contain dots (for example, my.s3.bucket). S3 doesn’t support SSL for virtual-hosted-style buckets with dots in the name, and Snowflake uses virtual-hosted-style paths and HTTPS to access data in S3.

Metadata files

The metadata files don’t identify the most recent snapshot of an Iceberg table.

You can’t modify the location of the data files or snapshot using the ALTER ICEBERG TABLE command. To modify either of these settings, you must recreate the table (using the CREATE OR REPLACE ICEBERG TABLE syntax).

For tables that use an external catalog:

Ensure that manifest files don’t contain duplicates. If duplicate files are present in the same snapshot, Snowflake returns an error that includes the path of the duplicate file.

You can’t create a table if the Parquet metadata contains invalid UTF-8 characters. Ensure that your Parquet metadata is UTF-8 compliant.

Snowflake detects corruptions and inconsistencies in Parquet metadata produced outside of Snowflake, and surfaces issues through error messages.

It’s possible to create, refresh, or query externally managed (or converted) tables, even if the table metadata is inconsistent. When writing Iceberg data, ensure that the table’s metadata statistics (for example, RowCount or NullCount) match the data content.

For tables that use Snowflake as the catalog, Snowflake processes DDL statements individually and produces metadata in a way that might differ from other catalogs. For more information, see DDL statements.

Clustering

Clustering support depends on the type of Iceberg table.

Table type	Notes
Tables that use Snowflake as the Iceberg catalog	Set a clustering key by using either the CREATE ICEBERG TABLE or the ALTER ICEBERG TABLE command. To set or manage a clustering key, see CREATE ICEBERG TABLE (Snowflake as the Iceberg catalog) and ALTER ICEBERG TABLE.
Tables that use an external catalog	Clustering is not supported.
Converted tables	Snowflake only clusters files if they were created after converting the table, or if the files have since been modified using a DML statement.
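
For a table that uses Snowflake as the Iceberg catalog, setting and changing a clustering key might look like this sketch. The table, columns, and external volume name are hypothetical.

```sql
-- Set a clustering key at creation time (hypothetical table and volume).
CREATE ICEBERG TABLE my_clustered_table (
  event_date DATE,
  region STRING,
  amount NUMBER(10, 2)
)
  CLUSTER BY (event_date)
  CATALOG = 'SNOWFLAKE'
  EXTERNAL_VOLUME = 'my_ext_vol'
  BASE_LOCATION = 'my_clustered_table/';

-- Change the clustering key later.
ALTER ICEBERG TABLE my_clustered_table CLUSTER BY (event_date, region);
```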

Delta

Snowflake supports Delta reader version 2 and can read all tables written by engines using Delta Lake version 2.2.0.

Snowflake streams aren’t supported for Iceberg tables created from Delta table files with partition columns. However, insert-only streams for tables created from Delta files without partition columns are supported.

Iceberg tables created from Delta files that were created before the 2024_04 release bundle are not supported in dynamic tables.

Snowflake doesn’t support creating Iceberg tables from Delta table definitions in the AWS Glue Data Catalog.

Parquet files (data files for Delta tables) that use any of the following features or data types aren’t supported:

Field IDs.

The INTERVAL data type.

The DECIMAL data type with precision higher than 38.

LIST or MAP types with one-level or two-level representation.

Unsigned integer types (INT(signed = false)).

The FLOAT16 data type.

You can use the Parquet physical type int96 for TIMESTAMP, but Snowflake doesn’t support int96 for TIMESTAMP_NTZ.

For more information about Delta data types and Iceberg tables, see Delta data types.

Snowflake processes a maximum of 1,000 Delta commit files each time you refresh a table using CREATE/ALTER … REFRESH. If your table has over 1,000 commit files, you can do additional manual refreshes. Each time, the refresh process continues from where the last one stopped.

Note

Snowflake uses Delta checkpoint files when creating an Iceberg table. The 1,000 commit file limit only applies to commits after the latest checkpoint.

When you refresh an existing table, Snowflake processes Delta commit files, but not checkpoint files. If table maintenance removes stale log and data files for the source Delta table, you should refresh Delta-based Iceberg tables in Snowflake more frequently than the retention period of Delta logs and data files.
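
The following sketch shows one way to create an Iceberg table from Delta table files and then refresh it manually. The integration, table, and volume names, and the base location, are placeholder assumptions.

```sql
-- Catalog integration for Delta table files in object storage.
CREATE CATALOG INTEGRATION delta_catalog_int
  CATALOG_SOURCE = OBJECT_STORE
  TABLE_FORMAT = DELTA
  ENABLED = TRUE;

-- BASE_LOCATION is the path, relative to the external volume, of the Delta table.
CREATE ICEBERG TABLE my_delta_table
  CATALOG = 'delta_catalog_int'
  EXTERNAL_VOLUME = 'my_ext_vol'
  BASE_LOCATION = 'path/to/delta/table/';

-- Each refresh processes a bounded number of commit files; repeat the
-- statement if the table has accumulated more commits than one pass handles.
ALTER ICEBERG TABLE my_delta_table REFRESH;
```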

The following Delta Lake features aren’t currently supported: row tracking, deletion vector files, change data files, change metadata, DataChange, CDC, and protocol evolution.

Automated refresh

For catalog integrations created before Snowflake version 8.22 (or 9.2 for Delta-based tables), you must manually set the REFRESH_INTERVAL_SECONDS parameter before you enable automated refresh on tables that depend on that catalog integration. For instructions, see ALTER CATALOG INTEGRATION … SET AUTO_REFRESH.

If your table is empty, insert data into the table and then perform a manual refresh in Snowflake before you enable automated refresh to avoid undefined behavior.

For catalog integrations for object storage, automated refresh is only supported for integrations with TABLE_FORMAT = DELTA.

For tables with frequent updates, using a shorter polling interval (REFRESH_INTERVAL_SECONDS) can cause performance degradation.
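
Put together, enabling automated refresh on a table might follow a sequence like this sketch. The object names are hypothetical and the 60-second interval is an arbitrary example; choose an interval that suits how often the table changes.

```sql
-- Set the polling interval on the catalog integration (needed for older integrations).
ALTER CATALOG INTEGRATION my_catalog_int SET REFRESH_INTERVAL_SECONDS = 60;

-- Perform a manual refresh first, then turn on automated refresh.
ALTER ICEBERG TABLE my_external_table REFRESH;
ALTER ICEBERG TABLE my_external_table SET AUTO_REFRESH = TRUE;
```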

Catalog-linked databases and automatic table discovery

Supported only when you use a catalog integration for Iceberg REST (for example, Snowflake Open Catalog).

Billing: During the preview period, Snowflake doesn’t bill for catalog-linked databases. Billing starts when Apache Iceberg™ catalog-linked databases become generally available.

To limit automatic table discovery to a specific set of namespaces, use the ALLOWED_NAMESPACES parameter. You can also use the BLOCKED_NAMESPACES parameter to block a set of namespaces.
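
As an illustration, limiting discovery to two namespaces could look like the following. This is a sketch only: the database, catalog integration, external volume, and namespace names are assumptions, and the exact parameter layout should be checked against the CREATE DATABASE (catalog-linked) reference.

```sql
-- Hypothetical catalog-linked database that discovers only two namespaces.
CREATE DATABASE my_linked_db
  LINKED_CATALOG = (
    CATALOG = 'my_rest_catalog_int',
    ALLOWED_NAMESPACES = ('sales', 'marketing')
  ),
  EXTERNAL_VOLUME = 'my_ext_vol';
```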

Snowflake doesn’t sync remote catalog access control (users or roles).

You can create schemas or externally managed Iceberg tables in a catalog-linked database. Creating other Snowflake objects isn’t currently supported.

Latency:

For databases linked to 7,500 namespaces in a remote catalog, namespace and table discovery takes about one hour.

For remote catalogs with 500,000 tables, the automated refresh process takes about one hour to complete. For namespaces with different latency requirements, we recommend that you create separate catalog-linked databases. Each database should reference a catalog integration with an appropriate auto-refresh interval (REFRESH_INTERVAL_SECONDS).

For Iceberg tables in a catalog-linked database:

Snowflake doesn’t copy remote catalog table properties (such as retention policies or buffers), and doesn’t currently support altering table properties.

Automated refresh is enabled by default. If the table-uuid of an external table and the catalog-linked database table don’t match, refresh fails and Snowflake drops the table from the catalog-linked database; Snowflake doesn’t change the remote table.

If you drop a table from the remote catalog, Snowflake drops the table from the catalog-linked database. This action is asynchronous, so you might not see the change in the remote catalog right away.

If you rename a table in the remote catalog, Snowflake drops the existing table from the catalog-linked database and creates a table with the new name.

Masking policies and tags are supported. Other Snowflake-specific features including replication, cloning, and sharing aren’t supported.

The character that you choose for the NAMESPACE_FLATTEN_DELIMITER parameter can’t appear in your remote namespaces. During the autodiscovery process, Snowflake skips any namespace that contains the delimiter, and does not create a corresponding schema in your catalog-linked database.

If you specify anything other than _, $, uppercase letters, or numbers for the NAMESPACE_FLATTEN_DELIMITER parameter, you must put the schema name in quotes when you query the table.

For databases linked to AWS Glue, you must use lowercase letters and surround the schema, table, and column names in double quotes. This is also required for other Iceberg REST catalogs that only support lowercase identifiers.

The following example shows a valid statement:

CREATE SCHEMA "s1";

The following statements aren’t valid, because they use uppercase letters or omit the double quotes:

CREATE SCHEMA s1;
CREATE SCHEMA "Schema1";

Using UNDROP ICEBERG TABLE isn’t supported.

Sharing with a listing is supported for tables that use an external volume. Direct sharing isn’t currently supported.

For writing to tables in a catalog-linked database:

Vended credentials aren’t supported.

Writing to tables in nested namespaces isn’t currently supported.

Externally managed write support

Snowflake provides Data Definition Language (DDL) and Data Manipulation Language (DML) commands for externally managed tables. However, you configure metadata and data retention using your external catalog and the tools provided by your external storage provider. For more information, see Tables that use an external catalog.

For writes, Snowflake ensures that changes are committed to your remote catalog before updating the table in Snowflake.

If you use a catalog-linked database, you can use the CREATE ICEBERG TABLE syntax with column definitions to create a table in Snowflake and in your remote catalog. If you use a standard Snowflake database (not linked to a catalog), you must first create a table in your remote catalog. After that, you can use the CREATE ICEBERG TABLE (Iceberg REST catalog) syntax to create an Iceberg table in Snowflake and write to it.
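
The two paths described above can be sketched as follows. All database, schema, table, volume, and integration names are hypothetical, and the lowercase quoted identifiers in the first path assume a remote catalog that only supports lowercase names.

```sql
-- Path 1: catalog-linked database. The table is created in Snowflake
-- and in the remote catalog in one step.
USE DATABASE my_linked_db;
USE SCHEMA "s1";
CREATE ICEBERG TABLE "events" (
  "id" INT,
  "payload" STRING
);
INSERT INTO "events" VALUES (1, 'hello');

-- Path 2: standard Snowflake database. The table must already exist in the
-- remote Iceberg REST catalog before you create it in Snowflake.
CREATE ICEBERG TABLE my_rest_table
  EXTERNAL_VOLUME = 'my_ext_vol'
  CATALOG = 'my_rest_catalog_int'
  CATALOG_TABLE_NAME = 'my_remote_table';
```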

For the AWS Glue Data Catalog: Dropping an externally managed table through Snowflake does not delete the underlying table files. This behavior is specific to the AWS Glue Data Catalog implementation.

If you participated in private preview for this feature, position row-level deletes might be enabled by default in your account. To check, run the following command:

SHOW PARAMETERS LIKE 'ENABLE_ICEBERG_MERGE_ON_READ' IN ACCOUNT;

If the parameter is visible in your account (regardless of its value), position deletes are enabled. To turn off position deletes, set the ENABLE_ICEBERG_MERGE_ON_READ parameter to FALSE at the table, schema, or database level.

Writing to externally managed tables with the following Iceberg data types isn’t supported:

uuid

fixed(L)

The following features aren’t currently supported when you use Snowflake to write to externally managed Iceberg tables:

Catalog-vended credentials.

Server-side encryption (SSE) for GCS or Azure external volumes.

Multi-statement transactions. Snowflake supports autocommit transactions only.

Conversion to Snowflake-managed tables.

External Iceberg catalogs that don’t conform to the Iceberg REST protocol.

Row-level deletes (merge-on-read).

Using the OR REPLACE option when creating a table.

Using the CREATE TABLE … AS SELECT syntax if you use AWS Glue as your remote catalog.

For creating schemas in a catalog-linked database, be aware of the following:

The CREATE SCHEMA command creates a corresponding namespace in your remote catalog only when you use a catalog-linked database.

The ALTER and CLONE options aren’t supported.

Delimiters aren’t supported for schema names. Only alphanumeric schema names are supported.

Access by third-party clients to Iceberg data and metadata

Third-party clients can’t append to, delete from, or upsert data to Iceberg tables that use Snowflake as the catalog.

Unsupported features

The following Snowflake features aren’t currently supported for any Iceberg tables:

Collation

Fail-safe

Hybrid tables

Listings that enable Cross-Cloud Auto-Fulfillment.

Query Acceleration Service

Replication of Iceberg tables, external volumes, or catalog integrations

Snowflake encryption

Snowflake Native App Framework

Snowflake schema evolution

Tagging using the ASSOCIATE_SEMANTIC_CATEGORY_TAGS stored procedure

Temporary and transient tables

The following features aren’t supported for externally managed Iceberg tables:

Cloning

Clustering

Standard and append-only streams. Insert-only streams are supported.