Iceberg tables
An Iceberg table uses the Apache Iceberg open table format specification, which provides an abstraction layer on data files stored in open formats and supports features such as:
ACID (atomicity, consistency, isolation, durability) transactions
Schema evolution
Hidden partitioning
Table snapshots
Iceberg tables for Snowflake combine the performance and query semantics of regular Snowflake tables with external cloud storage that you manage. They are ideal for existing data lakes that you cannot, or choose not to, store in Snowflake.
Snowflake supports Iceberg tables that use the Apache Parquet file format.
Getting started
To create an Iceberg table, start by configuring an external volume. For an introduction to using Iceberg tables in Snowflake, see Quickstart: Getting Started with Iceberg Tables.
How Iceberg tables work
This section provides information specific to working with Iceberg tables in Snowflake. To learn more about the Iceberg table format specification, see the official Apache Iceberg documentation and the Iceberg Table Spec.
Data storage
Iceberg tables store their data and metadata files in an external cloud storage location (Amazon S3, Google Cloud Storage, or Azure Storage). The external storage is not part of Snowflake. You are responsible for all management of the external cloud storage location, including the configuration of data protection and recovery. Snowflake does not provide Fail-safe storage for Iceberg tables.
Snowflake connects to your storage location using an external volume.
Iceberg tables incur no Snowflake storage costs. For more information, see Billing.
External volume
An external volume is a named, account-level Snowflake object that stores an identity and access management (IAM) entity for your external cloud storage. Snowflake securely connects to your cloud storage with an external volume to access table data, Iceberg metadata, and manifest files that store the table schema, partitions, and other metadata.
A single external volume can support one or more Iceberg tables.
To set up an external volume for Iceberg tables, see Configure an external volume for Iceberg tables.
Iceberg catalog
An Iceberg catalog enables a compute engine to manage and load Iceberg tables. The catalog forms the first architectural layer in the Iceberg table specification and must support:
Storing the current metadata pointer for one or more Iceberg tables. A metadata pointer maps a table name to the location of that table’s current metadata file.
Performing atomic operations so that you can update the current metadata pointer for a table.
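To make these two catalog responsibilities concrete, here is a minimal in-memory sketch of a metadata pointer store with an atomic compare-and-swap. The class name `InMemoryCatalog`, its methods, and the `s3://bucket/...` paths are hypothetical and for illustration only; this is not Snowflake's catalog or any Iceberg library API. A writer commits by swapping the pointer only if it still points at the metadata file the writer started from; otherwise the commit is rejected and the writer must reload and retry.

```python
import threading


class InMemoryCatalog:
    """Hypothetical, minimal catalog: maps table names to current metadata file locations."""

    def __init__(self) -> None:
        self._pointers: dict[str, str] = {}
        self._lock = threading.Lock()  # makes each pointer update atomic

    def current_metadata(self, table: str) -> str:
        """Return the location of the table's current metadata file."""
        with self._lock:
            return self._pointers[table]

    def register(self, table: str, metadata_location: str) -> None:
        """Create the pointer for a new table; fail if the name is already taken."""
        with self._lock:
            if table in self._pointers:
                raise ValueError(f"table already exists: {table}")
            self._pointers[table] = metadata_location

    def swap_pointer(self, table: str, expected: str, new: str) -> bool:
        """Atomically move the pointer from `expected` to `new`.

        Returns False and changes nothing if another writer has already
        committed, so the caller must reload the metadata and retry.
        """
        with self._lock:
            if self._pointers.get(table) != expected:
                return False
            self._pointers[table] = new
            return True


catalog = InMemoryCatalog()
catalog.register("db.events", "s3://bucket/events/metadata/v1.metadata.json")

# First writer commits v2 on top of v1.
ok = catalog.swap_pointer("db.events",
                          expected="s3://bucket/events/metadata/v1.metadata.json",
                          new="s3://bucket/events/metadata/v2.metadata.json")
# A second writer that also started from v1 is rejected.
stale = catalog.swap_pointer("db.events",
                             expected="s3://bucket/events/metadata/v1.metadata.json",
                             new="s3://bucket/events/metadata/v3.metadata.json")
print(ok, stale, catalog.current_metadata("db.events"))
```

The lock stands in for whatever atomic primitive a real catalog uses (for example, a conditional write in a database); the point is that readers always see exactly one current metadata file per table.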
To learn more about Iceberg catalogs, see the Apache Iceberg documentation.
Snowflake supports different catalog options. For example, you can use Snowflake as the Iceberg catalog, or use a catalog integration to connect Snowflake to an external Iceberg catalog.
Catalog integration
A catalog integration is a named, account-level Snowflake object that defines the source of metadata and schema for an Iceberg table when you don’t use Snowflake as the Iceberg catalog.
A single catalog integration can support one or more Iceberg tables.
To set up a catalog integration for Iceberg tables, see Configure a catalog integration for Iceberg tables.
Metadata and snapshots
Iceberg uses a snapshot-based querying model, where data files are mapped using manifest and metadata files. A snapshot represents the state of a table at a point in time and is used to access the complete set of data files in the table.
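The following minimal sketch models how a snapshot resolves to its complete set of data files, and how a point-in-time query picks a snapshot. It is a simplification under stated assumptions: the classes and function are hypothetical, and the real Iceberg layout adds a manifest list file between the snapshot and its manifests, which is collapsed here into a tuple of manifests.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Manifest:
    """Simplified manifest: only the data file paths it tracks."""
    path: str
    data_files: tuple[str, ...]


@dataclass(frozen=True)
class Snapshot:
    """Simplified snapshot: an ID, a commit time, and the manifests it references."""
    snapshot_id: int
    timestamp_ms: int
    manifests: tuple[Manifest, ...]

    def data_files(self) -> list[str]:
        """Return the complete, de-duplicated set of data files in this snapshot."""
        return sorted({f for m in self.manifests for f in m.data_files})


def snapshot_as_of(snapshots: list[Snapshot], timestamp_ms: int) -> Snapshot:
    """Return the most recent snapshot committed at or before `timestamp_ms`."""
    candidates = [s for s in snapshots if s.timestamp_ms <= timestamp_ms]
    if not candidates:
        raise LookupError("no snapshot exists at or before that time")
    return max(candidates, key=lambda s: s.timestamp_ms)


m1 = Manifest("m1.avro", ("part-0001.parquet", "part-0002.parquet"))
m2 = Manifest("m2.avro", ("part-0003.parquet",))

s1 = Snapshot(1, 1_700_000_000_000, (m1,))
s2 = Snapshot(2, 1_700_000_600_000, (m1, m2))  # a later append reuses m1 and adds m2

print(s2.data_files())  # all three Parquet files
print(snapshot_as_of([s1, s2], 1_700_000_300_000).snapshot_id)  # 1
```

Because a new snapshot can reuse existing manifests, an append only writes new files and metadata; older snapshots stay readable until they are expired.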
Snowflake uses the DATA_RETENTION_TIME_IN_DAYS parameter to handle metadata in different ways, depending on the type of Iceberg table.
Note
Specifying the default minimum number of snapshots with the history.expire.min-snapshots-to-keep table property is not supported for any type of Iceberg table.
Tables that use Snowflake as the Iceberg catalog
For this table type, Snowflake generates metadata on a periodic basis and writes the metadata files, alongside the table's Parquet data files, to your external volume.
Snowflake uses the value of DATA_RETENTION_TIME_IN_DAYS to determine the following:
When to expire old table snapshots to reduce the size of table metadata.
How long to retain table metadata to support Time Travel and undropping the table. When the retention period expires, Snowflake deletes any table metadata and snapshots that it has written for that table from your external volume location.
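As an illustration of the first point, the following sketch selects which snapshots fall outside a retention window of DATA_RETENTION_TIME_IN_DAYS days. Snowflake's exact expiration procedure is not described on this page, so this is a model under assumptions: the function name is hypothetical, the cutoff is a strict "older than now minus the retention period" comparison, and the current snapshot is always kept (as Iceberg's own snapshot expiration also does).

```python
MS_PER_DAY = 86_400_000


def snapshots_to_expire(snapshot_times_ms: dict[int, int], current_id: int,
                        retention_days: int, now_ms: int) -> list[int]:
    """Return IDs of snapshots committed before the retention cutoff.

    The current snapshot is never expired, even if it is older than the cutoff.
    """
    cutoff_ms = now_ms - retention_days * MS_PER_DAY
    return sorted(sid for sid, ts in snapshot_times_ms.items()
                  if ts < cutoff_ms and sid != current_id)


# Snapshots committed on day 0, 3, 8, and 9; "now" is day 10.
history = {101: 0 * MS_PER_DAY, 102: 3 * MS_PER_DAY,
           103: 8 * MS_PER_DAY, 104: 9 * MS_PER_DAY}
print(snapshots_to_expire(history, current_id=104, retention_days=5,
                          now_ms=10 * MS_PER_DAY))  # [101, 102]
```

A shorter retention period expires more snapshots and keeps the metadata smaller, at the cost of a shorter Time Travel window.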
Note
Snowflake does not support Fail-safe for Iceberg tables, because the table data is in external cloud storage that you manage. To protect Iceberg table data, you should configure data protection and recovery with your cloud provider.
Tables that use a catalog integration
Snowflake uses the value of DATA_RETENTION_TIME_IN_DAYS to set a retention period for Snowflake Time Travel and undropping the table. When the retention period expires, Snowflake does not delete the table’s Iceberg metadata or snapshots from your external cloud storage.
Snowflake sets DATA_RETENTION_TIME_IN_DAYS at the table level to the smaller of the following values:
The history.expire.max-snapshot-age-ms value in the current metadata file. Snowflake converts the value to days (rounding down).
The following value, depending on your Snowflake account edition:
Standard Edition: 1 day.
Enterprise Edition or higher: 5 days.
You can’t change the value of DATA_RETENTION_TIME_IN_DAYS manually. Instead, you must update history.expire.max-snapshot-age-ms in your metadata file and refresh the table.
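The rule above is simple arithmetic: convert the property from milliseconds to whole days, rounding down, then take the smaller of that and the edition value. The following sketch models it. The function name and the edition keys are hypothetical labels, and the fallback used when the property is missing is an assumption that this page does not state.

```python
MS_PER_DAY = 86_400_000

# Edition-based values from the list above (keys are hypothetical labels).
EDITION_LIMIT_DAYS = {"standard": 1, "enterprise_or_higher": 5}


def effective_retention_days(table_properties: dict[str, str], edition: str) -> int:
    """Model how DATA_RETENTION_TIME_IN_DAYS is derived for a catalog-integration table."""
    limit = EDITION_LIMIT_DAYS[edition]
    raw = table_properties.get("history.expire.max-snapshot-age-ms")
    if raw is None:
        # Assumption (not stated on this page): with no property, only the edition value applies.
        return limit
    metadata_days = int(raw) // MS_PER_DAY  # milliseconds to whole days, rounding down
    return min(metadata_days, limit)


props = {"history.expire.max-snapshot-age-ms": "259200000"}  # exactly 3 days
print(effective_retention_days(props, "enterprise_or_higher"))  # 3
print(effective_retention_days(props, "standard"))              # 1
```

Because of the rounding down, a value just under two days (172799999 ms) yields a retention period of 1 day, and a value above the edition limit is capped at that limit.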
Cross-cloud/cross-region support
Cross-cloud/cross-region support depends on the type of Iceberg table.
Table type | Cross-cloud/cross-region support | Notes |
---|---|---|
Tables that use a catalog integration | ✔ | If the active storage location for your external volume is not with the same cloud provider or in the same region as your Snowflake account, the following limitation applies: if your Snowflake account and external volume are in different regions, your external cloud storage account incurs egress costs when you query the table. |
Tables that use Snowflake as the catalog | ❌ | Your external volume must use an active storage location with the same cloud provider (in the same region) that hosts your Snowflake account. If the active location is not in the same region, the CREATE ICEBERG TABLE statement returns a user error. |
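The table above can be read as a small decision rule. The following sketch encodes it so you can reason about a planned placement before creating a table. The `Location` class, the cloud labels, and `placement_outcome` are hypothetical; the sketch also assumes that any mismatch in cloud provider or region counts as "different regions" for the egress note.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Location:
    cloud: str   # hypothetical labels, e.g. "aws", "azure", "gcp"
    region: str  # provider region name, e.g. "us-west-2"


def placement_outcome(uses_snowflake_catalog: bool, account: Location,
                      volume: Location) -> tuple[bool, str]:
    """Return (allowed, note) for an external volume's active storage location."""
    if account == volume:
        return True, "same cloud provider and region"
    if uses_snowflake_catalog:
        # Tables that use Snowflake as the catalog require the same cloud and region.
        return False, "CREATE ICEBERG TABLE returns a user error"
    # Tables that use a catalog integration work across clouds/regions, at an egress cost.
    return True, "allowed; your cloud storage account incurs egress costs on queries"


account = Location("aws", "us-west-2")
print(placement_outcome(True, account, Location("aws", "us-east-1")))
print(placement_outcome(False, account, Location("aws", "us-east-1")))
```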
Billing
Snowflake bills your account for virtual warehouse (compute) usage and cloud services when you work with Iceberg tables.
Snowflake does not bill your account for the following:
Iceberg table storage costs. Your cloud storage provider bills you directly for data storage usage.
Active bytes used by Iceberg tables. However, the TABLE_STORAGE_METRICS View displays ACTIVE_BYTES for Iceberg tables to help you track how much storage a table occupies.
Note
If your Snowflake account and external volume are in different regions, your external cloud storage account incurs egress costs when you query the table.
Iceberg catalog options
When you create an Iceberg table in Snowflake, you can use Snowflake as the Iceberg catalog or you can use a catalog integration.
The following table summarizes the differences between these catalog options.
Feature | Snowflake as the Iceberg catalog | External catalog (catalog integration) |
---|---|---|
Read access | ✔ | ✔ |
Write access | ✔ | ❌ For full platform support, you can convert the table to use Snowflake as the catalog. |
Data and metadata storage | External volume (cloud storage) | External volume (cloud storage) |
Full Snowflake platform support | ✔ | ❌ |
Works with the Snowflake Iceberg Catalog SDK | ✔ | ✔ |
Use Snowflake as the Iceberg catalog
An Iceberg table that uses Snowflake as the Iceberg catalog provides full Snowflake platform support with read and write access. The table data and metadata are stored in external cloud storage, which Snowflake accesses using an external volume. Snowflake handles all life-cycle maintenance, such as compaction, for the table.
Use an external catalog
An Iceberg table that uses an external catalog provides limited Snowflake platform support with read-only access. With this table type, Snowflake uses a catalog integration to retrieve information about your Iceberg metadata and schema.
You can use this option to create an Iceberg table registered in the AWS Glue Data Catalog or to create a table from Iceberg metadata files in object storage.
Snowflake does not take on any life-cycle management for the table.
The table data and metadata are stored in external cloud storage, which Snowflake accesses using an external volume.
The following diagram shows how an Iceberg table uses a catalog integration with an external Iceberg catalog.
Considerations and limitations
The following considerations and limitations apply to Iceberg tables, and are subject to change:
Iceberg
Versions 1 and 2 of the Apache Iceberg specification are supported, excluding the following features:
Row-level deletes (either position deletes or equality deletes).
Using the history.expire.min-snapshots-to-keep table property to specify the default minimum number of snapshots to keep. For more information, see Metadata and snapshots.
Iceberg partitioning with the bucket transform function impacts performance for queries that use conditional clauses to filter results.
Iceberg tables created from files in object storage aren’t supported if the following conditions are true:
The table contains a partition spec that defines an identity transform.
The source column of the partition spec does not exist in a Parquet file.
For Iceberg tables that aren’t managed by Snowflake, be aware of the following:
It’s important to align your Snowflake refresh schedule with table maintenance operations such as snapshot expiration or compaction. You should refresh the table each time you perform a maintenance operation.
Time Travel to any snapshot generated after table creation is supported as long as you refresh the table periodically, before the snapshot expires.
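For the object-storage limitation earlier in this list (an identity partition transform whose source column is missing from a Parquet file), you can check your files before creating the table. The following is a minimal sketch under assumptions: `PartitionField` and `files_blocking_creation` are hypothetical names, and the file paths are made up. One common way to end up in this situation is a Hive-style layout where a partition value lives only in the directory name, not in the file.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PartitionField:
    source_column: str  # column the partition value is derived from
    transform: str      # e.g. "identity", "day", "bucket[16]"


def files_blocking_creation(partition_spec: list[PartitionField],
                            parquet_columns: dict[str, set[str]]) -> dict[str, list[str]]:
    """Map each Parquet file to the identity-partition source columns it lacks."""
    identity_sources = [f.source_column for f in partition_spec
                        if f.transform == "identity"]
    problems: dict[str, list[str]] = {}
    for path, columns in parquet_columns.items():
        missing = [c for c in identity_sources if c not in columns]
        if missing:
            problems[path] = missing
    return problems


spec = [PartitionField("region", "identity"), PartitionField("event_ts", "day")]
files = {
    "data/region=us/a.parquet": {"id", "region", "event_ts"},
    "data/region=eu/b.parquet": {"id", "event_ts"},  # region exists only in the path
}
print(files_blocking_creation(spec, files))  # {'data/region=eu/b.parquet': ['region']}
```

In practice you could fill `parquet_columns` from each file's footer, for example with `pyarrow.parquet.read_schema(path).names`. Iceberg itself resolves columns by field ID rather than by name, so a name-based check like this one is only an approximation.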
File formats
Support is limited to Apache Parquet files.
Parquet files that use the unsigned integer logical type are not supported.
External volumes
You can’t access the cloud storage locations in external volumes using a storage integration.
The trust relationship must be configured separately for each external volume that you create.
Metadata files
The metadata files do not identify the most recent snapshot of an Iceberg table.
You cannot modify the location of the data files or snapshot using the ALTER ICEBERG TABLE command. To modify either of these settings, you must recreate the table (using the CREATE OR REPLACE ICEBERG TABLE syntax).
Snowflake features
The following features and actions are currently not supported on Iceberg tables:
Table stages.
Creating a clone from an Iceberg table. In addition, clones of databases and schemas do not include Iceberg tables.
Automatically applying tags using the ASSOCIATE_SEMANTIC_CATEGORY_TAGS stored procedure.
Snowflake schema evolution. However, Iceberg tables that use Snowflake as the catalog support Iceberg schema evolution.
Note
Tables that were created prior to Snowflake version 7.42 don’t support Iceberg schema evolution.
Creating temporary or transient Iceberg tables.
Replicating Iceberg tables, external volumes, or catalog integrations.
Creating and working with Iceberg tables in SnowGov Regions.
Querying historical data, however, is supported for Iceberg tables.
Clustering support depends on the type of Iceberg table.
Table type | Notes |
---|---|
Tables that use Snowflake as the Iceberg catalog | Set a clustering key by using either the CREATE ICEBERG TABLE or the ALTER ICEBERG TABLE command. To set or manage a clustering key, see CREATE ICEBERG TABLE (Snowflake as the Iceberg catalog) and ALTER ICEBERG TABLE. |
Tables that use an external catalog | Clustering is not supported. |
Converted tables | Snowflake only clusters files if they were created after converting the table, or if the files have since been modified using a DML statement. |
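The rule for converted tables amounts to a simple filter over the table's data files. The following sketch models it. The `DataFile` class, its `rewritten_by_dml_after_conversion` flag, and `clustering_candidates` are hypothetical; the sketch does not describe how Snowflake actually tracks which files a DML statement has touched.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataFile:
    path: str
    created_at_ms: int
    rewritten_by_dml_after_conversion: bool  # hypothetical flag for illustration


def clustering_candidates(files: list[DataFile], converted_at_ms: int) -> list[str]:
    """Return the files that the converted-table rule makes eligible for clustering."""
    return [f.path for f in files
            if f.created_at_ms > converted_at_ms or f.rewritten_by_dml_after_conversion]


converted_at = 1_700_000_000_000
files = [
    DataFile("old-untouched.parquet", 1_600_000_000_000, False),
    DataFile("old-updated.parquet", 1_600_000_000_000, True),
    DataFile("new.parquet", 1_700_000_500_000, False),
]
print(clustering_candidates(files, converted_at))  # ['old-updated.parquet', 'new.parquet']
```

Files written before the conversion and never touched afterward are left as they are, so a converted table may stay only partially clustered.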
Access by third-party clients to Iceberg data, metadata
Third-party clients cannot append, delete, or upsert data in Iceberg tables that use Snowflake as the catalog.