Metadata and retention for Apache Iceberg™ tables

Snowflake handles metadata for Apache Iceberg™ tables according to the type of catalog you use (Snowflake or external).

Note

Specifying the default minimum number of snapshots with the history.expire.min-snapshots-to-keep table property is not supported for any type of Iceberg table.

Tables that use Snowflake as the catalog

Snowflake manages the metadata life cycle for this table type, and deletes old metadata, manifest lists, and manifest files based on the retention period for the table data and snapshots.

To set the retention period for table data and snapshots, set the DATA_RETENTION_TIME_IN_DAYS parameter at the account, database, schema, or table level.
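For example, the parameter can be set at the table or schema level with an ALTER statement (the object names below are illustrative):

```sql
-- Set the retention period for one table's data and snapshots to 5 days.
-- Values greater than 1 require Enterprise Edition or higher.
ALTER ICEBERG TABLE my_iceberg_table SET DATA_RETENTION_TIME_IN_DAYS = 5;

-- Or set a default for every table in a schema:
ALTER SCHEMA my_schema SET DATA_RETENTION_TIME_IN_DAYS = 5;
```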

Creation

Snowflake generates metadata for version 2 of the Apache Iceberg specification on a periodic basis, and writes the metadata to files on your external volume. Each new metadata file contains all DML or DDL changes since the last Snowflake-generated metadata file was created.

You can also create metadata on demand by using the SYSTEM$GET_ICEBERG_TABLE_INFORMATION function. For instructions, see Generate snapshots of DML changes.
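A minimal on-demand call looks like the following (the fully qualified table name is illustrative):

```sql
-- Trigger metadata generation for a specific Iceberg table and
-- return information that includes the latest metadata file location.
SELECT SYSTEM$GET_ICEBERG_TABLE_INFORMATION('my_db.my_schema.my_iceberg_table');
```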

For information about locating metadata files, see Data and metadata directories.

Viewing metadata creation history

To access a full history of metadata generation attempts, view the query history for your account and filter the results. Search for the SYSTEM$GET_ICEBERG_TABLE_INFORMATION function name in the SQL text.

Snowflake internally uses the same SYSTEM$GET_ICEBERG_TABLE_INFORMATION function to generate table metadata. Attempts made by Snowflake appear under the user called SYSTEM in the query history. The STATUS column in the query history indicates whether metadata was successfully generated.
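One way to filter the history, sketched here with the INFORMATION_SCHEMA.QUERY_HISTORY table function (column names reflect that function's output, where the status appears as EXECUTION_STATUS):

```sql
-- List metadata generation attempts, including internal attempts
-- that Snowflake makes under the SYSTEM user.
SELECT query_text, user_name, execution_status, start_time
  FROM TABLE(INFORMATION_SCHEMA.QUERY_HISTORY())
  WHERE query_text ILIKE '%SYSTEM$GET_ICEBERG_TABLE_INFORMATION%'
  ORDER BY start_time DESC;
```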

For viewing options, see Monitor query activity with Query History.

Deletion

Snowflake deletes Iceberg metadata from your external cloud storage when the following events occur:

  • You drop a table.

  • The Iceberg metadata refers to snapshots or table data that have expired.

Deletion doesn’t occur immediately after the data retention period expires. As a result, metadata storage might incur costs with your cloud storage provider for longer than a table’s lifetime.

Warning

Snowflake does not support Fail-safe for Snowflake-managed Iceberg tables, because the table data is in external cloud storage that you manage. To protect Iceberg table data, you need to configure data protection and recovery with your cloud provider.

After dropping a table

When you drop a table, you can use the UNDROP ICEBERG TABLE command to restore it within the data retention period.
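For example (the table name is illustrative):

```sql
DROP ICEBERG TABLE my_iceberg_table;

-- Restore the table at any point within its data retention period.
UNDROP ICEBERG TABLE my_iceberg_table;
```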

When the retention period expires, Snowflake deletes table metadata and snapshots that it has written from your external volume location. Deletion occurs asynchronously and can take a few days to complete after the retention period has passed.

Note

For converted tables, Snowflake deletes only metadata that was generated after table conversion.

After snapshots expire

Snowflake deletes Iceberg metadata files related to expired snapshots after the data retention period passes. Deletion usually occurs 7-14 days after a snapshot expires.

Only previous table snapshots can expire. Snowflake never deletes metadata files that represent the latest (current) state of a table from your external cloud storage.

Tables that use an external catalog

For tables that use an external catalog, Snowflake uses the value of the DATA_RETENTION_TIME_IN_DAYS parameter to set a retention period for Snowflake Time Travel and undropping the table. When the retention period expires, Snowflake does not delete the Iceberg metadata or snapshots from your external cloud storage.

Snowflake sets DATA_RETENTION_TIME_IN_DAYS at the table level to the smaller of the following values:

  • The history.expire.max-snapshot-age-ms value in the current metadata file. Snowflake converts the value to days (rounding down).

  • The following value, depending on your Snowflake account edition:

    • Standard Edition: 1 day.

    • Enterprise Edition or higher: 5 days.

You can’t manually change the value of DATA_RETENTION_TIME_IN_DAYS in Snowflake. To change the value, you must update history.expire.max-snapshot-age-ms in your metadata file and then refresh the table.
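As a sketch of that workflow: after your external engine updates the history.expire.max-snapshot-age-ms property in the table's metadata, refresh the table in Snowflake so the new value takes effect (the table name is illustrative):

```sql
-- Pick up the updated metadata, including the new
-- history.expire.max-snapshot-age-ms value.
ALTER ICEBERG TABLE my_iceberg_table REFRESH;
```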

You can use the following table functions to retrieve information about the files registered to an externally managed Iceberg table or the most recent snapshot refresh history:

  • ICEBERG_TABLE_FILES

  • ICEBERG_TABLE_SNAPSHOT_REFRESH_HISTORY

Delta-based tables

Note

If you want to use metadata writes for Delta-based Iceberg tables, the 2025_01 behavior change bundle must not be disabled in your account.

For Iceberg tables created from Delta table files, Snowflake automatically writes Iceberg metadata to your external storage if you configure your external volume with write access (see ALLOW_WRITES). For more information about the write location, see Data and metadata directories.

To prevent Snowflake from writing Iceberg metadata, you can set the ALLOW_WRITES parameter to FALSE on your external volume as long as no Snowflake-managed Iceberg tables use the same external volume.
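For example (the external volume name is illustrative):

```sql
-- Prevent Snowflake from writing Iceberg metadata to this volume.
-- Only do this if no Snowflake-managed Iceberg tables use the volume.
ALTER EXTERNAL VOLUME my_external_volume SET ALLOW_WRITES = FALSE;
```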

Iceberg partitioning

“Hidden” partitioning for Apache Iceberg™ is metadata-based and adaptable. Iceberg produces partition values based on transforms that you define when you create a table. When they read from a partitioned table, Iceberg engines use the partition values defined in your table metadata to efficiently identify relevant data.

Snowflake supports the following partitioning use cases:

  • Reading from and writing to partitioned Iceberg tables.

  • Creating partitioned Iceberg tables that are Snowflake-managed or externally managed in a catalog-linked database.

To create a partitioned Iceberg table, include the PARTITION BY clause with one or more partition transforms in your regular CREATE ICEBERG TABLE statement. For an example, see Create an Iceberg table in a catalog-linked database.
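A minimal sketch of a partitioned, Snowflake-managed table, using month and bucket partition transforms (all names, including the external volume, are illustrative):

```sql
CREATE ICEBERG TABLE orders (
    order_id BIGINT,
    customer_id BIGINT,
    order_date DATE
  )
  CATALOG = 'SNOWFLAKE'
  EXTERNAL_VOLUME = 'my_external_volume'
  BASE_LOCATION = 'orders'
  -- Partition by calendar month of order_date and by a 16-way
  -- hash bucket of customer_id.
  PARTITION BY (MONTH(order_date), BUCKET(16, customer_id));
```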

Partitioning considerations

Consider the following before you use partitioned writes for Iceberg tables:

  • If you use an external engine to add, drop, or replace a partition field in an externally managed table, Snowflake writes data according to the latest partition specification.

  • Snowflake uses a target file size of 16 MB for partitioned tables. If the default file size is larger, Snowflake automatically decreases the target size to 16 MB when you write to a partitioned table.

  • The GET_DDL function doesn’t include the PARTITION BY clause in its output.

  • Using COPY INTO <table> on partitioned tables is only supported for LOAD_MODE = FULL_INGEST, which is the default value.

  • The sum of the sizes of the outputs for all partition transforms can’t exceed 1024 bytes for a single row.

The following features and actions aren’t supported for partitioned Iceberg tables:

  • Partition evolution for Snowflake-managed tables. You must drop the table and create a new one with partitioning.

  • Using PARTITION BY and CLUSTER BY together.

  • Snowpipe and Snowpipe Streaming.

  • COPY commands with the ON_ERROR = ABORT_STATEMENT option.

Time travel

With Snowflake Time Travel, you can use Snowflake to query historical data for a table.

You can also use a third-party compute engine to perform time travel queries on Snowflake-managed tables when you sync a Snowflake-managed table with Snowflake Open Catalog or use the Snowflake Catalog SDK.

You can query any snapshots that were committed within the data retention period. To specify the data retention period, set the DATA_RETENTION_TIME_IN_DAYS object parameter.
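Time Travel queries use the AT or BEFORE clause. For example (the table name is illustrative):

```sql
-- Query the table as it existed one hour ago.
SELECT * FROM my_iceberg_table AT(OFFSET => -60*60);

-- Or query the table as of a specific timestamp.
SELECT * FROM my_iceberg_table
  AT(TIMESTAMP => '2024-01-15 12:00:00'::TIMESTAMP_LTZ);
```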

When you delete table data or drop a table, Snowflake deletes objects after the table retention period expires. This might incur costs with your cloud storage provider for longer than the table’s lifetime.