Metadata and retention for Apache Iceberg™ tables¶
Snowflake handles metadata for Apache Iceberg™ tables according to the type of catalog you use (Snowflake or external).
Note
Specifying the default minimum number of snapshots with the history.expire.min-snapshots-to-keep
table property is not supported
for any type of Iceberg table.
Tables that use Snowflake as the catalog¶
Snowflake manages the metadata life cycle for this table type, and deletes old metadata, manifest lists, and manifest files based on the retention period for the table data and snapshots.
To set the retention period for table data and snapshots, set the DATA_RETENTION_TIME_IN_DAYS parameter at the account, database, schema, or table level.
Creation¶
Snowflake generates metadata for version 2 of the Apache Iceberg specification on a periodic basis, and writes the metadata to files on your external volume. Each new metadata file contains all DML or DDL changes since the last Snowflake-generated metadata file was created.
You can also create metadata on demand by using the SYSTEM$GET_ICEBERG_TABLE_INFORMATION function. For instructions, see Generate snapshots of DML changes.
For information about locating metadata files, see Data and metadata directories.
Viewing metadata creation history¶
To access a full history of metadata generation attempts, view the query history for your account and filter the results. Search for the SYSTEM$GET_ICEBERG_TABLE_INFORMATION function name in the SQL text.
Snowflake internally uses the same SYSTEM$GET_ICEBERG_TABLE_INFORMATION function to generate table metadata. Attempts made by Snowflake
appear under the user called SYSTEM
in the query history. The STATUS
column in the query history
indicates whether metadata was successfully generated.
For viewing options, see Monitor query activity with Query History.
Deletion¶
Snowflake deletes Iceberg metadata from your external cloud storage when the following events occur:
After you drop a table.
When the Iceberg metadata refers to snapshots or table data that has expired.
Deletion doesn’t occur immediately after the data retention period expires. As a result, metadata storage might incur costs with your cloud storage provider for longer than a table’s lifetime.
Warning
Snowflake does not support Fail-safe for Snowflake-managed Iceberg tables, because the table data is in external cloud storage that you manage. To protect Iceberg table data, you need to configure data protection and recovery with your cloud provider.
After dropping a table¶
When you drop a table, you can use the UNDROP ICEBERG TABLE command to restore it within the data retention period.
When the retention period expires, Snowflake deletes table metadata and snapshots that it has written from your external volume location. Deletion occurs asynchronously and can take a few days to complete after the retention period has passed.
Note
For converted tables, Snowflake deletes only metadata that was generated after table conversion.
After snapshots expire¶
Snowflake deletes Iceberg metadata files related to expired snapshots after the data retention period passes. Deletion usually occurs 7-14 days after a snapshot expires.
Only previous table snapshots can expire. Snowflake never deletes metadata files that represent the latest (current) state of a table from your external cloud storage.
Tables that use an external catalog¶
For tables that use an external catalog, Snowflake uses the value of the DATA_RETENTION_TIME_IN_DAYS parameter to set a retention period for Snowflake Time Travel and undropping the table. When the retention period expires, Snowflake does not delete the Iceberg metadata or snapshots from your external cloud storage.
Snowflake sets DATA_RETENTION_TIME_IN_DAYS at the table level to the smaller of the following values:
The
history.expire.max-snapshot-age-ms
value in the current metadata file. Snowflake converts the value to days (rounding down).The following value, depending on your Snowflake account edition:
Standard Edition: 1 day.
Enterprise Edition or higher: 5 days.
You can’t manually change the value of DATA_RETENTION_TIME_IN_DAYS in Snowflake. To change the value, you must update
history.expire.max-snapshot-age-ms
in your metadata file and then refresh the table.
You can use the following table functions to retrieve information about the files registered to an externally managed Iceberg table or the most recent snapshot refresh history:
Time travel¶
With Snowflake Time Travel, you can use Snowflake to query historical data for a table.
You can also use a third-party compute engine to perform time travel queries on Snowflake-managed tables when you Sync a Snowflake-managed table with Snowflake Open Catalog or use the Snowflake Catalog SDK.
You can query any snapshots that were committed within the data retention period. To specify the data retention period, set the DATA_RETENTION_TIME_IN_DAYS object parameter.
When you delete table data or drop a table, Snowflake deletes objects after the table retention period expires. This might incur costs with your cloud storage provider for longer than the table’s lifetime.