Metadata and retention for Apache Iceberg™ tables¶
Snowflake handles metadata for Apache Iceberg™ tables according to the type of catalog you use (Snowflake or external).
Note
Specifying the default minimum number of snapshots with the history.expire.min-snapshots-to-keep table property is not supported for any type of Iceberg table.
Tables that use Snowflake as the catalog¶
Snowflake manages the metadata life cycle for this table type, and deletes old metadata, manifest lists, and manifest files based on the retention period for the table data and snapshots.
To set the retention period for table data and snapshots, set the DATA_RETENTION_TIME_IN_DAYS parameter at the account, database, schema, or table level.
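For example, a minimal sketch with hypothetical object names:

```sql
-- Set the retention period at the table level.
ALTER ICEBERG TABLE my_db.my_schema.my_table SET DATA_RETENTION_TIME_IN_DAYS = 14;

-- Or set a default for every table in a schema.
ALTER SCHEMA my_db.my_schema SET DATA_RETENTION_TIME_IN_DAYS = 14;
```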
Creation¶
Snowflake periodically generates metadata for version 2 of the Apache Iceberg specification and writes the metadata to files on your external volume. Each new metadata file contains all DML or DDL changes since the last Snowflake-generated metadata file was created.
You can also create metadata on demand by using the SYSTEM$GET_ICEBERG_TABLE_INFORMATION function. For instructions, see Generate snapshots of DML changes.
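For example (a sketch with a hypothetical table name):

```sql
-- Generate metadata on demand; the function returns information about
-- the latest metadata for the table.
SELECT SYSTEM$GET_ICEBERG_TABLE_INFORMATION('my_db.my_schema.my_table');
```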
For information about locating metadata files, see Data and metadata directories.
Viewing metadata creation history¶
To access a full history of metadata generation attempts, view the query history for your account and filter the results. Search for the SYSTEM$GET_ICEBERG_TABLE_INFORMATION function name in the SQL text.
Snowflake internally uses the same SYSTEM$GET_ICEBERG_TABLE_INFORMATION function to generate table metadata. Attempts made by Snowflake appear under the user called SYSTEM in the query history. The STATUS column in the query history indicates whether metadata was successfully generated.
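For example, one way to filter the history (a sketch using the ACCOUNT_USAGE view, where the EXECUTION_STATUS column corresponds to the STATUS column shown in Snowsight):

```sql
-- Find metadata generation attempts, including those made by the
-- SYSTEM user, in the account's query history.
SELECT user_name, execution_status, start_time, query_text
FROM snowflake.account_usage.query_history
WHERE query_text ILIKE '%SYSTEM$GET_ICEBERG_TABLE_INFORMATION%'
ORDER BY start_time DESC;
```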
For viewing options, see Monitor query activity with Query History.
Deletion¶
Snowflake deletes Iceberg metadata from your external cloud storage when the following events occur:
You drop a table.
The Iceberg metadata refers to snapshots or table data that has expired.
Deletion doesn’t occur immediately after the data retention period expires. As a result, metadata storage might incur costs with your cloud storage provider for longer than a table’s lifetime.
Warning
Snowflake does not support Fail-safe for Snowflake-managed Iceberg tables, because the table data is in external cloud storage that you manage. To protect Iceberg table data, you need to configure data protection and recovery with your cloud provider.
After dropping a table¶
When you drop a table, you can use the UNDROP ICEBERG TABLE command to restore it within the data retention period.
When the retention period expires, Snowflake deletes table metadata and snapshots that it has written from your external volume location. Deletion occurs asynchronously and can take a few days to complete after the retention period has passed.
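For example (a sketch with a hypothetical table name):

```sql
-- Restore a dropped Iceberg table within the data retention period.
UNDROP ICEBERG TABLE my_db.my_schema.my_table;
```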
Note
For converted tables, Snowflake deletes only metadata that was generated after table conversion.
After snapshots expire¶
Snowflake deletes Iceberg metadata files related to expired snapshots after the data retention period passes. Deletion usually occurs 7-14 days after a snapshot expires.
Only previous table snapshots can expire. Snowflake never deletes metadata files that represent the latest (current) state of a table from your external cloud storage.
Tables that use an external catalog¶
For tables that use an external catalog, Snowflake uses the value of the DATA_RETENTION_TIME_IN_DAYS parameter to set a retention period for Snowflake Time Travel and undropping the table. When the retention period expires, Snowflake does not delete the Iceberg metadata or snapshots from your external cloud storage.
Snowflake sets DATA_RETENTION_TIME_IN_DAYS at the table level to the smaller of the following values:
The history.expire.max-snapshot-age-ms value in the current metadata file. Snowflake converts the value to days (rounding down).
The following value, depending on your Snowflake account edition:
Standard Edition: 1 day.
Enterprise Edition or higher: 5 days.
You can’t manually change the value of DATA_RETENTION_TIME_IN_DAYS in Snowflake. To change the value, you must update history.expire.max-snapshot-age-ms in your metadata file and then refresh the table.
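For example, a sketch with a hypothetical table name:

```sql
-- Check the retention period that Snowflake derived for the table.
SHOW PARAMETERS LIKE 'DATA_RETENTION_TIME_IN_DAYS' IN TABLE my_db.my_schema.my_ext_table;

-- After updating history.expire.max-snapshot-age-ms with your external
-- engine, refresh the table so Snowflake picks up the new value.
ALTER ICEBERG TABLE my_db.my_schema.my_ext_table REFRESH;
```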
You can use table functions, such as ICEBERG_TABLE_FILES and ICEBERG_TABLE_SNAPSHOT_REFRESH_HISTORY, to retrieve information about the files registered to an externally managed Iceberg table or the most recent snapshot refresh history.
Delta-based tables¶
Note
If you want to use metadata writes for Delta-based Iceberg tables, the 2025_01 behavior change bundle must not be disabled in your account.
For Iceberg tables created from Delta table files, Snowflake automatically writes Iceberg metadata to your external storage if you configure your external volume with write access (see ALLOW_WRITES). For more information about the write location, see Data and metadata directories.
To prevent Snowflake from writing Iceberg metadata, you can set the ALLOW_WRITES parameter to FALSE on your external volume as long as no Snowflake-managed Iceberg tables use the same external volume.
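For example, a sketch with a hypothetical external volume name:

```sql
-- Disable metadata writes for Delta-based tables on this volume.
-- Only do this if no Snowflake-managed Iceberg tables use the volume.
ALTER EXTERNAL VOLUME my_delta_volume SET ALLOW_WRITES = FALSE;
```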
Iceberg partitioning¶
This section describes how Snowflake supports partitioning for Apache Iceberg™ tables.
Snowflake supports the following partitioning use cases:
Reading from and writing to partitioned Iceberg tables.
Creating partitioned Iceberg tables that are Snowflake-managed, externally managed in a catalog-linked database, or externally managed by an Iceberg REST catalog.
When you create a partitioned Iceberg table, you can enable hidden partitioning or partitioning with hierarchical paths, which is also called “Hive-style” partitioning.
Partitioning with hierarchical paths¶
With this option, Snowflake writes data to partitioned Iceberg tables by using a hierarchical path layout for Parquet data files. Partitioning information is included in the file paths and the values are based on transforms that you define when you create a table. This layout is also called “Hive-style” partitioning. You might use this option for interoperability between Snowflake and external engines that support partitioned writes with hierarchical paths.
Here’s an example of a data file stored under a hierarchical path:
s3://my-bucket/iceberg/db_sales/orders/data/country=US/year=2025/month=02/day=21/part-00023.parquet
For more information on the layout of the data and metadata directories for tables that use hierarchical paths, see File management.
Create a table with hierarchical paths¶
To create a partitioned Iceberg table with a hierarchical path layout, set the following properties in your regular CREATE ICEBERG TABLE statement:
Set PATH_LAYOUT = HIERARCHICAL.
Include the PARTITION BY clause with one or more partition transforms.
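For example, the following sketch creates a Snowflake-managed table with a hierarchical path layout; the column names, external volume, and BASE_LOCATION are hypothetical:

```sql
CREATE ICEBERG TABLE db_sales.public.orders (
  order_id INT,
  country STRING,
  order_ts TIMESTAMP_NTZ
)
  CATALOG = 'SNOWFLAKE'
  EXTERNAL_VOLUME = 'my_ext_vol'
  BASE_LOCATION = 'orders'
  PATH_LAYOUT = HIERARCHICAL
  PARTITION BY (country, DAY(order_ts));
```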
For an example of creating a partitioned Iceberg table with a hierarchical path layout in a catalog-linked database, see Create an Iceberg table in a catalog-linked database with hierarchical path layout.
Partitioning support matrix¶
The following table shows which features and actions are supported for each type of partitioned Iceberg table, and indicates compliance with version 2 of the Apache Iceberg specification. The table shows support for both hidden partitioning and partitioning with hierarchical paths.
Note
Version 3 of the Apache Iceberg specification isn’t supported when using partitioning.
CLD stands for catalog-linked database.
| Feature | Snowflake managed | Externally managed (CLD) | Externally managed (non-CLD) | Iceberg spec V2 compatibility | Comment |
|---|---|---|---|---|---|
| COPY commands with the ON_ERROR = ABORT_STATEMENT option | ❌ | ❌ | ❌ | ❌ | |
| COPY INTO <table> | Limited support | Limited support | Limited support | Limited support | See Usage notes. |
| CREATE ICEBERG TABLE … AS SELECT (CTAS) | ✔ | ✔ | ✔ | ✔ | |
| Cloning | ✔ | ✔ | ✔ | ✔ | See usage notes. |
| CREATE ICEBERG TABLE … LIKE | ✔ | ✔ | ✔ | ✔ | See usage notes. |
| Deletion vectors | ❌ | ❌ | ❌ | N/A | |
| Clustering | ❌ | ❌ | ❌ | ❌ | |
| Partition evolution | ❌ | Limited support | Limited support | Limited support | Supported only when performed with an external engine. |
| Partition transforms | ✔ | ✔ | ✔ | ✔ | See the list of supported partition transforms. |
| Positional deletes | ✔ | ✔ | ✔ | ✔ | |
| Snowpipe | Limited support | Limited support | Limited support | Limited support | |
| Snowpipe Streaming | ❌ | ❌ | ❌ | ❌ | |
| Sorting within partitions | ❌ | ❌ | ❌ | ❌ | |
| TARGET_FILE_SIZE | ✔ | ✔ | ✔ | ✔ | |
Partitioning considerations¶
Consider the following before you use partitioned writes for Iceberg tables:
If you use an external engine to add, drop, or replace a partition field in an externally managed table, Snowflake writes data according to the latest partition specification.
The GET_DDL function doesn’t include the PARTITION BY clause in its output.
The sum of the sizes of the outputs for all partition transforms can’t exceed 1024 bytes for a single row.
Partition evolution isn’t supported for Snowflake-managed tables. To change how a Snowflake-managed table is partitioned, you must drop the table and create a new one with the partitioning that you want.
The DAY(), MONTH(), YEAR() partition transform parameters, which you specify within the PARTITION BY clause under table properties, are part of the Iceberg specification. For multiple days, months, or years, the partition expression parameter returns a partition for each calendar day, month, or year. For example, when the DAY() transform is used on a timestamp column that has 2 months of data, 61 partitions are created.
In contrast, the DAY(), MONTH(), YEAR() functions in Snowflake are part of the SQL standard. For multiple days, months, or years, these functions extract the corresponding day, month, or year part from a date or timestamp. For example, when the DAY() function is used on a timestamp column that has multiple months of data, this function returns a day of the month ranging from 1 to 31. (See the example after this list.)
You can’t use the ALTER ICEBERG TABLE command to modify the PATH_LAYOUT property for an existing table.
For partitioning with hierarchical paths:
For float values, Snowflake and external engines might behave differently.
Snowflake can’t guarantee that the paths Snowflake writes will match the paths that external query engines write. When a query engine writes a hierarchical path, it must serialize values into a string and insert the resulting value into the path. The Apache Iceberg table specification doesn’t define a standard serialization method, so different engines might implement different methods. For example, Snowflake doesn’t encode the ~ character, but Apache Spark™ encodes this character as %7E.
Snowflake always writes the hierarchical paths directly under the /data directory in your external cloud storage.
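To illustrate the difference described above between the DAY() partition transform and the DAY() SQL function (a sketch with a hypothetical column name):

```sql
-- Partition transform: in a PARTITION BY clause, DAY(event_ts)
-- creates one partition per calendar day.
--   PARTITION BY (DAY(event_ts))

-- SQL function: extracts the day of the month (1-31).
SELECT DAY('2025-02-21'::DATE);  -- returns 21
```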
Time travel¶
With Snowflake Time Travel, you can use Snowflake to query historical data for a table.
You can also use a third-party compute engine to perform time travel queries on Snowflake-managed tables when you Sync a Snowflake-managed table with Snowflake Open Catalog or use the Snowflake Catalog SDK.
You can query any snapshots that were committed within the data retention period. To specify the data retention period, set the DATA_RETENTION_TIME_IN_DAYS object parameter.
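For example (a sketch with a hypothetical table name):

```sql
-- Query the table as it existed one hour ago.
SELECT COUNT(*) FROM my_db.my_schema.orders AT(OFFSET => -3600);

-- Query the table as of a specific point in time.
SELECT * FROM my_db.my_schema.orders
  AT(TIMESTAMP => '2025-02-21 08:00:00'::TIMESTAMP_LTZ);
```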
When you delete table data or drop a table, Snowflake deletes the associated objects only after the table retention period expires. As a result, you might incur costs with your cloud storage provider for longer than the table’s lifetime.