Configure storage for Iceberg tables

Snowflake tables typically use storage that Snowflake manages. In contrast, Iceberg tables in Snowflake use external storage that you configure and maintain.

This topic provides conceptual information and best practices to help you configure an external volume and manage storage for Iceberg tables.

Configure an external volume

You can create and configure an external volume to use with one or more Iceberg tables. You must create an external volume before you can create an Iceberg table.
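For example, you might define an external volume for an Amazon S3 bucket as follows. This is a minimal sketch; the volume name, bucket path, and IAM role ARN are placeholders that you replace with your own values:

CREATE EXTERNAL VOLUME my_iceberg_vol
  STORAGE_LOCATIONS = (
    (
      NAME = 'my-s3-location'
      STORAGE_PROVIDER = 'S3'
      STORAGE_BASE_URL = 's3://my-bucket/iceberg/'
      STORAGE_AWS_ROLE_ARN = 'arn:aws:iam::123456789012:role/my_snowflake_role'
    )
  );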

For specific instructions, see the following topics:

Grant Snowflake access to your storage

To grant Snowflake access to your storage locations for Iceberg tables, you use the identity and access management service for your cloud provider. You grant an identity, or principal, limited access to your storage without exchanging secrets. This is the same access model that Snowflake uses for other integrations, including storage integrations.

Snowflake provisions a principal for your entire Snowflake account when you create an external volume. The principal is as follows, depending on your cloud provider:

  • Amazon Web Services (AWS): IAM user

  • Google Cloud Platform (GCP): Service account

  • Azure: Service principal

Snowflake authenticates directly with your storage provider, and the Snowflake-provisioned principal assumes a role that you specify. The role must have permission to perform operations on your storage location. For example, Snowflake can read from a storage location only if the role has permission to read from that storage location.

Snowflake requires permission to perform the following actions, depending on your storage provider and the type of Iceberg table:

Amazon S3

Snowflake-managed tables:

  • s3:GetBucketLocation

  • s3:GetObject

  • s3:ListBucket

  • s3:PutObject

  • s3:DeleteObject

  • s3:GetObjectVersion

  • s3:DeleteObjectVersion

Tables that use an external Iceberg catalog:

  • s3:GetBucketLocation

  • s3:GetObject

  • s3:ListBucket

  • s3:GetObjectVersion

Google Cloud Storage

Snowflake-managed tables:

  • storage.objects.create

  • storage.objects.delete

  • storage.objects.get

  • storage.objects.list

  • storage.buckets.get

Tables that use an external Iceberg catalog:

  • storage.objects.get

  • storage.objects.list

Azure Storage

Snowflake-managed tables: all allowed actions for the Storage Blob Data Contributor role.

Tables that use an external Iceberg catalog: all allowed actions for the Storage Blob Data Reader role.

Note

The s3:PutObject permission grants write access to the external volume location. To completely configure write access, you must set the ALLOW_WRITES parameter of the external volume to TRUE (the default value).

For full instructions on granting Snowflake access to your storage for Iceberg tables, see the following topics:

Assign an active storage location

Each external volume supports a single active storage location. If you specify multiple storage locations in a CREATE EXTERNAL VOLUME statement, Snowflake assigns one location as the active location. The active location remains the same for the lifetime of the external volume.

Assignment occurs the first time you use the external volume in a CREATE ICEBERG TABLE statement. Snowflake uses the following logic to choose an active location:

  • If the STORAGE_LOCATIONS list contains one or more local storage locations, Snowflake uses the first local storage location in the list. A local storage location is one that is hosted by the same cloud provider, and in the same region, as your Snowflake account.

  • If the STORAGE_LOCATIONS list does not contain any local storage locations, Snowflake selects the first location in the list.

Note

  • Cross-cloud/cross-region Iceberg tables are supported only when you use an external Iceberg catalog. For more information, see Cross-cloud/cross-region support.

  • External volumes that were created before Snowflake version 7.44 might have used a different logic to select an active location.
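As a sketch of this logic, suppose your Snowflake account is hosted on AWS in us-west-2 and you create an external volume with two storage locations (all names, URLs, and ARNs here are placeholders):

CREATE EXTERNAL VOLUME my_multi_location_vol
  STORAGE_LOCATIONS = (
    (
      NAME = 'us-east-1-location'
      STORAGE_PROVIDER = 'S3'
      STORAGE_BASE_URL = 's3://my-us-east-bucket/iceberg/'
      STORAGE_AWS_ROLE_ARN = 'arn:aws:iam::123456789012:role/my_snowflake_role'
    ),
    (
      NAME = 'us-west-2-location'
      STORAGE_PROVIDER = 'S3'
      STORAGE_BASE_URL = 's3://my-us-west-bucket/iceberg/'
      STORAGE_AWS_ROLE_ARN = 'arn:aws:iam::123456789012:role/my_snowflake_role'
    )
  );

The first time you use this volume in a CREATE ICEBERG TABLE statement, Snowflake assigns 'us-west-2-location' as the active location because it is the first local location in the list, even though it appears second. You can run DESC EXTERNAL VOLUME my_multi_location_vol; to inspect the volume's storage locations.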

Verify storage access

For Snowflake-managed tables, Snowflake verifies access to the active storage location on your external volume. The ALLOW_WRITES property of the external volume must be set to TRUE.
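If writes were disabled when the external volume was created, you can enable them before creating a Snowflake-managed table (the volume name is a placeholder):

ALTER EXTERNAL VOLUME my_iceberg_vol SET ALLOW_WRITES = TRUE;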

Verification occurs in the following situations:

  • The first time you specify that external volume in a CREATE ICEBERG TABLE statement for a Snowflake-managed table.

  • The first time you convert a table to use Snowflake as the Iceberg catalog.

Snowflake attempts the following storage operations to verify access to the storage location:

  1. Writing a test file.

  2. Reading the file.

  3. Listing the contents of the file’s path.

  4. Deleting the file.

If any of the storage operations fails, the CREATE ICEBERG TABLE (or ALTER ICEBERG TABLE … CONVERT TO MANAGED) statement fails and you receive an error message.

Manage files in storage

This section explains how management of Iceberg table files in storage works, according to the type of Iceberg table.

Snowflake-managed tables

Important

  • Don’t allow other tools access to delete or overwrite objects that are associated with Snowflake-managed Iceberg tables.

  • Ensure that the Snowflake principal maintains access to your table storage. For more information, see Grant Snowflake access to your storage.

Although you configure and manage the storage locations for Iceberg tables, the objects in your storage (data and metadata files) that belong to Snowflake-managed tables must be operated on exclusively by Snowflake. Snowflake runs periodic maintenance on these table objects to optimize query performance and clean up deleted data.

Queries might fail if other tools delete or overwrite Snowflake-managed table objects. Similarly, queries on the table and Snowflake’s table maintenance operations will fail if you revoke the Snowflake principal’s access to your storage.

When Snowflake-managed table data is deleted or the table is dropped, Snowflake deletes the associated objects from your storage after the table retention period expires.

Metadata and retention

For Snowflake-managed tables, Snowflake generates metadata on a periodic basis and writes the metadata to files on your external volume.

Snowflake uses the value of the DATA_RETENTION_TIME_IN_DAYS parameter to determine the following:

  • When to expire old table snapshots to reduce the size of table metadata.

  • How long to retain table metadata to support Time Travel and undropping the table. When the retention period expires, Snowflake deletes any table metadata and snapshots that it has written for that table from your external volume location.

    Warning

    Snowflake does not support Fail-safe for Snowflake-managed Iceberg tables, because the table data is in external cloud storage that you manage. To protect Iceberg table data, you need to configure data protection and recovery with your cloud provider.
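For example, to lengthen the Time Travel window for a Snowflake-managed table, you can raise the retention period on the table (the table name is a placeholder; the maximum allowed value depends on your Snowflake account edition):

ALTER ICEBERG TABLE my_managed_table SET DATA_RETENTION_TIME_IN_DAYS = 14;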

Tables that use an external catalog

Snowflake does not attempt to write or delete storage objects for externally managed Iceberg tables, or on external volumes with the ALLOW_WRITES property set to FALSE.

Metadata and retention

For tables that use an external catalog, Snowflake uses the value of the DATA_RETENTION_TIME_IN_DAYS parameter to set a retention period for Snowflake Time Travel and undropping the table. When the retention period expires, Snowflake does not delete the table’s Iceberg metadata or snapshots from your external cloud storage.

Snowflake sets DATA_RETENTION_TIME_IN_DAYS at the table level to the smaller of the following values:

  • The history.expire.max-snapshot-age-ms value in the current metadata file. Snowflake converts the value to days (rounding down).

  • The following value, depending on your Snowflake account edition:

    • Standard Edition: 1 day.

    • Enterprise Edition or higher: 5 days.

You can’t manually change the value of DATA_RETENTION_TIME_IN_DAYS in Snowflake. To change the value, you must update history.expire.max-snapshot-age-ms in your metadata file and then refresh the table.
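For example, after your external catalog updates history.expire.max-snapshot-age-ms in the table's metadata, you can trigger a metadata refresh so that Snowflake recomputes the retention period (the table name is a placeholder):

ALTER ICEBERG TABLE my_external_table REFRESH;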

Configure Time Travel

With Time Travel, you can query historical data within a table’s data retention period. When Snowflake-managed table data is deleted or the table is dropped, Snowflake deletes the associated objects only after the table retention period expires, so you might incur costs with your cloud storage provider for longer than the table’s lifetime.

You can configure Time Travel for Iceberg tables by using the DATA_RETENTION_TIME_IN_DAYS object parameter.

Snowflake uses the DATA_RETENTION_TIME_IN_DAYS parameter to handle metadata in different ways, depending on the type of Iceberg table. For more information, see Manage files in storage.
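For example, you can use standard Time Travel syntax to query an earlier state of an Iceberg table within its retention period (the table name is a placeholder):

SELECT * FROM my_iceberg_table AT(OFFSET => -60*10);

The AT(OFFSET => ...) clause specifies a time difference, in seconds, from the present; this query reads the table as it existed 10 minutes ago.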

Note

Specifying the default minimum number of snapshots with the history.expire.min-snapshots-to-keep table property is not supported for any type of Iceberg table.

Enable storage access logs

To diagnose issues and audit access to the storage locations associated with an external volume, you can enable storage logging. Storage logs help you identify the cause of missing or corrupted files.

Enable logging with your storage provider. Because you own and manage storage for Iceberg tables, Snowflake can’t enable logging or auditing on your Iceberg storage locations.

To learn about storage access logs for your storage provider, see the following external topics:

Protect files with versioning and object retention

If your Iceberg table data is in a central data repository (or data lake) that is operated on by multiple tools and services, accidental deletion or corruption might occur. To protect Iceberg table data and ensure retrieval of accidentally deleted or overwritten data, use storage lifecycle management and versioning offered by your storage provider.

With lifecycle management, you can set retention and tracking rules for storage objects. To learn about lifecycle management for your storage provider, see the following external topics:

To support object recovery, you can also enable versioning for your external cloud storage.

Encrypt Iceberg table files

Snowflake can read Iceberg table files in storage that you encrypt using common server-side encryption (SSE) schemes. You should use your cloud service provider to manage encryption keys, and grant the Snowflake principal access to your keys if you use a customer-managed key.

For Amazon S3, Snowflake supports the following SSE options:

  • SSE with Amazon S3 managed keys (SSE-S3): Specify ENCRYPTION = ( TYPE = 'AWS_SSE_S3' ) in the CREATE EXTERNAL VOLUME command.

  • SSE with AWS KMS keys (SSE-KMS): Specify ENCRYPTION = ( TYPE = 'AWS_SSE_KMS' KMS_KEY_ID = 'my_key' ) in the CREATE EXTERNAL VOLUME command. You must also grant the privileges required for SSE-KMS encryption. For instructions, see Step 3 in Configure an external volume for Amazon S3.
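Putting these options together, the following sketch creates an external volume whose storage location uses SSE-KMS; the volume name, bucket path, role ARN, and KMS key ID are placeholders:

CREATE EXTERNAL VOLUME my_encrypted_vol
  STORAGE_LOCATIONS = (
    (
      NAME = 'my-s3-location'
      STORAGE_PROVIDER = 'S3'
      STORAGE_BASE_URL = 's3://my-bucket/iceberg/'
      STORAGE_AWS_ROLE_ARN = 'arn:aws:iam::123456789012:role/my_snowflake_role'
      ENCRYPTION = ( TYPE = 'AWS_SSE_KMS' KMS_KEY_ID = '1234abcd-12ab-34cd-56ef-1234567890ab' )
    )
  );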

For Google Cloud Storage, Snowflake supports the following SSE option:

  • SSE using keys stored in Google Cloud KMS: Specify ENCRYPTION = ( TYPE = 'GCS_SSE_KMS' KMS_KEY_ID = 'my_key' ) in the CREATE EXTERNAL VOLUME command. You must also grant the GCS service account permissions on the Google Cloud Key Management Service keys.

Set an external volume at the account, database, or schema level

To define which existing external volume to use for Iceberg tables, you can set the EXTERNAL_VOLUME parameter at the following levels:

Account:

Account administrators can use the ALTER ACCOUNT command to set the parameter for the account. If the value is set for the account, all Iceberg tables created in the account read from and write to this external volume by default.

Object:

Users can execute the appropriate CREATE <object> or ALTER <object> command to override the EXTERNAL_VOLUME parameter value at the database or schema level. The lowest-scoped declaration is used: schema > database > account.

In addition to the minimum privileges required to modify an object using the appropriate ALTER <object_type> command, a role must have the USAGE privilege on the external volume.
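For example, an account administrator might set an account-wide default, and any user can then check the effective value for a particular database (the volume and database names are placeholders):

ALTER ACCOUNT SET EXTERNAL_VOLUME = 'my_s3_vol';

SHOW PARAMETERS LIKE 'EXTERNAL_VOLUME' IN DATABASE my_database_1;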

Note

Changes to the EXTERNAL_VOLUME parameter only apply to tables created after the change. Existing tables continue to use the external volume specified when they were created.

Example

The following statement sets an external volume (my_s3_vol) for a database named my_database_1:

ALTER DATABASE my_database_1
  SET EXTERNAL_VOLUME = 'my_s3_vol';

After setting an external volume at the database level, you can create an Iceberg table in that database without specifying an external volume. The following statement creates an Iceberg table in my_database_1 that uses Snowflake as the catalog and uses the default external volume (my_s3_vol) set for the database.

CREATE ICEBERG TABLE iceberg_reviews_table (
  id STRING,
  product_name STRING,
  product_id STRING,
  reviewer_name STRING,
  review_date DATE,
  review STRING
)
CATALOG = 'SNOWFLAKE'
BASE_LOCATION = 'my/product_reviews/';