Storage for Apache Iceberg™ tables

Snowflake tables typically use storage that Snowflake manages. In contrast, Apache Iceberg™ tables in Snowflake use external storage that you configure and maintain.

This topic provides conceptual information and best practices for Iceberg table storage.

Granting Snowflake access to your storage

To grant Snowflake access to your storage locations for Iceberg tables, you use the identity and access management service for your cloud provider. You grant an identity, or principal, limited access to your storage without exchanging secrets. This is the same access model that Snowflake uses for other integrations, including storage integrations.

Snowflake provisions a principal for your entire Snowflake account when you create an external volume. The principal is as follows, depending on your cloud provider:

  • Amazon Web Services (AWS): IAM user

  • Google Cloud: Service account

  • Azure: Service principal

Snowflake authenticates directly with your storage provider, and the Snowflake-provisioned principal assumes a role that you specify. The role must have permission to perform operations on your storage location. For example, Snowflake can read from a storage location only if the role has permission to read from that storage location.

Snowflake requires permission to perform the following actions on Iceberg tables:

Amazon S3

Snowflake-managed tables:

  • s3:GetBucketLocation

  • s3:GetObject

  • s3:ListBucket

  • s3:PutObject

  • s3:DeleteObject

  • s3:GetObjectVersion

  • s3:DeleteObjectVersion

Tables that use an external Iceberg catalog:

  • s3:GetBucketLocation

  • s3:GetObject

  • s3:ListBucket

  • s3:GetObjectVersion

Google Cloud Storage

Snowflake-managed tables:

  • storage.objects.create

  • storage.objects.delete

  • storage.objects.get

  • storage.objects.list

Tables that use an external Iceberg catalog:

  • storage.buckets.get

  • storage.objects.get

  • storage.objects.list

Azure Storage

  • Snowflake-managed tables: All allowed actions for the Storage Blob Data Contributor role

  • Tables that use an external Iceberg catalog: All allowed actions for the Storage Blob Data Reader role

Note

The s3:PutObject permission grants write access to the external volume location. To completely configure write access, you must set the ALLOW_WRITES parameter of the external volume to TRUE (the default value).
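As a minimal sketch of how the role and the write setting come together on AWS, the following statements create an external volume that points to an IAM role you control and then display its properties. The bucket, account ID, role name, storage location name, and external ID are hypothetical placeholders, not values from this topic.

```sql
-- Hypothetical values throughout: replace the bucket, role ARN, and external ID with your own.
CREATE OR REPLACE EXTERNAL VOLUME my_s3_external_volume
  STORAGE_LOCATIONS =
    (
      (
        NAME = 'my-s3-us-west-2'
        STORAGE_PROVIDER = 'S3'
        STORAGE_BASE_URL = 's3://my-example-bucket/iceberg/'
        -- The role that the Snowflake-provisioned IAM user assumes.
        STORAGE_AWS_ROLE_ARN = 'arn:aws:iam::123456789012:role/my-iceberg-role'
        STORAGE_AWS_EXTERNAL_ID = 'my_iceberg_external_id'
      )
    )
  -- TRUE is the default; shown explicitly to pair with the s3:PutObject permission.
  ALLOW_WRITES = TRUE;

-- Lists the volume's properties, including the Snowflake-provisioned IAM user
-- that you reference in the trust policy of your role.
DESC EXTERNAL VOLUME my_s3_external_volume;
```

The role named in STORAGE_AWS_ROLE_ARN is where you attach the S3 permissions listed above.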

Each external volume is associated with a particular active storage location, and a single external volume can support multiple Iceberg tables. However, the number of external volumes you need depends on how you want to store, organize, and secure your table data.

You can use a single external volume if you want the data and metadata for all of your Snowflake-Iceberg tables in subdirectories under the same storage location (for example, in the same S3 bucket). To configure these directories for Snowflake-managed tables, see Data and metadata directories.

Alternatively, you can create multiple external volumes to secure various storage locations differently. For example, you might create the following external volumes:

  • A read-only external volume for externally managed Iceberg tables.

  • An external volume configured with read and write access for Snowflake-managed tables.
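One way to sketch this split, assuming two S3 buckets and two IAM roles with different permissions (all names below are hypothetical), is to create one volume with writes disabled and one with writes enabled:

```sql
-- Read-only volume for externally managed Iceberg tables (hypothetical names).
-- The role only needs the read permissions listed for external catalogs.
CREATE EXTERNAL VOLUME external_catalog_read_only_volume
  STORAGE_LOCATIONS =
    (
      (
        NAME = 'lake-read-only'
        STORAGE_PROVIDER = 'S3'
        STORAGE_BASE_URL = 's3://my-data-lake-bucket/'
        STORAGE_AWS_ROLE_ARN = 'arn:aws:iam::123456789012:role/iceberg-read-only-role'
      )
    )
  ALLOW_WRITES = FALSE;

-- Read-write volume for Snowflake-managed Iceberg tables (hypothetical names).
-- The role needs the full set of permissions listed for Snowflake-managed tables.
CREATE EXTERNAL VOLUME managed_read_write_volume
  STORAGE_LOCATIONS =
    (
      (
        NAME = 'managed-read-write'
        STORAGE_PROVIDER = 'S3'
        STORAGE_BASE_URL = 's3://my-managed-iceberg-bucket/'
        STORAGE_AWS_ROLE_ARN = 'arn:aws:iam::123456789012:role/iceberg-read-write-role'
      )
    )
  ALLOW_WRITES = TRUE;
```

Keeping the roles separate means that the read-only volume cannot modify objects even if a table is later pointed at it by mistake.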

For full instructions on granting Snowflake access to your storage for Iceberg tables, see the external volume configuration topic for your cloud provider, such as Configure an external volume for Amazon S3.

Active storage location

Each external volume supports a single active storage location. If you specify multiple storage locations in a CREATE EXTERNAL VOLUME statement, Snowflake assigns one location as the active location. The active location remains the same for the lifetime of the external volume.

Assignment occurs the first time you use the external volume in a CREATE ICEBERG TABLE statement. Snowflake uses the following logic to choose an active location:

  • If the STORAGE_LOCATIONS list contains one or more local storage locations, Snowflake uses the first local storage location in the list. A local storage location is one with the same cloud provider and in the same region as your Snowflake account.

  • If the STORAGE_LOCATIONS list does not contain any local storage locations, Snowflake selects the first location in the list.
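The selection rules above can be illustrated with a volume that lists two locations. This sketch assumes a Snowflake account hosted on AWS in us-west-2; the volume name, buckets, and role ARNs are hypothetical.

```sql
-- Assumption: the Snowflake account runs on AWS in us-west-2.
CREATE OR REPLACE EXTERNAL VOLUME my_multi_location_volume
  STORAGE_LOCATIONS =
    (
      (
        -- First in the list, but not local to the account's region.
        NAME = 'my-s3-us-east-1'
        STORAGE_PROVIDER = 'S3'
        STORAGE_BASE_URL = 's3://my-bucket-us-east-1/'
        STORAGE_AWS_ROLE_ARN = 'arn:aws:iam::123456789012:role/my-iceberg-role-east'
      ),
      (
        -- First local location: same cloud provider and region as the account.
        NAME = 'my-s3-us-west-2'
        STORAGE_PROVIDER = 'S3'
        STORAGE_BASE_URL = 's3://my-bucket-us-west-2/'
        STORAGE_AWS_ROLE_ARN = 'arn:aws:iam::123456789012:role/my-iceberg-role-west'
      )
    );

-- After the volume is first used in a CREATE ICEBERG TABLE statement,
-- inspect its properties to see which location was assigned as active.
DESC EXTERNAL VOLUME my_multi_location_volume;
```

Under the stated assumption, the logic described above would select my-s3-us-west-2, because it is the first local location even though it appears second in the list. If neither location were local, the first entry (my-s3-us-east-1) would be selected instead.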

Note

  • Cross-cloud/cross-region Iceberg tables are supported only when you use an external Iceberg catalog. For more information, see Cross-cloud/cross-region support.

  • External volumes that were created before Snowflake version 7.44 might have used different logic to select an active location.

Verifying storage access

To check that Snowflake can successfully authenticate to your storage provider, call the SYSTEM$VERIFY_EXTERNAL_VOLUME function.

SELECT SYSTEM$VERIFY_EXTERNAL_VOLUME('my_s3_external_volume');

For Snowflake-managed tables, Snowflake automatically verifies access to the active storage location on your external volume in the following situations:

  • The first time you specify that external volume in a CREATE ICEBERG TABLE statement for a Snowflake-managed table.

  • The first time you convert a table to use Snowflake as the Iceberg catalog.

The ALLOW_WRITES property of the external volume must be set to TRUE.

To verify the storage location, Snowflake attempts the following storage operations:

  1. Writing a test file.

  2. Reading the file.

  3. Listing the contents of the file’s path.

  4. Deleting the file.

If any one of the operations fails, the CREATE ICEBERG TABLE (or ALTER ICEBERG TABLE … CONVERT TO MANAGED) statement fails and you receive an error message.

File management

This section explains how Iceberg table files in storage are managed for each type of Iceberg table.

Snowflake-managed tables

Important

  • Don’t give other tools permission to delete or overwrite objects that are associated with Snowflake-managed Iceberg tables.

  • Ensure that the Snowflake principal maintains access to your table storage. For more information, see Granting Snowflake access to your storage.

Though you configure and manage storage locations for Iceberg tables, Snowflake exclusively operates on the objects in your storage (data and metadata files) that belong to Snowflake-managed tables. Snowflake runs periodic maintenance on these table objects to optimize query performance and clean up deleted data.

Queries might fail if other tools delete or overwrite Snowflake-managed table objects. Similarly, queries on the table and Snowflake’s table maintenance operations will fail if you revoke the Snowflake principal’s access to your storage.

When Snowflake-managed table data is deleted or the table is dropped, Snowflake deletes the associated objects after the table retention period expires.

Data and metadata directories

For Snowflake-managed tables, Snowflake writes Parquet data files and table metadata to the following paths in your external cloud storage:

  • STORAGE_BASE_URL/BASE_LOCATION/data/

  • STORAGE_BASE_URL/BASE_LOCATION/metadata/

Where:

  • STORAGE_BASE_URL is the base URL for the active storage location associated with your external volume.

  • BASE_LOCATION is a directory path, relative to the storage location of your external volume, that you specify in your CREATE ICEBERG TABLE statement.

Specifying an empty string for BASE_LOCATION

If you specify an empty string ('') for the BASE_LOCATION, Snowflake creates the data/ and metadata/ directories directly under your STORAGE_BASE_URL.

For example: STORAGE_BASE_URL/data/
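A minimal sketch of such a statement follows. The table name is hypothetical, and the external volume name is reused from the other examples in this topic.

```sql
-- Hypothetical table name. The empty BASE_LOCATION places the table's
-- data/ and metadata/ directories directly under STORAGE_BASE_URL.
CREATE OR REPLACE ICEBERG TABLE iceberg_table_at_root (
  col_1 int,
  col_2 string
)
  CATALOG = 'SNOWFLAKE'
  EXTERNAL_VOLUME = 'iceberg_external_volume'
  BASE_LOCATION = '';
```

Because every table created this way on the same volume shares the same data/ and metadata/ directories, this layout is easiest to manage when the storage location holds a single table.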

Using the same location for multiple tables

If you use the same storage location and specify the same base location for multiple tables, Snowflake writes the data for all of those tables to the same data/ directory. Similarly, Snowflake writes the metadata for all of those tables to the same metadata/ directory.

Organizing table storage with BASE_LOCATION

To organize files in storage for multiple Iceberg tables under the same STORAGE_BASE_URL, consider using the table name as the BASE_LOCATION in your CREATE ICEBERG TABLE statement. This way, Snowflake writes data and metadata to a directory with the same name as the table.

For example:

CREATE OR REPLACE ICEBERG TABLE iceberg_table_1 (
  col_1 int,
  col_2 string
)
  CATALOG = 'SNOWFLAKE'
  EXTERNAL_VOLUME = 'iceberg_external_volume'
  BASE_LOCATION = 'iceberg_table_1';

CREATE OR REPLACE ICEBERG TABLE iceberg_table_2 (
  col_1 int,
  col_2 string
)
  CATALOG = 'SNOWFLAKE'
  EXTERNAL_VOLUME = 'iceberg_external_volume'
  BASE_LOCATION = 'iceberg_table_2';

These statements result in the following directory structure in your external cloud storage:

STORAGE_BASE_URL
|-- iceberg_table_1
|   |-- data/
|   |-- metadata/
|-- iceberg_table_2
|   |-- data/
|   |-- metadata/

Tables that use an external catalog

Snowflake doesn’t write or delete storage objects for externally managed Iceberg tables or on external volumes with the ALLOW_WRITES property set to FALSE.

To access your table data and metadata, Snowflake assumes the access control role that you configure for your external volume. You grant the role permission to access a storage location (in a bucket or container). All of your table data and metadata files must be in that location. For example, if your storage location is an S3 bucket, all of your data and metadata files must exist somewhere in that bucket.

Additionally, converting a table does not rewrite any data or metadata files. Snowflake writes to an Iceberg table only after you convert a table to use Snowflake as the catalog.
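As a hedged illustration of a conversion, the following sketch enables writes on the external volume, checks storage access, and then converts a table. The table name is hypothetical, and the BASE_LOCATION clause is shown as an assumption about where you want Snowflake to write after conversion; whether it is required depends on how the table was created.

```sql
-- The external volume must allow writes before Snowflake can manage the table.
ALTER EXTERNAL VOLUME iceberg_external_volume SET ALLOW_WRITES = TRUE;

-- Optional check that Snowflake can authenticate to the storage location.
SELECT SYSTEM$VERIFY_EXTERNAL_VOLUME('iceberg_external_volume');

-- Convert the table so that Snowflake becomes the Iceberg catalog.
-- Existing data and metadata files are not rewritten by this statement.
ALTER ICEBERG TABLE my_external_iceberg_table CONVERT TO MANAGED
  BASE_LOCATION = 'my_external_iceberg_table';
```

After the conversion, the role associated with the external volume needs the write and delete permissions listed for Snowflake-managed tables.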

Enabling storage access logs

To diagnose issues and audit access to the storage locations associated with an external volume, you can enable storage logging. Storage logs help you identify the cause of missing or corrupted files.

Enable logging with your storage provider. Because you own and manage storage for Iceberg tables, Snowflake can’t enable logging or auditing on your Iceberg storage locations.

To learn how to enable storage access logs, see the documentation for your storage provider.

Protecting files with versioning and object retention

If your Iceberg table data is in a central data repository (or data lake) that is operated on by multiple tools and services, accidental deletion or corruption might occur. To protect Iceberg table data and ensure retrieval of accidentally deleted or overwritten data, use storage lifecycle management and versioning offered by your storage provider.

With lifecycle management, you can set retention and tracking rules for storage objects. To learn about lifecycle management, see the documentation for your storage provider.

To support object recovery, you can also enable versioning for your external cloud storage.

Encrypting table files

Snowflake can read Iceberg table files in storage that you encrypt using common server-side encryption (SSE) schemes. You should use your cloud service provider to manage encryption keys, and grant the Snowflake principal access to your keys if you use a customer-managed key.

For Amazon S3, Snowflake supports the following SSE options:

  • SSE with Amazon S3 managed keys (SSE-S3): Specify ENCRYPTION = ( TYPE = 'AWS_SSE_S3' ) in the CREATE EXTERNAL VOLUME command.

  • SSE with AWS KMS keys (SSE-KMS): Specify ENCRYPTION = ( TYPE = 'AWS_SSE_KMS' KMS_KEY_ID = 'my_key' ) in the CREATE EXTERNAL VOLUME command. You must also grant privileges required for SSE-KMS encryption. For instructions, see Step 3 in Configure an external volume for Amazon S3.
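To show where the ENCRYPTION clause sits, here is a sketch of an external volume that uses SSE-KMS. The volume name, bucket, role ARN, and key ID are hypothetical placeholders.

```sql
-- Hypothetical values: replace the bucket, role ARN, and KMS key ID with your own.
CREATE OR REPLACE EXTERNAL VOLUME my_encrypted_s3_volume
  STORAGE_LOCATIONS =
    (
      (
        NAME = 'my-s3-encrypted'
        STORAGE_PROVIDER = 'S3'
        STORAGE_BASE_URL = 's3://my-encrypted-bucket/'
        STORAGE_AWS_ROLE_ARN = 'arn:aws:iam::123456789012:role/my-iceberg-role'
        -- ENCRYPTION is set per storage location, inside its parentheses.
        ENCRYPTION = ( TYPE = 'AWS_SSE_KMS' KMS_KEY_ID = 'my_key' )
      )
    );
```

For SSE-S3, the same clause would read ENCRYPTION = ( TYPE = 'AWS_SSE_S3' ), with no key ID.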

For Google Cloud Storage, Snowflake supports the following SSE option:

  • SSE using keys stored in Google Cloud KMS: Specify ENCRYPTION = ( TYPE = 'GCS_SSE_KMS' KMS_KEY_ID = 'my_key' ) in the CREATE EXTERNAL VOLUME command. You must also grant the GCS service account permissions on the Google Cloud Key Management Service keys.
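The equivalent sketch for Google Cloud Storage follows. The volume name, bucket, and key ID are hypothetical placeholders.

```sql
-- Hypothetical values: replace the bucket and KMS key ID with your own.
CREATE OR REPLACE EXTERNAL VOLUME my_encrypted_gcs_volume
  STORAGE_LOCATIONS =
    (
      (
        NAME = 'my-gcs-encrypted'
        STORAGE_PROVIDER = 'GCS'
        STORAGE_BASE_URL = 'gcs://my-encrypted-bucket/'
        -- Key stored in Google Cloud KMS; the Snowflake-provisioned service
        -- account needs permission to use this key.
        ENCRYPTION = ( TYPE = 'GCS_SSE_KMS' KMS_KEY_ID = 'my_key' )
      )
    );

-- Lists the volume's properties, including the Snowflake-provisioned
-- service account that you grant key permissions to.
DESC EXTERNAL VOLUME my_encrypted_gcs_volume;
```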