Storage for Apache Iceberg™ tables
Snowflake tables typically use storage that Snowflake manages. In contrast, Apache Iceberg™ tables in Snowflake use external storage that you configure and maintain.
This topic provides conceptual information and best practices for Iceberg table storage.
Granting Snowflake access to your storage
To grant Snowflake access to your storage locations for Iceberg tables, you use the identity and access management service for your cloud provider. You grant an identity, or principal, limited access to your storage without exchanging secrets. This is the same access model that Snowflake uses for other integrations, including storage integrations.
Snowflake provisions a principal for your entire Snowflake account when you create an external volume. The principal is as follows, depending on your cloud provider:
Cloud provider | Snowflake-provisioned principal |
---|---|
Amazon Web Services (AWS) | IAM user |
Google Cloud | Service account |
Azure | Service principal |
Snowflake authenticates directly with your storage provider, and the Snowflake-provisioned principal assumes a role that you specify. The role must have permission to perform operations on your storage location. For example, Snowflake can read from a storage location only if the role has permission to read from that storage location.
Snowflake requires permission to perform the following actions on Iceberg tables:
Storage provider | Snowflake-managed tables | Tables that use an external Iceberg catalog |
---|---|---|
Amazon S3 | | |
Google Cloud Storage | | |
Azure Storage | All allowed actions for the Storage Blob Data Contributor role | All allowed actions for the Storage Blob Data Reader role |
Note
The s3:PutObject permission grants write access to the external volume location. To completely configure write access, you must set the ALLOW_WRITES parameter of the external volume to TRUE (the default value).
Each external volume is associated with a particular active storage location, and a single external volume can support multiple Iceberg tables. However, the number of external volumes you need depends on how you want to store, organize, and secure your table data.
You can use a single external volume if you want the data and metadata for all of your Snowflake-Iceberg tables in subdirectories under the same storage location (for example, in the same S3 bucket). To configure these directories for Snowflake-managed tables, see Data and metadata directories.
Alternatively, you can create multiple external volumes to secure various storage locations differently. For example, you might create the following external volumes:
A read-only external volume for externally managed Iceberg tables.
An external volume configured with read and write access for Snowflake-managed tables.
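As a minimal sketch of this two-volume setup on Amazon S3, the statements below create one volume with ALLOW_WRITES = FALSE and one with ALLOW_WRITES = TRUE. The volume names, location names, bucket placeholders, and role ARN placeholders are hypothetical; replace them with your own values.

```sql
-- Hypothetical names and placeholders throughout; adjust for your environment.

-- Read-only volume for externally managed Iceberg tables.
CREATE EXTERNAL VOLUME ext_catalog_readonly_vol
  STORAGE_LOCATIONS = (
    (
      NAME = 'lake-readonly'
      STORAGE_PROVIDER = 'S3'
      STORAGE_BASE_URL = 's3://<lake_bucket>/'
      STORAGE_AWS_ROLE_ARN = '<read_only_role_arn>'
    )
  )
  ALLOW_WRITES = FALSE;

-- Read/write volume for Snowflake-managed tables.
CREATE EXTERNAL VOLUME managed_tables_vol
  STORAGE_LOCATIONS = (
    (
      NAME = 'managed-readwrite'
      STORAGE_PROVIDER = 'S3'
      STORAGE_BASE_URL = 's3://<managed_bucket>/'
      STORAGE_AWS_ROLE_ARN = '<read_write_role_arn>'
    )
  )
  ALLOW_WRITES = TRUE;
```

Keeping the two roles separate means the role behind the read-only volume never needs write or delete permissions on your storage.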
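For full instructions on granting Snowflake access to your storage for Iceberg tables, see the external volume configuration topic for your storage provider (for example, Configure an external volume for Amazon S3).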
Active storage location
Each external volume supports a single active storage location. If you specify multiple storage locations in a CREATE EXTERNAL VOLUME statement, Snowflake assigns one location as the active location. The active location remains the same for the lifetime of the external volume.
Assignment occurs the first time you use the external volume in a CREATE ICEBERG TABLE statement. Snowflake uses the following logic to choose an active location:
If the STORAGE_LOCATIONS list contains one or more local storage locations, Snowflake uses the first local storage location in the list. A local storage location is one with the same cloud provider and in the same region as your Snowflake account.
If the STORAGE_LOCATIONS list does not contain any local storage locations, Snowflake selects the first location in the list.
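The selection logic above can be illustrated with a volume that lists two locations. This sketch assumes a Snowflake account hosted on AWS in us-west-2; the volume name, location names, buckets, and role ARN are hypothetical placeholders.

```sql
-- Assumption: the Snowflake account is on AWS in us-west-2.
-- Assumption: <bucket_in_us_east_1> and <bucket_in_us_west_2> are in the named regions.
CREATE EXTERNAL VOLUME multi_location_vol
  STORAGE_LOCATIONS = (
    (
      NAME = 'remote-us-east-1'
      STORAGE_PROVIDER = 'S3'
      STORAGE_BASE_URL = 's3://<bucket_in_us_east_1>/'
      STORAGE_AWS_ROLE_ARN = '<role_arn>'
    ),
    (
      NAME = 'local-us-west-2'
      STORAGE_PROVIDER = 'S3'
      STORAGE_BASE_URL = 's3://<bucket_in_us_west_2>/'
      STORAGE_AWS_ROLE_ARN = '<role_arn>'
    )
  );

-- Inspect the volume's properties after its first use in CREATE ICEBERG TABLE.
DESC EXTERNAL VOLUME multi_location_vol;
```

Under these assumptions, 'local-us-west-2' would become the active location even though it is second in the list, because it is the first local location. If neither bucket were local, the first entry would be chosen instead.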
Note
Cross-cloud/cross-region Iceberg tables are supported only when you use an external Iceberg catalog. For more information, see Cross-cloud/cross-region support.
External volumes that were created before Snowflake version 7.44 might have used different logic to select an active location.
Verifying storage access
For Snowflake-managed tables, Snowflake verifies access to the active storage location on your external volume. The ALLOW_WRITES property of the external volume must be set to TRUE.
Verification occurs in the following situations:
The first time you specify that external volume in a CREATE ICEBERG TABLE statement for a Snowflake-managed table.
The first time you convert a table to use Snowflake as the Iceberg catalog.
Snowflake tries the following storage operations to verify the storage location.
Writing a test file.
Reading the file.
Listing the contents of the file’s path.
Deleting the file.
If any of the storage operations fails, the CREATE ICEBERG TABLE (or ALTER ICEBERG TABLE … CONVERT TO MANAGED) statement fails and you receive an error message.
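To check storage access yourself before you create or convert a table, one option is the SYSTEM$VERIFY_EXTERNAL_VOLUME system function. The call below is a sketch that assumes an external volume named iceberg_external_volume (the name used in the examples in this topic) and a role with privileges on that volume.

```sql
-- Assumption: an external volume named iceberg_external_volume already exists.
SELECT SYSTEM$VERIFY_EXTERNAL_VOLUME('iceberg_external_volume');
```

If the function reports a failure, review the permissions on the role that the Snowflake principal assumes before you retry the CREATE ICEBERG TABLE or ALTER ICEBERG TABLE … CONVERT TO MANAGED statement.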
File management
This section explains how management of Iceberg table files in storage works, according to the type of Iceberg table.
Snowflake-managed tables
Important
Don’t allow other tools access to delete or overwrite objects that are associated with Snowflake-managed Iceberg tables.
Ensure that the Snowflake principal maintains access to your table storage. For more information, see Granting Snowflake access to your storage.
Though you configure and manage storage locations for Iceberg tables, Snowflake exclusively operates on the objects in your storage (data and metadata files) that belong to Snowflake-managed tables. Snowflake runs periodic maintenance on these table objects to optimize query performance and clean up deleted data.
Queries might fail if other tools delete or overwrite Snowflake-managed table objects. Similarly, queries on the table and Snowflake’s table maintenance operations will fail if you revoke the Snowflake principal’s access to your storage.
When Snowflake-managed table data is deleted or the table is dropped, Snowflake deletes the associated objects after the table retention period expires.
Data and metadata directories
For Snowflake-managed tables, Snowflake writes Parquet data files and table metadata to the following paths in your external cloud storage:
STORAGE_BASE_URL/BASE_LOCATION/data/
STORAGE_BASE_URL/BASE_LOCATION/metadata/
Where:
STORAGE_BASE_URL is the base URL for the active storage location associated with your external volume.
BASE_LOCATION is the name of a directory under a relative path from your external volume (specified in your CREATE ICEBERG TABLE statement).
Specifying an empty string for BASE_LOCATION
If you specify an empty string ('') for the BASE_LOCATION, Snowflake creates the data/ and metadata/ directories right under your STORAGE_BASE_URL.
For example: STORAGE_BASE_URL/data/
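A statement that produces this layout might look like the following sketch. The table name is hypothetical, and the external volume name is borrowed from the other examples in this topic.

```sql
-- Hypothetical table; with an empty BASE_LOCATION, files are written to
-- STORAGE_BASE_URL/data/ and STORAGE_BASE_URL/metadata/.
CREATE ICEBERG TABLE iceberg_table_root (
  col_1 int,
  col_2 string
)
  CATALOG = 'SNOWFLAKE'
  EXTERNAL_VOLUME = 'iceberg_external_volume'
  BASE_LOCATION = '';
```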
Using the same location for multiple tables
If you use the same storage location and specify the same base location for multiple tables, Snowflake writes the data for all of those tables to the same data/ directory. Similarly, Snowflake writes the metadata for all of those tables to the same metadata/ directory.
Organizing table storage with BASE_LOCATION
To organize files in storage for multiple Iceberg tables under the same STORAGE_BASE_URL, consider using the table name as the BASE_LOCATION in your CREATE ICEBERG TABLE statement. This way, Snowflake writes data and metadata to a directory with the same name as the table.
For example:
CREATE OR REPLACE ICEBERG TABLE iceberg_table_1 (
  col_1 int,
  col_2 string
)
  CATALOG = 'SNOWFLAKE'
  EXTERNAL_VOLUME = 'iceberg_external_volume'
  BASE_LOCATION = 'iceberg_table_1';

CREATE OR REPLACE ICEBERG TABLE iceberg_table_2 (
  col_1 int,
  col_2 string
)
  CATALOG = 'SNOWFLAKE'
  EXTERNAL_VOLUME = 'iceberg_external_volume'
  BASE_LOCATION = 'iceberg_table_2';
These statements result in the following directory structure in your external cloud storage:
STORAGE_BASE_URL
|-- iceberg_table_1
| |-- data/
| |-- metadata/
|-- iceberg_table_2
| |-- data/
| |-- metadata/
Tables that use an external catalog
Snowflake doesn’t write or delete storage objects for externally managed Iceberg tables or on external volumes with the ALLOW_WRITES property set to FALSE.
To access your table data and metadata, Snowflake assumes the access control role that you configure for your external volume. You grant the role permission to access a storage location (in a bucket or container). All of your table data and metadata files must be in that location. For example, if your storage location is an S3 bucket, all of your data and metadata files must exist somewhere in that bucket.
Additionally, converting a table does not rewrite any data or metadata files. Snowflake writes to an Iceberg table only after you convert a table to use Snowflake as the catalog.
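As a sketch of the conversion step, the statement below converts an externally managed table so that Snowflake becomes the catalog. The table name and the BASE_LOCATION value are hypothetical, and the table's external volume is assumed to have ALLOW_WRITES set to TRUE.

```sql
-- Hypothetical table name and base location.
-- Existing data and metadata files are not rewritten by the conversion.
ALTER ICEBERG TABLE my_external_iceberg_table
  CONVERT TO MANAGED
  BASE_LOCATION = 'my_external_iceberg_table';
```

After the conversion, Snowflake can write to the table and runs its regular maintenance on the table's objects.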
Enabling storage access logs
To diagnose issues and audit access to the storage locations associated with an external volume, you can enable storage logging. Storage logs help you identify the cause of missing or corrupted files.
Enable logging with your storage provider. Because you own and manage storage for Iceberg tables, Snowflake can’t enable logging or auditing on your Iceberg storage locations.
To learn about storage access logs, see the documentation for your storage provider.
Protecting files with versioning and object retention
If your Iceberg table data is in a central data repository (or data lake) that is operated on by multiple tools and services, accidental deletion or corruption might occur. To protect Iceberg table data and ensure retrieval of accidentally deleted or overwritten data, use storage lifecycle management and versioning offered by your storage provider.
With lifecycle management, you can set retention and tracking rules for storage objects. To learn about lifecycle management, see the documentation for your storage provider.
To support object recovery, you can also enable versioning for your external cloud storage.
To enable versioning for Amazon S3, see Enabling versioning on buckets.
To enable versioning for Google Cloud Storage, see Use Object Versioning.
To enable versioning for Azure, see Enable blob versioning.
Encrypting table files
Snowflake can read Iceberg table files in storage that you encrypt using common server-side encryption (SSE) schemes. You should use your cloud service provider to manage encryption keys, and grant the Snowflake principal access to your keys if you use a customer-managed key.
For Amazon S3, Snowflake supports the following SSE options:
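SSE option | Configuration |
---|---|
SSE with Amazon S3 managed keys (SSE-S3) | Specify ENCRYPTION = (TYPE = 'AWS_SSE_S3') in the storage location definition of the external volume. |
SSE with AWS KMS keys (SSE-KMS) | Specify ENCRYPTION = (TYPE = 'AWS_SSE_KMS' KMS_KEY_ID = '<key_id>') in the storage location definition of the external volume. You must also grant privileges required for SSE-KMS encryption. For instructions, see Step 3 in Configure an external volume for Amazon S3. |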
For Google Cloud Storage, Snowflake supports the following SSE option:
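SSE option | Configuration |
---|---|
SSE using keys stored in Google Cloud KMS | Specify ENCRYPTION = (TYPE = 'GCS_SSE_KMS' KMS_KEY_ID = '<key_id>') in the storage location definition of the external volume. You must also grant the GCS service account permissions on the Google Cloud Key Management Service keys. |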
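The following sketch shows where an encryption setting sits inside an external volume definition, once for Amazon S3 with SSE-KMS and once for Google Cloud Storage. The volume names, location names, buckets, role ARN, and key IDs are hypothetical placeholders.

```sql
-- Amazon S3 with SSE-KMS (hypothetical names and placeholders).
CREATE EXTERNAL VOLUME s3_kms_encrypted_vol
  STORAGE_LOCATIONS = (
    (
      NAME = 's3-kms-location'
      STORAGE_PROVIDER = 'S3'
      STORAGE_BASE_URL = 's3://<my_bucket>/'
      STORAGE_AWS_ROLE_ARN = '<role_arn>'
      ENCRYPTION = (TYPE = 'AWS_SSE_KMS' KMS_KEY_ID = '<kms_key_id>')
    )
  );

-- Google Cloud Storage with a key stored in Google Cloud KMS.
CREATE EXTERNAL VOLUME gcs_kms_encrypted_vol
  STORAGE_LOCATIONS = (
    (
      NAME = 'gcs-kms-location'
      STORAGE_PROVIDER = 'GCS'
      STORAGE_BASE_URL = 'gcs://<my_bucket>/'
      ENCRYPTION = (TYPE = 'GCS_SSE_KMS' KMS_KEY_ID = '<kms_key_id>')
    )
  );
```

In both cases, the Snowflake-provisioned principal still needs permission to use the key. The ENCRYPTION setting alone does not grant it.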