Write support for externally managed Apache Iceberg™ tables¶
Write support for externally managed Apache Iceberg™ tables expands interoperability between Snowflake and third-party systems, allowing you to use Snowflake for data engineering workloads with Iceberg even when you use an external Iceberg catalog.
With this preview, you can now use Snowflake to create Iceberg tables managed by external Iceberg catalogs such as AWS Glue and Snowflake Open Catalog. You can perform write operations on these tables using Snowflake.
Key use cases include the following:
Building complex data engineering pipelines with Iceberg tables: Writing to Iceberg tables in external catalogs from Snowflake allows you to use Snowpark or Snowflake SQL to build complex pipelines that ingest, transform, and process data for Iceberg tables. You can query the data by using Snowflake or other engines. Similarly, you can use Snowflake partner tools to build your Iceberg data engineering pipelines.
Making your data available to the Iceberg ecosystem: The ability to write to Iceberg tables in external catalogs lets you make your data available to the Iceberg ecosystem. You can query data that’s already in Snowflake and write it to Iceberg tables. To keep your Iceberg tables in sync with your Snowflake tables, you can use operations like INSERT INTO … SELECT FROM (sketched after this list) to do the following:
Copy existing data from a standard Snowflake table into an Iceberg table.
Insert data using Snowflake streams.
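For example, a minimal sketch that copies rows from a standard table into an Iceberg table (both table names are hypothetical):

INSERT INTO my_iceberg_table
  SELECT * FROM my_standard_table;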
This feature is supported for Iceberg tables that meet the following requirements:
Must be managed by an Iceberg REST catalog like Open Catalog.
Must store data in Amazon S3 or Google Cloud Storage.
Workflow¶
Use the workflow in this section to get started with this feature:
Choose from the following options:
Create a catalog-linked database. With this option, you can write to auto-discovered Iceberg tables in your catalog, or use the catalog-linked database to create additional Iceberg tables.
Create an Iceberg table (in a catalog-linked database or a standard Snowflake database).
After you complete these steps, you can perform write operations on your Iceberg tables.
Considerations¶
Consider the following when you use write support for externally managed Iceberg tables:
To write to an externally managed Iceberg table, you must have the OWNERSHIP privilege on the table (see the grant sketch after this list).
Snowflake provides Data Definition Language (DDL) and Data Manipulation Language (DML) commands for externally managed tables. However, you configure metadata and data retention using your external catalog and the tools provided by your external storage provider. For more information, see Tables that use an external catalog.
If you use a catalog-linked database, you can use the CREATE ICEBERG TABLE syntax with column definitions to create a table in Snowflake and in your remote catalog. If you use a standard Snowflake database (not linked to a catalog), you must first create a table in your remote catalog. After that, you can use the CREATE ICEBERG TABLE (Iceberg REST catalog) syntax to create an Iceberg table in Snowflake and write to it.
Partitioned writes are supported. For more information, see Partitioning for Apache Iceberg™ tables.
The following features aren’t currently supported when you write to externally managed Iceberg tables:
Catalog-vended credentials.
Azure storage.
Server-side encryption (SSE) for external volumes.
Conversion to Snowflake-managed tables.
External Iceberg catalogs that don’t conform to the Iceberg REST protocol.
Row-level deletes (merge-on-read).
For creating schemas in a catalog-linked database, be aware of the following:
The CREATE SCHEMA command creates a corresponding namespace in your remote catalog only when you use a catalog-linked database.
The ALTER and CLONE options aren’t supported.
Delimiters aren’t supported for schema names. Only alphanumeric schema names are supported.
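For example, a minimal sketch of the OWNERSHIP grant mentioned in the first consideration (the role name is hypothetical):

GRANT OWNERSHIP ON TABLE my_iceberg_table TO ROLE my_engineering_role COPY CURRENT GRANTS;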
Configuring an external volume for writes to externally managed tables¶
To enable writes to externally managed tables, you must use an external volume configured with both read and write permissions for the Snowflake service principal (the same permissions required for Snowflake-managed tables). For more information about the required permissions, see Granting Snowflake access to your storage.
| Cloud provider | Required actions |
| --- | --- |
| Amazon S3 | Grant the IAM role for your external volume write access to the bucket (for example, the s3:PutObject and s3:DeleteObject actions) in addition to read access. |
| Google Cloud Storage | Grant the Snowflake service account write access to the bucket (for example, the storage.objects.create and storage.objects.delete permissions) in addition to read access. |
Note
The s3:PutObject permission grants write access to the external volume location. To completely configure write access, you must set the ALLOW_WRITES parameter of the external volume to TRUE (the default value).
You must specify the storage location (STORAGE_BASE_URL) where your Iceberg REST catalog writes Iceberg data and metadata.
For example, the following statement creates an external volume for S3 that allows write access to a bucket named my-iceberg-tables:
CREATE OR REPLACE EXTERNAL VOLUME my_external_volume
STORAGE_LOCATIONS =
(
(
NAME = 'my-s3-us-west-2'
STORAGE_PROVIDER = 'S3'
STORAGE_BASE_URL = 's3://my-iceberg-tables/'
STORAGE_AWS_ROLE_ARN = 'arn:aws:iam::123456789012:role/my-write-access-role'
)
)
ALLOW_WRITES = TRUE;
For complete instructions, see the following topics:

Configure an external volume for Amazon S3

Configure an external volume for Google Cloud Storage
Configuring a catalog integration¶
To write to externally managed Iceberg tables, you must configure a catalog integration that connects Snowflake to your remote catalog. Your remote catalog must comply with the open source Apache Iceberg REST OpenAPI specification; examples include Open Catalog and the AWS Glue Iceberg REST catalog.
AWS Glue¶
Note
If you currently use a catalog integration for AWS Glue, you must create a new REST catalog integration for the AWS Glue Iceberg REST endpoint.
Create an IAM policy for Snowflake to access the AWS Glue Data Catalog.
Attach the policy to an IAM role, which you specify when you create the catalog integration. For instructions, see Creating IAM policies and Modifying a role permissions policy in the AWS Identity and Access Management User Guide.
The following example policy (in JSON format) provides the required permissions for read and write access to all of the tables in all databases.
{ "Version": "2012-10-17", "Statement": [ { "Sid": "AllowGlueCatalogTableAccess", "Effect": "Allow", "Action": [ "s3:GetObject", "s3:PutObject", "glue:GetCatalog", "glue:GetDatabase", "glue:GetDatabases", "glue:CreateDatabase", "glue:DeleteDatabase", "glue:GetTable", "glue:GetTables", "glue:CreateTable", "glue:UpdateTable", "glue:DeleteTable" ], "Resource": [ "arn:aws:glue:*:<accountid>:table/*/*", "arn:aws:glue:*:<accountid>:catalog", "arn:aws:glue:*:<accountid>:database/*", "arn:aws:s3:<external_volume_path>" ] } ] }
Note
The policy must provide access to your storage location in order to write metadata to the table location.
The "arn:aws:glue:*:<accountid>:database/*" line in the Resource element of this policy specifies all databases. This is required if you want to create a new database in Glue from Snowflake with the CREATE SCHEMA command. To limit access to a single database, you can specify the database by name. For more information about defining resources, see Resource types defined by AWS Glue.

If you use encryption for AWS Glue, you must modify the policy to add AWS Key Management Service (AWS KMS) permissions. For more information, see Setting up encryption in AWS Glue.
Create a catalog integration for the AWS Glue Iceberg REST endpoint using the CREATE CATALOG INTEGRATION (Apache Iceberg™ REST) command. Specify the IAM role that you configured. For CATALOG_NAME, use your AWS account ID.

CREATE CATALOG INTEGRATION glue_rest_catalog_int
  CATALOG_SOURCE = ICEBERG_REST
  TABLE_FORMAT = ICEBERG
  CATALOG_NAMESPACE = 'rest_catalog_integration'
  REST_CONFIG = (
    CATALOG_URI = 'https://glue.us-west-2.amazonaws.com/iceberg'
    CATALOG_API_TYPE = AWS_GLUE
    CATALOG_NAME = '123456789012'
  )
  REST_AUTHENTICATION = (
    TYPE = SIGV4
    SIGV4_IAM_ROLE = 'arn:aws:iam::123456789012:role/my-role'
    SIGV4_SIGNING_REGION = 'us-west-2'
  )
  ENABLED = TRUE;
Snowflake Open Catalog or Iceberg REST¶
To create a catalog integration for Open Catalog or Iceberg REST, see the instructions in the following topics:

Configure a catalog integration for Snowflake Open Catalog

Configure a catalog integration for Apache Iceberg™ REST catalogs
Create a catalog-linked database¶
Snowflake supports creating writable externally managed tables in a catalog-linked database, which is a Snowflake database that you sync with an external Iceberg REST catalog. You can also write to Iceberg tables that Snowflake automatically discovers in your remote catalog. For more information, see Use a catalog-linked database for Apache Iceberg™ tables.
Note
Alternatively, you can create writable externally managed Iceberg tables in a standard Snowflake database.
The following example creates a catalog-linked database that uses an external volume:
CREATE DATABASE my_linked_db
LINKED_CATALOG = (
CATALOG = 'my_catalog_int'
),
EXTERNAL_VOLUME = 'my_external_vol';
Note
Using a catalog-linked database with vended credentials isn’t currently supported for writable externally managed tables.
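After the catalog-linked database syncs with your remote catalog, you can list the Iceberg tables that Snowflake discovered. A minimal sketch, using the database from the example above:

SHOW ICEBERG TABLES IN DATABASE my_linked_db;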
Use CREATE SCHEMA to create namespaces in your external catalog¶
To create a namespace for organizing Iceberg tables in your external catalog, you can use the CREATE SCHEMA command with a catalog-linked database. The command creates a namespace in your linked Iceberg REST catalog and a corresponding schema in your Snowflake database.
CREATE SCHEMA <name>;
Where:
name
Specifies the identifier for the new schema (namespace in the remote catalog).
Note
Schema names must be alphanumeric and can’t include delimiters.
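For example, the following command creates a schema in the catalog-linked database and a matching namespace in the remote catalog:

CREATE SCHEMA my_schema;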
DROP SCHEMA¶
You can also use the DROP SCHEMA command to simultaneously drop a schema from your catalog-linked database and its corresponding namespace from your remote catalog.
DROP SCHEMA my_schema;
Create an Iceberg table¶
The way you create an externally managed Iceberg table that you can write to from Snowflake depends on the kind of database you use:
If you use a catalog-linked database, you can use the CREATE ICEBERG TABLE (catalog-linked database) syntax to create a table and register it in your remote catalog. For instructions, see Create an Iceberg table in a catalog-linked database.
If you use a standard Snowflake database (not linked to a catalog), you must first create a table in your remote catalog. Then, you can use the CREATE ICEBERG TABLE (Iceberg REST catalog) syntax to create an Iceberg table in Snowflake and write to it. For instructions, see Create an Iceberg table in a standard Snowflake database.
Create an Iceberg table in a catalog-linked database¶
To create a table from Snowflake and in your external catalog, you can use the CREATE ICEBERG TABLE command with a catalog-linked database.
CREATE ICEBERG TABLE (catalog-linked database) syntax¶
CREATE ICEBERG TABLE [ IF NOT EXISTS ] <table_name>
[
  -- Column definition
<col_name> <col_type>
-- Additional column definitions
[ , <col_name> <col_type> [ ... ] ]
]
[ PARTITION BY ( partitionExpression [, partitionExpression , ...] ) ]
[ EXTERNAL_VOLUME = '<external_volume_name>' ]
[ CATALOG = <catalog_integration_name> ]
[ AUTO_REFRESH = { TRUE | FALSE } ]
[ REPLACE_INVALID_CHARACTERS = { TRUE | FALSE } ]
[ COMMENT = '<string_literal>' ]
Required parameters¶
table_name
Specifies the identifier (name) for the table; must be unique for the schema in which you create the table.
In addition, the identifier must start with an alphabetic character and cannot contain spaces or special characters unless the entire identifier string is enclosed in double quotes (for example, "My object"). Identifiers enclosed in double quotes are also case-sensitive.

For more information, see Identifier requirements.
col_name
Specifies a column identifier (name). All the requirements for table identifiers also apply to column identifiers.
For more information, see Identifier requirements and Reserved & limited keywords.
Note
In addition to the standard reserved keywords, the following keywords can’t be used as column identifiers because they are reserved for ANSI-standard context functions:
CURRENT_DATE
CURRENT_ROLE
CURRENT_TIME
CURRENT_TIMESTAMP
CURRENT_USER
For the list of reserved keywords, see Reserved & limited keywords.
col_type
Specifies the data type for the column.
For information about the data types that can be specified for table columns, see Data types for Apache Iceberg™ tables.
Optional parameters¶
EXTERNAL_VOLUME = 'external_volume_name'
Specifies the identifier for your external volume.
You must specify an external volume if you haven’t set one at the account, database, or schema level, or if you’re not using a catalog-linked database.
CATALOG = 'catalog_integration_name'
Specifies the identifier (name) of the catalog integration for this table.
You must specify a catalog integration if you haven’t set one at the account, database, or schema level. Otherwise, the Iceberg table defaults to the catalog integration for the schema, database, or account.
The schema takes precedence over the database, and the database takes precedence over the account.
PARTITION BY ( partitionExpression [, partitionExpression , ...] )
Specifies one or more partition transforms for the table. For more information, see Partitioning for Apache Iceberg™ tables.
AUTO_REFRESH = { TRUE | FALSE }
Specifies whether the table data will be automatically refreshed.
Note
Using AUTO_REFRESH with INFER_SCHEMA isn’t supported.
REPLACE_INVALID_CHARACTERS = { TRUE | FALSE }
Specifies whether to replace invalid UTF-8 characters with the Unicode replacement character (ďż˝) in query results. You can only set this parameter for tables that use an external Iceberg catalog.
TRUE replaces invalid UTF-8 characters with the Unicode replacement character.

FALSE leaves invalid UTF-8 characters unchanged. Snowflake returns a user error message when it encounters invalid UTF-8 characters in a Parquet data file.

If not specified, the Iceberg table defaults to the parameter value for the schema, database, or account. The schema takes precedence over the database, and the database takes precedence over the account.

Default: FALSE
COMMENT = 'string_literal'
Specifies a comment for the column or the table.
Comments can be specified at the column level or the table level. The syntax for each is slightly different.
Example¶
The following example creates a writable Iceberg table using the previously created external volume and catalog integration for AWS Glue REST:
CREATE OR REPLACE ICEBERG TABLE my_iceberg_table (
first_name string,
last_name string,
amount int,
create_date date
)
EXTERNAL_VOLUME = 'my_external_volume'
CATALOG = 'glue_rest_catalog_int';
When you run the command, Snowflake creates a new Iceberg table in your remote catalog and a linked, writable externally managed table in Snowflake.
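Partitioned writes are also supported. The following sketch (assuming the same external volume and catalog integration) creates a variant of the table that's partitioned with the Iceberg month transform:

CREATE OR REPLACE ICEBERG TABLE my_partitioned_iceberg_table (
  first_name string,
  last_name string,
  amount int,
  create_date date
)
PARTITION BY (month(create_date))
EXTERNAL_VOLUME = 'my_external_volume'
CATALOG = 'glue_rest_catalog_int';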
Create an Iceberg table in a standard Snowflake database¶
If using a standard Snowflake database, you must first create a table in your remote catalog. For example, you might use Spark to write an Iceberg table to Open Catalog.
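For example, a minimal Spark SQL sketch (the catalog and namespace identifiers are hypothetical, and the Spark session is assumed to already be configured for your Open Catalog):

CREATE TABLE my_catalog.my_namespace.my_remote_table_name (
  first_name string,
  last_name string,
  amount int,
  create_date date
) USING iceberg;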
After you create the table in your remote catalog, use the CREATE ICEBERG TABLE (Iceberg REST catalog) command to create an Iceberg table object in Snowflake. For the CATALOG_TABLE_NAME, specify the name of the table as it appears in your remote catalog. For example:
CREATE OR REPLACE ICEBERG TABLE my_iceberg_table
EXTERNAL_VOLUME = 'my_external_volume'
CATALOG = 'my_rest_catalog_integration'
CATALOG_TABLE_NAME = 'my_remote_table_name';
When you run the command, Snowflake creates a writable externally managed table in Snowflake that is linked to the existing table in your remote catalog.
Writing to Iceberg tables¶
You can use the following DML commands for externally managed Iceberg tables:

INSERT

UPDATE

DELETE

MERGE

COPY INTO <table>. For more information, see Load data into Apache Iceberg™ tables.
You can also use the Snowpark API to process Iceberg tables.
Examples¶
You can use the following basic examples to get started with writing to Iceberg tables.
INSERT¶
Use INSERT to insert values into an Iceberg table:
INSERT INTO my_iceberg_table VALUES (1, 'a');
INSERT INTO my_iceberg_table VALUES (2, 'b');
INSERT INTO my_iceberg_table VALUES (3, 'c');
UPDATE¶
Use UPDATE to update the values in an Iceberg table:
UPDATE my_iceberg_table
SET a = 10
WHERE b = 'b';
DELETE¶
Use DELETE to remove values from an Iceberg table:
DELETE FROM my_iceberg_table
WHERE b = 'a';
MERGE¶
Use MERGE on an Iceberg table:
MERGE INTO my_iceberg_table USING my_snowflake_table
ON my_iceberg_table.a = my_snowflake_table.a
WHEN MATCHED THEN
UPDATE SET my_iceberg_table.b = my_snowflake_table.b
WHEN NOT MATCHED THEN
INSERT VALUES (my_snowflake_table.a, my_snowflake_table.b);
COPY INTO <table>¶
Use COPY INTO <table> to load data into an Iceberg table.
COPY INTO customer_iceberg_ingest
FROM @my_parquet_stage
FILE_FORMAT = 'my_parquet_format'
LOAD_MODE = ADD_FILES_COPY
PURGE = TRUE
MATCH_BY_COLUMN_NAME = CASE_SENSITIVE;
For more information, see Load data into Apache Iceberg™ tables.
Change Data Capture using streams¶
A table stream tracks changes made to rows in a source table for Change Data Capture (CDC). The source table can be a standard Snowflake table, a Snowflake-managed Iceberg table, or an externally managed Iceberg table. You can insert the changes into an externally managed Iceberg table using the INSERT INTO… SELECT FROM… command.
CREATE OR REPLACE STREAM my_stream ON TABLE my_snowflake_table;
//...
INSERT INTO my_iceberg_table(id,name)
SELECT id, name
FROM my_stream;
Using Snowpark¶
Create a function that copies data from a Snowflake table into an Iceberg table by using Snowpark Python:

def copy_into_iceberg(session, source_table, iceberg_table):
    # Append all rows from the source Snowflake table to the target Iceberg table.
    try:
        df = session.table(source_table)
        df.write.mode("append").save_as_table(iceberg_table)
    except Exception as e:
        print(f"Error processing {source_table}: {e}")

# Example usage (assumes an existing Snowpark session):
copy_into_iceberg(session, "my_snowflake_table", "my_iceberg_table")
Altering tables¶
You can use the ALTER ICEBERG TABLE command to add, rename, or drop table columns.
ALTER ICEBERG TABLE (external writes) syntax¶
ALTER ICEBERG TABLE [ IF EXISTS ] <name>
ADD [ COLUMN ] [ IF NOT EXISTS ] <col_name> <col_type>
| RENAME COLUMN <col_name> TO <new_col_name>
| DROP [ COLUMN ] [ IF EXISTS ] <col1_name> [, <col2_name> ... ]
Parameters¶
name
Identifier for the table to alter. If the identifier contains spaces or special characters, the entire string must be enclosed in double quotes. Identifiers enclosed in double quotes are also case sensitive.
ADD [ COLUMN ] [ IF NOT EXISTS ] col_name col_type
Adds a new column.
ADD COLUMN operations can be performed on multiple columns in the same command.
If you aren’t sure if the column already exists, you can specify IF NOT EXISTS when adding the column. If the column already exists, ADD COLUMN has no effect on the existing column and doesn’t result in an error.
RENAME COLUMN col_name TO new_col_name
Renames the specified column to a new name that’s not currently used for any other columns in the table.
When you rename a column, you must update other objects that reference it with the new name.
DROP COLUMN [ IF EXISTS ] col_name
Removes the specified column from the table.
If you aren’t sure whether the column exists, you can specify IF EXISTS when dropping the column. If the column doesn’t exist, DROP COLUMN has no effect and doesn’t result in an error.
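For example, the following statements sketch each operation against the table from the earlier examples (the email column is hypothetical):

ALTER ICEBERG TABLE my_iceberg_table ADD COLUMN IF NOT EXISTS email string;
ALTER ICEBERG TABLE my_iceberg_table RENAME COLUMN email TO email_address;
ALTER ICEBERG TABLE my_iceberg_table DROP COLUMN IF EXISTS email_address;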
Dropping tables¶
When you drop a table in Snowflake, Snowflake also instructs your external Iceberg REST catalog to drop the table.
DROP ICEBERG TABLE [IF EXISTS] <table_name>;
Where:
table_name
Specifies the identifier for the table to drop.
For example, the following command drops the my_iceberg_table table from Snowflake. At the same time, Snowflake makes a call to your remote Iceberg catalog instructing it to drop the table and delete the table’s underlying data and metadata.
DROP ICEBERG TABLE my_iceberg_table;