Write support for externally managed Apache Iceberg™ tables¶

Write support for externally managed Apache Iceberg™ tables expands interoperability between Snowflake and third-party systems, allowing you to use Snowflake for data engineering workloads with Iceberg even when you use an external Iceberg catalog.

With this preview, you can now use Snowflake to create Iceberg tables managed by external Iceberg catalogs such as AWS Glue and Snowflake Open Catalog. You can perform write operations on these tables using Snowflake.

Key use cases include the following:

  • Building complex data engineering pipelines with Iceberg tables: Writing to Iceberg tables in external catalogs from Snowflake allows you to use Snowpark or Snowflake SQL to build complex pipelines that ingest, transform, and process data for Iceberg tables. You can query the data by using Snowflake or other engines. Similarly, you can use Snowflake partner tools to build your Iceberg data engineering pipelines.

  • Making your data available to the Iceberg ecosystem: The ability to write to Iceberg tables in external catalogs lets you make your data available to the Iceberg ecosystem. You can query data that’s already in Snowflake and write it to Iceberg tables. To keep your Iceberg tables in sync with your Snowflake tables, you can use operations like INSERT INTO … SELECT FROM to do the following:

    • Copy existing data from a standard Snowflake table into an Iceberg table.

    • Insert data using Snowflake streams.
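
    For example, a minimal sketch of the first pattern, using hypothetical table and column names:

    INSERT INTO my_iceberg_table (id, name)
      SELECT id, name
      FROM my_snowflake_table;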

This feature is supported for Iceberg tables that meet the following requirements:

  • Must be managed by an Iceberg REST catalog like Open Catalog.

  • Must store data in Amazon S3 or Google Cloud Storage.

Workflow¶

Use the workflow in this section to get started with this feature:

  1. Configure an external volume

  2. Create a catalog integration

  3. Choose from the following options:

    • Create a catalog-linked database. With this option, you can write to auto-discovered Iceberg tables in your catalog, or use the catalog-linked database to create additional Iceberg tables.

    • Create an Iceberg table (in a catalog-linked database or a standard Snowflake database).

After you complete these steps, you can perform write operations on your Iceberg tables.

Considerations¶

Consider the following when you use write support for externally managed Iceberg tables:

  • To write to an externally managed Iceberg table, you must have the OWNERSHIP privilege on the table.

  • Snowflake provides Data Definition Language (DDL) and Data Manipulation Language (DML) commands for externally managed tables. However, you configure metadata and data retention using your external catalog and the tools provided by your external storage provider. For more information, see Tables that use an external catalog.

  • If you use a catalog-linked database, you can use the CREATE ICEBERG TABLE syntax with column definitions to create a table in Snowflake and in your remote catalog. If you use a standard Snowflake database (not linked to a catalog), you must first create a table in your remote catalog. After that, you can use the CREATE ICEBERG TABLE (Iceberg REST catalog) syntax to create an Iceberg table in Snowflake and write to it.

  • Partitioned writes are supported. For more information, see Partitioning for Apache Iceberg™ tables.

  • The following features aren’t currently supported when you write to externally managed Iceberg tables:

    • Catalog-vended credentials.

    • Cross-cloud/cross-region support.

    • Azure storage.

    • Server-side encryption (SSE) for external volumes.

    • Conversion to Snowflake-managed tables.

    • External Iceberg catalogs that don’t conform to the Iceberg REST protocol.

    • Row-level deletes (merge-on-read).

  • For creating schemas in a catalog-linked database, be aware of the following:

    • The CREATE SCHEMA command creates a corresponding namespace in your remote catalog only when you use a catalog-linked database.

    • The ALTER and CLONE options aren’t supported.

    • Delimiters aren’t supported for schema names. Only alphanumeric schema names are supported.

Configuring an external volume for writes to externally managed tables¶

To enable writes to externally managed tables, you must use an external volume configured with both read and write permissions for the Snowflake service principal (the same permissions required for Snowflake-managed tables). For more information about the required permissions, see Granting Snowflake access to your storage.

Required actions for Amazon S3:

  • s3:GetBucketLocation

  • s3:GetObject

  • s3:ListBucket

  • s3:PutObject

  • s3:DeleteObject

  • s3:GetObjectVersion

  • s3:DeleteObjectVersion

Required permissions for Google Cloud Storage:

  • storage.objects.create

  • storage.objects.delete

  • storage.objects.get

  • storage.objects.list

Note

The s3:PutObject permission grants write access to the external volume location. To completely configure write access, you must set the ALLOW_WRITES parameter of the external volume to TRUE (the default value).
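
If you created the external volume without write access, you can enable it later. A minimal sketch, assuming a volume named my_external_volume:

ALTER EXTERNAL VOLUME my_external_volume SET ALLOW_WRITES = TRUE;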

You must specify the storage location (STORAGE_BASE_URL) where your Iceberg REST catalog writes Iceberg data and metadata.

For example, the following statement creates an external volume for S3 that allows write access to a bucket named my-iceberg-tables:

CREATE OR REPLACE EXTERNAL VOLUME my_external_volume
  STORAGE_LOCATIONS =
    (
      (
        NAME = 'my-s3-us-west-2'
        STORAGE_PROVIDER = 'S3'
        STORAGE_BASE_URL = 's3://my-iceberg-tables/'
        STORAGE_AWS_ROLE_ARN = 'arn:aws:iam::123456789012:role/my-write-access-role'
      )
    )
  ALLOW_WRITES = TRUE;
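
The shape is similar for Google Cloud Storage. A minimal sketch, assuming a GCS bucket with the same name (no role ARN is needed; instead, you grant bucket access to the service account that Snowflake generates for the volume):

CREATE OR REPLACE EXTERNAL VOLUME my_gcs_external_volume
  STORAGE_LOCATIONS =
    (
      (
        NAME = 'my-gcs-us-central1'
        STORAGE_PROVIDER = 'GCS'
        STORAGE_BASE_URL = 'gcs://my-iceberg-tables/'
      )
    )
  ALLOW_WRITES = TRUE;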

For complete instructions, see the external volume configuration topics for Amazon S3 and Google Cloud Storage.

Configuring a catalog integration¶

To write to externally managed Iceberg tables, you must configure a catalog integration that connects Snowflake to your remote catalog.

Your remote catalog must comply with the open source Apache Iceberg REST OpenAPI specification. Examples include Open Catalog and the AWS Glue Iceberg REST catalog.

AWS Glue¶

Note

If you currently use a catalog integration for AWS Glue, you must create a new REST catalog integration for the AWS Glue Iceberg REST endpoint.

  1. Create an IAM policy for Snowflake to access the AWS Glue Data Catalog.

    Attach the policy to an IAM role, which you specify when you create the catalog integration. For instructions, see Creating IAM policies and Modifying a role permissions policy in the AWS Identity and Access Management User Guide.

    The following example policy (in JSON format) provides the required permissions for read and write access to all of the tables in all databases.

    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Sid": "AllowGlueCatalogTableAccess",
          "Effect": "Allow",
          "Action": [
            "s3:GetObject",
            "s3:PutObject",
            "glue:GetCatalog",
            "glue:GetDatabase",
            "glue:GetDatabases",
            "glue:CreateDatabase",
            "glue:DeleteDatabase",
            "glue:GetTable",
            "glue:GetTables",
            "glue:CreateTable",
            "glue:UpdateTable",
            "glue:DeleteTable"
          ],
          "Resource": [
            "arn:aws:glue:*:<accountid>:table/*/*",
            "arn:aws:glue:*:<accountid>:catalog",
            "arn:aws:glue:*:<accountid>:database/*",
            "arn:aws:s3:<external_volume_path>"
          ]
        }
      ]
    }
    

    Note

    • The policy must provide access to your storage location in order to write metadata to the table location.

    • The "arn:aws:glue:*:<accountid>:database/*" line in the Resource element of this policy specifies all databases. This is required if you want to create a new database in Glue from Snowflake with the CREATE SCHEMA command. To limit access to a single database, you can specify the database by name. For more information about defining resources, see Resource types defined by AWS Glue.

    • If you use encryption for AWS Glue, you must modify the policy to add AWS Key Management Service (AWS KMS) permissions. For more information, see Setting up encryption in AWS Glue.

  2. Create a catalog integration for the AWS Glue Iceberg REST endpoint using the CREATE CATALOG INTEGRATION (Apache Iceberg™ REST) command. Specify the IAM role that you configured. For CATALOG_NAME, use your AWS account ID.

    CREATE CATALOG INTEGRATION glue_rest_catalog_int
      CATALOG_SOURCE = ICEBERG_REST
      TABLE_FORMAT = ICEBERG
      CATALOG_NAMESPACE = 'rest_catalog_integration'
      REST_CONFIG = (
        CATALOG_URI = 'https://glue.us-west-2.amazonaws.com/iceberg'
        CATALOG_API_TYPE = AWS_GLUE
        CATALOG_NAME = '123456789012'
      )
      REST_AUTHENTICATION = (
        TYPE = SIGV4
        SIGV4_IAM_ROLE = 'arn:aws:iam::123456789012:role/my-role'
        SIGV4_SIGNING_REGION = 'us-west-2'
      )
      ENABLED = TRUE;
    
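
    To confirm that Snowflake can reach the Glue REST endpoint, you can verify the integration that you created:

    SELECT SYSTEM$VERIFY_CATALOG_INTEGRATION('glue_rest_catalog_int');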

Snowflake Open Catalog or Iceberg REST¶

To create a catalog integration for Open Catalog or another Iceberg REST catalog, see the catalog integration topics for Snowflake Open Catalog and Apache Iceberg™ REST.

Create a catalog-linked database¶

Snowflake supports creating writable externally managed tables in a catalog-linked database, which is a Snowflake database that you sync with an external Iceberg REST catalog. You can also write to Iceberg tables that Snowflake automatically discovers in your remote catalog. For more information, see Use a catalog-linked database for Apache Iceberg™ tables.

Note

Alternatively, you can create writable externally managed Iceberg tables in a standard Snowflake database.

The following example creates a catalog-linked database that uses an external volume:

CREATE DATABASE my_linked_db
LINKED_CATALOG = (
  CATALOG = 'my_catalog_int'
),
EXTERNAL_VOLUME = 'my_external_vol';
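
After you create the database, Snowflake syncs it with the remote catalog. One way to list the auto-discovered tables (a sketch, assuming the database above):

SHOW ICEBERG TABLES IN DATABASE my_linked_db;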

Note

Using a catalog-linked database with vended credentials isn’t currently supported for writable externally managed tables.

Use CREATE SCHEMA to create namespaces in your external catalog¶

To create a namespace for organizing Iceberg tables in your external catalog, you can use the CREATE SCHEMA command with a catalog-linked database. The command creates a namespace in your linked Iceberg REST catalog and a corresponding schema in your Snowflake database.

CREATE SCHEMA <name>;

Where:

name

Specifies the identifier for the new schema (namespace in the remote catalog).
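
For example, a minimal sketch, assuming the catalog-linked database created earlier and a hypothetical schema name:

USE DATABASE my_linked_db;

CREATE SCHEMA analytics;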

Note

Schema names must be alphanumeric and can’t include delimiters.

DROP SCHEMA¶

You can also use the DROP SCHEMA command to simultaneously drop a schema from your catalog-linked database and its corresponding namespace from your remote catalog.

DROP SCHEMA my_schema;

Create an Iceberg table¶

Creating an externally managed Iceberg table that you can write to from Snowflake varies, depending on the kind of database you use:

Create an Iceberg table in a catalog-linked database¶

To create a table in both Snowflake and your external catalog, you can use the CREATE ICEBERG TABLE command with a catalog-linked database.

CREATE ICEBERG TABLE (catalog-linked database) syntax¶

CREATE ICEBERG TABLE [ IF NOT EXISTS ] <table_name>
  [
    -- Column definition
    <col_name> <col_type>

    -- Additional column definitions
    [ , <col_name> <col_type> [ ... ] ]
  ]
  [ PARTITION BY ( partitionExpression [, partitionExpression , ...] ) ]
  [ EXTERNAL_VOLUME = '<external_volume_name>' ]
  [ CATALOG = <catalog_integration_name> ]
  [ AUTO_REFRESH = { TRUE | FALSE } ]
  [ REPLACE_INVALID_CHARACTERS = { TRUE | FALSE } ]
  [ COMMENT = '<string_literal>' ]
Required parameters¶
table_name

Specifies the identifier (name) for the table; must be unique for the schema in which you create the table.

In addition, the identifier must start with an alphabetic character and cannot contain spaces or special characters unless the entire identifier string is enclosed in double quotes (for example, "My object"). Identifiers enclosed in double quotes are also case-sensitive.

For more information, see Identifier requirements.

col_name

Specifies a column identifier (name). All the requirements for table identifiers also apply to column identifiers.

For more information, see Identifier requirements and Reserved & limited keywords.

Note

In addition to the standard reserved keywords, the following keywords can’t be used as column identifiers because they are reserved for ANSI-standard context functions:

  • CURRENT_DATE

  • CURRENT_ROLE

  • CURRENT_TIME

  • CURRENT_TIMESTAMP

  • CURRENT_USER

For the list of reserved keywords, see Reserved & limited keywords.

col_type

Specifies the data type for the column.

For information about the data types that can be specified for table columns, see Data types for Apache Iceberg™ tables.

Optional parameters¶
EXTERNAL_VOLUME = 'external_volume_name'

Specifies the identifier for your external volume.

You must specify an external volume if you haven’t set one at the account, database, or schema level, or if you’re not using a catalog-linked database.

CATALOG = 'catalog_integration_name'

Specifies the identifier (name) of the catalog integration for this table.

You must specify a catalog integration if you haven’t set one at the account, database, or schema level. Otherwise, the Iceberg table defaults to the catalog integration for the schema, database, or account.

The schema takes precedence over the database, and the database takes precedence over the account.
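
For example, a sketch of setting database-level defaults so that CREATE ICEBERG TABLE statements can omit both parameters (assuming a hypothetical database and the objects created earlier):

ALTER DATABASE my_db SET EXTERNAL_VOLUME = 'my_external_volume';

ALTER DATABASE my_db SET CATALOG = 'glue_rest_catalog_int';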

PARTITION BY ( partitionExpression [, partitionExpression , ...] )

Specifies one or more partition transforms for the table. For more information, see Partitioning for Apache Iceberg™ tables.
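
A minimal sketch of a partitioned table definition, assuming hypothetical columns and the Iceberg day transform:

CREATE ICEBERG TABLE my_partitioned_table (
  event_time TIMESTAMP_NTZ,
  event_type STRING
)
  PARTITION BY (day(event_time))
  EXTERNAL_VOLUME = 'my_external_volume'
  CATALOG = 'glue_rest_catalog_int';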

AUTO_REFRESH = { TRUE | FALSE }

Specifies whether the table data will be automatically refreshed.

Note

Using AUTO_REFRESH with INFER_SCHEMA isn’t supported.

REPLACE_INVALID_CHARACTERS = { TRUE | FALSE }

Specifies whether to replace invalid UTF-8 characters with the Unicode replacement character (�) in query results. You can only set this parameter for tables that use an external Iceberg catalog.

  • TRUE replaces invalid UTF-8 characters with the Unicode replacement character.

  • FALSE leaves invalid UTF-8 characters unchanged. Snowflake returns a user error message when it encounters invalid UTF-8 characters in a Parquet data file.

If not specified, the Iceberg table defaults to the parameter value for the schema, database, or account. The schema takes precedence over the database, and the database takes precedence over the account.

Default: FALSE

COMMENT = 'string_literal'

Specifies a comment for the column or the table.

Comments can be specified at the column level or the table level. The syntax for each is slightly different.
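
A sketch of both forms, using hypothetical tables (the table-level form takes an equal sign; the column-level form follows the column type):

-- Table-level comment
CREATE ICEBERG TABLE t1 (c1 INT)
  EXTERNAL_VOLUME = 'my_external_volume'
  CATALOG = 'glue_rest_catalog_int'
  COMMENT = 'Landed order data';

-- Column-level comment
CREATE ICEBERG TABLE t2 (c1 INT COMMENT 'Order count')
  EXTERNAL_VOLUME = 'my_external_volume'
  CATALOG = 'glue_rest_catalog_int';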

Example¶

The following example creates a writable Iceberg table using the previously created external volume and catalog integration for AWS Glue REST:

CREATE OR REPLACE ICEBERG TABLE my_iceberg_table (
  first_name string,
  last_name string,
  amount int,
  create_date date
)
  EXTERNAL_VOLUME = 'my_external_volume'
  CATALOG = 'glue_rest_catalog_int';

When you run the command, Snowflake creates a new Iceberg table in your remote catalog and a linked, writable externally managed table in Snowflake.

Create an Iceberg table in a standard Snowflake database¶

If you use a standard Snowflake database, you must first create the table in your remote catalog. For example, you might use Spark to write an Iceberg table to Open Catalog.

After you create the table in your remote catalog, use the CREATE ICEBERG TABLE (Iceberg REST catalog) command to create an Iceberg table object in Snowflake. For the CATALOG_TABLE_NAME, specify the name of the table as it appears in your remote catalog. For example:

CREATE OR REPLACE ICEBERG TABLE my_iceberg_table
  EXTERNAL_VOLUME = 'my_external_volume'
  CATALOG = 'my_rest_catalog_integration'
  CATALOG_TABLE_NAME = 'my_remote_table_name';

When you run the command, Snowflake creates a writable externally managed table in Snowflake that is linked to the existing table in your remote catalog.

Writing to Iceberg tables¶

You can use the following DML commands for externally managed Iceberg tables: INSERT, UPDATE, DELETE, MERGE, and COPY INTO <table>.

You can also use the Snowpark API to process Iceberg tables.

Examples¶

You can use the following basic examples to get started with writing to Iceberg tables.

INSERT¶

Use INSERT to insert values into an Iceberg table:

INSERT INTO my_iceberg_table VALUES (1, 'a');
INSERT INTO my_iceberg_table VALUES (2, 'b');
INSERT INTO my_iceberg_table VALUES (3, 'c');

UPDATE¶

Use UPDATE to update the values in an Iceberg table:

UPDATE my_iceberg_table
  SET a = 10
  WHERE b = 'b';

DELETE¶

Use DELETE to remove values from an Iceberg table:

DELETE FROM my_iceberg_table
  WHERE b = 'a';

MERGE¶

Use MERGE on an Iceberg table:

MERGE INTO my_iceberg_table USING my_snowflake_table
  ON my_iceberg_table.a = my_snowflake_table.a
  WHEN MATCHED THEN
      UPDATE SET my_iceberg_table.b = my_snowflake_table.b
  WHEN NOT MATCHED THEN
      INSERT VALUES (my_snowflake_table.a, my_snowflake_table.b);

COPY INTO <table>¶

Use COPY INTO <table> to load data into an Iceberg table:

COPY INTO customer_iceberg_ingest
  FROM @my_parquet_stage
  FILE_FORMAT = 'my_parquet_format'
  LOAD_MODE = ADD_FILES_COPY
  PURGE = TRUE
  MATCH_BY_COLUMN_NAME = CASE_SENSITIVE;

For more information, see Load data into Apache Iceberg™ tables.

Change Data Capture using streams¶

A table stream tracks changes made to rows in a source table for Change Data Capture (CDC). The source table can be a standard Snowflake table, a Snowflake-managed Iceberg table, or an externally managed Iceberg table. You can insert the changes into an externally managed Iceberg table using the INSERT INTO… SELECT FROM… command.

CREATE OR REPLACE STREAM my_stream ON TABLE my_snowflake_table;

//...

INSERT INTO my_iceberg_table(id,name)
  SELECT id, name
  FROM my_stream;

Using Snowpark¶

Create a function to copy data into an Iceberg table from a Snowflake table by using Snowpark Python.

def copy_into_iceberg(session):
    table_name = "my_snowflake_table"
    try:
        df = session.table(table_name)

        # Append the rows to the existing externally managed Iceberg table
        df.write.mode("append").save_as_table("my_iceberg_table")

    except Exception as e:
        print(f"Error processing {table_name}: {e}")
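
Assuming an existing Snowpark session, you can then call copy_into_iceberg(session) to append the rows from the Snowflake table to the Iceberg table.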

Altering tables¶

You can use the ALTER ICEBERG TABLE command to add, rename, or drop table columns.

ALTER ICEBERG TABLE (external writes) syntax¶

ALTER ICEBERG TABLE [ IF EXISTS ] <name>
    ADD [ COLUMN ] [ IF NOT EXISTS ] <col_name> <col_type>
  | RENAME COLUMN <col_name> TO <new_col_name>
  | DROP [ COLUMN ] [ IF EXISTS ] <col1_name> [, <col2_name> ... ]

Parameters¶

name

Identifier for the table to alter. If the identifier contains spaces or special characters, the entire string must be enclosed in double quotes. Identifiers enclosed in double quotes are also case sensitive.

ADD [ COLUMN ] [ IF NOT EXISTS ] col_name col_type

Adds a new column.

ADD COLUMN operations can be performed on multiple columns in the same command.

If you aren’t sure if the column already exists, you can specify IF NOT EXISTS when adding the column. If the column already exists, ADD COLUMN has no effect on the existing column and doesn’t result in an error.

RENAME COLUMN col_name TO new_col_name

Renames the specified column to a new name that’s not currently used for any other columns in the table.

When you rename a column, you must update other objects that reference it with the new name.

DROP COLUMN [ IF EXISTS ] col_name

Removes the specified column from the table.

If you aren’t sure whether the column exists, you can specify IF EXISTS when dropping the column. If the column doesn’t exist, DROP COLUMN has no effect and doesn’t result in an error.
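
For example, a minimal sketch that adds a hypothetical column and then renames it, using the table created earlier:

ALTER ICEBERG TABLE my_iceberg_table ADD COLUMN IF NOT EXISTS email STRING;

ALTER ICEBERG TABLE my_iceberg_table RENAME COLUMN email TO contact_email;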

Dropping tables¶

When you drop a table in Snowflake, Snowflake also instructs your external Iceberg REST catalog to drop the table.

DROP ICEBERG TABLE [IF EXISTS] <table_name>;

Where:

table_name

Specifies the identifier for the table to drop.

For example, the following command drops the my_iceberg_table from Snowflake. At the same time, Snowflake makes a call to your remote Iceberg catalog instructing it to drop the table and delete the table’s underlying data and metadata.

DROP ICEBERG TABLE my_iceberg_table;