Manage Apache Iceberg™ tables¶

This topic describes how to manage Apache Iceberg™ tables in Snowflake, including how to do the following:

Query a table
Use DML commands
Generate snapshots of DML changes
Maintain tables that use an external catalog
Refresh the table metadata
Retrieve storage metrics

You can also convert an Iceberg table that uses an external catalog into a table that uses Snowflake as the Iceberg catalog. To learn more, see Convert an Apache Iceberg™ table to use Snowflake as the catalog.

Query a table¶

To query an Iceberg table, a user must be granted or inherit the following privileges:

The USAGE privilege on the database and schema that contain the table
The SELECT privilege on the table
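
As a minimal sketch, grants like the following would give a role these privileges. The names my_db, my_schema, and analyst_role are hypothetical, and the example assumes that the standard GRANT ... ON TABLE form applies to the Iceberg table:

```sql
-- Hypothetical names: my_db, my_schema, analyst_role.
-- USAGE on the containing database and schema.
GRANT USAGE ON DATABASE my_db TO ROLE analyst_role;
GRANT USAGE ON SCHEMA my_db.my_schema TO ROLE analyst_role;

-- SELECT on the table itself (assumes the generic TABLE object type).
GRANT SELECT ON TABLE my_db.my_schema.my_iceberg_table TO ROLE analyst_role;
```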

You can query an Iceberg table using a SELECT statement. For example:

SELECT col1, col2 FROM my_iceberg_table;

Use DML commands¶

Iceberg tables that use Snowflake as the catalog support full Data Manipulation Language (DML) commands, including INSERT, UPDATE, DELETE, and MERGE.

Snowflake-managed tables also support efficient bulk loading using features such as COPY INTO <table> and Snowpipe. For more information, see Load data into Apache Iceberg™ tables.

Note

Snowflake also supports writing to externally managed Iceberg tables (preview). For more information, see Write support for externally managed Apache Iceberg™ tables and Writing to externally managed Iceberg tables.

Example: Update a table¶

You can use INSERT and UPDATE statements to modify Snowflake-managed Iceberg tables.

The following example inserts a new value into an Iceberg table named store_sales, then updates the cola column to 1 if the value is currently -99.

INSERT INTO store_sales VALUES (-99);

UPDATE store_sales
  SET cola = 1
  WHERE cola = -99;

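Other DML commands, such as DELETE and MERGE, follow the same pattern. The following sketch assumes the same single-column store_sales table and a hypothetical staging table named store_sales_staging that also has a cola column:

```sql
-- Remove rows that still carry the placeholder value.
DELETE FROM store_sales
  WHERE cola = -99;

-- Insert values from the hypothetical staging table that
-- aren't already present in store_sales.
MERGE INTO store_sales AS t
  USING store_sales_staging AS s
  ON t.cola = s.cola
  WHEN NOT MATCHED THEN
    INSERT (cola) VALUES (s.cola);
```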

Generate snapshots of DML changes¶

For tables that use Snowflake as the catalog, Snowflake automatically generates the Iceberg metadata. Snowflake writes the metadata to a folder named metadata on your external volume. To find the metadata folder, see Data and metadata directories.

Alternatively, you can call the SYSTEM$GET_ICEBERG_TABLE_INFORMATION function to generate Iceberg metadata for any new changes.

For tables that aren’t managed by Snowflake, the function returns information about the latest refreshed snapshot.

For example:

SELECT SYSTEM$GET_ICEBERG_TABLE_INFORMATION('db1.schema1.it1');

Output:

+-----------------------------------------------------------------------------------------------------------+
| SYSTEM$GET_ICEBERG_TABLE_INFORMATION('DB1.SCHEMA1.IT1')                                                   |
|-----------------------------------------------------------------------------------------------------------|
| {"metadataLocation":"s3://mybucket/metadata/v1.metadata.json","status":"success"}                         |
+-----------------------------------------------------------------------------------------------------------+
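
Because the function returns a JSON string, you can extract individual fields from it. The following sketch assumes the output shape shown above (the metadataLocation and status keys) and uses PARSE_JSON to read them; the column aliases are illustrative:

```sql
-- Parse the JSON string once, then read fields from the result.
-- Key names are case-sensitive and match the sample output above.
WITH info AS (
  SELECT PARSE_JSON(
    SYSTEM$GET_ICEBERG_TABLE_INFORMATION('db1.schema1.it1')
  ) AS v
)
SELECT
  v:metadataLocation::STRING AS metadata_location,
  v:status::STRING AS status
FROM info;
```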

Use row-level deletes¶

Note

Supported for externally managed Iceberg tables only.

Snowflake supports querying externally managed Iceberg tables when you’ve configured row-level deletes for update, delete, and merge operations.

To configure row-level deletes, see Write properties in the Apache Iceberg documentation.

Copy-on-write vs. merge-on-read¶

Iceberg provides two modes for configuring how compute engines handle row-level operations for externally managed tables. Snowflake supports both of these modes.

The following table describes when you might want to use each mode:

Mode	Description
Copy-on-write (default)	This mode prioritizes read time and affects write speed. When you perform an update, delete, or merge operation, your compute engine rewrites the entire affected Parquet data file. This can result in slow writes, especially if you have large data files, but doesn’t impact read time. This is the default mode.
Merge-on-read	This mode prioritizes write speed and slightly affects read time. When you perform an update, delete, or merge operation, your compute engine creates a delete file that contains information about only the changed rows. When you read from a table, your query engine merges delete files with data files. Merging can increase read time. However, you can optimize read performance by scheduling regular compaction and table maintenance.

For more information about row-level changes for Iceberg, see Row-level deletes in the Apache Iceberg documentation.
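
As an illustration, the mode is set through Iceberg table write properties in the external engine, not in Snowflake. The following Spark SQL sketch switches a table to merge-on-read for all three operations; the catalog and table names (my_catalog.db.my_table) are hypothetical:

```sql
-- Spark SQL, run in the external engine (not in Snowflake).
-- Each property accepts 'copy-on-write' or 'merge-on-read'.
ALTER TABLE my_catalog.db.my_table SET TBLPROPERTIES (
  'write.delete.mode' = 'merge-on-read',
  'write.update.mode' = 'merge-on-read',
  'write.merge.mode'  = 'merge-on-read'
);
```

Setting the properties separately lets you mix modes, for example merge-on-read for deletes and copy-on-write for merges.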

Considerations and limitations¶

Consider the following when you use row-level deletes with externally managed Iceberg tables:

Snowflake supports position deletes only.
For the best read performance when you use row-level deletes, perform regular compaction and table maintenance to remove old delete files. For information, see Maintain tables that use an external catalog.
Automated refresh isn’t currently supported when you use position deletes.
Excessive position deletes, especially dangling position deletes, might prevent table creation and refresh operations. To avoid this issue, perform table maintenance to remove extra position deletes.

The table maintenance method to use depends on your external Iceberg engine. For example, you can use the rewrite_data_files method for Spark with the delete-file-threshold or rewrite-all options. For more information, see rewrite_data_files in the Apache Iceberg™ documentation.
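
For example, a Spark SQL call along the following lines compacts data files and applies their position deletes. This is a sketch only: the catalog and table names are hypothetical, and the threshold value is an assumption that you should tune for your workload.

```sql
-- Spark SQL, run in the external engine (not in Snowflake).
-- Rewrite any data file that has at least one associated delete file.
CALL my_catalog.system.rewrite_data_files(
  table => 'db.my_table',
  options => map('delete-file-threshold', '1')
);

-- Alternative: rewrite every data file regardless of other criteria.
CALL my_catalog.system.rewrite_data_files(
  table => 'db.my_table',
  options => map('rewrite-all', 'true')
);
```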

Maintain tables that use an external catalog¶

You can perform maintenance operations on Iceberg tables that use an external catalog.

Maintenance operations include the following:

Expiring snapshots
Removing old metadata files
Compacting data files

Important

To keep your Iceberg table in sync with external changes, it’s important to align your Snowflake refresh schedule with table maintenance. Refresh the table each time you perform a maintenance operation.

To learn about maintenance for Iceberg tables that aren’t managed by Snowflake, see Maintenance in the Apache Iceberg documentation.
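
The following sketch shows what one maintenance pass might look like when Spark is the external engine, followed by the refresh in Snowflake. The catalog and table names, the cutoff timestamp, and the retention values are all hypothetical:

```sql
-- Spark SQL, run in the external engine (not in Snowflake).

-- Expire snapshots older than the cutoff, but keep the last 5.
CALL my_catalog.system.expire_snapshots(
  table => 'db.my_table',
  older_than => TIMESTAMP '2024-01-01 00:00:00.000',
  retain_last => 5
);

-- Remove old metadata files automatically after each commit.
ALTER TABLE my_catalog.db.my_table SET TBLPROPERTIES (
  'write.metadata.delete-after-commit.enabled' = 'true',
  'write.metadata.previous-versions-max' = '50'
);

-- Compact small data files.
CALL my_catalog.system.rewrite_data_files(table => 'db.my_table');

-- Snowflake SQL, run afterward so Snowflake picks up the changes.
ALTER ICEBERG TABLE my_iceberg_table REFRESH;
```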

Refresh the table metadata¶

When you use an external Iceberg catalog, you can refresh the table metadata using the ALTER ICEBERG TABLE … REFRESH command. Refreshing the table metadata synchronizes the metadata with the most recent table changes.

Note

We recommend setting up automated refresh for supported externally managed tables.

Refresh the metadata for a table¶

The following example manually refreshes the metadata for a table that uses an external catalog (for example, AWS Glue or Delta). Refreshing the table keeps the table in sync with any changes that have occurred in the remote catalog.

With this type of Iceberg table, you don’t specify a metadata file path in the command.

ALTER ICEBERG TABLE my_iceberg_table REFRESH;

To keep a table updated automatically, set up automated refresh by using the ALTER ICEBERG TABLE command to set the AUTO_REFRESH parameter.

For example:

ALTER ICEBERG TABLE my_iceberg_table SET AUTO_REFRESH = TRUE;

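After you turn on automated refresh, you might want to confirm that it's running or turn it off again. The following sketch assumes that the SYSTEM$AUTO_REFRESH_STATUS function, which this topic doesn't otherwise cover, is available for the table:

```sql
-- Check the automated refresh status for the table
-- (assumes SYSTEM$AUTO_REFRESH_STATUS is available in your account).
SELECT SYSTEM$AUTO_REFRESH_STATUS('my_iceberg_table');

-- Turn automated refresh off, for example during maintenance.
ALTER ICEBERG TABLE my_iceberg_table SET AUTO_REFRESH = FALSE;
```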

Refresh the metadata for a table created from Iceberg files¶

The following example manually refreshes a table created from Iceberg metadata files in an external cloud storage location. The command specifies the relative path to a metadata file, without a leading forward slash (/). After the refresh, that metadata file defines the data in the table.

ALTER ICEBERG TABLE my_iceberg_table REFRESH 'metadata/v1.metadata.json';

Retrieve storage metrics¶

Snowflake does not bill your account for Iceberg table storage costs. However, you can track how much storage an Iceberg table occupies by querying the TABLE_STORAGE_METRICS and TABLES views in the Snowflake Information Schema or Account Usage schema.

The following example query joins the ACCOUNT_USAGE.TABLE_STORAGE_METRICS view with the ACCOUNT_USAGE.TABLES view, filtering on the TABLES.IS_ICEBERG column.

SELECT metrics.* FROM
  snowflake.account_usage.table_storage_metrics metrics
  INNER JOIN snowflake.account_usage.tables tables
  ON (
    metrics.id = tables.table_id
    AND metrics.table_schema_id = tables.table_schema_id
    AND metrics.table_catalog_id = tables.table_catalog_id
  )
  WHERE tables.is_iceberg='YES';

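To rank Iceberg tables by size instead of returning every metrics column, a variation such as the following might be useful. It's a sketch that assumes the ACTIVE_BYTES column is the figure you want and that dropped tables should be excluded; the active_gb alias is illustrative.

```sql
-- List current Iceberg tables by active storage, largest first.
SELECT
  metrics.table_catalog,
  metrics.table_schema,
  metrics.table_name,
  metrics.active_bytes / POWER(1024, 3) AS active_gb
FROM snowflake.account_usage.table_storage_metrics metrics
  INNER JOIN snowflake.account_usage.tables tables
  ON (
    metrics.id = tables.table_id
    AND metrics.table_schema_id = tables.table_schema_id
    AND metrics.table_catalog_id = tables.table_catalog_id
  )
WHERE tables.is_iceberg = 'YES'
  AND tables.deleted IS NULL  -- skip tables that have been dropped
ORDER BY metrics.active_bytes DESC;
```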