Automatically refresh Apache Iceberg™ tables¶
Configure automated metadata refreshes for new or existing externally managed Apache Iceberg™ tables. With automated refreshes, Snowflake polls your external Iceberg catalog in a continuous and serverless fashion to synchronize the metadata with the most recent remote changes.
Automatic refresh is supported for tables that use the following external catalog options:
AWS Glue
REST catalog that complies with the Apache Iceberg REST OpenAPI specification
Snowflake Open Catalog
Set the refresh interval on a catalog integration¶
When you run the CREATE CATALOG INTEGRATION command,
you can specify a value for the REFRESH_INTERVAL_SECONDS
parameter. Otherwise, the default refresh interval
is 30 seconds. Snowflake only polls the external catalog if there are Iceberg tables defined with the catalog integration.
The following example creates a catalog integration for AWS Glue, specifying a refresh interval of 60 seconds:
CREATE CATALOG INTEGRATION auto_refresh_catalog_integration
CATALOG_SOURCE = GLUE
CATALOG_NAMESPACE = 'my_catalog_namespace'
TABLE_FORMAT = ICEBERG
GLUE_AWS_ROLE_ARN = 'arn:aws:iam::123456789123:role/my-catalog-role'
GLUE_CATALOG_ID = '123456789123'
ENABLED = TRUE
REFRESH_INTERVAL_SECONDS = 60;
To update the refresh interval for a catalog integration, use the ALTER CATALOG INTEGRATION command.
For example:
ALTER CATALOG INTEGRATION auto_refresh_catalog_integration SET REFRESH_INTERVAL_SECONDS = 120;
Create an Iceberg table with automated refresh¶
Create an Iceberg table by using the CREATE ICEBERG TABLE command. To specify that the
table should use automated metadata refreshes, set AUTO_REFRESH = TRUE
.
The following example creates an Iceberg table that uses AWS Glue as the catalog, specifying
the catalog integration created previously (auto_refresh_catalog_integration
)
and the CATALOG_TABLE_NAME from AWS Glue.
CREATE OR REPLACE ICEBERG TABLE auto_refresh_iceberg_table
CATALOG_TABLE_NAME = 'myGlueTable'
CATALOG = 'auto_refresh_catalog_integration'
AUTO_REFRESH = TRUE;
Enable or turn off automated refresh¶
Note
If the table uses a catalog integration created before Snowflake version 8.22, you must use the ALTER CATALOG INTEGRATION command to set the
REFRESH_INTERVAL_SECONDS
parameter before you enable automated refresh on the table.Frequently toggling automated refresh on and off for an Iceberg table can slow metadata refreshes for the table.
Use the ALTER ICEBERG TABLE command to enable or turn off automated refresh for an existing Iceberg table.
For example:
ALTER ICEBERG TABLE my_iceberg_table SET AUTO_REFRESH = FALSE;
Retrieve the automated refresh status¶
To retrieve the automated refresh status for a table, call the SYSTEM$AUTO_REFRESH_STATUS function.
SELECT SYSTEM$AUTO_REFRESH_STATUS('my_iceberg_table');
The function returns details about the pipe that Snowflake uses to automate refreshes for the table, such as the execution state
and size of the snapshot queue.
An execution state of RUNNING
indicates that automated refresh is running as expected.
For more information, see SYSTEM$AUTO_REFRESH_STATUS.
Error recovery¶
When an error occurs during the automated refresh process, Snowflake updates the execution state to one of the following values:
STALLED
means that Snowflake is attempting to recover from the error. If recovery succeeds, the automated refresh process continues running as expected and the execution state transitions back to the healthyRUNNING
state.STOPPED
means the automated refresh process encountered an unrecoverable error, and automated refreshes for the table have been stopped.An unrecoverable error might occur, for example, when Snowflake can’t establish a direct lineage between the target snapshot and the current snapshot.
To recover from a
STOPPED
state, take the following actions:Turn off automated refresh on the table.
Perform a manual metadata refresh. For instructions, see Refresh the table metadata.
Re-enable automated refresh using an ALTER ICEBERG TABLE … SET AUTO_REFRESH statement.
Verify that automated refresh is in the
RUNNING
state by calling the SYSTEM$AUTO_REFRESH_STATUS function. You can also call the function multiple times to confirm that the number of queued snapshots (pendingSnapshotCount
) gradually decreases.
Billing¶
Snowflake uses Snowpipe to automate refreshes for Iceberg tables, so charges for automated refresh appear in the same line item on your bill as Snowpipe charges. There are no Snowpipe file charges for this feature.
You can estimate charges incurred by examining the Account Usage PIPE_USAGE_HISTORY view. Automated refresh pipes are listed under a NULL pipe name.
For more information about Iceberg table charges, see Iceberg table billing.
Considerations and limitations¶
Consider the following when you work with Iceberg tables that use automated refresh:
When automated refresh is enabled, you can’t manually refresh the table metadata. To perform a manual refresh, turn off automated refresh first.
For catalog integrations created before Snowflake version 8.22, you must manually set the
REFRESH_INTERVAL_SECONDS
parameter before you enable automated refresh on tables that depend on that catalog integration. For instructions, see ALTER CATALOG INTEGRATION … SET AUTO_REFRESH.Ensure that the new table snapshot is a direct child of the current table snapshot. Otherwise, automated refresh enters the
STOPPED
state. To recover automated refresh when this occurs, see Error recovery.If your table is empty, perform a manual refresh before you enable automated refresh to avoid undefined behavior.
Automated refresh isn’t supported when you use a catalog integration for object storage.
Iceberg version 1 manifests without a sequence number column aren’t currently supported.