Automatically refresh Apache Iceberg™ tables¶
Configure automated metadata refreshes for new or existing externally managed Apache Iceberg™ tables. With automated refreshes, Snowflake polls your external Iceberg catalog in a continuous and serverless fashion to synchronize the metadata with the most recent remote changes.
Automated refresh for Iceberg tables works differently from automated refresh for directory tables or external tables because it doesn’t rely on cloud provider notifications. Instead, you configure the feature according to the following steps:
Set a refresh interval on a catalog integration. Snowflake supports automated refresh for the following external Iceberg catalog options:
REST catalog that complies with the Apache Iceberg REST OpenAPI specification
Snowflake Open Catalog
Object storage (Delta Lake only)
AWS Glue
Create one or more Iceberg tables that use the catalog integration.
Control automated refresh for each table with the AUTO_REFRESH parameter.
This approach lets you centrally manage refresh settings through the catalog integration while you control individual tables as needed.
Set the refresh interval on a catalog integration¶
When you run the CREATE CATALOG INTEGRATION command,
you can specify a value for the REFRESH_INTERVAL_SECONDS
parameter. Otherwise, the default refresh interval
is 30 seconds. Snowflake only polls the external catalog if there are Iceberg tables defined with the catalog integration.
The following example creates a catalog integration for AWS Glue, specifying a refresh interval of 60 seconds:
CREATE CATALOG INTEGRATION auto_refresh_catalog_integration
CATALOG_SOURCE = GLUE
CATALOG_NAMESPACE = 'my_catalog_namespace'
TABLE_FORMAT = ICEBERG
GLUE_AWS_ROLE_ARN = 'arn:aws:iam::123456789123:role/my-catalog-role'
GLUE_CATALOG_ID = '123456789123'
ENABLED = TRUE
REFRESH_INTERVAL_SECONDS = 60;
To update the refresh interval for a catalog integration, use the ALTER CATALOG INTEGRATION command.
For example:
ALTER CATALOG INTEGRATION auto_refresh_catalog_integration SET REFRESH_INTERVAL_SECONDS = 120;
Create an Iceberg table with automated refresh¶
Create an Iceberg table by using the CREATE ICEBERG TABLE command. To specify that the
table should use automated metadata refreshes, set AUTO_REFRESH = TRUE
.
The following example creates an Iceberg table that uses AWS Glue as the catalog, specifying
the catalog integration created previously (auto_refresh_catalog_integration
)
and the CATALOG_TABLE_NAME from AWS Glue.
CREATE OR REPLACE ICEBERG TABLE auto_refresh_iceberg_table
CATALOG_TABLE_NAME = 'myGlueTable'
CATALOG = 'auto_refresh_catalog_integration'
AUTO_REFRESH = TRUE;
Enable or turn off automated refresh¶
Note
If the table uses a catalog integration created before Snowflake version 8.22, you must use the ALTER CATALOG INTEGRATION command to set the
REFRESH_INTERVAL_SECONDS
parameter before you enable automated refresh on the table.Frequently toggling automated refresh on and off for an Iceberg table can slow metadata refreshes for the table.
Use the ALTER ICEBERG TABLE command to enable or turn off automated refresh for an existing Iceberg table.
For example:
ALTER ICEBERG TABLE my_iceberg_table SET AUTO_REFRESH = FALSE;
Monitoring automated refresh status¶
SHOW ICEBERG TABLES¶
To get the automated refresh status for multiple tables, use the SHOW ICEBERG TABLES command.
SHOW ICEBERG TABLES;
The command output includes a column named auto_refresh_status
, which displays the same information as
the SYSTEM$AUTO_REFRESH_STATUS function for each table that you have access privileges on.
SYSTEM$AUTO_REFRESH_STATUS¶
To retrieve the automated refresh status for a specific table, call the SYSTEM$AUTO_REFRESH_STATUS function.
SELECT SYSTEM$AUTO_REFRESH_STATUS('my_iceberg_table');
The function returns details about the pipe that Snowflake uses to automate refreshes for the table, such as the execution state
and size of the snapshot queue.
An execution state of RUNNING
indicates that automated refresh is running as expected.
For more information, see SYSTEM$AUTO_REFRESH_STATUS.
ICEBERG_TABLE_SNAPSHOT_REFRESH_HISTORY¶
To retrieve metadata and snapshot information about the most recent refresh history for a specific table, use the ICEBERG_TABLE_SNAPSHOT_REFRESH_HISTORY function.
SELECT *
FROM TABLE(INFORMATION_SCHEMA.ICEBERG_TABLE_SNAPSHOT_REFRESH_HISTORY(
TABLE_NAME => 'my_iceberg_table'
));
Monitor automated refresh events¶
You can configure Snowflake to record an event that provides information about the status of automated refresh for an Iceberg table. Snowflake records the event in your active event table. For example, suppose that you have an event table associated with a database. When Snowflake automatically refreshes an Iceberg table in that database, Snowflake records an event to your event table.
Monitoring automated refresh events can help you gain insight into the following areas:
Automated refresh progress: Track how snapshots move through the automated refresh process.
Aggregated statistics: Review summarized statistics for automated refresh operations.
You can also configure alerts for the following critical conditions:
Refresh errors
High refresh latencies
Note
Logging events for automated refresh incurs costs. For more information, see Costs of telemetry data collection.
Snowflake records an event when automated refresh starts, completes, or results in error.
Set the severity level to capture events¶
To capture automated refresh events, you must set the LOG_LEVEL parameter at the Iceberg table level or account level. The LOG_LEVEL determines which events to capture based on the following values:
ERROR: Events that signal a change requiring human intervention to resolve.
WARN: Events that signal an issue that can be resolved without human intervention.
DEBUG: High-volume events.
Note
There is no default severity level. To capture events, you must set the severity level at either the account level or Iceberg table level.
For example, to capture DEBUG-level automated refresh events for a specific Iceberg table, use the following command:
ALTER ICEBERG TABLE <my_table_name> SET LOG_LEVEL = DEBUG;
For more information, see Setting levels for logging, metrics, and tracing.
Query your event table for automated refresh events¶
Before you can query for automated refresh events, you must set up an event table and set the severity level for event capture.
The following example shows how to retrieve Iceberg automated refresh events that are generated during snapshot processing:
SELECT record_type,
record:"name" event_name,
record:"severity_text" log_level,
resource_attributes:"snow.database.name" database_name,
resource_attributes:"snow.schema.name" schema_name,
resource_attributes:"snow.table.name" table_name,
resource_attributes:"snow.catalog.integration.name" catalog_integration_name,
record_attributes:"snow.snapshot.id" snapshot_id,
parse_json(value):metadata_file_location metadata_file_location,
parse_json(value):snapshot_state snapshot_state
FROM my_active_event_table
WHERE RECORD_TYPE='event' AND EVENT_NAME='iceberg_auto_refresh_snapshot_lifecycle';
Output:
+-------------+-----------------------------------------+-----------+---------------+-------------+------------+--------------------------+---------------+------------------------+----------------+
| RECORD_TYPE | EVENT_NAME | LOG_LEVEL | DATABASE_NAME | SCHEMA_NAME | TABLE_NAME | CATALOG_INTEGRATION_NAME | SNAPSHOT_ID | METADATA_FILE_LOCATION | SNAPSHOT_STATE |
+-------------+-----------------------------------------+-----------+---------------+-------------+------------+--------------------------+---------------+------------------------+----------------+
| EVENT | iceberg_auto_refresh_snapshot_lifecycle | DEBUG | TESTDB | TESTSH | TESTTABLE | glue_integration | 4281775564368 | metadata.json | started |
| EVENT | iceberg_auto_refresh_snapshot_lifecycle | DEBUG | TESTDB | TESTSH | TESTTABLE | glue_integration | 4281775564368 | metadata.json | completed |
+-------------+-----------------------------------------+-----------+---------------+-------------+------------+--------------------------+---------------+------------------------+----------------+
Error recovery¶
When an error occurs during the automated refresh process, Snowflake updates the execution state to one of the following values:
STALLED
means that Snowflake is attempting to recover from the error. If recovery succeeds, the automated refresh process continues running as expected and the execution state transitions back to the healthyRUNNING
state.STOPPED
means the automated refresh process encountered an unrecoverable error, and automated refreshes for the table have been stopped.An unrecoverable error might occur, for example, when Snowflake can’t establish a direct lineage between the target snapshot and the current snapshot.
To recover from a
STOPPED
state, take the following actions:Turn off automated refresh on the table.
Perform a manual metadata refresh. For instructions, see Refresh the table metadata.
Re-enable automated refresh using an ALTER ICEBERG TABLE … SET AUTO_REFRESH statement.
Verify that automated refresh is in the
RUNNING
state by calling the SYSTEM$AUTO_REFRESH_STATUS function. You can also call the function multiple times to confirm that the number of queued snapshots (pendingSnapshotCount
) gradually decreases.
Billing¶
Snowflake uses Snowpipe to automate refreshes for Iceberg tables, so charges for automated refresh appear in the same line item on your bill as Snowpipe charges. Using events to monitor automated refresh also incurs cost. For more information, see Costs of telemetry data collection.
There are no Snowpipe file charges for this feature.
You can estimate charges incurred by examining the Account Usage PIPE_USAGE_HISTORY view, which displays
the Iceberg table name in the pipe_name
column.
For Delta-based Iceberg tables, automated refresh pipes display a NULL pipe name.
For more information about Iceberg table charges, see Iceberg table billing.
Considerations and limitations¶
Consider the following when you work with Iceberg tables that use automated refresh:
For catalog integrations created before Snowflake version 8.22 (or 9.2 for Delta-based tables), you must manually set the
REFRESH_INTERVAL_SECONDS
parameter before you enable automated refresh on tables that depend on that catalog integration. For instructions, see ALTER CATALOG INTEGRATION … SET AUTO_REFRESH.For catalog integrations for object storage, automated refresh is only supported for integrations with
TABLE_FORMAT = DELTA
.For tables with frequent updates, using a shorter polling interval (
REFRESH_INTERVAL_SECONDS
) can cause performance degradation.