Automatically refresh Apache Iceberg™ tables

Configure automated metadata refreshes for new or existing externally managed Apache Iceberg™ tables. With automated refreshes, Snowflake polls your external Iceberg catalog in a continuous and serverless fashion to synchronize the metadata with the most recent remote changes.

Automatic refresh is supported for tables that use the following external catalog options:

  • AWS Glue

  • REST catalog that complies with the Apache Iceberg REST OpenAPI specification

  • Polaris Catalog

Set the refresh interval on a catalog integration

When you run the CREATE CATALOG INTEGRATION command, you can specify a value for the REFRESH_INTERVAL_SECONDS parameter. Otherwise, the default refresh interval is 30 seconds. Snowflake only polls the external catalog if there are Iceberg tables defined with the catalog integration.

The following example creates a catalog integration for AWS Glue, specifying a refresh interval of 60 seconds:

CREATE CATALOG INTEGRATION auto_refresh_catalog_integration
  CATALOG_SOURCE = GLUE
  CATALOG_NAMESPACE = 'my_catalog_namespace'
  TABLE_FORMAT = ICEBERG
  GLUE_AWS_ROLE_ARN = 'arn:aws:iam::123456789123:role/my-catalog-role'
  GLUE_CATALOG_ID = '123456789123'
  ENABLED = TRUE
  REFRESH_INTERVAL_SECONDS = 60;
Copy

To update the refresh interval for a catalog integration, use the ALTER CATALOG INTEGRATION command.

For example:

ALTER CATALOG INTEGRATION auto_refresh_catalog_integration SET REFRESH_INTERVAL_SECONDS = 120;
Copy

Create an Iceberg table with automated refresh

Create an Iceberg table by using the CREATE ICEBERG TABLE command. To specify that the table should use automated metadata refreshes, set AUTO_REFRESH = TRUE.

The following example creates an Iceberg table that uses AWS Glue as the catalog, specifying the catalog integration created previously (auto_refresh_catalog_integration) and the CATALOG_TABLE_NAME from AWS Glue.

CREATE OR REPLACE ICEBERG TABLE auto_refresh_iceberg_table
  CATALOG_TABLE_NAME = 'myGlueTable'
  CATALOG = 'auto_refresh_catalog_integration'
  AUTO_REFRESH = TRUE;
Copy

Enable or turn off automated refresh

Note

  • If the table uses a catalog integration created before Snowflake version 8.22, you must use the ALTER CATALOG INTEGRATION command to set the REFRESH_INTERVAL_SECONDS parameter before you enable automated refresh on the table.

  • Frequently toggling automated refresh on and off for an Iceberg table can slow metadata refreshes for the table.

Use the ALTER ICEBERG TABLE command to enable or turn off automated refresh for an existing Iceberg table.

For example:

ALTER ICEBERG TABLE my_iceberg_table SET AUTO_REFRESH = FALSE;
Copy

Retrieve the automated refresh status

To retrieve the automated refresh status for a table, call the SYSTEM$AUTO_REFRESH_STATUS function.

SELECT SYSTEM$AUTO_REFRESH_STATUS('my_iceberg_table');
Copy

The function returns details about the pipe that Snowflake uses to automate refreshes for the table, such as the execution state and size of the snapshot queue. An execution state of RUNNING indicates that automated refresh is running as expected. For more information, see SYSTEM$AUTO_REFRESH_STATUS.

Error recovery

When an error occurs during the automated refresh process, Snowflake updates the execution state to one of the following values:

  • STALLED means that Snowflake is attempting to recover from the error. If recovery succeeds, the automated refresh process continues running as expected and the execution state transitions back to the healthy RUNNING state.

  • STOPPED means the automated refresh process encountered an unrecoverable error, and automated refreshes for the table have been stopped.

    An unrecoverable error might occur, for example, when Snowflake can’t establish a direct lineage between the target snapshot and the current snapshot.

    To recover from a STOPPED state, take the following actions:

    1. Turn off automated refresh on the table.

    2. Perform a manual metadata refresh. For instructions, see Refresh the table metadata.

    3. Re-enable automated refresh using an ALTER ICEBERG TABLE … SET AUTO_REFRESH statement.

    4. Verify that automated refresh is in the RUNNING state by calling the SYSTEM$AUTO_REFRESH_STATUS function. You can also call the function multiple times to confirm that the number of queued snapshots (pendingSnapshotCount) gradually decreases.

Billing

Snowflake uses Snowpipe to automate refreshes for Iceberg tables, so charges for automated refresh appear in the same line item on your bill as Snowpipe charges. There are no Snowpipe file charges for this feature.

You can estimate charges incurred by examining the Account Usage PIPE_USAGE_HISTORY view. Automated refresh pipes are listed under a NULL pipe name.

For more information about Iceberg table charges, see Iceberg table billing.

Considerations and limitations

Consider the following when you work with Iceberg tables that use automated refresh:

  • When automated refresh is enabled, you can’t manually refresh the table metadata. To perform a manual refresh, turn off automated refresh first.

  • For catalog integrations created before Snowflake version 8.22, you must manually set the REFRESH_INTERVAL_SECONDS parameter before you enable automated refresh on tables that depend on that catalog integration. For instructions, see ALTER CATALOG INTEGRATION … SET AUTO_REFRESH.

  • Ensure that the new table snapshot is a direct child of the current table snapshot. Otherwise, automated refresh enters the STOPPED state. To recover automated refresh when this occurs, see Error recovery.

  • Automated refresh isn’t supported when you use a catalog integration for object storage.

  • Automated refresh does not currently support Azure Data Lake Storage Gen2.

  • Iceberg version 1 manifests without a sequence number column aren’t currently supported.