Understanding dynamic table initialization and refresh¶

A dynamic table’s content is defined by a query and automatically updates — called a refresh — when the underlying data changes. This process analyzes the query to keep the table current.

The following sections explain dynamic table refresh in more detail:

Section	Description
Understanding dynamic table initialization	Introduces initialization, or in other words, the initial data population when you create a dynamic table. You can specify when the initial refresh occurs.
Understanding manual and scheduled refresh options	An overview of dynamic table refresh. Dynamic tables refresh on a schedule unless manually refreshed.
Dynamic table refresh modes	Dynamic tables support two refresh modes: incremental and full. You can either set the refresh mode to AUTO or set it explicitly. Learn best practices for choosing refresh mode and how to view the refresh mode used.
How data is refreshed when a dynamic table depends on other dynamic tables	Learn how dynamic tables refresh in relation to their dependencies.
Understanding the effects of changes to columns in base tables	Learn about the impact of changes to base tables.
Best practices for dynamic table refresh	A list of best practices for dynamic table refresh.

Understanding dynamic table initialization¶

When you create a dynamic table, its initial refresh takes place either synchronously at creation or at a scheduled time. The initial data population, or initialization, depends on when this initial refresh occurs.

Dynamic tables refresh based on the specified target lag, which sets the maximum allowed delay between updates to the base tables and the dynamic table’s content. If you set INITIALIZE = ON_CREATE (default), the table is initialized immediately. If you set INITIALIZE = ON_SCHEDULE, initialization happens within the specified target lag timeframe.

For example, consider a dynamic table, DT1, with a target lag of 30 minutes. The initial data population for DT1 can occur as follows:

If DT1 is set to refresh synchronously at creation (ON_CREATE), it initializes at creation.
If DT1 is set to refresh at a scheduled time (ON_SCHEDULE), it initializes within 30 minutes.

In scenarios with downstream dependencies, refresh behavior depends on the dependencies. For example, if dynamic table DT1 has a downstream target lag and DT2, which depends on DT1, has a 30-minute target lag, DT1 refreshes only when DT2 refreshes.

For DT1:

If set to refresh synchronously at creation, it initializes immediately. If initialization fails, the creation process stops, providing immediate feedback on any errors.
If set to refresh at a scheduled time, initialization depends on when DT2 refreshes.

Initialization can take some time, depending on how much data is scanned. To track progress, see Troubleshoot dynamic table creation.

Understanding manual and scheduled refresh options¶

Dynamic tables are refreshed on a schedule that’s determined by the target lag. Every time a dynamic table is read, the data freshness is within the time period defined by the target lag.

You can manually refresh your dynamic tables to get the latest data using the ALTER DYNAMIC TABLE … REFRESH command or Snowsight. For more information, see Manually refresh dynamic tables.

Dynamic table refresh timeouts are controlled by the STATEMENT_TIMEOUT_IN_SECONDS parameter, which sets the maximum allowed duration at the account or warehouse level before a refresh is automatically canceled.

How target lag affects scheduled refreshes¶

Target lag controls the frequency of scheduled refreshes. To manually manage refreshes, set your dynamic table’s target lag to DOWNSTREAM and ensure that all downstream dynamic tables are also set to DOWNSTREAM.

Setting the entire Directed Acyclic Graph (DAG)’s target lag to DOWNSTREAM essentially disables scheduled refreshes because the final dynamic table controls the refresh schedule. If no dynamic table has a time-based target lag, the pipeline is suspended for scheduled refreshes. In this case, manually refreshing the most downstream table automatically refreshes any upstream dependencies.

Setting the target lag to DOWNSTREAM doesn’t specify exact times. Instead, Snowflake picks a refresh cadence to attempt to keep the lag under the target value. For example, a dynamic table with a target lag of 4 hours might refresh every 3.5 hours.

As a workaround, you can use a Task with a CRON schedule. For example:

CREATE TASK my_task
  SCHEDULE = 'USING CRON <expr> <time_zone>'
  AS
    ALTER DYNAMIC TABLE my_dynamic_table REFRESH;

Copy

Dynamic table refresh modes¶

Dynamic tables support two refresh modes: incremental and full. You can either set the refresh mode to AUTO or set it explicitly:

AUTO refresh mode: When using the AUTO parameter, Snowflake automatically selects the most cost- and time-effective refresh mode based on query complexity, supported constructs, operators, functions, and expected performance. This decision is made only once at the time of table creation. If incremental refresh is unsupported or inefficient, Snowflake chooses full refresh instead.

For example, if a dynamic table references a view and the view’s definition changes asynchronously, the refresh mode remains unchanged. If the original decision was incremental but becomes unsupported (for example, due to an upstream view change), the refresh will fail with an error like Dynamic table can no longer be refreshed incrementally because an upstream view changed.

To change the refresh mode, recreate the dynamic table using the CREATE OR REPLACE DYNAMIC TABLE command.
Incremental refresh mode: This mode analyzes the dynamic table’s query and calculates changes since the last refresh. It then merges these changes into the table.
Full refresh mode: This mode executes the dynamic table’s query and completely replaces the previously materialized results.

After you create a dynamic table, you can monitor the table to determine whether incremental or full refreshes are used to update that table.

Best practices for choosing dynamic table refresh modes¶

The best mode for your dynamic tables’ performance depends on data change volume and query complexity. Additionally, testing different refresh modes with a dedicated warehouse helps isolate costs and improve performance tuning based on actual workloads.

AUTO refresh mode: The system attempts to apply incremental refresh by default. If it’s unsupported or inefficient, the dynamic table switches to a full refresh. You can’t modify the refresh mode after creation; to change it, recreate the dynamic table using the CREATE OR REPLACE DYNAMIC TABLE command.
- AUTO helps optimize refresh behavior without manual tuning, however, for consistent behavior, explicitly set the refresh mode on all production tables. AUTO behavior might change between Snowflake releases, which can cause unexpected changes in performance if used in production pipelines.
Incremental refresh: Updates the dynamic table with only the changes since the last refresh, making it ideal for large datasets with frequent small updates.
- Best for queries compatible with incremental refresh (for example, deterministic functions, simple joins, and basic expressions in SELECT, WHERE, and GROUP BY). If unsupported features are present, and the refresh mode is set to incremental, Snowflake will fail to create the dynamic table.
- A key practice for optimizing performance with incremental refresh is to limit change volume to about 5% of the source data and to cluster your data by the grouping keys to reduce processing overhead.
- Consider that certain combinations of operations, like aggregations atop many joins, might not run efficiently.
Full refresh: Reprocesses the entire dataset and updates the dynamic table with the complete query result. Use for complex queries or when significant data changes require a complete update.
- Useful when incremental refresh isn’t supported due to complex queries, non-deterministic functions, or major changes in the data.

To determine the best mode for your use case, experiment with automatic recommendations and the concrete refresh modes (full and incremental).

Important

Dynamic tables in incremental refresh mode can’t be downstream from dynamic tables with full refresh mode. This is because incremental refresh mode is incompatible with the complete row changes that occur during each refresh of an upstream full refresh table.

To verify the refresh mode for your dynamic tables, see View dynamic table refresh mode.

View dynamic table refresh mode¶

You set the refresh mode when you create a dynamic table. After creation, you can check if your dynamic table uses incremental or full refreshes using one of the following methods, with a role that has the necessary privileges:

Execute the SHOW DYNAMIC TABLES command:

SHOW DYNAMIC TABLES;

Copy

In the output:

The text column shows the user-specified refresh mode.
The refresh_mode column shows the actual refresh mode.
The refresh_mode_reason shows why the actual refresh mode was chosen.

How data is refreshed when a dynamic table depends on other dynamic tables¶

When a dynamic table’s lag is set as a time measure, the automated refresh process schedules refreshes to best meet the target lag times.

In order to keep data consistent in cases when one dynamic table depends on another, the process refreshes all dynamic tables in an account at compatible times. The timing of less frequent refreshes coincides with the timing of more frequent refreshes. If refreshes take too long, the scheduler may skip refreshes to try to stay up to date. However, snapshot isolation is preserved.

For example, suppose that dynamic table DT1 has a target lag of two minutes and queries dynamic table DT2, which has a target lag of one minute. The process might determine that DT1 should be refreshed every 96 seconds, and DT2 every 48 seconds. As a result, the process might apply the following schedule:

Specific Point in Time	Dynamic Tables Refreshed
2022-12-01 00:00:00	DT1, DT2
2022-12-01 00:00:48	DT2
2022-12-01 00:01:36	DT1, DT2
2022-12-01 00:02:24	DT2

This means that at any given time, when you query a set of dynamic tables that depend on each other, you are querying the same “snapshot” of the data across these tables.

Note that the target lag of a dynamic table cannot be shorter than the target lag of the dynamic tables it depends on. For example, suppose that:

DT1 queries dynamic tables DT2 and DT3.
DT2 has a target lag of five minutes.
DT3 has a target lag of one minute.

This means that the target lag time for DT1 must not be shorter than five minutes (that is, not shorter than the longer of the lag times for DT2 and DT3).

If you set the lag for DT1 to five minutes, the process sets up a refresh schedule with these goals:

Refresh DT3 often enough to keep its lag below one minute.
Refresh DT1 and DT2 together and often enough to keep their lags below five minutes.
Ensure that the refresh for DT1 and DT2 coincides with a refresh of DT3 to ensure snapshot isolation.

Important

Understanding the effects of changes to columns in base tables¶

When the underlying objects associated with a dynamic table change, the following behaviors apply:

Change	Impact
New column added to the base table. Existing unused column removed in the base table.	None. If a new column is added to the base table or an unused column is deleted, no action occurs and refreshes continue as before.
Underlying base table is recreated with identical column names and types. Underlying base table column is recreated with the same name and type.	Reinitialization: The first refresh after recreation is initialization.
Changes to underlying base table for dynamic tables created with `SELECT *` from base table.	The dynamic table fails to refresh and must be recreated to respond to the change.
Changes to underlying base table for dynamic tables created with a column definition.	No impact to the dynamic table.

Best practices for dynamic table refresh¶

Use dedicated warehouses for refreshes¶

Dynamic tables require a virtual warehouse to perform refreshes. To get a clear understanding of costs related to your dynamic table pipelines, you should test your dynamic tables using dedicated warehouses such that the virtual warehouse consumption attributed to dynamic tables can be isolated.

For more information, see Understanding cost for dynamic tables.

Use downstream lag¶

Downstream lag indicates that the dynamic table should refresh when other dependent dynamic tables require refreshing. You should use downstream lag as a best practice because of its ease of use and cost effectiveness. Without downstream lag, managing a chain of complex dynamic tables would require individually assigning each table its own target lag and managing the associated constraints, instead of only monitoring the data freshness of the final table.

For more information, see Understanding dynamic table target lag.