Understanding dynamic table refresh¶
Dynamic table content is based on the results of a specific query. When the underlying data on which the dynamic table is based on changes, the table is updated to reflect those changes. These updates are referred to as a refresh. This process is automated, and it involves analyzing the query that underlies the table.
Dynamic table refresh timeouts are determined by the STATEMENT_TIMEOUT_IN_SECONDS parameter, defining the maximum duration on the account or the warehouse before it is automatically canceled.
The following sections explain dynamic table refresh in more detail:
Dynamic table refresh modes¶
Dynamic tables support two refresh modes: incremental and full.
Incremental refresh: This mode analyzes the dynamic table’s query and calculates changes since the last refresh. It then merges these changes into the table. For information about supported queries, see Supported queries in incremental refresh.
Full refresh: This mode involves executing the query for the dynamic table and completely replacing the previous materialized results.
You can either set the refresh mode explicitly or set it to AUTO
. With AUTO
, Snowflake intelligently
chooses the most cost- and time-effective refresh mode based on factors like query complexity, supported
constructs, operators, functions, and expected performance. If incremental refresh isn’t supported or
is likely to perform poorly, Snowflake automatically selects full refresh instead. For more information,
see Limitations on incremental refresh.
We strongly recommend using AUTO
for most use cases, because it allows Snowflake to optimize refresh
behavior without manual tuning.
After you create a dynamic table, you can monitor the table to determine whether incremental or full refreshes are used to update that table.
View dynamic table refresh mode¶
The query in the definition of a dynamic table determines its refresh mode. After creation, you can check if your dynamic table uses incremental or full refreshes.
Using a role with the necessary privileges, you can verify the refresh mode using one of the following methods:
Execute the SHOW DYNAMIC TABLES command:
SHOW DYNAMIC TABLES;
In the output:
The
text
column shows the user-specified refresh mode.The
refresh_mode
column shows the actual refresh mode.The
refresh_mode_reason
shows why the actual refresh mode was chosen.
In the navigation menu, select Monitoring » Dynamic Tables, and then select your dynamic table.
You can view the refresh mode for the dynamic table in the object header at the top of the page. For full refreshes, the refresh mode reason is visible when hovering over the mode.
Supported queries in incremental refresh¶
The following table describes the expressions, keywords, and clauses that currently support incremental refresh. For a list of queries that do not support incremental refresh, see Limitations on support for incremental refresh.
Keyword/Clause |
Support for Incremental Refreshes |
---|---|
WITH |
Common table expressions (CTE) that use incremental refresh supported features in the subquery. |
Expressions in SELECT |
Expressions including those using deterministic built-in functions and immutable user-defined functions. |
FROM |
Source tables, views, Snowflake-managed Apache Iceberg™ tables, and other dynamic tables. Subqueries outside of FROM clauses (for example, WHERE EXISTS) are not supported. |
OVER |
All window functions. |
WHERE/HAVING/QUALIFY |
Filters with the same expressions that are valid in SELECT. |
JOIN (and other expressions for joining tables) |
Supported join types for incremental refresh include inner joins, outer-equi joins, cross joins, and lateral flatten (only the non-static FLATTEN table function). You can specify any number of tables in the join, and updates to all tables in the join are reflected in the results of the query. Selecting the flatten SEQ column from a lateral flatten join is not supported for incremental refresh. |
UNION ALL |
Supported with incremental refresh mode. |
GROUP BY |
Supported with incremental refresh mode. |
Important
If a query uses expressions that are not supported for incremental refresh, the automated refresh process uses a full refresh, which may incur an additional cost. To determine which refresh mode is used, see Determine whether an incremental or full refresh is used.
Replacing an IMMUTABLE user-defined function (UDF) while it’s in use by a dynamic table that uses incremental refresh results in failed refreshes. VOLATILE UDFs are not supported with incremental refresh.
Currently, lateral joins are not supported with incremental refresh. However, you can use
LATERAL with FLATTEN() by setting the refresh mode to INCREMENTAL
.
For example:
CREATE TABLE persons
AS
SELECT column1 AS id, parse_json(column2) AS entity
FROM values
(12712555,
'{ name: { first: "John", last: "Smith"},
contact: [
{ business:[
{ type: "phone", content:"555-1234" },
{ type: "email", content:"j.smith@example.com" } ] } ] }'),
(98127771,
'{ name: { first: "Jane", last: "Doe"},
contact: [
{ business:[
{ type: "phone", content:"555-1236" },
{ type: "email", content:"j.doe@example.com" } ] } ] }') v;
CREATE DYNAMIC TABLE example
TARGET_LAG = DOWNSTREAM
WAREHOUSE = mywh
REFRESH_MODE = INCREMENTAL
AS
SELECT p.id, f.value, f.path
FROM persons p,
LATERAL FLATTEN(input => p.entity) f;
Note
Selecting the flatten SEQ column from a lateral flatten join is not supported for incremental refresh.
How operators incrementally refresh¶
The following table outlines how each operator is incrementalized (that is, how it’s transformed into a new query fragment that generates changes instead of full results) and its performance and other important factors to consider.
Operator |
Incrementalization |
Considerations |
---|---|---|
SELECT <scalar expressions> |
Incrementalized by applying expressions to changed rows. |
Performs well, no special considerations. |
WHERE <scalar expressions> |
Incrementalized by evaluating the predicate on each changed row, and including only those for which the predicate is true. |
Generally performs well. Cost scales linearly with size of changes. Refreshing a dynamic table with a highly selective WHERE expression may require warehouse uptime, even if the resulting dynamic table doesn’t change. This is because a warehouse may be required to determine which changes in the sources satisfy the predicate. |
FROM <base table> |
Incrementalized by scanning micro-partitions that were added to or removed from the table since the last refresh. |
Cost scales linearly with the volume of data in the added or removed micro-partitions. Recommendations:
|
<query> UNION ALL <query> |
Incrementalized by taking union-all of changes on each side. |
Performs well, no special considerations. |
WITH <CTE list> <query> |
Incrementalized by computing the changes of each common table expression. |
WITH makes complex queries easier to read but be cautious of making the definition of a single dynamic table too complex. For more information, see Chain together pipelines of dynamic tables and Optimizing incremental refresh mode performance for complex dynamic tables. |
Scalar Aggregates |
Scalar aggregates are currently not incrementalized efficiently. When their input changes, they’re fully recomputed. |
|
GROUP BY <keys> |
Incrementalized by recomputing aggregates for every grouping key that changed. |
Ensure the source data is clustered by the grouping keys and the changes comprise a small fraction (roughly <5%) of the grouping keys. If the grouping key contains a compound expression rather than a base column, incremental refreshes might have to scan a large amount of data. To reduce the size of these scans, materialize the expression in one dynamic table, and then apply the grouping operation on the materialized column in another dynamic table. For instance, take the following compound statement: CREATE DYNAMIC TABLE sums
AS
SELECT date_trunc(minute, ts), sum(c1) FROM table
GROUP BY 1;
The above statement can be optimized as follows: CREATE DYNAMIC TABLE intermediate
AS
SELECT date_trunc(minute, ts) ts_min, c1 FROM table;
CREATE DYNAMIC TABLE sums
AS
SELECT ts_min, sum(c1) FROM intermediate
GROUP BY 1;
|
DISTINCT |
Equivalent to GROUP BY ALL with no aggregate functions. |
Often represents a substantial optimization opportunity. It’s a common practice to apply DISTINCT liberally throughout queries to avoid accidentally introducing duplicates. In incremental refresh, DISTINCT operations consume resources on a recurring basis because duplicates have to be checked during every refresh. When optimizing performance, finding and removing redundant DISTINCTs can be an easy win. You can do this by eliminating duplicates further upstream and considering join cardinalities carefully. |
<fn> OVER <window> |
Incrementalized by recomputing the window function for every partition key that changed. |
Ensure there’s a PARTITION BY clause in your query and the source data is clustered by partition keys. Also ensure the changes comprise a small fraction (roughly <5%) of the partitions. |
<left> INNER JOIN <right> |
Incrementalized by joining the changes on the left side with the right, then joining the changes on the right side with the left. |
If one of the sides of the join is small, performance is likely good. If one of the sides of the join changes frequently, clustering the other side by the join key might improve performance. |
<left> [{LEFT | RIGHT | FULL }] OUTER JOIN <right> |
Incrementalized by factoring into an inner-join union-all-ed with one or two NOT EXISTS to compute NULLs for non-matches. This factored query is then incrementalized. The inner join is incrementalized as shown. The not-exists are incrementalized by checking if the changed keys on one side already existed on the other side. |
Recommendations:
|
Supported non-deterministic functions in full refresh¶
The following non-deterministic functions are supported in dynamic tables. Note that these functions are only supported for full refreshes. For a list of what is not supported for incremental refresh, see Limitations on support for incremental refresh.
VOLATILE user-defined functions
Sequence functions (e.g.,
SEQ1
,SEQ2
)The following context functions:
The following date and time functions (with their respective aliases):
The following Cortex functions:
How data is refreshed when dynamic tables depend on other dynamic tables¶
When a dynamic table’s lag is set as a time measure, the automated refresh process schedules refreshes to best meet the target lag times.
Note
Target lag is not a guarantee. Instead, it is a target that Snowflake attempts to meet. Data in dynamic tables is refreshed as closely as possible within the target lag. However, target lag may be exceeded due to factors such as warehouse size, data size, query complexity, and similar factors.
In order to keep data consistent in cases when one dynamic table depends on another, the process refreshes all dynamic tables in an account at compatible times. The timing of less frequent refreshes coincides with the timing of more frequent refreshes. If refreshes take too long, the scheduler may skip refreshes to try to stay up to date. However, snapshot isolation is preserved.
For example, suppose that dynamic table A has a target lag of two minutes and queries dynamic table B that has a target lag of one minute. The process might determine that A should be refreshed every 96 seconds, and B every 48 seconds. As a result, the process might apply the following schedule:
Specific Point in Time |
Dynamic Tables Refreshed |
---|---|
2022-12-01 00:00:00 |
A, B |
2022-12-01 00:00:48 |
B |
2022-12-01 00:01:36 |
A, B |
2022-12-01 00:02:24 |
B |
This means that at any given time, when you query a set of dynamic tables that depend on each other, you are querying the same “snapshot” of the data across these tables.
Note that the target lag of a dynamic table cannot be shorter than the target lag of the dynamic tables it depends on. For example, suppose that:
Dynamic table A queries the dynamic tables B and C.
Dynamic table B has a target lag of five minutes.
Dynamic table C has a target lag of one minute.
This means that the target lag time for A must not be shorter than five minutes (that is, not shorter than the longer of the lag times for B and C).
If you set the lag for A to five minutes, the process sets up a refresh schedule with these goals:
Refresh C often enough to keep its lag below one minute.
Refresh A and B together and often enough to keep their lags below five minutes.
Ensure that the refresh for A and B coincides with a refresh of C to ensure snapshot isolation.
Important
Dynamic tables in incremental refresh mode can’t be downstream from dynamic tables with full refresh mode. This is because incremental refresh mode is incompatible with the complete row changes that occur during each refresh of an upstream full refresh table.