Understanding dynamic table refresh

Dynamic table content is based on the results of a specific query. When the underlying data on which the dynamic table is based on changes, the table is updated to reflect those changes. These updates are referred to as a refresh. This process is automated, and it involves analyzing the query that underlies the table.

Dynamic table refresh timeouts are determined by the STATEMENT_TIMEOUT_IN_SECONDS parameter, defining the maximum duration on the account or the warehouse before it is automatically canceled.

The following sections explain dynamic table refresh in more detail:

Dynamic table refresh modes

The dynamic table refresh process operates in one of two ways:

  1. Incremental refresh: This automated process analyzes the dynamic table’s query and calculates changes since the last refresh. It then merges these changes into the table. See Supported queries in incremental refresh for details on supported queries.

  2. Full refresh: When the automated process can’t perform an incremental refresh, it conducts a full refresh. This involves executing the query for the dynamic table and completely replacing the previous materialized results.

The constructs used in the query determine whether an incremental refresh can be used. After you create a dynamic table, you can monitor the table to determine whether incremental or full refreshes are used to update that table.

Understanding target lag

Dynamic table refresh is triggered based on how out of date the data might be, or what is commonly referred to as target lag. When you set the target lag for a dynamic table, it’s measured relative to the base tables at the root of the graph, not the dynamic tables directly upstream. So, consider the time needed to refresh each dynamic table in a chain to the root. If you don’t, some refreshes might be skipped, leading to a higher actual lag.

To see the graph of tables connected to your dynamic table, see Use Snowsight to examine the graph of dynamic tables.

Target lag is specified in one of following ways:

  1. Measure of freshness: Defines the maximum amount of time that the dynamic table’s content should lag behind updates to the base tables.

    The following example sets the product dynamic table to refresh and maintain freshness every hour:

    ALTER DYNAMIC TABLE product SET TARGET_LAG = '1 hour';
    
    Copy
  2. Downstream: Specifies that the dynamic table should be refreshed on demand when other dynamic tables that depend on it need to refresh. Updates are inferred from upstream database objects. Downstream dynamic tables are only updated when required by upstream consumers.

    In the following example, product is based on other dynamic tables and is set to refresh based on the target lag of its downstream dynamic tables:

    ALTER DYNAMIC TABLE product SET TARGET_LAG = DOWNSTREAM;
    
    Copy

Target lag is inversely proportional to the dynamic table’s refresh frequency: frequent refreshes imply a lower lag.

Consider the following example where Dynamic Table 2 (DT2) is defined based on Dynamic Table 1 (DT1). DT2 must read from DT1 to materialize its contents. In addition, a report consumes DT2 data via a query.

Simple example of two dynamic tables, DT2 which is defined based on DT1.

The following results are possible, depending on how each dynamic table specifies its lag:

Dynamic Table 1 (DT1)

Dynamic Table 2 (DT2)

Refresh results

TARGET_LAG = DOWNSTREAM

TARGET_LAG = 10minutes

DT2 is updated at most every 10 minutes. DT1 infers its lag from DT2 and is updated every time DT2 requires updates.

TARGET_LAG = 10minutes

TARGET_LAG = DOWNSTREAM

This scenario should be avoided. The report query will not receive any data. DT 1 is frequently refreshed and DT2 is not refreshed as there’s no dynamic table that’s based on DT2.

TARGET_LAG = 5minutes

TARGET_LAG = 10minutes

DT2 is updated approximately every 10 minutes with data from DT1 that is at most 5 minutes old.

TARGET_LAG = DOWNSTREAM

TARGET_LAG = DOWNSTREAM

DT2 is not refreshed periodically because DT1 has no downstream children with a defined lag.

Supported queries in incremental refresh

The following table describes the expressions, keywords, and clauses that currently support incremental refresh. For a list of queries that do not support incremental refresh, see Limitations on support for incremental refresh.

Keyword/Clause

Support for Incremental Refreshes

WITH

Common table expressions (CTE) that use incremental refresh supported features in the subquery.

Expressions in SELECT

Expressions including those using deterministic built-in functions and immutable user-defined functions.

FROM

Source tables, views, and other dynamic tables. Subqueries outside of FROM clauses (for example, WHERE EXISTS) are not supported.

OVER

All window functions.

WHERE/HAVING/QUALIFY

Filters with the same expressions that are valid in SELECT.

JOIN (and other expressions for joining tables)

Supported join types for incremental refresh include inner joins, outer-equi joins, and cross joins. You can specify any number of tables in the join, and updates to all tables in the join are reflected in the results of the query.

UNION ALL

Dynamic tables support UNION ALL.

GROUP BY

Dynamic tables support GROUP BY.

Important

If a query uses expressions that are not supported for incremental refresh, the automated refresh process uses a full refresh, which may incur an additional cost. To determine which refresh mode is used, see Determine whether an incremental or full refresh is used.

Replacing an IMMUTABLE user-defined function (UDF) while it’s in use by a dynamic table that uses incremental refresh results in undefined behavior in that table. VOLATILE UDFs are not supported with incremental refresh.

How operators incrementally refresh

The following table outlines how each operator is incrementalized (that is, how it’s transformed into a new query fragment that generates changes instead of full results) and its performance and other important factors to consider.

Operator

Incrementalization

Considerations

SELECT <scalar expressions>

Incrementalized by applying expressions to changed rows.

Performs well, no special considerations.

WHERE <scalar expressions>

Incrementalized by evaluating the predicate on each changed row, and including only those for which the predicate is true.

Generally performs well. Cost scales linearly with size of changes.

Highly selective WHERE expressions may require warehouse uptime whenever the sources change, even if the resulting dynamic tables don’t change. A warehouse may be required to determine which changes satisfy the predicate.

FROM <base table>

Incrementalized by scanning micro-partitions that were added to or removed from the table since the last refresh.

Cost scales linearly with the volume of data in the added or removed micro-partitions.

Recommendations:

  • Limit change volume per refresh to about 5% of source table.

  • Be cautious of DMLs that affect many micro-partitions.

<query> UNION ALL <query>

Incrementalized by taking union-all of changes on each side.

Performs well, no special considerations.

WITH <CTE list> <query>

Incrementalized by computing the changes of each common table expression.

WITH makes complex queries easier to read. Be cautious of making the definition of a single dynamic table too complex. For more information, see Chain together pipelines of dynamic tables and Optimizing incremental refresh mode performance for complex dynamic tables.

Scalar Aggregates

Scalar aggregates are currently not incrementalized efficiently. When their input changes, they’re fully recomputed.

If a dynamic table contains a scalar aggregate, only set it to incremental refresh mode if that aggregate reads from a table that changes infrequently, and the rest of the query incrementalizes well.

For example, assuming that important_attrs rarely changes, the following query is a good candidate for incremental refresh:

SELECT * FROM product WHERE attr IN
  (SELECT attr FROM important_attrs)
Copy

GROUP BY <keys>

Incrementalized by recomputing aggregates for every grouping key that changed.

Ensure the source data is clustered by the grouping keys and the changes comprise a small fraction (roughly <5%) of the grouping keys.

DISTINCT

Equivalent to GROUP BY ALL with no aggregate functions.

Often represents a substantial optimization opportunity.

It’s a common practice to apply DISTINCT liberally throughout queries to avoid accidentally introducing duplicates. In incremental refresh, DISTINCT operations consume resources on a recurring basis because duplicates have to be checked during every refresh.

When optimizing performance, finding and removing redundant DISTINCTs can be an easy win. You can do this by eliminating duplicates further upstream and considering join cardinalities carefully.

<fn> OVER <window>

Incrementalized by recomputing the window function for every partition key that changed.

Ensure there’s a PARTITION BY clause in your query and the source data is clustered by partition keys. Also ensure the changes comprise a small fraction (roughly <5%) of the partitions.

<left> INNER JOIN <right>

Incrementalized by joining the changes on the left side with the right, then joining the changes on the right side with the left.

If one of the sides of the join is small, performance is likely unimpacted. If one of the sides of the join changes frequently, clustering the other side by the join key might improve performance.

<left> [{LEFT | RIGHT | FULL }] OUTER JOIN <right>

Incrementalized by factoring into an inner-join union-all-ed with one or two NOT EXISTS to compute NULLs for non-matches. This factored query is then incrementalized.

The inner join is incrementalized as shown. The not-exists are incrementalized by checking if the changed keys on one side already existed on the other side.

Recommendations:

  • If one of the sides of the join changes frequently, clustering the other side by the join key might improve performance.

  • Put the table that changes more frequently on the left side.

  • Try to minimize changes on the side opposite the OUTER. So for LEFT OUTER, minimize changes on the right side.

  • For FULL joins, locality is very important.

Supported non-deterministic functions in full refresh

The following non-deterministic functions are supported in dynamic tables. Note that these functions are only supported for full refreshes. For a list of what is not supported for incremental refresh, see Limitations on support for incremental refresh.

How data is refreshed when dynamic tables depend on other dynamic tables

When a dynamic table lag is specified as a measure of time, the automated refresh process determines the schedule for refreshes based on the target lag times of the dynamic tables. The process chooses a schedule that best meets the target lag times of the tables.

Note

Target lag is not a guarantee. Instead, it is a target that Snowflake attempts to meet. Data in dynamic tables is refreshed as closely as possible within the target lag. However, target lag may be exceeded due to factors such as warehouse size, data size, query complexity, and similar factors.

In order to keep data consistent in cases when one dynamic table depends on another, the process refreshes all dynamic tables in an account at compatible times. The timing of less frequent refreshes coincides with the timing of more frequent refreshes.

For example, suppose that dynamic table A has a target lag of two minutes and queries dynamic table B that has a target lag of one minute. The process might determine that A should be refreshed every 96 seconds, and B every 48 seconds. As a result, the process might apply the following schedule:

Specific Point in Time

Dynamic Tables Refreshed

2022-12-01 00:00:00

A, B

2022-12-01 00:00:48

B

2022-12-01 00:01:36

A, B

2022-12-01 00:02:24

B

This means that at any given time, when you query a set of dynamic tables that depend on each other, you are querying the same “snapshot” of the data across these tables.

Note that the target lag of a dynamic table cannot be shorter than the target lag of the dynamic tables it depends on. For example, suppose that:

  • Dynamic table A queries the dynamic tables B and C.

  • Dynamic table B has a target lag of five minutes.

  • Dynamic table C has a target lag of one minute.

This means that the target lag time for A must not be shorter than five minutes (that is, not shorter than the longer of the lag times for B and C).

If you set the lag for A to five minutes, the process sets up a refresh schedule with these goals:

  • Refresh C often enough to keep its lag below one minute.

  • Refresh A and B together and often enough to keep their lags below five minutes.

  • Ensure that the refresh for A and B coincides with a refresh of C to ensure snapshot isolation.

Note: If refreshes take too long, the scheduler may skip refreshes to try to stay up to date. However, snapshot isolation is preserved.