Best practices for dynamic tables¶
This topic provides best practices and important considerations for creating and managing dynamic tables. It covers general best practices, best practices for creating dynamic tables, best practices for dynamic table refresh, and best practices for optimizing performance.
General best practices¶
Use the MONITOR privilege for viewing metadata¶
For scenarios where the user only needs to view the metadata and Information Schema of a dynamic table (for example, roles held by data scientists), use a role that has the MONITOR privilege on that dynamic table. While the OPERATE privilege grants this access, it also includes the capability to alter dynamic tables, making MONITOR the more suitable option for scenarios where a user does not need to alter a dynamic table.
For more information, see Dynamic table access control.
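For example, the following is a minimal sketch of granting this access; the database, table, and role names are illustrative:
GRANT MONITOR ON DYNAMIC TABLE my_db.my_schema.customer_metrics_dt TO ROLE data_scientist_role;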
Simplify compound expressions in grouping keys¶
If a grouping key contains a compound expression rather than a base column, materialize the expression in one dynamic table and then apply the grouping operation on the materialized column in another dynamic table. For more information, see How operators incrementally refresh.
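As an illustrative sketch (the table, column, and warehouse names are assumptions), the compound expression is materialized in one dynamic table and then grouped on in a second dynamic table:
-- Materialize the compound expression as a plain column.
CREATE DYNAMIC TABLE events_with_day
  TARGET_LAG = DOWNSTREAM
  WAREHOUSE = my_wh
  AS SELECT DATE_TRUNC('day', event_ts) AS event_day, amount FROM raw_events;
-- Group on the materialized column in a downstream dynamic table.
CREATE DYNAMIC TABLE daily_totals
  TARGET_LAG = '10 minutes'
  WAREHOUSE = my_wh
  AS SELECT event_day, SUM(amount) AS total_amount FROM events_with_day GROUP BY event_day;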
Use dynamic tables to implement slowly changing dimensions¶
Dynamic tables can be used to implement Type 1 and 2 slowly changing dimensions (SCDs). When reading from a change stream, use window functions over per-record keys ordered by a change timestamp. Using this method, dynamic tables seamlessly handle insertions, deletions, and updates that occur out of order to simplify SCD creation. For more information, see Slowly Changing Dimensions with Dynamic Tables.
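For example, the following is a minimal Type 1 SCD sketch that keeps only the latest change per key; the change-stream table, key column, and timestamp column are assumptions:
CREATE DYNAMIC TABLE customer_scd_type1
  TARGET_LAG = '5 minutes'
  WAREHOUSE = my_wh
  AS SELECT *
     FROM customer_changes
     -- Keep the most recent change per customer, even if changes arrive out of order.
     QUALIFY ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY change_ts DESC) = 1;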
Best practices for creating dynamic tables¶
Chain together pipelines of dynamic tables¶
When defining a new dynamic table, rather than writing one large dynamic table with many nested statements, break the logic into a pipeline of smaller dynamic tables.
You can set up a dynamic table to query other dynamic tables. For instance, imagine a scenario where your data pipeline extracts data from a staging table to update various dimension tables (e.g., customer, product, date and time). Additionally, your pipeline updates an aggregate sales table based on the information from these dimension tables. By configuring the dimension tables to query the staging table and the aggregate sales table to query the dimension tables, you create a cascade effect similar to a task graph.
In this setup, the refresh for the aggregate sales table executes only after the refreshes for the dimension tables have successfully completed. This ensures data consistency and meets lag targets. Through an automated refresh process, any changes in the source tables trigger refreshes in all dependent tables at the appropriate times.
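The following sketch shows the shape of such a pipeline; all table, column, and warehouse names are illustrative:
-- Dimension table reads from the staging table and refreshes only when needed downstream.
CREATE DYNAMIC TABLE dim_customer
  TARGET_LAG = DOWNSTREAM
  WAREHOUSE = my_wh
  AS SELECT DISTINCT customer_id, customer_name FROM staging_sales;
-- Aggregate table reads from the staging and dimension tables and drives the refresh cadence.
CREATE DYNAMIC TABLE agg_sales
  TARGET_LAG = '1 hour'
  WAREHOUSE = my_wh
  AS SELECT d.customer_id, SUM(s.amount) AS total_amount
     FROM staging_sales s
     JOIN dim_customer d ON s.customer_id = d.customer_id
     GROUP BY d.customer_id;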
Use a “controller” dynamic table for complex task graphs¶
When you have a complex graph of dynamic tables with many roots and leaves and you want to perform operations (e.g. changing lag, manual refresh, suspension) on the full task graph with a single command, do the following:
Set the TARGET_LAG of all of your dynamic tables to DOWNSTREAM.
Create a “controller” dynamic table that reads from all of the leaves in your task graph. To ensure this controller doesn’t consume resources, define it as follows:
CREATE DYNAMIC TABLE controller TARGET_LAG = <target_lag> WAREHOUSE = <warehouse> AS SELECT 1 A FROM <leaf1>, …, <leafN> LIMIT 0;
Use the controller to control the whole graph. For example:
Set a new target lag for the task graph.
ALTER DYNAMIC TABLE controller SET TARGET_LAG = <new_target_lag>
Manually refresh the task graph.
ALTER DYNAMIC TABLE controller REFRESH
About cloning pipelines of dynamic tables¶
Clone all elements of the dynamic table pipeline in the same clone command to avoid reinitializations of your pipeline. You can do this by consolidating all elements of the pipeline (e.g. base tables, view, and dynamic tables) in the same schema or database. For more information, see Known limitations for dynamic tables.
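For example, if the entire pipeline lives in a single schema, one clone command copies the base tables, views, and dynamic tables together; the schema names are illustrative:
CREATE SCHEMA my_pipeline_dev CLONE my_pipeline;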
Use transient dynamic tables to reduce storage cost¶
Transient dynamic tables maintain data reliably over time and support Time Travel within the data retention period, but don’t retain data beyond the fail-safe period. By default, dynamic table data is retained for 7 days in fail-safe storage. For dynamic tables with high refresh throughput, this can significantly increase storage consumption. Therefore, you should make a dynamic table transient only if its data doesn’t need the same level of data protection and recovery provided by permanent tables.
You can create a transient dynamic table or clone existing dynamic tables to transient dynamic tables using the CREATE DYNAMIC TABLE statement.
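As a minimal sketch (the table, warehouse, and source names are assumptions), a transient dynamic table is created by adding the TRANSIENT keyword:
CREATE TRANSIENT DYNAMIC TABLE transient_sales_dt
  TARGET_LAG = '1 hour'
  WAREHOUSE = my_wh
  AS SELECT * FROM staging_sales;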
Best practices for dynamic table refresh¶
Use dedicated warehouses for refreshes¶
Dynamic tables require a virtual warehouse to perform refreshes. To get a clear understanding of the costs of your dynamic table pipelines, test your dynamic tables using dedicated warehouses so that the virtual warehouse consumption attributed to dynamic tables can be isolated. For more information, see Understanding cost for dynamic tables.
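For example, the following sketch creates a dedicated warehouse and assigns it to a dynamic table; the warehouse name, size, and table names are assumptions:
CREATE WAREHOUSE IF NOT EXISTS dt_refresh_wh
  WAREHOUSE_SIZE = XSMALL
  AUTO_SUSPEND = 60
  AUTO_RESUME = TRUE;
CREATE DYNAMIC TABLE sales_summary
  TARGET_LAG = '15 minutes'
  WAREHOUSE = dt_refresh_wh
  AS SELECT region, SUM(amount) AS total_amount FROM staging_sales GROUP BY region;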
Use downstream lag¶
Downstream lag indicates that the dynamic table should refresh when other dependent dynamic tables require refreshing. You should use downstream lag as a best practice because of its ease of use and cost effectiveness. Without downstream lag, managing a chain of complex dynamic tables would require individually assigning each table its own target lag and managing the associated constraints, instead of only monitoring the data freshness of the final table. For more information, see Understanding target lag.
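For example, in the following sketch (table and warehouse names are illustrative), only the final table has an explicit target lag; the intermediate table refreshes only when its consumer needs fresh data:
CREATE DYNAMIC TABLE cleaned_sales
  TARGET_LAG = DOWNSTREAM
  WAREHOUSE = my_wh
  AS SELECT customer_id, amount FROM staging_sales WHERE amount IS NOT NULL;
CREATE DYNAMIC TABLE sales_report
  TARGET_LAG = '30 minutes'
  WAREHOUSE = my_wh
  AS SELECT customer_id, SUM(amount) AS total_amount FROM cleaned_sales GROUP BY customer_id;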
Set the refresh mode for all production dynamic tables¶
A dynamic table’s actual refresh mode is determined at creation time and is immutable afterward. If not specified explicitly, the refresh mode defaults to AUTO, which selects a refresh mode based on factors such as query complexity and the presence of unsupported constructs, operators, or functions.
To determine the best mode for your use case, experiment with refresh modes and automatic recommendations.
For consistent behavior across Snowflake releases, explicitly set the refresh mode on all production tables. The behavior of AUTO might change between Snowflake releases, which can cause unexpected changes in performance if used in production pipelines.
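For example, the refresh mode can be set explicitly at creation time; the table, warehouse, and source names below are assumptions:
CREATE OR REPLACE DYNAMIC TABLE sales_incremental
  TARGET_LAG = '15 minutes'
  WAREHOUSE = my_wh
  REFRESH_MODE = INCREMENTAL
  AS SELECT customer_id, amount FROM staging_sales;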
To verify the refresh mode for your dynamic tables, see View dynamic table refresh mode.
Best practices for optimizing performance¶
To optimize your dynamic tables’ performance, you should understand the system, experiment with ideas, and iterate based on results. For example:
Develop ways to improve your data pipeline based on your cost, data lag, and response time needs.
Implement the following actions:
Start with a small, fixed dataset to quickly develop queries.
Test performance with data in motion.
Scale the dataset to verify that it meets your needs.
Adjust your workload based on findings.
Repeat as needed, prioritizing tasks with the greatest performance impact.
Additionally, use downstream lag to manage refresh dependencies between tables efficiently, ensuring that refreshes happen only when necessary. For more information, see the performance documentation.
Choosing between refresh modes¶
To determine the best mode for your use case, experiment with automatic recommendations and the concrete refresh modes (full and incremental). The best mode for your dynamic tables’ performance depends on data change volume and query complexity. Additionally, testing different refresh modes with a dedicated warehouse helps isolate costs and improve performance tuning based on actual workloads.
To verify the refresh mode for your dynamic tables, see View dynamic table refresh mode.
AUTO refresh mode: The system attempts to apply incremental refresh by default. When incremental refresh isn’t supported or might not perform well, the dynamic table automatically selects full refresh instead.
For consistent behavior, explicitly set the refresh mode on all production tables. The behavior of AUTO might change between Snowflake releases, which can cause unexpected changes in performance if used in production pipelines.
Incremental refresh: Updates the dynamic table with only the changes since the last refresh, making it ideal for large datasets with frequent small updates.
Best for queries compatible with incremental refresh (for example, deterministic functions, simple joins, and basic expressions in SELECT, WHERE, and GROUP BY). If unsupported features are present and the refresh mode is set to incremental, Snowflake fails to create the dynamic table.
A key practice for optimizing performance with incremental refresh is to limit change volume to about 5% of the source data and to cluster your data by the grouping keys to reduce processing overhead.
Consider that certain combinations of operations, like aggregations atop many joins, might not run efficiently.
Full refresh: Reprocesses the entire dataset and updates the dynamic table with the complete query result. Use for complex queries or when significant data changes require a complete update.
Useful when incremental refresh isn’t supported due to complex queries, non-deterministic functions, or major changes in the data.
For more information, see How refresh mode affects dynamic table performance.
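To check which mode AUTO selected for an existing table, one option is the SHOW DYNAMIC TABLES command, whose output reports the refresh mode of each table; the name pattern below is illustrative:
SHOW DYNAMIC TABLES LIKE 'sales_%';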
Full refresh performance¶
Full refresh dynamic tables perform similarly to CREATE TABLE … AS SELECT (also referred to as CTAS). They can be optimized like any other Snowflake query.
Incremental refresh performance¶
To help achieve optimal incremental refresh performance for your dynamic tables:
Keep changes between refreshes minimal, ideally less than 5% of the total dataset, for both sources and the dynamic table.
Consider the number of micro-partitions modified, not just the row count. The amount of work an incremental refresh must do is proportional to the total size of the modified micro-partitions, not only to the number of rows that changed.
Minimize grouping operations like joins, GROUP BYs, and PARTITION BYs in your query. Break down large Common Table Expressions (CTEs) into smaller parts and create a dynamic table for each. Avoid overwhelming a single dynamic table with excessive aggregations or joins.
Ensure data locality by aligning table changes with query keys (e.g., for joins, GROUP BYs, PARTITION BYs). If your tables aren’t naturally clustered by these keys, consider enabling automatic clustering.
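For example, if a dynamic table joins and groups by customer_id but the source table isn’t clustered by that key, enabling automatic clustering on the source can improve locality; the table and column names are assumptions:
ALTER TABLE staging_sales CLUSTER BY (customer_id);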