Best practices for dynamic tables¶
This topic provides best practices and important considerations when creating and managing dynamic tables.
General best practices:
Best practices for creating dynamic tables:
Best practices for dynamic table refresh:
Best practices for optimizing performance:
General best practices¶
Use the MONITOR privilege for viewing metadata¶
For scenarios where the user only needs to view the metadata and Information Schema of a dynamic table (for example, roles held by data scientists), use a role that has the MONITOR privilege on that dynamic table. While the OPERATE privilege grants this access, it also includes the capability to alter dynamic tables, making MONITOR the more suitable option for scenarios where a user does not need to alter a dynamic table.
For more information, see Dynamic table access control.
Simplify compound expressions in grouping keys¶
If a grouping key contains a compound expression rather than a base column, materialize the expression in one dynamic table and then apply the grouping operation on the materialized column in another dynamic table. For more information, see How operators incrementally refresh.
Use dynamic tables to implement slowly changing dimensions¶
Dynamic tables can be used to implement Type 1 and 2 slowly changing dimensions (SCDs). When reading from a change stream, use window functions over per-record keys ordered by a change timestamp. Using this method, dynamic tables seamlessly handle insertions, deletions, and updates that occur out of order to simplify SCD creation. For more information, see Slowly Changing Dimensions with Dynamic Tables.
Best practices for creating dynamic tables¶
Chain together pipelines of dynamic tables¶
When defining a new dynamic table, rather than defining a large dynamic table with many nested statements, use small dynamic tables with pipelines instead.
You can set up a dynamic table to query other dynamic tables. For instance, imagine a scenario where your data pipeline extracts data from a staging table to update various dimension tables (e.g., customer, product, date and time). Additionally, your pipeline updates an aggregate sales table based on the information from these dimension tables. By configuring the dimension tables to query the staging table and the aggregate sales table to query the dimension tables, you create a cascade effect similar to a task graph.
In this setup, the refresh for the aggregate sales table executes only after the refreshes for the dimension tables have successfully completed. This ensures data consistency and meets lag targets. Through an automated refresh process, any changes in the source tables trigger refreshes in all dependent tables at the appropriate times.
Use a “controller” dynamic table for complex task graphs¶
When you have a complex graph of dynamic tables with many roots and leaves and you want to perform operations (e.g. changing lag, manual refresh, suspension) on the full task graph with a single command, do the following:
Set the value for the
TARGET_LAG
of all of your dynamic tables toDOWNSTREAM
.Create a “controller” dynamic table that reads from all of the leaves in your task graph. To ensure this controller doesn’t consume resources, do the following:
CREATE DYNAMIC TABLE controller TARGET_LAG = <target_lag> WAREHOUSE = <warehouse> AS SELECT 1 A FROM <leaf1>, …, <leafN> LIMIT 0;
Use the controller to control the whole graph. For example:
Set a new target lag for the task graph.
ALTER DYNAMIC TABLE controller SET TARGET_LAG = <new_target_lag>Manually refresh the task graph.
ALTER DYNAMIC TABLE controller REFRESH
About cloning pipelines of dynamic tables¶
Clone all elements of the dynamic table pipeline in the same clone command to avoid reinitializations of your pipeline. You can do this by consolidating all elements of the pipeline (e.g. base tables, view, and dynamic tables) in the same schema or database. For more information, see Known limitations for dynamic tables.
Use transient dynamic tables to reduce storage cost¶
Transient dynamic tables maintain data reliably over time and support Time Travel within the data retention period, but don’t retain data beyond the fail-safe period. By default, dynamic table data is retained for 7 days in fail-safe storage. For dynamic tables with high refresh throughput, this can significantly increase storage consumption. Therefore, you should make a dynamic table transient only if its data doesn’t need the same level of data protection and recovery provided by permanent tables.
You can create a transient dynamic table or clone existing dynamic tables to transient dynamic tables using the CREATE DYNAMIC TABLE statement.
Best practices for dynamic table refresh¶
Use dedicated warehouses for refreshes¶
Dynamic tables require a virtual warehouse to perform refreshes. To get a clear understanding of costs related to your dynamic table pipelines, you should test your dynamic tables using dedicated warehouses such that the virtual warehouse consumption attributed to dynamic tables can be isolated. For more information, see Understanding cost for dynamic tables.
Use downstream lag¶
Downstream lag indicates that the dynamic table should refresh when other dependent dynamic tables require refreshing. You should use downstream lag as a best practice because of its ease of use and cost effectiveness. Without downstream lag, managing a chain of complex dynamic tables would require individually assigning each table its own target lag and managing the associated constraints, instead of only monitoring the data freshness of the final table. For more information, see Understanding target lag.
Set the refresh mode for all production dynamic tables¶
A dynamic table’s actual refresh mode is determined at
creation time and is immutable afterward. If not specified explicitly, the refresh mode defaults to AUTO
,
which selects a refresh mode based on various factors such as query complexity, or unsupported constructs,
operators, or functions.
To determine the best mode for your use case, experiment with refresh modes and automatic recommendations. For consistent behavior across Snowflake releases, you should explicitly set the refresh mode on all dynamic tables.
To verify the refresh mode for your dynamic tables, see View dynamic table refresh mode.
Best practices for optimizing performance¶
Performance engineering is like a science: understanding the system, experimenting with ideas, and iterating based on what works. To enhance your dynamic tables’ performance, follow these steps:
Read and understand the performance documentation.
Using your requirements for cost, data lag, and response times, brainstorm ways to improve your pipeline.
Try the following:
Begin with a small, fixed dataset to quickly develop queries.
Adjust the dataset to observe performance with data in motion.
Scale up the dataset to ensure it meets your needs.
Fine-tune your workload based on your findings.
Repeat steps 2 to 4 as needed. Avoid premature optimization; rather, prioritize tasks based on their impact by measuring performance.
Full refresh performance¶
Full refresh dynamic tables perform similarly to CREATE TABLE … AS SELECT (also referred to as CTAS). They can be optimized like any other Snowflake query.
Incremental refresh performance¶
To help achieve optimal incremental refresh performance for your dynamic tables:
Keep changes between refreshes minimal, ideally less than 5% of the total dataset, for both sources and the dynamic table.
Consider the number of micro-partitions modified, not just the row count. The amount of work an incremental refresh must do is proportional to the size of these micro-partitions, not only the rows that changed.
Minimize grouping operations like joins, GROUP BYs, and PARTITION BYs in your query. Break down large Common Table Expressions (CTEs) into smaller parts and create a dynamic table for each. Avoid overwhelming a single dynamic table with excessive aggregations or joins.
Ensure data locality by aligning table changes with query keys (e.g., for joins, GROUP BYs, PARTITION BYs). If your tables aren’t naturally clustered by these keys, consider enabling automatic clustering.