Best practices for dynamic tables

This topic provides best practices and important considerations when creating and managing dynamic tables.

General best practices

Use the MONITOR privilege for viewing metadata

For scenarios where the user only needs to view the metadata and Information Schema of a dynamic table (for example, roles held by data scientists), use a role that has the MONITOR privilege on that dynamic table. While the OPERATE privilege grants this access, it also includes the capability to alter dynamic tables, making MONITOR the more suitable option for scenarios where a user does not need to alter a dynamic table.
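
For example, a grant along the following lines gives a role view-only access to a dynamic table (sales_dt and data_scientist are placeholder names):

  -- Placeholder names: sales_dt and data_scientist
  GRANT MONITOR ON DYNAMIC TABLE sales_dt TO ROLE data_scientist;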

For more information, see Dynamic table access control.

Use dynamic tables to implement slowly changing dimensions

Dynamic tables can be used to implement Type 1 and 2 slowly changing dimensions (SCDs). When reading from a change stream, use window functions over per-record keys ordered by a change timestamp. Using this method, dynamic tables seamlessly handle insertions, deletions, and updates that occur out of order to simplify SCD creation. For more information, see Slowly Changing Dimensions with Dynamic Tables.
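
As a minimal sketch, assuming a change-stream table raw_customer_changes with a customer_id key, attribute columns, and a change_ts timestamp (all names and the warehouse my_wh are illustrative), a Type 2 dimension could be expressed along these lines:

  -- Illustrative sketch: each change becomes a versioned row;
  -- the current version of a record has a NULL valid_to.
  CREATE OR REPLACE DYNAMIC TABLE customer_scd2
    TARGET_LAG = '1 hour'
    WAREHOUSE = my_wh
    AS
      SELECT
        customer_id,
        name,
        address,
        change_ts AS valid_from,
        LEAD(change_ts) OVER (
          PARTITION BY customer_id ORDER BY change_ts
        ) AS valid_to
      FROM raw_customer_changes;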

Best practices for creating dynamic tables

Chain together pipelines of dynamic tables

When defining a new dynamic table, rather than defining one large dynamic table with many nested statements, chain together smaller dynamic tables in a pipeline.

You can set up a dynamic table to query other dynamic tables. For instance, imagine a scenario where your data pipeline extracts data from a staging table to update various dimension tables (e.g., customer, product, date and time). Additionally, your pipeline updates an aggregate sales table based on the information from these dimension tables. By configuring the dimension tables to query the staging table and the aggregate sales table to query the dimension tables, you create a cascade effect similar to a task graph.

In this setup, the refresh for the aggregate sales table executes only after the refreshes for the dimension tables have successfully completed. This ensures data consistency and meets lag targets. Through an automated refresh process, any changes in the source tables trigger refreshes in all dependent tables at the appropriate times.
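
As an illustrative sketch of such a pipeline (table, column, and warehouse names are hypothetical), a dimension table can read from the staging table and the aggregate table can read from the dimension table:

  -- The dimension table reads from the staging table and refreshes only
  -- when downstream consumers need fresh data.
  CREATE OR REPLACE DYNAMIC TABLE dim_customer
    TARGET_LAG = DOWNSTREAM
    WAREHOUSE = my_wh
    AS
      SELECT DISTINCT customer_id, customer_name
      FROM staging_sales;

  -- The aggregate table reads from the staging and dimension tables and
  -- drives the refresh cadence for the pipeline.
  CREATE OR REPLACE DYNAMIC TABLE agg_sales
    TARGET_LAG = '1 hour'
    WAREHOUSE = my_wh
    AS
      SELECT d.customer_name, SUM(s.amount) AS total_amount
      FROM staging_sales s
      JOIN dim_customer d ON s.customer_id = d.customer_id
      GROUP BY d.customer_name;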

[Figure: Comparison between task graphs and dynamic table DAGs]

Use a “controller” dynamic table for complex task graphs

When you have a complex graph of dynamic tables with many roots and leaves and you want to perform operations (e.g. changing lag, manual refresh, suspension) on the full task graph with a single command, do the following:

  1. Set the value for the TARGET_LAG of all of your dynamic tables to DOWNSTREAM.

  2. Create a “controller” dynamic table that reads from all of the leaves in your task graph. To ensure this controller doesn’t consume resources, define it so that it returns no rows, as in the following example:

    CREATE DYNAMIC TABLE controller
      TARGET_LAG = <target_lag>
      WAREHOUSE = <warehouse>
      AS
        SELECT 1 A FROM <leaf1>, …, <leafN> LIMIT 0;
    
  3. Use the controller to control the whole graph. For example:

  • Set a new target lag for the task graph.

    ALTER DYNAMIC TABLE controller SET
      TARGET_LAG = <new_target_lag>
    
  • Manually refresh the task graph.

    ALTER DYNAMIC TABLE controller REFRESH
    

About cloning pipelines of dynamic tables

Clone all elements of the dynamic table pipeline in the same clone command to avoid reinitialization of your pipeline. You can do this by consolidating all elements of the pipeline (e.g., base tables, views, and dynamic tables) in the same schema or database. For more information, see Known limitations for dynamic tables.
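
For example, if the whole pipeline lives in one schema, cloning that schema clones the pipeline as a unit (sales_pipeline and sales_pipeline_dev are placeholder names):

  -- Clone the schema that contains the pipeline's base tables, views,
  -- and dynamic tables in a single command.
  CREATE SCHEMA sales_pipeline_dev CLONE sales_pipeline;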

Use transient dynamic tables to reduce storage cost

Transient dynamic tables maintain data reliably over time and support Time Travel within the data retention period, but don’t have a Fail-safe period. By default, data in a permanent dynamic table is retained for 7 days in Fail-safe storage. For dynamic tables with high refresh throughput, this can significantly increase storage consumption. Therefore, Snowflake recommends making a dynamic table transient only if its data doesn’t need the same level of data protection and recovery provided by permanent tables.

You can create a transient dynamic table or clone existing dynamic tables to transient dynamic tables using the CREATE DYNAMIC TABLE statement.
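
A minimal sketch, assuming a hypothetical source table my_source and warehouse my_wh:

  -- Create the dynamic table as transient to avoid Fail-safe storage costs.
  CREATE TRANSIENT DYNAMIC TABLE my_transient_dt
    TARGET_LAG = '1 hour'
    WAREHOUSE = my_wh
    AS
      SELECT *
      FROM my_source;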

Best practices for dynamic table refresh

Use dedicated warehouses for refreshes

Dynamic tables require a virtual warehouse to perform refreshes. To get a clear understanding of the costs related to your dynamic table pipelines, Snowflake recommends testing dynamic tables with dedicated warehouses so that the warehouse consumption attributed to dynamic tables can be isolated. For more information, see Understanding cost for dynamic tables.
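
For example, you might create a warehouse used only for dynamic table refreshes and reference it when creating your dynamic tables (dt_refresh_wh, my_dt, and my_source are placeholder names):

  -- A warehouse dedicated to dynamic table refreshes, so its consumption
  -- can be attributed to the pipeline.
  CREATE WAREHOUSE dt_refresh_wh WAREHOUSE_SIZE = 'XSMALL';

  CREATE OR REPLACE DYNAMIC TABLE my_dt
    TARGET_LAG = '1 hour'
    WAREHOUSE = dt_refresh_wh
    AS
      SELECT *
      FROM my_source;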

Use downstream lag

Downstream lag indicates that a dynamic table should refresh when dynamic tables that depend on it require refreshing. Snowflake recommends using downstream lag as a best practice due to its ease of use and cost-effectiveness. Without downstream lag, managing a chain of complex dynamic tables would require individually assigning each table its own target lag and managing the associated constraints, instead of only monitoring the data freshness of the final table. For more information, see Understanding target lag.
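
For example, to switch an existing intermediate dynamic table to downstream lag (intermediate_dt is a placeholder name):

  -- Refresh this table only when dynamic tables that depend on it need data.
  ALTER DYNAMIC TABLE intermediate_dt SET TARGET_LAG = DOWNSTREAM;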

Set the refresh mode for all production dynamic tables

A dynamic table’s actual refresh mode is determined at creation time and is immutable afterward. If not specified explicitly, the refresh mode defaults to AUTO, which selects a refresh mode based on factors such as query complexity and the presence of unsupported constructs, operators, or functions.

To determine the best mode for your use case, experiment with refresh modes and automatic recommendations. For consistent behavior across Snowflake releases, Snowflake recommends you explicitly set the refresh mode on all dynamic tables.
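
For example, to pin the refresh mode at creation time (my_incremental_dt, my_source, and my_wh are placeholder names):

  -- Explicitly request incremental refresh instead of relying on AUTO.
  CREATE OR REPLACE DYNAMIC TABLE my_incremental_dt
    TARGET_LAG = '1 hour'
    WAREHOUSE = my_wh
    REFRESH_MODE = INCREMENTAL
    AS
      SELECT *
      FROM my_source;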

Note

When the 2024_04 behavior change bundle is enabled, Snowflake chooses the refresh mode that’s likely to perform best depending on your query definition. To learn more, see the 2024_04 release announcement.

For information about the refresh mode and the reason the refresh mode was chosen, see the output from the SHOW DYNAMIC TABLES command.
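
For example, to check the mode chosen for a specific dynamic table (my_incremental_dt is a placeholder name):

  -- The refresh_mode and refresh_mode_reason columns show the chosen mode and why.
  SHOW DYNAMIC TABLES LIKE 'my_incremental_dt';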

Best practices for optimizing performance

Performance engineering is an empirical process: understand the system, experiment with ideas, and iterate based on what works. To enhance your dynamic tables’ performance, follow these steps:

  1. Read and understand the performance documentation.

  2. Using your requirements for cost, data lag, and response times, brainstorm ways to improve your pipeline.

  3. Try the following:

    1. Begin with a small, fixed dataset to quickly develop queries.

    2. Adjust the dataset to observe performance with data in motion.

    3. Scale up the dataset to ensure it meets your needs.

  4. Fine-tune your workload based on your findings.

  5. Repeat steps 2 to 4 as needed. Avoid premature optimization; rather, prioritize tasks based on their impact by measuring performance.

Full refresh performance

Full refresh dynamic tables perform similarly to CREATE TABLE … AS SELECT (also referred to as CTAS). If you’re unsure about refresh performance, opting for a full refresh is usually the more straightforward option.

Incremental refresh performance

To achieve optimal incremental refresh performance for your dynamic tables:

  • Keep changes between refreshes minimal, ideally less than 5% of the total dataset, for both sources and the dynamic table.

  • Consider the number of micro-partitions altered, not just the row count.

  • Minimize grouping operations like joins, GROUP BYs, and PARTITION BYs in your query. Break down large Common Table Expressions (CTEs) into smaller parts and create a dynamic table for each. Avoid overwhelming a single dynamic table with excessive aggregations or joins.

  • Ensure data locality by aligning table changes with query keys (e.g., for joins, GROUP BYs, PARTITION BYs). If your tables aren’t naturally clustered by these keys, consider enabling automatic clustering.
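
For example, if a source table isn’t naturally clustered by a downstream join key, enabling automatic clustering on that key might look like this (orders and customer_id are placeholder names):

  -- Cluster the source table by the key used in downstream joins and
  -- GROUP BYs to improve locality for incremental refreshes.
  ALTER TABLE orders CLUSTER BY (customer_id);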