Dynamic tables

Dynamic tables are the building blocks of declarative data transformation pipelines. They significantly simplify data engineering in Snowflake and provide a reliable, cost-effective, and automated way to transform your data for consumption. Instead of defining data transformation steps as a series of tasks and then monitoring dependencies and scheduling, you can simply define the end state of the transformation using dynamic tables and leave the complex pipeline management to Snowflake.

This topic introduces dynamic table concepts and how to transform data in a continuous data pipeline using dynamic tables.

A dynamic table materializes the results of a query that you specify. Instead of creating a separate target table and writing code to transform and update the data in that table, you can define the target table as a dynamic table, and you can specify the SQL statement that performs the transformation. An automated process updates the materialized results automatically through regular refreshes.

Because the content of a dynamic table is fully determined by the given query, the content cannot be changed by using DML. You don’t insert, update, or delete the rows in a dynamic table. The automated refresh process materializes the query results into a dynamic table.

The next sections explain the concepts behind dynamic tables:

An example

The example in Transforming Loaded JSON Data on a Schedule uses streams and tasks to transform and insert new data into two target tables (name and visits) as the data is streamed into a landing table (raw).

The following examples demonstrate how to perform the same transformation using dynamic tables.

SQL Statements for Streams and Tasks

SQL Statements for Dynamic Tables

-- Create a landing table to store
-- raw JSON data.
CREATE OR REPLACE TABLE raw
  (var VARIANT);

-- Create a stream to capture inserts
-- to the landing table.
CREATE OR REPLACE STREAM rawstream1
  ON TABLE raw;

-- Create a table that stores the names
-- of office visitors from the raw data.
CREATE OR REPLACE TABLE names
  (id INT,
   first_name STRING,
   last_name STRING);

-- Create a task that inserts new name
-- records from the rawstream1 stream
-- into the names table.
-- Execute the task every minute when
-- the stream contains records.
CREATE OR REPLACE TASK raw_to_names
  WAREHOUSE = mywh
  SCHEDULE = '1 minute'
  WHEN
    SYSTEM$STREAM_HAS_DATA('rawstream1')
  AS
    MERGE INTO names n
      USING (
        SELECT var:id id, var:fname fname,
        var:lname lname FROM rawstream1
      ) r1 ON n.id = TO_NUMBER(r1.id)
      WHEN MATCHED AND metadata$action = 'DELETE' THEN
        DELETE
      WHEN MATCHED AND metadata$action = 'INSERT' THEN
        UPDATE SET n.first_name = r1.fname, n.last_name = r1.lname
      WHEN NOT MATCHED AND metadata$action = 'INSERT' THEN
        INSERT (id, first_name, last_name)
          VALUES (r1.id, r1.fname, r1.lname);
Copy
-- Create a landing table to store
-- raw JSON data.
CREATE OR REPLACE TABLE raw
  (var VARIANT);

-- Create a dynamic table containing the
-- names of office visitors from
-- the raw data.
-- Try to keep the data up to date within
-- 1 minute of real time.
CREATE OR REPLACE DYNAMIC TABLE names
  TARGET_LAG = '1 minute'
  WAREHOUSE = mywh
  AS
    SELECT var:id::int id, var:fname::string first_name,
    var:lname::string last_name FROM raw;
Copy

As this example shows, when creating a dynamic table, you specify the query for the results that you want to see. For the incremental refresh of the data, you don’t need to create a stream to track changes and write a task to examine those changes and apply the changes to the target table. The automated refresh process does this for you based on the query that you specify.

When to use dynamic tables

There are several methods of transforming data in your pipeline (for example, streams and tasks, CTAS, your own custom solution). Dynamic tables are one option for transforming your data.

Dynamic tables are best used when:

  • You don’t want to write code to track data dependencies and manage data refresh.

  • You don’t need, or want to avoid, the complexity of streams and tasks.

  • You do need to materialize the results of a query of multiple base tables.

  • You need to build multiple tables to transform data via an ETL pipeline.

  • You don’t need fine-grained refresh schedule control and you just want to specify the target data freshness for your pipelines.

  • You don’t need to use unsupported dynamic query constructs such as stored procedures, non-deterministic functions not listed in Supported non-deterministic functions in full refresh, or external functions, or need to use sources for dynamic tables that are external tables, streams, or materialized views.

Note

Dynamic tables can be used as the source of a stream. When used together, a stream based on a dynamic table works like any other stream. For additional information and examples see Streams and dynamic tables.

How dynamic tables work

When creating a dynamic table, you specify the query used to transform the data from one or more base objects or dynamic tables. An automated refresh process executes this query regularly and updates the dynamic table with the changes made to the base objects.

Comparison between streams / tasks and dynamic tables

This automated process computes the changes that were made to the base objects and merges those changes into the dynamic table. To perform this work, the process uses compute resources that you associate with the dynamic table. For more information on resources refer to Understanding the costs of dynamic tables.

When creating a dynamic table, you specify a target “freshness” for the data (a target lag). For example, you can specify that the data should be at most five minutes behind the updates to the base table. Based on this target freshness, the automated process sets up refreshes so that the data in the dynamic table is kept up to date within this target (that is, within five minutes of updates to the base table).

If the data does not need to be as fresh, you can specify a longer target freshness time to reduce costs. For example, if the data in the target table just needs to be at most one hour behind the updates to the base tables, you can specify a target freshness of one hour (instead of five minutes) to reduce costs.

About chaining together pipelines of dynamic tables

You can set up a dynamic table to query other dynamic tables.

For example, suppose that your data pipeline retrieves data from a staging table to update separate dimension tables for customer, product, and date and time data. Your pipeline also updates a table containing aggregate sales data, based on the dimension tables.

You can set up the dimension tables as dynamic tables that query the staging table. You can then set up the aggregate sales table as a dynamic table that queries the dimension tables.

This is similar to defining a task graph. In a task graph, the task that updates the aggregate sales table runs only if the tasks to update the dimension tables have run to completion without errors.

Comparison between task graphs and dynamic tables DAGs

If a dynamic table queries another dynamic table, the automated refresh process updates all dependent dynamic tables at the appropriate time in order to ensure that their lag targets are met and the data is consistent.

Dynamic tables and time travel

Snowflake Time Travel enables accessing historical data (i.e. data that has been changed or deleted) at any point within a defined period. Time Travel behaves identically for Dynamic Tables as it does for traditional tables.

For more information refer to Snowflake Time Travel & Fail-safe.

Dynamic tables and replication

Replication support for dynamic tables enables you to copy data from a primary database to a secondary database for either disaster recovery or data sharing. It can serve as either a failover preparation strategy for disaster recovery or as a means of sharing data across deployments for read-only purposes.

Replicated dynamic tables behave differently depending on if the primary database that contains the dynamic table is replicated in a replication group or a failover group. For more information, see Replication and dynamic tables.