Use primary keys to optimize dynamic table pipelines¶
Snowflake can use primary keys to track row-level changes in dynamic tables without relying on change-tracking columns. This enables incremental refresh for pipelines that run insert overwrite workloads including full refresh dynamic tables, which normally block downstream incremental processing.
Primary keys are especially effective when an INSERT OVERWRITE is performed on a base table where only a small fraction of the data is actually changed. In these cases, primary key-based change tracking processes only the changed rows instead of recomputing the entire table. A primary key provides a stable row identifier that persists across overwrites.
For conceptual background, see Understanding primary keys in dynamic tables.
Improve performance for INSERT OVERWRITE workloads¶
When a base table is periodically rewritten through INSERT OVERWRITE, standard change-tracking columns are reset and a dynamic table consuming the base table will see a set of inserts and deletes for all rows in the base table.
In the following example, an external process rewrites the dimension_table periodically, but
most rows remain the same:
When the dimension table is rewritten through INSERT OVERWRITE, Snowflake uses the primary key to identify which dimension rows actually changed and refreshes only the affected facts, rather than recomputing the entire join.
Enable incremental refresh downstream of a full refresh dynamic table¶
Normally, a dynamic table with REFRESH_MODE = INCREMENTAL can’t read from a dynamic table
with REFRESH_MODE = FULL. When the full refresh dynamic table has a system-derived unique key,
you can explicitly set the refresh mode to INCREMENTAL.
Example: Use a base table primary key¶
Create a base table with a primary key and set the RELY property so Snowflake uses it
for row-level change tracking:
Create a full refresh dynamic table that reads from the base table. Because the base table has a reliable primary key, Snowflake can derive an unique key from the base table and register it as an unique constraint for the dynamic table:
Create an incremental dynamic table downstream. This works because the upstream table has a system-derived reliable unique key:
Example: Use a query-derived primary key¶
When a dynamic table’s query includes a GROUP BY clause, Snowflake automatically derives an unique key from the grouping columns. Downstream tables can use this derived key for primary key-based change tracking and enable incremental refreshes.
The daily_sales table has a derived unique key on (sale_day, product_id) because the
GROUP BY guarantees one row per combination. A downstream table can refresh incrementally:
Check system-derived unique keys on a dynamic table¶
To see whether a dynamic table has a derived unique key, use the SHOW UNIQUE KEYS command:
If the output contains a unique key, the dynamic table supports primary key-based change
tracking. Downstream dynamic tables can use REFRESH_MODE = INCREMENTAL to read from it,
even if it uses full refresh mode.
You can also verify support by creating a downstream dynamic table with
REFRESH_MODE = INCREMENTAL. If the upstream table doesn’t have a reliable unqiue key,
the creation fails with an error.