Tutorial: Use primary keys to optimize dynamic table pipelines

Introduction

This tutorial shows you how to use primary keys to enable efficient incremental refresh in a dynamic table pipeline where the base table is periodically rewritten through INSERT OVERWRITE. You’ll build two dimension-fact join pipelines with the same data and query definition, one with a primary key and one without, and compare how they handle a dimension table rewrite.

About dimension-fact joins with INSERT OVERWRITE

In many data pipelines, dimension tables are periodically rewritten by external processes using INSERT OVERWRITE. This is common with ETL connectors and batch data loads. The rewrite replaces all rows in the table, even when only a small fraction of the data actually changed.

Without primary keys, Snowflake can’t determine what changed across a rewrite and treats every row as new. This forces the pipeline to recompute everything in each refresh cycle.

When the dimension table has a primary key with the RELY property, Snowflake uses the key to identify which rows actually changed between rewrites. The dynamic table then processes only the affected rows, even though the underlying data was fully replaced.
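
If a table already exists without a primary key, you can add the constraint and the RELY property afterward. The following is a minimal sketch; the table name my_dimension and column id are hypothetical placeholders:

ALTER TABLE my_dimension
  ADD CONSTRAINT pk_my_dimension PRIMARY KEY (id) RELY;  -- RELY asserts the data satisfies the key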

What you’ll learn

In this tutorial, you’ll learn how to complete the following tasks:

  • Create two dimension tables with the same data: one with a primary key and one without.
  • Build two dynamic table pipelines with the same join query and compare their behavior.
  • Simulate a dimension table rewrite with INSERT OVERWRITE where only 10% of rows change.
  • Compare the incremental refresh performance of both dynamic tables on the same data.

Prerequisites

You need access to a Snowflake environment with the following resources:

  • A warehouse for compute resources. We recommend using an X-Small warehouse.
  • The privileges required to create databases, schemas, and dynamic tables. For more information, see Access control privileges.

If you don’t have a user with the necessary permissions, ask someone who does to create one for you. Users with the ACCOUNTADMIN role can create new users and grant them the required privileges.

Note

For the best experience, complete this tutorial in Snowsight so that you can quickly view the query history and monitor your dynamic table performance.

Step 1: Create the source data

Start by setting up two dimension tables (one with a primary key, one without) and a shared fact table.

Create a database and schema for the tutorial:

CREATE DATABASE IF NOT EXISTS temp;
CREATE SCHEMA IF NOT EXISTS temp.tutorial;

USE SCHEMA temp.tutorial;

Create a dimension table with a primary key and the RELY property. Snowflake doesn’t enforce primary key constraints, so the RELY property is your assertion that the data satisfies the constraint; it tells Snowflake that it can trust the key for optimizations such as change tracking:

CREATE OR REPLACE TABLE dimension_products_with_pk (
    product_id INT PRIMARY KEY RELY,
    product_name VARCHAR(200),
    category VARCHAR(100),
    price NUMBER(10, 2)
) CHANGE_TRACKING = TRUE;
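
To confirm that the constraint exists, you can list the table’s primary keys:

SHOW PRIMARY KEYS IN TABLE dimension_products_with_pk;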

Create a second dimension table with the same schema but no primary key:

CREATE OR REPLACE TABLE dimension_products_no_pk (
    product_id INT,
    product_name VARCHAR(200),
    category VARCHAR(100),
    price NUMBER(10, 2)
) CHANGE_TRACKING = TRUE;

Create a fact table for order transactions:

CREATE OR REPLACE TABLE fact_orders (
    order_id INT,
    product_id INT,
    quantity INT,
    order_date TIMESTAMP_NTZ
) CHANGE_TRACKING = TRUE;
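
Optionally, confirm that change tracking is enabled on all three tables. The change_tracking column in the SHOW TABLES output should show ON for each:

SHOW TABLES IN SCHEMA temp.tutorial;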

Insert sample data into the dimension table with a primary key. This generates 100,000 products across 10 categories:

INSERT INTO dimension_products_with_pk (product_id, product_name, category, price)
  SELECT
      SEQ4() + 1 AS product_id,
      'Product ' || LPAD(TO_VARCHAR(SEQ4() + 1), 6, '0') AS product_name,
      'Category ' || LPAD(TO_VARCHAR(MOD(SEQ4(), 10) + 1), 2, '0') AS category,
      ROUND(5.00 + MOD(SEQ4(), 500) * 0.50, 2) AS price
  FROM TABLE(GENERATOR(ROWCOUNT => 100000));

Copy the same data into the dimension table without a primary key:

INSERT INTO dimension_products_no_pk
  SELECT * FROM dimension_products_with_pk;

Insert sample order data. This generates 10 million orders that reference the products:

INSERT INTO fact_orders (order_id, product_id, quantity, order_date)
  SELECT
      SEQ4() + 1 AS order_id,
      MOD(SEQ4(), 100000) + 1 AS product_id,
      MOD(SEQ4(), 10) + 1 AS quantity,
      DATEADD(SECOND, SEQ4(), '2025-01-01 00:00:00') AS order_date
  FROM TABLE(GENERATOR(ROWCOUNT => 10000000));
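
Before moving on, you can sanity-check the row counts. Both dimension tables should contain 100,000 rows, and the fact table should contain 10 million:

SELECT
    (SELECT COUNT(*) FROM dimension_products_with_pk) AS products_with_pk,
    (SELECT COUNT(*) FROM dimension_products_no_pk)   AS products_no_pk,
    (SELECT COUNT(*) FROM fact_orders)                AS orders;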

Step 2: Create dynamic tables for comparison

In this step, you will create two dynamic table pipelines with the same join query. The only difference is whether the dimension table has a primary key. This lets you directly compare their refresh performance on the same data.

Note

Replace <my_warehouse> with the name of your warehouse.

Pipeline with primary key

Create a dynamic table that joins the dimension table (with primary key) to the fact table. Because the dimension table has a reliable primary key, this dynamic table can use incremental refresh:

CREATE OR REPLACE DYNAMIC TABLE dt_enriched_orders_with_pk
  TARGET_LAG = DOWNSTREAM
  WAREHOUSE = <my_warehouse>
  REFRESH_MODE = INCREMENTAL
AS
  SELECT
      f.order_id,
      f.product_id,
      d.product_name,
      d.category,
      f.quantity,
      d.price,
      f.quantity * d.price AS order_total,
      f.order_date
  FROM fact_orders f
  INNER JOIN dimension_products_with_pk d ON f.product_id = d.product_id;

Pipeline without primary key

Create the same join query using the dimension table without a primary key. Without a primary key, Snowflake can’t track row-level changes across INSERT OVERWRITE rewrites, so this dynamic table recomputes the entire join on each refresh:

CREATE OR REPLACE DYNAMIC TABLE dt_enriched_orders_no_pk
  TARGET_LAG = DOWNSTREAM
  WAREHOUSE = <my_warehouse>
  REFRESH_MODE = INCREMENTAL
AS
  SELECT
      f.order_id,
      f.product_id,
      d.product_name,
      d.category,
      f.quantity,
      d.price,
      f.quantity * d.price AS order_total,
      f.order_date
  FROM fact_orders f
  INNER JOIN dimension_products_no_pk d ON f.product_id = d.product_id;
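
Before refreshing, you can confirm the refresh mode that Snowflake assigned to each table. The refresh_mode column in the SHOW DYNAMIC TABLES output should show INCREMENTAL for both, and the refresh_mode_reason column explains the choice:

SHOW DYNAMIC TABLES LIKE 'dt_enriched_orders%' IN SCHEMA temp.tutorial;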

Refresh both pipelines

Perform the initial refresh for both dynamic tables to establish a baseline:

ALTER DYNAMIC TABLE dt_enriched_orders_with_pk REFRESH;
ALTER DYNAMIC TABLE dt_enriched_orders_no_pk REFRESH;
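
Both dynamic tables should now contain one enriched row per order. A quick count verifies the baseline; each column should return 10 million:

SELECT
    (SELECT COUNT(*) FROM dt_enriched_orders_with_pk) AS with_pk_rows,
    (SELECT COUNT(*) FROM dt_enriched_orders_no_pk)   AS no_pk_rows;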

Step 3: Simulate a dimension table rewrite and compare

Now simulate the common scenario in which an external process rewrites the dimension table through INSERT OVERWRITE, changing only a small percentage of rows.

Rewrite the dimension tables

Rewrite both dimension tables with INSERT OVERWRITE. This updates the price for 10% of products (those in Category 01) while keeping the rest identical:

INSERT OVERWRITE INTO dimension_products_with_pk
  SELECT
      product_id,
      product_name,
      category,
      CASE
          WHEN category = 'Category 01' THEN ROUND(price * 1.10, 2)
          ELSE price
      END AS price
  FROM dimension_products_with_pk;

INSERT OVERWRITE INTO dimension_products_no_pk
  SELECT
      product_id,
      product_name,
      category,
      CASE
          WHEN category = 'Category 01' THEN ROUND(price * 1.10, 2)
          ELSE price
      END AS price
  FROM dimension_products_no_pk;

Even though every row was rewritten in both tables, only about 10,000 products (Category 01) actually have different values. The pipeline with a primary key can detect this; the pipeline without a primary key can’t.
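
You can confirm the size of the real change by counting the products in Category 01, the only category whose prices changed. The result should be about 10,000, or 10% of the 100,000 products:

SELECT COUNT(*) AS changed_products
  FROM dimension_products_with_pk
  WHERE category = 'Category 01';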

Refresh and compare performance

Refresh the dynamic table without a primary key first, so that you can inspect its metrics before refreshing the other table:

ALTER DYNAMIC TABLE dt_enriched_orders_no_pk REFRESH;

Check the execution time and scan metrics:

  1. Navigate to Transformation » Dynamic Tables.

  2. Filter the list by selecting the temp database, then select dt_enriched_orders_no_pk.

  3. Select the Refresh History tab and notice the REFRESH DURATION value for the most recent refresh.

    Because the dimension table has no primary key, Snowflake can’t distinguish changed rows from unchanged rows after the INSERT OVERWRITE. Every row in the rewritten table looks like a new insertion, so the engine treats all 100,000 dimension rows as changed and must re-join them against the entire 10-million-row fact table. This produces far more inserted and deleted rows and a much higher refresh duration, even though only 10% of the dimension data actually changed.

  4. Now refresh the optimized dt_enriched_orders_with_pk dynamic table:

    ALTER DYNAMIC TABLE dt_enriched_orders_with_pk REFRESH;
  5. Repeat the previous steps to check the Refresh History for the optimized table:

    Compare the two refresh operations. The optimized dt_enriched_orders_with_pk dynamic table should complete significantly faster than the suboptimal dt_enriched_orders_no_pk dynamic table. In the example results, the suboptimal dynamic table took 34 seconds and updated 20 million rows in total, while the optimized table took 12 seconds and updated only 2 million rows in total.

The results show the difference between the two approaches (if you prefer SQL to the Snowsight UI, see the query after this list):

  • dt_enriched_orders_with_pk: Uses the primary key to identify the ~10% of dimension rows that actually changed, then processes only the orders that reference those products. The rows_inserted and rows_deleted counts reflect just the affected rows, and the refresh duration is significantly lower.
  • dt_enriched_orders_no_pk: Can’t determine what changed in the rewrite, so it reprocesses the entire 10-million-row join. The row counts and refresh duration are much higher.
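
The same metrics are available in SQL through the DYNAMIC_TABLE_REFRESH_HISTORY information schema table function. The following is a sketch: the statistics column is a JSON object, and the numInsertedRows and numDeletedRows fields used here assume its documented shape. Unquoted identifiers are stored uppercase, hence the uppercase name prefix:

SELECT
    name,
    DATEDIFF('second', refresh_start_time, refresh_end_time) AS duration_s,
    statistics:numInsertedRows::NUMBER AS rows_inserted,
    statistics:numDeletedRows::NUMBER  AS rows_deleted
  FROM TABLE(INFORMATION_SCHEMA.DYNAMIC_TABLE_REFRESH_HISTORY(
      NAME_PREFIX => 'TEMP.TUTORIAL.DT_ENRICHED_ORDERS'))
  ORDER BY refresh_start_time DESC;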

Tip

The performance difference increases as the fact table grows and as the fraction of dimension rows that actually change shrinks. In production pipelines where dimension tables contain millions of rows and only a small fraction changes on each load cycle, primary key-based change tracking can reduce refresh times by an order of magnitude.

Clean up

To delete all objects created for this tutorial, run the following DROP statement:

DROP DATABASE temp;

Summary and additional resources

In this tutorial, you used primary keys to enable an efficient incremental refresh in a dynamic table pipeline where the dimension table is periodically rewritten through INSERT OVERWRITE. By comparing two pipelines with the same data and query, you saw how a primary key lets Snowflake identify only the rows that actually changed and process just those changes through the join.

Along the way, you completed the following tasks:

  • Created two dimension tables with the same data: one with a primary key (RELY) and one without.
  • Built two dynamic table pipelines with the same join query and compared their behavior after an INSERT OVERWRITE rewrite.
  • Simulated a dimension table rewrite with INSERT OVERWRITE, changing 10% of the rows, and compared the refresh performance of both pipelines.

Key concepts demonstrated:

  • A primary key with the RELY property lets Snowflake identify row-level changes across an INSERT OVERWRITE rewrite instead of treating every row as new.
  • Change tracking (CHANGE_TRACKING = TRUE) on source tables is required for incremental refresh of downstream dynamic tables.
  • Incremental refresh processes only the changed rows through the join, which reduces both refresh duration and the number of rows written.

For more information about dynamic tables and optimization techniques, see the Snowflake documentation on dynamic tables.