How workload affects dynamic table performance¶
Optimizing the workload in your pipeline helps reduce cost and improve performance in general. Snowflake recommends starting your optimization by tuning your workload because it’s the most impactful way to improve the performance of your dynamic table refreshes.
Queries¶
Take a closer look at the queries in your dynamic tables and look for ways to lessen their workload, such as the following:
Adding filters to reduce the amount of data being scanned.
Eliminating duplicates earlier in the pipeline and subsequently avoiding uses of
DISTINCT
.Reducing the use of
NULL
and ensuring referential integrity in your source data so you can replace outer joins with inner joins.Avoiding materializing large columns that are read infrequently.
Locality¶
Consider the layout of your data in your tables. For good performance, the data your queries access should be close together, known as locality.
For instance, if you need to efficiently search for rows by a specific column, the table should be clustered by that column. Ideally, the locality of your tables should align with your queries’ structure, clustering by JOIN, GROUP BY, and PARTITION BY keys when possible. To benefit from clustering, the sequence of clustering keys must share a common prefix with the sequence of partitioning keys.
Optimizing locality involves tradeoffs. It’s challenging to optimize locality across many columns simultaneously, so focus on the most impactful ones for the best performance. For more information, see Understanding Snowflake Table Structures.