Dynamic table query performance¶
This section discusses how to design your pipeline to get good performance at query time. Your data takes various forms as it moves through different systems:
Source data: Initially, data is generated by real-world entities and collected in frontline systems. This data is then ingested into Snowflake via ETL processes.
Raw data: After ingestion, the data is stored in Snowflake tables, where it’s transformed into a form more suitable for analysis.
Modeled data: These transformations result in a set of models, which present familiar concepts to consumers for analysis.
Your pipeline comprises these steps. Dynamic tables work in the transformation steps, but you should consider their performance in the context of your overall pipeline.
Understanding materialization¶
The performance of analysis queries is determined by the design and implementation of the transformations from raw to modeled data. While defining these transformations as a set of views over raw data is technically correct, it might not meet performance and cost needs.
To address this, some or all data models might need to be materialized by precomputing their results, storing them for quick access, and keeping them up to date. Dynamic tables make this materialization easy, but you still have to decide which of your models to materialize.
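As a sketch of what this looks like in practice, the statement below materializes a hypothetical revenue model as a dynamic table instead of a view (the table, column, and warehouse names are assumptions for illustration):

```sql
-- Hypothetical model: daily revenue per customer, previously defined as a view.
-- Materializing it as a dynamic table precomputes the result, stores it,
-- and keeps it refreshed within the target lag.
CREATE OR REPLACE DYNAMIC TABLE daily_customer_revenue
  TARGET_LAG = '1 hour'        -- results may be up to 1 hour out of date
  WAREHOUSE = transform_wh     -- warehouse that runs the refreshes
AS
SELECT
  customer_id,
  DATE_TRUNC('day', order_ts) AS order_day,
  SUM(amount)                 AS revenue
FROM raw_orders
GROUP BY customer_id, DATE_TRUNC('day', order_ts);
```

Consumers query `daily_customer_revenue` like any other table; the refresh work happens in the background on the specified warehouse.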
Materialization boundary¶
The division between the materialized and unmaterialized part of the transformation is referred to as the materialization boundary. When deciding on the materialization boundary, the following factors are typically relevant:
Lag or freshness: How out of date the data you provide is. The materialization boundary typically doesn’t have a strong effect on this factor.
Response time: Materializing more of your pipeline reduces response time. Freshness is always at least as long as the response time of your queries, but can be much longer.
Cost: The cost of your workload is associated with the following:
Cost of materialization: This cost scales with the amount of data in the sources and the complexity of the transformations.
Cost of computing the unmaterialized transformations during analysis: This cost scales with the number of analytical queries and their complexity.
Cost of storage: This cost includes both the raw and materialized data.
Materializing more modeled data speeds up response times and lowers analysis cost, but can raise materialization and storage costs. Finding the best materialization boundary requires balancing these factors. Typically, you can get good results by materializing the least amount of data that still meets your response time requirements.
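One common way to place the boundary is to materialize the expensive part of a transformation and leave the cheap final shaping as a view. A minimal sketch, using hypothetical source tables and a hypothetical warehouse name:

```sql
-- Materialize the expensive join, which scans large raw tables...
CREATE OR REPLACE DYNAMIC TABLE order_facts
  TARGET_LAG = '15 minutes'
  WAREHOUSE = transform_wh
AS
SELECT o.order_id, o.customer_id, c.region, o.amount, o.order_ts
FROM raw_orders o
JOIN raw_customers c ON c.customer_id = o.customer_id;

-- ...and leave the lightweight aggregation unmaterialized as a view,
-- computed at analysis time from the precomputed facts.
CREATE OR REPLACE VIEW revenue_by_region AS
SELECT region,
       DATE_TRUNC('day', order_ts) AS order_day,
       SUM(amount) AS revenue
FROM order_facts
GROUP BY region, DATE_TRUNC('day', order_ts);
```

Here the boundary sits between `order_facts` (materialized) and `revenue_by_region` (computed per query); moving it up or down trades refresh cost against query cost and response time.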
How to materialize¶
After you’ve set your materialization boundary, you can create dynamic tables and views accordingly. You can optimize the performance of the unmaterialized transformations just like any other Snowflake query. When queried, regular and dynamic tables perform similarly, letting you use standard techniques like normalization, pre-aggregation, and clustering to boost performance.
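For example, a dynamic table can be given a clustering key just like a regular table, so that analysis queries filtering on that key benefit from pruning. A sketch with assumed names:

```sql
-- Cluster on the column most analysis queries filter by (assumed here to
-- be the day column), so pruning works as it would for a regular table.
CREATE OR REPLACE DYNAMIC TABLE order_facts_by_day
  TARGET_LAG = '15 minutes'
  WAREHOUSE = transform_wh
  CLUSTER BY (order_day)
AS
SELECT customer_id,
       DATE_TRUNC('day', order_ts) AS order_day,
       SUM(amount) AS revenue
FROM raw_orders
GROUP BY customer_id, DATE_TRUNC('day', order_ts);
```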
For more ideas, see the query performance checklist.