2024 Performance Improvements
Important
Performance improvements often target specific query patterns or workloads. These improvements might or might not have a material impact on a specific workload.
The following performance improvements were introduced in 2024.
Released | Description | Impact |
---|---|---|
December 2024 | Improved sharing of common or similar parts of a query. | Reduces query execution time for queries with multiple WITH clauses. |
December 2024 | Improved scaling of document pre-processing and inference in Document AI. | Decreases document processing time. |
November 2024 | Top-k pruning for queries that contain aggregate functions. | Expands top-k pruning to include queries that contain aggregate functions. |
October 2024 | Improved performance for queries that have equivalent (or similar) subqueries or sub-expressions. | Reduces query execution time by eliminating duplicate parts of a query plan. |
October 2024 | Improved handling of skew. | Reduces query execution time by automatically detecting and resolving skew on the build side of joins. |
October 2024 | Search Optimization Update: Support for join queries (General Availability). | Improves the performance of join queries that have a small number of distinct values on the build side of the join. |
October 2024 | Improved metadata replication. | Reduces the time spent in the SECONDARY_UPLOADING_INVENTORY, PRIMARY_UPLOADING_METADATA, and SECONDARY_DOWNLOADING_METADATA phases of a replication refresh by optimizing serverless compute allocation. This improvement targets refreshes with larger metadata sizes. |
September 2024 | Improved cloning operations through parallelization. | Reduces the time it takes to clone objects, especially databases and schemas with extensive metadata. |
September 2024 | Improved replication refreshes through parallelization. | Reduces the overall refresh time when replicating large volumes of data. |
August 2024 | Improved performance for LIMIT queries. | Reduces compilation and execution time for queries that use a LIMIT clause to return |
July 2024 | Improved table column synchronization for replication. | Reduces the time spent in the SECONDARY_DOWNLOADING_METADATA phase of a refresh operation. |
July 2024 | Improved warehouse utilization for queries that scan only a small number of micro-partitions relative to the compute resources available to the virtual warehouse. | Faster execution for queries with expensive operations that scan data from a small number of micro-partitions, which is common in BI and dashboard use cases. |
July 2024 | Improved query processing that: | Faster execution for some queries with LIMIT and GROUP BY clauses. |
June 2024 | Improved single instruction, multiple data (SIMD) processing. | |
May 2024 | Improved efficiency of Automatic Clustering. | Reduces the cost of Automatic Clustering. |
May 2024 | Improved object replication. | Reduces the time spent in the SECONDARY_UPLOADING_INVENTORY and SECONDARY_DOWNLOADING_METADATA phases of a refresh operation by optimizing the synchronization of some objects and the authorization mechanism for replication operations. |
May 2024 | Reduced the latency for loading most Parquet files by up to 50% when the file format option USE_VECTORIZED_SCANNER is set to | The vectorized scanner is well suited to the columnar format of Parquet files and reduces ingestion latency by downloading only the relevant sections of a Parquet file into memory, such as the subset of selected columns. |
May 2024 | Improved evaluation of aggregations so that they can be performed at more intermediate levels of join trees. | Reduces query execution time for complex queries with aggregations by reducing the amount of data to process as early as possible. |
May 2024 | Improved query execution times for queries that spend a significant amount of time communicating across virtual warehouse nodes. | Increases throughput between compute resources in a warehouse (each warehouse is a cluster of compute resources). |
May 2024 | Improved top-k pruning for LIMIT and ORDER BY queries. | Reduces execution time for top-k queries by scanning fewer files and reading fewer file headers. Expands existing top-k improvements to support STRING/BINARY columns in ORDER BY. Further increases pruning efficiency by sorting the scan set in order of largest/smallest files with respect to the value domain. |
May 2024 | Improved join order decisions by calculating selectivity estimates with more granularity. | Reduces compilation time and query execution time by calculating selectivity estimates at the micro-partition level. |
May 2024 | Faster loading time for Python. | Improves performance for Streamlit in Snowflake apps (including Streamlit apps within a Snowflake Native App), Python worksheets, Python UDFs, and Python stored procedures. |
April 2024 | Reduced lock/mutex contention. | Reduces query execution time by improving scan performance in a variety of scenarios, such as highly concurrent queries running on a warehouse. |
April 2024 | Improved broadcast join decisions. | Reduces query execution time and improves memory management by optimizing broadcast joins in scenarios such as right-deep join trees. |
April 2024 | Faster query results in Snowsight. | Reduces the time it takes for results to appear for queries run in Snowsight. Improvements are most noticeable for queries that return result sets larger than 10,000 rows. |
March 2024 | Improved metadata replication. | Reduces the time spent in the PRIMARY_UPLOADING_METADATA, SECONDARY_DOWNLOADING_METADATA, and SECONDARY_UPLOADING_INVENTORY phases for metadata. |
March 2024 | Improved query performance by calculating selectivity estimates more accurately to optimize join order. | Reduces execution time when there are mismatches between partition metadata and actual cardinality from join filters. |
March 2024 | Improved performance for loading JSON files. | Reduces ingestion latency by up to 25% for many JSON loading scenarios. |
February 2024 | Improved object replication. | Reduces the time spent in the PRIMARY_UPLOADING_METADATA, SECONDARY_DOWNLOADING_METADATA, and SECONDARY_UPLOADING_INVENTORY phases of a refresh operation by optimizing portions of the snapshot operation and the way some objects are added to the replication inventory. |
February 2024 | Support for the | Ability to set the |
January 2024 | Improved execution time for LIMIT 0 queries. | Reduces execution time for queries that use a count of |
January 2024 | General Availability of larger warehouses (5X-LARGE and 6X-LARGE) in Microsoft Azure regions, excluding Azure Government regions. | Ability to use larger compute resources for memory-intensive queries than smaller warehouses provide. |