Top-K pruning for improved query performance¶
If a SELECT statement contains LIMIT and ORDER BY clauses, Snowflake ordinarily scans all eligible rows because any row might be part of the top-K results, where K is the value from the LIMIT clause. With top-K pruning, Snowflake stops scanning when it determines that none of the remaining rows can be in a result set that consists of K records.
Top-K pruning can improve the performance of SELECT statements that contain LIMIT and ORDER BY clauses. Queries on large tables benefit the most from top-K pruning.
Queries that use top-K pruning¶
Snowflake applies top-K pruning only when all of the following are true:
The query contains both an ORDER BY clause and a LIMIT clause.
The first column specified in the ORDER BY clause has one of the following data types:
An integer-representable data type (that is, an INTEGER type, a DATE type, or a TIMESTAMP type). Expressions that return integers, such as casts, are not supported.
A string or binary data type, including collated strings.
A field in a VARIANT column with a supported underlying type (that is, a type listed in the previous two bulleted list items) and cast to that underlying type.
If multiple columns are specified, Snowflake considers only the first column.
When the query contains a join, the ORDER BY column is a column from the larger table. In data warehousing, the larger table is often referred to as the fact table or probe side. The smaller table is referred to as the dimension table.
Queries with LIMIT clauses that are already fast (such as queries in which a full table scan is fast) might not benefit from top-K pruning. Queries that return fewer than K rows also don’t benefit.
Queries that contain ORDER BY … DESCENDING on a nullable field are pruned only if they also specify NULLS LAST.
Queries on VARIANT columns¶
This section provides examples of queries on a field in a VARIANT column to show the types of queries that can use top-K pruning.
Create a table with a VARIANT column and insert data:
This table is relatively small to provide an example, but remember that top-K pruning benefits larger tables.
The following queries on this table can use top-K pruning:
The following query can’t use top-K pruning because the value isn’t cast to the underlying data type:
The following query can’t use top-K pruning because the value is cast to a data type that is different from the underlying data type:
Queries that contain an aggregate function¶
Queries that contain an aggregate function are pruned only if they meet all of the following conditions:
They include a GROUP BY clause.
The first ORDER BY column is also a GROUP BY column.
For example, the following query can use top-K pruning because the first ORDER BY column c2 is also a GROUP BY
column and isn’t an aggregated column:
The following query can’t use top-K pruning because the first ORDER BY column agg_col is an aggregated column: