Top-K pruning for improved query performance

If a SELECT statement contains LIMIT and ORDER BY clauses, Snowflake ordinarily scans all eligible rows because any row might be part of the top-K results, where K is the value from the LIMIT clause. With top-K pruning, Snowflake stops scanning when it determines that none of the remaining rows can be in a result set that consists of K records.

Top-K pruning can improve the performance of SELECT statements that contain LIMIT and ORDER BY clauses. Queries on large tables benefit the most from top-K pruning.

Queries that use top-K pruning

Snowflake applies top-K pruning only when all of the following are true:

  • The query contains both an ORDER BY clause and a LIMIT clause.

  • The first column specified in the ORDER BY clause has one of the following data types:

    If multiple columns are specified, Snowflake considers only the first column.

  • When the query contains a join, the ORDER BY column is a column from the larger table. In data warehousing, the larger table is often referred to as the fact table or probe side. The smaller table is referred to as the dimension table.

Queries with LIMIT clauses that are already fast (such as queries in which a full table scan is fast) might not benefit from top-K pruning. Queries that return fewer than K rows also don’t benefit.

Queries that contain ORDER BY … DESCENDING on a nullable field are pruned only if they also specify NULLS LAST.

Queries on VARIANT columns

This section provides examples of queries on a field in a VARIANT column to show the types of queries that can use top-K pruning.

Create a table with a VARIANT column and insert data:

CREATE OR REPLACE TABLE variant_topk_test (var_col VARIANT);

INSERT INTO variant_topk_test
  SELECT PARSE_JSON(column1)
    FROM VALUES
      ('{"s": "aa", "i": 1}'),
      ('{"s": "bb", "i": 2}'),
      ('{"s": "cc", "i": 3}'),
      ('{"s": "dd", "i": 4}'),
      ('{"s": "ee", "i": 5}'),
      ('{"s": "ff", "i": 6}'),
      ('{"s": "gg", "i": 7}'),
      ('{"s": "hh", "i": 8}'),
      ('{"s": "ii", "i": 9}'),
      ('{"s": "jj", "i": 10}');
Copy

This table is relatively small to provide an example, but remember that top-K pruning benefits larger tables.

The following queries on this table can use top-K pruning:

SELECT * FROM variant_topk_test ORDER BY TO_VARCHAR(var_col:s) LIMIT 5;
Copy
SELECT * FROM variant_topk_test ORDER BY var_col:s::VARCHAR LIMIT 5;
Copy
SELECT * FROM variant_topk_test ORDER BY TO_NUMBER(var_col:i) LIMIT 5;
Copy
SELECT * FROM variant_topk_test ORDER BY var_col:i::NUMBER LIMIT 5;
Copy

The following query can’t use top-K pruning because the value isn’t cast to the underlying data type:

SELECT * FROM variant_topk_test ORDER BY var_col:s LIMIT 5;
Copy

The following query can’t use top-K pruning because the value is cast to a data type that is different from the underlying data type:

SELECT * FROM variant_topk_test ORDER BY var_col:i::VARCHAR LIMIT 5;
Copy

Queries that contain an aggregate function

Queries that contain an aggregate function are pruned only if they meet all of the following conditions:

  • They include a GROUP BY clause.

  • The first ORDER BY column is also a GROUP BY column.

  • The first ORDER BY column isn’t an aggregated column.

For example, the following query can use top-K pruning because the first ORDER BY column c2 is also a GROUP BY column and isn’t an aggregated column:

SELECT c1, c2, c3, COUNT(*) AS agg_col
  FROM mytable
  GROUP BY c1, c2, c3
  ORDER BY c2, c1, agg_col, c3
  LIMIT 5;
Copy

The following query can’t use top-K pruning because the first ORDER BY column agg_col is an aggregated column:

SELECT c1, c2, c3, COUNT(*) AS agg_col
  FROM mytable
  GROUP BY c1, c2, c3
  ORDER BY agg_col, c2, c1
  LIMIT 5;
Copy