Using the Query Hash to Identify Patterns and Trends in Queries¶

To identify, group, and analyze similar queries in the query history, you can use a hash of the query text. For example, you can:

Group queries by the query hash to identify patterns in expensive queries.
Determine the effects of performance improvements (for example, changes to clustering keys) on repeated queries.

In the following views and table functions, you can use the query_hash and query_parameterized_hash columns to get the hash of the query text:

ACCOUNT_USAGE views (1 year retention)
INFORMATION_SCHEMA table functions (7 days retention)
- QUERY_HISTORY table function
- TASK_HISTORY table function

You can use this hash to analyze repeated queries.

Using the Hash of the Query (`query_hash`)¶

The query_hash column contains a hash value that is computed, based on the canonicalized text of the SQL statement. Repeated queries that have exactly the same query text have the same query_hash values.

Repeated queries also have the same query_hash if their query text differs only in:

Case insensitive identifier, session variable, and stage name

Note that this does not include identifiers specified using IDENTIFIER() with bind variables. Bind variables with different values produce different query hashes.
White space
Comments

If any other part of the query text of two queries differ, those queries have different query_hash values.

For example, the following queries have the same query_hash value because they have exactly the same query text.

SELECT * FROM table1 WHERE table1.name = 'TIM'

Copy

SELECT * FROM table1 WHERE table1.name = 'TIM'

Copy

You can use the query_hash value to find patterns in query performance that might not be obvious otherwise. For example, although a query might not be excessively expensive during any single execution, a frequently repeated query could lead to high costs, based on the number of times it runs. You can use the query_hash value to identify the queries to focus on optimizing first.

For example, the following query uses the query_hash value to identify the query IDs for the 100 longest-running queries:

SELECT
    query_hash,
    COUNT(*),
    SUM(total_elapsed_time),
    ANY_VALUE(query_id)
  FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY
  WHERE warehouse_name = 'MY_WAREHOUSE'
    AND DATE_TRUNC('day', start_time) >= CURRENT_DATE() - 7
  GROUP BY query_hash
  ORDER BY SUM(total_elapsed_time) DESC
  LIMIT 100;

Copy

Using the Hash of the Parameterized Query (`query_parameterized_hash`)¶

query_parameterized_hash contains a hash value that is computed based on the parameterized query, which means the version of the query after literals are parameterized. These literals must be used in the query predicate and must be used with one of the following comparison operators:

= (equal to)
!= (not equal to)
>= (greater than or equal to)
<= (less than or equal to)

Repeated queries (including those with different parameter values) have the same query_parameterized_hash value.

Repeated queries also have the same query_parameterized_hash if their query text differs only in:

Case insensitive identifier, session variable, and stage name

Note that this does not include identifiers specified using IDENTIFIER() with bind variables. Bind variables with different values produce different query hashes.
White space
Comments

Queries that have the same query_hash value also have the same query_parameterized_hash value, but not vice versa.

For example, the following queries have the same query_parameterized_hash value because the literal values are the only difference between the queries:

SELECT * FROM table1 WHERE table1.name = 'TIM'

Copy

SELECT * FROM table1 WHERE table1.name = 'AIHUA'

Copy

As is the case with the query_hash value, you can use the query_parameterized_hash value to find patterns in query performance that might not be obvious otherwise.

The following statement computes the average total_elapsed_time each day for all queries with a specific query_parameterized_hash value (cbd58379a88c37ed6cc0ecfebb053b03):

SELECT
    DATE_TRUNC('day', start_time),
    SUM(total_elapsed_time),
    ANY_VALUE(query_id)
  FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY
  WHERE query_parameterized_hash = 'cbd58379a88c37ed6cc0ecfebb053b03'
    AND DATE_TRUNC('day', start_time) >= CURRENT_DATE() - 30
  GROUP BY DATE_TRUNC('day', start_time);

Copy

Checking the Version That Was Used to Generate the Hash¶

Over time, the logic used by Snowflake to generate the query hash can change. Changes to this logic can result in different hashes produced for the same query. For example, for a given query, the hash generated by version 1 of the logic might differ from the hash generated by version 2 of the logic.

The views and table function output that include the query_hash and query_parameterized_hash columns also include the following columns that specify the version of the logic used to produce the hashes:

query_hash_version
query_parameterized_hash_version

The version number in these columns is a NUMBER (for example, 1 for the first version of the logic, 2 for the second version of the logic, etc.).

If these columns contain different version numbers for different periods of time, you can use these version columns to identify the different hashes for the same query. For example:

...
WHERE (query_hash = 'hash_from_v1' AND query_hash_version = 1)
  OR (query_hash = 'hash_from_v2' AND query_hash_version = 2)

Copy

Using the Query Hash to Identify Patterns and Trends in Queries¶

Using the Hash of the Query (query_hash)¶

Using the Hash of the Parameterized Query (query_parameterized_hash)¶

Checking the Version That Was Used to Generate the Hash¶

Using the Hash of the Query (`query_hash`)¶

Using the Hash of the Parameterized Query (`query_parameterized_hash`)¶