Categories:

Hash Functions

HASH

Returns a signed 64-bit hash value. Note that HASH never returns NULL, even for NULL inputs.

Possible uses for the HASH function include:

  • Convert skewed data values to values that are likely to be more randomly or more evenly distributed.

    For example, you can hash a group of highly skewed values and generate a set of values that are more likely to be randomly distributed or evenly distributed.

  • Put data in buckets. Because hashing can convert skewed data values to closer-to-evenly distributed values, you can use hashing to help take skewed values and create approximately evenly-sized buckets.

    If hashing alone is not sufficient to get the number of distinct buckets that you want, you can combine hashing with the ROUND or WIDTH_BUCKET functions.

Note

HASH is a proprietary function that accepts a variable number of input expressions of arbitrary types and returns a signed value. It is not a cryptographic hash function and should not be used as such.

Cryptographic hash functions have a few properties which this function does not, for example:

  • The cryptographic hashing of a value cannot be inverted to find the original value.

  • Given a value, it is infeasible to find another value with the same cryptographic hash.

For cryptographic purposes, use the SHA families of functions (in String & Binary Functions).

See also:

HASH_AGG

Syntax

HASH( <expr> [ , <expr2> ... ] )

HASH(*)
Copy

Arguments

exprN

The expression can be a general expression of any Snowflake data type.

Returns

Returns a signed 64-bit value as NUMBER(19,0).

HASH never returns NULL, even for NULL inputs.

Usage Notes

  • HASH is stable in the sense that it guarantees:

    • Any two values of type NUMBER that compare equally will hash to the same hash value, even if the respective types have different precision and/or scale.

    • Any two values of type FLOAT that can be converted to NUMBER(38, 0) without loss of precision will hash to the same value. For example, the following all return the same hash value:

      • HASH(10::NUMBER(38,0))

      • HASH(10::NUMBER(5,3))

      • HASH(10::FLOAT)

    • Any two values of type TIMESTAMP_TZ that compare equally will hash to the same hash value, even if the timestamps are from different time zones.

    • This guarantee also applies to NUMBER, FLOAT, and TIMESTAMP_TZ values within a VARIANT column.

    • Note that this guarantee does not apply to other combinations of types, even if implicit conversions exist between the types. For example, with overwhelming probability, the following will not return the same hash values even though 10 = '10' after implicit conversion:

      • HASH(10)

      • HASH('10')

  • HASH(*) means to create a single hashed value based on all columns in the row.

  • Do not use HASH() to create unique keys. HASH() has a finite resolution of 64 bits, and is guaranteed to return non-unique values if more than 2^64 values are entered (e.g. for a table with more than 2^64 rows). In practice, if the input is on the order of 2^32 rows (approximately 4 billion rows) or more, the function is reasonably likely to return at least one duplicate value.

Collation Details

No impact.

  • Two strings that are identical but have different collation specifications have the same hash value. In other words, only the string, not the collation specification, affects the hash value.

  • Two strings that are different, but compare equal according to a collation, might have a different hash value. For example, two strings that are identical using punctuation-insensitive collation will normally have different hash values because only the string, not the collation specification, affects the hash value.

Examples

SELECT HASH(SEQ8()) FROM TABLE(GENERATOR(rowCount=>10));

----------------------+
     HASH(SEQ8())     |
----------------------+
 -6076851061503311999 |
 -4730168494964875235 |
 -3690131753453205264 |
 -7287585996956442977 |
 -1285360004004520191 |
 4801857165282451853  |
 -2112898194861233169 |
 1885958945512144850  |
 -3994946021335987898 |
 -3559031545629922466 |
----------------------+
Copy
SELECT HASH(10), HASH(10::number(38,0)), HASH(10::number(5,3)), HASH(10::float);

---------------------+------------------------+-----------------------+---------------------+
      HASH(10)       | HASH(10::NUMBER(38,0)) | HASH(10::NUMBER(5,3)) |   HASH(10::FLOAT)   |
---------------------+------------------------+-----------------------+---------------------+
 1599627706822963068 | 1599627706822963068    | 1599627706822963068   | 1599627706822963068 |
---------------------+------------------------+-----------------------+---------------------+
Copy
SELECT HASH(10), HASH('10');

---------------------+---------------------+
      HASH(10)       |     HASH('10')      |
---------------------+---------------------+
 1599627706822963068 | 3622494980440108984 |
---------------------+---------------------+
Copy
SELECT HASH(null), HASH(null, null), HASH(null, null, null);

---------------------+--------------------+------------------------+
     HASH(NULL)      |  HASH(NULL, NULL)  | HASH(NULL, NULL, NULL) |
---------------------+--------------------+------------------------+
 8817975702393619368 | 953963258351104160 | 2941948363845684412    |
---------------------+--------------------+------------------------+
Copy

The example below shows that even if the table contains multiple columns, HASH(*) returns a single value per row.

CREATE TABLE orders (order_ID INTEGER, customer_ID INTEGER, order_date ...);

...

SELECT HASH(*) FROM orders LIMIT 10;

----------------------+
       HASH(*)        |
----------------------+
 -3527903796973745449 |
 6296330861892871310  |
 6918165900200317484  |
 -2762842444336053314 |
 -2340602249668223387 |
 5248970923485160358  |
 -5807737826218607124 |
 428973568495579456   |
 2583438210124219420  |
 4041917286051184231  |
----------------------+
Copy