- Categories:
Aggregate functions (Similarity Estimation), Window functions (Similarity Estimation)
MINHASH_COMBINE
Combines input MinHash states into a single MinHash output state. This MinHash state can then be passed to the APPROXIMATE_SIMILARITY function to estimate its similarity with other MinHash states.
This allows use cases in which MINHASH is run over horizontal rowsets of the same table, producing a MinHash state for each rowset. These states can then be combined using MINHASH_COMBINE, producing the same output state as a single run of MINHASH over the entire table.
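As a minimal sketch of this pattern, the query below computes one MinHash state per rowset and then combines them. The table `events`, its columns `batch_id` and `v`, and the choice of 100 hash functions are illustrative assumptions, not part of the original page.

```sql
-- Hypothetical table: events(batch_id INTEGER, v INTEGER).
-- Compute one MinHash state per rowset (here, one per batch_id).
-- Every state uses the same number of hash functions (100),
-- so the state arrays have equal length and can be combined.
WITH per_batch AS (
  SELECT batch_id, MINHASH(100, v) AS mh
  FROM events
  GROUP BY batch_id
)
SELECT MINHASH_COMBINE(mh) AS combined_state
FROM per_batch;

-- For comparison: a single run of MINHASH over the entire table.
SELECT MINHASH(100, v) AS full_state
FROM events;
```

Under these assumptions, `combined_state` and `full_state` should describe the same MinHash state.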
For more information about MinHash states, see Estimating Similarity of Two or More Sets.
Syntax
Aggregate function
Window function
For details about the OVER clause, see Window function syntax and usage.
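The two forms might be called as in the sketch below. The table `minhash_parts` and its columns `region` and `mh` (a MinHash state produced by MINHASH) are hypothetical names used only for illustration.

```sql
-- Hypothetical table: minhash_parts(region, mh),
-- where mh holds a MinHash state produced by MINHASH.

-- Aggregate form: one combined state for all input rows.
SELECT MINHASH_COMBINE(mh) AS combined_state
FROM minhash_parts;

-- Window form: one combined state per partition,
-- returned alongside every row of that partition.
SELECT region,
       MINHASH_COMBINE(mh) OVER (PARTITION BY region) AS region_state
FROM minhash_parts;
```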
Arguments
state
An expression that contains MinHash state information generated by a call to MINHASH. Input MinHash states must have arrays of equal length.
Usage notes
This function can be used as an aggregate function or a window function.
DISTINCT can be included as an argument, but has no effect.
Examples
The following example shows the three related functions
MINHASH, MINHASH_COMBINE, and APPROXIMATE_SIMILARITY. It
creates three tables (ta, tb, and tc): ta and tb are
similar, while ta and tc are completely dissimilar.
Create and populate tables with values:
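A minimal sketch of this step, assuming each table has a single INTEGER column named `i`. The column name and the sample values are illustrative assumptions, chosen so that ta and tb overlap heavily and tc shares no values with ta.

```sql
CREATE TABLE ta (i INTEGER);
CREATE TABLE tb (i INTEGER);
CREATE TABLE tc (i INTEGER);

-- Values for ta.
INSERT INTO ta (i) VALUES (1), (2), (3), (4), (5), (6), (7), (8), (9), (10);
-- Almost the same values as ta.
INSERT INTO tb (i) VALUES (1), (2), (3), (4), (5), (6), (7), (8), (9), (11);
-- Different values, and a different number of values.
INSERT INTO tc (i) VALUES (-1), (-20), (-300), (-4000);
```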
Calculate the MinHash state for the initial set of data:
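One way to sketch this step, assuming the tables have an INTEGER column `i` and storing each state in a one-row table. The table names `minhash_a_1`, `minhash_b`, and `minhash_c`, the column name `mh`, and the use of 100 hash functions are assumptions made for illustration.

```sql
-- Use the same number of hash functions (100) everywhere so that
-- the resulting states have arrays of equal length.
CREATE TABLE minhash_a_1 (mh) AS SELECT MINHASH(100, i) FROM ta;
CREATE TABLE minhash_b (mh) AS SELECT MINHASH(100, i) FROM tb;
CREATE TABLE minhash_c (mh) AS SELECT MINHASH(100, i) FROM tc;
```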
Add more data to one of the tables:
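For example, assuming table ta has an INTEGER column `i` and that the value 12 is not already present (both are illustrative assumptions):

```sql
INSERT INTO ta (i) VALUES (12);
```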
Demonstrate the MINHASH_COMBINE function:
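A minimal sketch, assuming the state for the original rows of ta is stored in a hypothetical table `minhash_a_1` (column `mh`) and that the newly added rows are exactly those with `i > 10`. These names and the filter are illustrative assumptions.

```sql
-- Compute a MinHash state for only the new rows.
CREATE TABLE minhash_a_2 (mh) AS
  SELECT MINHASH(100, i) FROM ta WHERE i > 10;

-- Combine the states for the old and new rows of ta into one state.
CREATE TABLE minhash_a (mh) AS
  SELECT MINHASH_COMBINE(mh) FROM
    (
      (SELECT mh FROM minhash_a_1)
      UNION ALL
      (SELECT mh FROM minhash_a_2)
    );
```

The combined state in `minhash_a` should match what a single MINHASH run over all current rows of ta would produce, without rescanning the original rows.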
This query shows the approximate similarity of the two similar tables
(ta and tb):
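A sketch of such a query, assuming the combined state for ta is stored in a hypothetical table `minhash_a` and the state for tb in `minhash_b`, each with a single column `mh`:

```sql
SELECT APPROXIMATE_SIMILARITY(mh) FROM
    (
      (SELECT mh FROM minhash_a)
      UNION ALL
      (SELECT mh FROM minhash_b)
    );
```

The result is an estimate between 0 and 1. For two tables that share most of their values, it should be well above 0.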
This query shows the approximate similarity of the two very different tables
(ta and tc):
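A sketch of such a query, assuming the combined state for ta is stored in a hypothetical table `minhash_a` and the state for tc in `minhash_c`, each with a single column `mh`:

```sql
SELECT APPROXIMATE_SIMILARITY(mh) FROM
    (
      (SELECT mh FROM minhash_a)
      UNION ALL
      (SELECT mh FROM minhash_c)
    );
```

For tables with no values in common, the estimate should be at or near 0.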