Categories:

Aggregate Functions (Cardinality Estimation) , Window Functions

APPROX_COUNT_DISTINCT¶

Uses HyperLogLog to return an approximation of the distinct cardinality of the input (i.e. HLL(col1, col2, ... ) returns an approximation of COUNT(DISTINCT col1, col2, ... )).

Aliases:

HLL.

Syntax¶

Aggregate function

APPROX_COUNT_DISTINCT( [ DISTINCT ] <expr1>  [ , ... ] )

APPROX_COUNT_DISTINCT(*)


Window function

APPROX_COUNT_DISTINCT( [ DISTINCT ] <expr1>  [ , ... ] ) OVER ( [ PARTITION BY <expr2> ] )

APPROX_COUNT_DISTINCT(*) OVER ( [ PARTITION BY <expr2> ] )


Arguments¶

expr1

This is the expression for which you want to know the number of distinct values.

expr2

This is the optional expression used to group rows into partitions.

Returns¶

The data type of the returned value is INTEGER.

Usage Notes¶

• Although the computation is an approximation, it is deterministic. When this function is called with the same input data, this function returns the same results.

• For information about NULL values and aggregate functions, see Aggregate Functions and NULL Values.

• When used as a window function:

• This function does not support:

• ORDER BY sub-clause in the OVER() clause.

• Window frames.

Examples¶

This example shows how to use APPROX_COUNT_DISTINCT and its alias HLL. This example calls both COUNT(DISTINCT i) and APPROX_COUNT_DISTINCT(i) to emphasize that the results of those two functions do not always match exactly.

The exact output of the following query might vary because APPROX_COUNT_DISTINCT() returns an approximation, not an exact value.

SELECT COUNT(i), COUNT(DISTINCT i), APPROX_COUNT_DISTINCT(i), HLL(i)
FROM sequence_demo;
+----------+-------------------+--------------------------+--------+
| COUNT(I) | COUNT(DISTINCT I) | APPROX_COUNT_DISTINCT(I) | HLL(I) |
|----------+-------------------+--------------------------+--------|
|     1024 |              1024 |                     1007 |   1007 |
+----------+-------------------+--------------------------+--------+