Categories:

Aggregate Functions (Frequency Estimation) , Window Functions

# APPROX_TOP_K_ESTIMATE¶

Returns the approximate most frequent values and their estimated frequency for the given Space-Saving state. (For more information about the Space-Saving summary, see Estimating Frequent Values.)

A Space-Saving state produced by APPROX_TOP_K_ACCUMULATE and APPROX_TOP_K_COMBINE can be used to compute a cardinality estimate using the APPROX_TOP_K_ESTIMATE function.

Thus, APPROX_TOP_K_ESTIMATE(APPROX_TOP_K_ACCUMULATE(…)) is equivalent to APPROX_TOP_K(…).

## Syntax¶

APPROX_TOP_K_ESTIMATE( <state> [ , <k> ] )


## Arguments¶

state

An expression that contains state information generated by a call to APPROX_TOP_K_ACCUMULATE or APPROX_TOP_K_COMBINE.

k

The number of values whose counts you want approximated. For example, if you want to see the top 10 most common values, then set k to 10.

If k is omitted, the default is 1.

The maximum value is 100000 (100,000), and is automatically reduced if items cannot fit in the output.

## Returns¶

Returns a value of type ARRAY.

## Examples¶

This example shows how to use the three related functions APPROX_TOP_K_ACCUMULATE, APPROX_TOP_K_ESTIMATE, and APPROX_TOP_K_COMBINE.

Note

This example uses more counters than distinct data values in order to get consistent results. In real-world applications, the number of distinct values is usually larger than the number of counters, so the approximations can vary.

This example generates one table with 8 rows that have values 1 - 8, and a second table with 8 rows that have values 5 - 12. Thus the most frequent values in the union of the two tables are the values 5-8, each of which has a count of 2.

Create a simple table and data:

-- Create a sequence to use to generate values for the table.
CREATE OR REPLACE SEQUENCE seq91;
CREATE OR REPLACE TABLE sequence_demo (c1 INTEGER DEFAULT seq91.nextval, dummy SMALLINT);
INSERT INTO sequence_demo (dummy) VALUES (0);

-- Double the number of rows a few times, until there are 8 rows:
INSERT INTO sequence_demo (dummy) SELECT dummy FROM sequence_demo;
INSERT INTO sequence_demo (dummy) SELECT dummy FROM sequence_demo;
INSERT INTO sequence_demo (dummy) SELECT dummy FROM sequence_demo;


Create a table that contains the “state” that represents the current approximate Top K information for the table named sequence_demo:

CREATE OR REPLACE TABLE resultstate1 AS (
SELECT approx_top_k_accumulate(c1, 50) AS rs1
FROM sequence_demo);


Now create a second table and add data. (In a more realistic situation, the user could have loaded more data into the first table and divided the data into non-overlapping sets based on the time that the data was loaded.)

CREATE OR REPLACE TABLE test_table2 (c1 INTEGER);
-- Insert data.
INSERT INTO test_table2 (c1) SELECT c1 + 4 FROM sequence_demo;


Get the “state” information for just the new data.

CREATE OR REPLACE TABLE resultstate2 AS
(SELECT approx_top_k_accumulate(c1, 50) AS rs1
FROM test_table2);


Combine the “state” information for the two batches of rows:

CREATE OR REPLACE TABLE combined_resultstate (c1) AS
SELECT approx_top_k_combine(rs1) AS apc1
FROM (
SELECT rs1 FROM resultstate1
UNION ALL
SELECT rs1 FROM resultstate2
)
;


Get the approximate Top K value of the combined set of rows:

SELECT approx_top_k_estimate(c1, 4) FROM combined_resultstate;


Output:

+------------------------------+
| APPROX_TOP_K_ESTIMATE(C1, 4) |
|------------------------------|
| [                            |
|   [                          |
|     5,                       |
|     2                        |
|   ],                         |
|   [                          |
|     6,                       |
|     2                        |
|   ],                         |
|   [                          |
|     7,                       |
|     2                        |
|   ],                         |
|   [                          |
|     8,                       |
|     2                        |
|   ]                          |
| ]                            |
+------------------------------+