- Categories:
Aggregate functions (Frequency Estimation) , Window function syntax and usage
APPROX_TOP_K_ESTIMATE¶
Returns the approximate most frequent values and their estimated frequency for the given Space-Saving state. (For more information about the Space-Saving summary, see Estimating Frequent Values.)
A Space-Saving state produced by APPROX_TOP_K_ACCUMULATE and APPROX_TOP_K_COMBINE can be used to compute a cardinality estimate using the APPROX_TOP_K_ESTIMATE function.
Thus, APPROX_TOP_K_ESTIMATE(APPROX_TOP_K_ACCUMULATE(…)) is equivalent to APPROX_TOP_K(…).
- See also:
APPROX_TOP_K , APPROX_TOP_K_ACCUMULATE , APPROX_TOP_K_COMBINE
Syntax¶
Arguments¶
stateAn expression that contains state information generated by a call to APPROX_TOP_K_ACCUMULATE or APPROX_TOP_K_COMBINE.
kThe number of values whose counts you want approximated. For example, if you want to see the top 10 most common values, then set
kto 10.If
kis omitted, the default is1.The maximum value is
100000(100,000), and is automatically reduced if items cannot fit in the output.
Returns¶
Returns a value of type ARRAY.
Usage notes¶
Decimal-float (DECFLOAT) values aren’t supported.
Examples¶
This example shows how to use the three related functions APPROX_TOP_K_ACCUMULATE, APPROX_TOP_K_ESTIMATE, and APPROX_TOP_K_COMBINE.
Note
This example uses more counters than distinct data values in order to get consistent results. In real-world applications, the number of distinct values is usually larger than the number of counters, so the approximations can vary.
This example generates one table with 8 rows that have values 1 - 8, and a second table with 8 rows that have values 5 - 12. Thus the most frequent values in the union of the two tables are the values 5-8, each of which has a count of 2.
Create a simple table and data:
Create a table that contains the “state” that represents the current
approximate Top K information for the table named sequence_demo:
Now create a second table and add data. (In a more realistic situation, the user could have loaded more data into the first table and divided the data into non-overlapping sets based on the time that the data was loaded.)
Get the “state” information for just the new data.
Combine the “state” information for the two batches of rows:
Get the approximate Top K value of the combined set of rows: