- Categories:
System functions (System Information)
SYSTEM$CLUSTERING_INFORMATION¶
Returns clustering information, including average clustering depth, for a table based on one or more columns in the table.
- See also:
Syntax¶
SYSTEM$CLUSTERING_INFORMATION( '<table_name>'
[ , { '( <expr1> [ , <expr2> ... ] )' | <number_of_errors> } ] )
Arguments¶
table_nameTable for which you want to return clustering information.
(expr1 [ , expr2 ... ])Column names or expressions for which clustering information is returned:
For a table with no clustering key, this argument is required. If this argument is omitted, an error is returned.
For a table with a clustering key, this argument is optional; if the argument is omitted, Snowflake uses the defined clustering key to return clustering information.
Even if only one column name or expression is passed, it must be inside parentheses.
Note
You can use this argument to return clustering information for any columns in the table, regardless of whether a clustering key is defined for the table.
In other words, you can use this to help you decide what clustering to use in the future.
number_of_errorsNumber of clustering errors returned by the function. If this argument is omitted, the 10 most recent errors are returned.
Usage notes¶
The second argument of the function specifies a column name/expression or a number of errors. You cannot include both arguments in a single function call.
The table name, column name, and expression are strings, and should be enclosed in single quotes.
Returns¶
The function returns a value of type VARCHAR.
The returned string is in JSON format and contains the following name/value pairs:
cluster_by_keysColumns in table used to return clustering information; can be any columns in the table.
notesThis column can contain suggestions to make clustering more efficient. For example, this field might contain a warning if the cardinality of the clustering column is extremely high.
This column can be empty.
For more information about how to cluster efficiently, see Strategies for Selecting Clustering Keys.
total_partition_countTotal number of micro-partitions that comprise the table.
total_constant_partition_countTotal number of micro-partitions for which the value of the specified columns have reached a constant state (i.e. the micro-partitions will not benefit significantly from reclustering). The number of constant micro-partitions in a table has an impact on pruning for queries. The higher the number, the more micro-partitions can be pruned from queries executed on the table, which has a corresponding impact on performance.
average_overlapsAverage number of overlapping micro-partitions for each micro-partition in the table. A high number indicates the table is not well-clustered.
average_depthAverage overlap depth of each micro-partition in the table. A high number indicates the table is not well-clustered.
This value is also returned by SYSTEM$CLUSTERING_DEPTH.
partition_depth_histogramA histogram depicting the distribution of overlap depth for each micro-partition in the table. The histogram contains buckets with widths:
0to16with increments of1.For buckets larger than
16, increments of twice the width of the previous bucket (e.g.32,64,128, …).
clustering_errorsAn array of JSON objects, each with a
timestampanderrorname/value pair. Theerrordescribes why automatic clustering was not able to recluster data.By default, the 10 most recent errors are returned in the array. To return more or fewer errors, specify a number as the second argument of the function.
For more information about micro-partition overlap and depth, and their impact on query pruning, see Understanding Snowflake Table Structures.
Examples¶
Return the 5 most recent clustering errors:
SELECT SYSTEM$CLUSTERING_INFORMATION('t1', 5);
Return the clustering information for a table using two columns in the table:
SELECT SYSTEM$CLUSTERING_INFORMATION('test2', '(col1, col3)');+--------------------------------------------------------------------+ | SYSTEM$CLUSTERING_INFORMATION('TEST2', '(COL1, COL3)') | |--------------------------------------------------------------------| | { | | "cluster_by_keys" : "LINEAR(COL1, COL3)", | | "total_partition_count" : 1156, | | "total_constant_partition_count" : 0, | | "average_overlaps" : 117.5484, | | "average_depth" : 64.0701, | | "partition_depth_histogram" : { | | "00000" : 0, | | "00001" : 0, | | "00002" : 3, | | "00003" : 3, | | "00004" : 4, | | "00005" : 6, | | "00006" : 3, | | "00007" : 5, | | "00008" : 10, | | "00009" : 5, | | "00010" : 7, | | "00011" : 6, | | "00012" : 8, | | "00013" : 8, | | "00014" : 9, | | "00015" : 8, | | "00016" : 6, | | "00032" : 98, | | "00064" : 269, | | "00128" : 698 | | }, | | "clustering_errors" : [ { | | "timestamp" : "2023-04-03 17:50:42 +0000", | | "error" : "(003325) Clustering service has been disabled.\n" | | } | | ] | | } | +--------------------------------------------------------------------+This example indicates that the
test2table is not well-clustered for the following reasons:
Zero (
0) constant micro-partitions out of1156total micro-partitions.High average of overlapping micro-partitions.
High average of overlap depth across micro-partitions.
Most of the micro-partitions are grouped at the lower-end of the histogram, with the majority of micro-partitions having an overlap depth between
64and128.Automatic clustering was previously disabled.
Limitations¶
If a table has more than 2 million partitions:
The results of the function are based on a subset of the table’s partitions.
The value of the output’s
total_partition_countfield is 2 million.