- Categories:
System functions (System Information)
SYSTEM$CLUSTERING_INFORMATION¶
Returns clustering information, including average clustering depth, for a table based on one or more columns in the table.
- See also:
Syntax¶
SYSTEM$CLUSTERING_INFORMATION( '<table_name>'
[ , { '( <expr1> [ , <expr2> ... ] )' | <number_of_errors> } ] )
Arguments¶
table_name
Table for which you want to return clustering information.
(expr1 [ , expr2 ... ])
Column names or expressions for which clustering information is returned:
For a table with no clustering key, this argument is required. If this argument is omitted, an error is returned.
For a table with a clustering key, this argument is optional; if the argument is omitted, Snowflake uses the defined clustering key to return clustering information.
Even if only one column name or expression is passed, it must be inside parentheses.
Note
You can use this argument to return clustering information for any columns in the table, regardless of whether a clustering key is defined for the table.
In other words, you can use this to help you decide what clustering to use in the future.
number_of_errors
Number of clustering errors returned by the function. If this argument is omitted, the 10 most recent errors are returned.
Usage notes¶
The second argument of the function specifies a column name/expression or a number of errors. You cannot include both arguments in a single function call.
The table name, column name, and expression are strings, and should be enclosed in single quotes.
Output¶
The function returns a value of type VARCHAR.
The returned string is in JSON format and contains the following name/value pairs:
cluster_by_keys
Columns in table used to return clustering information; can be any columns in the table.
notes
This column can contain suggestions to make clustering more efficient. For example, this field might contain a warning if the cardinality of the clustering column is extremely high.
This column can be empty.
For more information about how to cluster efficiently, see Strategies for Selecting Clustering Keys.
total_partition_count
Total number of micro-partitions that comprise the table.
total_constant_partition_count
Total number of micro-partitions for which the value of the specified columns have reached a constant state (i.e. the micro-partitions will not benefit significantly from reclustering). The number of constant micro-partitions in a table has an impact on pruning for queries. The higher the number, the more micro-partitions can be pruned from queries executed on the table, which has a corresponding impact on performance.
average_overlaps
Average number of overlapping micro-partitions for each micro-partition in the table. A high number indicates the table is not well-clustered.
average_depth
Average overlap depth of each micro-partition in the table. A high number indicates the table is not well-clustered.
This value is also returned by SYSTEM$CLUSTERING_DEPTH.
partition_depth_histogram
A histogram depicting the distribution of overlap depth for each micro-partition in the table. The histogram contains buckets with widths:
0
to16
with increments of1
.For buckets larger than
16
, increments of twice the width of the previous bucket (e.g.32
,64
,128
, …).
clustering_errors
An array of JSON objects, each with a
timestamp
anderror
name/value pair. Theerror
describes why automatic clustering was not able to recluster data.By default, the 10 most recent errors are returned in the array. To return more or fewer errors, specify a number as the second argument of the function.
For more information about micro-partition overlap and depth, and their impact on query pruning, see Understanding Snowflake Table Structures.
Examples¶
Return the 5 most recent clustering errors:
SELECT SYSTEM$CLUSTERING_INFORMATION('t1', 5);
Return the clustering information for a table using two columns in the table:
SELECT SYSTEM$CLUSTERING_INFORMATION('test2', '(col1, col3)');+--------------------------------------------------------------------+ | SYSTEM$CLUSTERING_INFORMATION('TEST2', '(COL1, COL3)') | |--------------------------------------------------------------------| | { | | "cluster_by_keys" : "LINEAR(COL1, COL3)", | | "total_partition_count" : 1156, | | "total_constant_partition_count" : 0, | | "average_overlaps" : 117.5484, | | "average_depth" : 64.0701, | | "partition_depth_histogram" : { | | "00000" : 0, | | "00001" : 0, | | "00002" : 3, | | "00003" : 3, | | "00004" : 4, | | "00005" : 6, | | "00006" : 3, | | "00007" : 5, | | "00008" : 10, | | "00009" : 5, | | "00010" : 7, | | "00011" : 6, | | "00012" : 8, | | "00013" : 8, | | "00014" : 9, | | "00015" : 8, | | "00016" : 6, | | "00032" : 98, | | "00064" : 269, | | "00128" : 698 | | }, | | "clustering_errors" : [ { | | "timestamp" : "2023-04-03 17:50:42 +0000", | | "error" : "(003325) Clustering service has been disabled.\n" | | } | | ] | | } | +--------------------------------------------------------------------+This example indicates that the
test2
table is not well-clustered for the following reasons:
Zero (
0
) constant micro-partitions out of1156
total micro-partitions.High average of overlapping micro-partitions.
High average of overlap depth across micro-partitions.
Most of the micro-partitions are grouped at the lower-end of the histogram, with the majority of micro-partitions having an overlap depth between
64
and128
.Automatic clustering was previously disabled.
Limitations¶
If a table has more than 2 million partitions:
The results of the function are based on a subset of the table’s partitions.
The value of the output’s
total_partition_count
field is 2 million.