- Categories:
- System functions (System Information) 
SYSTEM$CLUSTERING_INFORMATION¶
Returns clustering information, including average clustering depth, for a table based on one or more columns in the table.
- See also:
Syntax¶
SYSTEM$CLUSTERING_INFORMATION( '<table_name>'
    [ , { '( <expr1> [ , <expr2> ... ] )' | <number_of_errors> } ] )
Arguments¶
- table_name
- Table for which you want to return clustering information. 
- (expr1 [ , expr2 ... ])
- Column names or expressions for which clustering information is returned: - For a table with no clustering key, this argument is required. If this argument is omitted, an error is returned. 
- For a table with a clustering key, this argument is optional; if the argument is omitted, Snowflake uses the defined clustering key to return clustering information. 
 - Even if only one column name or expression is passed, it must be inside parentheses. - Note - You can use this argument to return clustering information for any columns in the table, regardless of whether a clustering key is defined for the table. - In other words, you can use this to help you decide what clustering to use in the future. 
- number_of_errors
- Number of clustering errors returned by the function. If this argument is omitted, the 10 most recent errors are returned. 
Usage notes¶
- The second argument of the function specifies a column name/expression or a number of errors. You cannot include both arguments in a single function call. 
- The table name, column name, and expression are strings, and should be enclosed in single quotes. 
Returns¶
The function returns a value of type VARCHAR.
The returned string is in JSON format and contains the following name/value pairs:
- cluster_by_keys
- Columns in table used to return clustering information; can be any columns in the table. 
- notes
- This column can contain suggestions to make clustering more efficient. For example, this field might contain a warning if the cardinality of the clustering column is extremely high. - This column can be empty. - For more information about how to cluster efficiently, see Strategies for Selecting Clustering Keys. 
- total_partition_count
- Total number of micro-partitions that comprise the table. 
- total_constant_partition_count
- Total number of micro-partitions for which the value of the specified columns have reached a constant state (i.e. the micro-partitions will not benefit significantly from reclustering). The number of constant micro-partitions in a table has an impact on pruning for queries. The higher the number, the more micro-partitions can be pruned from queries executed on the table, which has a corresponding impact on performance. 
- average_overlaps
- Average number of overlapping micro-partitions for each micro-partition in the table. A high number indicates the table is not well-clustered. 
- average_depth
- Average overlap depth of each micro-partition in the table. A high number indicates the table is not well-clustered. - This value is also returned by SYSTEM$CLUSTERING_DEPTH. 
- partition_depth_histogram
- A histogram depicting the distribution of overlap depth for each micro-partition in the table. The histogram contains buckets with widths: - 0to- 16with increments of- 1.
- For buckets larger than - 16, increments of twice the width of the previous bucket (e.g.- 32,- 64,- 128, …).
 
- clustering_errors
- An array of JSON objects, each with a - timestampand- errorname/value pair. The- errordescribes why automatic clustering was not able to recluster data.- By default, the 10 most recent errors are returned in the array. To return more or fewer errors, specify a number as the second argument of the function. 
For more information about micro-partition overlap and depth, and their impact on query pruning, see Understanding Snowflake Table Structures.
Examples¶
Return the 5 most recent clustering errors:
SELECT SYSTEM$CLUSTERING_INFORMATION('t1', 5);
Return the clustering information for a table using two columns in the table:
SELECT SYSTEM$CLUSTERING_INFORMATION('test2', '(col1, col3)');+--------------------------------------------------------------------+ | SYSTEM$CLUSTERING_INFORMATION('TEST2', '(COL1, COL3)') | |--------------------------------------------------------------------| | { | | "cluster_by_keys" : "LINEAR(COL1, COL3)", | | "total_partition_count" : 1156, | | "total_constant_partition_count" : 0, | | "average_overlaps" : 117.5484, | | "average_depth" : 64.0701, | | "partition_depth_histogram" : { | | "00000" : 0, | | "00001" : 0, | | "00002" : 3, | | "00003" : 3, | | "00004" : 4, | | "00005" : 6, | | "00006" : 3, | | "00007" : 5, | | "00008" : 10, | | "00009" : 5, | | "00010" : 7, | | "00011" : 6, | | "00012" : 8, | | "00013" : 8, | | "00014" : 9, | | "00015" : 8, | | "00016" : 6, | | "00032" : 98, | | "00064" : 269, | | "00128" : 698 | | }, | | "clustering_errors" : [ { | | "timestamp" : "2023-04-03 17:50:42 +0000", | | "error" : "(003325) Clustering service has been disabled.\n" | | } | | ] | | } | +--------------------------------------------------------------------+This example indicates that the
test2table is not well-clustered for the following reasons:
Zero (
0) constant micro-partitions out of1156total micro-partitions.
High average of overlapping micro-partitions.
High average of overlap depth across micro-partitions.
Most of the micro-partitions are grouped at the lower-end of the histogram, with the majority of micro-partitions having an overlap depth between
64and128.
Automatic clustering was previously disabled.
Limitations¶
If a table has more than 2 million partitions:
- The results of the function are based on a subset of the table’s partitions. 
- The value of the output’s - total_partition_countfield is 2 million.