TOP_INSIGHTS
Fully qualified name: SNOWFLAKE.ML.TOP_INSIGHTS
Finds the most important dimensions in a dataset, builds segments from those dimensions, and then detects which of those segments influenced the metric.
TOP_INSIGHTS is well-suited to extracting root causes from datasets that have a large number of dimensions. Continuous dimensions are also supported without pre-processing them into categorical dimensions, and the results can indicate dimensions with negative conditions (for example, “region is not North America”).
Syntax
SNOWFLAKE.ML.TOP_INSIGHTS(
<categorical_dimensions>, <continuous_dimensions>,
<metric>, <label> )
Arguments
categorical_dimensions
OBJECT containing a 1:1 mapping between dimension names and associated categorical columns. The value can be from a single column or derived from a simple combination of columns.
continuous_dimensions
OBJECT containing a 1:1 mapping between dimension names and associated continuous columns. The value can be from a single column or derived from a simple combination of columns. Values of continuous dimensions must not be NULL.
metric
FLOAT column representing the target metric under investigation. This value must be non-negative (zero or greater). The value can be from a single column or derived from a simple combination of columns.
label
BOOLEAN column that distinguishes between control and test data. TRUE represents test data, and FALSE represents control data. The value can be from a single column or derived from a simple combination of columns (for example, a date comparison).
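The four arguments above might be assembled as in the following minimal sketch. The table `input_table` and its columns (`ds`, `dim_country`, `dim_vertical`, `metric`), the dimension names, and the date ranges are all hypothetical. The `TABLE(... OVER (PARTITION BY 0))` invocation form is an assumption that is not shown in the Syntax section; confirm it against the Contribution Explorer example before relying on it.

```sql
-- Hypothetical source table (all names are illustrative):
--   input_table(ds DATE, dim_country VARCHAR, dim_vertical VARCHAR, metric NUMBER)
WITH input AS (
  SELECT
    -- categorical_dimensions: OBJECT mapping dimension name -> categorical column
    {
      'country': dim_country,
      'vertical': dim_vertical
    } AS categorical_dimensions,
    -- continuous_dimensions: OBJECT mapping dimension name -> continuous value,
    -- here derived from a column; the value must not be NULL
    {
      'length_of_vertical': LENGTH(dim_vertical)
    } AS continuous_dimensions,
    -- metric: cast to FLOAT; must be non-negative
    CAST(metric AS FLOAT) AS metric,
    -- label: TRUE = test data, FALSE = control data (a date comparison)
    (ds BETWEEN '2021-01-01' AND '2021-03-17') AS label
  FROM input_table
  -- Keep only rows that fall in the test period or the control period
  WHERE ds BETWEEN '2021-01-01' AND '2021-03-17'
     OR ds BETWEEN '2020-08-02' AND '2020-11-12'
)
SELECT res.*
FROM input,
  TABLE(
    SNOWFLAKE.ML.TOP_INSIGHTS(
      input.categorical_dimensions,
      input.continuous_dimensions,
      input.metric,
      input.label
    ) OVER (PARTITION BY 0)  -- assumed invocation form: treat all rows as one partition
  ) res;
```

Deriving `label` from the same date ranges used in the `WHERE` clause keeps the test and control definitions in one place, so every row that reaches the function is unambiguously one or the other.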
Output
The function returns the following columns:
| Column Name | Data Type | Description |
|---|---|---|
| | | ARRAY of strings that define a segment or insight from the algorithm. For example: `["not country = canada", "length_of_vertical <= 4.5", "vertical = finance"]` |
| | | The total value of the metric in the control period in a specific segment. |
| | | The total value of the metric in the test period in a specific segment. |
| | | The amount that |
| | | A quantification of how the metric in the specific segment changes across time periods compared to the overall metric across the same periods: |
| | | The total value of the metric in the control period across all segments. |
| | | The expected value of the metric in the test period, based on the relationship between |
| | | The total value of the metric in the control period across all segments. |
| | | The total value of the metric in the test period across all segments. |
| | | The growth rate between the control and test periods across all segments, defined as |
| | | Indicates whether the specific segment is new in the test data. |
| | | Indicates whether the specific segment is missing in the test data. |
Usage Notes
Metrics must be non-negative.
Input data must be restricted to rows that belong to either the test data or the control data; filter out any other rows before calling the function.
Runtime scales with the number of dimensions and the cardinality of those dimensions.
Cardinality of categorical dimensions is automatically reduced when their cardinality exceeds 5.
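These constraints can be checked before calling the function. The following is a minimal pre-flight sketch in plain Snowflake SQL, not part of TOP_INSIGHTS itself. The table `input_table`, its columns, the derived continuous value `LENGTH(dim_vertical)`, and the date ranges are hypothetical.

```sql
-- Hypothetical source table (all names are illustrative):
--   input_table(ds DATE, dim_country VARCHAR, dim_vertical VARCHAR, metric NUMBER)
SELECT
  -- Metrics must be non-negative: this count should be 0
  COUNT_IF(metric < 0)                   AS negative_metric_rows,
  -- Continuous dimension values must not be NULL: this count should be 0
  COUNT_IF(LENGTH(dim_vertical) IS NULL) AS null_continuous_rows,
  -- Categorical cardinality: values above 5 are reduced automatically,
  -- and higher cardinality increases runtime
  COUNT(DISTINCT dim_country)            AS country_cardinality,
  COUNT(DISTINCT dim_vertical)           AS vertical_cardinality
FROM input_table
-- Same restriction the function expects: test or control rows only
WHERE ds BETWEEN '2021-01-01' AND '2021-03-17'
   OR ds BETWEEN '2020-08-02' AND '2020-11-12';
```

A non-zero value in either of the first two columns points to rows that need to be filtered or corrected before the input meets the requirements listed above.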
Example
See the Contribution Explorer example.