Categories:

Table Functions

TOP_INSIGHTS (SNOWFLAKE.ML)

Finds the most important dimensions in a dataset, builds segments from those dimensions, and then detects which of those segments influenced the metric.

TOP_INSIGHTS is well-suited to extracting root causes from datasets that have a large number of dimensions. Continuous dimensions are also supported without pre-processing them into categorical dimensions, and the results can indicate dimensions with negative conditions (for example, “region is not North America”).

Syntax

SNOWFLAKE.ML.TOP_INSIGHTS(
  <categorical_dimensions>, <continuous_dimensions>,
  <metric>, <label> )
Copy

Arguments

categorical_dimensions

OBJECT containing a 1:1 mapping between dimension names and associated categorical columns. The value can be from a single column or derived from a simple combination of columns.

continuous_dimensions

OBJECT containing a 1:1 mapping between dimension names and associated continuous columns. The value can be from a single column or derived from a simple combination of columns. Values of continuous dimensions must not be NULL.

metric

FLOAT column representing a target metric that is being investigated. This value must be strictly non-negative. The value can be from a single column or derived from a simple combination of columns.

label

BOOLEAN column that distinguishes between control and test data. TRUE represents test data, and FALSE represents control data. The value can be from a single column or derived from a simple combination of columns (for example, a date comparison).

Output

The function returns the following columns:

Column Name

Data Type

Description

contributor

ARRAY

ARRAY of strings that define a segment or insight from the algorithm.

For example:

[
  "not country = canada",
  "length_of_vertical <= 4.5",
  "vertical = finance"
]
Copy

metric_control

FLOAT

The total value of the metric in the control period in a specific segment.

metric_test

FLOAT

The total value of the metric in the test period in a specific segment.

surprise

FLOAT

The amount that metric_test exceeds its expected value based on the overall metric change across periods.

relative_change

FLOAT

A quantification of how the metric in the specific segment changes across time periods compared to the overall metric across the same periods:

  • A relative change higher than 1.0 means that this segment grew at a higher rate than the overall metric across time periods, driving growth.

  • A relative change lower than 1.0 means that this segment grew at a lower rate than the overall metric, driving decline.

growth_rate

FLOAT

The total value of the metric in the control period across all segments.

expected_metric_test

FLOAT

The expected value of the metric in the test period, based on the relationship between overall_metric_control and overall_metric_test.

overall_metric_control

FLOAT

The total value of the metric in the control period across all segments.

overall_metric_test

FLOAT

The total value of the metric in the test period across all segments.

overall_growth_rate

FLOAT

The growth rate between the control and test periods across all segments, defined as overall_metric_test divided by overall_metric_control.

new_in_test

BOOLEAN

Indicates whether the specific segment is new in the test data.

missing_in_test

BOOLEAN

Indicates whether the specific segment is missing in the test data.

Usage Notes

  • Metrics must be non-negative.

  • Input data must be restricted to only test or control data.

  • Runtime scales with the number of dimensions and the cardinality of those dimensions.

  • Cardinality of categorical dimensions is automatically reduced when their cardinality exceeds 5.

Example

See the Contribution Explorer example.