ANOMALY_DETECTION

Fully qualified name: SNOWFLAKE.ML.ANOMALY_DETECTION

Anomaly detection allows you to detect outliers in your time series data by using a machine learning algorithm. You use CREATE SNOWFLAKE.ML.ANOMALY_DETECTION to create and train the detection model, and then use the <name>!DETECT_ANOMALIES method to detect anomalies.

Important

Legal notice. This Snowflake Cortex ML-Based function is powered by machine learning technology. Machine learning technology and results provided may be inaccurate, inappropriate, or biased. Decisions based on machine learning outputs, including those built into automatic pipelines, should have human oversight and review processes to ensure model-generated content is accurate. Snowflake Cortex ML-based function queries will be treated as any other SQL query and may be considered metadata.

Metadata. When you use Snowflake Cortex ML-Based functions, Snowflake logs generic error messages returned by an ML function, in addition to what is mentioned in Metadata Fields. These error logs help us troubleshoot issues that arise and improve these functions to serve you better.

CREATE SNOWFLAKE.ML.ANOMALY_DETECTION

Creates an object that you use to detect anomalies or replaces the anomaly detection object of the same name.

Syntax

CREATE [ OR REPLACE ] SNOWFLAKE.ML.ANOMALY_DETECTION <name>(
  INPUT_DATA => <reference_to_training_data>,
  [ SERIES_COLNAME => '<series_column_name>', ]
  TIMESTAMP_COLNAME => '<timestamp_column_name>',
  TARGET_COLNAME => '<target_column_name>',
  LABEL_COLNAME => '<label_column_name>',
  [ CONFIG_OBJECT => <config_object> ]
)
[ [ WITH ] TAG ( <tag_name> = '<tag_value>' [ , <tag_name> = '<tag_value>' , ... ] ) ]
[ COMMENT = '<string_literal>' ]
Copy

Parameters

name

Specifies the identifier (name) for the anomaly detector object; must be unique for the schema in which the object is created.

In addition, the identifier must start with an alphabetic character and cannot contain spaces or special characters unless the entire identifier string is enclosed in double quotes (for example, "My object"). Identifiers enclosed in double quotes are also case-sensitive. For more details, see Identifier requirements.

Constructor Arguments

Required:

INPUT_DATA => reference_to_training_data

Specifies a reference to the table, view, or query that returns the training data for the model.

To create this reference, call SYSTEM$REFERENCE or SYSTEM$QUERY_REFERENCE.

TIMESTAMP_COLNAME => 'timestamp_column_name'

Specifies the name of the column containing the timestamps (TIMESTAMP_NTZ) in the time series data.

TARGET_COLNAME => 'target_column_name'

Specifies the name of the column containing the data (NUMERIC or FLOAT) to analyze.

LABEL_COLNAME => 'label_column_name'

Specifies the name of the column containing the labels for the data. Labels are Boolean (true/false) values indicating whether a given row is a known anomaly. If you do not have labeled data, pass an empty string ('') for this argument.

Optional:

SERIES_COLNAME => 'series_column_name'

Name of the column containing the identifier for the series (for multiple time series). This column should be a VARIANT because it can be any kind of value or a combination of values from more than one column in an array.

CONFIG_OBJECT => config_object

An OBJECT containing key-value pairs used to configure the model training job.

Key

Type

Default

Description

on_error

STRING

'ABORT'

String (constant) that specifies the error handling method for training. This is most useful when training multiple series. Supported values are:

  • 'abort': Abort training if an error is encountered in any time series.

  • 'skip': Skip any time series where training encounters an error. This allows training to succeed for other time series. To see which series failed during model training, call the model’s <name>!SHOW_TRAINING_LOGS method.

Usage Notes

  • If the column names specified by the TIMESTAMP_COLNAME, TARGET_COLNAME, or LABEL_COLNAME arguments do not exist in the table, view, or query specified by the INPUT_DATA argument, an error occurs.

  • Replication of class instances is currently not supported.

SHOW SNOWFLAKE.ML.ANOMALY_DETECTION

Lists all anomaly detection models.

Syntax

SHOW SNOWFLAKE.ML.ANOMALY_DETECTION [ LIKE <pattern> ];
  [ IN
      {
        ACCOUNT                  |

        DATABASE                 |
        DATABASE <database_name> |

        SCHEMA                   |
        SCHEMA <schema_name>     |
        <schema_name>
      }
   ]
Copy

Parameters

LIKE 'pattern'

Optionally filters the command output by object name. The filter uses case-insensitive pattern matching, with support for SQL wildcard characters (% and _).

For example, the following patterns return the same results:

... LIKE '%testing%' ...
... LIKE '%TESTING%' ...

. Default: No value (no filtering is applied to the output).

[ IN ... ]

Optionally specifies the scope of the command. Specify one of the following:

ACCOUNT

Returns records for the entire account.

DATABASE, . DATABASE db_name

Returns records for the current database in use or for a specified database (db_name).

If you specify DATABASE without db_name and no database is in use, the keyword has no effect on the output.

SCHEMA, . SCHEMA schema_name, . schema_name

Returns records for the current schema in use or a specified schema (schema_name).

SCHEMA is optional if a database is in use or if you specify the fully qualified schema_name (for example, db.schema).

If no database is in use, specifying SCHEMA has no effect on the output.

Default: Depends on whether the session currently has a database in use:

  • Database: DATABASE is the default (that is, the command returns the objects you have privileges to view in the database).

  • No database: ACCOUNT is the default (that is, the command returns the objects you have privileges to view in your account).

Output

Model properties and metadata in the following columns:

Column

Description

created_on

Date and time when the model was created

name

Name of the model

database_name

Database in which the model is stored

schema_name

Schema in which the model is stored

current_version

The version of the model algorithm

comment

Comment for the model

owner

The role that owns the model

DROP SNOWFLAKE.ML.ANOMALY_DETECTION

Removes the specified model from the current or specified schema. Dropped models cannot be recovered; they must be recreated.

Syntax

DROP SNOWFLAKE.ML.ANOMALY_DETECTION [IF EXISTS] <name>;
Copy

Parameters

name

Specifies the identifier for the model to drop. If the identifier contains spaces, special characters, or mixed-case characters, the entire string must be enclosed in double quotes. Identifiers enclosed in double quotes are also case-sensitive.

If the model identifier is not fully qualified (in the form of db_name.schema_name.name or schema_name.name)), the command looks for the model in the current schema for the session.

<name>!DETECT_ANOMALIES

Detects and reports on anomalies in the input data passed to the method. This is a method of the anomaly detector object that you create by executing the CREATE SNOWFLAKE.ML.ANOMALY_DETECTION command.

The method returns a table which labels each row of the input data as anomalous or not.

Syntax

<name>!DETECT_ANOMALIES(
  INPUT_DATA => <reference_to_data_to_analyze>,
  TIMESTAMP_COLNAME => '<timestamp_column_name>',
  TARGET_COLNAME => '<target_column_name>',
  [ CONFIG_OBJECT => <configuration_object>, ]
  [ SERIES_COLNAME => '<series_column_name>' ]
)
Copy

Note

name is the object that you create by executing the CREATE SNOWFLAKE.ML.ANOMALY_DETECTION command.

Arguments

Required:

INPUT_DATA => reference_to_data_to_analyze

A reference to the table, view, or query that returns the data to analyze.

TIMESTAMP_COLNAME => 'timestamp_column_name'

The name of the column containing the timestamps (TIMESTAMP_NTZ) in the time series data.

TARGET_COLNAME => 'target_column_name'

The name of the column containing the data to analyze (type NUMERIC or FLOAT).

Optional:

SERIES_COLNAME => 'series_column_name'

Name of the column containing the identifier for the series (for multiple time series). This column should be a VARIANT because it can be any type of value or values from multiple columns in an array.

CONFIG_OBJECT => config_object

An OBJECT containing key-value pairs used to configure the anomaly detection job.

Key

Type

Default

Description

prediction_interval

FLOAT

0.99

Value between 0 and 1 that specifies the percentage of the observations that should be marked as anomalies:

  • For less strict anomaly detection (that is, identifying fewer observations marked as anomalies), specify a higher value.

  • For more strict anomaly detection (that is, identifying more observations as anomalies), reduce this value.

on_error

STRING

'ABORT'

String (constant) that specifies the error handling for the anomaly detection task. This is most useful when detecting anomalies in multiple series. Supported values are:

  • 'abort': Abort the operation if an error is encountered in any time series.

  • 'skip': Skip any time series where anomaly detection encounters an error. This allows anomaly detection to succeed for other time series. Series that failed are absent from the output.

Output

The function returns the following columns:

Column Name

Data Type

Description

TS

TIMESTAMP_NTZ

The timestamps of the data

Y

FLOAT

The values for the time series

FORECAST

FLOAT

The predicted value at the timestamp.

LOWER_BOUND

FLOAT

The lower bound of the value within the prediction interval. Values that are lower than this are flagged as anomalies.

UPPER_BOUND

FLOAT

The upper bound of the value within the prediction interval. Values that are higher than this are flagged as anomalies.

IS_ANOMALY

BOOLEAN

True if the value is an anomaly; False if not.

PERCENTILE

FLOAT

The corresponding percentile of the observed Y value given the prediction interval.

If the percentile is outside of ((1 - alpha) / 2, 1 - (1 - alpha) / 2), the value is flagged as an anomaly. For example, if the prediction interval is 0.95, a percentile of 0.96 would not be an anomaly, but a percentile of 0.98 would be.

If the prediction_interval field is not specified in the configuration object, the default is 0.99.

DISTANCE

FLOAT

The multiple of the standard deviation from the FORECAST column (z-score)

Usage Notes

  • The columns for the data specified in the CREATE SNOWFLAKE.ML.ANOMALY_DETECTION command (in the INPUT_DATA constructor argument) must match the columns for the data specified in the INPUT_DATA argument of this method.

    For example, if you passed the SERIES_COLNAME argument to the CREATE SNOWFLAKE.ML.ANOMALY_DETECTION command, you must also pass the SERIES_COLNAME argument to this method. If you omitted the SERIES_COLNAME argument in the command, you must omit that argument here.

  • If the column names specified by the TIMESTAMP_COLNAME or TARGET_COLNAME arguments do not exist in the table, view, or query specified by the INPUT_DATA argument, an error occurs.

<name>!EXPLAIN_FEATURE_IMPORTANCE

Returns the relative feature importance for each feature used by the model.

Syntax

<name>!EXPLAIN_FEATURE_IMPORTANCE();
Copy

Output

Column

Type

Description

SERIES

VARIANT

Series value (only present if model was trained with multiple time series)

RANK

INTEGER

The importance rank of a feature for a specific series

FEATURE_NAME

VARCHAR

The name of the feature used to train the model aggregated_endogenous_features represents all features derived as transformations of your target variable.

IMPORTANCE_SCORE

FLOAT

The feature’s importance score: a value in [0, 1], with 0 being the lowest possible importance, and 1 the highest.

FEATURE_TYPE

VARCHAR

The source of the feature, one of:

  • user_provided

  • derived_from_timestamp

  • derived_from_endogenous

<name>!SHOW_TRAINING_LOGS

Returns logs from model training. Output is non-NULL only when 'ON_ERROR' = 'SKIP' is set in the training CONFIG_OBJECT.

Syntax

<name>!SHOW_TRAINING_LOGS();
Copy

Output

The SERIES column is present only for multi-series models. Single-series models do not have this column.

Column

Type

Description

SERIES

VARIANT

Series value. Only present if model was trained with multiple time series.

LOGS

OBJECT

A log of errors encountered during training. The value for the key Errors is an array of training errors. If no errors were encountered, the LOGS column is NULL.

Examples

See Detecting Anomalies.