ANOMALY_DETECTION

Fully qualified name: SNOWFLAKE.ML.ANOMALY_DETECTION

Anomaly detection allows you to detect outliers in your time series data by using a machine learning algorithm. You use CREATE SNOWFLAKE.ML.ANOMALY_DETECTION to create and train the detection model, and then use the <model_name>!DETECT_ANOMALIES method to detect anomalies.

CREATE SNOWFLAKE.ML.ANOMALY_DETECTION

Creates an object that you use to detect anomalies or replaces the anomaly detection object of the same name.

Syntax

CREATE [ OR REPLACE ] SNOWFLAKE.ML.ANOMALY_DETECTION <name>(
  [ SERIES_COLNAME => '<series_column_name>' , ]
  INPUT_DATA => <reference_to_training_data>,
  TIMESTAMP_COLNAME => '<timestamp_column_name>',
  TARGET_COLNAME => '<target_column_name>',
  LABEL_COLNAME => '<label_column_name>'
)
Copy

Parameters

name

Specifies the identifier (name) for the anomaly detector object; must be unique for the schema in which the object is created.

In addition, the identifier must start with an alphabetic character and cannot contain spaces or special characters unless the entire identifier string is enclosed in double quotes (for example, "My object"). Identifiers enclosed in double quotes are also case-sensitive. For more details, see Identifier Requirements.

Constructor Arguments

Required:

INPUT_DATA => reference_to_training_data

Specifies a reference to the table, view, or query that returns the training data for the model.

To create this reference, call SYSTEM$REFERENCE or SYSTEM$QUERY_REFERENCE.

TIMESTAMP_COLNAME => 'timestamp_column_name'

Specifies the name of the column containing the timestamps (TIMESTAMP_NTZ) in the time series data.

TARGET_COLNAME => 'target_column_name'

Specifies the name of the column containing the data (NUMERIC or FLOAT) to analyze.

LABEL_COLNAME => 'label_column_name'

Specifies the name of the column containing the labels for the data. Labels are Boolean (true/false) values indicating whether a given row is a known anomaly.

If you do not have labeled data, pass an empty string for this argument.

Optional:

SERIES_COLNAME => 'series_column_name'

Name of the column containing the identifier for the series (for multiple time series). This column should be a VARIANT because it can be any kind of value or a combination of values from more than one column in an array.

Usage Notes

  • If the column names specified by the TIMESTAMP_COLNAME, TARGET_COLNAME, or LABEL_COLNAME arguments do not exist in the table, view, or query specified by the INPUT_DATA argument, an error occurs.

SHOW SNOWFLAKE.ML.ANOMALY_DETECTION

Lists all anomaly detection models.

Syntax

SHOW SNOWFLAKE.ML.ANOMALY_DETECTION [ LIKE <pattern> ];
                                    [ IN
                                      {
                                        ACCOUNT                  |

                                        DATABASE                 |
                                        DATABASE <database_name> |

                                        SCHEMA                   |
                                        SCHEMA <schema_name>     |
                                        <schema_name>
                                      }
                                    ]
Copy

Parameters

LIKE 'pattern'

Filters the command output by object name. The filter uses case-insensitive pattern matching with support for SQL wildcard characters (% and _).

For example, the following patterns return the same results:

... LIKE '%testing%' ...
... LIKE '%TESTING%' ...
[ IN ... ]

Optionally specifies the scope of the command. Specify one of the following:

ACCOUNT

Returns records for the entire account.

DATABASE, . DATABASE db_name

Returns records for the current database in use or for a specified database (db_name).

If you specify DATABASE without db_name and no database is in use, the keyword has no effect on the output.

SCHEMA, . SCHEMA schema_name, . schema_name

Returns records for the current schema in use or a specified schema (schema_name).

SCHEMA is optional if a database is in use or if you specify the fully qualified schema_name (for example, db.schema).

If no database is in use, specifying SCHEMA has no effect on the output.

Default: Depends on whether the session currently has a database in use:

  • Database: DATABASE is the default (that is, the command returns the objects you have privileges to view in the database).

  • No database: ACCOUNT is the default (that is, the command returns the objects you have privileges to view in your account).

Output

The command output provides model properties and metadata in the following columns:

Column

Description

created_on

Date and time when the model was created

name

Name of the model

database_name

Database in which the model is stored

schema_name

Schema in which the model is stored

current_version

The version of the model, currently 1

comment

Comment for the model

owner

The role that owns the model

DROP SNOWFLAKE.ML.ANOMALY_DETECTION

Removes the specified model from the current or specified schema.

Syntax

DROP SNOWFLAKE.ML.ANOMALY_DETECTION [IF EXISTS] <name>;
Copy

Parameters

name

Specifies the identifier for the model to drop. If the identifier contains spaces, special characters, or mixed-case characters, the entire string must be enclosed in double quotes. Identifiers enclosed in double quotes are also case-sensitive.

If the model identifier is not fully qualified (in the form of db_name.schema_name.name or schema_name.name)), the command looks for the model in the current schema for the session.

Usage Notes

  • Dropped models cannot be recovered; they must be recreated.

<model_name>!DETECT_ANOMALIES

Detects and reports on anomalies in the input data passed to the method. This is a method of the anomaly detector object that you create by executing the CREATE SNOWFLAKE.ML.ANOMALY_DETECTION command.

The method returns a table containing a row for each detected anomaly.

Syntax

<model_name>!DETECT_ANOMALIES(
  INPUT_DATA => <reference_to_data_to_analyze>
  , TIMESTAMP_COLNAME => '<timestamp_column_name>'
  , TARGET_COLNAME => '<target_column_name>'
  [ , CONFIG_OBJECT => <configuration_object> ]
  [ , SERIES_COLNAME => '<series_column_name>' ]
)
Copy

Note

model_name is the object that you create by executing the CREATE SNOWFLAKE.ML.ANOMALY_DETECTION command.

Arguments

Required:

INPUT_DATA => reference_to_data_to_analyze

Specifies a reference to the table, view, or query that returns the data to analyze.

TIMESTAMP_COLNAME => 'timestamp_column_name'

Specifies the name of the column containing the timestamps (TIMESTAMP_NTZ) in the time series data.

TARGET_COLNAME => 'target_column_name'

Specifies the name of the column containing the data to analyze (NUMERIC or FLOAT).

Optional:

SERIES_COLNAME => 'series_column_name'

Name of the column containing the identifier for the series (for multiple time series). This column should be a VARIANT because it can be any type of value or values from multiple columns in an array.

CONFIG_OBJECT => configuration_object

OBJECT specifying the configuration settings for this object.

You can use either an object constant or the OBJECT_CONSTRUCT function to construct this object.

In this object, you can set the following keys and values:

Key

Description of Value

'prediction_interval'

Value between 0 and 1 that specifies the percentage of the observations that should be marked as anomalies:

  • For less strict anomaly detection (that is, identifying fewer observations marked as anomalies), specify a higher value.

  • For more strict anomaly detection (that is, identifying more observations as anomalies), reduce this value.

Default value: 0.99 (1% of the data will be marked as anomalies.)

Output

The function returns the following columns:

Column Name

Data Type

Description

TS

TIMESTAMP_NTZ

The timestamps of the data

Y

FLOAT

The values for the time series

FORECAST

FLOAT

The predicted value at the timestamp

The function uses these values to create the prediction intervals that identify anomalous data points.

LOWER_BOUND

FLOAT

The lower bound of the value within the prediction interval

Values that are lower than this will be flagged as anomalies.

UPPER_BOUND

FLOAT

The upper bound of the value within the prediction interval

Values that are higher than this will be flagged as anomalies.

IS_ANOMALY

BOOLEAN

True if the value is an anomaly; False if not

PERCENTILE

FLOAT

The corresponding percentile of the prediction interval for the Y value

If the percentile is outside of ((1-prediction_interval)/2, (1-prediction_interval)/2 + prediction_interval, the value is not flagged as an anomaly.

If the prediction_interval field is not specified in the configuration object, the default prediction interval is 0.99. For example, if the percentile is 0.96, and the prediction interval is 0.95, the observed value will be flagged as an anomaly.

DISTANCE

FLOAT

The multiple of standard deviation from the FORECAST column (z-score)

Usage Notes

  • The columns for the data specified in the CREATE SNOWFLAKE.ML.ANOMALY_DETECTION command (in the INPUT_DATA constructor argument) must match the columns for the data specified in the INPUT_DATA argument of this method.

    For example, if you passed the SERIES_COLNAME argument to the CREATE SNOWFLAKE.ML.ANOMALY_DETECTION command, you must also pass the SERIES_COLNAME argument to this method. If you omitted the SERIES_COLNAME argument in the command, you must omit that argument when calling this method.

  • If the column names specified by the TIMESTAMP_COLNAME or TARGET_COLNAME arguments do not exist in the table, view, or query specified by the INPUT_DATA argument, an error occurs.

<model_name>!EXPLAIN_FEATURE_IMPORTANCE

Returns the relative feature importance for each feature used by the model.

Syntax

<model_name>!EXPLAIN_FEATURE_IMPORTANCE();
Copy

Output

Column

Type

Description

SERIES

VARIANT

Series value (only present if model was trained with multiple time series)

RANK

INTEGER

The importance rank of a feature for a specific series

FEATURE_NAME

VARCHAR

The name of the feature used to train the model aggregated_endogenous_features represents all features derived as transformations of your target variable.

IMPORTANCE_SCORE

FLOAT

The feature’s importance score: a value in [0, 1], with 0 being the lowest possible importance, and 1 the highest.

FEATURE_TYPE

VARCHAR

The source of the feature, one of:

  • user_provided

  • derived_from_timestamp

  • derived_from_endogenous

Examples

See Detecting Anomalies.