ANOMALY_DETECTION¶
Fully qualified name: SNOWFLAKE.ML.ANOMALY_DETECTION
Anomaly detection allows you to detect outliers in your time series data by using a machine learning algorithm. You use CREATE SNOWFLAKE.ML.ANOMALY_DETECTION to create and train the detection model, and then use the <model_name>!DETECT_ANOMALIES method to detect anomalies.
CREATE SNOWFLAKE.ML.ANOMALY_DETECTION¶
Creates an object that you use to detect anomalies or replaces the anomaly detection object of the same name.
Syntax¶
CREATE [ OR REPLACE ] SNOWFLAKE.ML.ANOMALY_DETECTION <name>(
[ SERIES_COLNAME => '<series_column_name>' , ]
INPUT_DATA => <reference_to_training_data>,
TIMESTAMP_COLNAME => '<timestamp_column_name>',
TARGET_COLNAME => '<target_column_name>',
LABEL_COLNAME => '<label_column_name>'
)
Parameters¶
name
Specifies the identifier (name) for the anomaly detector object; must be unique for the schema in which the object is created.
In addition, the identifier must start with an alphabetic character and cannot contain spaces or special characters unless the entire identifier string is enclosed in double quotes (for example,
"My object"
). Identifiers enclosed in double quotes are also case-sensitive. For more details, see Identifier Requirements.
Constructor Arguments¶
Required:
INPUT_DATA => reference_to_training_data
Specifies a reference to the table, view, or query that returns the training data for the model.
To create this reference, call SYSTEM$REFERENCE or SYSTEM$QUERY_REFERENCE.
TIMESTAMP_COLNAME => 'timestamp_column_name'
Specifies the name of the column containing the timestamps (TIMESTAMP_NTZ) in the time series data.
TARGET_COLNAME => 'target_column_name'
Specifies the name of the column containing the data (NUMERIC or FLOAT) to analyze.
LABEL_COLNAME => 'label_column_name'
Specifies the name of the column containing the labels for the data. Labels are Boolean (true/false) values indicating whether a given row is a known anomaly.
If you do not have labeled data, pass an empty string for this argument.
Optional:
SERIES_COLNAME => 'series_column_name'
Name of the column containing the identifier for the series (for multiple time series). This column should be a VARIANT because it can be any kind of value or a combination of values from more than one column in an array.
Usage Notes¶
If the column names specified by the TIMESTAMP_COLNAME, TARGET_COLNAME, or LABEL_COLNAME arguments do not exist in the table, view, or query specified by the INPUT_DATA argument, an error occurs.
SHOW SNOWFLAKE.ML.ANOMALY_DETECTION¶
Lists all anomaly detection models.
Syntax¶
SHOW SNOWFLAKE.ML.ANOMALY_DETECTION [ LIKE <pattern> ];
[ IN
{
ACCOUNT |
DATABASE |
DATABASE <database_name> |
SCHEMA |
SCHEMA <schema_name> |
<schema_name>
}
]
Parameters¶
LIKE 'pattern'
Filters the command output by object name. The filter uses case-insensitive pattern matching with support for SQL wildcard characters (
%
and_
).For example, the following patterns return the same results:
... LIKE '%testing%' ...
... LIKE '%TESTING%' ...
[ IN ... ]
Optionally specifies the scope of the command. Specify one of the following:
ACCOUNT
Returns records for the entire account.
DATABASE
, .DATABASE db_name
Returns records for the current database in use or for a specified database (
db_name
).If you specify
DATABASE
withoutdb_name
and no database is in use, the keyword has no effect on the output.SCHEMA
, .SCHEMA schema_name
, .schema_name
Returns records for the current schema in use or a specified schema (
schema_name
).SCHEMA
is optional if a database is in use or if you specify the fully qualifiedschema_name
(for example,db.schema
).If no database is in use, specifying
SCHEMA
has no effect on the output.
Default: Depends on whether the session currently has a database in use:
Database:
DATABASE
is the default (that is, the command returns the objects you have privileges to view in the database).No database:
ACCOUNT
is the default (that is, the command returns the objects you have privileges to view in your account).
Output¶
The command output provides model properties and metadata in the following columns:
Column |
Description |
---|---|
created_on |
Date and time when the model was created |
name |
Name of the model |
database_name |
Database in which the model is stored |
schema_name |
Schema in which the model is stored |
current_version |
The version of the model, currently 1 |
comment |
Comment for the model |
owner |
The role that owns the model |
DROP SNOWFLAKE.ML.ANOMALY_DETECTION¶
Removes the specified model from the current or specified schema.
Syntax¶
DROP SNOWFLAKE.ML.ANOMALY_DETECTION [IF EXISTS] <name>;
Parameters¶
name
Specifies the identifier for the model to drop. If the identifier contains spaces, special characters, or mixed-case characters, the entire string must be enclosed in double quotes. Identifiers enclosed in double quotes are also case-sensitive.
If the model identifier is not fully qualified (in the form of
db_name.schema_name.name
orschema_name.name
)), the command looks for the model in the current schema for the session.
Usage Notes¶
Dropped models cannot be recovered; they must be recreated.
<model_name>!DETECT_ANOMALIES¶
Detects and reports on anomalies in the input data passed to the method. This is a method of the anomaly detector object that you create by executing the CREATE SNOWFLAKE.ML.ANOMALY_DETECTION command.
The method returns a table containing a row for each detected anomaly.
Syntax¶
<model_name>!DETECT_ANOMALIES(
INPUT_DATA => <reference_to_data_to_analyze>
, TIMESTAMP_COLNAME => '<timestamp_column_name>'
, TARGET_COLNAME => '<target_column_name>'
[ , CONFIG_OBJECT => <configuration_object> ]
[ , SERIES_COLNAME => '<series_column_name>' ]
)
Note
model_name
is the object that you create by executing the CREATE SNOWFLAKE.ML.ANOMALY_DETECTION command.
Arguments¶
Required:
INPUT_DATA => reference_to_data_to_analyze
Specifies a reference to the table, view, or query that returns the data to analyze.
TIMESTAMP_COLNAME => 'timestamp_column_name'
Specifies the name of the column containing the timestamps (TIMESTAMP_NTZ) in the time series data.
TARGET_COLNAME => 'target_column_name'
Specifies the name of the column containing the data to analyze (NUMERIC or FLOAT).
Optional:
SERIES_COLNAME => 'series_column_name'
Name of the column containing the identifier for the series (for multiple time series). This column should be a VARIANT because it can be any type of value or values from multiple columns in an array.
CONFIG_OBJECT => configuration_object
OBJECT specifying the configuration settings for this object.
You can use either an object constant or the OBJECT_CONSTRUCT function to construct this object.
In this object, you can set the following keys and values:
Key
Description of Value
'prediction_interval'
Value between 0 and 1 that specifies the percentage of the observations that should be marked as anomalies:
For less strict anomaly detection (that is, identifying fewer observations marked as anomalies), specify a higher value.
For more strict anomaly detection (that is, identifying more observations as anomalies), reduce this value.
Default value: 0.99 (1% of the data will be marked as anomalies.)
Output¶
The function returns the following columns:
Column Name |
Data Type |
Description |
---|---|---|
TS |
TIMESTAMP_NTZ |
The timestamps of the data |
Y |
FLOAT |
The values for the time series |
FORECAST |
FLOAT |
The predicted value at the timestamp The function uses these values to create the prediction intervals that identify anomalous data points. |
LOWER_BOUND |
FLOAT |
The lower bound of the value within the prediction interval Values that are lower than this will be flagged as anomalies. |
UPPER_BOUND |
FLOAT |
The upper bound of the value within the prediction interval Values that are higher than this will be flagged as anomalies. |
IS_ANOMALY |
BOOLEAN |
True if the value is an anomaly; False if not |
PERCENTILE |
FLOAT |
The corresponding percentile of the prediction interval for the Y value If the percentile is outside of ((1-prediction_interval)/2, (1-prediction_interval)/2 + prediction_interval, the value is not flagged as an anomaly. If the |
DISTANCE |
FLOAT |
The multiple of standard deviation from the FORECAST column (z-score) |
Usage Notes¶
The columns for the data specified in the CREATE SNOWFLAKE.ML.ANOMALY_DETECTION command (in the INPUT_DATA constructor argument) must match the columns for the data specified in the INPUT_DATA argument of this method.
For example, if you passed the SERIES_COLNAME argument to the CREATE SNOWFLAKE.ML.ANOMALY_DETECTION command, you must also pass the SERIES_COLNAME argument to this method. If you omitted the SERIES_COLNAME argument in the command, you must omit that argument when calling this method.
If the column names specified by the TIMESTAMP_COLNAME or TARGET_COLNAME arguments do not exist in the table, view, or query specified by the INPUT_DATA argument, an error occurs.
<model_name>!EXPLAIN_FEATURE_IMPORTANCE¶
Returns the relative feature importance for each feature used by the model.
Syntax¶
<model_name>!EXPLAIN_FEATURE_IMPORTANCE();
Output¶
Column |
Type |
Description |
---|---|---|
SERIES |
Series value (only present if model was trained with multiple time series) |
|
RANK |
The importance rank of a feature for a specific series |
|
FEATURE_NAME |
The name of the feature used to train the model |
|
IMPORTANCE_SCORE |
The feature’s importance score: a value in [0, 1], with 0 being the lowest possible importance, and 1 the highest. |
|
FEATURE_TYPE |
The source of the feature, one of:
|
Examples¶
See Detecting Anomalies.