FORECAST

Fully-qualified name: SNOWFLAKE.ML.FORECAST

A forecast model produces a forecast for a single time series or for multiple time series. You use CREATE SNOWFLAKE.ML.FORECAST to create and train the forecasting model, then use the model’s <name>!FORECAST method to produce forecasts. The <name>!EXPLAIN_FEATURE_IMPORTANCE method reports how each feature in the training data influences the forecast. The <name>!SHOW_TRAINING_LOGS method returns error messages for any time series whose models failed to fit. The <name>!SHOW_EVALUATION_METRICS method returns evaluation metrics on out-of-sample data.

Important

Legal notice. This Snowflake Cortex ML-Based function is powered by machine learning technology. Machine learning technology and results provided may be inaccurate, inappropriate, or biased. Decisions based on machine learning outputs, including those built into automatic pipelines, should have human oversight and review processes to ensure model-generated content is accurate. Snowflake Cortex ML-based function queries will be treated as any other SQL query and may be considered metadata.

Metadata. When you use Snowflake Cortex ML-Based functions, Snowflake logs generic error messages returned by an ML function, in addition to what is mentioned in Metadata Fields. These error logs help us troubleshoot issues that arise and improve these functions to serve you better.

CREATE SNOWFLAKE.ML.FORECAST

Creates a new forecast model from the training data you provide or replaces the forecast model of the same name.

Syntax

CREATE [ OR REPLACE ] SNOWFLAKE.ML.FORECAST [ IF NOT EXISTS ] <name>(
  INPUT_DATA => <input_data>,
  [ SERIES_COLNAME => '<series_colname>', ]
  TIMESTAMP_COLNAME => '<timestamp_colname>',
  TARGET_COLNAME => '<target_colname>',
  [ CONFIG_OBJECT => <config_object> ]
)
[ [ WITH ] TAG ( <tag_name> = '<tag_value>' [ , <tag_name> = '<tag_value>' , ... ] ) ]
[ COMMENT = '<string_literal>' ]

Note

Using named arguments makes argument order irrelevant and results in more readable code. However, you can also use positional arguments, as in the following example:

CREATE SNOWFLAKE.ML.FORECAST <name>(
  <input_data>, '<series_colname>', '<timestamp_colname>', '<target_colname>'
);

Parameters

name

Specifies the identifier for the model; must be unique for the schema in which the model is created.

If the model identifier is not fully qualified (in the form of db_name.schema_name.name or schema_name.name), the command creates the model in the current schema for the session.

In addition, the identifier must start with an alphabetic character and cannot contain spaces or special characters unless the entire identifier string is enclosed in double quotes (for example, "My object"). Identifiers enclosed in double quotes are also case-sensitive.

For more details, see Identifier requirements.

Constructor Arguments

Required:

INPUT_DATA => input_data

A reference to the input data. Using a reference allows the training process, which runs with limited privileges, to use your privileges to access the data. You can use a reference to a table or a view if your data is already in that form, or you can use a query reference to provide the query to be executed to obtain the data.
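As a minimal sketch of the three ways to pass INPUT_DATA, assuming a hypothetical table daily_sales (columns sale_date and revenue), a hypothetical view daily_sales_v with the same columns, and a hypothetical model name sales_model:

```sql
-- daily_sales, daily_sales_v, and sales_model are hypothetical names.

-- Reference to a table.
CREATE OR REPLACE SNOWFLAKE.ML.FORECAST sales_model(
  INPUT_DATA => SYSTEM$REFERENCE('TABLE', 'daily_sales'),
  TIMESTAMP_COLNAME => 'sale_date',
  TARGET_COLNAME => 'revenue'
);

-- Reference to a view.
CREATE OR REPLACE SNOWFLAKE.ML.FORECAST sales_model(
  INPUT_DATA => SYSTEM$REFERENCE('VIEW', 'daily_sales_v'),
  TIMESTAMP_COLNAME => 'sale_date',
  TARGET_COLNAME => 'revenue'
);

-- Query reference: only the selected columns reach the training process.
CREATE OR REPLACE SNOWFLAKE.ML.FORECAST sales_model(
  INPUT_DATA => SYSTEM$QUERY_REFERENCE('SELECT sale_date, revenue FROM daily_sales'),
  TIMESTAMP_COLNAME => 'sale_date',
  TARGET_COLNAME => 'revenue'
);
```

A query reference is handy when the stored table has extra columns that should not be treated as exogenous variables.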

The referenced data is the entire training data consumed by the forecasting model. Any columns in input_data other than those named by timestamp_colname, target_colname, and series_colname are treated as exogenous variables (additional features). The order of the columns in the input data does not matter.

Your input data must have columns with appropriate types for your use case. See Examples for details on each use case.

Use Case

Columns and types

Single time series

Multiple time series

Single time series with exogenous variables

Multiple time series with exogenous variables

TIMESTAMP_COLNAME => 'timestamp_colname'

Name of the column containing the timestamps in input_data.

TARGET_COLNAME => 'target_colname'

Name of the column containing the target (dependent value) in input_data.

Optional:

SERIES_COLNAME => 'series_colname'

For multiple time series models, the name of the column defining the multiple time series in input_data. This column can be a value of any type, or an array of values from one or more other columns, as shown in Forecast on Multiple Series.

If you are providing arguments positionally, this must be the second argument.
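A sketch of a multi-series model whose series identifier is built from two columns, assuming a hypothetical table store_sales with columns store_id, item, sale_date, and units. The query reference constructs the array-valued series column and selects nothing else, so no exogenous variables are introduced:

```sql
-- store_sales and store_model are hypothetical names.
CREATE OR REPLACE SNOWFLAKE.ML.FORECAST store_model(
  INPUT_DATA => SYSTEM$QUERY_REFERENCE(
    'SELECT ARRAY_CONSTRUCT(store_id, item) AS store_item, sale_date, units FROM store_sales'
  ),
  SERIES_COLNAME => 'store_item',   -- each distinct [store_id, item] array is one series
  TIMESTAMP_COLNAME => 'sale_date',
  TARGET_COLNAME => 'units'
);
```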

CONFIG_OBJECT => config_object

An OBJECT containing key-value pairs used to configure the model training job.

Key

Type

Default

Description

on_error

STRING

'ABORT'

String (constant) that specifies the error handling method for the model training task. This is most useful when training multiple series. Supported values are:

  • 'abort': Abort the training operation if an error is encountered in any time series.

  • 'skip': Skip any time series where training encounters an error. This allows model training to succeed for other time series. To see which series failed, use the model’s <name>!SHOW_TRAINING_LOGS method.

evaluate

BOOLEAN

TRUE

Whether evaluation metrics should be generated. If TRUE, then additional models are trained for cross-validation using the parameters in the evaluation_config.

evaluation_config

OBJECT

See Evaluation Configuration below.

An optional config object that specifies how out-of-sample evaluation metrics are generated.
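For illustration, a CONFIG_OBJECT can be written as an OBJECT constant. This sketch assumes a hypothetical view store_sales_v that already contains a store_item series column; it keeps training going when individual series fail and turns off evaluation, so the extra cross-validation models are not trained:

```sql
-- store_sales_v and store_model are hypothetical names.
CREATE OR REPLACE SNOWFLAKE.ML.FORECAST store_model(
  INPUT_DATA => SYSTEM$REFERENCE('VIEW', 'store_sales_v'),
  SERIES_COLNAME => 'store_item',
  TIMESTAMP_COLNAME => 'sale_date',
  TARGET_COLNAME => 'units',
  CONFIG_OBJECT => {'on_error': 'skip', 'evaluate': FALSE}
);
```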

Evaluation Configuration

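The evaluation_config object contains key-value pairs that configure cross-validation. The n_splits, max_train_size, test_size, and gap parameters correspond to the parameters of scikit-learn’s TimeSeriesSplit; prediction_interval sets the interval used when computing the interval metrics.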

Key

Type

Default

Description

n_splits

INTEGER

5

Number of splits.

max_train_size

INTEGER or NULL (no maximum).

NULL

Maximum size for a single training set.

test_size

INTEGER or NULL.

NULL

Used to limit the size of the test set.

gap

INTEGER

0

Number of samples to exclude from the end of each training set before the test set.

prediction_interval

FLOAT

0.95

The prediction interval used in calculating interval metrics.
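A sketch of a nested evaluation_config, assuming the same hypothetical daily_sales table and sales_model name used for illustration elsewhere; the values are arbitrary and only show where each key goes:

```sql
-- daily_sales and sales_model are hypothetical names; the values are illustrative.
CREATE OR REPLACE SNOWFLAKE.ML.FORECAST sales_model(
  INPUT_DATA => SYSTEM$REFERENCE('TABLE', 'daily_sales'),
  TIMESTAMP_COLNAME => 'sale_date',
  TARGET_COLNAME => 'revenue',
  CONFIG_OBJECT => {
    'evaluate': TRUE,
    'evaluation_config': {
      'n_splits': 3,              -- three cross-validation folds
      'gap': 7,                   -- drop the last 7 rows of each training set before its test set
      'prediction_interval': 0.9  -- interval metrics use a 90% interval
    }
  }
);
```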

Usage Notes

Replication of class instances is currently not supported.

SHOW SNOWFLAKE.ML.FORECAST

Lists all forecasting models.

Syntax

SHOW SNOWFLAKE.ML.FORECAST [ LIKE <pattern> ]
  [ IN
      {
        ACCOUNT                  |

        DATABASE                 |
        DATABASE <database_name> |

        SCHEMA                   |
        SCHEMA <schema_name>     |
        <schema_name>
      }
   ]

Parameters

LIKE 'pattern'

Optionally filters the command output by object name. The filter uses case-insensitive pattern matching, with support for SQL wildcard characters (% and _).

For example, the following patterns return the same results:

... LIKE '%testing%' ...
... LIKE '%TESTING%' ...

Default: No value (no filtering is applied to the output).

[ IN ... ]

Optionally specifies the scope of the command. Specify one of the following:

ACCOUNT

Returns records for the entire account.

DATABASE, DATABASE db_name

Returns records for the current database in use or for a specified database (db_name).

If you specify DATABASE without db_name and no database is in use, the keyword has no effect on the output.

SCHEMA, SCHEMA schema_name, schema_name

Returns records for the current schema in use or a specified schema (schema_name).

SCHEMA is optional if a database is in use or if you specify the fully qualified schema_name (for example, db.schema).

If no database is in use, specifying SCHEMA has no effect on the output.

Default: Depends on whether the session currently has a database in use:

  • Database: DATABASE is the default (that is, the command returns the objects you have privileges to view in the database).

  • No database: ACCOUNT is the default (that is, the command returns the objects you have privileges to view in your account).
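Two illustrative SHOW commands, assuming a hypothetical database my_db and schema my_schema:

```sql
-- Forecast models whose names contain "sales" (case-insensitive) in one schema.
-- my_db.my_schema is a hypothetical schema.
SHOW SNOWFLAKE.ML.FORECAST LIKE '%sales%' IN SCHEMA my_db.my_schema;

-- Every forecast model you have privileges to view in the account.
SHOW SNOWFLAKE.ML.FORECAST IN ACCOUNT;
```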

Output

The command output provides model properties and metadata in the following columns:

Column

Description

created_on

Date and time when the model was created

name

Name of the model

database_name

Database in which the model is stored

schema_name

Schema in which the model is stored

current_version

The version of the model algorithm

comment

Comment for the model

owner

The role that owns the model

DROP SNOWFLAKE.ML.FORECAST

Removes the specified model from the current or specified schema. Dropped models cannot be recovered; they must be recreated.

Syntax

DROP SNOWFLAKE.ML.FORECAST [ IF EXISTS ] <name>;

Parameters

name

Specifies the identifier for the model to drop. If the identifier contains spaces, special characters, or mixed-case characters, the entire string must be enclosed in double quotes. Identifiers enclosed in double quotes are also case-sensitive.

If the model identifier is not fully qualified (in the form of db_name.schema_name.name or schema_name.name), the command looks for the model in the current schema for the session.
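For example, dropping a model by its fully qualified name (my_db.my_schema.sales_model is hypothetical); IF EXISTS makes the statement succeed even when no such model exists:

```sql
-- my_db.my_schema.sales_model is a hypothetical model name.
DROP SNOWFLAKE.ML.FORECAST IF EXISTS my_db.my_schema.sales_model;
```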

<name>!FORECAST

Generates a forecast using the previously trained model <name>.

Syntax

The required arguments vary depending on what use case the model was trained for.

For single-series models without exogenous variables:

<name>!FORECAST(
  FORECASTING_PERIODS => <forecasting_periods>,
  [ CONFIG_OBJECT => <config_object> ]
);

For single-series models with exogenous variables:

<name>!FORECAST(
  INPUT_DATA => <input_data>,
  TIMESTAMP_COLNAME => '<timestamp_colname>',
  [ CONFIG_OBJECT => <config_object> ]
);

For multiple-series models without exogenous variables:

<name>!FORECAST(
  SERIES_VALUE => <series>,
  FORECASTING_PERIODS => <forecasting_periods>,
  [ CONFIG_OBJECT => <config_object> ]
);

For multiple-series models with exogenous variables:

<name>!FORECAST(
  SERIES_VALUE => <series>,
  SERIES_COLNAME => '<series_colname>',
  INPUT_DATA => <input_data>,
  TIMESTAMP_COLNAME => '<timestamp_colname>',
  [ CONFIG_OBJECT => <config_object> ]
);

Arguments

Required:

Not all of the following arguments are required for every use case.

FORECASTING_PERIODS => forecasting_periods

Required for forecasts without exogenous variables.

The number of steps ahead to forecast. The interval between steps is inferred by the model during training.
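A minimal sketch, assuming a hypothetical single-series model sales_model trained on daily data, so that seven steps correspond to seven days. The second statement saves the method's output to a hypothetical table and must run immediately after the CALL in the same session:

```sql
-- sales_model and sales_forecast are hypothetical names.
CALL sales_model!FORECAST(FORECASTING_PERIODS => 7);

-- Persist the result of the previous statement (TS, FORECAST, LOWER_BOUND, UPPER_BOUND).
CREATE OR REPLACE TABLE sales_forecast AS
  SELECT * FROM TABLE(RESULT_SCAN(LAST_QUERY_ID()));
```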

INPUT_DATA => input_data

Required for forecasts with exogenous variables.

A reference to a table, view, or query that contains the future timestamps and values of the exogenous variables (additional user-provided features) that were passed as input_data when training the model. Using a reference allows the forecasting process, which runs with limited privileges, to use your privileges to access the data. Columns are matched between this argument and the original exogenous training data by name.

TIMESTAMP_COLNAME => 'timestamp_colname'

Required for forecasts with exogenous variables.

The name of the column in input_data containing the timestamps.
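A sketch of a single-series forecast with exogenous variables, assuming a hypothetical model promo_model trained on sale_date, revenue, and an exogenous column on_promotion, and a hypothetical table future_promotions holding future sale_date and on_promotion values. FORECASTING_PERIODS is not passed, so the forecast covers the future timestamps supplied in INPUT_DATA:

```sql
-- promo_model and future_promotions are hypothetical names.
-- future_promotions must contain an on_promotion column, matched by name to the training data.
CALL promo_model!FORECAST(
  INPUT_DATA => SYSTEM$REFERENCE('TABLE', 'future_promotions'),
  TIMESTAMP_COLNAME => 'sale_date'
);
```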

SERIES_COLNAME => 'series_colname'

Required for multi-series forecasts with exogenous variables.

The name of the column in input_data specifying the series.
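A sketch of the multi-series variant, assuming a hypothetical model store_promo_model trained with series column store_item, and a hypothetical table future_store_promotions containing store_item, sale_date, and the future values of the exogenous columns:

```sql
-- store_promo_model and future_store_promotions are hypothetical names.
CALL store_promo_model!FORECAST(
  INPUT_DATA => SYSTEM$REFERENCE('TABLE', 'future_store_promotions'),
  SERIES_COLNAME => 'store_item',
  TIMESTAMP_COLNAME => 'sale_date'
);
```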

SERIES_VALUE => series

Applies only to multi-series forecasts.

The time series to forecast. Can be a single value (e.g., 'Series A'::variant) or a VARIANT, but must specify a series that the model has been trained on. If not specified, all trained series are predicted.
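Some illustrative SERIES_VALUE calls, assuming two hypothetical models: store_model, whose series identifier is a single value, and store_item_model, whose identifier is an array built from two columns:

```sql
-- store_model and store_item_model are hypothetical names.

-- One series identified by a single value.
CALL store_model!FORECAST(SERIES_VALUE => 'store_1'::VARIANT, FORECASTING_PERIODS => 7);

-- One series identified by an array of values.
CALL store_item_model!FORECAST(SERIES_VALUE => [1, 'umbrella'], FORECASTING_PERIODS => 7);

-- No SERIES_VALUE: every trained series is forecast.
CALL store_model!FORECAST(FORECASTING_PERIODS => 7);
```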

Optional:

CONFIG_OBJECT => config_object

An OBJECT containing key-value pairs used to configure the forecast job.

Key

Type

Default

Description

prediction_interval

FLOAT

0.95

A value greater than or equal to 0.0 and less than 1.0. The default value of 0.95 means 95% of future points are expected to fall within the interval [lower_bound, upper_bound] from the forecast result.

on_error

STRING

'ABORT'

String (constant) specifying the error handling method. This is most useful when forecasting multiple series. Supported values are:

  • 'abort': Abort the model forecasting operation if an error is encountered in any time series.

  • 'skip': Skip any time series where forecasting encounters an error. This allows forecasting to succeed for other time series. Series that failed are absent from the model output.
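For example, to request narrower 80% intervals and skip series that fail to forecast (store_model is a hypothetical multi-series model):

```sql
-- store_model is a hypothetical model name.
CALL store_model!FORECAST(
  FORECASTING_PERIODS => 7,
  CONFIG_OBJECT => {'prediction_interval': 0.8, 'on_error': 'skip'}
);
```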

Output

The SERIES column is present only for multi-series forecasts. Single-series forecasts do not have this column.

Column

Type

Description

SERIES

VARIANT

Series value (if model was trained with multiple time series).

TS

TIMESTAMP_NTZ

Timestamp.

FORECAST

FLOAT

Forecast target value.

LOWER_BOUND

FLOAT

Lower boundary of prediction interval.

UPPER_BOUND

FLOAT

Upper boundary of prediction interval.

<name>!EXPLAIN_FEATURE_IMPORTANCE

Returns the relative feature importance for each feature used by the model.

Syntax

<name>!EXPLAIN_FEATURE_IMPORTANCE();

Output

The SERIES column is present only for multi-series forecasts. Single-series forecasts do not have this column.

Column

Type

Description

SERIES

VARIANT

Series value (if model was trained with multiple time series).

RANK

INTEGER

The importance rank of a feature for a particular series.

FEATURE_NAME

VARCHAR

The name of the feature used to train the model. aggregated_endogenous_features represents all features derived as transformations of the target variable.

IMPORTANCE_SCORE

FLOAT

The feature’s importance score: a value in [0, 1], with 0 being the lowest possible importance, and 1 the highest.

FEATURE_TYPE

VARCHAR

The source of the feature. One of:

  • user_provided: Feature data provided by the user.

  • derived_from_timestamp: Periodic feature (e.g. day, week, or month) derived from timestamp data.

  • derived_from_endogenous: Features derived from a transformation of the target variable.
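A sketch that keeps only the three highest-ranked features, assuming a hypothetical single-series model sales_model. For a multi-series model, RANK applies per series, so the filter returns the top three for each series:

```sql
-- sales_model is a hypothetical model name.
CALL sales_model!EXPLAIN_FEATURE_IMPORTANCE();

-- Filter the result of the previous statement.
SELECT rank, feature_name, importance_score, feature_type
  FROM TABLE(RESULT_SCAN(LAST_QUERY_ID()))
  WHERE rank <= 3
  ORDER BY rank;
```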

<name>!SHOW_EVALUATION_METRICS

Returns out-of-sample evaluation metrics generated using time-series cross-validation. Metrics are available only if evaluate is TRUE in the CONFIG_OBJECT during model construction (the default).
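A minimal sketch, assuming a hypothetical model sales_model trained with the default evaluate setting. The second statement filters the previous result down to one metric:

```sql
-- sales_model is a hypothetical model name.
CALL sales_model!SHOW_EVALUATION_METRICS();

SELECT *
  FROM TABLE(RESULT_SCAN(LAST_QUERY_ID()))
  WHERE error_metric = 'COVERAGE_INTERVAL';
```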

Syntax

<name>!SHOW_EVALUATION_METRICS();

Output

The SERIES column is present only for multi-series forecasts. Single-series forecasts do not have this column.

Column

Type

Description

SERIES

VARIANT

Series value (only present if model was trained with multiple time series)

ERROR_METRIC

VARCHAR

The name of the error metric used. The method returns the following metrics:

Point Metrics:

Interval Metrics: These metrics use the prediction_interval argument from the Evaluation Configuration.

  • COVERAGE_INTERVAL: The proportion of actual values that fall within the prediction interval.

  • WINKLER_ALPHA: Winkler Score.

LOGS

VARIANT

Contains error or warning messages.
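For reference, the standard textbook definitions of the two interval metrics are given below for n test points with actual values y_t and interval bounds [l_t, u_t], where alpha is 1 minus prediction_interval. This page does not state exactly how Snowflake computes or aggregates them (for example, how the Winkler score is averaged across points and folds), so treat these as the usual definitions rather than the implementation. Coverage near the nominal level is desirable, and a lower Winkler score is better: it adds the interval width to a penalty proportional to how far an actual value falls outside the interval.

```latex
\text{COVERAGE\_INTERVAL} = \frac{1}{n}\sum_{t=1}^{n} \mathbf{1}\left[\, l_t \le y_t \le u_t \,\right]

W_t =
\begin{cases}
(u_t - l_t) + \dfrac{2}{\alpha}\,(l_t - y_t), & y_t < l_t \\
u_t - l_t, & l_t \le y_t \le u_t \\
(u_t - l_t) + \dfrac{2}{\alpha}\,(y_t - u_t), & y_t > u_t
\end{cases}
\qquad \alpha = 1 - \text{prediction\_interval}
```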

<name>!SHOW_TRAINING_LOGS

Returns logs from model training. The output is non-NULL only when on_error is set to 'skip' in the training CONFIG_OBJECT; with the default 'abort', an error in any series makes the entire training operation fail instead.

Syntax

<name>!SHOW_TRAINING_LOGS();

Output

The SERIES column is present only for multi-series models. Single-series models do not have this column.

Column

Type

Description

SERIES

VARIANT

Series value (if model was trained with multiple time series).

LOGS

OBJECT

Object containing errors encountered during training. Currently the only key is Errors, an array of errors. If no errors were encountered, the logs object is NULL.
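A sketch that lists only the series that failed, assuming a hypothetical multi-series model store_model trained with CONFIG_OBJECT => {'on_error': 'skip'}. The Errors key is read with semi-structured path notation:

```sql
-- store_model is a hypothetical model trained with on_error = 'skip'.
CALL store_model!SHOW_TRAINING_LOGS();

-- Keep only series whose logs are non-NULL, i.e. series that reported errors.
SELECT series, logs:Errors AS errors
  FROM TABLE(RESULT_SCAN(LAST_QUERY_ID()))
  WHERE logs IS NOT NULL;
```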

Examples

See Examples.