CREATE SNOWFLAKE.ML.FORECAST¶
Creates a new forecast model from the training data you provide or replaces the forecast model of the same name.
Syntax¶
CREATE [ OR REPLACE ] SNOWFLAKE.ML.FORECAST [ IF NOT EXISTS ] <model_name>(
  INPUT_DATA => <input_data>,
  [ SERIES_COLNAME => '<series_colname>', ]
  TIMESTAMP_COLNAME => '<timestamp_colname>',
  TARGET_COLNAME => '<target_colname>',
  [ CONFIG_OBJECT => <config_object> ]
)
[ [ WITH ] TAG ( <tag_name> = '<tag_value>' [ , <tag_name> = '<tag_value>' , ... ] ) ]
[ COMMENT = '<string_literal>' ]
Note
Using named arguments makes argument order irrelevant and results in more readable code. However, you can also use positional arguments, as in the following example:
CREATE SNOWFLAKE.ML.FORECAST <name>(
  '<input_data>', '<series_colname>', '<timestamp_colname>', '<target_colname>'
);
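As a rough illustration of the named-argument form, the statement below trains a single-series model and attaches a comment. The table sales_history, its columns sale_ts and total_sales, and the model name sales_forecast are hypothetical, and SYSTEM$REFERENCE is assumed here as the way to build the table reference.

```sql
-- Hypothetical names: sales_history, sale_ts, total_sales, sales_forecast.
-- Assumes sale_ts is TIMESTAMP_NTZ and total_sales is FLOAT.
CREATE OR REPLACE SNOWFLAKE.ML.FORECAST sales_forecast(
  INPUT_DATA => SYSTEM$REFERENCE('TABLE', 'sales_history'),
  TIMESTAMP_COLNAME => 'sale_ts',
  TARGET_COLNAME => 'total_sales'
)
COMMENT = 'Daily sales forecast (illustrative example)';
```

Because the arguments are named, they can appear in any order; SERIES_COLNAME is omitted because this sketch has only one time series.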
Parameters¶
model_name
Specifies the identifier for the model; must be unique for the schema in which the model is created.
If the model identifier is not fully qualified (in the form of db_name.schema_name.name or schema_name.name), the command creates the model in the current schema for the session.
In addition, the identifier must start with an alphabetic character and cannot contain spaces or special characters unless the entire identifier string is enclosed in double quotes (for example, "My object"). Identifiers enclosed in double quotes are also case-sensitive.
For more details, see Identifier requirements.
Constructor arguments¶
Required:
INPUT_DATA => input_data
A reference to the input data. Using a reference allows the training process, which runs with limited privileges, to use your privileges to access the data. You can use a reference to a table or a view if your data is already in that form, or you can use a query reference to provide the query to be executed to obtain the data.
The referenced data is the entire training data consumed by the forecasting model. If input_data contains any columns other than those named by timestamp_colname, target_colname, or series_colname, they are considered exogenous variables (additional features). The order of the columns in the input data is not important.
Your input data must have columns with appropriate types for your use case. See Examples for details on each use case.
| Use case | Columns and types |
| --- | --- |
| Single time series | Timestamp column: TIMESTAMP_NTZ. Target value column: FLOAT. |
| Multiple time series | Series column: VARIANT containing numeric values and text. Timestamp column: TIMESTAMP_NTZ. Target value column: FLOAT. |
| Single time series with exogenous variables | Timestamp column: TIMESTAMP_NTZ. Target value column: FLOAT. |
| Multiple time series with exogenous variables | Series column: VARIANT containing numeric values and text. Timestamp column: TIMESTAMP_NTZ. Target value column: FLOAT. |
TIMESTAMP_COLNAME => 'timestamp_colname'
Name of the column containing the timestamps in input_data.
TARGET_COLNAME => 'target_colname'
Name of the column containing the target (dependent value) in input_data.
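The required arguments can be sketched as follows, under the assumption that the source columns need casting to the expected types first. The table raw_sales and its columns are hypothetical. SYSTEM$REFERENCE and SYSTEM$QUERY_REFERENCE are used here as one way to produce the table, view, or query reference described above.

```sql
-- Hypothetical source table raw_sales(sale_date, total_sales) with loosely typed columns.
CREATE OR REPLACE VIEW sales_training_v AS
  SELECT
    TO_TIMESTAMP_NTZ(sale_date) AS sale_ts,     -- timestamp column as TIMESTAMP_NTZ
    total_sales::FLOAT          AS total_sales  -- target column as FLOAT
  FROM raw_sales;

-- Option 1: pass a reference to the view.
CREATE SNOWFLAKE.ML.FORECAST sales_model_from_view(
  INPUT_DATA => SYSTEM$REFERENCE('VIEW', 'sales_training_v'),
  TIMESTAMP_COLNAME => 'sale_ts',
  TARGET_COLNAME => 'total_sales'
);

-- Option 2: pass a query reference instead of creating a view.
CREATE SNOWFLAKE.ML.FORECAST sales_model_from_query(
  INPUT_DATA => SYSTEM$QUERY_REFERENCE(
    'SELECT TO_TIMESTAMP_NTZ(sale_date) AS sale_ts, total_sales::FLOAT AS total_sales FROM raw_sales'
  ),
  TIMESTAMP_COLNAME => 'sale_ts',
  TARGET_COLNAME => 'total_sales'
);
```

Any column in the referenced data beyond sale_ts and total_sales would be treated as an exogenous variable, so the sketch selects only the two columns it needs.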
Optional:
SERIES_COLNAME => 'series_colname'
For multiple time-series models, the name of the column defining the multiple time series in input_data. This column can be a value of any type, or an array of values from one or more other columns, as shown in Forecast on Multiple Series.
If you are providing arguments positionally, this must be the second argument.
CONFIG_OBJECT => config_object
An OBJECT containing key-value pairs used to configure the model training job.
| Key | Type | Default | Description |
| --- | --- | --- | --- |
| on_error | STRING | 'ABORT' | String (constant) that specifies the error handling method for the model training task. This is most useful when training multiple series. Supported values are 'abort' (abort the training operation if an error is encountered in any time series) and 'skip' (skip any time series where training encounters an error, which allows model training to succeed for other time series; to see which series failed, use the model’s <model_name>!SHOW_TRAINING_LOGS method). |
| evaluate | BOOLEAN | TRUE | Whether evaluation metrics should be generated. If TRUE, additional models are trained for cross-validation using the parameters in evaluation_config. |
| evaluation_config | OBJECT | See Evaluation configuration below. | An optional config object that specifies how out-of-sample evaluation metrics should be generated. |
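The optional arguments can be combined as in the sketch below, which trains one model over many series and skips any series that fails. The table store_sales, its columns, and the model name are hypothetical; the sketch assumes sale_ts is already TIMESTAMP_NTZ and total_sales is already FLOAT.

```sql
-- Hypothetical table store_sales(store_id, item, sale_ts, total_sales).
CREATE OR REPLACE VIEW store_item_sales_v AS
  SELECT
    [store_id, item] AS store_item,  -- array built from two columns identifies each series
    sale_ts,
    total_sales
  FROM store_sales;

CREATE SNOWFLAKE.ML.FORECAST store_item_forecast(
  INPUT_DATA => SYSTEM$REFERENCE('VIEW', 'store_item_sales_v'),
  SERIES_COLNAME => 'store_item',
  TIMESTAMP_COLNAME => 'sale_ts',
  TARGET_COLNAME => 'total_sales',
  CONFIG_OBJECT => {'on_error': 'skip'}  -- keep training other series if one fails
);

-- Inspect which series, if any, were skipped during training.
CALL store_item_forecast!SHOW_TRAINING_LOGS();
```

With 'skip', a single problematic series no longer aborts the whole training job, which is why checking the training logs afterward is worthwhile.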
Evaluation configuration¶
The evaluation_config object contains key-value pairs that configure cross-validation. These parameters are from scikit-learn’s TimeSeriesSplit.
| Key | Type | Default | Description |
| --- | --- | --- | --- |
| n_splits | INTEGER | 5 | Number of splits. |
| max_train_size | INTEGER or NULL (no maximum). | NULL | Maximum size for a single training set. |
| test_size | INTEGER or NULL. | NULL | Used to limit the size of the test set. |
| gap | INTEGER | 0 | Number of samples to exclude from the end of each training set before the test set. |
| prediction_interval | FLOAT | 0.95 | The prediction interval used in calculating interval metrics. |
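A nested configuration might look like the sketch below. The table, column, and model names are hypothetical, the chosen values are arbitrary, and the call to SHOW_EVALUATION_METRICS assumes that method is available on the trained model for reading the resulting metrics.

```sql
-- Hypothetical names: sales_history, sale_ts, total_sales, sales_forecast_eval.
CREATE SNOWFLAKE.ML.FORECAST sales_forecast_eval(
  INPUT_DATA => SYSTEM$REFERENCE('TABLE', 'sales_history'),
  TIMESTAMP_COLNAME => 'sale_ts',
  TARGET_COLNAME => 'total_sales',
  CONFIG_OBJECT => {
    'evaluate': TRUE,
    'evaluation_config': {
      'n_splits': 3,               -- three cross-validation splits instead of the default 5
      'test_size': 14,             -- limit each test set to 14 samples
      'gap': 0,                    -- no samples dropped between training and test sets
      'prediction_interval': 0.9   -- interval used for interval metrics
    }
  }
);

-- Assumed method for retrieving the out-of-sample metrics.
CALL sales_forecast_eval!SHOW_EVALUATION_METRICS();
```

Fewer splits mean fewer additional models to train for cross-validation, at the cost of a coarser estimate of out-of-sample error.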
Usage notes¶
Replication of class instances is currently not supported.
Examples¶
See Examples.