ML Observability: Monitoring model behavior over time

Model behavior can change over time due to input drift, stale training assumptions, and data pipeline issues, as well as the usual factors, including changes to the underlying hardware and software and the fluid nature of traffic. ML Observability allows you to track the quality of production models you have deployed via the Snowflake Model Registry across multiple dimensions, such as performance, drift, and volume.

Currently, the model monitor supports regression and binary classification models.

ML Observability workflow

When you use a model that has been logged in the Snowflake Model Registry for inference, you receive results in the form of a Snowpark or pandas DataFrame, depending on the type of input DataFrame passed to the inference method. This data typically originates in Snowflake. Even in cases where inference is run outside Snowflake, it is common to store the results in Snowflake. ML Observability allows you to monitor your model’s performance in both of these scenarios by working on the stored inference data. The typical workflow is shown below.

[Diagram: the ML Observability workflow]

The monitoring logs store the inference data and the predictions so that the ML Observability feature can observe changes in predictions over time. The monitoring logs are stored in a table that contains an ID, a timestamp, features, predictions, and ground truth (actual) labels, which record the observed outcomes that the predictions are compared against. The basic structure is shown below.
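As an illustrative sketch, a source table for a binary classification model might look like the following. All of the table and column names here are hypothetical; your own inference logs can use any names, as long as you identify the ID, timestamp, prediction, and actual columns when you create the monitor.

-- Hypothetical inference log for a binary classification model.
-- Each row stores the features sent to the model, the prediction it returned,
-- and, once it becomes known, the observed ground truth.
CREATE TABLE IF NOT EXISTS churn_inference_log (
    record_id       VARCHAR,        -- unique ID of the inference request
    inference_ts    TIMESTAMP_NTZ,  -- when the prediction was made
    tenure_months   NUMBER,         -- feature
    monthly_charges NUMBER,         -- feature
    predicted_score NUMBER,         -- model output: probability in the range 0-1
    predicted_class NUMBER,         -- model output: 0 or 1
    actual_class    NUMBER          -- ground truth; NULL until the outcome is observed
);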

ML Observability in action

You must explicitly create a model monitor object for each model version you want to monitor. Each model version can have exactly one monitor, and each monitor can monitor exactly one model version; they cannot be shared. The monitor object automatically refreshes the monitor logs by querying source data and updates the monitoring reports based on the logs.

Each monitor encapsulates the following information:

  • The model version to monitor.

  • The table in which the monitor logs are stored.

  • The time granularity at which data is aggregated (the aggregation window); currently the minimum is 1 day.

  • An optional baseline table for comparative metric operations such as drift.

Prerequisites

Before you begin, make sure you have the following:

  • A Snowflake account.

  • Version 1.7.1 or later of the snowflake-ml-python Python package.

  • Familiarity with the Snowflake Model Registry.

Creating a model monitor

Create a model monitor using the CREATE MODEL MONITOR command. The model monitor must be created in the same schema as the model version to be monitored. You must have the CREATE MODEL MONITOR privilege on the schema where the monitor is created.

Note

You can create a maximum of 250 model monitors per account.

The following is a summary of the full syntax of the CREATE MODEL MONITOR command:

CREATE [OR REPLACE] MODEL MONITOR [IF NOT EXISTS] <name> WITH
    MODEL=<model_name> VERSION=<version_name> FUNCTION=<function_name>
    SOURCE=<source_name>
    BASELINE=<baseline_name>
    TIMESTAMP_COLUMN=<timestamp_colname>
    ID_COLUMNS=(<id_col_1>, ...)
    [ PREDICTION_CLASS_COLUMNS=('<prediction_class_col_1>' , ...) ]
    [ PREDICTION_SCORE_COLUMNS=('<prediction_col_1>' , ...) ]
    [ ACTUAL_CLASS_COLUMNS=('<actual_class_col_1>',...) ]
    [ ACTUAL_SCORE_COLUMNS=('<actual_col_1>',...) ]
    WAREHOUSE=<warehouse>
    REFRESH_INTERVAL='<refresh_interval>'
    AGGREGATION_WINDOW='<aggregation_window>'

When creating a model monitor, you must specify the following parameters:

  • NAME: The name of the monitor to be created. Must be unique within the schema.

  • MODEL and VERSION: The name and version of the model to be monitored.

  • FUNCTION: The name of the inference function that generates predictions.

  • SOURCE: The table containing the data to be used for inference.

  • BASELINE: A table containing a snapshot of recent inference results that will be used as a baseline to detect drift. This argument is technically optional, but if you do not provide baseline data, the monitor cannot detect drift.

  • TIMESTAMP_COLUMN: The name of the timestamp column.

  • ID_COLUMNS: Array of columns that contain record IDs. Required even if empty.

  • WAREHOUSE: The warehouse used for internal compute operations for the monitor, including dynamic table operations.

  • REFRESH_INTERVAL: The interval at which the monitor refreshes from the source data (for example “1 day”). The supported intervals are the same as those available for dynamic table target lag.

  • AGGREGATION_WINDOW: The aggregation window, in days, used to compute metrics (for example “1 day”).

Additional columns are specified as follows depending on the model type. All values are arrays of strings that contain column names.

  • PREDICTION_CLASS_COLUMNS and ACTUAL_CLASS_COLUMNS: Required for classification models. The specified PREDICTION columns are compared with the corresponding ACTUAL columns to compute classification metrics.

  • PREDICTION_SCORE_COLUMNS and ACTUAL_SCORE_COLUMNS: Required for regression models. The specified PREDICTION columns are compared with the corresponding ACTUAL columns to compute regression metrics.

You may also use PREDICTION_SCORE_COLUMNS with ACTUAL_CLASS_COLUMNS for classification models if the prediction scores have only the values 0 and 1.
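As a sketch, the following creates a monitor for a hypothetical binary classification model. The churn_model model and its predict function, the churn_inference_log source table, the churn_baseline table, and the monitoring_wh warehouse are all placeholder names used for illustration; substitute the objects in your own account.

-- Optional: snapshot a trusted slice of inference results to use as the drift
-- baseline (the cutoff date is hypothetical).
CREATE OR REPLACE TABLE churn_baseline AS
    SELECT * FROM churn_inference_log
    WHERE inference_ts < '2024-01-01'::TIMESTAMP_NTZ;

-- Create a monitor that refreshes daily and aggregates metrics per day.
CREATE MODEL MONITOR churn_model_monitor WITH
    MODEL=churn_model VERSION=v1 FUNCTION=predict
    SOURCE=churn_inference_log
    BASELINE=churn_baseline
    TIMESTAMP_COLUMN=inference_ts
    ID_COLUMNS=(record_id)
    PREDICTION_CLASS_COLUMNS=('predicted_class')
    PREDICTION_SCORE_COLUMNS=('predicted_score')
    ACTUAL_CLASS_COLUMNS=('actual_class')
    WAREHOUSE=monitoring_wh
    REFRESH_INTERVAL='1 day'
    AGGREGATION_WINDOW='1 day';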

Refresh suspension

Like dynamic tables, model monitors suspend refreshes when they encounter five consecutive refresh failures related to the source tables. You can view the status and cause of refresh suspension using the DESCRIBE MODEL MONITOR command. The output includes the following columns:

  • aggregation_status: The value in this column is a JSON object. One or more of the values in this object will be SUSPENDED if the model monitor is suspended.

  • aggregation_last_error: The value in this column is a JSON object that contains the specific SQL error that caused the suspension.

After resolving the root cause of the refresh failure, you can resume the monitor by issuing ALTER MODEL MONITOR <name> RESUME.
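For example, with a hypothetical monitor named churn_model_monitor, you might inspect and resume it as follows:

-- Check aggregation_status and aggregation_last_error in the output for a
-- SUSPENDED state and the SQL error that caused it.
DESCRIBE MODEL MONITOR churn_model_monitor;

-- After fixing the underlying problem (for example, restoring a dropped
-- source column), resume refreshes.
ALTER MODEL MONITOR churn_model_monitor RESUME;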

Modifying a model monitor

The ALTER MODEL MONITOR command lets you change the following properties of a model monitor:

  • Suspend status using the SUSPEND and RESUME keywords.

    ALTER MODEL MONITOR [IF EXISTS] <name> SUSPEND;
    ALTER MODEL MONITOR [IF EXISTS] <name> RESUME;
    
  • Change the warehouse used for refreshes using SET WAREHOUSE=….

    ALTER MODEL MONITOR [IF EXISTS] <name> SET WAREHOUSE='<warehouse_name>';
    
  • Change the refresh interval using SET REFRESH_INTERVAL=….

    ALTER MODEL MONITOR [IF EXISTS] <name> SET REFRESH_INTERVAL='<refresh_interval>';
    

All other properties are immutable and cannot be changed after the model monitor has been created.

Deleting a model monitor

The DROP MODEL MONITOR command deletes a model monitor.

DROP MODEL MONITOR <name>;

Viewing information about model monitors

The SHOW MODEL MONITORS command shows information about the model monitors you have access to, optionally filtered by a name pattern.

SHOW MODEL MONITORS [LIKE '<pattern>']
 [ IN { ACCOUNT | DATABASE [ <db_name> ] | SCHEMA [ <schema_name> ] } ];

The output includes the parameters used to create each listed monitor, the database and schema where it resides, its status, and a description or comment.

The DESCRIBE MODEL MONITOR command shows detailed information about a model monitor.

DESCRIBE MODEL MONITOR <name>;

The output includes all the information from the SHOW command, and additionally:

  • The aggregation status, including columns for the status of each nested dynamic table type, the last error, and the time of the last refresh.

  • The prediction, actual, and feature columns being used from the source table.

Viewing monitoring reports

To view monitoring reports, visit the ML Monitoring dashboard in Snowsight. In the Snowsight navigation pane, select AI & ML, then select Models. The resulting list contains all the models in the Snowflake Model Registry in all the databases and schemas that your current role has access to.

Open a model’s details page by selecting the corresponding row in the Models list. The details page displays key model information, including the model’s description, tags, versions, and monitors.

The Monitors list in the details page displays the list of model monitors, the model versions they are attached to, their status, and when they were created.

Open a model monitor dashboard page by selecting the corresponding row in the Monitors list. The dashboard is populated with graphs displaying key metrics of the model over time. The exact graphs displayed depend on the type of model the monitor is based on (that is, binary classification or regression).

In the dashboard, you can take the following actions:

  • Change the range of the graphs by clicking the time range selector.

  • Change the graphs shown by clicking the Settings button. (Hover the mouse over a metric name to see more information about it.)

  • Compare model monitors by clicking the Compare model selector drop-down.

  • Display more information about the model monitor by selecting Display monitor details.

Access control requirements

The following privileges are required to work with model monitor objects:

  • CREATE MODEL MONITOR: CREATE MODEL MONITOR privilege on the schema where the monitor is created, SELECT on the data source (table or view), and USAGE on the database, schema, warehouse, and model.

  • SHOW MODEL MONITORS: Any privilege on the model monitor.

  • DESCRIBE MODEL MONITOR: Any privilege on the model monitor.

  • ALTER MODEL MONITOR: MODIFY on the model monitor.

  • DROP MODEL MONITOR: OWNERSHIP on the model monitor.

  • Model monitor dashboard: USAGE on the model and the model monitor.

Known limitations

The following limitations apply to model monitors:

  • Monitors must be created in the same database and schema as the model version being monitored.

  • Currently only single-output regression and binary classification models are supported.

  • At least one prediction column (class or score) is required. Actual columns are optional but are required to calculate accuracy metrics.

  • If baseline data is not provided, drift cannot be calculated. You will need to drop the monitor and create it again to add baseline data.

  • A given column can be used only once in the monitor. For example, you cannot use the same column as both an ID and a prediction.

  • Model monitors expect that your data does not contain invalid values such as nulls, NaNs, +/-Inf, probability scores outside the range of 0-1, classes that are not exactly 0 or 1, or more than two classes in a PREDICTION_CLASS_COLUMNS column. Such issues may cause the monitor to fail and eventually be suspended.

  • Timestamp columns must be of type TIMESTAMP_NTZ or DATE. Prediction and actual columns must be of type NUMBER.

  • Aggregation windows must be specified in days.

  • The number of monitored features can be at most 500.

  • You can create at most 250 monitors per account.

Cost considerations

Virtual warehouse compute

Model monitors run in a virtual warehouse. This warehouse incurs cost when the monitor is created and each time it is refreshed. A virtual warehouse is also used, and incurs cost, when the associated Snowsight dashboard is loaded.

Storage

Model monitors materialize the source data into a table stored in your account, which incurs standard storage costs.

Cloud services compute

Model monitors use cloud services compute to trigger refreshes when an underlying base object has changed. Cloud services compute cost is billed only if the daily cloud services cost is greater than 10% of the daily warehouse cost for the account.