ML Observability: Monitoring model behavior over time

Model behavior can change over time due to input drift, stale training assumptions, and data pipeline issues, as well as the usual factors, including changes to the underlying hardware and software and natural fluctuations in traffic. ML Observability allows you to track the quality of production models that you have deployed via the Snowflake Model Registry across several dimensions, such as performance, drift, and volume.

Currently, the model monitor supports regression and binary classification models.

Note

To dive in and start using ML Observability, see the quickstart.

ML Observability workflow

When you use a model that has been logged in the Snowflake Model Registry for inference, you receive results in the form of a Snowpark or pandas DataFrame, depending on the type of input DataFrame passed to the inference method. This data typically originates in Snowflake. Even in cases where inference is run outside Snowflake, it is common to store the results in Snowflake. ML Observability allows you to monitor your model’s performance in both of these scenarios by working on the stored inference data. The typical workflow is shown below.

ML Observability workflow

The monitoring logs store the inference data and the predictions so that the ML Observability feature can track changes in predictions over time. The logs are stored in a table that contains an ID, a timestamp, features, predictions, and ground truth (actual) values observed after the fact. The basic structure is shown below.
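For illustration, the following is a minimal sketch of what such a source table might look like for a binary classification model; the table and column names are hypothetical, and the feature, prediction, and actual columns depend on your model.

    -- Hypothetical inference log table for a binary classification model.
    -- Each row records one prediction; the actual (ground truth) value can be
    -- filled in later, once the outcome is known.
    CREATE OR REPLACE TABLE loan_predictions (
        loan_id         VARCHAR,        -- unique record ID
        ts              TIMESTAMP_NTZ,  -- when the prediction was made
        income          NUMBER,         -- feature
        loan_amount     NUMBER,         -- feature
        predicted_score NUMBER(10, 9),  -- model output: probability between 0 and 1
        actual_label    NUMBER(1, 0)    -- observed outcome: 0 or 1, possibly NULL at first
    );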

ML Observability in action

You must explicitly create a model monitor object for each model version you want to monitor. Each model version can have exactly one monitor, and each monitor can monitor exactly one model version; they cannot be shared. The monitor object automatically refreshes the monitor logs by querying source data and updates the monitoring reports based on the logs.

Each monitor encapsulates the following information:

  • The model version to monitor.

  • The table in which the monitor logs are stored.

  • The time granularity at which data is aggregated (the aggregation window), currently a minimum of 1 day.

  • An optional baseline table for comparative metric operations such as drift.

Prerequisites

Before you begin, make sure you have the following:

  • A Snowflake account.

  • Version 1.7.1 or later of the snowflake-ml-python Python package.

  • Familiarity with the Snowflake Model Registry.

Creating a model monitor

Create a model monitor using the CREATE MODEL MONITOR command. The model monitor must be created in the same schema as the model version to be monitored. You must have the CREATE MODEL MONITOR privilege on the schema where the monitor is created. You can create a maximum of 250 model monitors per account.
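As a minimal sketch, a monitor for a binary classification model might be created as follows. The object names are placeholders (they reuse the hypothetical loan_predictions table shown earlier), and only a subset of the available parameters is shown; see the command reference below for the full syntax.

    CREATE MODEL MONITOR loan_model_monitor
        WITH
            MODEL=loan_model
            VERSION=v1
            FUNCTION=predict
            WAREHOUSE=monitoring_wh
            SOURCE=loan_predictions
            BASELINE=loan_predictions_baseline   -- optional; needed for drift metrics
            TIMESTAMP_COLUMN=ts
            ID_COLUMNS=(loan_id)
            PREDICTION_SCORE_COLUMNS=(predicted_score)
            ACTUAL_CLASS_COLUMNS=(actual_label)
            REFRESH_INTERVAL='1 day'
            AGGREGATION_WINDOW='1 day';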

See CREATE MODEL MONITOR for the full syntax and details of the command.

Tip

For details on other SQL commands that you can use with model monitors, see Model monitor commands.

Temporarily stopping and resuming monitoring

You can suspend (temporarily stop) a model monitor using ALTER MODEL MONITOR … SUSPEND. To resume monitoring, issue ALTER MODEL MONITOR … RESUME.
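For example, assuming a monitor named loan_model_monitor:

    -- Temporarily stop monitoring
    ALTER MODEL MONITOR loan_model_monitor SUSPEND;

    -- Resume monitoring later
    ALTER MODEL MONITOR loan_model_monitor RESUME;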

Automatic suspension on refresh failure

Model monitors automatically suspend refreshes when they encounter five consecutive refresh failures related to the source tables. You can view the status and cause of refresh suspension using the DESCRIBE MODEL MONITOR command. The output includes the following columns, among others:

  • aggregation_status: The value in this column is a JSON object. One or more of the values in this object will be SUSPENDED if the model monitor is suspended.

  • aggregation_last_error: The value in this column is a JSON object that contains the specific SQL error that caused the suspension.

After resolving the root cause of the refresh failure, resume the monitor by issuing ALTER MODEL MONITOR … RESUME.
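For example, using the hypothetical monitor from earlier, you might diagnose and recover from a suspension as follows:

    -- Check aggregation_status and aggregation_last_error in the output
    DESCRIBE MODEL MONITOR loan_model_monitor;

    -- After fixing the underlying problem in the source data, resume refreshes
    ALTER MODEL MONITOR loan_model_monitor RESUME;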

Viewing monitoring reports

To view monitoring reports, go to the ML Monitoring dashboard in Snowsight. In the Snowsight navigation pane, select AI & ML, then select Models. The resulting list contains all the models in the Snowflake Model Registry in all the databases and schemas that your current role has access to.

Open a model’s details page by selecting the corresponding row in the Models list. The details page displays key model information, including the model’s description, tags, versions, and monitors.

The Monitors list in the details page displays the list of model monitors, the model versions they are attached to, their status, and when they were created.

Open a model monitor dashboard page by selecting the corresponding row in the Monitors list. The dashboard is populated with graphs displaying key metrics of the model over time. The exact graphs displayed depend on the type of model the monitor is based on (that is, binary classification or regression).

In the dashboard, you can take the following actions:

  • Change the range of the graphs by clicking the time range selector.

  • Change the graphs shown by clicking the Settings button. (Hover the mouse over a metric name to see more information about it.)

  • Compare model monitors by clicking the Compare model selector drop-down.

  • Display more information about the model monitor by selecting Display monitor details.

Access control requirements

The following privileges are required to work with model monitor objects. Each command or operation below is listed with the privileges it requires.

CREATE MODEL MONITOR

  • CREATE MODEL MONITOR privilege on the schema where you want to create the monitor.

  • SELECT on the data source (table or view).

  • USAGE on the database, schema, warehouse, and model.

SHOW MODEL MONITORS

  • Any privilege on the model monitor.

DESCRIBE MODEL MONITOR

  • Any privilege on the model monitor.

ALTER MODEL MONITOR

  • MODIFY on the model monitor.

DROP MODEL MONITOR

  • OWNERSHIP on the model monitor.

Model monitor dashboard

  • USAGE on the model and the model monitor.
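For example, the following is a sketch of granting the schema-level privilege needed to create monitors and the object-level privilege needed to alter one; the role and object names are placeholders.

    -- Allow a role to create model monitors in a schema
    GRANT CREATE MODEL MONITOR ON SCHEMA my_db.my_schema TO ROLE ml_ops;

    -- Allow the role to alter (for example, suspend or resume) an existing monitor
    GRANT MODIFY ON MODEL MONITOR my_db.my_schema.loan_model_monitor TO ROLE ml_ops;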

Known limitations

The following limitations apply to model monitors:

  • Monitors must be created in the same database and schema as the model version being monitored.

  • Currently only single-output regression and binary classification models are supported.

  • At least one prediction column (class or score) is required. Actual columns are optional but are required to calculate accuracy metrics.

  • If baseline data is not provided, drift cannot be calculated. You will need to drop the monitor and create it again to add baseline data.

  • A given column can be used only once in the monitor. For example, you cannot use the same column as both an ID and a prediction.

  • Model monitors expect that your data does not contain invalid values such as NULLs, NaNs, +/-Inf, probability scores outside the range 0-1, classes that are not exactly 0 or 1, or more than two classes in a PREDICTION_CLASS_COLUMNS column. Such issues may cause the monitor to fail and eventually be suspended; a sketch of a query that checks for them follows this list.

  • Timestamp columns must be of type TIMESTAMP_NTZ or DATE. Prediction and actual columns must be of type NUMBER.

  • Aggregation windows must be specified in days.

  • The number of monitored features can be at most 500.

  • You can create at most 250 monitors.
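Because invalid values can cause a monitor to fail, it can help to check the source data before creating the monitor. The following is a sketch of such a check against the hypothetical loan_predictions table; adjust the column names to match your data.

    -- Count rows that would violate the monitor's expectations:
    -- NULL IDs or timestamps, scores outside the range 0-1, or classes other than 0 and 1.
    SELECT
        COUNT_IF(loan_id IS NULL OR ts IS NULL)                           AS null_keys,
        COUNT_IF(predicted_score IS NULL
                 OR predicted_score < 0 OR predicted_score > 1)           AS bad_scores,
        COUNT_IF(actual_label IS NOT NULL AND actual_label NOT IN (0, 1)) AS bad_classes
    FROM loan_predictions;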

Cost considerations

Virtual warehouse compute

Model monitors run in a virtual warehouse. This warehouse incurs cost when the monitor is created and each time the monitor is refreshed. A virtual warehouse is also used, and incurs cost, when the associated Snowsight dashboard is loaded.

Storage

Model monitors materialize the source data into a table stored in your account, which incurs standard storage costs.

Cloud services compute

Model monitors use cloud services compute to trigger refreshes when an underlying base object has changed. Cloud services compute cost is billed only if the daily cloud services cost is greater than 10% of the daily warehouse cost for the account.