ML Observability: Monitoring model behavior over time¶
Model behavior can change over time due to input drift, stale training assumptions, and data pipeline issues, as well as more general factors such as changes to the underlying hardware and software and shifting traffic patterns. ML Observability allows you to track the quality of production models that you have deployed via the Snowflake Model Registry across multiple dimensions, such as performance, drift, and volume. Additionally, you can monitor model performance across different segments of your data using string categorical columns.
Currently, the model monitor supports regression and binary classification models.
Note
To dive in and start using ML Observability, see the quickstart.
ML Observability workflow¶
When you use a model that has been logged in the Snowflake Model Registry for inference, you receive results in the form of a Snowpark or pandas DataFrame, depending on the type of input DataFrame passed to the inference method. This data typically originates in Snowflake. Even in cases where inference is run outside Snowflake, it is common to store the results in Snowflake. ML Observability allows you to monitor your model’s performance in both of these scenarios by working on the stored inference data. The typical workflow is shown below.

The monitoring logs store the inference data and the predictions so that the ML Observability feature can observe changes in predictions over time. The monitoring logs are stored in a table that contains an ID, a timestamp, features, predictions, and a ground truth (actual) label against which predictions can be compared. The basic structure is shown below.

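For illustration, a minimal source table for a binary classification model might look like the following sketch; the table and column names are hypothetical, and you map your own columns to the monitor's roles (ID, timestamp, prediction, actual) when you create the monitor.
CREATE OR REPLACE TABLE inference_log (
    record_id     VARCHAR,        -- unique ID for each inference row
    inference_ts  TIMESTAMP_NTZ,  -- when the prediction was made
    feature_1     NUMBER,         -- model input features
    feature_2     NUMBER,
    pred_score    NUMBER,         -- predicted probability between 0 and 1
    actual_class  NUMBER          -- ground truth label; NULL until observed
);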
You must explicitly create a model monitor object for each model version you want to monitor. Each model version can have exactly one monitor, and each monitor can monitor exactly one model version; they cannot be shared. The monitor object automatically refreshes the monitor logs by querying source data and updates the monitoring reports based on the logs.
Each monitor encapsulates the following information:
The model version to monitor.
The table in which the monitor logs are stored.
The time granularity at which data is aggregated and stored (the aggregation window); currently the minimum is 1 day.
An optional baseline table for comparative metric operations such as drift.
Prerequisites¶
Before you begin, make sure you have the following:
A Snowflake account.
Version 1.7.1 or later of the snowflake-ml-python Python package.
Familiarity with the Snowflake Model Registry.
Creating a model monitor¶
Create a model monitor using the CREATE MODEL MONITOR command. The model monitor must be created in the same schema as the model version to be monitored. You must have the CREATE MODEL MONITOR privilege on the schema where the monitor is created. You can create a maximum of 250 model monitors per account.
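For example, the following is a minimal sketch for a binary classification model. The object names (my_model, inference_log, monitor_wh) and column names are placeholders, and the parameter list is abbreviated; see the reference for the full syntax.
CREATE MODEL MONITOR my_model_monitor WITH
    MODEL = my_model
    VERSION = v1
    FUNCTION = predict
    WAREHOUSE = monitor_wh                     -- warehouse that runs refreshes
    SOURCE = inference_log                     -- table or view with inference data
    ID_COLUMNS = (record_id)
    TIMESTAMP_COLUMN = inference_ts
    PREDICTION_SCORE_COLUMNS = (pred_score)
    ACTUAL_CLASS_COLUMNS = (actual_class)
    REFRESH_INTERVAL = '1 day'
    AGGREGATION_WINDOW = '1 day';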
See CREATE MODEL MONITOR for more details on the CREATE MODEL MONITOR command.
Tip
For details on other SQL commands that you can use with model monitors, see Model monitor commands.
Temporarily stopping and resuming monitoring¶
You can suspend (temporarily stop) a model monitor using ALTER MODEL MONITOR … SUSPEND. To resume monitoring, issue ALTER MODEL MONITOR … RESUME.
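For example, assuming a monitor named my_model_monitor:
ALTER MODEL MONITOR my_model_monitor SUSPEND;
ALTER MODEL MONITOR my_model_monitor RESUME;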
Automatic suspension on refresh failure¶
Model monitors automatically suspend refreshes when they encounter five consecutive refresh failures related to the source tables. You can view the status and cause of refresh suspension using the DESCRIBE MODEL MONITOR command. The output includes the following columns, among others:
aggregation_status: A JSON object. One or more of the values in this object will be SUSPENDED if the model monitor is suspended.
aggregation_last_error: A JSON object that contains the specific SQL error that caused the suspension.
After resolving the root cause of the refresh failure, resume the monitor by issuing ALTER MODEL MONITOR … RESUME.
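For example, assuming a monitor named my_model_monitor:
-- Inspect aggregation_status and aggregation_last_error for the cause
DESCRIBE MODEL MONITOR my_model_monitor;
-- After fixing the underlying issue with the source tables, resume refreshes
ALTER MODEL MONITOR my_model_monitor RESUME;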
Adding segments to a model monitor¶
Model monitors support segmentation, which allows you to monitor model quality over time for specific subsets of your data in addition to monitoring the complete dataset. Segments are used to group the data into logical units, such as different regions or different user groups.
Creating monitors with segments¶
When creating a model monitor, you can specify segment columns using the SEGMENT_COLUMNS parameter. Segment columns must be string columns in your source data.
Important
To create segments on numeric columns, bucket them into valid categories before you create the monitor. For example, you can transform a numeric TEMPERATURE
column into categorical values like ‘COLD’ (< 32°F), ‘MODERATE’ (32-80°F), and ‘HOT’ (> 80°F) before using it as a segment column.
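A minimal sketch of such bucketing, using a view over a hypothetical inference_log table:
-- Bucket the numeric TEMPERATURE column into string categories so that it
-- can be used as a segment column. Table and column names are illustrative.
CREATE OR REPLACE VIEW inference_log_segmented AS
SELECT
    t.*,
    CASE
        WHEN temperature < 32 THEN 'COLD'
        WHEN temperature <= 80 THEN 'MODERATE'
        ELSE 'HOT'
    END AS temperature_bucket
FROM inference_log t;
You can then include the bucketed column in SEGMENT_COLUMNS when creating the monitor: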
CREATE [OR REPLACE] MODEL MONITOR [IF NOT EXISTS] <NAME> WITH
--- all other existing parameters of CREATE MODEL MONITOR
SEGMENT_COLUMNS = (<segment_column_name_array>)
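For example, a hypothetical monitor with two segment columns; check the CREATE MODEL MONITOR reference for the exact quoting of the column array.
CREATE MODEL MONITOR my_segmented_monitor WITH
    -- ... the other parameters shown in the earlier CREATE example ...
    SEGMENT_COLUMNS = ('CUSTOMER_TIER', 'TEMPERATURE_BUCKET');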
For complete syntax and parameter details, see CREATE MODEL MONITOR.
Adding segments to existing or new monitors¶
You can add segment columns to existing monitors using the ALTER MODEL MONITOR command:
ALTER MODEL MONITOR <NAME> ADD SEGMENT_COLUMN = <segment_column_name>
You can also remove segment columns from existing monitors:
ALTER MODEL MONITOR <NAME> DROP SEGMENT_COLUMN = <segment_column_name>
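For example, to add and later remove a hypothetical CUSTOMER_TIER segment column:
ALTER MODEL MONITOR my_segmented_monitor ADD SEGMENT_COLUMN = CUSTOMER_TIER;
ALTER MODEL MONITOR my_segmented_monitor DROP SEGMENT_COLUMN = CUSTOMER_TIER;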
For complete syntax and options, see ALTER MODEL MONITOR.
Defining segments in the Monitoring Segments UI¶
You can configure and manage segments through the Monitoring Segments settings in the UI:

The segments settings interface allows you to define and configure which segments to monitor for your model.
Choosing a segment in the model monitor dashboard¶
In the model monitor dashboard, you can use the segments selector to view metrics for specific segments of your data:

Performance considerations for segments¶
Performance depends on many factors, such as the number of features, the number of segment columns, the number of unique values per segment column, warehouse size and type, the aggregation window, total rows, and rows per aggregation window.
The performance impact of CREATE with SEGMENT_COLUMNS is directly proportional to the number of segment columns in the request.
If CREATE performance is slow with many segment columns, consider adding segment columns one at a time using the ALTER command.
Each segment column and value combination is queried independently, so the time at which data was last updated can differ between segments based on scheduling and other factors, although the system attempts to update all data at the same time.
Viewing monitoring reports¶
To view monitor reports, visit the ML Monitoring dashboard in Snowsight. In the Snowsight navigation pane, select AI & ML, then select Models. The resulting list contains all the models in the Snowflake Model Registry in all the databases and schemas that your current role has access to.
Open a model’s details page by selecting the corresponding row in the Models list. The details page displays key model information, including the model’s description, tags, versions, and monitors.
The Monitors list in the details page displays the list of model monitors, the model versions they are attached to, their status, and when they were created.
Open a model monitor dashboard page by selecting the corresponding row in the Monitors list. The dashboard is populated with graphs displaying key metrics of the model over time. The exact graphs displayed depend on the type of model the monitor is based on (that is, binary classification or regression).
In the dashboard, you can take the following actions:
Change the range of the graphs by clicking the time range selector.
Change the graphs shown by clicking the Settings button. (Hover the mouse over a metric name to see more information about it.)
Compare model monitors by clicking the Compare model selector drop-down.
Display more information about the model monitor by selecting Display monitor details.
Querying monitoring results¶
Each model monitor that you create has the following metrics:
Drift metrics: Distribution changes or data shifts
Performance metrics: Model quality measures, such as classification accuracy or regression error
Statistical metrics: Counts or null values
To query the metrics computed by the monitor, use the monitor metric functions. The metric functions get the metrics from the model monitor objects. You can use the results from the metric functions to create custom dashboards in Streamlit or other centralized monitoring tools.
Important
You must have the following privileges to work with model monitor objects:
| Command | Required privileges |
| --- | --- |
| CREATE MODEL MONITOR | CREATE MODEL MONITOR on the schema |
| SHOW MODEL MONITORS | Any privilege on the model monitor |
| DESCRIBE MODEL MONITOR | Any privilege on the model monitor |
| ALTER MODEL MONITOR | MODIFY on the model monitor |
| DROP MODEL MONITOR | OWNERSHIP on the model monitor |
Use the following SQL template to get the drift metric from your model monitor.
SELECT *
FROM TABLE(MODEL_MONITOR_DRIFT_METRIC (
<model_monitor_name>,
<drift_metric_name>,
<column_name>,
<granularity>,
<start_time>,
<end_time>,
<extra_args>
)
)
Use the following SQL template to get the performance metric from your model monitor.
SELECT *
FROM TABLE(MODEL_MONITOR_PERFORMANCE_METRIC (
<model_monitor_name>,
<metric_name>,
<granularity>,
<start_time>,
<end_time>,
<extra_args>
)
)
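For example, the following sketch retrieves a daily F1 score for a binary classification monitor. The monitor name is a placeholder, and 'F1_SCORE' is an assumed metric name; check the metric function reference for the names supported for your model type. The optional <extra_args> parameter is omitted here, which returns metrics over all data.
SELECT *
FROM TABLE(MODEL_MONITOR_PERFORMANCE_METRIC (
    'my_customer_monitor',
    'F1_SCORE',  -- assumed metric name; verify against the reference
    'DAY',
    '2024-01-01'::TIMESTAMP_NTZ,
    '2024-01-31'::TIMESTAMP_NTZ
))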
Use the following SQL template to get the statistical metric from your model monitor.
SELECT *
FROM TABLE(MODEL_MONITOR_STAT_METRIC (
<model_monitor_name>,
<metric_name>,
<granularity>,
<start_time>,
<end_time>,
<extra_args>
)
)
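Similarly, a hypothetical statistical query for daily record counts; 'RECORD_COUNT' is an assumed metric name, so verify it against the metric function reference.
SELECT *
FROM TABLE(MODEL_MONITOR_STAT_METRIC (
    'my_customer_monitor',
    'RECORD_COUNT',  -- assumed metric name; verify against the reference
    'DAY',
    '2024-01-01'::TIMESTAMP_NTZ,
    '2024-01-31'::TIMESTAMP_NTZ
))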
Querying segment-specific metrics¶
To query metrics for specific segments, use the <extra_args> parameter with a JSON value that specifies the segment column and value. The <extra_args> parameter is optional; if it is not provided, the query returns metrics for all data (a non-segment query).
Note
Currently, segment queries support only one segment column:value pair per query. You cannot query multiple segments simultaneously in a single function call.
For segment queries, use this format for the <extra_args> parameter:
'{"SEGMENTS": [{"column": "<segment_column_name>", "value": "<segment_value>"}]}'
For example, to get drift metrics for premium customers only:
SELECT *
FROM TABLE(MODEL_MONITOR_DRIFT_METRIC (
'my_customer_monitor',
'PSI',
'FEATURE_1',
'DAY',
'2024-01-01'::TIMESTAMP_NTZ,
'2024-01-31'::TIMESTAMP_NTZ,
'{"SEGMENTS": [{"column": "CUSTOMER_TIER", "value": "PREMIUM"}]}'
)
)
The result tables for segment queries include two additional columns:
SEGMENT_COLUMN: Name of the segment column for which the metric is computed (NULL for non-segment queries)
SEGMENT_VALUE: Segment value for which the metric is computed (NULL for non-segment queries)
For more information about segments, see Adding segments to a model monitor.
You can set up alerts and notifications for your monitoring metrics. For more information, see Alerts and Notifications.
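As a sketch of how an alert might be wired to a monitoring metric, the following assumes a monitor named my_customer_monitor, an email notification integration named my_email_int, and that the drift function's output exposes a METRIC_VALUE column; verify the actual output column names with a plain query before relying on this.
CREATE OR REPLACE ALERT psi_drift_alert
    WAREHOUSE = monitor_wh
    SCHEDULE = '1440 MINUTE'  -- check once per day
    IF (EXISTS (
        SELECT *
        FROM TABLE(MODEL_MONITOR_DRIFT_METRIC (
            'my_customer_monitor', 'PSI', 'FEATURE_1', 'DAY',
            DATEADD(DAY, -1, CURRENT_TIMESTAMP())::TIMESTAMP_NTZ,
            CURRENT_TIMESTAMP()::TIMESTAMP_NTZ))
        WHERE METRIC_VALUE > 0.2))  -- assumed column name and example threshold
    THEN CALL SYSTEM$SEND_EMAIL(
        'my_email_int',
        'ml-team@example.com',
        'PSI drift alert',
        'FEATURE_1 PSI exceeded 0.2 in the last day.');
Note that newly created alerts start in a suspended state; resume the alert with ALTER ALERT psi_drift_alert RESUME before it will run.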
Known limitations¶
The following limitations apply to model monitors:
Monitors must reside in the same database and schema as the model version.
Only single-output regression and binary classification models are supported.
At least one prediction column (class or score) is required; actual columns are optional but needed for accuracy metrics.
Drift calculation requires baseline data. To add baseline data to an existing monitor, you must drop the monitor and create it again.
Each column can only be used once in the monitor. For example, you can’t use the same column as the ID column and the prediction column.
Data can’t contain invalid values (nulls, NaNs, +/-Inf, probability scores outside the range 0-1, non-binary class values, or more than two classes in a PREDICTION_CLASS_COLUMNS column); invalid values cause the monitor to fail and be suspended.
Timestamp columns must be of type TIMESTAMP_NTZ; prediction and actual columns must be NUMBER.
You must specify aggregation windows in days.
A maximum of 500 features can be monitored.
Up to 250 monitors can be created.
Segment columns must be string categorical columns only.
A maximum of 5 segment columns per model monitor (hard limit).
Each segment column should have fewer than 25 unique values (recommended limit).
Segment values are case-sensitive, and special characters are not supported in segment queries.
NULL filtering is not supported for segment queries.
Cost considerations¶
Virtual warehouse compute:
Model monitors use a virtual warehouse, incurring costs during creation and each refresh.
Loading the Snowsight dashboard also uses a virtual warehouse, incurring additional charges.
Storage:
Model monitors materialize the source data into a table stored in your account.
Segment columns increase the amount of materialized data stored in your account.
Cloud services compute:
Model monitors use cloud services compute to trigger refreshes when an underlying base object has changed. Cloud services compute cost is only billed if the daily cloud services cost is greater than 10% of the daily warehouse cost for the account.