ML Observability: Monitoring model behavior over time¶
Model behavior can change over time due to input drift, stale training assumptions, and data pipeline issues, as well as the usual factors, including changes to the underlying hardware and software and the fluid nature of traffic. ML Observability allows you to track the quality of production models you have deployed via the Snowflake Model Registry across multiple dimensions, such as performance, drift, and volume.
Currently, the model monitor supports regression and binary classification models.
ML Observability workflow¶
When you use a model that has been logged in the Snowflake Model Registry for inference, you receive results in the form of a Snowpark or pandas DataFrame, depending on the type of input DataFrame passed to the inference method. This data typically originates in Snowflake. Even in cases where inference is run outside Snowflake, it is common to store the results in Snowflake. ML Observability allows you to monitor your model’s performance in both of these scenarios by working on the stored inference data. The typical workflow is shown below.
The monitoring logs store the inference data and the predictions so that the ML Observability feature can observe changes in predictions over time. The monitoring logs are stored in a table that contains an ID, a timestamp, features, predictions, and, when available, the ground truth (actual) values. The basic structure is shown below.
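For example, a source table that feeds the monitoring logs for a binary classification model might look like the following sketch. All table and column names here are hypothetical; your source table can use any names, which you map to the monitor when you create it.

CREATE TABLE inference_logs (
    record_id        VARCHAR,        -- unique ID for each prediction
    inference_ts     TIMESTAMP_NTZ,  -- when the prediction was made
    feature_1        NUMBER,         -- model input features
    feature_2        NUMBER,
    prediction_class NUMBER,         -- predicted class (0 or 1)
    actual_class     NUMBER          -- ground truth class (0 or 1), populated when it becomes available
);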
You must explicitly create a model monitor object for each model version you want to monitor. Each model version can have exactly one monitor, and each monitor can monitor exactly one model version; they cannot be shared. The monitor object automatically refreshes the monitor logs by querying source data and updates the monitoring reports based on the logs.
Each monitor encapsulates the following information:
The model version to monitor.
The table in which the monitor logs are stored.
The time granularity at which data is aggregated (the aggregation window); currently the minimum is one day.
An optional baseline table for comparative metric operations such as drift.
Prerequisites¶
Before you begin, make sure you have the following:
A Snowflake account in any AWS region. Contact your account representative if your account is in Azure.
Version 1.7.1 or later of the snowflake-ml-python Python package.
Familiarity with the Snowflake Model Registry.
Creating a model monitor¶
Create a model monitor using the CREATE MODEL MONITOR command. The model monitor must be created in the same schema as the model version to be monitored. You must have the CREATE MODEL MONITOR privilege on the schema where the monitor is created.
Note
You can create a maximum of 250 model monitors per account.
The following is a summary of the full syntax of the CREATE MODEL MONITOR command:
CREATE [OR REPLACE] MODEL MONITOR [IF NOT EXISTS] <name> WITH
MODEL=<model_name> VERSION=<version_name> FUNCTION=<function_name>
SOURCE=<source_name>
[ BASELINE=<baseline_name> ]
TIMESTAMP_COLUMN=<timestamp_colname>
ID_COLUMNS=(<id_col_1>, ...)
[ PREDICTION_CLASS_COLUMNS=('<prediction_class_col_1>' , ...) ]
[ PREDICTION_SCORE_COLUMNS=('<prediction_col_1>' , ...) ]
[ ACTUAL_CLASS_COLUMNS=('<actual_class_col_1>',...) ]
[ ACTUAL_SCORE_COLUMNS=('<actual_col_1>',...) ]
WAREHOUSE=<warehouse>
REFRESH_INTERVAL='<refresh_interval>'
AGGREGATION_WINDOW='<aggregation_window>'
When creating a model monitor, you must specify the following parameters:
NAME: The name of the monitor to be created. Must be unique within the schema.
MODEL and VERSION: The name and version of the model to be monitored.
FUNCTION: The name of the inference function that generates predictions.
SOURCE: The table containing the data to be used for inference.
BASELINE: A table containing a snapshot of recent inference results that is used as a baseline to detect drift. This argument is technically optional, but if you do not provide baseline data, the monitor cannot detect drift.
TIMESTAMP_COLUMN: The name of the timestamp column.
ID_COLUMNS: An array of columns that contain record IDs. This parameter is required, even if the array is empty.
WAREHOUSE: The warehouse used for internal compute operations for the monitor, including dynamic table operations.
REFRESH_INTERVAL: The interval at which the monitor refreshes from the source data (for example, '1 day'). The supported intervals are the same as those available for dynamic table target lag.
AGGREGATION_WINDOW: The aggregation window, in days, used to compute metrics (for example, '1 day').
Additional columns are specified as follows depending on the model type. All values are arrays of strings that contain column names.
PREDICTION_CLASS_COLUMNS and ACTUAL_CLASS_COLUMNS: Required for classification models. The specified PREDICTION columns are compared with the corresponding ACTUAL columns to compute classification metrics.
PREDICTION_SCORE_COLUMNS and ACTUAL_SCORE_COLUMNS: Required for regression models. The specified PREDICTION columns are compared with the corresponding ACTUAL columns to compute regression metrics.
You may also use PREDICTION_SCORE_COLUMNS with ACTUAL_CLASS_COLUMNS if the prediction scores have only the values 0 and 1.
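For example, the following statement creates a monitor for a hypothetical binary classification model named churn_model. It assumes the illustrative inference_logs table shown earlier, a baseline table named inference_baseline with the same columns, and a warehouse named monitoring_wh; all of these names are placeholders for your own objects.

CREATE MODEL MONITOR churn_model_monitor WITH
    MODEL=churn_model VERSION=v1 FUNCTION=predict
    SOURCE=inference_logs
    BASELINE=inference_baseline
    TIMESTAMP_COLUMN=inference_ts
    ID_COLUMNS=(record_id)
    PREDICTION_CLASS_COLUMNS=('prediction_class')
    ACTUAL_CLASS_COLUMNS=('actual_class')
    WAREHOUSE=monitoring_wh
    REFRESH_INTERVAL='1 day'
    AGGREGATION_WINDOW='1 day';

Because BASELINE is provided, the monitor can compute drift metrics against that reference data; omit it only if you do not need drift.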
Refresh suspension¶
Like dynamic tables, model monitors suspend refreshes when they encounter five consecutive refresh failures related to the source tables. You can view the status and cause of refresh suspension using the DESCRIBE MODEL MONITOR command. The output includes the following columns:
aggregation_status: The value in this column is a JSON object. One or more of the values in this object will be SUSPENDED if the model monitor is suspended.
aggregation_last_error: The value in this column is a JSON object that contains the specific SQL error that caused the suspension.
After resolving the root cause of the refresh failure, you can resume the monitor by issuing ALTER MODEL MONITOR <name> RESUME.
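For example, assuming a suspended monitor named churn_model_monitor (a hypothetical name), you can inspect the failure details and then resume it after fixing the source data:

DESCRIBE MODEL MONITOR churn_model_monitor;    -- check aggregation_status and aggregation_last_error
ALTER MODEL MONITOR churn_model_monitor RESUME;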
Modifying a model monitor¶
The ALTER MODEL MONITOR command lets you change the following properties of a model monitor:
Suspend status using the SUSPEND and RESUME keywords.
ALTER MODEL MONITOR [IF EXISTS] <name> SUSPEND;
ALTER MODEL MONITOR [IF EXISTS] <name> RESUME;
Change the warehouse used for refreshes using SET WAREHOUSE=….
ALTER MODEL MONITOR [IF EXISTS] <name> SET WAREHOUSE='<warehouse_name>';
Change the refresh interval using SET REFRESH_INTERVAL=….
ALTER MODEL MONITOR [IF EXISTS] <name> SET REFRESH_INTERVAL='<refresh_interval>';
All other properties are immutable and cannot be changed after the monitor has been created.
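For example, the following statements move a hypothetical monitor to a different warehouse and refresh it twice a day; the monitor and warehouse names are placeholders:

ALTER MODEL MONITOR churn_model_monitor SET WAREHOUSE='monitoring_wh_large';
ALTER MODEL MONITOR churn_model_monitor SET REFRESH_INTERVAL='12 hours';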
Deleting a model monitor¶
The DROP MODEL MONITOR command deletes a model monitor.
DROP MODEL MONITOR <name>;
Viewing information about model monitors¶
The SHOW MODEL MONITORS command shows information about the model monitors you have access to, optionally filtered to names that match a specified pattern.
SHOW MODEL MONITORS [LIKE '<pattern>']
[ IN { ACCOUNT | DATABASE [ <db_name> ] | SCHEMA [ <schema_name> ] } ];
The output includes the parameters used to create each listed monitor, the database and schema where it resides, its status, and a description or comment.
The DESCRIBE MODEL MONITOR command shows detailed information about a model monitor.
DESCRIBE MODEL MONITOR <name>;
The output includes all the information from the SHOW command, and additionally:
The aggregation status, including columns for the status of each nested dynamic table type, the last error, and the time of the last refresh.
The prediction, actual, and feature columns being used from the source table.
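For example, to list the monitors in a hypothetical schema and then inspect one of them in detail:

SHOW MODEL MONITORS IN SCHEMA ml_db.monitoring;
DESCRIBE MODEL MONITOR churn_model_monitor;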
Viewing monitoring reports¶
To view monitor reports, visit the ML Monitoring dashboard in Snowsight. In the Snowsight navigation pane, select AI & ML then select Models. The resulting list contains all the models in the Snowflake Model Registry in all the databases and schemas that your current role has access to.
Open a model’s details page by selecting the corresponding row in the Models list. The details page displays key model information, including the model’s description, tags, versions, and monitors.
The Monitors list in the details page displays the list of model monitors, the model versions they are attached to, their status, and when they were created.
Open a model monitor dashboard page by selecting the corresponding row in the Monitors list. The dashboard is populated with graphs displaying key metrics of the model over time. The exact graphs displayed depend on the type of model the monitor is based on (that is, binary classification or regression).
In the dashboard, you can take the following actions:
Change the range of the graphs by clicking the time range selector.
Change the graphs shown by clicking the Settings button. (Hover the mouse over a metric name to see more information about it.)
Compare model monitors by clicking the Compare model selector drop-down.
Display more information about the model monitor by selecting Display monitor details.
Access control requirements¶
The following privileges are required to work with model monitor objects.
| Command | Required privileges |
|---|---|
| CREATE MODEL MONITOR | CREATE MODEL MONITOR on the schema |
| SHOW MODEL MONITORS | Any privilege on the model monitor |
| DESCRIBE MODEL MONITOR | Any privilege on the model monitor |
| ALTER MODEL MONITOR | MODIFY on the model monitor |
| DROP MODEL MONITOR | OWNERSHIP on the model monitor |
| Model monitor dashboard | USAGE on the model and the model monitor |
Known limitations¶
The following limitations apply to model monitors:
Monitors must be created in the same database and schema as the model version being monitored.
Currently only single-output regression and binary classification models are supported.
At least one prediction column (class or score) is required. Actual columns are optional but are required to calculate accuracy metrics.
If baseline data is not provided, drift cannot be calculated. You will need to drop the monitor and create it again to add baseline data.
A given column can be used only once in the monitor. For example, you cannot use the same column as both an ID and a prediction.
Model monitors expect that your data does not contain invalid values such as nulls, NaNs, +/-Inf, probability scores outside the range of 0-1, classes that are not exactly 0 or 1, or more than two classes in a PREDICTION_CLASS_COLUMNS column. Such issues may cause the monitor to fail and eventually be suspended.
Timestamp columns must be of type TIMESTAMP_NTZ or DATE. Prediction and actual columns must be of type NUMBER.
Aggregation windows must be specified in days.
The number of monitored features can be at most 500.
You can create at most 250 monitors.
Cost considerations¶
- Virtual warehouse compute
Model monitors run in a virtual warehouse. The warehouse incurs cost when the monitor is created and each time the monitor is refreshed. A virtual warehouse is also used when the associated Snowsight dashboard is loaded, which incurs additional charges.
- Storage
Model monitors materialize the source data into a table stored in your account.
- Cloud services compute
Model monitors use cloud services compute to trigger refreshes when an underlying base object has changed. Cloud services compute cost is only billed if the daily cloud services cost is greater than 10% of the daily warehouse cost for the account.