Snowflake Model Registry¶
Note
The model registry API described in this topic is generally available as of package version 1.5.0.
After training a model, operationalizing the model and running inference in Snowflake starts with logging the model in the Snowflake Model Registry. The Model Registry lets you securely manage models and their metadata in Snowflake, regardless of origin and type, and makes running inference easy. The key benefits and capabilities of Snowflake Model Registry are:
Ability to store and manage model versions, model metrics, and other model metadata
Ability to serve models and run distributed inference at scale using Python, SQL, or REST API endpoints
Ability to manage the model lifecycle with flexible governance options and to work with models from development to production environments
Ability to monitor model performance and drift using Snowflake ML Observability
Ability to securely manage model access with role-based access control
The model registry stores machine learning models as first-class schema-level objects in Snowflake.
After you have logged a model, you can invoke its methods (equivalent to functions or stored procedures) to perform model operations, such as inference, in a Snowflake virtual warehouse, or serve the model in Snowpark Container Services for GPU-based inference.
The Snowflake Model Registry has built-in support for the most common model types, including scikit-learn, XGBoost, LightGBM, PyTorch, TensorFlow, Hugging Face pipelines, and MLflow pyfunc models. The Model Registry is also flexible enough to support your own previously trained models, as well as any custom processing code.
Tip
See examples of these model types with end-to-end workflows in Examples and Quickstarts.
The main classes in the Snowflake Model Registry Python API are:
snowflake.ml.registry.Registry: Manages models within a schema.
snowflake.ml.model.Model: Represents a model.
snowflake.ml.model.ModelVersion: Represents a version of a model.
This topic describes how to perform registry operations in Python using the snowflake-ml-python library.
You can also perform many registry operations in SQL; see Model Registry SQL.
Required privileges¶
To create a model, you must either own the schema where the model is created or have the CREATE MODEL privilege on it.
To use a model, you must either own the model or have the USAGE privilege on it. The USAGE privilege allows grantees to use the model for inference without being able to see any of its internals. To give users access to all existing models in a schema, use:
GRANT USAGE ON ALL MODELS IN SCHEMA <schema> TO ROLE <role>;
You can also give users access to future models created in a schema automatically:
GRANT USAGE ON FUTURE MODELS IN SCHEMA <schema> TO ROLE <role>;
If a user’s role has USAGE on a model, it appears in the Snowsight model registry page. For details about how privileges work in Snowflake, see Access control privileges.
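The two GRANT statements above can be generated for a given schema and role. The following is a minimal sketch, not an official helper: the function name `model_usage_grants` and the schema and role names are hypothetical. On a live connection, each statement could be executed with something like `session.sql(stmt).collect()`.

```python
def model_usage_grants(schema: str, role: str, include_future: bool = True) -> list[str]:
    """Build the GRANT statements that give `role` USAGE on models in `schema`."""
    statements = [f"GRANT USAGE ON ALL MODELS IN SCHEMA {schema} TO ROLE {role};"]
    if include_future:
        # Also cover models that are created in the schema later.
        statements.append(f"GRANT USAGE ON FUTURE MODELS IN SCHEMA {schema} TO ROLE {role};")
    return statements


# Hypothetical schema and role names, for illustration only.
for stmt in model_usage_grants("ML.REGISTRY", "ML_CONSUMER"):
    print(stmt)
```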
Note
By default, models currently do not support replication. This feature is a part of BCR bundle 2024_08, which you can enable if you need model replication. The feature will be enabled by default soon.
Current limitations¶
The following limits apply to models and model versions:
Models |
|
---|---|
Model versions |
|
Opening the Snowflake Model Registry¶
Models are first-class Snowflake objects and can be organized within a database and schema along with other Snowflake objects. The Snowflake Model Registry provides a Python class for managing models within a schema. Thus, any Snowflake schema can be used as a registry. It is not necessary to initialize or otherwise prepare a schema for this purpose. Snowflake recommends creating one or more dedicated schemas for this purpose, such as ML.REGISTRY. You can create the schema using CREATE SCHEMA.
Before you can create or modify models in the registry, you must open the registry. Opening the registry returns a reference to it, which you can then use to add new models and obtain references to existing models.
from snowflake.ml.registry import Registry
reg = Registry(session=sp_session, database_name="ML", schema_name="REGISTRY")
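Because any schema can serve as a registry, the only preparation that might be needed is creating the schema itself. The sketch below assumes `session` is an existing Snowpark session; the helper name `ensure_registry_schema` is hypothetical, and the database is assumed to exist already.

```python
def ensure_registry_schema(session, database: str = "ML", schema: str = "REGISTRY") -> str:
    """Create the schema if it is missing and return its qualified name."""
    qualified = f"{database}.{schema}"
    # `session` is assumed to be an existing Snowpark Session.
    session.sql(f"CREATE SCHEMA IF NOT EXISTS {qualified}").collect()
    return qualified


# Usage (requires a live Snowflake connection, so it is left commented out):
# from snowflake.ml.registry import Registry
# ensure_registry_schema(sp_session)
# reg = Registry(session=sp_session, database_name="ML", schema_name="REGISTRY")
```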
Registering models and versions¶
Adding a model to the registry is called logging the model. Log a model by calling the registry’s log_model method. This method serializes the model (a Python object) and creates a Snowflake model object from it. This method also adds metadata, such as a description, to the model as specified in the log_model call.
Each model can have unlimited versions. To log additional versions of the model, call log_model again with the same model_name but a different version_name.
You cannot add tags to a model when it is added to the registry, because tags are attributes of the model, and log_model adds a specific model version, only creating a model when adding its first version. You can update the model’s tags after logging the first version of the model.
In the following example, clf, short for “classifier,” is the Python model object, which was already created elsewhere in your code. You can add a comment at registration time, as shown here. The combination of name and version must be unique in the schema. You may specify a conda_dependencies list; the specified packages will be deployed with the model.
from snowflake.ml.model import type_hints

mv = reg.log_model(
    clf,
    model_name="my_model",
    version_name="v1",
    conda_dependencies=["scikit-learn"],
    comment="My awesome ML model",
    metrics={"score": 96},
    sample_input_data=train_features,
    task=type_hints.Task.TABULAR_BINARY_CLASSIFICATION,
)
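Since tags belong to the model rather than to a version, one way to combine the two steps is to log the version first and set the tag afterwards. This is a sketch under assumptions: the helper name `log_version_and_tag` is hypothetical, it assumes the model object exposes a `set_tag(tag_name, tag_value)` method, and it assumes the tag object has already been created in Snowflake.

```python
def log_version_and_tag(reg, model, model_name, version_name, tag_name, tag_value, **log_kwargs):
    """Log one model version, then set a tag on the parent model."""
    # log_model creates the model when its first version is added and
    # returns an object representing the new version.
    mv = reg.log_model(model, model_name=model_name, version_name=version_name, **log_kwargs)
    # Tags are attributes of the model, not the version, so set them afterwards.
    m = reg.get_model(model_name)
    m.set_tag(tag_name, tag_value)  # assumes the tag already exists
    return mv


# Usage (hypothetical tag name; requires a live registry):
# mv = log_version_and_tag(reg, clf, "my_model", "v2", "stage", "dev",
#                          sample_input_data=train_features)
```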
The arguments of log_model
are described here.
Required arguments
Argument |
Description |
---|---|
|
The Python model object of a supported model type. Must be serializable (“pickleable”). |
|
The model’s name, used with |
Note
The combination of model name and version must be unique in the schema.
Optional arguments
Argument |
Description |
---|---|
|
String specifying the model’s version, used with |
|
List of paths to directories of code to import when loading or deploying the model. |
|
Comment, for example a description of the model. |
|
List of Conda packages required by your model. This argument specifies package names
and optional versions in Conda format,
that is, |
|
List of external modules to pickle with the model. Supported with scikit-learn, Snowpark ML, PyTorch, TorchScript, and custom models. |
|
Dictionary that contains metrics linked to the model version. |
|
Dictionary that contains options for model creation. The following options are available for all model types:
Individual model types may support additional options. See Using built-in model types. |
|
List of package specs for PyPI packages required by your model. Only supported for models running in Snowpark Container Services. |
|
The version of Python under which the model will run. Defaults to |
|
A DataFrame that contains sample input data. The feature names required by the model and their types are
extracted from this DataFrame. Either this argument or |
|
Model method signatures as a mapping from target method name to signatures of input and output. Either this argument or
|
|
The task defining the problem the model is meant to solve. If unspecified, best effort is made to infer the model task from
the model class or it is set to |
log_model returns a snowflake.ml.model.ModelVersion object, which represents the version of the model that was added to the registry.
After registration, the model itself cannot be modified (although you can change its metadata). To delete a model and all its versions, use the registry’s delete_model method.
Working with model artifacts¶
After a model has been logged, its artifacts (the files backing the model, including its serialized Python objects and various metadata files such as its manifest) are available on an internal stage. Artifacts cannot be modified, but you can view or download the artifacts of models you own.
Note
Having the USAGE privilege on a model does not allow you to access its artifacts; ownership is required.
You can access model artifacts from a stage using, for example, the GET command or its equivalent in Snowpark Python, FileOperation.get.
However, you cannot address model artifacts using the usual stage path syntax. Instead, use a snow:// URL, a more general way to specify the location of objects in Snowflake. For example, a version inside a model can be specified by a URL of the form snow://model/<model_name>/versions/<version_name>/.
Knowing the name of the model and the version you want, you can use the LIST command to view the artifacts of the model as follows:
LIST 'snow://model/my_model/versions/V3/';
The output resembles:
name size md5 last_modified
versions/V3/MANIFEST.yml 30639 2f6186fb8f7d06e737a4dfcdab8b1350 Thu, 18 Jan 2024 09:24:37 GMT
versions/V3/functions/apply.py 2249 e9df6db11894026ee137589a9b92c95d Thu, 18 Jan 2024 09:24:37 GMT
versions/V3/functions/predict.py 2251 132699b4be39cc0863c6575b18127f26 Thu, 18 Jan 2024 09:24:37 GMT
versions/V3/model.zip 721663 e92814d653cecf576f97befd6836a3c6 Thu, 18 Jan 2024 09:24:37 GMT
versions/V3/model/env/conda.yml 332 1574be90b7673a8439711471d58ec746 Thu, 18 Jan 2024 09:24:37 GMT
versions/V3/model/model.yaml 25718 33e3d9007f749bb2e98f19af2a57a80b Thu, 18 Jan 2024 09:24:37 GMT
To retrieve one of these artifacts, use the SQL GET command:
GET 'snow://model/my_model/versions/V3/MANIFEST.yml' file:///tmp/model_artifacts/;
Or the equivalent with Snowpark Python:
session.file.get('snow://model/my_model/versions/V3/MANIFEST.yml', 'model_artifacts')
Note
The names and organization of a model’s artifacts can vary depending on the type of the model and might change. The preceding example artifact list is intended to be illustrative, not authoritative.
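The snow:// URL form described above is easy to build programmatically. The following sketch uses two hypothetical helpers, `model_version_url` and `list_artifacts`; the second assumes `session` is a Snowpark session and simply wraps the LIST command shown earlier.

```python
def model_version_url(model_name: str, version_name: str, artifact_path: str = "") -> str:
    """Build a snow:// URL for a model version or for one artifact inside it."""
    base = f"snow://model/{model_name}/versions/{version_name}/"
    return base + artifact_path.lstrip("/")


def list_artifacts(session, model_name: str, version_name: str):
    """Run LIST on the version's URL; `session` is assumed to be a Snowpark Session."""
    url = model_version_url(model_name, version_name)
    return session.sql(f"LIST '{url}'").collect()


print(model_version_url("my_model", "V3", "MANIFEST.yml"))
```

The resulting string can be passed to `session.file.get` in the same way as the literal URL in the earlier example.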
Deleting models¶
Use the registry’s delete_model
method to delete a model and all its versions:
reg.delete_model("mymodel")
Tip
You can also delete models in SQL using DROP MODEL.
Getting models from the registry¶
To get information about each model, use the show_models
method:
model_df = reg.show_models()
Tip
In SQL, use SHOW MODELS to get a list of models.
The result of show_models
is a pandas DataFrame. The available columns are listed here:
Column |
Description |
---|---|
|
Date and time when the model was created. |
|
Name of the model. |
|
Database in which the model is stored. |
|
Schema in which the model is stored. |
|
Role that owns the model. |
|
Comment for the model. |
|
JSON array listing versions of the model. |
|
Version of the model used when referring to the model without a version. |
To get a list of the models in the registry instead, each as a Model
instance, use the models
method:
model_list = reg.models()
To get a reference to a specific model from the registry by name, use the registry’s get_model
method:
m = reg.get_model("MyModel")
Note
Model instances are not copies of the original logged Python model object; they are references to the underlying model object in the registry.
After you have a reference to a model, either one from the list returned by the models method or one retrieved using get_model, you can work with its metadata and its versions.
Viewing and updating a model’s metadata¶
You can view and update a model’s metadata attributes in the registry, including its name, comment, tags, and metrics.
Retrieving and updating comments¶
Use the model’s comment
attribute to retrieve and update the model’s comment:
print(m.comment)
m.comment = "A better description than the one I provided originally"
Note
The description attribute is a synonym for comment. The previous code can also be written this way:
print(m.description)
m.description = "A better description than the one I provided originally"
Tip
You can also set a model’s comment in SQL by using ALTER MODEL.
Renaming a model¶
Use the rename
method to rename or move a model. Specify a fully qualified name as the new name to move the model to
a different database or schema.
m.rename("MY_MODEL_TOO")
Tip
You can also rename a model in SQL using ALTER MODEL.
Working with model versions¶
A model can have unlimited versions, each identified by a string. You can use any version naming convention that you like. Logging a model actually logs a specific version of the model. To log additional versions of a model, call log_model again with the same model_name but a different version_name.
Tip
In SQL, use SHOW VERSIONS IN MODEL to see the versions of a model.
A version of a model is represented by an instance of the snowflake.ml.model.ModelVersion
class.
To get a list of all the versions of a model, call the model object’s versions
method. The result is a list of
ModelVersion
instances:
version_list = m.versions()
To get information about each version as a DataFrame instead, call the model’s show_versions
method:
version_df = m.show_versions()
The resulting DataFrame contains the following columns:
Column |
Description |
---|---|
|
Date and time when the model version was created. |
|
Name of the version. |
|
Database in which the version is stored. |
|
Schema in which the version is stored. |
|
Name of the model that this version belongs to. |
|
Boolean value indicating whether this version is the model’s default version. |
|
JSON array of the names of the functions available in this version. |
|
JSON object containing metadata as key-value pairs ( |
|
JSON object from the |
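The listing methods can be combined to get a quick overview of a registry. The sketch below is illustrative only: `summarize_registry` is a hypothetical helper, and it assumes that model objects expose a `name` attribute and that version objects expose a `version_name` attribute.

```python
def summarize_registry(reg) -> dict:
    """Map each model name in the registry to the list of its version names."""
    summary = {}
    for m in reg.models():
        # Attribute names `name` and `version_name` are assumptions here.
        summary[m.name] = [mv.version_name for mv in m.versions()]
    return summary


# Usage (requires an open registry):
# for model_name, version_names in summarize_registry(reg).items():
#     print(model_name, version_names)
```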
Deleting model versions¶
You can delete a model version by using the model’s delete_version
method:
m.delete_version("rc1")
Tip
You can also delete a model version in SQL by using ALTER MODEL … DROP VERSION.
Default version¶
A version of a model can be designated as the default version. Retrieve or set the model’s default attribute to obtain the current default version (as a ModelVersion object) or to change it (using a string):
default_version = m.default
m.default = "v2"
Tip
In SQL, use ALTER MODEL to set the default version.
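When promoting a version to default from a script, it can help to confirm that the version exists first. The following is a sketch with a hypothetical helper, `promote_to_default`; it assumes version objects expose a `version_name` attribute and compares names case-insensitively, on the assumption that unquoted names are stored in uppercase.

```python
def promote_to_default(m, version_name: str) -> None:
    """Set `version_name` as the model's default after checking that it exists."""
    # Case-insensitive comparison is an assumption about how names are stored.
    known = {mv.version_name.upper() for mv in m.versions()}
    if version_name.upper() not in known:
        raise ValueError(f"Model has no version named {version_name!r}")
    m.default = version_name


# Usage (requires a model reference from the registry):
# promote_to_default(m, "v2")
```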
Model version aliases¶
You can assign an alias to a model version by using the SQL ALTER MODEL command. You can use an alias wherever a version name is required, such as when getting a reference to a model version, in Python or in SQL. A given alias can be assigned to only one model version at a time.
In addition to aliases you create, the following system aliases are available in all models:
DEFAULT refers to the default version of the model.
FIRST refers to the oldest version of the model by creation time.
LAST refers to the newest version of the model by creation time.
Alias names you create must not be the same as any existing version name or alias in the model, including system aliases.
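The naming rule above can be checked before issuing ALTER MODEL. This is a minimal sketch with a hypothetical helper, `is_valid_new_alias`; the case-insensitive comparison is an assumption about how names are matched, not documented behavior.

```python
SYSTEM_ALIASES = {"DEFAULT", "FIRST", "LAST"}


def is_valid_new_alias(alias: str, existing_names) -> bool:
    """Return True if `alias` clashes with no version name, alias, or system alias."""
    # `existing_names` should hold the model's current version names and aliases.
    taken = {name.upper() for name in existing_names} | SYSTEM_ALIASES
    return alias.upper() not in taken


print(is_valid_new_alias("production", ["V1", "V2"]))
print(is_valid_new_alias("last", ["V1", "V2"]))
```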
Getting a reference to a model version¶
To get a reference to a specific version of a model as a ModelVersion
instance, use the model’s version
method.
Use the model’s default
attribute to get the default version of the model:
m = reg.get_model("MyModel")
mv = m.version("v1")
mv = m.default
After you have a reference to a specific version of a model (such as the variable mv
in this example), you can
retrieve or update its comments or metrics and call the model’s methods (or functions) as shown in the following sections.
Retrieving and updating comments¶
As with models, model versions can have comments, which can be accessed and set via the model version’s comment
or
description
attribute:
print(mv.comment)
print(mv.description)
mv.comment = "A model version comment"
mv.description = "Same as setting the comment"
Tip
You can also change a model version’s comment in SQL by using ALTER MODEL … MODIFY VERSION.
Retrieving and updating metrics¶
Metrics are key-value pairs used to track prediction accuracy and other model version characteristics. You can set metrics when creating a model version or set them using the set_metric method. A metric value can be any Python object that can be serialized to JSON, including numbers, strings, lists, and dictionaries. Unlike tags, metric names and possible values do not need to be defined in advance.
A test accuracy metric might be generated using sklearn’s accuracy_score:
from sklearn import metrics
test_accuracy = metrics.accuracy_score(test_labels, prediction)
The confusion matrix can be generated similarly using sklearn:
test_confusion_matrix = metrics.confusion_matrix(test_labels, prediction)
Then you can set these values as metrics:
# scalar metric
mv.set_metric("test_accuracy", test_accuracy)
# hierarchical (dictionary) metric
mv.set_metric("evaluation_info", {"dataset_used": "my_dataset", "accuracy": test_accuracy, "f1_score": metrics.f1_score(test_labels, prediction)})
# multivalent (matrix) metric
mv.set_metric("confusion_matrix", test_confusion_matrix)
To retrieve a model version’s metrics as a Python dictionary, use show_metrics:
# named version_metrics to avoid shadowing the sklearn metrics module imported earlier
version_metrics = mv.show_metrics()
To delete a metric, call delete_metric
:
mv.delete_metric("test_accuracy")
Tip
You can also modify a model version’s metrics (which are stored as metadata) in SQL by using ALTER MODEL … MODIFY VERSION.
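Because metric values must be serializable to JSON, it can be useful to normalize values such as NumPy arrays before storing them. The helper below, `json_safe_metric`, is a hypothetical sketch; whether set_metric already converts such values itself may depend on the library version, so treat the conversion as a precaution rather than a requirement.

```python
import json


def json_safe_metric(value):
    """Return `value` in a JSON-serializable form, or raise TypeError."""
    # NumPy arrays and scalars expose tolist(); convert them to plain Python values.
    if hasattr(value, "tolist"):
        value = value.tolist()
    json.dumps(value)  # raises TypeError if the value still cannot be serialized
    return value


# Usage (requires a model version reference):
# mv.set_metric("confusion_matrix", json_safe_metric(test_confusion_matrix))
```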
Retrieving model explanations¶
The model registry can explain a model’s results, telling you which input features contribute most to predictions, by calculating Shapley values. This preview feature is available by default in all model views created in Snowflake 8.31 and later through the underlying model’s explain method. You can call explain from SQL or via a model view’s run method in Python.
For details on this feature, see Model Explainability.
Exporting a model version¶
Use mv.export to export a model’s files to a local directory; the directory is created if it does not exist:
mv.export("~/mymodel/")
By default, the exported files include the code, the environment to load the model, and model weights. To also export the files needed to run the model in a warehouse, specify export_mode = ExportMode.FULL:
from snowflake.ml.model import ExportMode
mv.export("~/mymodel/", export_mode=ExportMode.FULL)
Loading a model version¶
Use mv.load to load the original Python model object that was added to the registry. You can then use the model for inference just as though you had defined it in your Python code:
clf = mv.load()
To ensure proper functionality of a model loaded from the registry, the target Python environment (that is, the versions of the Python interpreter and of all libraries) should be identical to the environment from which the model was logged. Specify force=True in the load call to force the model to be loaded even if the environment is different.
Tip
To make sure your environment is the same as the one where the model is hosted, download a copy of the conda environment from the model registry:
import os
# session.file.get downloads the file into the target directory (here, the home directory) as conda.yml
session.file.get("snow://model/<modelName>/versions/<versionName>/runtimes/python_runtime/env/conda.yml", os.path.expanduser("~"))
Then create a new conda environment from this file:
conda env create --name newenv --file ~/conda.yml
conda activate newenv
The optional options argument is a dictionary of options for loading the model. Currently, the argument supports only the use_gpu option.
Option |
Type |
Description |
Default |
---|---|---|---|
|
|
Enables GPU-specific loading logic. |
|
The following example illustrates the use of the options
argument:
clf = mv.load(options={"use_gpu": True})
Calling model methods¶
Model versions can have methods, which are attached functions that can be executed to perform inference or other model operations. The versions of a model can have different methods, and the signatures of these methods can also differ.
To call a method of a model version, use mv.run, where mv is a ModelVersion object. Specify the name of the function to be called and pass a Snowpark or pandas DataFrame that contains the inference data, along with any required parameters. The method is executed in a Snowflake warehouse.
The return value of the method is a Snowpark or pandas DataFrame, matching the type of DataFrame passed in. Snowpark DataFrames are evaluated lazily, so the method is run only when the DataFrame’s collect, show, or to_pandas method is called.
Note
Invoking a method runs it in the warehouse specified in the session you’re using to connect to the registry. See Specifying a Warehouse.
The following example illustrates running the predict method of a model. This model’s predict method does not require any parameters besides the inference data (test_features here). If it did, they would be passed as additional arguments after the inference data.
remote_prediction = mv.run(test_features, function_name="predict")
remote_prediction.show() # assuming test_features is Snowpark DataFrame
To see what methods can be called on a given model, call mv.show_functions. The return value of this method is a list of ModelFunctionInfo objects. Each of these objects includes the following attributes:
name: The name of the function that can be called from Python or SQL.
target_method: The name of the Python method in the original logged model.
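The function listing can be used to guard a call to mv.run. The sketch below uses two hypothetical helpers. It accepts entries that are either dict-like or attribute-style, because the exact shape of ModelFunctionInfo is treated here as an assumption, and it compares names case-insensitively for the same reason.

```python
def function_names(mv) -> list:
    """Collect the callable function names reported by a model version."""
    names = []
    for info in mv.show_functions():
        # Support both dict-style and attribute-style entries (assumption).
        name = info["name"] if isinstance(info, dict) else getattr(info, "name")
        names.append(name)
    return names


def run_checked(mv, data, function_name: str = "predict"):
    """Call mv.run only if the requested function is available on this version."""
    available = function_names(mv)
    if function_name.upper() not in {n.upper() for n in available}:
        raise ValueError(f"{function_name!r} is not available; found {available}")
    return mv.run(data, function_name=function_name)


# Usage (requires a model version reference and inference data):
# remote_prediction = run_checked(mv, test_features, "predict")
```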
Tip
You can also call model methods in SQL. See Model commands.
Cost considerations¶
Using the Snowflake Model Registry incurs standard Snowflake consumption-based costs. These include:
Cost of storing model artifacts, metadata, and functions. For general information about storage costs, see Exploring storage cost.
Cost of copying files between stages to Snowflake. See COPY FILES.
Cost of serverless model object operations through the Snowsight UI or the SQL or Python interface, such as showing models and model versions and altering model comments, tags, and metrics.
Warehouse compute costs, which vary depending on the type of model and the quantity of data used in inference. For general information about Snowflake compute costs, see Understanding compute cost. Warehouse compute costs are incurred for:
Model and version creation operations
Invoking a model’s methods