Snowflake Model Registry

Note

The model registry API described in this topic is generally available as of package version 1.5.0.

After training a model, operationalizing the model and running inference in Snowflake starts with logging the model in the Snowflake Model Registry. The Model Registry lets you securely manage models and their metadata in Snowflake, regardless of origin and type, and makes running inference easy. The key benefits and capabilities of Snowflake Model Registry are:

  • Ability to store and manage model versions, model metrics, and other model metadata

  • Ability to serve models and run distributed inference at scale using Python, SQL, or REST API endpoints

  • Ability to manage the model lifecycle with flexible governance options as models move from development to production environments

  • Ability to monitor model performance and drift using Snowflake ML Observability

  • Ability to securely manage model access with role-based access control

The model registry stores machine learning models as first-class schema-level objects in Snowflake.

After you have logged a model, you can invoke its methods (equivalent to functions or stored procedures) to perform model operations, such as inference, in a Snowflake virtual warehouse, or serve the model in Snowpark Container Services for GPU-based inference.

The Snowflake Model Registry has built-in support for the most common model types, including scikit-learn, xgboost, LightGBM, PyTorch, TensorFlow, Hugging Face pipelines, and MLFlow pyfunc models. The Model Registry is also flexible and powerful enough to support your own previously trained models, as well as any custom processing code.

Tip

See examples of these model types with end-to-end workflows in Examples and Quickstarts.

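The main classes in the Snowflake Model Registry Python API are:

  • snowflake.ml.registry.Registry: Manages models within a schema.

  • snowflake.ml.model.Model: Represents a model.

  • snowflake.ml.model.ModelVersion: Represents a version of a model.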

This topic describes how to perform registry operations in Python using the snowflake-ml-python library. You can also perform many registry operations in SQL; see Model Registry SQL.

Required privileges

To create a model, you must either own the schema where the model is created or have the CREATE MODEL privilege on it. To use a model, you must either own the model or have the USAGE privilege on it. The USAGE privilege allows grantees to use the model for inference without being able to see any of its internals. To give users access to all existing models in a schema, use GRANT USAGE ON ALL MODELS IN SCHEMA <schema> TO ROLE <role>. To automatically give users access to models created in the schema in the future, use GRANT USAGE ON FUTURE MODELS IN SCHEMA <schema> TO ROLE <role>.
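The two GRANT statements described above can be generated for a given schema and role. The following is a minimal sketch that only builds the SQL text; the helper name usage_grants and the role ML_CONSUMER are hypothetical, and you would execute the statements yourself, for example in a worksheet.

```python
def usage_grants(schema: str, role: str, include_future: bool = True) -> list:
    """Build GRANT statements that give a role USAGE on the models in a schema."""
    statements = [f"GRANT USAGE ON ALL MODELS IN SCHEMA {schema} TO ROLE {role};"]
    if include_future:
        # Also cover models that are created in the schema later.
        statements.append(
            f"GRANT USAGE ON FUTURE MODELS IN SCHEMA {schema} TO ROLE {role};"
        )
    return statements


# ML_CONSUMER is a hypothetical role name used only for illustration.
grants = usage_grants("ML.REGISTRY", "ML_CONSUMER")
for statement in grants:
    print(statement)
```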

If a user’s role has USAGE on a model, it appears in the Snowsight model registry page. For details about how privileges work in Snowflake, see Access control privileges.

Note

By default, models currently do not support replication. This feature is a part of BCR bundle 2024_08, which you can enable if you need model replication. The feature will be enabled by default soon.

Current limitations

The following limits apply to models and model versions:

Models

  • Maximum of 1000 versions

Model versions

  • Maximum of 10 methods

  • Maximum of 10 imports

  • Maximum of 500 arguments per method

  • Maximum metadata (including metrics) of 100 KB

  • Maximum total model size of 5 GB (for warehouse deployed models)

  • Maximum config file size of 250 KB, including conda.yml and other manifest files that log_model generates internally. (If a model has many functions and all of them have many arguments, for example, this limit might be exceeded.)
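Some of these limits can be checked locally before you log a model version. The following is a rough sketch, not part of the registry API: the function check_version_limits is hypothetical, and measuring metadata as UTF-8 encoded JSON with 1 KB taken as 1024 bytes is an assumption about how the limit is applied.

```python
import json

# Limits quoted from the list above. Treating 100 KB as 100 * 1024 bytes
# of UTF-8 encoded JSON is an assumption.
MAX_METADATA_BYTES = 100 * 1024
MAX_METHODS = 10
MAX_ARGUMENTS_PER_METHOD = 500


def check_version_limits(metrics: dict, methods: dict) -> list:
    """Return human-readable limit violations; an empty list means none were found.

    `methods` maps each method name to the list of its argument names.
    """
    problems = []
    metadata_bytes = len(json.dumps(metrics).encode("utf-8"))
    if metadata_bytes > MAX_METADATA_BYTES:
        problems.append(
            f"metadata is {metadata_bytes} bytes, limit is {MAX_METADATA_BYTES}"
        )
    if len(methods) > MAX_METHODS:
        problems.append(f"{len(methods)} methods, limit is {MAX_METHODS}")
    for name, arguments in methods.items():
        if len(arguments) > MAX_ARGUMENTS_PER_METHOD:
            problems.append(
                f"method {name} has {len(arguments)} arguments, "
                f"limit is {MAX_ARGUMENTS_PER_METHOD}"
            )
    return problems


problems = check_version_limits({"score": 96}, {"predict": ["age", "income"]})
print(problems)  # [] because nothing exceeds the limits
```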

Opening the Snowflake Model Registry

Models are first-class Snowflake objects and can be organized within a database and schema along with other Snowflake objects. The Snowflake Model Registry provides a Python class for managing models within a schema. Thus, any Snowflake schema can be used as a registry. It is not necessary to initialize or otherwise prepare a schema for this purpose. Snowflake recommends creating one or more dedicated schemas for this purpose, such as ML.REGISTRY. You can create the schema using CREATE SCHEMA.

Before you can create or modify models in the registry, you must open the registry. Opening the registry returns a reference to it, which you can then use to add new models and obtain references to existing models.

from snowflake.ml.registry import Registry

reg = Registry(session=sp_session, database_name="ML", schema_name="REGISTRY")

Registering models and versions

Adding a model to the registry is called logging the model. Log a model by calling the registry’s log_model method. This method serializes the model — a Python object — and creates a Snowflake model object from it. This method also adds metadata, such as a description, to the model as specified in the log_model call.

Each model can have multiple versions, up to the maximum listed in Current limitations. To log additional versions of the model, call log_model again with the same model_name but a different version_name.

You cannot add tags to a model when it is added to the registry, because tags are attributes of the model, and log_model adds a specific model version, only creating a model when adding its first version. You can update the model’s tags after logging the first version of the model.
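Because tags belong to the model rather than to a version, one workable pattern is to log the first version and then set tags on the model. The following is a minimal sketch of that order of operations; the helper log_and_tag is hypothetical, reg is assumed to be an open Registry, and every tag name is assumed to have been created already with CREATE TAG.

```python
def log_and_tag(reg, model, model_name, version_name, tags, **log_kwargs):
    """Log a model version, then set model-level tags.

    `reg` is an open Registry and `tags` maps existing tag names to values.
    Extra keyword arguments are forwarded to log_model unchanged.
    """
    mv = reg.log_model(
        model, model_name=model_name, version_name=version_name, **log_kwargs
    )
    # The model object is guaranteed to exist only after log_model has
    # created at least one version, so tags are applied afterward.
    m = reg.get_model(model_name)
    for tag_name, tag_value in tags.items():
        m.set_tag(tag_name, tag_value)
    return mv
```

A call might look like log_and_tag(reg, clf, "my_model", "v1", {"lifecycle_stage": "dev"}, sample_input_data=train_features), where lifecycle_stage is a hypothetical tag name.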

In the following example, clf, short for “classifier,” is the Python model object, which was already created elsewhere in your code. You can add a comment at registration time, as shown here. The combination of name and version must be unique in the schema. You may specify conda_dependencies lists; the specified packages will be deployed with the model.

from snowflake.ml.model import type_hints
mv = reg.log_model(clf,
                   model_name="my_model",
                   version_name="v1",
                   conda_dependencies=["scikit-learn"],
                   comment="My awesome ML model",
                   metrics={"score": 96},
                   sample_input_data=train_features,
                   task=type_hints.Task.TABULAR_BINARY_CLASSIFICATION)

The arguments of log_model are described here.

Required arguments

Argument

Description

model

The Python model object of a supported model type. Must be serializable (“pickleable”).

model_name

The model’s name, used with version_name to identify the model in the registry. The name cannot be changed after the model is logged. Must be a valid Snowflake identifier.

Note

The combination of model name and version must be unique in the schema.

Optional arguments

Argument

Description

version_name

String specifying the model’s version, used with model_name to identify the model in the registry. Must be a valid Snowflake identifier. If missing, a human-readable version name is generated automatically.

code_paths

List of paths to directories of code to import when loading or deploying the model.

comment

Comment, for example a description of the model.

conda_dependencies

List of Conda packages required by your model. This argument specifies package names and optional versions in Conda format, that is, "[channel::]package [operator version]". If you do not specify a channel, the Snowflake channel is assumed when the model runs in a warehouse, and conda-forge is assumed when the model runs on Snowpark Container Services (SPCS).

ext_modules

List of external modules to pickle with the model. Supported with scikit-learn, Snowpark ML, PyTorch, TorchScript, and custom models.

metrics

Dictionary that contains metrics linked to the model version.

options

Dictionary that contains options for model creation. The following options are available for all model types:

  • embed_local_ml_library: whether to embed a copy of the local Snowpark ML library into the model. Default: False.

  • relax_version: whether to relax the version constraints of the dependencies. This replaces version specifiers like ==x.y.z with specifiers like >=x.y, <(x+1). Default: True.

  • method_options: A dictionary of per-method options, where the key is the name of a method and the value is a dictionary that contains one or more of the options described here. The available options are:

    • case_sensitive: Indicates whether the method and its signature are case-sensitive. Case-sensitive methods must be double-quoted when used in SQL. This option also allows non-alphabetic characters in method names. Default: False.

    • max_batch_size: Maximum batch size that the method will accept when called in the warehouse. Default: None (the batch size is automatically determined).

Individual model types may support additional options. See Using built-in model types.

pip_requirements

List of package specs for PyPI packages required by your model. Only supported for models running in Snowpark Container Services.

python_version

The version of Python under which the model will run. Defaults to None, which designates the latest version available in the warehouse.

sample_input_data

A DataFrame that contains sample input data. The feature names required by the model and their types are extracted from this DataFrame. Either this argument or signatures must be provided for all models except Snowpark ML and MLFlow models and Hugging Face pipelines.

signatures

Model method signatures as a mapping from target method name to signatures of input and output. Either this argument or sample_input_data must be provided for all models except Snowpark ML and MLFlow models and Hugging Face pipelines.

task

The task defining the problem the model is meant to solve. If unspecified, best effort is made to infer the model task from the model class or it is set to type_hints.Task.UNKNOWN. Check snowflake.ml.model.type_hints for all task options.
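The conda_dependencies format described in the table, "[channel::]package [operator version]", can be assembled with a small helper. This is only an illustrative sketch: the function conda_spec is hypothetical, and it writes the version constraint without a space (for example scikit-learn==1.3.0), a compact form that Conda also accepts.

```python
from typing import Optional


def conda_spec(
    package: str,
    version: Optional[str] = None,
    operator: str = "==",
    channel: Optional[str] = None,
) -> str:
    """Build one conda_dependencies entry such as 'conda-forge::xgboost==1.7.6'."""
    spec = f"{channel}::{package}" if channel else package
    if version:
        spec += f"{operator}{version}"
    return spec


# The package versions below are placeholders, not recommendations.
dependencies = [
    conda_spec("scikit-learn"),
    conda_spec("xgboost", "1.7.6", channel="conda-forge"),
    conda_spec("numpy", "1.24", operator=">="),
]
print(dependencies)
# ['scikit-learn', 'conda-forge::xgboost==1.7.6', 'numpy>=1.24']
```

The resulting list can be passed as the conda_dependencies argument of log_model.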

log_model returns a snowflake.ml.model.ModelVersion object, which represents the version of the model that was added to the registry.

After registration, the model itself cannot be modified (although you can change its metadata). To delete a model and all its versions, use the registry’s delete_model method.

Working with model artifacts

After a model has been logged, its artifacts (the files backing the model, including its serialized Python objects and various metadata files such as its manifest) are available on an internal stage. Artifacts cannot be modified, but you can view or download the artifacts of models you own.

Note

Having the USAGE privilege on a model does not allow you to access its artifacts; ownership is required.

You can access model artifacts from a stage using, for example, the GET command or its equivalent in Snowpark Python, FileOperation.get.

However, you cannot address model artifacts using the usual stage path syntax. Instead, use a snow:// URL, a more general way to specify the location of objects in Snowflake. For example, a version inside a model can be specified by a URL of the form snow://model/<model_name>/versions/<version_name>/.
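The URL form above is easy to build programmatically. The following sketch uses a hypothetical helper, model_version_url, that only formats the string; it does not check that the model, version, or artifact exists.

```python
def model_version_url(model_name: str, version_name: str, artifact: str = "") -> str:
    """Build a snow:// URL for a model version, optionally for a single artifact."""
    return f"snow://model/{model_name}/versions/{version_name}/{artifact.lstrip('/')}"


print(model_version_url("my_model", "V3"))
# snow://model/my_model/versions/V3/
print(model_version_url("my_model", "V3", "MANIFEST.yml"))
# snow://model/my_model/versions/V3/MANIFEST.yml
```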

If you know the name of the model and the version you want, you can use the LIST command to view the artifacts of the model as follows:

LIST 'snow://model/my_model/versions/V3/';

The output resembles:

name                                size      md5                                 last_modified
versions/V3/MANIFEST.yml            30639     2f6186fb8f7d06e737a4dfcdab8b1350    Thu, 18 Jan 2024 09:24:37 GMT
versions/V3/functions/apply.py      2249      e9df6db11894026ee137589a9b92c95d    Thu, 18 Jan 2024 09:24:37 GMT
versions/V3/functions/predict.py    2251      132699b4be39cc0863c6575b18127f26    Thu, 18 Jan 2024 09:24:37 GMT
versions/V3/model.zip               721663    e92814d653cecf576f97befd6836a3c6    Thu, 18 Jan 2024 09:24:37 GMT
versions/V3/model/env/conda.yml     332       1574be90b7673a8439711471d58ec746    Thu, 18 Jan 2024 09:24:37 GMT
versions/V3/model/model.yaml        25718     33e3d9007f749bb2e98f19af2a57a80b    Thu, 18 Jan 2024 09:24:37 GMT

To retrieve one of these artifacts, use the SQL GET command:

GET 'snow://model/my_model/versions/V3/MANIFEST.yml' file:///tmp/model_artifacts/;

Or the equivalent with Snowpark Python:

session.file.get('snow://model/my_model/versions/V3/MANIFEST.yml', 'model_artifacts')

Note

The names and organization of a model’s artifacts can vary depending on the type of the model and might change. The preceding example artifact list is intended to be illustrative, not authoritative.

Deleting models

Use the registry’s delete_model method to delete a model and all its versions:

reg.delete_model("mymodel")

Tip

You can also delete models in SQL using DROP MODEL.

Getting models from the registry

To get information about each model, use the show_models method:

model_df = reg.show_models()

Tip

In SQL, use SHOW MODELS to get a list of models.

The result of show_models is a pandas DataFrame. The available columns are listed here:

Column                  Description
created_on              Date and time when the model was created.
name                    Name of the model.
database_name           Database in which the model is stored.
schema_name             Schema in which the model is stored.
owner                   Role that owns the model.
comment                 Comment for the model.
versions                JSON array listing versions of the model.
default_version_name    Version of the model used when referring to the model without a version.

To get a list of the models in the registry instead, each as a Model instance, use the models method:

model_list = reg.models()

To get a reference to a specific model from the registry by name, use the registry’s get_model method:

m = reg.get_model("MyModel")

Note

Model instances are not copies of the original logged Python model object; they are references to the underlying model object in the registry.

After you have a reference to a model, either one from the list returned by the models method or one retrieved using get_model, you can work with its metadata and its versions.
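Before calling get_model, it can be useful to check whether a model with a given name exists. The following sketch does this with the models method and the name attribute of each Model. The helper model_exists is hypothetical, and comparing names case-insensitively is an assumption that holds only for unquoted identifiers.

```python
def model_exists(reg, model_name: str) -> bool:
    """Return True if the registry lists a model with this name.

    Assumes unquoted identifiers, so names are compared case-insensitively.
    """
    wanted = model_name.upper()
    return any(m.name.upper() == wanted for m in reg.models())
```

With an open registry, model_exists(reg, "MyModel") returns True or False without raising an error for a missing model.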

Viewing and updating a model’s metadata

You can view and update a model’s metadata attributes in the registry, including its name, comment, tags, and metrics.

Retrieving and updating comments

Use the model’s comment attribute to retrieve and update the model’s comment:

print(m.comment)
m.comment = "A better description than the one I provided originally"

Note

The description attribute is a synonym for comment. The previous code can also be written this way:

print(m.description)
m.description = "A better description than the one I provided originally"

Tip

You can also set a model’s comment in SQL by using ALTER MODEL.

Retrieving and updating tags

Tags are metadata used to record a model’s purpose, algorithm, training data set, lifecycle stage, or other information you choose. You can set tags after the first version of the model has been logged. You can also update the values of existing tags or remove tags entirely.

Note

You must define the names of all tags (and potentially their possible values) first by using CREATE TAG. See Object Tagging.

To get all of a model’s tags as a Python dictionary, use show_tags:

print(m.show_tags())

To add a new tag or change the value of an existing tag, use set_tag:

m.set_tag("live_version", "v1")

To retrieve the value of a tag, use get_tag:

m.get_tag("live_version")

To remove a tag, use unset_tag:

m.unset_tag("live_version")

Tip

You can also set a model’s tags in SQL by using ALTER MODEL.
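A tag such as live_version can act as a pointer to the version that should currently serve traffic. The following sketch combines set_tag and get_tag with the model’s version method to maintain and resolve that pointer. Both helper functions are hypothetical, and the sketch assumes that get_tag returns None when the tag is not set.

```python
def promote_live_version(m, version_name: str, tag_name: str = "live_version"):
    """Point the tag at a version and return a reference to that version."""
    # Look the version up first so that a mistyped name is not recorded in the tag.
    mv = m.version(version_name)
    m.set_tag(tag_name, version_name)
    return mv


def get_live_version(m, tag_name: str = "live_version"):
    """Return the version the tag points to, or None if the tag is not set."""
    version_name = m.get_tag(tag_name)
    return m.version(version_name) if version_name else None
```

After promote_live_version(m, "v2"), a later call to get_live_version(m) returns the reference for v2.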

Renaming a model

Use the rename method to rename or move a model. Specify a fully qualified name as the new name to move the model to a different database or schema.

m.rename("MY_MODEL_TOO")

Tip

You can also rename a model in SQL using ALTER MODEL.

Working with model versions

A model can have multiple versions (up to the maximum listed in Current limitations), each identified by a string. You can use any version naming convention that you like. Logging a model actually logs a specific version of the model. To log additional versions of a model, call log_model again with the same model_name but a different version_name.

Tip

In SQL, use SHOW VERSIONS IN MODEL to see the versions of a model.

A version of a model is represented by an instance of the snowflake.ml.model.ModelVersion class.

To get a list of all the versions of a model, call the model object’s versions method. The result is a list of ModelVersion instances:

version_list = m.versions()

To get information about each version as a DataFrame instead, call the model’s show_versions method:

version_df = m.show_versions()

The resulting DataFrame contains the following columns:

Column                Description
created_on            Date and time when the model version was created.
name                  Name of the version.
database_name         Database in which the version is stored.
schema_name           Schema in which the version is stored.
model_name            Name of the model that this version belongs to.
is_default_version    Boolean value indicating whether this version is the model’s default version.
functions             JSON array of the names of the functions available in this version.
metadata              JSON object containing metadata as key-value pairs ({} if no metadata is specified).
user_data             JSON object from the user_data section of the model definition manifest ({} if no user data is specified).

Deleting model versions

You can delete a model version by using the model’s delete_version method:

m.delete_version("rc1")

Tip

You can also delete a model version in SQL by using ALTER MODEL … DROP VERSION.

Default version

A version of a model can be designated as the default version. Retrieve or set the model’s default attribute to obtain the current default version (as a ModelVersion object) or to change it (using a string):

default_version = m.default
m.default = "v2"

Tip

In SQL, use ALTER MODEL to set the default version.

Model version aliases

You can assign an alias to a model version by using the SQL ALTER MODEL command. You can use an alias wherever a version name is required, such as when getting a reference to a model version, in Python or in SQL. A given alias can be assigned to only one model version at a time.

In addition to aliases you create, the following system aliases are available in all models:

  • DEFAULT refers to the default version of the model.

  • FIRST refers to the oldest version of the model by creation time.

  • LAST refers to the newest version of the model by creation time.

Alias names you create must not be the same as any existing version name or alias in the model, including system aliases.
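The naming rule above can be checked before you issue ALTER MODEL. The following sketch is a local check only; the helper alias_conflict is hypothetical, and comparing names case-insensitively is an assumption that applies to unquoted identifiers.

```python
SYSTEM_ALIASES = {"DEFAULT", "FIRST", "LAST"}


def alias_conflict(alias, version_names, existing_aliases=()):
    """Return the reason an alias cannot be used, or None if it is free."""
    candidate = alias.upper()
    if candidate in SYSTEM_ALIASES:
        return "system alias"
    if candidate in {name.upper() for name in version_names}:
        return "existing version name"
    if candidate in {name.upper() for name in existing_aliases}:
        return "existing alias"
    return None


print(alias_conflict("last", ["V1", "V2"]))        # system alias
print(alias_conflict("v2", ["V1", "V2"]))          # existing version name
print(alias_conflict("production", ["V1", "V2"]))  # None
```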

Getting a reference to a model version

To get a reference to a specific version of a model as a ModelVersion instance, use the model’s version method. Use the model’s default attribute to get the default version of the model:

m = reg.get_model("MyModel")

mv = m.version("v1")
mv = m.default

After you have a reference to a specific version of a model (such as the variable mv in this example), you can retrieve or update its comments or metrics and call the model’s methods (or functions) as shown in the following sections.

Retrieving and updating comments

As with models, model versions can have comments, which can be accessed and set via the model version’s comment or description attribute:

print(mv.comment)
print(mv.description)

mv.comment = "A model version comment"
mv.description = "Same as setting the comment"

Tip

You can also change a model version’s comment in SQL by using ALTER MODEL … MODIFY VERSION.

Retrieving and updating metrics

Metrics are key-value pairs used to track prediction accuracy and other model version characteristics. You can set metrics when creating a model version or set them using the set_metric method. A metric value can be any Python object that can be serialized to JSON, including numbers, strings, lists, and dictionaries. Unlike tags, metric names and possible values do not need to be defined in advance.
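Because metric values must be serializable to JSON, array-like results (such as NumPy arrays) generally need to be converted to plain Python lists first. The following sketch shows one way to do that and to fail early if a value still cannot be serialized. The helper to_json_safe is hypothetical, and it converts only the top-level value, not arrays nested inside dictionaries.

```python
import json


def to_json_safe(value):
    """Convert array-like values to plain lists and verify JSON serializability."""
    # NumPy arrays and scalars expose tolist(); plain Python values do not.
    if hasattr(value, "tolist"):
        value = value.tolist()
    # json.dumps raises TypeError if the value still cannot be serialized.
    json.dumps(value)
    return value


print(to_json_safe({"accuracy": 0.96}))  # {'accuracy': 0.96}
```

A metric could then be stored with mv.set_metric("confusion_matrix", to_json_safe(matrix)), where matrix is an array-like result.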

A test accuracy metric might be generated using sklearn’s accuracy_score:

from sklearn import metrics

test_accuracy = metrics.accuracy_score(test_labels, prediction)

The confusion matrix can be generated similarly using sklearn:

test_confusion_matrix = metrics.confusion_matrix(test_labels, prediction)

Then you can set these values as metrics:

# scalar metric
mv.set_metric("test_accuracy", test_accuracy)

# hierarchical (dictionary) metric
test_f1_score = metrics.f1_score(test_labels, prediction)
mv.set_metric("evaluation_info", {"dataset_used": "my_dataset", "accuracy": test_accuracy, "f1_score": test_f1_score})

# multivalent (matrix) metric; convert the NumPy array to nested lists so that it can be serialized to JSON
mv.set_metric("confusion_matrix", test_confusion_matrix.tolist())

To retrieve a model version’s metrics as a Python dictionary, use show_metrics:

version_metrics = mv.show_metrics()  # named to avoid shadowing the sklearn metrics module

To delete a metric, call delete_metric:

mv.delete_metric("test_accuracy")

Tip

You can also modify a model version’s metrics (which are stored as metadata) in SQL by using ALTER MODEL … MODIFY VERSION.
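Metrics can also drive version selection. The following sketch compares one metric across all versions of a model by using versions, show_metrics, and each version’s version_name attribute. The helper best_version_name is hypothetical, and versions that lack the metric or store a non-numeric value are skipped.

```python
def best_version_name(m, metric_name: str, higher_is_better: bool = True):
    """Return the name of the version with the best value for a metric, or None."""
    scored = []
    for mv in m.versions():
        value = mv.show_metrics().get(metric_name)
        # Skip versions that lack the metric or store a non-numeric value.
        if isinstance(value, (int, float)):
            scored.append((value, mv.version_name))
    if not scored:
        return None
    pick = max if higher_is_better else min
    return pick(scored, key=lambda pair: pair[0])[1]
```

The result can be assigned to the model’s default attribute, for example m.default = best_version_name(m, "test_accuracy"), after checking that it is not None.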

Retrieving model explanations

The model registry can explain a model’s results, telling you which input features contribute most to predictions, by calculating Shapley values. This preview feature is available by default in all model versions created in Snowflake 8.31 and later through the underlying model’s explain method. You can call explain from SQL or via a model version’s run method in Python.

For details on this feature, see Model Explainability.

Exporting a model version

Use mv.export to export a model’s files to a local directory; the directory is created if it does not exist:

mv.export("~/mymodel/")

By default, the exported files include the code, the environment to load the model, and model weights. To also export the files needed to run the model in a warehouse, specify export_mode = ExportMode.FULL:

from snowflake.ml.model import ExportMode

mv.export("~/mymodel/", export_mode=ExportMode.FULL)

Loading a model version

Use mv.load to load the Python model object that was originally added to the registry. You can then use the model for inference just as though you had defined it in your Python code:

clf = mv.load()

To ensure proper functionality of a model loaded from the registry, the target Python environment (that is, the versions of the Python interpreter and of all libraries) should be identical to the environment from which the model was logged. Specify force=True in the load call to force the model to be loaded even if the environment is different.
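One way to spot environment differences before calling load is to compare the package versions pinned for the model with what is installed locally. The following sketch uses only the Python standard library. The helper environment_mismatches is hypothetical, and the pinned versions are assumed to come from your own records of the logging environment.

```python
from importlib import metadata


def environment_mismatches(expected: dict) -> dict:
    """Compare pinned package versions with the locally installed ones.

    `expected` maps package names to version strings. The result maps each
    mismatching package to (expected_version, installed_version_or_None).
    """
    mismatches = {}
    for package, wanted in expected.items():
        try:
            installed = metadata.version(package)
        except metadata.PackageNotFoundError:
            installed = None  # the package is not installed at all
        if installed != wanted:
            mismatches[package] = (wanted, installed)
    return mismatches


# Hypothetical pin; replace it with the versions recorded for your model.
print(environment_mismatches({"scikit-learn": "1.3.0"}))
```

An empty result suggests that the listed packages match; it does not check the Python interpreter version or packages that are not listed.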

Tip

To make sure your environment is the same as the one where the model is hosted, download a copy of the conda environment from the model registry:

import os

# session.file.get downloads the file into the target directory (here, your home directory)
session.file.get("snow://model/<modelName>/versions/<versionName>/runtimes/python_runtime/env/conda.yml", os.path.expanduser("~"))

Then create a new conda environment from this file:

conda env create --name newenv --file ~/conda.yml
conda activate newenv

The optional options argument is a dictionary of options for loading the model. Currently, the argument supports only the use_gpu option.

Option     Type    Description                            Default
use_gpu    bool    Enables GPU-specific loading logic.    False

The following example illustrates the use of the options argument:

clf = mv.load(options={"use_gpu": True})

Calling model methods

Model versions can have methods, which are attached functions that can be executed to perform inference or other model operations. The versions of a model can have different methods, and the signatures of these methods can also differ.

To call a method of a model version, use mv.run, where mv is a ModelVersion object. Specify the name of the function to be called and pass a Snowpark or pandas DataFrame that contains the inference data, along with any required parameters. The method is executed in a Snowflake warehouse.

The return value of the method is a Snowpark or pandas DataFrame, matching the type of DataFrame passed in. Snowpark DataFrames are evaluated lazily, so the method is run only when the DataFrame’s collect, show, or to_pandas method is called.

Note

Invoking a method runs it in the warehouse specified in the session you’re using to connect to the registry. See Specifying a Warehouse.

The following example illustrates running the predict method of a model. This model’s predict method does not require any parameters besides the inference data (test_features here). If it did, they would be passed as additional arguments after the inference data.

remote_prediction = mv.run(test_features, function_name="predict")
remote_prediction.show()   # assuming test_features is Snowpark DataFrame

To see what methods can be called on a given model, call mv.show_functions. The return value of this method is a list of ModelFunctionInfo objects. Each of these objects includes the following attributes:

  • name: The name of the function that can be called from Python or SQL.

  • target_method: The name of the Python method in the original logged model.
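The information returned by show_functions can be used to validate a function name before calling run. The following is a sketch under stated assumptions: the helpers are hypothetical, each entry is read either as a mapping or through attributes because the exact representation may differ between library versions, and names are matched case-insensitively against both name and target_method.

```python
def _field(info, key):
    """Read a field from an entry that is either a mapping or an object."""
    return info[key] if isinstance(info, dict) else getattr(info, key)


def run_checked(mv, data, function_name: str):
    """Call mv.run only if the model version exposes the requested function."""
    known = set()
    for info in mv.show_functions():
        known.add(_field(info, "name").lower())
        known.add(_field(info, "target_method").lower())
    if function_name.lower() not in known:
        raise ValueError(
            f"{function_name!r} is not available; known functions: {sorted(known)}"
        )
    # The requested name is passed through unchanged.
    return mv.run(data, function_name=function_name)
```

For example, run_checked(mv, test_features, "predict") behaves like mv.run but raises a ValueError that lists the known functions when the name is not recognized.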

Tip

You can also call model methods in SQL. See Model commands.

Sharing models

The model registry can store two types of models. You can distinguish them using the MODEL_TYPE column in the output of SHOW MODELS.

  • CORTEX_FINETUNED: Models generated with Cortex Fine-tuning, which do not contain user code. To share this type of model, use Data Sharing.

  • USER_MODEL: Models that contain user code, such as models developed using Snowpark ML modeling classes. These models cannot currently be shared. The ability to share models that contain user code will be available in a future release.

Cost considerations

Using the Snowflake Model Registry incurs standard Snowflake consumption-based costs. These include:

  • Cost of storing model artifacts, metadata, and functions. For general information about storage costs, see Exploring storage cost.

  • Cost of copying files between stages in Snowflake. See COPY FILES.

  • Cost of serverless model object operations through the Snowsight UI or the SQL or Python interface, such as showing models and model versions and altering model comments, tags, and metrics.

  • Warehouse compute costs, which vary depending on the type of model and the quantity of data used in inference. For general information about Snowflake compute costs, see Understanding compute cost. Warehouse compute costs are incurred for:

    • Model and version creation operations

    • Invoking a model’s methods