Snowpark ML Model Registry

The Snowpark ML model registry stores Python ML models so they can easily be found and used by others. You can create your own registries in Snowflake, and store and maintain your models, using Snowpark Python and Snowpark ML. Registries are Snowflake databases.

Supported model types include:

  • Snowpark ML

  • scikit-learn

  • XGBoost

  • PyTorch

  • TensorFlow

  • MLflow


A version of the Snowpark ML model registry is now available to the public (see Snowflake Model Registry). However, that version does not yet support deploying models to Snowpark Container Services compute pools. If you are using this functionality, please feel free to continue using the private preview version of the model registry described in this topic.

Model APIs

Snowpark ML includes two separate APIs for working with models.

  • A relational API where model operations are methods of the registry object, and you provide a name and version to these methods to specify the model to be operated upon.

  • An object API where operations on a specific model are methods of a ModelReference object obtained from the registry.

Operations performed through these two APIs are equivalent; use the one you find most convenient. In general, the object API is more convenient if you already have a reference to the model. The relational API is more convenient if you are, for example, reading model names and versions from a file and performing some operation on those models, such as updating their metadata.

You can convert calls to the object API into calls to the relational API by calling the method of the same name on the registry and passing the model name and versions. For example, the two calls below are equivalent.

# set a tag on a model using the object API when you have a reference to it
model.set_tag(tag_name="stage", tag_value="production")

# set a tag on a model using the relational API when you have its name and version
registry.set_tag(model_name="my_model", model_version="102", tag_name="stage", tag_value="production")

The code examples in this topic use the object API when working with models. It is likely we will choose one API or the other for the public release of the model registry.


A Jupyter notebook containing example code for the model registry is available in the examples subfolder of the releases folder that we shared with you.

Installing the Snowpark ML Library

See Installing Snowpark ML for instructions for installing Snowpark ML.

Connecting to Snowflake

The model registry connects to Snowflake using a Snowpark session, which you can create in several ways, including by passing all the configuration settings in your Python code.

A better way to create a Snowpark session is to have Snowpark read connection settings from the SnowSQL configuration file located at ~/.snowsql/config. This approach avoids exposing your connection settings, including your password, in your code. If you have already added one or more connection settings to this file, you can use them with the model registry, or you might add a new named connection specifically for use with Snowpark ML. For more information on adding connection settings to the SnowSQL configuration file, see Configuring default connection settings.
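For reference, a SnowSQL configuration file with a default connection and a named connection might look like the following sketch. The account name, user name, password, and the connection name snowpark_ml are all placeholders.

```ini
; Default connection used when no name is given
[connections]
accountname = my_account
username = my_user
password = my_password

; A named connection specifically for Snowpark ML work
[connections.snowpark_ml]
accountname = my_account
username = my_user
password = my_password
```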

In your client code, you can log in with the default SnowSQL connection using the SnowflakeLoginOptions utility class as shown here.

from snowflake.snowpark import Session
from snowflake.ml.utils.connection_params import SnowflakeLoginOptions

session = Session.builder.configs(SnowflakeLoginOptions()).create()

To use a named SnowSQL connection to connect to Snowflake, specify its name as shown here.

session = Session.builder.configs(SnowflakeLoginOptions("myconnection")).create()

In either case, you’ll use the resulting session object when creating or opening a registry.


When using the registry in a stored procedure, the stored procedure session can be used as your Snowpark session.

Required Privileges

Creating a registry requires the following privilege if the database does not already exist:

  • CREATE DATABASE global privilege

If the database does exist, the following privileges are necessary to use it as a model registry.

  • USAGE on the registry database

  • USAGE on the registry’s PUBLIC schema

  • CREATE TABLE on the registry’s PUBLIC schema

  • CREATE VIEW on the registry’s PUBLIC schema

  • SELECT on all tables in the registry’s PUBLIC schema

Using a registry (adding and working with models) requires the privileges below.

  • INSERT on all tables in the registry’s PUBLIC schema

  • SELECT on all views in the registry’s PUBLIC schema

  • CREATE STAGE on the registry’s PUBLIC schema

Creating the Model Registry

To create a model registry, use the create_model_registry function, passing it your Snowpark session.

from snowflake.ml.registry import model_registry

result = model_registry.create_model_registry(session=session, database_name="MODEL_REGISTRY")

The database name is optional. If you do not specify it, MODEL_REGISTRY is the default. By using different database names, you can create multiple registries in your account for access control, lifecycle management, or other purposes.

create_model_registry returns True if the registry was successfully created, or False if it was not. It is not an error to create a registry more than once, although you will receive a warning.

Getting a Reference to the Model Registry

Before you can create or modify models in the registry, you must obtain a reference to the registry.

registry = model_registry.ModelRegistry(session=session, database_name="MODEL_REGISTRY")

As with registry creation, the database name is optional; the default value MODEL_REGISTRY is used if you do not specify it.

You use the registry object to perform registry operations (such as registering models) and, optionally, model operations.

Registering a Model

Add a model to the registry by calling the registry’s log_model method. This method:

  • Serializes the model and uploads it to a Snowflake stage. The model, a Python object, must be serializable (“pickleable”).

  • Creates an entry in the model registry for the model, referencing the staged location.

  • Adds metadata such as description and tags to the model as specified in the log_model call.

In the example below, clf, short for “classifier,” is the model, assumed to already have been created elsewhere. You can add a name and tags at registration time, as shown here. The combination of name and version must be unique for each model in the registry.

model_id = registry.log_model(model=clf,
                model_name="my_model",
                model_version="101",
                description="My awesome ML model",
                tags={"stage": "testing", "classifier_type": "svm.SVC",
                    "svc_gamma": svc_gamma, "svc_C": svc_C})

The arguments shown here are described below.

  • model: The Python model object of a supported model type. Must be serializable (“pickleable”).

  • model_name: The model’s name, used with model_version to identify the model in the registry. The name cannot be changed after the model has been added.

  • model_version: String specifying the model’s version, used with model_name to identify the model in the registry. The version cannot be changed after the model has been added.

  • code_paths: Path to directory of code to import when loading or deploying the model.

  • conda_dependencies: List of Conda packages required by your model. This argument specifies package names and optional versions in Conda format, that is, "[channel::]package [operator version]". If you do not specify a channel, the Snowflake channel is assumed.

  • description: Model description.

  • options: Dictionary containing options for model creation. Currently only one option is recognized.

      • embed_local_ml_library: Whether to embed a copy of the local Snowpark ML library into the model. Default: False.

  • pip_requirements: List of package specs for PyPI packages required by your model. Models with pip requirements can be deployed only to Snowpark Container Services (SPCS) compute pools, not to Snowflake warehouses.

  • sample_input_data: A Snowpark DataFrame containing sample input data. The feature names required by the model, and their types, are extracted from this DataFrame. Either this argument or signatures must be provided for all models except Snowpark ML and MLflow models.

  • signatures: Model signatures as a mapping from target method name to signatures of input and output. Either this argument or sample_input_data must be provided for all models except Snowpark ML and MLflow models.

  • tags: Dictionary containing metadata used to record a model’s purpose, algorithm, training data set, lifecycle stage, or other information you choose.

The combination of model name and version must be unique in the registry.
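To make the Conda package format above concrete, here is a small hypothetical helper (not part of Snowpark ML) that splits a spec of the form "[channel::]package [operator version]" into its parts, applying the default Snowflake channel when none is given.

```python
import re

def parse_conda_spec(spec):
    """Split "[channel::]package [operator version]" into its parts.

    Returns a (channel, package, operator, version) tuple; channel defaults
    to "snowflake" when none is given, mirroring the behavior described above.
    """
    channel, _, rest = spec.rpartition("::")
    if not channel:
        channel = "snowflake"  # the Snowflake channel is assumed by default
    m = re.match(
        r"^(?P<name>[A-Za-z0-9._-]+)\s*(?P<op>==|>=|<=|!=|>|<)?\s*(?P<ver>[\w.*]+)?$",
        rest.strip(),
    )
    if m is None:
        raise ValueError(f"not a valid spec: {spec!r}")
    return channel, m.group("name"), m.group("op"), m.group("ver")
```

For example, parse_conda_spec("conda-forge::scikit-learn >=1.2") yields ("conda-forge", "scikit-learn", ">=", "1.2"), while parse_conda_spec("xgboost") yields ("snowflake", "xgboost", None, None).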

log_model returns the new model’s ID, an opaque string identifier assigned by the registry to identify a specific version of a model. The ID is used internally by the registry and is not needed by client code.

Once registered, the model itself is immutable (although you can change its metadata in the registry). To update a model, delete the old version and register the new version.

Listing Models in the Registry

To get a list of the models in the registry:

model_list = registry.list_models()

The model list is a Snowpark DataFrame, so you can easily choose which columns you want as well as filter and sort the list as desired. The most important columns are shown below. (A few additional columns are present but not currently used.)


  • CREATION_ROLE: Name of the role used to create the model.

  • CREATION_TIME: The date and time at which the model was created.

  • ID: The unique ID assigned to this model by the registry.

  • NAME: The model’s name.

  • VERSION: The model’s version.

  • DESCRIPTION: The model’s description.

  • METRICS: Mapping of the model’s metric names to values.

  • TAGS: Mapping of the model’s tag names to values.

To obtain the model list sorted with the most recently created models first, for example:

model_list.select("ID", "NAME", "CREATION_TIME", "TAGS").order_by(
    "CREATION_TIME", ascending=False).show()

Or to filter by model ID:

model_list.filter(model_list["ID"] == model_id).select(
        "NAME", "TAGS", "METRICS").show()

In both examples, only the columns specified in the select method call are retrieved.

Getting a Reference to a Model

After a model has been registered, you can get a reference to it by creating a ModelReference from its name and version.

model = model_registry.ModelReference(registry=registry, model_name="my_model", model_version="101")

You can use this model reference to make changes to the model’s metadata, to deploy the model, and to manage its deployments.

Viewing and Updating a Model’s Metadata

You can view and update a model’s metadata attributes in the registry, including its description, tags, and metrics.

Viewing and Updating a Model’s Description

Use the model’s get_model_description and set_model_description methods to view and update the model’s description.

print(model.get_model_description())
model.set_model_description("A better description than the one I provided originally")

Viewing and Updating a Model’s Tags

Tags are metadata used to record a model’s purpose, algorithm, training data set, lifecycle stage, or other information you choose. You can set tags when the model is registered or at any time afterward. You can also update the values of existing tags or remove tags entirely, as shown here.

# get all tags
print(model.get_tags())

# add tag or set new value of existing tag
model.set_tag("minor_version", "1")

# remove tag
model.remove_tag("minor_version")
One important use of tags is to manage the lifecycle of a model. Here we use a tag called “stage” for this purpose.

# Set model stage
model.set_tag("stage", "prod")

You can then get a list of all models and their stage as follows.

# List models and their tags
from snowflake.snowpark import functions

lm = registry.list_models()
lm.select('NAME', 'VERSION', functions.parse_json(lm["TAGS"]).getField("stage")).show()

Model references support the following methods for working with tags.

  • get_tags(): Retrieves all of the model’s tags and their values as a Python dictionary.

  • get_tag(tag_name): Retrieves the value of the specified tag.

  • has_tag(tag_name): Returns True if the model has the specified tag, False if not.

  • remove_tag(tag_name): Removes the specified tag from the model.

  • set_tag(tag_name, tag_value): Sets the specified tag’s value, creating the tag if necessary.

Viewing and Updating a Model’s Metrics

Metrics are metadata used to track prediction accuracy and other characteristics of a model. The registry supports both scalars and more complex objects. Use the model’s set_metric method to set metrics. The following examples illustrate the use of a scalar, a dictionary, and a two-dimensional NumPy array as metrics.

The test accuracy metric might be generated using sklearn’s accuracy_score:

from sklearn import metrics

test_accuracy = metrics.accuracy_score(test_labels, prediction)

The confusion matrix can be generated similarly using sklearn:

test_confusion_matrix = metrics.confusion_matrix(test_labels, prediction)

Then we can set these values as metrics as follows.

# scalar metric
model.set_metric("test_accuracy", test_accuracy)

# hierarchical (dictionary) metric
model.set_metric("dataset_test", {"accuracy": test_accuracy})

# multivalent (matrix) metric
model.set_metric("confusion_matrix", test_confusion_matrix)
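All three kinds of metric values ultimately need to be stored as registry metadata; since tags are queried with parse_json later in this topic, it is reasonable to sanity-check that a complex metric survives a JSON round trip before setting it. This is a sketch, assuming metrics are stored in a JSON-compatible form; the matrix is plain nested lists (a NumPy array can be converted with .tolist()).

```python
import json

# A matrix metric as plain nested lists (a NumPy array can be converted with .tolist())
test_confusion_matrix = [[13, 1], [2, 14]]

metrics = {
    "test_accuracy": 0.96,                      # scalar metric
    "dataset_test": {"accuracy": 0.96},         # hierarchical (dictionary) metric
    "confusion_matrix": test_confusion_matrix,  # multivalent (matrix) metric
}

# Round-trip through JSON to verify the values are serializable
restored = json.loads(json.dumps(metrics))
```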

To view a model’s metrics, use get_metrics.

print(model.get_metrics())
Model references support the following methods for working with metrics.

  • get_metrics(): Retrieves all of the model’s metrics and their values as a Python dictionary.

  • get_metric(metric_name): Retrieves the value of the specified metric.

  • has_metric(metric_name): Returns True if the model has the specified metric, False if not.

  • remove_metric(metric_name): Removes the specified metric from the model.

  • set_metric(metric_name, metric_value): Sets the specified metric’s value, creating the metric if necessary.

Deploying and Using Models from the Registry

Models can be deployed to a Snowflake warehouse or to a Snowpark Container Services (SPCS) compute pool.

It is also possible to use a model locally.

Using a Model in a Snowflake Warehouse

To deploy a model to a warehouse, use the model’s deploy method. The registry generates a user-defined function (UDF), using a name you provide, that invokes the model’s predict method.

model.deploy(deployment_name="my_warehouse_predict",
             target_method="predict",    # the name of the model's method, usually predict
             permanent=True)

If you do not specify the permanent argument as shown above, the deployment is temporary and will be removed automatically when your session ends (for example, when you close your Jupyter notebook, or when your Python script exits).

After the model has been deployed, you can use it by calling the model’s predict method with the deployment name you specified when you deployed it.

result_dataframe = model.predict("my_warehouse_predict", test_dataframe)

Using a Model in an SPCS Compute Pool

Before you can deploy your model to a Snowpark Container Services (SPCS) compute pool, you must have a Docker client installed.

To deploy a model to an SPCS compute pool, use code like the following. Note that you must specify a compute pool name, and the specified compute pool must already exist.

from snowflake.ml.model import deploy_platforms

model.deploy(deployment_name="my_spcs_predict",
             platform=deploy_platforms.TargetPlatform.SNOWPARK_CONTAINER_SERVICES,
             target_method="predict",
             options={
                 "compute_pool": "my_pool",
             })
You can optionally set the min_instances and max_instances keys in the options argument to control how many instances of the SPCS service can run simultaneously. These options both default to 1.

After the model has been deployed, you can use it by calling the model’s predict method with the deployment name you specified when you deployed it.

result_dataframe = model.predict("my_spcs_predict", test_dataframe)

Deployment Arguments and Options

A complete list of available arguments for the deploy method is shown here.

  • model_name: The model’s name, used with model_version to identify the model in the registry.

  • model_version: String specifying the model’s version, used with model_name to identify the model in the registry.

  • deployment_name: The name of the generated user-defined function.

  • target_method: The name of the model method that the deployment invokes, usually predict.

  • options: Dictionary containing options for model deployment. These are listed in the following tables.

  • permanent: If True, the deployed model remains in place when the current Python session ends. Default is False. Note that SPCS deployments are always permanent.

Options for all deployments

  • output_with_input_features: If True, include the input columns in the output. Default is False.

  • keep_order: If True, preserve the row order in the output. Applies only to DataFrames containing fewer than 2⁶⁴ rows. Default is True.

Options for warehouse deployments

  • permanent_udf_stage_location: Stage location where the UDF should be stored when the permanent argument is True.

  • relax_version: If True, allow the version constraints of dependencies to be relaxed slightly. May cause errors if the selected dependencies are not fully compatible. Defaults to False.

  • replace_udf: If True, a newly generated UDF can replace an existing UDF of the same name. Defaults to False.

Options for SPCS compute pool deployments

  • compute_pool: Compute pool name.

  • endpoint: The name of the endpoint that the service function will communicate with. This option is useful when the service has multiple endpoints. Defaults to predict.

  • image_repo: SnowService image repository path in the format image_registry/database/schema/repository. By default, inferred from session information.

  • max_instances: Maximum number of service replicas. Default is 1.

  • min_instances: Minimum number of service replicas. Default is 1.

  • num_workers: Number of workers used for model inference. Make sure the number of workers multiplied by the size of the model does not exceed available memory. Default is twice the number of CPU cores plus 1.

  • prebuilt_snowflake_image: Specifies a previously built Docker image to be used as-is; no image is built. This option is for users who often use a single image for multiple use cases. For this situation, you can have the initial deployment build an image (the name of the image is logged to the console), then reuse that image for additional deployments by passing its name here. Default: None.

  • use_gpu: When True, a CUDA-enabled Docker image is used to provide a runtime CUDA environment. Default is False.
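When sizing memory for a deployment, the num_workers default described above ("twice the number of CPU cores plus 1") and the memory rule of thumb can be sketched as follows; default_num_workers and fits_in_memory are hypothetical helpers, not part of Snowpark ML.

```python
import os

def default_num_workers(cpu_count=None):
    # Default described above: twice the number of CPU cores, plus 1
    cores = cpu_count if cpu_count is not None else (os.cpu_count() or 1)
    return 2 * cores + 1

def fits_in_memory(model_size_bytes, available_bytes, num_workers):
    # Rule of thumb from the table: workers * model size must not exceed available memory
    return num_workers * model_size_bytes <= available_bytes
```

For example, on a 4-core machine the default is 9 workers, so a 2 GB model would need at least 18 GB of memory available for inference.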

Managing Deployments

Model references support the following methods for managing deployments.

  • delete_deployment(deployment_name): Deletes the specified deployment.

  • get_deployment(deployment_name): Gets information about the specified deployment. The result is a Snowpark DataFrame containing attributes of the deployment.

  • list_deployments(): Gets a list of the names of all deployments of the model.

For example, to delete a deployment named my_deployment, use:

model.delete_deployment(deployment_name="my_deployment")
Using a Model Locally


It is convenient to use a model locally when you first develop and test it. As the model grows, however, it will eventually exceed the capacity of your local system. At this point, deploy it to a warehouse or SPCS compute pool and run it there instead.

To use a model locally, you first deserialize (“unpickle”) the model from the registry in your own Python code using the model’s load_model method. You can then call the model’s predict method with your test data.

For the deserialization operation to be successful, the target Python environment must be very similar to the one originally used to add the model to the registry. Ideally, the environment should include not only the same version of Python, but of every dependency used by the model. Other versions of some dependencies (especially later point releases) may also be compatible.
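One way to compare environments is to record the Python and dependency versions of the environment that registered the model, then take the same snapshot before calling load_model. This hypothetical helper (not part of Snowpark ML) collects them with importlib.metadata.

```python
import sys
from importlib import metadata

def environment_snapshot(packages):
    """Record Python and package versions so the original environment can be compared later."""
    snapshot = {"python": sys.version.split()[0]}
    for pkg in packages:
        try:
            snapshot[pkg] = metadata.version(pkg)
        except metadata.PackageNotFoundError:
            snapshot[pkg] = None  # not installed in this environment
    return snapshot
```

For example, save environment_snapshot(["scikit-learn", "numpy"]) alongside the model at registration time, and compare it with a fresh snapshot before deserializing.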


Use a virtual environment and a requirements.txt file to manage your Python dependencies so you can easily recreate the original environment.

The following is an example of deserializing and using a model from the registry in Python.

clf = model.load_model()
results = clf.predict(test_features)    # Snowpark DataFrame

Deleting Models

You can delete models using the registry’s delete_model method. By default, the model itself is deleted from the stage, not just its entry in the registry; pass delete_artifact = False to keep the model.

# delete from registry and also delete the model itself
registry.delete_model(model_name="my_model", model_version="100")

# delete from registry but keep the model
registry.delete_model(model_name="my_model", model_version="100", delete_artifact=False)

Existing references to a model are no longer valid after you delete the model from the registry.

Auditing Model and Registry Changes

The registry maintains a history of changes made to both the registry itself and the models in it. To see this information, use registry.get_history or model.get_model_history. Both methods return a Snowpark DataFrame, which can be sorted and filtered as desired.

# print complete registry history
registry.get_history().show()

# print history of metadata changes to a model
model.get_model_history().show()