Snowpark ML Model Registry¶
The Snowpark ML model registry stores Python ML models so they can easily be found and used by others. Using Snowpark Python and Snowpark ML, you can create your own registries in Snowflake and use them to store and maintain your models. A registry is a Snowflake database.
Supported model types include:
scikit-learn
XGBoost
PyTorch
TensorFlow
MLflow
Tip
A version of the Snowpark ML model registry is now available to the public (see Snowflake Model Registry). However, that version does not yet support deploying models to Snowpark Container Services compute pools. If you are using this functionality, please feel free to continue using the private preview version of the model registry described in this topic.
Model APIs¶
Snowpark ML includes two separate APIs for working with models.
A relational API where model operations are methods of the registry object, and you provide a name and version to these methods to specify the model to be operated upon.
An object API where operations on a specific model are methods of a ModelReference object obtained from the registry.
Operations performed through these two APIs are equivalent; use the one you find most convenient. In general, the object API is more convenient if you already have a reference to the model. The relational API is more convenient if you are, for example, reading model names and versions from a file and performing some operation on those models, such as updating their metadata.
You can convert calls to the object API into calls to the relational API by calling the method of the same name on the registry and passing the model name and versions. For example, the two calls below are equivalent.
# set a tag on a model using the object API when you have a reference to it
model.set_tag(tag_name="stage", tag_value="production")
# set a tag on a model using the relational API when you have its name and version
registry.set_tag(model_name="my_model", model_version="102", tag_name="stage", tag_value="production")
The code examples in this topic use the object API when working with models. It is likely we will choose one API or the other for the public release of the model registry.
Examples¶
A Jupyter notebook containing example code for the model registry is available in the examples subfolder of the releases folder that we shared with you.
Installing the Snowpark ML Library¶
See Installing Snowpark ML for instructions for installing Snowpark ML.
Connecting to Snowflake¶
The model registry connects to Snowflake using a Snowpark session, which you can create in several ways, including by passing all the configuration settings in your Python code.
A better way to create a Snowpark session is to have Snowpark read connection settings from the SnowSQL configuration file located at ~/.snowsql/config. This approach avoids exposing your connection settings, including your password, in your code. If you have already added one or more connection settings to this file, you can use them with the model registry, or you can add a new named connection specifically for use with Snowpark ML. For more information on adding connection settings to the SnowSQL configuration file, see Configuring default connection settings.
In your client code, you can log in with the default SnowSQL connection using the SnowflakeLoginOptions utility class as shown here.
from snowflake.snowpark import Session
from snowflake.ml.utils.connection_params import SnowflakeLoginOptions
session = Session.builder.configs(SnowflakeLoginOptions()).create()
To use a named SnowSQL connection to connect to Snowflake, specify its name as shown here.
session = Session.builder.configs(
SnowflakeLoginOptions(connection_name="snowpark-ml")).create()
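The named connection used above might be defined in the SnowSQL configuration file like this (a sketch with placeholder values in angle brackets; substitute your own account details):

```
[connections.snowpark-ml]
accountname = <account_identifier>
username = <user_name>
password = <password>
rolename = <role_name>
warehousename = <warehouse_name>
```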
In either case, you’ll use the resulting session object when creating or opening a registry.
Tip
When using the registry in a stored procedure, the stored procedure session can be used as your Snowpark session.
Required Privileges¶
Creating a registry requires the following privilege if the database does not already exist:
CREATE DATABASE global privilege
If the database does exist, the following privileges are necessary to use it as a model registry.
USAGE on the registry database
USAGE on the registry’s PUBLIC schema
CREATE TABLE on the registry’s PUBLIC schema
CREATE VIEW on the registry’s PUBLIC schema
SELECT on all tables in the registry’s PUBLIC schema
Using a registry (adding and working with models) requires the privileges below.
INSERT on all tables in the registry’s PUBLIC schema
SELECT on all views in the registry’s PUBLIC schema
CREATE STAGE on the registry’s PUBLIC schema
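As a hedged sketch, an administrator might grant the full set of usage privileges above to a role named ML_USERS (a hypothetical role name) on the default MODEL_REGISTRY database as follows:

```sql
-- ML_USERS is a hypothetical role; substitute your own
GRANT USAGE ON DATABASE MODEL_REGISTRY TO ROLE ML_USERS;
GRANT USAGE ON SCHEMA MODEL_REGISTRY.PUBLIC TO ROLE ML_USERS;
GRANT CREATE TABLE, CREATE VIEW, CREATE STAGE ON SCHEMA MODEL_REGISTRY.PUBLIC TO ROLE ML_USERS;
GRANT SELECT, INSERT ON ALL TABLES IN SCHEMA MODEL_REGISTRY.PUBLIC TO ROLE ML_USERS;
GRANT SELECT ON ALL VIEWS IN SCHEMA MODEL_REGISTRY.PUBLIC TO ROLE ML_USERS;
```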
Creating the Model Registry¶
To create a model registry, use the create_model_registry function, passing it your Snowpark session.
from snowflake.ml.registry import model_registry
result = model_registry.create_model_registry(session=session, database_name="MODEL_REGISTRY")
The database name is optional. If you do not specify it, MODEL_REGISTRY is the default. By using different database names, you can create multiple registries in your account for access control, lifecycle management, or other purposes.
create_model_registry returns True if the registry was successfully created, or False if it was not. It is not an error to create a registry more than once, although you will receive a warning.
Getting a Reference to the Model Registry¶
Before you can create or modify models in the registry, you must obtain a reference to the registry.
registry = model_registry.ModelRegistry(session=session, database_name="MODEL_REGISTRY")
As with registry creation, the database name is optional; the default value MODEL_REGISTRY is used if you do not specify it.
You use the registry object to perform registry operations (such as registering models) and, optionally, model operations.
Registering a Model¶
Add a model to the registry by calling the registry's log_model method. This method:
Serializes the model and uploads it to a Snowflake stage. The model, a Python object, must be serializable (“pickleable”).
Creates an entry in the model registry for the model, referencing the staged location.
Adds metadata, such as a description and tags, to the model as specified in the log_model call.
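The serialization in the first step is ordinary Python pickling. As a minimal local sketch of the "pickleable" requirement (TinyModel here is a stand-in for a real trained model, such as a scikit-learn classifier):

```python
import pickle

class TinyModel:
    """Stand-in for a trained model; any model passed to log_model
    must survive this kind of pickle round trip."""
    def __init__(self, weight):
        self.weight = weight

    def predict(self, xs):
        return [x * self.weight for x in xs]

clf = TinyModel(weight=2)
blob = pickle.dumps(clf)       # conceptually what log_model does when staging the model
restored = pickle.loads(blob)  # conceptually what load_model does when retrieving it
print(restored.predict([1, 2, 3]))  # [2, 4, 6]
```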
In the example below, clf, short for "classifier," is the model, assumed to have already been created elsewhere. You can add a name and tags at registration time, as shown here. The combination of name and version must be unique for each model in the registry.
model_id = registry.log_model(model=clf,
model_name="my_model",
model_version="101",
conda_dependencies=["scikit-learn"],
description="My awesome ML model",
tags={"stage": "testing", "classifier_type": "svm.SVC",
"svc_gamma": svc_gamma, "svc_C": svc_C},
sample_input_data=train_features
)
The arguments shown here are described below.
| Argument | Description |
|---|---|
| **Required** | |
| model | The Python model object of a supported model type. Must be serializable ("pickleable"). |
| model_name | The model's name, used together with model_version to identify the model in the registry. |
| model_version | String specifying the model's version, used together with model_name to identify the model in the registry. |
| **Optional** | |
| code_paths | Path to directory of code to import when loading or deploying the model. |
| conda_dependencies | List of Conda packages required by your model. This argument specifies package names and optional versions in Conda format. |
| description | Model description. |
| options | Dictionary containing options for model creation. Currently only one option is recognized. |
| pip_requirements | List of package specs for PyPI packages required by your model. Models with pip requirements can be deployed only to Snowpark Container Services (SPCS) compute pools, not to Snowflake warehouses. |
| sample_input_data | A Snowpark DataFrame containing sample input data. The feature names required by the model, and their types, are extracted from this DataFrame. Either this argument or signatures must be provided. |
| signatures | Model signatures as a mapping from target method name to signatures of input and output. Either this argument or sample_input_data must be provided. |
| tags | Dictionary containing metadata used to record a model's purpose, algorithm, training data set, lifecycle stage, or other information you choose. |
Note
The combination of model name and version must be unique in the registry.
log_model returns the new model's ID, an opaque string identifier assigned by the registry to identify a specific version of a model. The ID is used internally by the registry and is not needed by client code.
Once registered, the model itself is immutable (although you can change its metadata in the registry). To update a model, delete the old version and register the new version.
Listing Models in the Registry¶
To get a list of the models in the registry:
model_list = registry.list_models()
The model list is a Snowpark DataFrame, so you can easily choose which columns you want as well as filter and sort the list as desired. The most important columns are shown below. (A few additional columns are present but not currently used.)
| Column | Data type | Description |
|---|---|---|
| CREATION_ROLE | string | Name of the role used to create the model. |
| CREATION_TIME | timestamp | The date and time at which the model was created. |
| ID | string | The unique ID assigned to this model by the registry. |
| NAME | string | The model's name. |
| VERSION | string | The model's version. |
| DESCRIPTION | string | The model's description. |
| METRICS | variant | Mapping of the model's metric names to values. |
| TAGS | variant | Mapping of the model's tag names to values. |
For example, to sort the model list with the most recently created models first:
model_list.select("ID", "NAME", "CREATION_TIME", "TAGS").order_by(
"CREATION_TIME", ascending=False).show()
Or to filter by model ID:
model_list.filter(model_list["ID"] == model_id).select(
"NAME", "TAGS", "METRICS").show()
In both examples, only the columns specified in the select method call are retrieved.
Getting a Reference to a Model¶
After a model has been registered, you can get a reference to it by creating a ModelReference from its name and version.
model = model_registry.ModelReference(registry=registry, model_name="my_model", model_version="101")
You can use this model reference to make changes to the model’s metadata, to deploy the model, and to manage its deployments.
Viewing and Updating a Model’s Metadata¶
You can view and update a model’s metadata attributes in the registry, including its description, tags, and metrics.
Viewing and Updating a Model’s Description¶
Use the model's get_model_description and set_model_description methods to view and update the model's description.
print(model.get_model_description())
model.set_model_description("A better description than the one I provided originally")
Viewing and Updating a Model’s Metrics¶
Metrics are metadata used to track prediction accuracy and other characteristics of a model. The registry supports both scalars and more complex objects. Use the model's set_metric method to set metrics. The following examples illustrate the use of a scalar, a dictionary, and a two-dimensional NumPy array as metrics.
The test accuracy metric might be generated using sklearn's accuracy_score:
from sklearn import metrics
test_accuracy = metrics.accuracy_score(test_labels, prediction)
The confusion matrix can be generated similarly using sklearn:
test_confusion_matrix = metrics.confusion_matrix(test_labels, prediction)
Then we can set these values as metrics as follows.
# scalar metric
model.set_metric("test_accuracy", test_accuracy)
# hierarchical (dictionary) metric
model.set_metric("dataset_test", {"accuracy": test_accuracy})
# multivalent (matrix) metric
model.set_metric("confusion_matrix", test_confusion_matrix)
To view a model's metrics, use get_metrics.
print(model.get_metrics())
Model references support the following methods for working with metrics.
| Method | Arguments | Description |
|---|---|---|
| get_metrics | none | Retrieves all of the model's metrics and their values as a Python dictionary. |
| get_metric_value | metric_name | Retrieves the value of the specified metric. |
| has_metric | metric_name | Returns True if the model has a metric with the specified name. |
| remove_metric | metric_name | Removes the specified metric from the model. |
| set_metric | metric_name, metric_value | Sets the specified metric's value, creating the metric if necessary. |
Deploying and Using Models from the Registry¶
Models can be deployed to a Snowflake warehouse or to a Snowpark Container Services (SPCS) compute pool.
It is also possible to use a model locally.
Using a Model in a Snowflake Warehouse¶
To deploy a model to a warehouse, use the model's deploy method. The registry generates a user-defined function (UDF), using a name you provide, that invokes the model's predict method.
model.deploy(deployment_name="my_warehouse_predict",
target_method="predict", # the name of the model's method, usually predict
permanent=True)
If you do not specify the permanent argument as shown above, the deployment is temporary and is removed automatically when your session ends (for example, when you close your Jupyter notebook or when your Python script exits).
After the model has been deployed, you can use it by calling the model's predict method with the deployment name you specified when you deployed it.
result_dataframe = model.predict("my_warehouse_predict", test_dataframe)
Using a Model in an SPCS Compute Pool¶
Before you can deploy your model to a Snowpark Container Services (SPCS) compute pool, you must have a Docker client installed.
To deploy a model to an SPCS compute pool, use code like the following. You must specify a compute pool name, and the specified compute pool must already exist.
from snowflake.ml.model import deploy_platforms
model.deploy(deployment_name="my_spcs_predict",
platform=deploy_platforms.TargetPlatform.SNOWPARK_CONTAINER_SERVICES,
target_method="predict",
options={
"compute_pool": "my_pool",
})
Tip
You can optionally set the min_instances and max_instances keys in the options argument to control how many instances of the SPCS service can run simultaneously. Both options default to 1.
After the model has been deployed, you can use it by calling the model's predict method with the deployment name you specified when you deployed it.
result_dataframe = model.predict("my_spcs_predict", test_dataframe)
Deployment Arguments and Options¶
A complete list of available arguments for the deploy method is shown here.
| Argument | Description |
|---|---|
| **Required** | |
| model_name | The model's name, used together with model_version to identify the model in the registry. |
| model_version | String specifying the model's version, used together with model_name to identify the model in the registry. |
| deployment_name | The name of the generated user-defined function. |
| **Optional** | |
| options | Dictionary containing options for model deployment. These are listed in the following tables. |
| permanent | If True, the deployment persists after your session ends; otherwise it is removed automatically when the session ends. |
Options for all deployments

| Option | Description |
|---|---|
| **Optional** | |
| keep_order | If True, the row order of the input DataFrame is preserved in the output. |
| output_with_input_features | If True, the input features are included in the output DataFrame. |
Options for warehouse deployments

| Option | Description |
|---|---|
| **Optional** | |
| permanent_udf_stage_location | Stage location where the UDF should be stored when the permanent argument is True. |
| relax_version | If True, version constraints on the model's dependencies are relaxed, allowing newer compatible releases to be used. |
| replace_udf | If True, an existing UDF with the same name is replaced. |
Options for SPCS compute pool deployments

| Option | Description |
|---|---|
| **Required** | |
| compute_pool | Compute pool name. |
| **Optional** | |
| endpoint | The name of the endpoint that the service function will communicate with. This option is useful when the service has multiple endpoints. Defaults to predict. |
| image_repo | SnowService image repository path. |
| max_instances | Maximum number of service replicas. Default is 1. |
| min_instances | Minimum number of service replicas. Default is 1. |
| num_workers | Number of workers used for model inference. Make sure the number of workers multiplied by the size of the model does not exceed available memory. Default is twice the number of CPU cores plus 1. |
| prebuilt_snowservice_image | Specifies a previously-built Docker image to be used as-is; no image is built. This option is for users who often use a single image for multiple use cases: have the initial deployment build an image (the name of the image is logged to the console), then reuse that image for additional deployments by passing its name here. Default: None (an image is built). |
| use_gpu | When True, the service runs model inference on GPU resources. Default is False. |
Managing Deployments¶
Model references support the following methods for managing deployments.
| Method | Arguments | Description |
|---|---|---|
| delete_deployment | deployment_name | Deletes the specified deployment. |
| get_deployment | deployment_name | Gets information about the specified deployment. The result is a Snowpark DataFrame containing attributes of the deployment. |
| list_deployments | none | Gets a list of the names of all deployments of the model. |
For example, to delete a deployment named my_deployment, use:
model.delete_deployment(deployment_name="my_deployment")
Using a Model Locally¶
Tip
It is convenient to use a model locally when you first develop and test it. As the model grows, however, it will eventually exceed the capacity of your local system. At this point, deploy it to a warehouse or SPCS compute pool and run it there instead.
To use a model locally, you first deserialize ("unpickle") the model from the registry in your own Python code using the model's load_model method. You can then call the model's predict method with your test data.
For the deserialization operation to be successful, the target Python environment must be very similar to the one originally used to add the model to the registry. Ideally, the environment should include not only the same version of Python, but also the same version of every dependency used by the model. Other versions of some dependencies (especially later point releases) may also be compatible.
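A small hypothetical helper (not part of Snowpark ML) can make mismatches visible before you attempt to unpickle; the recorded Python version and package pins are assumed to have been saved at training time:

```python
import sys
from importlib.metadata import version, PackageNotFoundError

def env_mismatches(required_python, required_pkgs):
    """Return a list of differences between this environment and a recorded one.

    required_python -- (major, minor) tuple recorded when the model was logged
    required_pkgs   -- mapping of package name to exact version string
    """
    problems = []
    if tuple(sys.version_info[:2]) != tuple(required_python):
        problems.append(f"python {sys.version_info[:2]} != {required_python}")
    for name, want in required_pkgs.items():
        try:
            have = version(name)
        except PackageNotFoundError:
            have = None
        if have != want:
            problems.append(f"{name}: installed {have}, model expects {want}")
    return problems

# The current environment trivially matches its own Python version
print(env_mismatches(sys.version_info[:2], {}))  # []
```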
Tip
Use a virtual environment and a requirements.txt file to manage your Python dependencies so you can easily recreate the original environment.
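For instance, a requirements.txt along these lines would pin the training environment (the version numbers shown are purely illustrative; record the versions actually installed when the model was logged):

```
# illustrative pins -- replace with the versions used at training time
scikit-learn==1.2.2
numpy==1.24.3
snowflake-ml-python==1.0.2
```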
The following is an example of deserializing and using a model from the registry in Python.
clf = model.load_model()
results = clf.predict(test_features) # Snowpark DataFrame
Deleting Models¶
You can delete models using the registry's delete_model method. By default, the model itself is deleted from the stage as well as its entry in the registry; pass delete_artifact=False to keep the model artifact.
# delete from registry and also delete the model itself
registry.delete_model(model_name="my_model", model_version="100")
# delete from registry but keep the model
registry.delete_model(model_name="my_model", model_version="100", delete_artifact=False)
Existing references to a model are no longer valid after you delete the model from the registry.
Auditing Model and Registry Changes¶
The registry maintains a history of changes made both to the registry itself and to the models in it. To see this information, use registry.get_history or model.get_model_history. Both methods return a Snowpark DataFrame, which can be sorted and filtered as desired.
# print complete registry history
registry.get_history().show()
# print history of metadata changes to a model
model.get_model_history().show()