Snowpark ML: Machine Learning Toolkit for Snowflake

Snowpark ML is a set of tools, including SDKs and underlying infrastructure, for building and deploying machine learning models. With Snowpark ML, you can pre-process data and train, manage, and deploy ML models all within Snowflake. You benefit from Snowflake’s proven performance, scalability, stability, and governance at every stage of the machine learning workflow.

Snowpark ML works with Snowpark Python, so you can use Snowpark DataFrames to hold your training or test data and to receive your prediction results.

You can use Snowpark ML when writing Snowpark Python client applications in any compatible IDE.

Key Components of Snowpark ML

Snowpark ML provides APIs to support each stage of an end-to-end machine learning development and deployment process and includes two key components: Snowpark ML Development and Snowpark ML Ops.

Snowpark ML Development

Snowpark ML Development includes a collection of Python APIs that you can use to develop models efficiently inside Snowflake.

  • The modeling package (snowflake.ml.modeling) provides APIs for data preprocessing, feature engineering, and model training. The package also includes a preprocessing module whose APIs use compute resources provided by Snowpark-optimized warehouses to perform scalable data transformations. These APIs are based on familiar ML libraries, including scikit-learn, XGBoost, and LightGBM.

  • An upcoming set of framework connectors provides optimized, secure, and performant data provisioning for the PyTorch and TensorFlow frameworks in their native data loader formats. For early access to documentation, contact your Snowflake representative.

Snowpark ML Ops

Snowpark ML Ops complements the Snowpark ML Development API by providing model management capabilities and integrated deployment into Snowflake.

  • The FileSet API provides a Python fsspec-compliant API for materializing data from a query or Snowpark DataFrame into a Snowflake internal stage, along with several convenience methods for working with that data and feeding it to PyTorch or TensorFlow.

  • The model registry is a Python API for managing models within Snowflake and deploying them into Snowflake warehouses as vectorized user-defined functions (UDFs). For early access to model registry documentation, contact your Snowflake representative.

Installing Snowpark ML

All Snowpark ML features are available in a single package, snowflake-ml-python.

You can install Snowpark ML from the Snowflake conda channel using the conda command or from the Python Package Index (PyPI) using pip. Conda is preferred.

Installing Snowpark ML from conda

If you have an existing conda environment, you can skip step 1.

  1. Create the conda environment where you will install Snowpark ML:

    conda create --name snowpark-ml
    
  2. Activate the conda environment:

    conda activate snowpark-ml
    
  3. Install Snowpark ML:

    conda install snowflake-ml-python
    

Installing Snowpark ML from PyPI

You can install the Snowpark ML package from the Python Package Index (PyPI) by using the standard Python package manager, pip.

Warning

Do not use this installation procedure if you are using a conda environment.

  1. Change to your project directory and activate your Python virtual environment:

    cd ~/projects/ml
    source .venv/bin/activate
    
  2. Install the Snowpark ML package:

    python -m pip install snowflake-ml-python
    
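
After either installation path, you may want to confirm that the package is visible to the interpreter you plan to use. The following is a minimal sketch that relies only on the Python standard library; the helper name installed_version is hypothetical and is not part of Snowpark ML. It assumes the package was installed with its distribution metadata, which is usually the case for both pip and conda.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # snowflake-ml-python is the package name given in this document.
    version = installed_version("snowflake-ml-python")
    if version is None:
        print("snowflake-ml-python is not installed in this environment")
    else:
        print(f"snowflake-ml-python {version} is installed")
```

Run the script with the same interpreter (conda environment or virtual environment) that you activated in the steps above; a "not installed" result usually means a different environment is active.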

Setting Up Snowpark Python

Snowpark Python is a dependency of Snowpark ML and is installed automatically when you install Snowpark ML. If Snowpark Python is not set up on your system, you might need to perform additional configuration steps. See Setting Up Your Development Environment for Snowpark Python for Snowpark Python setup instructions.

Authenticating to Snowflake

Some parts of Snowpark ML require that you authenticate with Snowflake. You can do this with either a Snowflake Connector for Python Connection object or a Snowpark Python Session. Both ways are equivalent; use whichever works best in your application.

Use the SnowflakeLoginOptions function in the snowflake.ml.utils.connection_params module to get the configuration settings to create the connection or session. The function can read the parameters for the connection from a named connection in your SnowSQL configuration file or from environment variables that you set. It returns a dictionary containing these parameters, which can be used to create a connection or a session. The following examples read the connection parameters from the named connection myaccount in the SnowSQL configuration file.

To create a Snowflake Connector for Python connection, pass the configuration information returned by SnowflakeLoginOptions to snowflake.connector.connect:

from snowflake import connector
from snowflake.ml.utils import connection_params

params = connection_params.SnowflakeLoginOptions("myaccount")
sf_connection = connector.connect(**params)

To create a Snowpark Python session, create a builder for the Session class, and pass the connection information to the builder’s configs method:

from snowflake.snowpark import Session
from snowflake.ml.utils import connection_params

params = connection_params.SnowflakeLoginOptions("myaccount")
sp_session = Session.builder.configs(params).create()

You can now pass the connection or the session to any Snowpark ML function that needs it.
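
To make the idea of a parameter dictionary built from a named connection more concrete, here is a small standard-library sketch. It is not how SnowflakeLoginOptions is implemented. The function read_named_connection and the key mapping are hypothetical, and the sketch assumes a SnowSQL-style section named [connections.myaccount] with option names such as accountname and username. All sample values are made up.

```python
import configparser
from typing import Dict

# Assumed mapping from SnowSQL-style option names to the keyword names
# accepted by snowflake.connector.connect. Illustrative, not exhaustive.
SNOWSQL_TO_CONNECTOR = {
    "accountname": "account",
    "username": "user",
    "password": "password",
    "dbname": "database",
    "schemaname": "schema",
    "warehousename": "warehouse",
    "rolename": "role",
}

# Made-up sample configuration; a real file would live on disk.
SAMPLE_CONFIG = """
[connections.myaccount]
accountname = myorg-myaccount
username = jdoe
password = "not-a-real-password"
warehousename = ML_WH
"""


def read_named_connection(config_text: str, name: str) -> Dict[str, str]:
    """Build a connection-parameter dictionary from a named config section."""
    # Disable interpolation so a '%' in a password is read literally.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    section = f"connections.{name}"
    if not parser.has_section(section):
        raise KeyError(f"no connection named {name!r} in configuration")
    params: Dict[str, str] = {}
    for key, value in parser.items(section):
        if key in SNOWSQL_TO_CONNECTOR:
            # Strip optional surrounding double quotes from the value.
            params[SNOWSQL_TO_CONNECTOR[key]] = value.strip('"')
    return params


if __name__ == "__main__":
    params = read_named_connection(SAMPLE_CONFIG, "myaccount")
    # Print only the parameter names to avoid echoing credentials.
    print(sorted(params))
```

A dictionary of this shape is what gets unpacked with ** into connector.connect or handed to Session.builder.configs in the examples above. In real code, prefer SnowflakeLoginOptions over a hand-written parser.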

Cost Considerations

When you train and use models in Snowflake, you run code in a virtual warehouse, which incurs compute costs. These costs vary depending on the type of model and the quantity of data used in training and prediction.

See Understanding Compute Cost for general information about Snowflake compute costs.
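
As a rough illustration of how warehouse size and run time combine into cost, the sketch below estimates credits for a single training or prediction run. It rests on several assumptions that you should check against your own account: the credits-per-hour table covers standard warehouses only (Snowpark-optimized warehouses are billed at different rates), billing is per second with a 60-second minimum each time the warehouse starts, and the price per credit is a placeholder. The function names are hypothetical.

```python
# Assumed credits per hour for standard warehouses; verify for your account.
CREDITS_PER_HOUR = {"XSMALL": 1, "SMALL": 2, "MEDIUM": 4, "LARGE": 8}

# Assumed minimum billed duration each time a warehouse starts or resumes.
MINIMUM_BILLED_SECONDS = 60


def estimate_credits(size: str, runtime_seconds: float) -> float:
    """Estimate credits consumed by one continuous run on a warehouse."""
    billed_seconds = max(runtime_seconds, MINIMUM_BILLED_SECONDS)
    return CREDITS_PER_HOUR[size] * billed_seconds / 3600


def estimate_cost(size: str, runtime_seconds: float, price_per_credit: float) -> float:
    """Convert estimated credits into currency at a given price per credit."""
    return estimate_credits(size, runtime_seconds) * price_per_credit


if __name__ == "__main__":
    # A 30-minute training run on a MEDIUM warehouse.
    credits = estimate_credits("MEDIUM", 30 * 60)
    # The price per credit below is a placeholder, not a quoted rate.
    cost = estimate_cost("MEDIUM", 30 * 60, price_per_credit=3.00)
    print(f"estimated credits: {credits:.2f}, estimated cost: {cost:.2f}")
```

The estimate ignores idle time before auto-suspend and any concurrent workloads sharing the warehouse, so treat it as a lower bound for planning rather than a billing prediction.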

Further Reading

See the following resources for information about Snowpark ML Modeling and Snowpark ML Ops.

  • Modeling

  • Ops

Contact your Snowflake representative for early access to documentation on upcoming features.

API Reference

The Snowpark ML API reference includes documentation on all publicly released functionality. You can also obtain detailed API documentation for any class by using Python’s help function in an interactive Python session. For example:

from snowflake.ml.modeling.preprocessing import OneHotEncoder

help(OneHotEncoder)
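
Outside an interactive session, for example in a script or a notebook cell whose output you want to save, the same documentation can be captured as plain text. The sketch below uses the standard-library pydoc module; the helper name describe is hypothetical, and collections.OrderedDict stands in for a Snowpark ML class so that the example runs without snowflake-ml-python installed.

```python
import pydoc
from collections import OrderedDict


def describe(obj) -> str:
    """Return, as plain text, the documentation that help() would display."""
    return pydoc.render_doc(obj, renderer=pydoc.plaintext)


if __name__ == "__main__":
    # Replace OrderedDict with a Snowpark ML class such as OneHotEncoder
    # once snowflake-ml-python is installed.
    text = describe(OrderedDict)
    # Show only the first few lines to keep the output short.
    print("\n".join(text.splitlines()[:5]))
```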