Snowpark ML: Machine Learning Toolkit for Snowflake

Snowpark ML is a set of tools, including SDKs and underlying infrastructure, for building and deploying machine learning models. With Snowpark ML, you can pre-process data and train, manage, and deploy ML models all within Snowflake. You benefit from Snowflake’s proven performance, scalability, stability, and governance at every stage of the machine learning workflow.

Snowpark ML works with Snowpark Python, so you can use Snowpark DataFrames to hold your training or test data and to receive your prediction results.

You can use Snowpark ML when writing Snowpark Python client applications in any compatible IDE.

Key Components of Snowpark ML

Snowpark ML provides APIs to support each stage of an end-to-end machine learning development and deployment process and includes two key components: Snowpark ML Development and Snowpark ML Ops.

Snowpark ML Development

Snowpark ML Development includes a collection of Python APIs that you can use to develop models efficiently inside Snowflake.

  • The modeling package (snowflake.ml.modeling) provides APIs for data preprocessing, feature engineering, and model training. The package also includes a preprocessing module whose APIs use compute resources provided by Snowpark-optimized warehouses for scalable data transformations. These APIs are based on familiar ML libraries, including scikit-learn, xgboost, and lightgbm.

  • An upcoming set of framework connectors provides optimized, secure, and performant data provisioning for the PyTorch and TensorFlow frameworks in their native data loader formats. For early access to documentation, contact your Snowflake representative.
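
The modeling classes follow scikit-learn's fit/predict estimator pattern. The toy estimator below is a stdlib-only sketch of that interface, purely for illustration; it is not the Snowflake API (Snowpark ML estimators additionally take column-name parameters and operate on Snowpark DataFrames):

```python
# Illustrative stdlib-only estimator following the scikit-learn pattern
# that Snowpark ML's modeling classes mirror. Not the Snowflake API.

class MeanRegressor:
    """Toy estimator: always predicts the mean of the training targets."""

    def fit(self, X, y):
        self.mean_ = sum(y) / len(y)  # "training" = remember the mean
        return self

    def predict(self, X):
        return [self.mean_ for _ in X]  # one prediction per input row

model = MeanRegressor().fit([[1], [2], [3]], [2.0, 4.0, 6.0])
print(model.predict([[10], [20]]))  # -> [4.0, 4.0]
```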

Snowpark ML Ops

Snowpark ML Ops complements the Snowpark ML Development API by providing model management capabilities and integrated deployment into Snowflake.

  • The FileSet API provides a Python fsspec-compliant API for materializing data into a Snowflake internal stage from a query or a Snowpark DataFrame, along with several convenient methods for working with the data and feeding it to PyTorch or TensorFlow.

  • The model registry is a Python API for managing models within Snowflake and deploying them into Snowflake warehouses as vectorized user-defined functions (UDFs). For early access to model registry documentation, contact your Snowflake representative.
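
Vectorized UDFs are invoked once per batch of rows rather than once per row. The stdlib-only sketch below illustrates that batching idea conceptually; it is not the Snowflake UDF API:

```python
# Conceptual sketch of vectorization: the function receives a whole batch
# of inputs and returns one result per input, amortizing per-call overhead.
# Illustrative only; not the Snowflake UDF API.

def scalar_model(x: float) -> int:
    """Stand-in 'model': classify a value against a 0.5 threshold."""
    return 1 if x >= 0.5 else 0

def vectorized_predict(batch: list) -> list:
    # One call scores the entire batch instead of one call per row.
    return [scalar_model(x) for x in batch]

print(vectorized_predict([0.2, 0.7, 0.9]))  # -> [0, 1, 1]
```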

Installing Snowpark ML

All Snowpark ML features are available in a single package, snowflake-ml-python.

You can install Snowpark ML from the Snowflake conda channel using the conda command or from the Python Package Index (PyPI) using pip. Conda is preferred.

Installing Snowpark ML from the Snowflake conda Channel

  1. Create the conda environment where you will install Snowpark ML. If you prefer to use an existing environment, skip this step.

    conda create --name snowpark-ml
  2. Activate the conda environment:

    conda activate snowpark-ml
  3. Install Snowpark ML from the Snowflake conda channel:

    conda install --override-channels --channel https://repo.anaconda.com/pkgs/snowflake snowflake-ml-python


When working with Snowpark ML, install packages from the Snowflake repository whenever possible. This ensures that you receive packages that have been validated with Snowpark ML.

Installing Snowpark ML from PyPI

You can install the Snowpark ML package from the Python Package Index (PyPI) by using the standard Python package manager, pip.


Do not use this installation procedure if you are using a conda environment. Use the conda instructions instead.

  1. Change to your project directory and activate your Python virtual environment:

    cd ~/projects/ml
    source .venv/bin/activate
  2. Install the Snowpark ML package:

    python -m pip install snowflake-ml-python

Setting Up Snowpark Python

Snowpark Python is a dependency of Snowpark ML and is installed automatically when you install Snowpark ML. If Snowpark Python is not set up on your system, you might need to perform additional configuration steps. See Setting Up Your Development Environment for Snowpark Python for Snowpark Python setup instructions.

Authenticating to Snowflake

Some parts of Snowpark ML require that you authenticate with Snowflake. You can do this with either a Snowflake Connector for Python Connection object or a Snowpark Python Session. Both ways are equivalent; use whichever works best in your application.

Use the SnowflakeLoginOptions function in the snowflake.ml.utils.connection_params module to get the configuration settings to create the connection or session. The function can read the connection parameters from a named connection in your SnowSQL configuration file or from environment variables that you set. It returns a dictionary containing these parameters, which you can use to create a connection or a session. The following examples read the connection parameters from the named connection myaccount in the SnowSQL configuration file.
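
As background, a named SnowSQL connection is simply a section in an INI-style configuration file (by convention, ~/.snowsql/config). The stdlib-only sketch below illustrates roughly how such a section maps to a parameter dictionary; the real SnowflakeLoginOptions also merges in environment variables and normalizes key names:

```python
# Stdlib-only sketch of reading a named connection from a SnowSQL-style
# config file. Illustrative only; the real SnowflakeLoginOptions handles
# more (environment variables, key renaming, defaults).
import configparser

SNOWSQL_CONFIG = """
[connections.myaccount]
accountname = my_account
username = my_user
password = my_password
"""

def read_named_connection(config_text: str, name: str) -> dict:
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    # Each named connection lives in a [connections.<name>] section.
    return dict(parser[f"connections.{name}"])

params = read_named_connection(SNOWSQL_CONFIG, "myaccount")
print(params["accountname"])  # -> my_account
```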

To create a Snowflake Connector for Python connection, pass the configuration dictionary returned by SnowflakeLoginOptions to snowflake.connector.connect:

from snowflake import connector
from snowflake.ml.utils import connection_params

params = connection_params.SnowflakeLoginOptions("myaccount")
sf_connection = connector.connect(**params)

To create a Snowpark Python session, create a builder for the Session class, and pass the connection information to the builder’s configs method:

from snowflake.snowpark import Session
from snowflake.ml.utils import connection_params

params = connection_params.SnowflakeLoginOptions("myaccount")
sp_session = Session.builder.configs(params).create()

You can now pass the connection or the session to any Snowpark ML function that needs it.

Cost Considerations

When you train and use models in Snowflake, you run code in a virtual warehouse, which incurs compute costs. These costs vary depending on the type of model and the quantity of data used in training and prediction.

See Understanding Compute Cost for general information about Snowflake compute costs.

Further Reading

For more information, see the Snowpark ML Modeling and Snowpark ML Ops documentation.

Contact your Snowflake representative for early access to documentation on upcoming features.

API Reference

The Snowpark ML API reference includes documentation on all publicly released functionality. You can also obtain detailed API documentation for any class by using Python’s help function in an interactive Python session. For example:

from snowflake.ml.modeling.preprocessing import OneHotEncoder
help(OneHotEncoder)