Snowpark ML: Machine Learning Toolkit for Snowflake¶
Snowpark ML is a set of tools, including SDKs and underlying infrastructure, for building and deploying machine learning models. With Snowpark ML, you can pre-process data and train, manage, and deploy ML models all within Snowflake. You benefit from Snowflake’s proven performance, scalability, stability, and governance at every stage of the machine learning workflow.
Snowpark ML works with Snowpark Python, so you can use Snowpark DataFrames to hold your training or test data and to receive your prediction results.
You can use Snowpark ML when writing Snowpark Python client applications in any compatible IDE.
Key Components of Snowpark ML¶
Snowpark ML provides APIs to support each stage of an end-to-end machine learning development and deployment process and includes two key components: Snowpark ML Development and Snowpark ML Ops.
Snowpark ML Development¶
Snowpark ML Development includes a collection of Python APIs that you can use to develop models efficiently inside Snowflake.
The modeling package (snowflake.ml.modeling) provides APIs for data preprocessing, feature engineering, and model training. The package also includes a preprocessing module with APIs that use compute resources provided by Snowpark-optimized warehouses to provide scalable data transformations. These APIs are based on familiar ML libraries, including scikit-learn, xgboost, and lightgbm.
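As a rough illustration of how the scikit-learn-style modeling APIs fit together, the sketch below scales feature columns and then fits a classifier on a Snowpark DataFrame. It assumes that `MinMaxScaler` and `XGBClassifier` accept the `input_cols`, `output_cols`, and `label_cols` keyword arguments; the function name `scale_and_train`, the `_SCALED` column suffix, and the `PREDICTION` output column are hypothetical names chosen for this example.

```python
from typing import Any, List, Tuple


def scale_and_train(
    train_df: Any, feature_cols: List[str], label_col: str
) -> Tuple[Any, Any]:
    """Scale features, then fit a classifier on a Snowpark DataFrame (sketch)."""
    # Imported here so that Snowpark ML is only needed when the function is called.
    from snowflake.ml.modeling.preprocessing import MinMaxScaler
    from snowflake.ml.modeling.xgboost import XGBClassifier

    # Write the scaled values to new columns (hypothetical naming scheme).
    scaled_cols = [f"{col}_SCALED" for col in feature_cols]
    scaler = MinMaxScaler(input_cols=feature_cols, output_cols=scaled_cols)
    scaler.fit(train_df)
    scaled_df = scaler.transform(train_df)

    # Train on the scaled columns; predictions go to a hypothetical column name.
    classifier = XGBClassifier(
        input_cols=scaled_cols,
        label_cols=[label_col],
        output_cols=["PREDICTION"],
    )
    classifier.fit(scaled_df)
    return scaler, classifier
```

Calling `scale_and_train(df, ["CARAT", "DEPTH"], "CUT")` with a Snowpark DataFrame `df` would return the fitted scaler and classifier; the column names here are placeholders, and the fitted scaler should also be applied to any data passed to the classifier for prediction.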
An upcoming set of framework connectors provides optimized, secure, and performant data provisioning for the PyTorch and TensorFlow frameworks in their native data loader formats. For early access to documentation, contact your Snowflake representative.
Snowpark ML Ops¶
Snowpark ML Ops complements the Snowpark ML Development API by providing model management capabilities and integrated deployment into Snowflake.
The FileSet API provides a Python fsspec-compliant API for materializing data from a query or Snowpark DataFrame into a Snowflake internal stage, along with several convenient methods for working with the data and feeding it to PyTorch or TensorFlow.
The model registry is a Python API for managing models within Snowflake and deploying them into Snowflake warehouses as vectorized user-defined functions (UDFs). For early access to model registry documentation, contact your Snowflake representative.
Installing Snowpark ML¶
All Snowpark ML features are available in a single package, snowflake-ml-python.
You can install Snowpark ML from the Snowflake conda channel using the conda command or from the Python Package Index (PyPI) using pip. Conda is preferred.
Installing Snowpark ML from conda¶
If you have an existing conda environment, you can skip step 1.
1. Create the conda environment where you will install Snowpark ML:
conda create --name snowpark-ml
2. Activate the conda environment:
conda activate snowpark-ml
3. Install Snowpark ML:
conda install snowflake-ml-python
Installing Snowpark ML from PyPI¶
You can install the Snowpark ML package from the Python Package Index (PyPI) by using the standard Python package manager, pip.
Do not use this installation procedure if you are using a conda environment.
1. Change to your project directory and activate your Python virtual environment:
cd ~/projects/ml
source .venv/bin/activate
2. Install the Snowpark ML package:
python -m pip install snowflake-ml-python
Setting Up Snowpark Python¶
Snowpark Python is a dependency of Snowpark ML and is installed automatically when you install Snowpark ML. If Snowpark Python is not set up on your system, you might need to perform additional configuration steps. See Setting Up Your Development Environment for Snowpark Python for Snowpark Python setup instructions.
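As a quick sanity check after either installation path, the following sketch uses only the Python standard library to report which versions of the two packages are present in the active environment. The helper name `installed_version` is hypothetical, and `snowflake-snowpark-python` is assumed to be the distribution name under which Snowpark Python is installed.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# Distribution names are assumptions based on the install commands above.
for dist in ("snowflake-ml-python", "snowflake-snowpark-python"):
    version = installed_version(dist)
    print(f"{dist}: {version if version else 'not installed'}")
```

If either package is reported as not installed, confirm that the intended conda environment or virtual environment is active before reinstalling.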
Authenticating to Snowflake¶
Some parts of Snowpark ML require that you authenticate with Snowflake. You can do this with either a Snowflake Connector for Python Connection object or a Snowpark Python Session. Both ways are equivalent; use whichever works best in your application.
Use the SnowflakeLoginOptions function in the snowflake.ml.utils.connection_params module to get the configuration settings to create the connection or session. The function can read the parameters for the connection from a named connection in your SnowSQL configuration file or from environment variables that you set. It returns a dictionary containing these parameters, which can be used to create a connection or a session. The following examples read the connection parameters from the named connection myaccount in the SnowSQL configuration file.
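To create a Snowflake Connector for Python connection, pass the configuration information returned by SnowflakeLoginOptions to the connect function:
from snowflake import connector
from snowflake.ml.utils import connection_params

# Read the parameters of the named connection "myaccount"
params = connection_params.SnowflakeLoginOptions("myaccount")
# Open a Snowflake Connector for Python connection with those parameters
sf_connection = connector.connect(**params)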
To create a Snowpark Python session, create a builder for the Session class, and pass the connection information to the builder’s configs method:
from snowflake.snowpark import Session
from snowflake.ml.utils import connection_params

# Read the parameters of the named connection "myaccount"
params = connection_params.SnowflakeLoginOptions("myaccount")
# Build a Snowpark session from the same parameter dictionary
sp_session = Session.builder.configs(params).create()
You can now pass the connection or the session to any Snowpark ML function that needs it.
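The named-connection lookup described in this section can be pictured with a small standard-library sketch. This is not the implementation of SnowflakeLoginOptions; it only illustrates, under stated assumptions, how a `[connections.myaccount]` section of a SnowSQL-style configuration could be turned into a parameter dictionary. The function `load_named_connection`, the sample values, and the key mapping in `KEY_MAP` are all assumptions made for this example.

```python
import configparser
from typing import Dict

# Example SnowSQL-style configuration; every value is a placeholder.
SAMPLE_CONFIG = """
[connections.myaccount]
accountname = myorg-myaccount
username = jdoe
password = not-a-real-password
dbname = ml_db
schemaname = public
warehousename = ml_wh
rolename = ml_role
"""

# Assumed mapping from SnowSQL option names to connection parameter names.
KEY_MAP = {
    "accountname": "account",
    "username": "user",
    "password": "password",
    "dbname": "database",
    "schemaname": "schema",
    "warehousename": "warehouse",
    "rolename": "role",
}


def load_named_connection(config_text: str, name: str) -> Dict[str, str]:
    """Return the parameters of the named connection as a dictionary."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    section = f"connections.{name}"
    if not parser.has_section(section):
        raise KeyError(f"No connection named {name!r} in configuration")
    # Keep only the options we know how to map, dropping surrounding quotes.
    return {
        KEY_MAP[key]: value.strip('"')
        for key, value in parser[section].items()
        if key in KEY_MAP
    }


params = load_named_connection(SAMPLE_CONFIG, "myaccount")
print(sorted(params))
```

A dictionary of this shape is what the examples above unpack into the connector or pass to the session builder. In real applications, prefer SnowflakeLoginOptions itself, which also supports environment variables.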
When you train and use models in Snowflake, you run code in a virtual warehouse, which incurs compute costs. These costs vary depending on the type of model and the quantity of data used in training and prediction.
See Understanding Compute Cost for general information about Snowflake compute costs.
See the following resources for information about Snowpark ML Modeling and Snowpark ML Ops.
The Examples folder in the Snowpark ML Google Drive contains Jupyter notebooks for exploring Snowpark ML features.
Contact your Snowflake representative for early access to documentation on upcoming features.
The Snowpark ML API reference includes documentation on all publicly released functionality. You can also obtain detailed API documentation for any class by using Python’s help function in an interactive Python session. For example:
from snowflake.ml.modeling.preprocessing import OneHotEncoder
help(OneHotEncoder)