Snowflake ML: End-to-End Machine Learning¶
Snowflake ML is an integrated set of capabilities for end-to-end machine learning in a single platform on top of your governed data.
For out-of-the-box ML workflows in SQL, the ready-to-use ML Functions can help shorten development time and democratize ML across your organization. These functions let you train models for business use cases such as forecasting and anomaly detection without writing any code.
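As a rough illustration of the SQL-only workflow, the sketch below builds the two statements typically involved in forecasting: one that trains a model with the SNOWFLAKE.ML.FORECAST class and one that requests predictions. The helper name forecast_statements and the table and column names are hypothetical. Identifiers are interpolated directly, so this assumes trusted input.

```python
from typing import List


def forecast_statements(model_name: str, source_table: str,
                        ts_col: str, target_col: str, periods: int) -> List[str]:
    """Return SQL that trains a forecasting model, then asks it for predictions."""
    # Training: the model is created from a reference to the source table.
    train = (
        f"CREATE OR REPLACE SNOWFLAKE.ML.FORECAST {model_name}(\n"
        f"  INPUT_DATA => SYSTEM$REFERENCE('TABLE', '{source_table}'),\n"
        f"  TIMESTAMP_COLNAME => '{ts_col}',\n"
        f"  TARGET_COLNAME => '{target_col}'\n"
        ")"
    )
    # Inference: call the model's FORECAST method for the next `periods` steps.
    predict = f"CALL {model_name}!FORECAST(FORECASTING_PERIODS => {periods})"
    return [train, predict]


# Hypothetical table and column names, for illustration only.
for statement in forecast_statements("sales_model", "daily_sales", "sale_date", "total", 7):
    print(statement + ";")
```

The printed statements can be pasted into a worksheet, or run from Python through a Snowpark session's sql method.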
For custom ML workflows in Python, data scientists and ML engineers can easily and securely develop and productionize scalable features and models without any data movement, silos, or governance tradeoffs. The Snowpark ML Python library (the snowflake-ml-python package) provides APIs for developing and deploying your Snowflake ML pipelines.
To build and operationalize models, data scientists and ML engineers can leverage a suite of Snowflake ML features. For model development, Snowpark ML Modeling APIs offer scalable feature engineering and model training with distributed processing using CPUs or GPUs. For ML Operations (ML Ops), Snowflake ML includes the Feature Store and Model Registry for centralized management of features and models in production.
You can use the Python APIs in the Snowpark ML library in Snowflake Notebooks, Snowsight worksheets, or your local Python IDE of choice.
Snowflake ML components help to streamline the ML lifecycle, as shown here.
Snowflake Model Registry¶
The Snowflake Model Registry allows secure deployment and management of models in Snowflake, supporting models trained both inside and outside of Snowflake.
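A minimal sketch of logging a model and running inference through the registry might look like the following. It assumes an existing Snowpark session, an already trained model of a supported type, and a small sample DataFrame. The function name register_and_score, the model name churn_model, and the version name v1 are hypothetical.

```python
def register_and_score(session, model, sample_df, database: str, schema: str):
    """Log a trained model to the registry, then run inference with that version."""
    # Imported lazily so the module loads even without snowflake-ml-python installed.
    from snowflake.ml.registry import Registry

    registry = Registry(session=session, database_name=database, schema_name=schema)
    model_version = registry.log_model(
        model,
        model_name="churn_model",      # hypothetical model name
        version_name="v1",             # hypothetical version name
        sample_input_data=sample_df,   # lets the registry infer the input signature
    )
    # Inference runs inside Snowflake against the logged version.
    return model_version.run(sample_df, function_name="predict")
```

The same call pattern is intended to apply whether the model was trained inside Snowflake or elsewhere, provided its type is one the registry supports.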
Snowflake Feature Store¶
The Snowflake Feature Store is an integrated solution for defining, managing, storing and discovering ML features derived from your data. The Snowflake Feature Store supports automated, incremental refresh from batch and streaming data sources, so that feature pipelines need be defined only once to be continuously updated with new data.
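To make the "define once, refresh continuously" idea concrete, here is a hedged sketch that registers an entity and a feature view with a refresh schedule. It assumes an existing Snowpark session and a Snowpark DataFrame, feature_df, that computes the features. The function name, the CUSTOMER entity, the CUSTOMER_ID join key, the feature view name, and the one-day refresh interval are all illustrative assumptions.

```python
def register_customer_features(session, feature_df, database: str,
                               schema: str, warehouse: str):
    """Register an entity and a feature view that Snowflake refreshes on a schedule."""
    # Imported lazily so the module loads even without snowflake-ml-python installed.
    from snowflake.ml.feature_store import (
        CreationMode, Entity, FeatureStore, FeatureView,
    )

    fs = FeatureStore(
        session=session,
        database=database,
        name=schema,                   # the feature store lives in this schema
        default_warehouse=warehouse,
        creation_mode=CreationMode.CREATE_IF_NOT_EXIST,
    )
    # Hypothetical entity: features are keyed by CUSTOMER_ID.
    customer = Entity(name="CUSTOMER", join_keys=["CUSTOMER_ID"])
    fs.register_entity(customer)

    # The pipeline is defined once; refresh_freq asks for periodic refresh.
    view = FeatureView(
        name="CUSTOMER_FEATURES",
        entities=[customer],
        feature_df=feature_df,
        refresh_freq="1 day",
    )
    return fs.register_feature_view(feature_view=view, version="1")
```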
Snowflake Datasets¶
Snowflake Datasets provide an immutable, versioned snapshot of your data suitable for ingestion by your machine learning models.
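As a sketch of how a versioned snapshot could be created and read back, assuming an existing Snowpark session and a Snowpark DataFrame holding the training data (the function name and the dataset name and version passed in are hypothetical):

```python
def snapshot_training_data(session, df, name: str, version: str):
    """Materialize df as an immutable Dataset version and load it for training."""
    # Imported lazily so the module loads even without snowflake-ml-python installed.
    from snowflake.ml import dataset

    # Each (name, version) pair identifies one fixed snapshot of the data.
    ds = dataset.create_from_dataframe(session, name, version, input_dataframe=df)
    # Read the snapshot into pandas for use with a local training loop.
    return ds.read.to_pandas()
```

Because a version is immutable, training runs that read the same name and version see the same rows, which helps with reproducibility.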
Snowflake Notebooks¶
Snowflake Notebooks provide a familiar experience, similar to Jupyter notebooks, for working with Python inside Snowflake. They aren’t strictly part of Snowflake ML, but they’re ideal for building custom ML workflows and models using tools you already know how to use.
Snowpark ML¶
Snowpark ML (the snowflake-ml-python Python package) is the component of Snowflake ML that provides Python APIs for the various Snowflake ML workflow components, including the Snowflake Feature Store, the Snowflake Model Registry, and Dataset versioned data objects. It also includes APIs, based on popular Python ML libraries such as scikit-learn, for building and training your own models at scale completely inside the Snowflake cloud. You can use Snowpark ML in your local Python development environment, in Snowsight worksheets, or in Snowflake Notebooks.
Tip
See Introduction to Machine Learning with Snowpark ML for an example of an end-to-end workflow in Snowpark ML.
Snowpark ML Modeling¶
The Snowpark ML library includes the Snowpark ML Modeling APIs, which support data preprocessing, feature engineering, and model training in Snowflake using popular machine learning frameworks, such as scikit-learn, XGBoost, LightGBM, and PyTorch. All processing is performed in a Snowflake virtual warehouse directly from data stored in Snowflake, with no infrastructure configuration required.
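The following sketch shows one way preprocessing and training could be chained with the Modeling APIs. It assumes a Snowpark DataFrame, train_df, with numeric feature columns and a label column. The function name, the step names, and the PREDICTION output column are illustrative choices.

```python
def train_and_predict(train_df, feature_cols, label_col):
    """Scale the features, fit an XGBoost classifier, and return predictions."""
    # Imported lazily so the module loads even without snowflake-ml-python installed.
    from snowflake.ml.modeling.pipeline import Pipeline
    from snowflake.ml.modeling.preprocessing import StandardScaler
    from snowflake.ml.modeling.xgboost import XGBClassifier

    pipeline = Pipeline(steps=[
        # Preprocessing: standardize each feature column in place.
        ("scale", StandardScaler(input_cols=feature_cols, output_cols=feature_cols)),
        # Training: columns are selected by name rather than by array position.
        ("classify", XGBClassifier(
            input_cols=feature_cols,
            label_cols=[label_col],
            output_cols=["PREDICTION"],  # hypothetical output column name
        )),
    ])
    pipeline.fit(train_df)
    # The result carries the input columns plus the PREDICTION column.
    return pipeline.predict(train_df)
```

Unlike plain scikit-learn, these estimators take column names (input_cols, label_cols, output_cols) and operate on DataFrames, so the data does not need to be pulled out of Snowflake first.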
Additional Resources¶
See the following resources for information about Snowflake ML.
End-to-End ML Workflows
Contact your Snowflake representative for early access to documentation on other features currently under development.