Snowflake ML: End-to-End Machine Learning

Snowflake ML is an integrated set of capabilities for end-to-end machine learning in a single platform on top of your governed data.

For out-of-the-box ML workflows in SQL, the ready-to-use ML Functions can help shorten development time and democratize ML across your organization. These functions let you train models for business use cases such as forecasting and anomaly detection without writing any code.
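As a rough illustration of what such a SQL workflow can look like when driven from Python, the sketch below builds the two statements for a forecasting model: one to train it and one to request predictions. The helper name forecast_statements and the object names (sales_model, sales_view, DATE, SALES) are hypothetical, and the statement shape assumes the SNOWFLAKE.ML.FORECAST syntax; check the ML Functions reference for the options available in your account.

```python
import re

# Conservative check for identifiers that get interpolated into SQL text.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_$.]*$")


def forecast_statements(model_name, source_view, timestamp_col, target_col, periods):
    """Return (create_sql, call_sql) for a forecasting ML Function.

    All names are placeholders supplied by the caller; nothing is executed here.
    """
    for name in (model_name, source_view, timestamp_col, target_col):
        if not _IDENTIFIER.match(name):
            raise ValueError(f"unsafe identifier: {name!r}")

    # Statement 1: train a forecasting model on the rows exposed by a view.
    create_sql = (
        f"CREATE SNOWFLAKE.ML.FORECAST {model_name}(\n"
        f"  INPUT_DATA => SYSTEM$REFERENCE('VIEW', '{source_view}'),\n"
        f"  TIMESTAMP_COLNAME => '{timestamp_col}',\n"
        f"  TARGET_COLNAME => '{target_col}'\n"
        f")"
    )
    # Statement 2: ask the trained model for the next `periods` time steps.
    call_sql = f"CALL {model_name}!FORECAST(FORECASTING_PERIODS => {int(periods)})"
    return create_sql, call_sql


def run_forecast(session, *args):
    """Execute both statements on an existing Snowpark session (assumed to be open)."""
    create_sql, call_sql = forecast_statements(*args)
    session.sql(create_sql).collect()
    return session.sql(call_sql).collect()
```

The same two statements can be run directly in a SQL worksheet; the Python wrapper is only a convenience for scripted use.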

For custom ML workflows in Python, data scientists and ML engineers can easily and securely develop and productionize scalable features and models without any data movement, silos, or governance tradeoffs. The Snowpark ML Python library (the snowflake-ml-python package) provides APIs for developing and deploying your Snowflake ML pipelines.

To build and operationalize models, data scientists and ML engineers can leverage a suite of Snowflake ML features. For model development, Snowpark ML Modeling APIs offer scalable feature engineering and model training with distributed processing using CPUs or GPUs. For ML Operations (ML Ops), Snowflake ML includes the Feature Store and Model Registry for centralized management of features and models in production.

You can use the Python APIs in the Snowpark ML library in Snowflake Notebooks, Snowsight worksheets, or your local Python IDE of choice.

[Figure: Key components of Snowflake ML: ML Modeling, Feature Store, and Model Registry]

Snowflake ML components help to streamline the ML lifecycle, as shown here.

[Figure: The ML development and deployment process supported by Snowflake ML]

Snowflake Model Registry

The Snowflake Model Registry allows secure deployment and management of models in Snowflake, supporting models trained both inside and outside of Snowflake.
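A minimal sketch of registering a model and running inference through the registry might look like the following. It assumes the snowflake-ml-python package is installed and that session is an open Snowpark session; the function name register_and_predict and the database, schema, model, and version names are placeholders, not values from this page.

```python
def register_and_predict(session, model, sample_df, test_df):
    """Log a trained model to the registry, then run its predict method.

    `model` can be a model trained inside or outside Snowflake (for example a
    fitted scikit-learn estimator); `sample_df` lets the registry infer the
    input signature.
    """
    # Imported lazily so this sketch can be loaded without the package installed.
    from snowflake.ml.registry import Registry

    # ML_DB and ML_SCHEMA are hypothetical names for where models are stored.
    registry = Registry(session=session, database_name="ML_DB", schema_name="ML_SCHEMA")

    model_version = registry.log_model(
        model,
        model_name="demo_model",   # placeholder model name
        version_name="v1",         # placeholder version label
        sample_input_data=sample_df,
    )

    # Inference runs in Snowflake against the logged version.
    return model_version.run(test_df, function_name="predict")
```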

Snowflake Feature Store

The Snowflake Feature Store is an integrated solution for defining, managing, storing, and discovering ML features derived from your data. It supports automated, incremental refresh from batch and streaming data sources, so a feature pipeline needs to be defined only once and is then continuously updated with new data.
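The "define once, refresh automatically" idea can be sketched as follows, assuming the snowflake-ml-python package and an open Snowpark session. The function name register_customer_features and every object name here (ML_DB, FEATURE_STORE, ML_WH, CUSTOMER, CUSTOMER_ID, CUSTOMER_FEATURES) are hypothetical, as is the one-minute refresh interval.

```python
def register_customer_features(session, feature_df):
    """Register a feature view that Snowflake keeps refreshed on a schedule.

    `feature_df` is a Snowpark DataFrame whose query computes the features;
    it must include the entity's join key column.
    """
    # Imported lazily so this sketch can be loaded without the package installed.
    from snowflake.ml.feature_store import (
        CreationMode,
        Entity,
        FeatureStore,
        FeatureView,
    )

    # Open the feature store, creating its backing schema if it is missing.
    fs = FeatureStore(
        session=session,
        database="ML_DB",
        name="FEATURE_STORE",
        default_warehouse="ML_WH",
        creation_mode=CreationMode.CREATE_IF_NOT_EXIST,
    )

    # An entity names the key that features are joined on.
    customer = Entity(name="CUSTOMER", join_keys=["CUSTOMER_ID"])
    fs.register_entity(customer)

    # The pipeline is defined once; refresh_freq asks for periodic refresh.
    feature_view = FeatureView(
        name="CUSTOMER_FEATURES",
        entities=[customer],
        feature_df=feature_df,
        refresh_freq="1 minute",
    )
    return fs.register_feature_view(feature_view=feature_view, version="1")
```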

Snowflake Datasets

Snowflake Datasets provide an immutable, versioned snapshot of your data suitable for ingestion by your machine learning models.

Snowflake Notebooks

Snowflake Notebooks provide a familiar experience, similar to Jupyter notebooks, for working with Python inside Snowflake. They aren’t strictly part of Snowflake ML, but they’re ideal for building custom ML workflows and models using tools you already know how to use.

Snowpark ML

Snowpark ML (the snowflake-ml-python Python package) is the component of Snowflake ML that provides Python APIs for the various Snowflake ML workflow components, including the Snowflake Feature Store, the Snowflake Model Registry, and Dataset versioned data objects. It also includes APIs, based on popular Python ML libraries such as scikit-learn, for building and training your own models at scale completely inside the Snowflake cloud. You can use Snowpark ML in your local Python development environment, in Snowsight worksheets, or in Snowflake Notebooks.

Tip

See Introduction to Machine Learning with Snowpark ML for an example of an end-to-end workflow in Snowpark ML.

Snowpark ML Modeling

The Snowpark ML library includes the Snowpark ML Modeling APIs, which support data preprocessing, feature engineering, and model training in Snowflake using popular machine learning frameworks such as scikit-learn, XGBoost, LightGBM, and PyTorch. All processing runs in a Snowflake virtual warehouse directly on data stored in Snowflake, with no infrastructure configuration required.
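As an illustration, training with the Modeling APIs follows the familiar fit/predict pattern, except that the estimator operates on Snowpark DataFrames and is told which columns to use. This sketch assumes the snowflake-ml-python package; the function name train_and_score, its column arguments, and the PREDICTION output column are placeholders.

```python
def train_and_score(train_df, test_df, feature_cols, label_col):
    """Fit an XGBoost classifier on a Snowpark DataFrame and score another.

    `train_df` and `test_df` are Snowpark DataFrames over tables in Snowflake,
    so the data is not pulled into the client for training or inference.
    """
    # Imported lazily so this sketch can be loaded without the package installed.
    from snowflake.ml.modeling.xgboost import XGBClassifier

    classifier = XGBClassifier(
        input_cols=list(feature_cols),  # feature column names
        label_cols=[label_col],         # target column name
        output_cols=["PREDICTION"],     # column added by predict()
    )
    classifier.fit(train_df)

    # Returns a DataFrame with the PREDICTION column appended.
    return classifier.predict(test_df)
```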

Additional Resources

See the following resources for information about Snowflake ML.

End-to-End ML Workflows

Contact your Snowflake representative for early access to documentation on other features currently under development.