Snowflake ML: End-to-End Machine Learning

Snowflake ML is an integrated set of capabilities for end-to-end machine learning on a single platform, on top of your governed data. It provides a unified environment for ML development and productionization, optimized for large-scale distributed feature engineering, model training, and inference on CPU and GPU compute without manual tuning or configuration.

Snowflake ML Overview Diagram

Scaling end-to-end ML workflows in Snowflake is seamless. You can do the following:

  • Prepare data

  • Create and use features with the Snowflake Feature Store

  • Train models with CPUs or GPUs using any open-source package from Snowflake Notebooks on Container Runtime

  • Deploy your models for inference at scale with the Snowflake Model Registry

  • Monitor your production models with ML Observability and Explainability

  • Use ML Lineage to track source data through features, datasets, and models across the ML pipeline

Snowflake ML is also flexible and modular. You can deploy models developed in Snowflake outside of Snowflake, and externally trained models can easily be brought into Snowflake for inference.

Features for data scientists and ML engineers

Snowflake Notebooks on Container Runtime

Container Runtime for ML offers a Jupyter-like experience for working with Python in Snowflake. Built for large-scale ML development, it requires no infrastructure management. Start building with preinstalled packages such as PyTorch, XGBoost, and scikit-learn, or install any package from open-source hubs like Hugging Face or PyPI. The runtime maximizes performance with optimized data loading and distributed model training on preconfigured CPU and GPU compute pools.
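A typical training cell in a Container Runtime notebook might look like the following sketch. It assumes an active Snowpark `session` and a hypothetical `CUSTOMER_FEATURES` table with a `CHURNED` label column; the table and column names are illustrative, not part of any Snowflake API.

```python
def train_churn_model(session, table_name="CUSTOMER_FEATURES"):
    """Sketch: pull features into pandas and fit a preinstalled XGBoost model.

    Assumes `session` is an active Snowpark session and that `table_name`
    (a hypothetical example table) has a CHURNED label column.
    """
    import xgboost as xgb  # preinstalled in the Container Runtime image

    # Materialize the governed table as a pandas DataFrame for training.
    df = session.table(table_name).to_pandas()
    X, y = df.drop(columns=["CHURNED"]), df["CHURNED"]

    model = xgb.XGBClassifier(n_estimators=100, max_depth=6)
    model.fit(X, y)
    return model
```

On GPU compute pools, the same notebook can swap in GPU-enabled training without changing how data is loaded.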

Snowflake Feature Store

The Snowflake Feature Store is an integrated solution for defining, managing, storing, and discovering ML features derived from your data. It supports automated, incremental refresh from batch and streaming data sources, so feature pipelines need to be defined only once and are then continuously updated with new data.
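The define-once pattern can be sketched with the `snowflake.ml.feature_store` API. The database, warehouse, entity, and query names below are assumptions for illustration, and the code is a sketch that needs an active Snowpark `session`.

```python
def register_features(session):
    """Sketch: register an entity and an auto-refreshing feature view."""
    from snowflake.ml.feature_store import (
        CreationMode,
        Entity,
        FeatureStore,
        FeatureView,
    )

    fs = FeatureStore(
        session=session,
        database="ML_DB",             # assumed database name
        name="MY_FEATURE_STORE",      # assumed schema for the feature store
        default_warehouse="ML_WH",    # assumed warehouse
        creation_mode=CreationMode.CREATE_IF_NOT_EXIST,
    )

    # An entity names the join keys that features attach to.
    customer = Entity(name="CUSTOMER", join_keys=["CUSTOMER_ID"])
    fs.register_entity(customer)

    # Defined once; refresh_freq keeps it incrementally updated with new data.
    fv = FeatureView(
        name="CUSTOMER_SPEND",
        entities=[customer],
        feature_df=session.sql(
            "SELECT CUSTOMER_ID, SUM(AMOUNT) AS TOTAL_SPEND "
            "FROM ORDERS GROUP BY CUSTOMER_ID"  # assumed source table
        ),
        refresh_freq="1 day",
    )
    fs.register_feature_view(fv, version="V1")
```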

Snowflake Model Registry and Model Serving

The Snowflake Model Registry lets you log and manage all your ML models, whether they were trained in Snowflake or on other platforms. You can run models from the registry for inference at scale, and use Model Serving to deploy them to Snowpark Container Services.
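The log-then-run flow can be sketched with `snowflake.ml.registry.Registry`. The model and version names are assumptions, and the code needs an active Snowpark `session` plus a fitted model object.

```python
def log_and_run(session, model, sample_df):
    """Sketch: log a trained model, then run warehouse inference with it."""
    from snowflake.ml.registry import Registry

    reg = Registry(session=session)

    # Logging captures the model, its dependencies, and a signature
    # inferred from the sample input data.
    mv = reg.log_model(
        model,
        model_name="CHURN_MODEL",      # assumed name
        version_name="V1",             # assumed version
        sample_input_data=sample_df,
    )

    # Warehouse-based inference at scale; Model Serving deployment to
    # Snowpark Container Services is a separate step on the model version.
    return mv.run(sample_df, function_name="predict")
```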

ML Observability

ML Observability provides tools to monitor model performance metrics in Snowflake. You can track models in production, monitor performance and drift metrics, and set alerts for performance thresholds. Additionally, use the ML Explainability function to compute Shapley values for models in the Snowflake Model Registry, regardless of where they were trained.
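A monitor for a logged model can be declared in SQL. The statement below is a hedged sketch: the monitor, model, table, and column names are assumptions, and the exact set of properties should be checked against the `CREATE MODEL MONITOR` reference.

```sql
-- Sketch: monitor a registered model's predictions for drift and performance.
CREATE OR REPLACE MODEL MONITOR CHURN_MONITOR
WITH
    MODEL = CHURN_MODEL                    -- assumed registry model
    VERSION = V1
    FUNCTION = predict
    SOURCE = INFERENCE_LOG_TABLE           -- assumed table of logged predictions
    BASELINE = TRAINING_TABLE              -- assumed baseline data
    TIMESTAMP_COLUMN = TS
    PREDICTION_CLASS_COLUMNS = (PRED)
    ACTUAL_CLASS_COLUMNS = (LABEL)
    ID_COLUMNS = (CUSTOMER_ID)
    WAREHOUSE = ML_WH
    REFRESH_INTERVAL = '1 day'
    AGGREGATION_WINDOW = '1 day';
```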

ML Lineage

ML Lineage traces the end-to-end lineage of ML artifacts, from source data to features, datasets, and models. This enables reproducibility, compliance, and debugging across the full lifecycle of ML assets.
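Lineage can be queried in SQL with the `SNOWFLAKE.CORE.GET_LINEAGE` table function. The object name below is an assumption; check the function's reference for the supported domains and directions.

```sql
-- Sketch: list objects downstream of an assumed source table.
SELECT *
FROM TABLE(SNOWFLAKE.CORE.GET_LINEAGE(
    'ML_DB.PUBLIC.ORDERS',   -- assumed fully qualified object name
    'TABLE',                 -- object domain
    'DOWNSTREAM'             -- direction to traverse
));
```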

Snowflake Datasets

Snowflake Datasets provide an immutable, versioned snapshot of your data suitable for ingestion by your machine learning models.
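Creating a versioned snapshot can be sketched with the `snowflake.ml.dataset` module. The dataset and table names are assumptions, and the code needs an active Snowpark `session`.

```python
def snapshot_training_data(session):
    """Sketch: snapshot a table as an immutable, versioned Dataset."""
    from snowflake.ml import dataset

    ds = dataset.create_from_dataframe(
        session,
        name="CHURN_TRAINING_DATA",                 # assumed dataset name
        version="v1",
        input_dataframe=session.table("CUSTOMER_FEATURES"),  # assumed table
    )
    # The snapshot can then be read back for training, e.g. via
    # ds.read.to_pandas(), independently of later changes to the table.
    return ds
```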

Features for business analysts

Business analysts can use ML Functions to shorten development time for common scenarios, such as forecasting and anomaly detection, across the organization using only SQL.
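For example, a forecast can be trained and run entirely in SQL with the `SNOWFLAKE.ML.FORECAST` class. The table and column names below are assumptions for illustration.

```sql
-- Train a forecasting model on a hypothetical daily sales table.
CREATE OR REPLACE SNOWFLAKE.ML.FORECAST sales_forecast(
    INPUT_DATA => TABLE(daily_sales),   -- assumed table name
    TIMESTAMP_COLNAME => 'sale_date',
    TARGET_COLNAME => 'total_sales'
);

-- Produce a 14-day forecast from the trained model.
CALL sales_forecast!FORECAST(FORECASTING_PERIODS => 14);
```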

Additional Resources

See the following resources to get started with Snowflake ML:

Contact your Snowflake representative for early access to documentation on other features currently under development.