Example workflows¶
This page provides example workflows for deploying machine learning models for real-time inference using Snowpark Container Services (SPCS). Each example demonstrates the complete lifecycle from model registration to deployment and inference.
These examples cover:

- Creating services, making predictions, and accessing models through HTTP endpoints
- Using different model architectures (XGBoost, Hugging Face transformers, PyTorch) and compute options (CPU and GPU)
Deploy an XGBoost model for CPU-powered inference¶
The following code:

- Deploys an XGBoost model for inference in SPCS
- Uses the deployed model for inference
Calling via HTTP (External Application)¶
Because this model has ingress enabled (ingress_enabled=True), you can call its public HTTP endpoint. The following example uses a programmatic access token (PAT), stored in the environment variable PAT_TOKEN, to authenticate with the public Snowflake endpoint:
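A hedged sketch of such a call, using only the standard library. The endpoint URL is a placeholder; the `Authorization: Snowflake Token="<PAT>"` header scheme and the row-indexed `{"data": [...]}` payload shape follow Snowflake's external-function conventions, but verify both against your account's endpoint:

```python
import json
import os
from urllib.request import Request, urlopen

def build_request(endpoint_url: str, token: str, rows: list) -> Request:
    # SPCS inference endpoints expect the external-function payload shape:
    # {"data": [[row_index, feature_1, feature_2, ...], ...]}
    payload = {"data": [[i, *row] for i, row in enumerate(rows)]}
    return Request(
        endpoint_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            # PAT passed via the Snowflake token authorization scheme.
            "Authorization": f'Snowflake Token="{token}"',
            "Content-Type": "application/json",
        },
        method="POST",
    )

# Example (requires a live endpoint and a valid PAT in PAT_TOKEN):
# req = build_request("https://<hash>-<account>.snowflakecomputing.app/predict",
#                     os.environ["PAT_TOKEN"], [[0.1] * 10])
# with urlopen(req) as resp:
#     print(json.load(resp))
```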
Deploy a Hugging Face sentence transformer for GPU-powered inference¶
The following code trains and deploys a Hugging Face sentence transformer and exposes an HTTP endpoint.

This example requires the sentence-transformers package, a GPU compute pool, and an image repository.
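A minimal sketch of this deployment, again assuming an open Snowpark `session`; the model name, GPU compute pool (`MY_GPU_POOL`), image repository (`MY_IMAGE_REPO`), and service name are placeholders:

```python
import pandas as pd

# Example input, used both to infer the model signature and to test the service.
sentences = pd.DataFrame({"SENTENCES": ["Snowflake is a data cloud.",
                                        "SPCS runs containers."]})

def deploy_sentence_transformer(session, input_df):
    """Log a sentence transformer and serve it on a GPU compute pool."""
    from sentence_transformers import SentenceTransformer  # needs sentence-transformers
    from snowflake.ml.registry import Registry             # needs snowflake-ml-python

    model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

    reg = Registry(session=session)
    mv = reg.log_model(
        model,
        model_name="minilm_demo",
        version_name="v1",
        sample_input_data=input_df,
    )
    mv.create_service(
        service_name="minilm_gpu_service",
        service_compute_pool="MY_GPU_POOL",  # placeholder GPU compute pool
        image_repo="MY_IMAGE_REPO",          # placeholder image repository
        gpu_requests="1",                    # request one GPU per service instance
        ingress_enabled=True,
    )
    # Sentence-transformer models expose "encode" as the inference method.
    return mv.run(input_df, function_name="encode",
                  service_name="minilm_gpu_service")
```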
In SQL, you can call the service function as follows:
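Services created this way expose a service function that SQL invokes with the method-call syntax `service_name!function_name(...)`. A hedged sketch, with placeholder service, column, and table names, shown as a helper that builds the SQL text:

```python
def service_call_sql(service_name: str, method: str, column: str, table: str) -> str:
    """Build the SQL text for calling an SPCS service function
    using the service!method(...) syntax."""
    return f"SELECT {service_name}!{method}({column}) AS output FROM {table}"

# With an open Snowpark session (needs snowflake-snowpark-python):
# session.sql(service_call_sql("minilm_gpu_service", "encode",
#                              "SENTENCES", "my_sentences_table")).collect()
```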
Similarly, you can call its public HTTP endpoint using the same PAT authentication pattern shown for the XGBoost model above.
Deploy a PyTorch model for GPU-powered inference¶
For an example of training and deploying a PyTorch deep learning recommendation model (DLRM) to SPCS for GPU inference, see this quickstart.
Deploy a Snowpark ML modeling model¶
Models developed using Snowpark ML modeling classes cannot be deployed to environments that have a GPU. As a workaround, you can extract the underlying native model and deploy that instead. For example:
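A minimal sketch of the workaround, assuming a fitted Snowpark ML modeling estimator and an open Snowpark `session`; the model and version names are placeholders. Snowpark ML estimators expose accessors such as `to_xgboost()`, `to_sklearn()`, and `to_lightgbm()` that return the underlying native model:

```python
def extract_and_log_native(session, snowpark_ml_model, sample_input):
    """Extract the native model from a fitted Snowpark ML estimator
    and register it so it can be deployed like any native model."""
    from snowflake.ml.registry import Registry  # needs snowflake-ml-python

    # Extract the underlying native XGBoost model; use to_sklearn() or
    # to_lightgbm() for estimators wrapping those frameworks.
    native_model = snowpark_ml_model.to_xgboost()

    reg = Registry(session=session)
    # Log the native model; it can then be deployed with create_service()
    # as shown in the XGBoost example above.
    return reg.log_model(
        native_model,
        model_name="native_xgb",
        version_name="v1",
        sample_input_data=sample_input,
    )
```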