Example workflows¶
This page provides example workflows for deploying machine learning models for real-time inference using Snowpark Container Services (SPCS). Each example demonstrates the complete lifecycle from model registration to deployment and inference.
These examples cover:
- How to create services, make predictions, and access models via HTTP endpoints.
- How to use different model architectures (XGBoost, Hugging Face transformers, PyTorch) and compute options (CPU and GPU).
Deploy an XGBoost model for CPU-powered inference¶
The following code:
- Deploys an XGBoost model to SPCS.
- Runs inference against the deployed model.
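A minimal sketch of this workflow, assuming an existing Snowpark `session`; the database, schema, compute pool, and image repository names are placeholders, and scikit-learn's breast-cancer dataset stands in for real training data:

```python
from sklearn import datasets
from xgboost import XGBClassifier

from snowflake.ml.registry import Registry

# Train a small XGBoost classifier on a sample dataset.
data = datasets.load_breast_cancer(as_frame=True)
X, y = data.data, data.target
model = XGBClassifier()
model.fit(X, y)

# Log the model in the Model Registry. `session` is an existing
# Snowpark session; the database and schema names are placeholders.
reg = Registry(session=session, database_name="MY_DB", schema_name="MY_SCHEMA")
mv = reg.log_model(
    model,
    model_name="xgb_classifier",
    version_name="v1",
    sample_input_data=X.head(10),
)

# Deploy the model version as an SPCS service. The compute pool and
# image repository (placeholder names) must already exist.
mv.create_service(
    service_name="xgb_service",
    service_compute_pool="MY_CPU_POOL",
    image_repo="MY_DB.MY_SCHEMA.MY_IMAGE_REPO",
    ingress_enabled=True,
    max_instances=1,
)

# Run inference against the deployed service rather than a warehouse.
predictions = mv.run(X.head(10), function_name="predict", service_name="xgb_service")
print(predictions)
```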
Calling via HTTP (External Application)¶
Because the service was created with ingress enabled (ingress_enabled=True), you can call its public HTTP endpoint. The following example authenticates with a programmatic access token (PAT) stored in the environment variable PAT_TOKEN:
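A sketch of such a call, assuming the external-function-style payload the inference server accepts (each row prefixed with its row index); the endpoint URL is a placeholder that you can look up with `mv.list_services()` or `SHOW ENDPOINTS IN SERVICE`:

```python
import os

import numpy as np
import requests

# Placeholder endpoint URL; the method name (predict) appears in the path.
URL = "https://<random-prefix>-<account>.snowflakecomputing.app/predict"

# Authenticate with the PAT from the environment.
headers = {"Authorization": f'Snowflake Token="{os.environ["PAT_TOKEN"]}"'}

# External-function-style input: each row is prefixed with its row index.
rows = X.head(10).values  # `X` from the training example above
payload = {"data": np.column_stack([range(len(rows)), rows]).tolist()}

response = requests.post(URL, json=payload, headers=headers)
response.raise_for_status()
print(response.json())
```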
Deploy a Hugging Face sentence transformer for GPU-powered inference¶
The following code registers a pretrained Hugging Face sentence transformer and deploys it with an HTTP endpoint.
This example requires the sentence-transformers package, a GPU compute pool, and an image repository.
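A minimal sketch, assuming a pretrained all-MiniLM-L6-v2 model and the placeholder pool and repository names from the previous example; the sample-input column name is illustrative:

```python
import pandas as pd
from sentence_transformers import SentenceTransformer

from snowflake.ml.registry import Registry

# Load a pretrained sentence transformer (model name is illustrative).
model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

# A small sample input lets the registry infer the model's signature.
sample_input = pd.DataFrame(["This is a test sentence."], columns=["SENTENCES"])

# `session` is an existing Snowpark session; names are placeholders.
reg = Registry(session=session, database_name="MY_DB", schema_name="MY_SCHEMA")
mv = reg.log_model(
    model,
    model_name="sentence_transformer",
    version_name="v1",
    sample_input_data=sample_input,
)

# Deploy to a GPU compute pool; gpu_requests reserves one GPU per instance.
mv.create_service(
    service_name="sentence_transformer_service",
    service_compute_pool="MY_GPU_POOL",
    image_repo="MY_DB.MY_SCHEMA.MY_IMAGE_REPO",
    ingress_enabled=True,
    gpu_requests="1",
    max_instances=1,
)

# Generate embeddings with the deployed service.
embeddings = mv.run(sample_input, function_name="encode", service_name="sentence_transformer_service")
```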
In SQL, you can call the service function as follows:
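A sketch assuming the service name from the example above; the service function's name matches the model's `encode` method:

```sql
-- Call the service function created for the deployed model.
SELECT sentence_transformer_service!encode('This is a test sentence.');
```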
Similarly, you can call the service's HTTP endpoint as follows:
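A sketch mirroring the earlier PAT example; the endpoint URL is a placeholder, and the payload carries a single indexed row:

```python
import os

import requests

# Placeholder endpoint URL; the `encode` method name appears in the path.
URL = "https://<random-prefix>-<account>.snowflakecomputing.app/encode"

headers = {"Authorization": f'Snowflake Token="{os.environ["PAT_TOKEN"]}"'}

# One row: [row_index, sentence]
payload = {"data": [[0, "This is a test sentence."]]}

response = requests.post(URL, json=payload, headers=headers)
response.raise_for_status()
print(response.json())
```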
Deploy a PyTorch model for GPU-powered inference¶
For an example of training a PyTorch deep learning recommendation model (DLRM) and deploying it to SPCS for GPU inference, see this quickstart.
Deploy a Snowpark ML modeling model¶
Models developed using Snowpark ML modeling classes (snowflake.ml.modeling) cannot run in GPU-powered inference environments. As a workaround, you can extract the underlying native model and deploy that instead. For example:
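A minimal sketch, assuming a Snowpark ML XGBRegressor trained on a Snowpark DataFrame `train_df`; `to_xgboost()` returns the native model, which is then logged and deployed like any other XGBoost model (placeholder names throughout):

```python
from snowflake.ml.modeling.xgboost import XGBRegressor
from snowflake.ml.registry import Registry

# Placeholder column names; substitute your training table's columns.
FEATURE_COLS = ["FEATURE_1", "FEATURE_2"]
LABEL_COLS = ["TARGET"]

# Train with a Snowpark ML modeling class. `train_df` is a Snowpark
# DataFrame over your training data.
regressor = XGBRegressor(input_cols=FEATURE_COLS, label_cols=LABEL_COLS)
regressor.fit(train_df)

# Extract the underlying native XGBoost model.
native_model = regressor.to_xgboost()

# Log the native model; it can then be deployed to a GPU compute pool
# with create_service(), exactly as in the earlier examples.
reg = Registry(session=session, database_name="MY_DB", schema_name="MY_SCHEMA")
mv = reg.log_model(
    native_model,
    model_name="native_xgb_model",
    version_name="v1",
    sample_input_data=train_df[FEATURE_COLS].limit(10).to_pandas(),
)

mv.create_service(
    service_name="native_xgb_service",
    service_compute_pool="MY_GPU_POOL",
    image_repo="MY_DB.MY_SCHEMA.MY_IMAGE_REPO",
    ingress_enabled=True,
    gpu_requests="1",
    max_instances=1,
)
```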