Batch inference jobs¶
Note
Preview Feature — Public
Supported in public preview since snowflake-ml-python version 1.26.0.
Use Snowflake Batch Inference to enable efficient, large-scale model inference on static or periodically updated datasets. The Batch Inference API uses Snowpark Container Services (SPCS) to provide a distributed compute layer optimized for massive throughput and cost-efficiency.
When to use batch inference¶
Use the run_batch method for workloads that:
Process images, audio, or video files using multimodal models with unstructured data.
Execute inference over millions or billions of rows.
Run inference as a discrete, asynchronous stage in a pipeline.
Integrate inference as a step within an Airflow DAG or Snowflake Task.
Limitations¶
For multi-modal use cases, only server-side encryption is supported.
Partitioned models aren’t supported.
Get started¶
Connect to Model Registry¶
Connect to the Snowflake Model Registry and retrieve the model reference as:
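A minimal connection sketch, using the snowflake-ml-python Registry API; the connection parameters and model/version names are placeholders for your account:

```python
from snowflake.snowpark import Session
from snowflake.ml.registry import Registry

# Build a session from your connection parameters (placeholders shown).
session = Session.builder.configs(
    {
        "account": "<account_identifier>",
        "user": "<user>",
        "password": "<password>",
        "role": "<role>",
        "database": "<database>",
        "schema": "<schema>",
        "warehouse": "<warehouse>",
    }
).create()

# Open the registry and retrieve a reference to a registered model version.
registry = Registry(session=session)
mv = registry.get_model("MY_MODEL").version("V1")
```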
Execute batch job¶
This API uses a Snowpark Container Services (SPCS) job to launch the inference workload. After inference completes, the compute automatically winds down so that you don’t incur additional charges. At a high level, the API looks like the following:
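A sketch of the call, assuming a model version reference mv retrieved from the registry; the compute_pool argument name and the job handle’s wait method are assumptions, and the input and output specifications are described in the sections that follow:

```python
# Launch a distributed batch inference job on SPCS.
job = mv.run_batch(
    compute_pool="MY_COMPUTE_POOL",  # SPCS compute pool to run on (assumed argument)
    input_spec=input_spec,           # where the input rows or files come from
    output_spec=output_spec,         # stage directory that receives the results
)

# Optionally block until the job finishes; the compute then winds down.
job.wait()
```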
Job management¶
You can get a list of jobs, cancel a job, get a job’s handle, or delete a job using the methods below:
Note
The result function in the ML Job APIs is not supported for batch inference jobs.
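A sketch of common job-management calls, assuming the ML Jobs API surface in snowflake.ml.jobs; the exact method names and signatures may differ in your snowflake-ml-python version:

```python
from snowflake.ml import jobs

# List jobs visible to the current role.
for job_row in jobs.list_jobs(session=session).collect():
    print(job_row)

# Get a handle to an existing job by ID, then cancel or delete it.
job = jobs.get_job("<job_id>", session=session)
job.cancel()
jobs.delete_job("<job_id>", session=session)
```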
Specify inference data¶
You can use structured data or unstructured data for batch inference. To use structured data for your workflow, you can either provide a SQL query or a dataframe to the run_batch method.
For unstructured data, you can reference your files from a Snowflake stage. To reference your files, create a dataframe with the file paths.
Pass the dataframe to the run_batch method; run_batch reads each file and provides its contents to the model.
Structured input¶
The following are examples illustrating the range of input possibilities:
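For example, assuming an InputSpec class that accepts either a SQL query string or a Snowpark dataframe (the import path, argument names, and table columns here are illustrative):

```python
from snowflake.ml.model import InputSpec  # assumed import path

# Option 1: a SQL query selecting the model's input columns.
input_spec = InputSpec(
    query="SELECT review_id, review_text FROM my_db.my_schema.reviews"
)

# Option 2: a Snowpark dataframe built in your session.
reviews_df = session.table("my_db.my_schema.reviews").select(
    "review_id", "review_text"
)
input_spec = InputSpec(dataframe=reviews_df)

job = mv.run_batch(input_spec=input_spec, output_spec=output_spec)
```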
Unstructured input (multi-modal)¶
For unstructured data, the run_batch method can read the files from the fully qualified stage paths provided in the input dataframe. The following example shows you how to specify unstructured input data:
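For instance, a dataframe with one fully qualified stage path per row might be built like this (stage and file names are placeholders):

```python
# One row per file; the column holds fully qualified stage paths.
files_df = session.create_dataframe(
    [
        ("@my_db.my_schema.my_stage/images/img_001.png",),
        ("@my_db.my_schema.my_stage/images/img_002.png",),
    ],
    schema=["path"],
)
```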
To automatically list all files in a stage as dataframe, use code like the following:
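One approach is the LIST command, rebuilding fully qualified paths from the relative names it returns; this is a sketch, and note that LIST’s lowercase name column must be quoted:

```python
from snowflake.snowpark.functions import col, concat, lit

stage = "@my_db.my_schema.my_stage"

# LIST returns one row per file with a lowercase "name" column such as
# "my_stage/images/img_001.png"; rebuild fully qualified stage paths from it.
files_df = session.sql(f"LIST {stage}").select(
    concat(lit("@my_db.my_schema."), col('"name"')).alias("path")
)
```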
Expressing type of data¶
run_batch automatically converts your files to a format your model accepts.
Your model can accept data in one of the following formats:
RAW_BYTES
BASE64
For example, if you have images stored in PNG format in your stage and your model accepts RAW_BYTES, you can use the input_spec argument to specify how Snowflake converts your data.
The following example code converts files in your stage to RAW_BYTES:
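This is a sketch, assuming InputSpec exposes a column_handling mapping from column name to conversion format; check the snowflake-ml-python reference for the exact class and field names:

```python
from snowflake.ml.model import InputSpec  # assumed import path

input_spec = InputSpec(
    dataframe=files_df,
    # Read the file at the stage path in the "path" column and pass its
    # raw bytes to the model; use "BASE64" instead for base64-encoded input.
    column_handling={"path": "RAW_BYTES"},
)

job = mv.run_batch(input_spec=input_spec, output_spec=output_spec)
```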
The column_handling argument tells the framework that the path column of the input dataframe contains fully qualified stage paths, so it calls the model with the raw bytes of each file.
Output (output_spec)¶
Specify a stage directory to store the file output, as shown here:
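For example, assuming an OutputSpec class with a stage-location field (class, import path, and field name are illustrative):

```python
from snowflake.ml.model import OutputSpec  # assumed import path

# Results are written to this stage directory as Parquet files.
output_spec = OutputSpec(stage_location="@my_db.my_schema.my_stage/batch_output/")
```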
Snowflake currently supports models that output text, storing the results as Parquet files in the output stage. You can load the Parquet files into a Snowpark DataFrame as follows:
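Using the Snowpark Parquet reader, for example (the output path is a placeholder):

```python
# Load the job's Parquet output from the stage into a Snowpark dataframe.
results_df = session.read.parquet("@my_db.my_schema.my_stage/batch_output/")
results_df.show()
```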
Passing parameters¶
If the model’s signature includes parameters defined with ParamSpec, you can pass parameter values at inference time using the params argument in InputSpec. Any parameter not included in the dictionary uses its default value from the signature.
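For example, if the signature defines temperature and max_tokens parameters (hypothetical names), you could pass:

```python
from snowflake.ml.model import InputSpec  # assumed import path

input_spec = InputSpec(
    dataframe=prompts_df,
    # Values for ParamSpec-defined parameters; anything omitted here
    # falls back to its default value from the model signature.
    params={"temperature": 0.2, "max_tokens": 256},
)
```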
Job specification¶
To configure job-level settings for your batch inference workload (such as the number of workers, resource allocation, and execution parameters), pass a JobSpec instance as the job_spec argument of the run_batch method. An example is shown below:
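A sketch, assuming a JobSpec import path and field name that may differ in your snowflake-ml-python version:

```python
from snowflake.ml.model import JobSpec  # assumed import path

# The field below is illustrative; see the JobSpec reference for the
# options supported by your snowflake-ml-python version.
job_spec = JobSpec(num_workers=4)

job = mv.run_batch(
    input_spec=input_spec,
    output_spec=output_spec,
    job_spec=job_spec,
)
```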
Best practices¶
Using a sentinel file¶
A job can fail midway for various reasons, which can leave partial data in the output directory. To mark successful completion, run_batch writes a _SUCCESS completion file to the output directory.
To avoid having partial or incorrect output:
Read output data only after the sentinel file is found.
Provide an empty directory to begin with.
Run run_batch with mode = SaveMode.ERROR.
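The sentinel-file gate can be sketched as a small helper. It is shown against a local directory for illustration; the same check applies to the stage path your job writes to (for example, by listing the stage and looking for _SUCCESS before reading).

```python
from pathlib import Path


def read_if_complete(output_dir: str) -> list[str]:
    """Return the job's output files only once the _SUCCESS sentinel exists."""
    out = Path(output_dir)
    if not (out / "_SUCCESS").exists():
        raise RuntimeError(
            f"batch job incomplete or failed: no _SUCCESS in {output_dir}"
        )
    # Only the data files, in a deterministic order.
    return sorted(str(p) for p in out.glob("*.parquet"))
```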
Examples¶
Using a custom model¶
Using Hugging Face Model¶
Using Hugging Face Model with vLLM¶
Task: text generation¶
Task: image text to text¶
Sample notebooks¶
For end-to-end runnable examples, see the batch inference sample notebooks on GitHub.