Batch inference jobs¶
Note
Snowflake Batch Inference Jobs requires snowflake-ml-python version 1.39.0 or later.
This feature is not available in government regions.
Use Snowflake Batch Inference to enable efficient, large-scale model inference on static or periodically updated datasets. The Batch Inference API uses Snowpark Container Services (SPCS) to provide a distributed compute layer optimized for massive throughput and cost-efficiency.
When to use batch inference¶
Use the run_batch method when your workload needs to:
- Process images, audio, or video files, or run multimodal models on unstructured data.
- Execute inference over millions or billions of rows.
- Run inference as a discrete, asynchronous stage in a pipeline.
- Integrate inference as a step within an Airflow DAG or Snowflake Task.
Limitations¶
- For multimodal use cases, only server-side encryption is supported.
Get started¶
Connect to Model Registry¶
Connect to the Snowflake Model Registry and retrieve a reference to the model version that you want to run.
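A minimal sketch of this step, assuming you already have an active Snowpark session. It uses the Registry class from snowflake.ml.registry; the helper name get_model_version and the database, schema, model, and version names passed to it are placeholders for illustration.

```python
def get_model_version(session, database, schema, model_name, version_name):
    """Return a model version handle from the Snowflake Model Registry."""
    # Imported inside the function so that defining the helper does not
    # require snowflake-ml-python to be installed.
    from snowflake.ml.registry import Registry

    registry = Registry(session=session, database_name=database, schema_name=schema)
    # get_model returns the model; version selects one version of it.
    return registry.get_model(model_name).version(version_name)
```

A call such as `mv = get_model_version(session, "MY_DB", "MY_SCHEMA", "my_model", "v1")` would give you the object on which run_batch is called.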
Execute batch job¶
This API uses a Snowpark Container Services (SPCS) job to launch the inference workload. After inference completes, the compute automatically winds down so that you don't incur additional charges. At a high level, the API looks like the following:
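As a rough sketch of the call shape only, the helper below forwards its arguments to run_batch on a model version. The X and output_spec arguments are named elsewhere on this page. The compute_pool keyword and the helper name submit_batch_job are assumptions made for illustration, so check them against the API reference for your snowflake-ml-python version.

```python
def submit_batch_job(model_version, input_df, output_spec, compute_pool):
    """Submit a batch inference job and return whatever run_batch returns."""
    return model_version.run_batch(
        X=input_df,                 # structured rows, or rows of stage file paths
        compute_pool=compute_pool,  # assumed keyword: the SPCS compute pool to use
        output_spec=output_spec,    # for example, OutputSpec(stage_location=...)
    )
```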
Job management¶
You can get a list of jobs, cancel a job, get a job’s handle, or delete a job using the methods below:
Note
The result function in the ML Job APIs is not supported for batch inference jobs.
Specify inference data¶
You can use structured data or unstructured data for batch inference. To use structured data, provide either a SQL query or a dataframe to the run_batch method.
For unstructured data, you can reference your files from a Snowflake stage. To reference your files, create a dataframe with the file paths.
Pass the dataframe to the run_batch method, which reads the files and provides their content to the model.
Structured input¶
The following are examples illustrating the range of input possibilities:
Unstructured input (multi-modal)¶
For unstructured data, the run_batch method can read the files from the fully qualified stage paths provided in the input dataframe. The following example shows you how to specify unstructured input data:
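To make "fully qualified stage paths" concrete, here is a small sketch that builds them from a stage prefix and a list of relative file names. The helper name to_stage_paths, the stage name, and the file names are all hypothetical.

```python
def to_stage_paths(stage_prefix, relative_paths):
    """Join a stage prefix and relative file names into full stage paths."""
    base = stage_prefix.rstrip("/")
    return [f"{base}/{rel.lstrip('/')}" for rel in relative_paths]


paths = to_stage_paths(
    "@MY_DB.MY_SCHEMA.MY_STAGE/images",  # hypothetical stage and directory
    ["cat.png", "dogs/rex.png"],         # hypothetical files
)
```

Each string would then become one row of a single-column dataframe, for example through Snowpark's session.create_dataframe(paths, schema=["PATH"]), where the column name PATH is also an assumption.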
To automatically list all files in a stage as a dataframe, use code like the following:
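One possible approach, offered as an assumption rather than as the documented method, is to query the stage's directory table, which exposes a RELATIVE_PATH column for each file. This only works if the directory table is enabled on the stage and has been refreshed. The helper name list_stage_files_sql and the output column name PATH are hypothetical.

```python
def list_stage_files_sql(stage, column="PATH"):
    """Build a query that returns one full stage path per file in a stage.

    `stage` must be a bare stage reference such as "@MY_DB.MY_SCHEMA.MY_STAGE",
    without a directory suffix, because it is also used inside DIRECTORY(...).
    """
    base = stage.rstrip("/")
    return (
        f"SELECT CONCAT('{base}/', RELATIVE_PATH) AS {column} "
        f"FROM DIRECTORY({base})"
    )


query = list_stage_files_sql("@MY_DB.MY_SCHEMA.MY_STAGE")
```

Passing the query to session.sql(query) would yield a Snowpark dataframe with one path per row.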
Stage support¶
Supported configurations for input:
- Internal stages: all types of internal stages are supported.
- External stages: only Amazon S3 is supported, and the stage must use server-side encryption. Azure Blob Storage and Google Cloud Storage are not supported.
Input rows can reference different stages in the same DataFrame, mixing external and internal paths. Each path is resolved independently at read time.
External stages require a one-time admin setup (S3 storage integration and IAM permissions on the bucket). For details, see CREATE STAGE and Bulk loading from Amazon S3. The role running the batch inference job must have USAGE on the external stage.
The output stage specified by OutputSpec(stage_location=...) must be an internal stage.
Expressing type of data¶
The run_batch method automatically converts your files to a format that your model accepts.
Your model can accept data in one of the following formats:
- RAW_BYTES
- BASE64
For example, if you have images stored in PNG format in your stage and your model accepts RAW_BYTES, you can use the input_spec argument to specify how Snowflake converts your data.
The following example code converts files in your stage to RAW_BYTES:
The column_handling argument tells the framework that the path column of X contains a full stage path and that the model should be called with the raw bytes of that file.
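The difference between the two formats can be illustrated with the Python standard library alone. This is not how Snowflake performs the conversion; it only shows what a model would plausibly receive in each case, using a made-up byte string in place of a real PNG file.

```python
import base64

# Stand-in for the content of a PNG file read from a stage:
# the 8-byte PNG signature followed by a dummy payload.
raw_bytes = b"\x89PNG\r\n\x1a\n" + b"\x00\x01\x02"

# RAW_BYTES-style input: the bytes are passed through unchanged.
as_raw = raw_bytes

# BASE64-style input: the same bytes encoded as ASCII text.
as_base64 = base64.b64encode(raw_bytes).decode("ascii")

# A model that takes base64 text can recover the original bytes.
decoded = base64.b64decode(as_base64)
```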
Output (output_spec)¶
Specify a stage directory to store the file output, as shown here:
Snowflake currently supports models that output text and stores the output as Parquet files. You can load the Parquet files into a Snowpark dataframe as follows:
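A minimal sketch of reading the output back, assuming an active Snowpark session and that the output files carry a .parquet extension. The pattern option is used here so that non-Parquet files in the directory are skipped; both that choice and the helper name load_batch_output are assumptions.

```python
def load_batch_output(session, output_stage_location):
    """Read the Parquet files under an output stage directory into a dataframe."""
    # Restrict the read to files whose names end in ".parquet".
    reader = session.read.option("pattern", ".*[.]parquet")
    return reader.parquet(output_stage_location)
```

The location passed in would be the same stage directory that was given to OutputSpec(stage_location=...).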
Passing parameters¶
If the model’s signature includes parameters defined with ParamSpec, you can pass parameter values at inference time using the params argument in InputSpec. Any parameter not included in the dictionary uses its default value from the signature.
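The defaulting rule can be pictured as a dictionary merge. The sketch below is not the library's implementation; the parameter names temperature and max_tokens and the helper resolve_params are hypothetical.

```python
def resolve_params(signature_defaults, params=None):
    """Return the effective parameters: defaults overridden by supplied values."""
    effective = dict(signature_defaults)  # start from the signature defaults
    effective.update(params or {})        # supplied values take precedence
    return effective


defaults = {"temperature": 0.7, "max_tokens": 256}  # hypothetical signature defaults
effective = resolve_params(defaults, {"temperature": 0.2})
```

Here temperature is overridden, while max_tokens keeps its default because it was not included in the dictionary.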
Partitioned models¶
Note
This feature requires snowflake-ml-python version 1.33.0 or later.
You can run batch inference jobs with partitioned models by passing the partition_column argument in InputSpec. Each partition is processed independently, which is useful for models that train or predict per group.
For more information about partitioned models, see Using partitioned models.
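Conceptually, partitioned processing groups the input rows by the partition column and runs the model once per group. The following plain-Python sketch shows that idea only; it does not reflect how the job distributes work, and the column names and the averaging "model" are invented for illustration.

```python
from collections import defaultdict


def predict_per_partition(rows, partition_column, predict_fn):
    """Group rows by partition_column and apply predict_fn to each group."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[partition_column]].append(row)
    # Each group is handled on its own, with no data shared across groups.
    return {key: predict_fn(group) for key, group in groups.items()}


def mean_sales(group):
    """Toy stand-in for a per-group model: the average of the SALES column."""
    return sum(r["SALES"] for r in group) / len(group)


rows = [
    {"STORE_ID": "A", "SALES": 10.0},
    {"STORE_ID": "B", "SALES": 7.0},
    {"STORE_ID": "A", "SALES": 20.0},
]
per_store = predict_per_partition(rows, "STORE_ID", mean_sales)
```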
Job specification¶
To configure job-level settings for your batch inference workload (such as the number of workers, resource allocation, and execution parameters), pass a JobSpec instance as the job_spec argument of the run_batch method. An example is shown below:
Best practices¶
Using a sentinel file¶
A job can fail midway, which can leave partial data in the output directory. To mark completion of the job, run_batch writes a sentinel file named _SUCCESS in the output directory.
To avoid having partial or incorrect output:
- Read output data only after the sentinel file is found.
- Start with an empty output directory.
- Run run_batch with mode = SaveMode.ERROR.
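The first rule above can be sketched as a guard that runs over a listing of the output directory before any data is read. How you obtain the listing is up to you; the helper names and file names below are hypothetical.

```python
def is_output_complete(file_names, sentinel="_SUCCESS"):
    """Return True if the sentinel file appears in the directory listing."""
    return any(name.rsplit("/", 1)[-1] == sentinel for name in file_names)


def readable_outputs(file_names, sentinel="_SUCCESS"):
    """Return the data files, or raise if the job has not marked completion."""
    if not is_output_complete(file_names, sentinel):
        raise RuntimeError("Sentinel file not found; output may be partial.")
    return [n for n in file_names if n.rsplit("/", 1)[-1] != sentinel]


finished = ["out/part-0.parquet", "out/part-1.parquet", "out/_SUCCESS"]
partial = ["out/part-0.parquet"]
```

With these listings, readable_outputs(finished) returns the two data files, while readable_outputs(partial) raises instead of returning incomplete results.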
Run batch inference in a Snowflake task DAG¶
Note
This feature requires the snowflake.core package.
Use BatchInferenceTask to run a batch inference job inside a Snowflake task DAG. This is useful for scheduled or recurring batch inference, multi-step DAGs that include batch inference, and downstream tasks that read the inference output.
Construct BatchInferenceTask inside a with DAG(...) block (or pass dag= explicitly) and chain it with other tasks using >>. Each scheduled run submits a new SPCS batch inference job and writes results to a per-run output subdirectory.
When you use base_stage_location in OutputSpec, each run writes to its own subdirectory under the base path (default form BATCH_INFERENCE_<UUID>/), so repeated runs don’t clobber each other.
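To illustrate why per-run subdirectories prevent collisions, the sketch below derives a run directory from a base path. The exact UUID format that Snowflake uses is not stated here, so the standard hyphenated form is an assumption, as are the helper name and the stage path.

```python
import uuid


def per_run_output_dir(base_stage_location, run_id=None):
    """Return a run-specific subdirectory under the base output location."""
    run_id = run_id or str(uuid.uuid4())  # assumed format for the <UUID> part
    return f"{base_stage_location.rstrip('/')}/BATCH_INFERENCE_{run_id}/"


first = per_run_output_dir("@MY_DB.MY_SCHEMA.MY_STAGE/output")
second = per_run_output_dir("@MY_DB.MY_SCHEMA.MY_STAGE/output")
```

Because each call generates a new identifier, the two paths differ and neither run writes into the other's directory.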
A successor task can read the predecessor batch inference task’s output directory using SYSTEM$GET_PREDECESSOR_RETURN_VALUE():
For more information about Snowflake tasks, see Introduction to tasks.
Examples¶
Using a custom model¶
Using a Hugging Face model¶
Using a Hugging Face model with vLLM¶
Task: text generation¶
Task: image text to text¶
Sample notebooks¶
For end-to-end runnable examples, see the batch inference sample notebooks on GitHub.