Snowflake Native Batch Inference (SQL)¶
Use the Snowflake Model Registry to execute batch inference calls to your models. You can integrate these batch inference calls into your Snowflake workflows. With Snowflake Native Batch Inference, you can do the following:
Integrate into your Snowflake SQL, Snowpark Python, Streaming, and Dynamic Tables workflows.
Use third-party tools, such as dbt, with the results of your inference calls.
Where to run a model?¶
Selecting a Runtime Environment¶
You can host a model either on a virtual warehouse or on a Snowpark Container Services (SPCS) compute pool. Use the following information to determine where to host your model.
| Runtime | Best For… | Avoid When… |
|---|---|---|
| Virtual Warehouse | • In-database batch inference: execute models as native SQL functions. • Zero-ops experience: use existing warehouses without managing compute pools. • Small models: CPU-runnable models (e.g., scikit-learn, XGBoost). | • Hardware constraints: the model requires a GPU for execution. • Memory limits: the model exceeds 15 GB (this limit is lower for smaller warehouse sizes). |
| Snowpark Container Services (SPCS) | • Large models: LLMs, deep learning models requiring high memory, or models requiring GPUs. • Custom environments: specific pip packages or a custom OS-level environment not available in standard warehouses. | • Organizational policy: SPCS is not yet approved or enabled in your account. • Sufficient warehouse compute: if your warehouse can process your batch inference requests, you don’t need the additional compute from SPCS. |
To run a model on a warehouse¶
To host your model on a warehouse, specify WAREHOUSE in the target_platforms argument when you log your model. For more information, see working with dependencies and target platforms.
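For example, logging a small scikit-learn model for warehouse execution might look like the following sketch. The model object, names, and dependency list are placeholders; target_platforms=["WAREHOUSE"] is the relevant argument.
from snowflake.ml.registry import Registry

registry = Registry(session=session, database_name="ML", schema_name="REGISTRY")

# my_sklearn_model and input_df are placeholders: a fitted estimator and a small
# sample of its input features (used to infer the model signature).
mv = registry.log_model(
    my_sklearn_model,
    model_name="my_model",
    version_name="v1",
    sample_input_data=input_df,
    target_platforms=["WAREHOUSE"],   # make the model runnable in a warehouse
    conda_dependencies=["scikit-learn"],
)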
To check whether an existing model version can run in a warehouse, run SHOW VERSIONS IN MODEL. If the runnable_in column includes WAREHOUSE, the model version can run in a warehouse.
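To do the same check from Python, a minimal sketch (assuming an existing Snowpark session and a model named my_model) runs the SHOW command and inspects the runnable_in column:
# Run SHOW VERSIONS IN MODEL from Python and inspect the runnable_in column.
rows = session.sql("SHOW VERSIONS IN MODEL my_model").collect()
for row in rows:
    print(row["name"], row["runnable_in"])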
To run a model on SPCS¶
To use your model in SPCS, you must deploy the model as a service. For more information about deploying a model to SPCS, see deploying the model for online inference. Make sure auto-suspend is enabled for the service so that it doesn’t consume compute while idle.
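As a rough sketch, deploying a logged model version to an SPCS compute pool with snowflake-ml-python looks like the following. The service, compute pool, and image repository names are placeholders, and the exact parameter set can vary by package version; see the online-inference deployment documentation for details.
# mv: snowflake.ml.model.ModelVersion
# Service, compute pool, and image repository names below are placeholders.
mv.create_service(
    service_name="example_spcs_service",
    service_compute_pool="my_compute_pool",
    image_repo="my_db.my_schema.my_image_repo",
    ingress_enabled=False,
    max_instances=1,
)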
Note
If your service is suspended, it automatically resumes when an inference request arrives. However, if the service fails to resume within a specified timeframe, the query might fail. The service might fail to resume if no nodes are available in the compute pool. To mitigate this risk, you can explicitly resume the service and wait for it to become available before sending requests, as in the sketch below.
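One way to do this from Python is to issue an explicit resume and poll the service status until it reports ready. This is a minimal sketch under assumptions: the fully qualified service name is a placeholder, and polling SYSTEM$GET_SERVICE_STATUS is just one way to check readiness.
import json
import time

# Explicitly resume the service (placeholder name) before sending inference requests.
session.sql("ALTER SERVICE my_db.my_schema.example_spcs_service RESUME").collect()

# Poll until every container reports READY (waits up to roughly five minutes).
for _ in range(30):
    status_json = session.sql(
        "SELECT SYSTEM$GET_SERVICE_STATUS('my_db.my_schema.example_spcs_service')"
    ).collect()[0][0]
    if all(c.get("status") == "READY" for c in json.loads(status_json)):
        break
    time.sleep(10)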
Use an XSMALL or SMALL warehouse to route your inference requests to the SPCS compute pool. A warehouse runs multiple threads per node and can send a large number of concurrent inference requests, so a larger warehouse can easily overwhelm the service running in SPCS. For this reason, use an XSMALL or SMALL warehouse when the model is deployed to SPCS.
Inference from Python¶
If you’re using the Snowflake Python API to make inference requests, you must have the snowflake-ml-python package installed (for example, via pip install snowflake-ml-python).
Connect to Model Registry¶
Retrieve the model that you’re using for inference requests from the model registry. Use the following code:
from snowflake.ml.registry import Registry

# session is an existing Snowpark Session; DATABASE and REGISTRY_SCHEMA are the
# database and schema that contain your model registry.
registry = Registry(session=session, database_name=DATABASE, schema_name=REGISTRY_SCHEMA)
mv = registry.get_model('my_model').version('my_version')  # returns a ModelVersion
Run a batch inference job¶
Use the run method of your model version object to run a batch inference job. Using the run method, you can:
Run an inference job on either a warehouse or an SPCS compute pool.
Provide a Snowpark or pandas dataframe with the inference data.
The run method returns a dataframe that matches the type of the dataframe that you’ve specified. For example, if you specify a pandas dataframe as the input, you get a pandas dataframe as the output.
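For instance, the following brief sketch passes a small pandas DataFrame and gets a pandas DataFrame back. The column names are placeholders; use the feature names from your model’s signature.
import pandas as pd

# Placeholder feature columns; use the feature names from your model's signature.
input_features_pd = pd.DataFrame({"FEATURE_1": [1.0, 2.0], "FEATURE_2": [0.5, 0.25]})

local_prediction = mv.run(input_features_pd, function_name="predict")
print(type(local_prediction))  # <class 'pandas.core.frame.DataFrame'>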
Note
Snowpark DataFrames are evaluated lazily. Execution happens only when you call a method such as collect, show, or to_pandas on the DataFrame.
The following example runs a batch inference job on a warehouse:
# Call inference for predict function
# mv: snowflake.ml.model.ModelVersion
remote_prediction = mv.run(input_features, function_name="predict")
remote_prediction.show() # assuming input_features is a Snowpark DataFrame
To see the methods that you can call on a model, call mv.show_functions(). This method returns a list of ModelFunctionInfo objects, each of which includes the following attributes:
name: The name of the function that can be called from Python or SQL.
target_method: The name of the Python method in the original logged model.
# Get signature of the inference function in Python
# mv: snowflake.ml.model.ModelVersion
mv.show_functions()
Within the run method, specify your own input features and set the function_name to “predict”.
The following example runs a batch inference job on SPCS.
# Call inference for predict function
# mv: snowflake.ml.model.ModelVersion
remote_prediction = mv.run(test_features, function_name="predict", service_name="example_spcs_service")
remote_prediction.show() # assuming test_features is Snowpark DataFrame
Within the run method, specify your own input features, set the function_name to “predict”, and set service_name to the name of your SPCS service.
remote_prediction.show() shows the output dataframe.
The method is executed in a Snowflake warehouse specified in the session you’re using to connect to the registry. See specifying a warehouse.
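If you want to fold the results into a larger workflow (for example, for downstream dbt models), you can persist the returned Snowpark DataFrame as a table. This is a brief sketch with a placeholder table name:
# Persist the prediction results for downstream use (placeholder table name).
remote_prediction.write.mode("overwrite").save_as_table("MY_DB.MY_SCHEMA.PREDICTIONS")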
Inference from SQL¶
Use the following command to understand the functions available and the signature for a model version:
SHOW FUNCTIONS IN MODEL mymodel VERSION myversion;
To run a model on a warehouse¶
Use the MODEL(model_name)!method_name(…) syntax to invoke methods of a model. The methods available on a model are determined by the underlying Python model class. For example, many types of models use a method named predict for inference.
To call a method of the default model, use the following syntax. Include any method arguments within the parentheses and specify the table containing the inference data in the FROM clause.
SELECT MODEL(<model_name>)!<method_name>(...) FROM <table_name>;
To invoke a method of a specific version of a model, specify the version name, or an alias that refers to that version, as the second argument to MODEL.
Use the following syntax to call a method of a specific model version.
SELECT MODEL(<model_name>,<version_or_alias_name>)!<method_name>(...) FROM <table_name>;
The following example uses the LAST alias to call the latest version of a model.
SELECT MODEL(my_model,LAST)!predict(...) FROM my_table;
To run a model on SPCS¶
Unlike running in a warehouse, you call a model deployed to SPCS through its service, using the <service_name>!<method_name>(…) syntax:
SELECT <service_name>!<method_name>(...) FROM <table_name>;
Continuous Model Inference with Dynamic Tables¶
Snowflake dynamic tables provide a continuous transformation layer over streaming data. By defining a dynamic table that applies a machine learning model’s predictions to incoming data, you can maintain an automated, continuously running inference pipeline without manual orchestration.
For example, consider a stream of login events populating a table (LOGINS_RAW) with columns such as USER_ID, LOCATION, and an event timestamp. A dynamic table built on LOGINS_RAW adds the model’s login-risk predictions for newly arrived events. Crucially, only new rows are processed by the model.
SQL¶
Dynamic tables let you perform incremental inference on incoming data. Use SQL to define a dynamic table that references the model and applies it to new rows in LOGINS_RAW:
CREATE OR REPLACE DYNAMIC TABLE logins_with_predictions
WAREHOUSE = my_wh
TARGET_LAG = '20 minutes'
REFRESH_MODE = INCREMENTAL
INITIALIZE = on_create
COMMENT = 'Dynamic table with continuously updated model predictions'
AS
SELECT
login_id,
user_id,
location,
event_time,
MODEL(ml.registry.mymodel)!predict(l.user_id, l.location) AS prediction_result
FROM logins_raw l;
Snowpark Python¶
The Snowpark Python API allows you to access the model registry programmatically and run inference directly on DataFrames. This approach can be more flexible and maintainable, especially in code-driven environments.
from snowflake.snowpark import Session
from snowflake.snowpark.functions import col
from snowflake.ml.registry import Registry
# Initialize the registry
reg = Registry(session=sp_session, database_name="ML", schema_name="REGISTRY")
# Retrieve the default version of the model from the registry
model = reg.get_model("MYMODEL").default
# Load the source data
df_raw = sp_session.table("LOGINS_RAW")
# Run inference on the necessary features
predictions_df = model.run(df_raw.select("USER_ID", "LOCATION"))
# Join predictions back to the source data
joined_df = df_raw.join(predictions_df, on=["USER_ID", "LOCATION"])
# Create or replace a dynamic table from the joined DataFrame
joined_df.create_or_replace_dynamic_table(
name="LOGINS_WITH_PREDICTIONS",
warehouse="MY_WH",
lag='20 minutes',
refresh_mode='INCREMENTAL',
initialize="ON_CREATE",
comment="Dynamic table continuously updated with model predictions"
)
The code sample above automatically runs inference with MYMODEL on new data in LOGINS_RAW, keeping LOGINS_WITH_PREDICTIONS no more than 20 minutes behind the source table.
Immutable vs volatile¶
Incremental refresh requires that all functions invoked in the dynamic table definition be designated as IMMUTABLE. Functions of standard models are typically IMMUTABLE, but custom model functions default to VOLATILE. If the underlying model is known to be deterministic (the same input always produces the same output), make sure the corresponding model function is explicitly marked as immutable when the model is logged.