Service Management & Scaling¶
Once a model is deployed to Snowpark Container Services (SPCS), you must manage its lifecycle, resource consumption, and reliability. This page covers standard operations, observability, and configuring high availability for production workloads.
Managing services¶
Snowpark Container Services offers a SQL interface for managing services. You can use the DESCRIBE SERVICE and ALTER SERVICE commands with SPCS services created by Snowflake Model Serving just as you would with any other SPCS service. For example, you can do the following (see the sketch after this list):
Change MIN_INSTANCES and other properties of a service
Drop (delete) a service
Share a service to another account
Change ownership of a service (the new owner must have READ access to the model)
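For example, a minimal sketch (the service name my_model_service and the new instance count are hypothetical) that inspects a model service and raises its minimum instance count:

-- Inspect the service created by Snowflake Model Serving
DESCRIBE SERVICE mydb.myschema.my_model_service;

-- Keep at least two instances running
ALTER SERVICE mydb.myschema.my_model_service SET MIN_INSTANCES = 2;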
Note
If the owner of a service loses access to the underlying model for any reason, the service continues running until it is restarted, at which point it stops working.
To ensure reproducibility and debuggability, you cannot change the specification of an existing inference service. You can, however, copy the specification, customize it, and use the customized specification to create your own service to host the model. Note that this approach does not protect the underlying model from being deleted, nor does it track lineage. In general, it is best to let Snowflake Model Serving create services.
Scaling services¶
Note
Starting with snowflake-ml-python 1.25.0, you can define the scaling boundaries for your inference service by setting min_instances and max_instances within the create_service method.
How Autoscaling Works¶
The service starts with the number of instances specified in min_instances and dynamically scales within your defined range based on real-time traffic volume and hardware utilization.
Scale-to-Zero (Auto-Suspend): If min_instances is set to 0 (the default), the service will automatically suspend if no traffic is detected for 30 minutes.
Scaling Latency: Scaling triggers typically activate after one minute of meeting the required condition. Note that total spin-up time includes this trigger period plus the time required to provision and initialize new service instances.
Configuration Best Practices¶
| Parameter | Recommended Strategy |
|---|---|
| min_instances | Set to 1 or more for production workloads to ensure immediate availability and avoid cold-start delays. |
| max_instances | Set to accommodate peak demand while maintaining a ceiling on resource consumption and cost. |
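At deployment time, these bounds are set through the min_instances and max_instances arguments of create_service, as described in the note above. For a service that already exists, a minimal sketch using ALTER SERVICE (the service name and values are hypothetical):

-- Keep one warm instance; allow scaling to four under load
ALTER SERVICE mydb.myschema.my_model_service SET MIN_INSTANCES = 1;
ALTER SERVICE mydb.myschema.my_model_service SET MAX_INSTANCES = 4;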
Suspending services¶
The default min_instances=0 setting allows the service to auto-suspend after 30 minutes of inactivity. Incoming requests will trigger a resume, with the total delay determined by compute pool availability and the model’s loading time (startup delay).
To manually suspend or resume a service, use the ALTER SERVICE command.
ALTER SERVICE my_service [ SUSPEND | RESUME ];
Deleting models¶
You can manage models and model versions as usual with either the SQL interface or the Python API, with one restriction: a model or model version that is in use by a service (whether running or suspended) cannot be dropped (deleted). To drop a model or model version, drop the service first.
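For example, a minimal sketch (object names are hypothetical):

-- Drop the inference service first, then the model
DROP SERVICE mydb.myschema.my_model_service;
DROP MODEL mydb.myschema.my_model;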
Monitoring services¶
When running models in Snowpark Container Services, you can monitor service health and troubleshoot issues by accessing container logs and metrics. Model serving services generate logs that can help you understand service behavior, diagnose errors, and optimize performance.
For comprehensive information about monitoring SPCS services, including accessing metrics and logs, see Monitoring services.
In Snowsight¶
You can monitor model serving services in Snowsight:
In the navigation menu, select Monitoring » Services & jobs.
On the Services tab, select your service to view the service details page.
The Overview tab displays service information including the compute pool, endpoints, and instance count.
The Logs, Metrics, and Events tabs provide logs, performance metrics, and service events (such as instance provisioning and shutdowns). Filter results by instance and container name as needed.
Accessing service logs¶
You can access logs for your model serving services using any of the following methods:
Using the service helper function¶
Model serving services include a built-in helper function that retrieves logs from the event table for running or suspended services:
-- Retrieve logs using the service helper function
SELECT * FROM TABLE(mydb.myschema.my_model_service!SPCS_GET_LOGS())
WHERE
timestamp > dateadd(hour, -1, current_timestamp())
AND instance_id = 0 -- filter to one instance; omit to include all
AND container_name = 'model-inference';
Querying the event table directly¶
If you have an event table configured for your account, you can query it directly to retrieve service logs:
-- Find the event table for your account
SHOW PARAMETERS LIKE 'event_table' IN ACCOUNT;
-- Query the event table for model service logs
SELECT TIMESTAMP, RESOURCE_ATTRIBUTES, RECORD_ATTRIBUTES, VALUE
FROM <current_event_table_for_your_account>
WHERE timestamp > dateadd(hour, -1, current_timestamp())
AND RESOURCE_ATTRIBUTES:"snow.service.name" = '<model_service_name>'
AND RECORD_TYPE = 'LOG'
AND RESOURCE_ATTRIBUTES:"snow.service.container.instance" = '0' -- filter to one instance; omit to include all
AND RESOURCE_ATTRIBUTES:"snow.service.container.name" = 'model-inference'
ORDER BY timestamp DESC
LIMIT 10;
Using the system function (Running instances only)¶
For real-time debugging of active containers, you can use the SYSTEM$GET_SERVICE_LOGS function:
-- Retrieve logs from a specific service instance
SELECT SYSTEM$GET_SERVICE_LOGS('model_service_name', '0', 'model-inference', 10);
Note
The container name for model inference services is model-inference. For troubleshooting image build issues, use model-build as the container name.
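For example, to retrieve recent logs from the build container instead (the service name is hypothetical):

-- Retrieve logs from the image build container
SELECT SYSTEM$GET_SERVICE_LOGS('model_service_name', '0', 'model-build', 10);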
Accessing service metrics¶
Model serving services emit performance and health metrics that can help you monitor resource utilization, request rates, latency, and other operational characteristics. These metrics are captured in the event table and can be queried to analyze service performance over time.
For more information about SPCS service metrics, see Accessing event table service metrics.
Using the service helper function¶
Model serving services include a built-in helper function that retrieves metrics from the event table for running or suspended services:
-- Retrieve metrics using the service helper function
SELECT *
FROM TABLE(mydb.myschema.my_model_service!SPCS_GET_METRICS())
WHERE
timestamp > dateadd(hour, -1, current_timestamp())
AND instance_id = 0 -- filter to one instance; omit to include all
AND container_name = 'model-inference';
Querying the event table directly¶
You can query the event table directly to retrieve and filter specific metrics:
-- Find the event table for your account
SHOW PARAMETERS LIKE 'event_table' IN ACCOUNT;
-- Query the event table for model service metrics
SELECT
timestamp,
RESOURCE_ATTRIBUTES:"snow.service.container.instance" as instance,
RESOURCE_ATTRIBUTES:"snow.service.container.name" as container,
RECORD:metric:"name" as metric,
value
FROM my_event_table_db.my_event_table_schema.my_event_table
WHERE timestamp > DATEADD(hour, -1, CURRENT_TIMESTAMP())
AND RESOURCE_ATTRIBUTES:"snow.service.name" = '<model_service_name>'
AND RECORD_TYPE = 'METRIC'
AND RESOURCE_ATTRIBUTES:"snow.service.container.instance" = '0' -- filter to one instance; omit to include all
AND RESOURCE_ATTRIBUTES:"snow.service.container.name" = 'model-inference'
ORDER BY timestamp DESC
LIMIT 100;
Fault tolerance¶
In any distributed system, failures happen. For mission-critical workloads, it is the user's responsibility to configure the service to be resilient against node and zonal failures.
Node Failure Resilience¶
To tolerate standard node failures, Snowflake recommends over-provisioning by 50% or maintaining a minimum of 3 instances (whichever is higher).
Example: If you need 4 instances to support peak traffic, provision 6 instances.
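A minimal sketch applying this guidance to an existing service (the service name and counts are hypothetical):

-- Peak demand is 4 instances; over-provision by 50% for node-failure tolerance
ALTER SERVICE mydb.myschema.my_model_service SET MIN_INSTANCES = 6;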
Zonal Failure Resilience¶
By default, compute pools are provisioned in a single Availability Zone (AZ) to maximize network performance and minimize cost. For mission-critical workloads that require resilience against a full zonal failure, you can pass a compute pool built for multi-zone placement to the create_service() API.
To create a multi-zone compute pool, set the PLACEMENT_GROUP parameter to DISTRIBUTED.
CREATE COMPUTE POOL my_high_availability_pool
...
PLACEMENT_GROUP = 'DISTRIBUTED'; -- Enables Multi-AZ placement
Multi-Zone Compute Pools¶
A multi-zone compute pool prioritizes service uptime over strict symmetry. The system attempts to maintain a healthy footprint across multiple zones, but it does not block provisioning or scheduling when a perfectly even spread cannot be achieved.
Node Distribution: Uses a Round Robin strategy to spread nodes across all available zones in a region. If there are insufficient capacity errors in one zone, nodes are provisioned in other zones with capacity, which may cause uneven distribution.
Pod Distribution: Leverages Kubernetes topology spread constraints with a ScheduleAnyway policy. The scheduler tries to spread pods, but will still schedule them even if an even distribution cannot be met (e.g., due to capacity issues).
Zonal Outage Behavior: If a zone fails, the system does not yet automatically fail over nodes to healthy zones. You should over-provision capacity (N+1) to ensure that the remaining zones can handle the traffic load during an outage. In the event of an availability zone outage:
Scheduling: The scheduler immediately stops placing new pods in the impacted zone.
Traffic: Ingress automatically routes traffic to pods in the healthy zones.
Autoscaling: If the control plane remains functional, autoscaling may trigger to place new pods in the healthy zones.
Note
In smaller regions, certain instance types may not be available in multiple AZs, reducing the effectiveness of the compute pool against zonal failure.
Recovery & Rebalancing: Once a zone recovers, the system does not automatically move pods back to it. Distribution rebalances naturally over time as node failures and service maintenance occur.
Configuration Guide¶
Convert an Existing Pool¶
Warning
You cannot change this setting on an active pool. You must suspend it first.
ALTER COMPUTE POOL my_pool SUSPEND;
ALTER COMPUTE POOL my_pool
SET PLACEMENT_GROUP = 'DISTRIBUTED';
ALTER COMPUTE POOL my_pool RESUME;
Revert an Existing Pool¶
Warning
You cannot change this setting on an active pool. You must suspend it first.
ALTER COMPUTE POOL my_pool SUSPEND;
ALTER COMPUTE POOL my_pool
UNSET PLACEMENT_GROUP;
ALTER COMPUTE POOL my_pool RESUME;
Verification¶
To confirm that your pool is correctly configured for HA, check that the placement_group column shows DISTRIBUTED:
DESCRIBE COMPUTE POOL my_service_pool;