Use Gateways to route inference requests to multiple endpoints
Gateways route inference requests to one or more Snowpark Container Services (SPCS) endpoints. Gateways provide the following capabilities:
- Traffic split among services
Multiple services can share the same hostname. Requests are routed according to the percentage assigned to each service. This is useful for blue-green deployments and A/B testing.
- Stable URL
Each gateway has a hostname allocated at creation. The hostname does not change for the lifetime of the gateway object. The gateway object can be altered to route to different endpoints or have different percentage configurations. Changes take effect within a minute.
Gateway routing respects the relative percentage of the specified healthy endpoints. For more information about a gateway’s failover behavior, see Gateway failover behavior.
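To make the percentage-based split concrete, the following is a minimal Python sketch of weighted routing. It is an illustration only, not Snowflake's implementation: the endpoint names `blue` and `green`, the 80/20 split, and the `pick_endpoint` helper are all hypothetical.

```python
import random
from collections import Counter


def pick_endpoint(split, rng):
    """Pick one endpoint name at random, weighted by its configured percentage.

    `split` maps a (hypothetical) endpoint name to its traffic percentage.
    Endpoints configured with 0% are never selected.
    """
    names = [name for name, pct in split.items() if pct > 0]
    weights = [split[name] for name in names]
    return rng.choices(names, weights=weights, k=1)[0]


# Hypothetical blue-green deployment: 80% of requests go to "blue", 20% to "green".
split = {"blue": 80, "green": 20}

# Simulate 10,000 requests with a seeded generator so the run is repeatable.
rng = random.Random(0)
counts = Counter(pick_endpoint(split, rng) for _ in range(10_000))
share_blue = counts["blue"] / 10_000  # close to 0.80 over many requests
```

Over a large number of requests, each endpoint's observed share approaches its configured percentage. Individual requests are not guaranteed to alternate in any fixed pattern.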
After you’ve reviewed the following sections, you can create and alter a gateway. For information about creating a gateway, see CREATE GATEWAY. For information about altering a gateway, see ALTER GATEWAY.
Access control requirements
The owner role of the gateway must have the following privileges:
| Privilege | Object | Notes |
|---|---|---|
| CREATE GATEWAY | Schema | Required to create a gateway. |
| BIND SERVICE ENDPOINT | Account | Required to bind service endpoints to the gateway. |
| USAGE | Database | Required to access the database containing the gateway. |
| USAGE | Schema | Required to access the schema containing the gateway. |
| USAGE | Target endpoints | Required to route traffic to the target endpoints. |
| MODIFY or OWNERSHIP | Gateway | Required to alter the gateway configuration. |
| USAGE, MODIFY, or OWNERSHIP | Gateway | Required to view the gateway specification. |
Note
When listing gateways, Snowflake shows only the gateways on which the role has the USAGE, MODIFY, or OWNERSHIP privilege. The role must also have the USAGE privilege on the database and schema containing the gateway.
For gateway CREATE, ALTER, and DROP operations, see CREATE GATEWAY, ALTER GATEWAY, and DROP GATEWAY.
Configurations
By default, a gateway can route to a maximum of 5 endpoints. To split traffic across more endpoints, contact support.
Gateway failover behavior
Gateway failover is the process where a gateway automatically redirects traffic from one endpoint (Endpoint A) to other endpoints when Endpoint A becomes unavailable or non-operational.
Note
Snowflake does not fail over onto an endpoint with 0% traffic split. The endpoint must have at least 1% traffic split.
The relative percentage of the available endpoints is respected.
Failover from one endpoint (Endpoint A) to other endpoints with at least 1% traffic split happens if any of the following conditions is true:
- The service of Endpoint A is suspended and auto_resume is set to false.
- The compute pool of Endpoint A is suspended.
- The service of Endpoint A fails the readiness probe. Readiness status is refreshed at most 40 seconds apart (the cache refresh rate). When the status is updated, traffic is adjusted immediately, with no ramp-up period.
- The service of Endpoint A is dropped.
- The gateway owner role loses its USAGE or OWNERSHIP privilege on Endpoint A.
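The redistribution described above can be sketched in a few lines of Python. This is a hedged illustration of the arithmetic, not Snowflake's implementation: the `effective_split` function, the endpoint names, and the percentages are hypothetical, and the handling of the case where no endpoint remains is an assumption because this section does not describe it.

```python
def effective_split(configured, unavailable):
    """Return the share of traffic (in percent) each remaining endpoint receives.

    `configured` maps a (hypothetical) endpoint name to its configured percentage.
    `unavailable` is the set of endpoints that meet one of the failover conditions.
    Endpoints configured with 0% never receive failover traffic.
    """
    eligible = {
        name: pct
        for name, pct in configured.items()
        if pct > 0 and name not in unavailable
    }
    total = sum(eligible.values())
    if total == 0:
        # Assumption: return an empty split when nothing is eligible.
        return {}
    # Rescale so the remaining endpoints keep their relative proportions.
    return {name: pct * 100 / total for name, pct in eligible.items()}


# Hypothetical gateway: A=50%, B=30%, C=20%, and D is configured with 0%.
configured = {"A": 50, "B": 30, "C": 20, "D": 0}

# Endpoint A becomes unavailable; B and C keep their 30:20 ratio.
after_failover = effective_split(configured, unavailable={"A"})
# {'B': 60.0, 'C': 40.0}
```

In this example, Endpoint D receives no traffic after the failover because it is configured with 0%, while B and C absorb A's share in proportion to their configured percentages.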