Use Gateways to route inference requests to multiple endpoints

Gateways route inference requests to one or more SPCS endpoints. Gateways provide the following capabilities:

Traffic split among services

Multiple services can share the same hostname. Requests are routed according to the percentage assigned to each service. This is useful for blue-green deployments and A/B testing.

Stable URL

Each gateway has a hostname allocated at creation. The hostname does not change for the lifetime of the gateway object. The gateway object can be altered to route to different endpoints or have different percentage configurations. Changes take effect within a minute.

Gateway routing respects the relative percentage of the specified healthy endpoints. For more information about a gateway’s failover behavior, see Gateway failover behavior.
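To illustrate what respecting the relative percentage of healthy endpoints could look like, here is a minimal Python sketch. It is not Snowflake code: the function name effective_split, the endpoint names, and the exact rescaling arithmetic are assumptions made for illustration only.

```python
from typing import Dict, Set


def effective_split(configured: Dict[str, float], healthy: Set[str]) -> Dict[str, float]:
    """Rescale configured percentages over the healthy endpoints only.

    Hypothetical model of the documented behavior, not the real implementation.
    """
    # Keep only healthy endpoints that were assigned a non-zero share.
    live = {name: pct for name, pct in configured.items()
            if name in healthy and pct > 0}
    total = sum(live.values())
    if total == 0:
        return {}  # no endpoint is eligible to receive traffic
    # Preserve the relative proportions among the remaining endpoints.
    return {name: 100.0 * pct / total for name, pct in live.items()}


# Endpoint names are made up for this example.
split = {"svc_a": 50, "svc_b": 30, "svc_c": 20}
print(effective_split(split, {"svc_a", "svc_b", "svc_c"}))
print(effective_split(split, {"svc_b", "svc_c"}))
```

In this model, when svc_a is unhealthy, svc_b and svc_c keep their 30:20 ratio, which works out to 60% and 40% of the traffic.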

After you’ve reviewed the following sections, you can create and alter a gateway. For information about creating a gateway, see CREATE GATEWAY. For information about altering a gateway, see ALTER GATEWAY.

Access control requirements

The owner role of the gateway must have the following privileges:

Privilege                     Object            Notes
----------------------------  ----------------  -------------------------------------------------------
CREATE GATEWAY                Schema            Required to create a gateway.
BIND SERVICE ENDPOINT         Account           Required to bind service endpoints to the gateway.
USAGE                         Database          Required to access the database containing the gateway.
USAGE                         Schema            Required to access the schema containing the gateway.
USAGE                         Target endpoints  Required to route traffic to the target endpoints.
MODIFY or OWNERSHIP           Gateway           Required to alter the gateway configuration.
USAGE, MODIFY, or OWNERSHIP   Gateway           Required to view the gateway specification.

Note

When listing gateways, Snowflake shows only the gateways on which the role has the USAGE, MODIFY, or OWNERSHIP privilege. The role must also have the USAGE privilege on the database and schema containing the gateway.

For gateway CREATE, ALTER, and DROP operations, see CREATE GATEWAY, ALTER GATEWAY, and DROP GATEWAY.

Configurations

By default, a gateway can route to a maximum of 5 endpoints. To split traffic across more endpoints, contact support.

Gateway failover behavior

Gateway failover is the process by which a gateway automatically redirects traffic from one endpoint (Endpoint A) to other endpoints when Endpoint A becomes unavailable or non-operational.

Note

Snowflake does not fail over onto an endpoint with 0% traffic split. The endpoint must have at least 1% traffic split.

During failover, the relative percentages of the remaining available endpoints are respected.

Failover from one endpoint (Endpoint A) to other endpoints with at least 1% traffic split happens if any of the following conditions is true:

  • The service of Endpoint A is suspended and auto_resume is set to false.

  • The compute pool of Endpoint A is suspended.

  • The service of Endpoint A fails the readiness probe. The readiness status is refreshed at least once every 40 seconds (the cache refresh rate). When the status is refreshed, traffic is adjusted immediately, with no ramp-up period.

  • The service of Endpoint A is dropped.

  • The gateway owner role loses its privilege (USAGE or OWNERSHIP) on Endpoint A.
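
The failover conditions listed above can be summarized as a single predicate. The following Python sketch is only an illustration: the EndpointState class, its field names, and the triggers_failover function are hypothetical and do not correspond to any Snowflake API.

```python
from dataclasses import dataclass


@dataclass
class EndpointState:
    """Hypothetical snapshot of an endpoint as the gateway might see it."""
    service_exists: bool = True            # False once the service is dropped
    service_suspended: bool = False
    auto_resume: bool = True
    compute_pool_suspended: bool = False
    readiness_probe_passing: bool = True
    owner_has_privilege: bool = True       # USAGE or OWNERSHIP on the endpoint


def triggers_failover(state: EndpointState) -> bool:
    """Return True if any one of the documented conditions holds."""
    return (
        (state.service_suspended and not state.auto_resume)
        or state.compute_pool_suspended
        or not state.readiness_probe_passing
        or not state.service_exists
        or not state.owner_has_privilege
    )


# A suspended service with auto_resume enabled does not match the first condition.
print(triggers_failover(EndpointState(service_suspended=True, auto_resume=True)))
print(triggers_failover(EndpointState(service_suspended=True, auto_resume=False)))
```

Because the conditions are joined with "or", one failing check is enough to move traffic away from the endpoint. In this sketch, a suspended service with auto_resume set to true does not by itself trigger failover, which mirrors the wording of the first condition.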