Cross-region inference

Inference is the process of using a machine learning model to get an output based on a user input. For example, when you call the SNOWFLAKE.CORTEX.COMPLETE function, you are requesting an inference from the LLM with your prompt as the input. In Snowflake, you can configure your account to allow cross-region inference processing with the CORTEX_ENABLED_CROSS_REGION parameter. This parameter enables inference requests to be processed in a different region from the default region, and it determines the inference behavior for any Snowflake feature that supports cross-region inference, including Cortex LLM Functions.
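
The following is a minimal sketch of such a call, assuming the mistral-large model is available to your account (the model name is illustrative; availability varies by region, and any supported model works the same way):

SELECT SNOWFLAKE.CORTEX.COMPLETE(
    'mistral-large',
    'Summarize the benefits of cross-region inference in one sentence.'
) AS response;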

When enabled, cross-region inference occurs if the LLM or feature is not supported in your default region.

By default, the parameter is set to DISABLED, which allows requests to be processed only in the default region. You can specify the regions in which to allow cross-region inference by using the ALTER ACCOUNT command.

For details on this parameter, see CORTEX_ENABLED_CROSS_REGION.

Access control requirements

This parameter can only be set at the account level, not at the user or session levels. Only the ACCOUNTADMIN role can set the parameter using the ALTER ACCOUNT command:

ALTER ACCOUNT SET CORTEX_ENABLED_CROSS_REGION = 'AWS_US';

This parameter cannot be set by the ORGADMIN role.
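
One way to confirm the current value is to inspect the account-level parameter, for example:

SHOW PARAMETERS LIKE 'CORTEX_ENABLED_CROSS_REGION' IN ACCOUNT;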

How to use the cross-region inference parameter

By default, this parameter is set to DISABLED, which means inference requests are processed only in the default region. The following examples show how to set the cross-region parameter for various use cases.

Any region

To allow any Snowflake region that supports cross-region inference to process your requests, set the parameter to 'ANY_REGION'.

ALTER ACCOUNT SET CORTEX_ENABLED_CROSS_REGION = 'ANY_REGION';

Default region only

To process inference requests only in the default region, set this parameter to 'DISABLED'.

ALTER ACCOUNT SET CORTEX_ENABLED_CROSS_REGION = 'DISABLED';

Specify regions

To allow only specific regions to process your requests, set this parameter to a comma-separated list of regions. For a full list of regions, see CORTEX_ENABLED_CROSS_REGION.

The following example allows the AWS_US and AWS_EU regions to process your inference requests:

ALTER ACCOUNT SET CORTEX_ENABLED_CROSS_REGION = 'AWS_US,AWS_EU';

Cost considerations

  • You are charged credits for the use of the LLM as listed in the consumption table. Credits are considered consumed in the requesting region. For example, if you call an LLM Function from the us-east-2 region and the request is processed in the us-west-2 region, the credits are considered consumed in the us-east-2 region. A sample query for reviewing this consumption is sketched after this list.

  • You do not incur data egress charges for using cross-region inference.
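
The following is a rough sketch of how you might review that consumption. It assumes the SNOWFLAKE.ACCOUNT_USAGE.METERING_DAILY_HISTORY view and that Cortex LLM usage is reported under the AI_SERVICES service type; adjust it to the billing views your account actually uses:

SELECT usage_date,
       SUM(credits_used) AS credits_used
  FROM SNOWFLAKE.ACCOUNT_USAGE.METERING_DAILY_HISTORY
 WHERE service_type = 'AI_SERVICES'  -- assumed service type for Cortex LLM usage
 GROUP BY usage_date
 ORDER BY usage_date DESC;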

Considerations

  • Latency between regions depends on the cloud provider infrastructure and network status. Snowflake recommends that you test your specific use case with cross-region inference enabled; a minimal latency check is sketched after this list.

  • Cross-region inference is not supported in U.S. SnowGov regions. This means you cannot make cross-region inference requests into or out of the SnowGov regions.

  • You can use this setting from GCP or Azure regions to make inference requests for features that are not supported in those regions.

  • User inputs, service generated prompts, and outputs are not stored or cached during cross-region inference.

  • The data required for the inference request travels between regions as follows:

    • If both the source and destination regions are in AWS, the data stays within the AWS global network. All data flowing across the AWS global network that interconnects the data centers and regions is automatically encrypted at the physical layer.

    • If the regions are on different cloud providers, then the data traverses the public internet using Mutual Transport Layer Security (mTLS).
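
A minimal latency check, sketched under the assumption that a database is in use so the INFORMATION_SCHEMA.QUERY_HISTORY table function can be called, is to run a representative request and then read back its elapsed time:

-- Run a representative Cortex call with cross-region inference enabled.
SELECT SNOWFLAKE.CORTEX.COMPLETE('mistral-large', 'ping') AS response;

-- Read back the elapsed time (in milliseconds) of the most recent such call.
SELECT query_text,
       total_elapsed_time AS elapsed_ms
  FROM TABLE(INFORMATION_SCHEMA.QUERY_HISTORY(RESULT_LIMIT => 100))
 WHERE query_text ILIKE 'select snowflake.cortex.complete%'
 ORDER BY start_time DESC
 LIMIT 1;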

Next steps