Models and regional availability for Cortex AI Functions

Snowflake Cortex AI Functions support a range of language and embedding models with varying capabilities, context-window sizes, and regional availability. Use this reference to choose a model for your workload, check its input and output limits, confirm that it’s available in your region, and identify previous model versions that are still supported.

Choosing a model

The Snowflake Cortex AI_COMPLETE function supports multiple models of varying capability, latency, and cost. These models have been carefully chosen to align with common customer use cases. To achieve the best performance per credit, choose a model that’s a good match for the content size and complexity of your task. Here are brief overviews of the available models.

Large models

If you’re not sure where to start, try the most capable models first to establish a baseline against which to evaluate other models. claude-opus-4-7 and gemini-3.1-pro are the most capable models offered by Snowflake Cortex and will give you a good idea of what a state-of-the-art model can do.

  • claude-opus-4-7 is Anthropic’s flagship Claude Opus model, built for advanced reasoning, long-running agentic workflows, and complex coding tasks. With a 1,000,000-token context window and up to 128,000 output tokens, it can analyze large document collections and produce detailed responses in a single call.

Medium models

  • claude-sonnet-4-6 is a leader in general reasoning and multimodal capabilities. It outperforms its predecessors in tasks that require reasoning across different domains and modalities. You can use its large output capacity to get more information from either structured or unstructured queries. Its reasoning capabilities and large context window make it well-suited for agentic workflows.
  • llama3.1-70b is an open-source model that demonstrates state-of-the-art performance, ideal for chat applications, content creation, and enterprise applications. It is a highly performant, cost-effective model that enables diverse use cases with a context window of 128K. llama3-70b is still supported and has a context window of 8K.
  • snowflake-llama-3.3-70b is a model derived from the open-source llama3.3 model. It uses the SwiftKV optimizations developed by the Snowflake AI research team to deliver up to a 75% inference cost reduction. SwiftKV achieves higher throughput with minimal accuracy loss.
  • mixtral-8x7b is ideal for text generation, classification, and question answering. Mistral models are optimized for low latency with low memory requirements, which translates into higher throughput for enterprise use cases.

Small models

  • claude-haiku-4-5 is Anthropic’s fast and cost-efficient Claude Haiku model, optimized for low-latency, high-throughput workloads. With a 200,000-token context window and up to 64,000 output tokens, it’s well-suited for simple summarization, classification, and question-answering tasks where speed and cost matter more than top-tier reasoning.
  • openai-gpt-5-mini is OpenAI’s small, fast variant of the GPT-5 family, designed for low-latency tasks where quick responses matter more than top-tier reasoning. It has a 272,000-token context window and up to 8,192 output tokens. To use it, your account must have cross-region inference enabled (cross-cloud or from Azure US).
  • llama3.1-8b is ideal for tasks that require low to moderate reasoning. It’s a lightweight, ultra-fast model with a context window of 128K. llama3-8b provides a smaller context window and relatively lower accuracy.
  • mistral-7b is ideal for your simplest summarization, structuring, and question-answering tasks that need to be done quickly. It offers low-latency, high-throughput processing for multiple pages of text with its 32K context window.

The following table provides information on how popular models perform on various benchmarks, including the models offered by Snowflake Cortex AI_COMPLETE as well as a few other popular models.

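| Model | Context Window (Tokens) | MMLU (Reasoning) | HumanEval (Coding) | GSM8K (Arithmetic Reasoning) | Spider 1.0 (SQL) |
| --- | --- | --- | --- | --- | --- |
| GPT-4o | 128,000 | 88.7 | 90.2 | 96.4 | - |
| llama3.1-405b | 128,000 | 88.6 | 89 | 96.8 | - |
| llama3.1-70b | 128,000 | 86 | 80.5 | 95.1 | - |
| mistral-large2 | 128,000 | 84 | 92 | 93 | - |
| llama3.1-8b | 128,000 | 73 | 72.6 | 84.9 | - |
| mixtral-8x7b | 32,000 | 70.6 | 40.2 | 60.4 | - |
| mistral-7b | 32,000 | 62.5 | 26.2 | 52.1 | - |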

Model restrictions

Models used by Snowflake Cortex have limitations on size as described in the table below. Sizes are given in tokens. According to industry estimates, a token generally represents about four characters of text, so the number of words corresponding to a token limit is less than the number of tokens. Inputs that exceed the context window limit result in an error. Output that exceeds the context window limit is truncated.
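
As a rough illustration of the four-characters-per-token estimate, the following Python sketch approximates a token count from the length of a string and checks it against a context window. The helper names and the fixed ratio of 4 are assumptions made for this example. Actual tokenization varies by model, so treat the result as an estimate only.

```python
import math

# Assumed heuristic from the text above: about four characters per token.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Approximate the token count of text (hypothetical helper)."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def fits_context_window(text: str, context_window: int) -> bool:
    """Return True if the estimated token count is within the window."""
    return estimate_tokens(text) <= context_window


prompt = "Summarize the following support ticket in one sentence."
print(estimate_tokens(prompt))
print(fits_context_window(prompt, 4096))
```

An estimate like this is only useful for a quick sanity check before sending a large input. It is not a substitute for an exact, model-specific token count.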

The maximum size of the output that a model can produce is limited by the following:

  • The model’s output token limit.
  • The space available in the context window after the model consumes the input tokens.

For example, claude-sonnet-4-6 has a context window of 1,000,000 tokens and an output limit of 64,000 tokens. If 100,000 tokens are used for the input, the model can generate up to 64,000 tokens. However, if 995,000 tokens are used as input, the model can generate only up to 5,000 tokens, for a total of 1,000,000 tokens.
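
The two limits listed above can be combined into a single calculation. The following Python sketch is a minimal illustration: the function name is hypothetical, and the limits are the values given for claude-sonnet-4-6 in the table later in this section (a 1,000,000-token context window and a 64,000-token output limit).

```python
def max_output_tokens(context_window: int, output_limit: int, input_tokens: int) -> int:
    """Return the largest output the model can produce for a given input size.

    Hypothetical helper: the result is the smaller of the model's output
    limit and the space left in the context window after the input.
    """
    if input_tokens > context_window:
        # Inputs that exceed the context window result in an error.
        raise ValueError("input exceeds the context window")
    return min(output_limit, context_window - input_tokens)


# Limits for claude-sonnet-4-6, taken from the limits table.
CONTEXT_WINDOW = 1_000_000
OUTPUT_LIMIT = 64_000

print(max_output_tokens(CONTEXT_WINDOW, OUTPUT_LIMIT, 100_000))  # 64000
print(max_output_tokens(CONTEXT_WINDOW, OUTPUT_LIMIT, 995_000))  # 5000
```

In the first call the output limit is the binding constraint. In the second, the remaining space in the context window is.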

Important

In the AWS AP Southeast 2 (Sydney) region:

  • the context window for llama3-8b and mistral-7b is 4,096 tokens.
  • the context window for llama3.1-8b is 16,384 tokens.
  • the context window for the Snowflake managed model from the SUMMARIZE function is 4,096 tokens.

In the AWS Europe West 1 (Ireland) region:

  • the context window for llama3.1-8b is 16,384 tokens.
  • the context window for mistral-7b is 4,096 tokens.

| Function | Model | Context window (tokens) | Max output (tokens) |
| --- | --- | --- | --- |
| AI_COMPLETE | llama4-maverick | 128,000 | 8,192 |
| | llama4-scout | 128,000 | 8,192 |
| | deepseek-r1 | 32,768 | 8,192 |
| | claude-sonnet-4-6 | 1,000,000 | 64,000 |
| | claude-opus-4-7 | 1,000,000 | 128,000 |
| | claude-opus-4-6 | 1,000,000 | 128,000 |
| | claude-sonnet-4-5 | 200,000 | 64,000 |
| | claude-haiku-4-5 | 200,000 | 64,000 |
| | claude-opus-4-5 | 200,000 | 64,000 |
| | gemini-3.1-pro | 1,000,000 | 64,000 |
| | mistral-large | 32,000 | 8,192 |
| | mistral-large2 | 128,000 | 8,192 |
| | openai-gpt-5.1 | 272,000 | 8,192 |
| | openai-gpt-5 | 272,000 | 8,192 |
| | openai-gpt-5-mini | 272,000 | 8,192 |
| | openai-gpt-5-nano | 272,000 | 8,192 |
| | openai-gpt-4.1 | 128,000 | 32,000 |
| | mixtral-8x7b | 32,000 | 8,192 |
| | llama3.1-8b | 128,000 | 8,192 |
| | llama3.1-70b | 128,000 | 8,192 |
| | llama3.3-70b | 128,000 | 8,192 |
| | snowflake-llama-3.3-70b | 128,000 | 8,192 |
| | llama3.1-405b | 128,000 | 8,192 |
| | snowflake-llama-3.1-405b | 8,000 | 8,192 |
| | mistral-7b | 32,000 | 8,192 |
| EMBED_TEXT_768 | e5-base-v2 | 512 | n/a |
| | snowflake-arctic-embed-m | 512 | n/a |
| EMBED_TEXT_1024 | nv-embed-qa-4 | 512 | n/a |
| | multilingual-e5-large | 512 | n/a |
| | voyage-multilingual-2 | 32,000 | n/a |
| AI_EXTRACT | arctic-extract | 128,000 | 51,200 |
| AI_FILTER | Snowflake managed model | 128,000 | n/a |
| AI_CLASSIFY | Snowflake managed model | 128,000 | n/a |
| AI_AGG | Snowflake managed model | 128,000 per row; can be used across multiple rows | 8,192 |
| AI_SENTIMENT | Snowflake managed model | 2,048 | n/a |
| AI_SUMMARIZE_AGG | Snowflake managed model | 128,000 per row; can be used across multiple rows | 8,192 |
| ENTITY_SENTIMENT | Snowflake managed model | 2,048 | n/a |
| EXTRACT_ANSWER | Snowflake managed model | 2,048 for text; 64 for question | n/a |
| SENTIMENT | Snowflake managed model | 512 | n/a |
| SUMMARIZE | Snowflake managed model | 32,000 | 4,096 |
| TRANSLATE | Snowflake managed model | 4,096 | n/a |
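
One way to use these limits when choosing a model is to filter for the models whose context window and output limit can accommodate a planned request. The following Python sketch is illustrative only: the dictionary copies a handful of AI_COMPLETE rows from the table above, the function name is hypothetical, and regional availability is not taken into account.

```python
# (context window, max output) in tokens, copied from a few AI_COMPLETE rows
# of the limits table. This subset is chosen for illustration only.
MODEL_LIMITS = {
    "claude-opus-4-7": (1_000_000, 128_000),
    "claude-sonnet-4-6": (1_000_000, 64_000),
    "claude-haiku-4-5": (200_000, 64_000),
    "openai-gpt-4.1": (128_000, 32_000),
    "llama3.1-8b": (128_000, 8_192),
    "mistral-7b": (32_000, 8_192),
}


def models_that_fit(input_tokens: int, output_tokens: int) -> list[str]:
    """List models that can take the input and still produce the desired output."""
    fitting = []
    for name, (window, max_out) in MODEL_LIMITS.items():
        # The output must respect the model's output limit, and input plus
        # output together must fit inside the context window.
        if output_tokens <= max_out and input_tokens + output_tokens <= window:
            fitting.append(name)
    return fitting


print(models_that_fit(150_000, 10_000))
# ['claude-opus-4-7', 'claude-sonnet-4-6', 'claude-haiku-4-5']
```

A filter like this only narrows the candidates. Capability, latency, cost, and the regional availability described in the next section still determine the final choice.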

Regional availability

Snowflake Cortex AI functions are available in the following regions. If your region is not listed for a particular function, use cross-region inference.

Note

  • The AI_COUNT_TOKENS function is available in all regions for any model, but the models themselves are available only in the regions specified in the tables below.

The following functions and models are available in any region via cross-region inference.

Function (Model) | Cross Cloud (Any Region) | AWS US (Cross-Region) | AWS US Commercial Gov (Cross-Region) | AWS EU (Cross-Region) | AWS APJ (Cross-Region) | AWS AU (Cross-Region) | Azure US (Cross-Region) | Azure EU (Cross-Region) | Google Cloud US (Cross-Region)
AI_COMPLETE
claude-opus-4-7
claude-sonnet-4-6
claude-opus-4-6
claude-sonnet-4-5
claude-opus-4-5
claude-haiku-4-5
claude-4-sonnet [legacy]
gemini-3.1-pro*
llama4-maverick
llama4-scout
llama3.1-8b
llama3.1-70b
llama3.3-70b
snowflake-llama-3.3-70b
llama3.1-405b
openai-gpt-5.2
openai-gpt-5.1
openai-gpt-5
openai-gpt-5-mini
openai-gpt-5-nano
openai-gpt-4.1
snowflake-llama-3.1-405b
deepseek-r1
mistral-large2
mixtral-8x7b
mistral-7b
AI_EMBED
e5-base-v2
snowflake-arctic-embed-m
snowflake-arctic-embed-m-v1.5
snowflake-arctic-embed-l-v2.0
snowflake-arctic-embed-l-v2.0-8k
nv-embed-qa-4
multilingual-e5-large
voyage-multilingual-2
AI_CLASSIFY TEXT
AI_CLASSIFY IMAGE
AI_EXTRACT
AI_FILTER TEXT
AI_FILTER IMAGE
AI_AGG
AI_REDACT
AI_SENTIMENT
AI_SIMILARITY TEXT
AI_SIMILARITY IMAGE
AI_SUMMARIZE_AGG
AI_TRANSCRIBE
SENTIMENT
ENTITY_SENTIMENT
EXTRACT_ANSWER
SUMMARIZE
TRANSLATE
AI_TRANSLATE

* Indicates a preview function or model. Preview features are not suitable for production workloads.

The following Snowflake Cortex AI functions and models are available in these extended regions.

Function (Model) | AWS US East 2 (Ohio) | AWS CA Central 1 (Central) | AWS SA East 1 (São Paulo) | AWS Europe West 2 (London) | AWS Europe Central 1 (Frankfurt) | AWS Europe North 1 (Stockholm) | AWS AP Northeast 1 (Tokyo) | AWS AP South 1 (Mumbai) | AWS AP Southeast 2 (Sydney) | AWS AP Southeast 3 (Jakarta) | Azure South Central US (Texas) | Azure West US 2 (Washington) | Azure UK South (London) | Azure North Europe (Ireland) | Azure Switzerland North (Zürich) | Azure Central India (Pune) | Azure Japan East (Tokyo, Saitama) | Azure Southeast Asia (Singapore) | Azure Australia East (New South Wales) | Google Cloud Europe West 2 (London) | Google Cloud Europe West 4 (Netherlands) | Google Cloud US Central 1 (Iowa) | Google Cloud US East 4 (N. Virginia)
AI_EMBED
| snowflake-arctic-embed-m-v1.5
| snowflake-arctic-embed-m
| multilingual-e5-large
AI_EXTRACT | Cross-region only | Cross-region only | Cross-region only | Cross-region only | Cross-region only | Cross-region only | Cross-region only | Cross-region only | Cross-region only

The following table lists availability of legacy models. These models have not been deprecated and can still be used. However, Snowflake recommends newer models for new development.

Legacy

Function (Model) | AWS US West 2 (Oregon) | AWS US East 1 (N. Virginia) | AWS Europe Central 1 (Frankfurt) | AWS Europe West 1 (Ireland) | AWS AP Southeast 2 (Sydney) | AWS AP Northeast 1 (Tokyo) | Azure East US 2 (Virginia) | Azure West Europe (Netherlands)
AI_COMPLETE
| llama3-8b
| llama3-70b
| mistral-large

Previous model versions

The Snowflake Cortex AI_COMPLETE and COMPLETE functions also support the following older model versions. We recommend using the latest model versions instead of the versions listed in this table.

| Model | Context Window (Tokens) | MMLU (Reasoning) | HumanEval (Coding) | GSM8K (Arithmetic Reasoning) | Spider 1.0 (SQL) |
| --- | --- | --- | --- | --- | --- |
| mistral-large | 32,000 | 81.2 | 45.1 | 81 | 81 |
| llama-2-70b-chat | 4,096 | 68.9 | 30.5 | 57.5 | - |