Cortex Chat Completions API

The Cortex Chat Completions API is a model-agnostic superset of the OpenAI Chat Completions API, enabling compatibility with a vast ecosystem of tools, libraries and third-party AI applications.

The Cortex Chat Completions API is a companion to the Cortex REST API, with expanded support for OpenAI models. To learn more about the Cortex REST API, see Cortex REST API.

Getting started with the OpenAI SDK

Important

Make sure you’re using an official version of the OpenAI SDK as specified in the OpenAI Libraries documentation, such as in one of the following languages:

  • Python

  • TypeScript/JavaScript

To get started, you need:

  • Your Snowflake account URL. This will be used to construct the base URL for the OpenAI client.

  • A Snowflake Programmatic Access Token (PAT). This will be used for authenticating to the Cortex Chat Completions API. For information about creating a PAT, see Generating a programmatic access token.

  • A valid model name to use in the request. For a list of supported models, see Model availability.

Simple code examples

The following examples show how to make requests with the OpenAI SDK in Python and JavaScript/TypeScript, and directly with curl.

Use the following code to help you get started with the Python SDK:

from openai import OpenAI

client = OpenAI(
  api_key="<SNOWFLAKE_PAT>",
  base_url="https://<account-identifier>.snowflakecomputing.com/api/v2/cortex/v1"
)

response = client.chat.completions.create(
  model="<model_name>",
  messages=[
    {"role": "system", "content": "You are a helpful assistant."},
    {
      "role": "user",
      "content": "How does a snowflake get its unique pattern?"
    }
  ]
)

print(response.choices[0].message)

In the preceding code, specify values for the following:

  • base_url: Replace <account-identifier> with your Snowflake account identifier.

  • api_key: Replace <SNOWFLAKE_PAT> with your Snowflake Programmatic Access Token (PAT).

  • model: Replace <model_name> with the name of the model you want to use. For a list of supported models, see Model availability.

Use the following code to help you get started with the JavaScript/TypeScript SDK:

import OpenAI from "openai";

const openai = new OpenAI({
  apiKey: "<SNOWFLAKE_PAT>",
  baseURL: "https://<account-identifier>.snowflakecomputing.com/api/v2/cortex/v1"
});

const response = await openai.chat.completions.create({
  model: "<model_name>",
  messages: [
    { role: "system", content: "You are a helpful assistant." },
    {
        role: "user",
        content: "How does a snowflake get its unique pattern?",
    },
  ],
});

console.log(response.choices[0].message);

In the preceding code, specify values for the following:

  • baseURL: Replace <account-identifier> with your Snowflake account identifier.

  • apiKey: Replace <SNOWFLAKE_PAT> with your Snowflake Programmatic Access Token (PAT).

  • model: Replace <model_name> with the name of the model you want to use. For a list of supported models, see Model availability.

Use the following curl command to make a request to the Snowflake-hosted model:

curl "https://<account-identifier>.snowflakecomputing.com/api/v2/cortex/v1/chat/completions" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer <SNOWFLAKE_PAT>" \
-d '{
  "model": "<model_name>",
  "messages": [
      {"role": "user", "content": "How does a snowflake get its unique pattern?"}
  ]
}'

In the preceding code, specify values for the following:

  • <account-identifier>: Replace <account-identifier> with your Snowflake account identifier.

  • <SNOWFLAKE_PAT>: Replace <SNOWFLAKE_PAT> with your Snowflake Programmatic Access Token (PAT).

  • <model_name>: Replace <model_name> with the name of the model you want to use. For a list of supported models, see Model availability.
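If you prefer not to use an SDK at all, the same call can be made from the Python standard library. The following is a minimal sketch, assuming placeholder values for the account identifier, PAT, and model name; the `build_chat_request` helper is not part of any Snowflake or OpenAI library.

```python
import json
import urllib.request

def build_chat_request(account_identifier, pat, model, messages):
    """Build the HTTP request that mirrors the curl command above.

    All argument values are placeholders you must supply; none are
    validated here.
    """
    url = (
        f"https://{account_identifier}.snowflakecomputing.com"
        "/api/v2/cortex/v1/chat/completions"
    )
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {pat}",
    }
    body = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    return urllib.request.Request(url, data=body, headers=headers, method="POST")

# Sending the request requires a real account identifier and PAT:
# with urllib.request.urlopen(build_chat_request(...)) as resp:
#     reply = json.load(resp)
#     print(reply["choices"][0]["message"]["content"])
```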

Stream responses

You can stream responses from the REST API by setting the stream parameter to true in the request (True in Python).

The following Python code streams a response from the REST API:

from openai import OpenAI

client = OpenAI(
  api_key="<SNOWFLAKE_PAT>",
  base_url="https://<account-identifier>.snowflakecomputing.com/api/v2/cortex/v1"
)

response = client.chat.completions.create(
  model="<model_name>",
  messages=[
    {"role": "system", "content": "You are a helpful assistant."},
    {
        "role": "user",
        "content": "How does a snowflake get its unique pattern?"
    }
  ],
  stream=True
)

for chunk in response:
    if chunk.choices and chunk.choices[0].delta.content is not None:
        print(chunk.choices[0].delta.content, end="", flush=True)

The following JavaScript/TypeScript code streams a response from the REST API:

import OpenAI from "openai";

const openai = new OpenAI({
    apiKey: "<SNOWFLAKE_PAT>",
    baseURL: "https://<account-identifier>.snowflakecomputing.com/api/v2/cortex/v1"
});

const stream = await openai.chat.completions.create({
    model: "<model_name>",
    messages: [
      { role: "system", content: "You are a helpful assistant." },
      {
          role: "user",
          content: "How does a snowflake get its unique pattern?",
      }
    ],
    stream: true,
});


for await (const event of stream) {
  console.log(event);
}

The following curl command streams a response from the Snowflake-hosted model:

curl "https://<account-identifier>.snowflakecomputing.com/api/v2/cortex/v1/chat/completions" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer SNOWFLAKE_PAT" \
-d '{
  "model": "<model_name>",
  "messages": [
      {"role": "user", "content": "How does a snowflake get its unique pattern?"}
  ],
  "stream": true,
  "stream_options": {
    "include_usage": true
  }
}'
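Streamed responses arrive as Server-Sent Events, with each data: line carrying a JSON chunk and a final [DONE] sentinel. The following is a rough sketch of consuming that framing without an SDK; the exact chunk shapes here are assumptions based on the OpenAI streaming format, and both helper functions are hypothetical.

```python
import json

def iter_sse_payloads(lines):
    """Yield parsed JSON payloads from SSE 'data:' lines, stopping at
    the '[DONE]' sentinel that terminates a streamed completion."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alive lines and comments
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return
        yield json.loads(payload)

def collect_stream(lines):
    """Accumulate delta text and capture the trailing usage chunk.

    When include_usage is set, the final data chunk carries usage and
    an empty choices list, so guard before indexing into choices.
    """
    parts, usage = [], None
    for event in iter_sse_payloads(lines):
        for choice in event.get("choices", []):
            parts.append(choice.get("delta", {}).get("content") or "")
        if event.get("usage"):
            usage = event["usage"]
    return "".join(parts), usage
```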

Limitations

The following are limitations with using the OpenAI SDK with Snowflake-hosted models:

  • Only Chat Completions are supported.

  • If unset, max_completion_tokens defaults to 4096. The theoretical maximum for the Cortex Chat Completions API is 131,072. Each model has its own output token limit, which may be lower than 131,072.

  • Tool calling is supported for OpenAI and Claude models. For an example that uses tool calling effectively, see Tool calling with chain of thought example.

  • Audio isn’t supported.

  • Image understanding is supported for OpenAI and Claude models only. Images are limited to 20 per conversation with a 20 MiB max request size.

  • Only Claude models support ephemeral cache control points for prompt caching. OpenAI models support implicit caching.

  • Only Claude models support returning their reasoning details in the response.

  • Error messages are generated by Snowflake, not OpenAI. Use reported errors for logging and debugging purposes only.
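Because Claude models accept only function-type tools here, a minimal function-tool definition could look like the following sketch. The get_weather tool, its description, and its parameter schema are all hypothetical, shown only to illustrate the OpenAI function-tool shape.

```python
# Hypothetical function tool; only "function"-type tools are accepted
# by Claude models on this API.
weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Return the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

# Passed to the request as, for example:
# client.chat.completions.create(
#     model="<model_name>", messages=..., tools=[weather_tool]
# )
```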

Detailed compatibility chart

The following tables summarize which request and response fields and headers are supported when using the Cortex Chat Completions API with different Snowflake-hosted models.

Request fields

| Field | OpenAI Models | Claude Models | Other Models |
| --- | --- | --- | --- |
| model | ✔ Supported | ✔ Supported | ✔ Supported |
| messages | See sub-fields | See sub-fields | See sub-fields |
| messages[].audio | ❌ Error | ❌ Ignored | ❌ Ignored |
| messages[].role | ✔ Supported | ✔ Only user/assistant/system | ✔ Only user/assistant/system |
| messages[].content (string) | ✔ Supported | ✔ Supported | ✔ Supported |
| messages[].content[] (array) | See sub-fields | See sub-fields | See sub-fields |
| messages[].content[].text | ✔ Supported | ✔ Supported | ✔ Supported |
| messages[].content[].type | ✔ Supported | ✔ Supported | ✔ Supported |
| messages[].content[].image_url | ✔ Supported | ✔ Supported | ❌ Error |
| messages[].content[].cache_control | ❌ Ignored | ✔ Supported (ephemeral only) | ❌ Ignored |
| messages[].content[].file | ❌ Error | ❌ Error | ❌ Ignored |
| messages[].content[].input_audio | ❌ Error | ❌ Ignored | ❌ Ignored |
| messages[].content[].refusal | ✔ Supported | ❌ Ignored | ❌ Ignored |
| messages[].function_call | ✔ Supported (deprecated) | ❌ Ignored | ❌ Ignored |
| messages[].name | ✔ Supported | ❌ Ignored | ❌ Ignored |
| messages[].refusal | ✔ Supported | ❌ Ignored | ❌ Ignored |
| messages[].tool_call_id | ✔ Supported | ✔ Supported | ❌ Ignored |
| messages[].tool_calls | ✔ Supported | ✔ Only function tools | ❌ Ignored |
| messages[].reasoning_details | ❌ Ignored | ✔ OpenRouter format reasoning.text | ❌ Ignored |
| audio | ❌ Error | ❌ Ignored | ❌ Ignored |
| frequency_penalty | ✔ Supported | ❌ Ignored | ❌ Ignored |
| logit_bias | ✔ Supported | ❌ Ignored | ❌ Ignored |
| logprobs | ✔ Supported | ❌ Ignored | ❌ Ignored |
| max_tokens | ❌ Error (deprecated) | ❌ Error (deprecated) | ❌ Error (deprecated) |
| max_completion_tokens | ✔ Supported (4096 default, 131072 max) | ✔ Supported (4096 default, 131072 max) | ✔ Supported (4096 default, 131072 max) |
| metadata | ❌ Ignored | ❌ Ignored | ❌ Ignored |
| modalities | ❌ Ignored | ❌ Ignored | ❌ Ignored |
| n | ✔ Supported | ❌ Ignored | ❌ Ignored |
| parallel_tool_calls | ✔ Supported | ❌ Ignored | ❌ Ignored |
| prediction | ✔ Supported | ❌ Ignored | ❌ Ignored |
| presence_penalty | ✔ Supported | ❌ Ignored | ❌ Ignored |
| prompt_cache_key | ✔ Supported | ❌ Ignored | ❌ Ignored |
| reasoning_effort | ✔ Supported | ❌ Ignored (use reasoning object) | ❌ Ignored |
| reasoning | See sub-fields | See sub-fields | See sub-fields |
| reasoning.effort | ✔ Supported (overrides reasoning_effort) | ✔ Converted to reasoning.max_tokens | ❌ Ignored |
| reasoning.max_tokens | ❌ Ignored | ✔ Supported | ❌ Ignored |
| response_format | ✔ Supported | ✔ Only json_schema and text | ❌ Ignored |
| safety_identifier | ❌ Ignored | ❌ Ignored | ❌ Ignored |
| service_tier | ❌ Error | ❌ Error | ❌ Error |
| stop | ✔ Supported | ❌ Ignored | ❌ Ignored |
| store | ❌ Error | ❌ Error | ❌ Error |
| stream | ✔ Supported | ✔ Supported | ✔ Supported |
| stream_options | See sub-fields | See sub-fields | See sub-fields |
| stream_options.include_obfuscation | ❌ Ignored | ❌ Ignored | ❌ Ignored |
| stream_options.include_usage | ✔ Supported | ✔ Supported | ✔ Supported |
| temperature | ✔ Supported | ✔ Supported | ✔ Supported |
| tool_choice | ✔ Supported | ✔ Only function tools | ❌ Ignored |
| tools | ✔ Supported | ✔ Only function tools | ❌ Error |
| top_logprobs | ✔ Supported | ❌ Ignored | ❌ Ignored |
| top_p | ✔ Supported | ✔ Supported | ✔ Supported |
| verbosity | ✔ Supported | ❌ Ignored | ❌ Ignored |
| web_search_options | ❌ Error | ❌ Ignored | ❌ Ignored |
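The response_format field accepts a json_schema type for structured output. The following sketch shows what that request fragment could look like; the snowflake_fact schema is a hypothetical example, not part of any Snowflake API.

```python
# Hypothetical structured-output request fragment. Note that Claude
# models accept only json_schema and text response formats here.
response_format = {
    "type": "json_schema",
    "json_schema": {
        "name": "snowflake_fact",
        "schema": {
            "type": "object",
            "properties": {
                "fact": {"type": "string"},
                "confidence": {"type": "number"},
            },
            "required": ["fact", "confidence"],
        },
    },
}

# Passed to the request as, for example:
# client.chat.completions.create(
#     model="<model_name>", messages=..., response_format=response_format
# )
```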

Response fields

| Field | OpenAI Models | Claude Models | Other Models |
| --- | --- | --- | --- |
| id | ✔ Supported | ✔ Supported | ✔ Supported |
| object | ✔ Supported | ✔ Supported | ✔ Supported |
| created | ✔ Supported | ✔ Supported | ✔ Supported |
| model | ✔ Supported | ✔ Supported | ✔ Supported |
| choices | See sub-fields | See sub-fields | See sub-fields |
| choices[].index | ✔ Supported | ✔ Single choice only | ✔ Single choice only |
| choices[].finish_reason | ✔ Supported | ❌ Not supported | ✔ Only stop |
| choices[].logprobs | ✔ Supported | ❌ Not supported | ❌ Not supported |
| choices[].message (non-streaming) | See sub-fields | See sub-fields | See sub-fields |
| choices[].message.content | ✔ Supported | ✔ Supported | ✔ Supported |
| choices[].message.role | ✔ Supported | ✔ Supported | ✔ Supported |
| choices[].message.refusal | ✔ Supported | ❌ Not supported | ❌ Not supported |
| choices[].message.annotations | ❌ Not supported | ❌ Not supported | ❌ Not supported |
| choices[].message.audio | ❌ Not supported | ❌ Not supported | ❌ Not supported |
| choices[].message.function_call | ✔ Supported | ❌ Not supported | ❌ Not supported |
| choices[].message.tool_calls | ✔ Supported | ✔ Only function tools | ❌ Not supported |
| choices[].message.reasoning | ❌ Not supported | ✔ OpenRouter format | ❌ Not supported |
| choices[].delta (streaming) | See sub-fields | See sub-fields | See sub-fields |
| choices[].delta.content | ✔ Supported | ✔ Supported | ✔ Supported |
| choices[].delta.role | ✔ Supported | ❌ Not supported | ❌ Not supported |
| choices[].delta.refusal | ✔ Supported | ❌ Not supported | ❌ Not supported |
| choices[].delta.function_call | ✔ Supported | ❌ Not supported | ❌ Not supported |
| choices[].delta.tool_calls | ✔ Supported | ✔ Only function tools | ❌ Not supported |
| choices[].delta.reasoning | ❌ Not supported | ✔ OpenRouter format | ❌ Not supported |
| usage | See sub-fields | See sub-fields | See sub-fields |
| usage.prompt_tokens | ✔ Supported | ✔ Supported | ✔ Supported |
| usage.completion_tokens | ✔ Supported | ✔ Supported | ✔ Supported |
| usage.total_tokens | ✔ Supported | ✔ Supported | ✔ Supported |
| usage.prompt_tokens_details | See sub-fields | See sub-fields | See sub-fields |
| usage.prompt_tokens_details.audio_tokens | ❌ Not supported | ❌ Not supported | ❌ Not supported |
| usage.prompt_tokens_details.cached_tokens | ✔ Only cache reads | ✔ Cache read + write | ❌ Not supported |
| usage.completion_tokens_details | See sub-fields | See sub-fields | See sub-fields |
| usage.completion_tokens_details.accepted_prediction_tokens | ✔ Supported | ❌ Not supported | ❌ Not supported |
| usage.completion_tokens_details.audio_tokens | ❌ Not supported | ❌ Not supported | ❌ Not supported |
| usage.completion_tokens_details.reasoning_tokens | ✔ Supported | ❌ Not supported | ❌ Not supported |
| usage.completion_tokens_details.rejected_prediction_tokens | ✔ Supported | ❌ Not supported | ❌ Not supported |
| service_tier | ✔ Supported | ❌ Not supported | ❌ Not supported |
| system_fingerprint | ✔ Supported | ❌ Not supported | ❌ Not supported |

Request headers

| Header | Support |
| --- | --- |
| Authorization | ✔ Required |
| Content-Type | ✔ Supported (application/json) |
| Accept | ✔ Supported (application/json, text/event-stream) |

Response headers

| Header | Support |
| --- | --- |
| openai-organization | ❌ Not supported |
| openai-version | ❌ Not supported |
| openai-processing-ms | ❌ Not supported |
| x-ratelimit-limit-requests | ❌ Not supported |
| x-ratelimit-limit-tokens | ❌ Not supported |
| x-ratelimit-remaining-requests | ❌ Not supported |
| x-ratelimit-remaining-tokens | ❌ Not supported |
| x-ratelimit-reset-requests | ❌ Not supported |
| x-ratelimit-reset-tokens | ❌ Not supported |
| retry-after | ❌ Not supported |

Learn more

For more usage examples, see OpenAI’s Chat Completions API reference or the OpenAI Cookbook.

In addition to providing compatibility with the Chat Completions API, Snowflake supports OpenRouter-compatible features for Claude models. These features are exposed as extra fields on the request.

  1. For prompt caching, use the cache_control field. See the OpenRouter prompt caching documentation for more information.

  2. For reasoning tokens, use the reasoning field. See the OpenRouter reasoning documentation for more information.
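These extra fields can be passed through the OpenAI Python SDK with extra_body. The following is a minimal sketch, assuming placeholder message text and an arbitrary reasoning token budget; the exact field values are illustrative, not recommended settings.

```python
# Sketch: OpenRouter-style extra fields for Claude models.
# cache_control marks a content block for ephemeral prompt caching;
# the reasoning object requests reasoning tokens.
messages = [
    {
        "role": "system",
        "content": [
            {
                "type": "text",
                "text": "<long, reusable context to cache>",
                "cache_control": {"type": "ephemeral"},
            }
        ],
    },
    {"role": "user", "content": "Summarize the cached context."},
]

# With an OpenAI client configured as in the examples above:
# response = client.chat.completions.create(
#     model="<model_name>",
#     messages=messages,
#     extra_body={"reasoning": {"max_tokens": 1024}},
# )
```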