Cortex Chat Completions API

The Cortex Chat Completions API is a model-agnostic superset of the OpenAI Chat Completions API, enabling compatibility with a vast ecosystem of tools, libraries and third-party AI applications.

The Cortex Chat Completions API is a companion to the Cortex REST API, with expanded support for OpenAI models. To learn more about the Cortex REST API, see Cortex REST API.

Getting started with the OpenAI SDK

Important

Make sure you’re using an official version of the OpenAI SDK as specified in the OpenAI Libraries documentation, such as in one of the following languages:

  • Python

  • TypeScript/JavaScript

To get started, you need:

  • Your Snowflake account URL. This will be used to construct the base URL for the OpenAI client.

  • A Snowflake Programmatic Access Token (PAT). This will be used for authenticating to the Cortex Chat Completions API. For information about creating a PAT, see Generating a programmatic access token.

  • A valid model name to use in the request. For a list of supported models, see Model availability.

Simple code examples

The following examples show how to make requests with the OpenAI SDK in Python and JavaScript/TypeScript, and directly with curl.

Use the following code to help you get started with the Python SDK:

from openai import OpenAI

client = OpenAI(
  api_key="<SNOWFLAKE_PAT>",
  base_url="https://<account-identifier>.snowflakecomputing.com/api/v2/cortex/v1"
)

response = client.chat.completions.create(
  model="<model_name>",
  messages=[
    {"role": "system", "content": "You are a helpful assistant."},
    {
      "role": "user",
      "content": "How does a snowflake get its unique pattern?"
    }
  ]
)

print(response.choices[0].message)

In the preceding code, specify values for the following:

  • base_url: Replace <account-identifier> with your Snowflake account identifier.

  • api_key: Replace <SNOWFLAKE_PAT> with your Snowflake Programmatic Access Token (PAT).

  • model: Replace <model_name> with the name of the model you want to use. For a list of supported models, see Model availability.

Use the following code to help you get started with the JavaScript/TypeScript SDK:

import OpenAI from "openai";

const openai = new OpenAI({
  apiKey: "<SNOWFLAKE_PAT>",
  baseURL: "https://<account-identifier>.snowflakecomputing.com/api/v2/cortex/v1"
});

const response = await openai.chat.completions.create({
  model: "<model_name>",
  messages: [
    { role: "system", content: "You are a helpful assistant." },
    {
        role: "user",
        content: "How does a snowflake get its unique pattern?",
    },
  ],
});

console.log(response.choices[0].message);

In the preceding code, specify values for the following:

  • baseURL: Replace <account-identifier> with your Snowflake account identifier.

  • apiKey: Replace <SNOWFLAKE_PAT> with your Snowflake Programmatic Access Token (PAT).

  • model: Replace <model_name> with the name of the model you want to use. For a list of supported models, see Model availability.

Use the following curl command to make a request to the Snowflake-hosted model:

curl "https://<account-identifier>.snowflakecomputing.com/api/v2/cortex/v1/chat/completions" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer <SNOWFLAKE_PAT>" \
-d '{
  "model": "<model_name>",
  "messages": [
      {"role": "user", "content": "How does a snowflake get its unique pattern?"}
  ]
}'

In the preceding code, specify values for the following:

  • <account-identifier>: Replace <account-identifier> with your Snowflake account identifier.

  • <SNOWFLAKE_PAT>: Replace <SNOWFLAKE_PAT> with your Snowflake Programmatic Access Token (PAT).

  • <model_name>: Replace <model_name> with the name of the model you want to use. For a list of supported models, see Model availability.
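If you prefer not to use an SDK at all, the same call can be made from the Python standard library. The following is a minimal sketch, assuming placeholder values for the account identifier, PAT, and model name; the `build_chat_request` helper is not part of any Snowflake or OpenAI library.

```python
import json
import urllib.request

def build_chat_request(account_identifier, pat, model, messages):
    """Build the HTTP request that mirrors the curl command above.

    All argument values are placeholders you must supply; none are
    validated here.
    """
    url = (
        f"https://{account_identifier}.snowflakecomputing.com"
        "/api/v2/cortex/v1/chat/completions"
    )
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {pat}",
    }
    body = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    return urllib.request.Request(url, data=body, headers=headers, method="POST")

# Sending the request requires a real account identifier and PAT:
# with urllib.request.urlopen(build_chat_request(...)) as resp:
#     reply = json.load(resp)
#     print(reply["choices"][0]["message"]["content"])
```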

Stream responses

You can stream responses from the REST API by setting the stream parameter to true in the request (True in Python).

The following Python code streams a response from the REST API:

from openai import OpenAI

client = OpenAI(
  api_key="<SNOWFLAKE_PAT>",
  base_url="https://<account-identifier>.snowflakecomputing.com/api/v2/cortex/v1"
)

response = client.chat.completions.create(
  model="<model_name>",
  messages=[
    {"role": "system", "content": "You are a helpful assistant."},
    {
        "role": "user",
        "content": "How does a snowflake get its unique pattern?"
    }
  ],
  stream=True
)

for chunk in response:
    if chunk.choices and chunk.choices[0].delta.content is not None:
        print(chunk.choices[0].delta.content, end="", flush=True)

The following JavaScript/TypeScript code streams a response from the REST API:

import OpenAI from "openai";

const openai = new OpenAI({
    apiKey: "<SNOWFLAKE_PAT>",
    baseURL: "https://<account-identifier>.snowflakecomputing.com/api/v2/cortex/v1"
});

const stream = await openai.chat.completions.create({
    model: "<model_name>",
    messages: [
      { role: "system", content: "You are a helpful assistant." },
      {
          role: "user",
          content: "How does a snowflake get its unique pattern?",
      }
    ],
    stream: true,
});


for await (const event of stream) {
  console.log(event);
}

The following curl command streams a response from the Snowflake-hosted model:

curl "https://<account-identifier>.snowflakecomputing.com/api/v2/cortex/v1/chat/completions" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer SNOWFLAKE_PAT" \
-d '{
  "model": "<model_name>",
  "messages": [
      {"role": "user", "content": "How does a snowflake get its unique pattern?"}
  ],
  "stream": true,
  "stream_options": {
    "include_usage": true
  }
}'
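Streamed responses arrive as Server-Sent Events, with each data: line carrying a JSON chunk and a final [DONE] sentinel. The following is a rough sketch of consuming that framing without an SDK; the exact chunk shapes here are assumptions based on the OpenAI streaming format, and both helper functions are hypothetical.

```python
import json

def iter_sse_payloads(lines):
    """Yield parsed JSON payloads from SSE 'data:' lines, stopping at
    the '[DONE]' sentinel that terminates a streamed completion."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alive lines and comments
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return
        yield json.loads(payload)

def collect_stream(lines):
    """Accumulate delta text and capture the trailing usage chunk.

    When include_usage is set, the final data chunk carries usage and
    an empty choices list, so guard before indexing into choices.
    """
    parts, usage = [], None
    for event in iter_sse_payloads(lines):
        for choice in event.get("choices", []):
            parts.append(choice.get("delta", {}).get("content") or "")
        if event.get("usage"):
            usage = event["usage"]
    return "".join(parts), usage
```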

Limitations

The following are limitations with using the OpenAI SDK with Snowflake-hosted models:

  • Only Chat Completions are supported.

  • If unset, max_completion_tokens defaults to 4096. The theoretical maximum for the Cortex Chat Completions API is 131,072. Each model has its own output token limit, which may be lower than 131,072.

  • Tool calling is supported for OpenAI and Claude models. For an example that uses tool calling effectively, see Tool calling with chain of thought example.

  • Audio isn’t supported.

  • Image understanding is supported for OpenAI and Claude models only. Images are limited to 20 per conversation with a 20 MiB max request size.

  • Only Claude models support ephemeral cache control points for prompt caching. OpenAI models support implicit caching.

  • Only Claude models support returning their reasoning details in the response.

  • Error messages are generated by Snowflake, not OpenAI. Use reported errors for logging and debugging purposes only.
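Because Claude models accept only function-type tools here, a minimal function-tool definition could look like the following sketch. The get_weather tool, its description, and its parameter schema are all hypothetical, shown only to illustrate the OpenAI function-tool shape.

```python
# Hypothetical function tool; only "function"-type tools are accepted
# by Claude models on this API.
weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Return the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

# Passed to the request as, for example:
# client.chat.completions.create(
#     model="<model_name>", messages=..., tools=[weather_tool]
# )
```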

Detailed compatibility chart

The following tables summarize which request and response fields and headers are supported when using the Cortex Chat Completions API with different Snowflake-hosted models.

Request fields

| Field | OpenAI Models | Claude Models | Other Models |
| --- | --- | --- | --- |
| model | ✔ Supported | ✔ Supported | ✔ Supported |
| messages | See sub-fields | See sub-fields | See sub-fields |
| messages[].audio | ❌ Error | ❌ Ignored | ❌ Ignored |
| messages[].role | ✔ Supported | ✔ Only user/assistant/system | ✔ Only user/assistant/system |
| messages[].content (string) | ✔ Supported | ✔ Supported | ✔ Supported |
| messages[].content[] (array) | See sub-fields | See sub-fields | See sub-fields |
| messages[].content[].text | ✔ Supported | ✔ Supported | ✔ Supported |
| messages[].content[].type | ✔ Supported | ✔ Supported | ✔ Supported |
| messages[].content[].image_url | ✔ Supported | ✔ Supported | ❌ Error |
| messages[].content[].cache_control | ❌ Ignored | ✔ Supported (ephemeral only) | ❌ Ignored |
| messages[].content[].file | ❌ Error | ❌ Error | ❌ Ignored |
| messages[].content[].input_audio | ❌ Error | ❌ Ignored | ❌ Ignored |
| messages[].content[].refusal | ✔ Supported | ❌ Ignored | ❌ Ignored |
| messages[].function_call | ✔ Supported (deprecated) | ❌ Ignored | ❌ Ignored |
| messages[].name | ✔ Supported | ❌ Ignored | ❌ Ignored |
| messages[].refusal | ✔ Supported | ❌ Ignored | ❌ Ignored |
| messages[].tool_call_id | ✔ Supported | ✔ Supported | ❌ Ignored |
| messages[].tool_calls | ✔ Supported | ✔ Only function tools | ❌ Ignored |
| messages[].reasoning_details | ❌ Ignored | ✔ OpenRouter format reasoning.text | ❌ Ignored |
| audio | ❌ Error | ❌ Ignored | ❌ Ignored |
| frequency_penalty | ✔ Supported | ❌ Ignored | ❌ Ignored |
| logit_bias | ✔ Supported | ❌ Ignored | ❌ Ignored |
| logprobs | ✔ Supported | ❌ Ignored | ❌ Ignored |
| max_tokens | ❌ Error (deprecated) | ❌ Error (deprecated) | ❌ Error (deprecated) |
| max_completion_tokens | ✔ Supported (4096 default, 131072 max) | ✔ Supported (4096 default, 131072 max) | ✔ Supported (4096 default, 131072 max) |
| metadata | ❌ Ignored | ❌ Ignored | ❌ Ignored |
| modalities | ❌ Ignored | ❌ Ignored | ❌ Ignored |
| n | ✔ Supported | ❌ Ignored | ❌ Ignored |
| parallel_tool_calls | ✔ Supported | ❌ Ignored | ❌ Ignored |
| prediction | ✔ Supported | ❌ Ignored | ❌ Ignored |
| presence_penalty | ✔ Supported | ❌ Ignored | ❌ Ignored |
| prompt_cache_key | ✔ Supported | ❌ Ignored | ❌ Ignored |
| reasoning_effort | ✔ Supported | ❌ Ignored (use reasoning object) | ❌ Ignored |
| reasoning | See sub-fields | See sub-fields | See sub-fields |
| reasoning.effort | ✔ Supported (overrides reasoning_effort) | ✔ Converted to reasoning.max_tokens | ❌ Ignored |
| reasoning.max_tokens | ❌ Ignored | ✔ Supported | ❌ Ignored |
| response_format | ✔ Supported | ✔ Only json_schema and text | ❌ Ignored |
| safety_identifier | ❌ Ignored | ❌ Ignored | ❌ Ignored |
| service_tier | ❌ Error | ❌ Error | ❌ Error |
| stop | ✔ Supported | ❌ Ignored | ❌ Ignored |
| store | ❌ Error | ❌ Error | ❌ Error |
| stream | ✔ Supported | ✔ Supported | ✔ Supported |
| stream_options | See sub-fields | See sub-fields | See sub-fields |
| stream_options.include_obfuscation | ❌ Ignored | ❌ Ignored | ❌ Ignored |
| stream_options.include_usage | ✔ Supported | ✔ Supported | ✔ Supported |
| temperature | ✔ Supported | ✔ Supported | ✔ Supported |
| tool_choice | ✔ Supported | ✔ Only function tools | ❌ Ignored |
| tools | ✔ Supported | ✔ Only function tools | ❌ Error |
| top_logprobs | ✔ Supported | ❌ Ignored | ❌ Ignored |
| top_p | ✔ Supported | ✔ Supported | ✔ Supported |
| verbosity | ✔ Supported | ❌ Ignored | ❌ Ignored |
| web_search_options | ❌ Error | ❌ Ignored | ❌ Ignored |
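The response_format field accepts a json_schema type for structured output. The following sketch shows what that request fragment could look like; the snowflake_fact schema is a hypothetical example, not part of any Snowflake API.

```python
# Hypothetical structured-output request fragment. Note that Claude
# models accept only json_schema and text response formats here.
response_format = {
    "type": "json_schema",
    "json_schema": {
        "name": "snowflake_fact",
        "schema": {
            "type": "object",
            "properties": {
                "fact": {"type": "string"},
                "confidence": {"type": "number"},
            },
            "required": ["fact", "confidence"],
        },
    },
}

# Passed to the request as, for example:
# client.chat.completions.create(
#     model="<model_name>", messages=..., response_format=response_format
# )
```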

Response fields

| Field | OpenAI Models | Claude Models | Other Models |
| --- | --- | --- | --- |
| id | ✔ Supported | ✔ Supported | ✔ Supported |
| object | ✔ Supported | ✔ Supported | ✔ Supported |
| created | ✔ Supported | ✔ Supported | ✔ Supported |
| model | ✔ Supported | ✔ Supported | ✔ Supported |
| choices | See sub-fields | See sub-fields | See sub-fields |
| choices[].index | ✔ Supported | ✔ Single choice only | ✔ Single choice only |
| choices[].finish_reason | ✔ Supported | ❌ Not supported | ✔ Only stop |
| choices[].logprobs | ✔ Supported | ❌ Not supported | ❌ Not supported |
| choices[].message (non-streaming) | See sub-fields | See sub-fields | See sub-fields |
| choices[].message.content | ✔ Supported | ✔ Supported | ✔ Supported |
| choices[].message.role | ✔ Supported | ✔ Supported | ✔ Supported |
| choices[].message.refusal | ✔ Supported | ❌ Not supported | ❌ Not supported |
| choices[].message.annotations | ❌ Not supported | ❌ Not supported | ❌ Not supported |
| choices[].message.audio | ❌ Not supported | ❌ Not supported | ❌ Not supported |
| choices[].message.function_call | ✔ Supported | ❌ Not supported | ❌ Not supported |
| choices[].message.tool_calls | ✔ Supported | ✔ Only function tools | ❌ Not supported |
| choices[].message.reasoning | ❌ Not supported | ✔ OpenRouter format | ❌ Not supported |
| choices[].delta (streaming) | See sub-fields | See sub-fields | See sub-fields |
| choices[].delta.content | ✔ Supported | ✔ Supported | ✔ Supported |
| choices[].delta.role | ✔ Supported | ❌ Not supported | ❌ Not supported |
| choices[].delta.refusal | ✔ Supported | ❌ Not supported | ❌ Not supported |
| choices[].delta.function_call | ✔ Supported | ❌ Not supported | ❌ Not supported |
| choices[].delta.tool_calls | ✔ Supported | ✔ Only function tools | ❌ Not supported |
| choices[].delta.reasoning | ❌ Not supported | ✔ OpenRouter format | ❌ Not supported |
| usage | See sub-fields | See sub-fields | See sub-fields |
| usage.prompt_tokens | ✔ Supported | ✔ Supported | ✔ Supported |
| usage.completion_tokens | ✔ Supported | ✔ Supported | ✔ Supported |
| usage.total_tokens | ✔ Supported | ✔ Supported | ✔ Supported |
| usage.prompt_tokens_details | See sub-fields | See sub-fields | See sub-fields |
| usage.prompt_tokens_details.audio_tokens | ❌ Not supported | ❌ Not supported | ❌ Not supported |
| usage.prompt_tokens_details.cached_tokens | ✔ Only cache reads | ✔ Cache read + write | ❌ Not supported |
| usage.completion_tokens_details | See sub-fields | See sub-fields | See sub-fields |
| usage.completion_tokens_details.accepted_prediction_tokens | ✔ Supported | ❌ Not supported | ❌ Not supported |
| usage.completion_tokens_details.audio_tokens | ❌ Not supported | ❌ Not supported | ❌ Not supported |
| usage.completion_tokens_details.reasoning_tokens | ✔ Supported | ❌ Not supported | ❌ Not supported |
| usage.completion_tokens_details.rejected_prediction_tokens | ✔ Supported | ❌ Not supported | ❌ Not supported |
| service_tier | ✔ Supported | ❌ Not supported | ❌ Not supported |
| system_fingerprint | ✔ Supported | ❌ Not supported | ❌ Not supported |

Request headers

| Header | Support |
| --- | --- |
| Authorization | ✔ Required |
| Content-Type | ✔ Supported (application/json) |
| Accept | ✔ Supported (application/json, text/event-stream) |

Response headers

| Header | Support |
| --- | --- |
| openai-organization | ❌ Not supported |
| openai-version | ❌ Not supported |
| openai-processing-ms | ❌ Not supported |
| x-ratelimit-limit-requests | ❌ Not supported |
| x-ratelimit-limit-tokens | ❌ Not supported |
| x-ratelimit-remaining-requests | ❌ Not supported |
| x-ratelimit-remaining-tokens | ❌ Not supported |
| x-ratelimit-reset-requests | ❌ Not supported |
| x-ratelimit-reset-tokens | ❌ Not supported |
| retry-after | ❌ Not supported |

Learn more

For more usage examples, see OpenAI’s Chat Completions API reference or the OpenAI Cookbook.

In addition to providing compatibility with the Chat Completions API, Snowflake supports OpenRouter-compatible features for Claude models. These features are exposed as extra fields on the request.

  1. For prompt caching, use the cache_control field. See the OpenRouter prompt caching documentation for more information.

  2. For reasoning tokens, use the reasoning field. See the OpenRouter reasoning documentation for more information.
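These extra fields can be passed through the OpenAI Python SDK with extra_body. The following is a minimal sketch, assuming placeholder message text and an arbitrary reasoning token budget; the exact field values are illustrative, not recommended settings.

```python
# Sketch: OpenRouter-style extra fields for Claude models.
# cache_control marks a content block for ephemeral prompt caching;
# the reasoning object requests reasoning tokens.
messages = [
    {
        "role": "system",
        "content": [
            {
                "type": "text",
                "text": "<long, reusable context to cache>",
                "cache_control": {"type": "ephemeral"},
            }
        ],
    },
    {"role": "user", "content": "Summarize the cached context."},
]

# With an OpenAI client configured as in the examples above:
# response = client.chat.completions.create(
#     model="<model_name>",
#     messages=messages,
#     extra_body={"reasoning": {"max_tokens": 1024}},
# )
```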