Cortex Chat Completions API¶
The Cortex Chat Completions API is a model-agnostic superset of the OpenAI Chat Completions API, enabling compatibility with a vast ecosystem of tools, libraries and third-party AI applications.
The Cortex Chat Completions API is a companion API to the Cortex REST API with increased support for OpenAI models. To learn more about the Cortex REST API, see Cortex REST API.
Getting started with the OpenAI SDK¶
Important
Use an official version of the OpenAI SDK, as listed in the OpenAI Libraries documentation, in one of the following languages:
Python
TypeScript/JavaScript
To get started, you need:
Your Snowflake account URL. This will be used to construct the base URL for the OpenAI client.
A Snowflake Programmatic Access Token (PAT). This will be used for authenticating to the Cortex Chat Completions API. For information about creating a PAT, see Generating a programmatic access token.
A valid model name to use in the request. For a list of supported models, see Model availability.
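As a minimal sketch of the first prerequisite, the base URL can be derived from the account identifier using the URL pattern shown in the examples on this page. The helper name `cortex_base_url` and the environment variable names are hypothetical and only illustrate one way to keep the PAT out of source code.

```python
import os


def cortex_base_url(account_identifier: str) -> str:
    """Build the base URL for the OpenAI client from a Snowflake account identifier."""
    account_identifier = account_identifier.strip()
    if not account_identifier:
        raise ValueError("account identifier must not be empty")
    return f"https://{account_identifier}.snowflakecomputing.com/api/v2/cortex/v1"


# Hypothetical environment variable names; substitute whatever your setup uses.
account = os.environ.get("SNOWFLAKE_ACCOUNT_IDENTIFIER", "<account-identifier>")
pat = os.environ.get("SNOWFLAKE_PAT", "<SNOWFLAKE_PAT>")

base_url = cortex_base_url(account)
print(base_url)
```

The resulting `base_url` and `pat` values are what the SDK examples pass as `base_url` and `api_key`.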
Simple code examples¶
The following examples show how to make requests with the OpenAI SDKs for Python and JavaScript/TypeScript, and with curl.
Use the following code to help you get started with the Python SDK:
from openai import OpenAI

client = OpenAI(
    api_key="<SNOWFLAKE_PAT>",
    base_url="https://<account-identifier>.snowflakecomputing.com/api/v2/cortex/v1"
)

response = client.chat.completions.create(
    model="<model_name>",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {
            "role": "user",
            "content": "How does a snowflake get its unique pattern?"
        }
    ]
)

print(response.choices[0].message)
In the preceding code, specify values for the following:
base_url: Replace <account-identifier> with your Snowflake account identifier.
api_key: Replace <SNOWFLAKE_PAT> with your Snowflake Programmatic Access Token (PAT).
model: Replace <model_name> with the name of the model you want to use. For a list of supported models, see Model availability.
Use the following code to help you get started with the JavaScript/TypeScript SDK:
import OpenAI from "openai";

const openai = new OpenAI({
  apiKey: "<SNOWFLAKE_PAT>",
  baseURL: "https://<account-identifier>.snowflakecomputing.com/api/v2/cortex/v1"
});

const response = await openai.chat.completions.create({
  model: "<model_name>",
  messages: [
    { role: "system", content: "You are a helpful assistant." },
    {
      role: "user",
      content: "How does a snowflake get its unique pattern?",
    },
  ],
});

console.log(response.choices[0].message);
In the preceding code, specify values for the following:
baseURL: Replace <account-identifier> with your Snowflake account identifier.
apiKey: Replace <SNOWFLAKE_PAT> with your Snowflake Programmatic Access Token (PAT).
model: Replace <model_name> with the name of the model you want to use. For a list of supported models, see Model availability.
Use the following curl command to make a request to the Snowflake-hosted model:
curl "https://<account-identifier>.snowflakecomputing.com/api/v2/cortex/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <SNOWFLAKE_PAT>" \
  -d '{
    "model": "<model_name>",
    "messages": [
      {"role": "user", "content": "How does a snowflake get its unique pattern?"}
    ]
  }'
In the preceding code, specify values for the following:
<account-identifier>: Replace <account-identifier> with your Snowflake account identifier.
<SNOWFLAKE_PAT>: Replace <SNOWFLAKE_PAT> with your Snowflake Programmatic Access Token (PAT).
<model_name>: Replace <model_name> with the name of the model you want to use. For a list of supported models, see Model availability.
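For readers who want to see what the curl command sends without installing an SDK, the same request can be sketched with Python's standard library. The function name `build_chat_request` and the sample account identifier and token are hypothetical. The URL, headers, and body mirror the curl example, and the network call is left commented out.

```python
import json
import urllib.request


def build_chat_request(account_identifier, pat, model, user_prompt):
    """Build (but do not send) a POST request equivalent to the curl example."""
    url = (
        f"https://{account_identifier}.snowflakecomputing.com"
        "/api/v2/cortex/v1/chat/completions"
    )
    body = {
        "model": model,
        "messages": [{"role": "user", "content": user_prompt}],
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {pat}",
        },
        method="POST",
    )


# Hypothetical placeholder values, for illustration only.
request = build_chat_request(
    "myorg-myaccount",
    "example-token",
    "<model_name>",
    "How does a snowflake get its unique pattern?",
)

# To send the request, uncomment the following lines:
# with urllib.request.urlopen(request) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```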
Stream responses¶
You can stream responses from the REST API by setting the stream parameter to true in the request (True in Python).
The following Python code streams a response from the REST API:
from openai import OpenAI

client = OpenAI(
    api_key="<SNOWFLAKE_PAT>",
    base_url="https://<account-identifier>.snowflakecomputing.com/api/v2/cortex/v1"
)

response = client.chat.completions.create(
    model="<model_name>",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {
            "role": "user",
            "content": "How does a snowflake get its unique pattern?"
        }
    ],
    stream=True
)

for chunk in response:
    # Skip chunks with no choices or no text; delta.content can be None.
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
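When the full text is needed after streaming finishes, the chunks can be accumulated instead of printed. The following is a minimal sketch, assuming chunks shaped like the ones the OpenAI Python SDK yields (`chunk.choices[0].delta.content`). The helper `collect_stream_text` is hypothetical, and the `SimpleNamespace` objects are stand-ins so the sketch runs without a network call.

```python
from types import SimpleNamespace


def collect_stream_text(chunks):
    """Join the text deltas of a streamed chat completion into one string."""
    parts = []
    for chunk in chunks:
        if not chunk.choices:
            # Some chunks (for example, a usage-only chunk) carry no choices.
            continue
        content = getattr(chunk.choices[0].delta, "content", None)
        if content:
            parts.append(content)
    return "".join(parts)


def fake_chunk(text):
    """Stand-in for an SDK chunk with a single choice; illustration only."""
    delta = SimpleNamespace(content=text)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


fake_stream = [
    fake_chunk(None),                 # a chunk with no text, such as a role-only delta
    fake_chunk("Hello"),
    fake_chunk(" world"),
    SimpleNamespace(choices=[]),      # chunk without choices
]
print(collect_stream_text(fake_stream))
```

With a real client, pass the object returned by `client.chat.completions.create(..., stream=True)` in place of `fake_stream`.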
The following JavaScript/TypeScript code streams a response from the REST API:
import OpenAI from "openai";

const openai = new OpenAI({
  apiKey: "<SNOWFLAKE_PAT>",
  baseURL: "https://<account-identifier>.snowflakecomputing.com/api/v2/cortex/v1"
});

const stream = await openai.chat.completions.create({
  model: "<model_name>",
  messages: [
    { role: "system", content: "You are a helpful assistant." },
    {
      role: "user",
      content: "How does a snowflake get its unique pattern?",
    }
  ],
  stream: true,
});

for await (const event of stream) {
  console.log(event);
}
The following curl command streams a response from the Snowflake-hosted model:
curl "https://<account-identifier>.snowflakecomputing.com/api/v2/cortex/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <SNOWFLAKE_PAT>" \
  -d '{
    "model": "<model_name>",
    "messages": [
      {"role": "user", "content": "How does a snowflake get its unique pattern?"}
    ],
    "stream": true,
    "stream_options": {
      "include_usage": true
    }
  }'
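The raw output of a streaming curl request can be post-processed with a short script. The sketch below assumes the stream uses the server-sent events framing of the OpenAI Chat Completions API (`data: {...}` lines ending with `data: [DONE]`) and that, with `include_usage` set, one chunk carries a `usage` object. The function names and sample lines are hypothetical.

```python
import json


def parse_sse_lines(lines):
    """Yield decoded JSON payloads from 'data:' lines until the [DONE] sentinel."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        yield json.loads(payload)


def summarize_stream(lines):
    """Return the concatenated text and the usage object (or None) of a stream."""
    text_parts, usage = [], None
    for event in parse_sse_lines(lines):
        for choice in event.get("choices", []):
            piece = (choice.get("delta") or {}).get("content")
            if piece:
                text_parts.append(piece)
        if event.get("usage"):
            usage = event["usage"]
    return "".join(text_parts), usage


# Hypothetical sample of streamed lines, for illustration only.
sample_lines = [
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    "",
    'data: {"choices": [{"delta": {"content": "lo"}}]}',
    'data: {"choices": [], "usage": {"total_tokens": 7}}',
    "data: [DONE]",
]
text, usage = summarize_stream(sample_lines)
print(text, usage)
```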
Limitations¶
The following limitations apply when using the OpenAI SDK with Snowflake-hosted models:
Only Chat Completions are supported.
If unset, max_completion_tokens defaults to 4096. The theoretical maximum for the Cortex Chat Completions API is 131,072. Each model has its own output token limits, which may be less than 131,072.
Tool calling is supported for OpenAI and Claude models. For an example that uses tool calling effectively, see Tool calling with chain of thought example.
Audio isn’t supported.
Image understanding is supported for OpenAI and Claude models only. Images are limited to 20 per conversation with a 20 MiB max request size.
Only Claude models support ephemeral cache control points for prompt caching. OpenAI models support implicit caching.
Only Claude models support returning their reasoning details in the response.
Error messages are generated by Snowflake, not OpenAI. Use reported errors for logging and debugging purposes only.
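Some of the numeric limits above can be checked on the client before a request is sent. The following is a minimal sketch, assuming image inputs use the OpenAI content-part shape (`{"type": "image_url", ...}`) and that the 20 MiB limit applies to the serialized JSON body. The function names are hypothetical, and the per-model output token limit is not checked because it varies by model.

```python
import json

MAX_IMAGES_PER_CONVERSATION = 20
MAX_REQUEST_BYTES = 20 * 1024 * 1024  # 20 MiB
MAX_COMPLETION_TOKENS_CEILING = 131072


def count_images(messages):
    """Count image_url content parts across all messages."""
    count = 0
    for message in messages:
        content = message.get("content")
        if isinstance(content, list):
            count += sum(
                1
                for part in content
                if isinstance(part, dict) and part.get("type") == "image_url"
            )
    return count


def preflight_problems(body):
    """Return a list of human-readable problems found in a request body."""
    problems = []
    images = count_images(body.get("messages", []))
    if images > MAX_IMAGES_PER_CONVERSATION:
        problems.append(f"{images} images exceed the limit of {MAX_IMAGES_PER_CONVERSATION}")
    size = len(json.dumps(body).encode("utf-8"))
    if size > MAX_REQUEST_BYTES:
        problems.append(f"request is {size} bytes, above the 20 MiB limit")
    requested = body.get("max_completion_tokens")
    if requested is not None and requested > MAX_COMPLETION_TOKENS_CEILING:
        problems.append(f"max_completion_tokens {requested} exceeds {MAX_COMPLETION_TOKENS_CEILING}")
    return problems
```

A caller would run `preflight_problems(body)` and send the request only when the returned list is empty.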
Detailed compatibility chart¶
The following tables summarize which request and response fields and headers are supported when using the Cortex Chat Completions API with different Snowflake-hosted models.
Field | OpenAI Models | Claude Models | Other Models |
---|---|---|---|
| ✔ Supported | ✔ Supported | ✔ Supported |
| See sub-fields | See sub-fields | See sub-fields |
| ❌ Error | ❌ Ignored | ❌ Ignored |
| ✔ Supported | ✔ Only user/assistant/system | ✔ Only user/assistant/system |
| ✔ Supported | ✔ Supported | ✔ Supported |
| See sub-fields | See sub-fields | See sub-fields |
| ✔ Supported | ✔ Supported | ✔ Supported |
| ✔ Supported | ✔ Supported | ✔ Supported |
| ✔ Supported | ✔ Supported | ❌ Error |
| ❌ Ignored | ✔ Supported (ephemeral only) | ❌ Ignored |
| ❌ Error | ❌ Error | ❌ Ignored |
| ❌ Error | ❌ Ignored | ❌ Ignored |
| ✔ Supported | ❌ Ignored | ❌ Ignored |
| ✔ Supported (deprecated) | ❌ Ignored | ❌ Ignored |
| ✔ Supported | ❌ Ignored | ❌ Ignored |
| ✔ Supported | ❌ Ignored | ❌ Ignored |
| ✔ Supported | ✔ Supported | ❌ Ignored |
| ✔ Supported | ✔ Only | ❌ Ignored |
| ❌ Ignored | ✔ OpenRouter format | ❌ Ignored |
| ❌ Error | ❌ Ignored | ❌ Ignored |
| ✔ Supported | ❌ Ignored | ❌ Ignored |
| ✔ Supported | ❌ Ignored | ❌ Ignored |
| ✔ Supported | ❌ Ignored | ❌ Ignored |
| ❌ Error (deprecated) | ❌ Error (deprecated) | ❌ Error (deprecated) |
| ✔ Supported (4096 default, 131072 max) | ✔ Supported (4096 default, 131072 max) | ✔ Supported (4096 default, 131072 max) |
| ❌ Ignored | ❌ Ignored | ❌ Ignored |
| ❌ Ignored | ❌ Ignored | ❌ Ignored |
| ✔ Supported | ❌ Ignored | ❌ Ignored |
| ✔ Supported | ❌ Ignored | ❌ Ignored |
| ✔ Supported | ❌ Ignored | ❌ Ignored |
| ✔ Supported | ❌ Ignored | ❌ Ignored |
| ✔ Supported | ❌ Ignored | ❌ Ignored |
| ✔ Supported | ❌ Ignored (use | ❌ Ignored |
| See sub-fields | See sub-fields | See sub-fields |
| ✔ Supported (overrides | ✔ Converted to | ❌ Ignored |
| ❌ Ignored | ✔ Supported | ❌ Ignored |
| ✔ Supported | ✔ Only | ❌ Ignored |
| ❌ Ignored | ❌ Ignored | ❌ Ignored |
| ❌ Error | ❌ Error | ❌ Error |
| ✔ Supported | ❌ Ignored | ❌ Ignored |
| ❌ Error | ❌ Error | ❌ Error |
| ✔ Supported | ✔ Supported | ✔ Supported |
| See sub-fields | See sub-fields | See sub-fields |
| ❌ Ignored | ❌ Ignored | ❌ Ignored |
| ✔ Supported | ✔ Supported | ✔ Supported |
| ✔ Supported | ✔ Supported | ✔ Supported |
| ✔ Supported | ✔ Only | ❌ Ignored |
| ✔ Supported | ✔ Only | ❌ Error |
| ✔ Supported | ❌ Ignored | ❌ Ignored |
| ✔ Supported | ✔ Supported | ✔ Supported |
| ✔ Supported | ❌ Ignored | ❌ Ignored |
| ❌ Error | ❌ Ignored | ❌ Ignored |
Field | OpenAI Models | Claude Models | Other Models |
---|---|---|---|
| ✔ Supported | ✔ Supported | ✔ Supported |
| ✔ Supported | ✔ Supported | ✔ Supported |
| ✔ Supported | ✔ Supported | ✔ Supported |
| ✔ Supported | ✔ Supported | ✔ Supported |
| See sub-fields | See sub-fields | See sub-fields |
| ✔ Supported | ✔ Single choice only | ✔ Single choice only |
| ✔ Supported | ❌ Not supported | ✔ Only |
| ✔ Supported | ❌ Not supported | ❌ Not supported |
| See sub-fields | See sub-fields | See sub-fields |
| ✔ Supported | ✔ Supported | ✔ Supported |
| ✔ Supported | ✔ Supported | ✔ Supported |
| ✔ Supported | ❌ Not supported | ❌ Not supported |
| ❌ Not supported | ❌ Not supported | ❌ Not supported |
| ❌ Not supported | ❌ Not supported | ❌ Not supported |
| ✔ Supported | ❌ Not supported | ❌ Not supported |
| ✔ Supported | ✔ Only | ❌ Not supported |
| ❌ Not supported | ✔ OpenRouter format | ❌ Not supported |
| See sub-fields | See sub-fields | See sub-fields |
| ✔ Supported | ✔ Supported | ✔ Supported |
| ✔ Supported | ❌ Not supported | ❌ Not supported |
| ✔ Supported | ❌ Not supported | ❌ Not supported |
| ✔ Supported | ❌ Not supported | ❌ Not supported |
| ✔ Supported | ✔ Only | ❌ Not supported |
| ❌ Not supported | ✔ OpenRouter format | ❌ Not supported |
| See sub-fields | See sub-fields | See sub-fields |
| ✔ Supported | ✔ Supported | ✔ Supported |
| ✔ Supported | ✔ Supported | ✔ Supported |
| ✔ Supported | ✔ Supported | ✔ Supported |
| See sub-fields | See sub-fields | See sub-fields |
| ❌ Not supported | ❌ Not supported | ❌ Not supported |
| ✔ Only cache reads | ✔ Cache read + write | ❌ Not supported |
| See sub-fields | See sub-fields | See sub-fields |
| ✔ Supported | ❌ Not supported | ❌ Not supported |
| ❌ Not supported | ❌ Not supported | ❌ Not supported |
| ✔ Supported | ❌ Not supported | ❌ Not supported |
| ✔ Supported | ❌ Not supported | ❌ Not supported |
| ✔ Supported | ❌ Not supported | ❌ Not supported |
| ✔ Supported | ❌ Not supported | ❌ Not supported |
Header | Support |
---|---|
| ✔ Required |
| ✔ Supported ( |
| ✔ Supported ( |
Header | Support |
---|---|
| ❌ Not supported |
| ❌ Not supported |
| ❌ Not supported |
| ❌ Not supported |
| ❌ Not supported |
| ❌ Not supported |
| ❌ Not supported |
| ❌ Not supported |
| ❌ Not supported |
| ❌ Not supported |
Learn more¶
For more usage examples, see OpenAI’s Chat Completions API reference or the OpenAI Cookbook.
In addition to providing compatibility with the Chat Completions API, Snowflake supports OpenRouter-compatible features for Claude models. These features are exposed as extra fields on the request.
For prompt caching, use the cache_control field. See the OpenRouter prompt caching documentation for more information.
For reasoning tokens, use the reasoning field. See the OpenRouter reasoning documentation for more information.
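To illustrate how these extra fields might appear in a request body, here is a minimal sketch for a Claude model. The field shapes follow the OpenRouter conventions (`cache_control` with type `ephemeral` on a text content part, and a top-level `reasoning` object). The helper name and the `effort` value are assumptions, so check the OpenRouter documentation for the options that apply to your model.

```python
def build_claude_request(model, system_text, user_text, reasoning_effort=None):
    """Build a request body with an ephemeral cache point and optional reasoning."""
    body = {
        "model": model,
        "messages": [
            {
                "role": "system",
                "content": [
                    {
                        "type": "text",
                        "text": system_text,
                        # Marks the end of the cacheable prefix (ephemeral only).
                        "cache_control": {"type": "ephemeral"},
                    }
                ],
            },
            {"role": "user", "content": user_text},
        ],
    }
    if reasoning_effort is not None:
        # Assumed OpenRouter-style shape; not confirmed by this page.
        body["reasoning"] = {"effort": reasoning_effort}
    return body


body = build_claude_request(
    "<model_name>",
    "You are a helpful assistant.",
    "How does a snowflake get its unique pattern?",
    reasoning_effort="high",
)
```

With the OpenAI Python SDK, non-standard top-level fields such as `reasoning` can be passed through the `extra_body` argument of `client.chat.completions.create`, while `cache_control` travels inside `messages`.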