Cortex REST API¶

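The Cortex REST API gives you access to leading frontier models from Anthropic, OpenAI, Meta, Mistral, and others through the endpoint or SDK of your choice. All inference runs inside the Snowflake perimeter, so your data stays secure and within your governance boundary. See the sections below to get started.

Choosing an API¶

The Cortex REST API supports two industry-standard API specifications. Choose the one that best fits your stack.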


	Chat Completions API	Messages API
Compatibility	OpenAI Chat Completions API	Anthropic Messages API
Endpoint	`/api/v2/cortex/v1/chat/completions`	`/api/v2/cortex/v1/messages`
Supported models	All models (OpenAI, Claude, Llama, Mistral, DeepSeek, Snowflake)	Claude models only
SDK support	OpenAI Python and JavaScript SDKs	Anthropic Python SDK
Best for	Most use cases, multi-model flexibility	Existing Anthropic integrations, Anthropic API parity

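Both APIs share the same authentication, model catalog, and rate limits. The only differences are the request and response formats and the models that each endpoint supports. For pricing, see the `Snowflake Service Consumption Table <https://www.snowflake.com/legal-files/CreditConsumptionTable.pdf>`_.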
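As a minimal sketch of the comparison above, the following Python helper builds the full endpoint URL for either API and checks whether a model name can be used with it. The names `ENDPOINT_PATHS`, `endpoint_url`, and `supports_model` are hypothetical, and the `claude-` prefix check is an assumption based on the model names listed on this page.

```python
# Hypothetical helper; the paths are copied from the comparison table above.
ENDPOINT_PATHS = {
    "chat_completions": "/api/v2/cortex/v1/chat/completions",
    "messages": "/api/v2/cortex/v1/messages",
}


def endpoint_url(account_url: str, api: str) -> str:
    """Join the account URL and the endpoint path for the chosen API."""
    return account_url.rstrip("/") + ENDPOINT_PATHS[api]


def supports_model(api: str, model: str) -> bool:
    """Messages API: Claude models only. Chat Completions: all models."""
    if api == "messages":
        # Assumption: Claude model names on this page start with "claude-".
        return model.startswith("claude-")
    return api == "chat_completions"


if __name__ == "__main__":
    print(endpoint_url("https://<account-identifier>.snowflakecomputing.com", "messages"))
    print(supports_model("messages", "llama3.1-8b"))
```

A check like this can fail fast on the client before a request is sent to an endpoint that does not serve the requested model.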

Quickstart¶

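Prerequisites¶

Before you begin, you need the following:

Your **Snowflake account URL** (for example, https://<account-identifier>.snowflakecomputing.com).
A Snowflake programmatic access token (PAT) for authentication. See Generating a programmatic access token.
The name of the model to use in your requests. See Model availability for the available models.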

Chat Completions quickstart¶

The Chat Completions API follows the OpenAI specification, so you can use the OpenAI SDK directly.

from openai import OpenAI

client = OpenAI(
  api_key="<SNOWFLAKE_PAT>",
  base_url="https://<account-identifier>.snowflakecomputing.com/api/v2/cortex/v1"
)

response = client.chat.completions.create(
  model="claude-sonnet-4-5",
  messages=[
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "How does a snowflake get its unique pattern?"}
  ]
)

print(response.choices[0].message.content)

import OpenAI from "openai";

const client = new OpenAI({
  apiKey: "<SNOWFLAKE_PAT>",
  baseURL: "https://<account-identifier>.snowflakecomputing.com/api/v2/cortex/v1"
});

const response = await client.chat.completions.create({
  model: "claude-sonnet-4-5",
  messages: [
    { role: "system", content: "You are a helpful assistant." },
    { role: "user", content: "How does a snowflake get its unique pattern?" }
  ],
});

console.log(response.choices[0].message.content);

curl "https://<account-identifier>.snowflakecomputing.com/api/v2/cortex/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <SNOWFLAKE_PAT>" \
  -d '{
    "model": "claude-sonnet-4-5",
    "messages": [
      {"role": "user", "content": "How does a snowflake get its unique pattern?"}
    ]
  }'

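In the preceding examples, replace the following:

<account-identifier>: Your Snowflake account identifier.
<SNOWFLAKE_PAT>: Your Snowflake programmatic access token (PAT).
model: The model name. See Model availability for the supported models.

Messages API quickstart¶

The Messages API follows the Anthropic specification and supports Claude models only.

By default, the Anthropic SDK sends credentials in the ``x-api-key`` header, but Snowflake expects a ``Bearer`` token. Use an httpx client to set the correct authorization header.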

import httpx
import anthropic

PAT = "<SNOWFLAKE_PAT>"

http_client = httpx.Client(
  headers={"Authorization": f"Bearer {PAT}"},
)

client = anthropic.Anthropic(
  api_key="not-used",
  base_url="https://<account-identifier>.snowflakecomputing.com/api/v2/cortex",
  http_client=http_client,
  default_headers={"Authorization": f"Bearer {PAT}"},
)

response = client.messages.create(
  model="claude-sonnet-4-5",
  max_tokens=1024,
  messages=[
    {"role": "user", "content": "How does a snowflake get its unique pattern?"}
  ],
)

print(response.content[0].text)

As in Python, override the default authentication header with a ``Bearer`` token by using ``defaultHeaders``.

import Anthropic from "@anthropic-ai/sdk";

const PAT = "<SNOWFLAKE_PAT>";

const client = new Anthropic({
  apiKey: "not-used",
  baseURL: "https://<account-identifier>.snowflakecomputing.com/api/v2/cortex",
  defaultHeaders: {
    "Authorization": `Bearer ${PAT}`,
  },
});

const response = await client.messages.create({
  model: "claude-sonnet-4-5",
  max_tokens: 1024,
  messages: [
    { role: "user", content: "How does a snowflake get its unique pattern?" }
  ],
});

console.log(response.content[0].text);

curl "https://<account-identifier>.snowflakecomputing.com/api/v2/cortex/v1/messages" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <SNOWFLAKE_PAT>" \
  -H "anthropic-version: 2023-06-01" \
  -d '{
    "model": "claude-sonnet-4-5",
    "max_tokens": 1024,
    "messages": [
      {"role": "user", "content": "How does a snowflake get its unique pattern?"}
    ]
  }'

In the preceding examples, replace the following:

<account-identifier>: Your Snowflake account identifier.
<SNOWFLAKE_PAT>: Your Snowflake programmatic access token (PAT).
model: The Claude model name. See Model availability for the supported models.

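Setting up authentication¶

To authenticate to the Cortex REST API, you can use any of the methods described in Authenticating Snowflake REST APIs with Snowflake.

Set the Authorization header to include the token (for example, a JSON web token (JWT), an OAuth token, or a programmatic access token).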
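The header described above can be sketched in Python as follows. This is an illustration only: `auth_headers` is a hypothetical helper name, and the block shows just the `Bearer` form used by the curl examples on this page, not every header a given token type might require.

```python
def auth_headers(token: str) -> dict[str, str]:
    """Build request headers that carry the token as a Bearer credential."""
    token = token.strip()
    if not token:
        raise ValueError("token must not be empty")
    return {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    }


if __name__ == "__main__":
    # "<SNOWFLAKE_PAT>" is a placeholder, as elsewhere on this page.
    print(auth_headers("<SNOWFLAKE_PAT>"))
```

Keeping header construction in one place makes it easier to switch token types later without touching each request.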

Tip

Consider creating a dedicated user for Cortex REST API requests.

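Setting up authorization¶

To send a REST API request, your default role must be granted the SNOWFLAKE.CORTEX_USER database role. In most cases, users already have this privilege, because SNOWFLAKE.CORTEX_USER is granted to the PUBLIC role automatically and every role inherits from PUBLIC.

If your Snowflake administrator has revoked this grant, it must be granted again: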

GRANT DATABASE ROLE SNOWFLAKE.CORTEX_USER TO ROLE my_role;
GRANT ROLE my_role TO USER my_user;

Important

REST API requests use the user's default role, so that role must have the required privileges. You can change a user's default role with ALTER USER … SET DEFAULT_ROLE:

ALTER USER my_user SET DEFAULT_ROLE=my_role;

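Model availability¶

The following tables show the models available in the Cortex REST API for each region.


Model	Cross-cloud (any region)	AWS global (cross-region)	AWS US (cross-region)	AWS EU (cross-region)	AWS APJ (cross-region)	Azure global (cross-region)	Azure US (cross-region)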
`claude-sonnet-4-6`	✔	✔	✔	✔
`claude-opus-4-6`	✔	✔
`claude-sonnet-4-5`	✔	✔	✔	✔
`claude-opus-4-5`	✔	✔
`claude-haiku-4-5`	✔	✔	✔
`claude-4-sonnet`	✔	✔	✔	✔
`claude-4-opus`	✔	✔
`claude-3-7-sonnet`	✔	✔
`claude-3-5-sonnet`	✔	✔
`openai-gpt-4.1`	✔				✔
`openai-gpt-5`	*				*	*	*
`openai-gpt-5-mini`	*					*
`openai-gpt-5-nano`	*					*
`openai-gpt-5-chat`	✔
`openai-gpt-oss-120b`	*
`llama4-maverick`	✔	✔
`llama3.1-8b`	✔	✔
`llama3.1-70b`	✔	✔
`llama3.1-405b`	✔	✔
`deepseek-r1`	✔	✔
`mistral-7b`	✔	✔
`mistral-large`	✔	✔
`mistral-large2`	✔	✔
`snowflake-llama-3.3-70b`	✔	✔


Model	AWS US West 2 (Oregon)	AWS US East 1 (N. Virginia)	Azure East US 2 (Virginia)
`claude-3-5-sonnet`	✔	✔
`llama4-maverick`	✔
`llama3.1-8b`	✔	✔	✔
`llama3.1-70b`	✔	✔	✔
`llama3.1-405b`	✔	✔	✔
`deepseek-r1`	✔
`mistral-7b`	✔	✔	✔
`mistral-large`	✔	✔	✔
`mistral-large2`	✔	✔	✔
`snowflake-llama-3.3-70b`	✔


Model	AWS Europe Central 1 (Frankfurt)	AWS Europe West 1 (Ireland)	Azure West Europe (Netherlands)
`llama3.1-8b`	✔		✔
`llama3.1-70b`	✔	✔	✔
`mistral-7b`	✔		✔
`mistral-large`	✔		✔
`mistral-large2`	✔	✔	✔


Model	AWS AP Southeast 2 (Sydney)	AWS AP Northeast 1 (Tokyo)
`llama3.1-8b`	✔	✔
`llama3.1-70b`	✔	✔
`mistral-7b`		✔
`mistral-large`		✔
`mistral-large2`	✔	✔

* Indicates a preview function or model. Preview features are not suitable for production workloads.

You can also use fine-tuned models in any supported region.

Features¶

Streaming¶

Both APIs support streaming responses using `server-sent events <https://developer.mozilla.org/en-US/docs/Web/API/Server-sent_events/Using_server-sent_events>`_.
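The SDKs parse the event stream for you, but it can help to see roughly what arrives on the wire. The sketch below reads server-sent event lines and yields the JSON payload of each `data:` line. It is a simplification: `iter_sse_data` is a hypothetical helper, multi-line `data:` fields are not handled, and the `[DONE]` sentinel and chunk shape in the sample follow the OpenAI Chat Completions convention, which is assumed here rather than confirmed for Cortex.

```python
import json
from typing import Iterable, Iterator


def iter_sse_data(lines: Iterable[str]) -> Iterator[dict]:
    """Yield the decoded JSON payload of each 'data:' line in an SSE stream."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators, comments, and "event:" lines
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":  # assumed end-of-stream sentinel
            break
        yield json.loads(payload)


if __name__ == "__main__":
    # Hand-written sample lines, not captured from a real response.
    sample = [
        'data: {"choices": [{"delta": {"content": "Hel"}}]}',
        "",
        'data: {"choices": [{"delta": {"content": "lo"}}]}',
        "",
        "data: [DONE]",
    ]
    for event in iter_sse_data(sample):
        print(event["choices"][0]["delta"].get("content") or "", end="")
    print()
```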

Chat Completions streaming¶

from openai import OpenAI

client = OpenAI(
  api_key="<SNOWFLAKE_PAT>",
  base_url="https://<account-identifier>.snowflakecomputing.com/api/v2/cortex/v1"
)

response = client.chat.completions.create(
  model="claude-sonnet-4-5",
  messages=[
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "How does a snowflake get its unique pattern?"}
  ],
  stream=True
)

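for chunk in response:
    # Skip chunks that carry no text (for example, role-only or usage-only chunks)
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)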

import OpenAI from "openai";

const client = new OpenAI({
  apiKey: "<SNOWFLAKE_PAT>",
  baseURL: "https://<account-identifier>.snowflakecomputing.com/api/v2/cortex/v1"
});

const stream = await client.chat.completions.create({
  model: "claude-sonnet-4-5",
  messages: [
    { role: "system", content: "You are a helpful assistant." },
    { role: "user", content: "How does a snowflake get its unique pattern?" }
  ],
  stream: true,
});

for await (const event of stream) {
  process.stdout.write(event.choices[0]?.delta?.content || "");
}

curl "https://<account-identifier>.snowflakecomputing.com/api/v2/cortex/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <SNOWFLAKE_PAT>" \
  -d '{
    "model": "claude-sonnet-4-5",
    "messages": [
      {"role": "user", "content": "How does a snowflake get its unique pattern?"}
    ],
    "stream": true,
    "stream_options": {
      "include_usage": true
    }
  }'

Messages API streaming¶

import httpx
import anthropic

PAT = "<SNOWFLAKE_PAT>"

http_client = httpx.Client(
  headers={"Authorization": f"Bearer {PAT}"},
)

client = anthropic.Anthropic(
  api_key="not-used",
  base_url="https://<account-identifier>.snowflakecomputing.com/api/v2/cortex",
  http_client=http_client,
  default_headers={"Authorization": f"Bearer {PAT}"},
)

with client.messages.stream(
  model="claude-sonnet-4-5",
  max_tokens=1024,
  messages=[
    {"role": "user", "content": "How does a snowflake get its unique pattern?"}
  ],
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)

import Anthropic from "@anthropic-ai/sdk";

const PAT = "<SNOWFLAKE_PAT>";

const client = new Anthropic({
  apiKey: "not-used",
  baseURL: "https://<account-identifier>.snowflakecomputing.com/api/v2/cortex",
  defaultHeaders: {
    "Authorization": `Bearer ${PAT}`,
  },
});

const stream = client.messages.stream({
  model: "claude-sonnet-4-5",
  max_tokens: 1024,
  messages: [
    { role: "user", content: "How does a snowflake get its unique pattern?" }
  ],
});

for await (const event of stream) {
  if (event.type === "content_block_delta" && event.delta.type === "text_delta") {
    process.stdout.write(event.delta.text);
  }
}

curl "https://<account-identifier>.snowflakecomputing.com/api/v2/cortex/v1/messages" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <SNOWFLAKE_PAT>" \
  -H "anthropic-version: 2023-06-01" \
  -d '{
    "model": "claude-sonnet-4-5",
    "max_tokens": 1024,
    "stream": true,
    "messages": [
      {"role": "user", "content": "How does a snowflake get its unique pattern?"}
    ]
  }'

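Tool calling¶

Tool calling lets a model invoke external functions during a conversation. The flow works in steps:

You send a request along with a list of available tools.
The model decides to call one or more tools and returns the tool names and arguments.
You execute the tools on your side.
You send the tool results back, and the model generates the final response.

Tool calling is supported for OpenAI and Claude models.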
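The "execute the tools on your side" step can be sketched as a small dispatcher that maps the tool name returned by the model to a local Python function. Everything here is illustrative: `TOOL_REGISTRY` and `run_tool` are hypothetical names, and `get_weather` returns canned data instead of calling a real weather service.

```python
import json
from typing import Callable


def get_weather(location: str) -> dict:
    # Placeholder implementation; a real tool would query a weather service.
    return {"location": location, "temperature": "69°F", "condition": "sunny"}


# Maps tool names (as declared in the request) to local functions.
TOOL_REGISTRY: dict[str, Callable[..., dict]] = {"get_weather": get_weather}


def run_tool(name: str, arguments: str) -> str:
    """Run a tool by name with JSON-encoded arguments; return a JSON string."""
    func = TOOL_REGISTRY.get(name)
    if func is None:
        return json.dumps({"error": f"unknown tool: {name}"})
    try:
        kwargs = json.loads(arguments)
        return json.dumps(func(**kwargs))
    except (json.JSONDecodeError, TypeError) as exc:
        # Malformed JSON or arguments that do not match the function signature.
        return json.dumps({"error": str(exc)})


if __name__ == "__main__":
    print(run_tool("get_weather", '{"location": "San Francisco, CA"}'))
```

Returning errors as JSON instead of raising lets the error text be sent back as the tool result, so the model can react to it. With the Messages API, the tool input already arrives as an object, so it would need `json.dumps` before being passed to this sketch.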

Chat Completions tool calling¶

import json
from openai import OpenAI

client = OpenAI(
  api_key="<SNOWFLAKE_PAT>",
  base_url="https://<account-identifier>.snowflakecomputing.com/api/v2/cortex/v1"
)

tools = [
  {
    "type": "function",
    "function": {
      "name": "get_weather",
      "description": "Get the current weather for a location",
      "parameters": {
        "type": "object",
        "properties": {
          "location": {
            "type": "string",
            "description": "The city and state, e.g. San Francisco, CA"
          }
        },
        "required": ["location"]
      }
    }
  }
]

messages = [
  {"role": "user", "content": "What is the weather like in San Francisco?"}
]

# Step 1: Send the request with tools
response = client.chat.completions.create(
  model="claude-sonnet-4-5",
  messages=messages,
  tools=tools,
)

# Step 2: The model responds with tool_calls
message = response.choices[0].message

if message.tool_calls:
    tool_call = message.tool_calls[0]

    # Step 3: Execute the tool (your implementation)
    result = json.dumps({"temperature": "69°F", "condition": "sunny"})

    # Step 4: Send the tool result back
    messages.append(message)
    messages.append({
      "role": "tool",
      "tool_call_id": tool_call.id,
      "content": result,
    })

    final_response = client.chat.completions.create(
      model="claude-sonnet-4-5",
      messages=messages,
      tools=tools,
    )

    print(final_response.choices[0].message.content)

import OpenAI from "openai";

const client = new OpenAI({
  apiKey: "<SNOWFLAKE_PAT>",
  baseURL: "https://<account-identifier>.snowflakecomputing.com/api/v2/cortex/v1"
});

const tools = [
  {
    type: "function",
    function: {
      name: "get_weather",
      description: "Get the current weather for a location",
      parameters: {
        type: "object",
        properties: {
          location: {
            type: "string",
            description: "The city and state, e.g. San Francisco, CA"
          }
        },
        required: ["location"]
      }
    }
  }
];

const messages = [
  { role: "user", content: "What is the weather like in San Francisco?" }
];

// Step 1: Send the request with tools
const response = await client.chat.completions.create({
  model: "claude-sonnet-4-5",
  messages,
  tools,
});

// Step 2: The model responds with tool_calls
const message = response.choices[0].message;

if (message.tool_calls) {
  const toolCall = message.tool_calls[0];

  // Step 3: Execute the tool (your implementation)
  const result = JSON.stringify({ temperature: "69°F", condition: "sunny" });

  // Step 4: Send the tool result back
  messages.push(message);
  messages.push({
    role: "tool",
    tool_call_id: toolCall.id,
    content: result,
  });

  const finalResponse = await client.chat.completions.create({
    model: "claude-sonnet-4-5",
    messages,
    tools,
  });

  console.log(finalResponse.choices[0].message.content);
}

Step 1: Send the request with tools:

curl "https://<account-identifier>.snowflakecomputing.com/api/v2/cortex/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <SNOWFLAKE_PAT>" \
  -d '{
    "model": "claude-sonnet-4-5",
    "messages": [
      {"role": "user", "content": "What is the weather like in San Francisco?"}
    ],
    "tools": [
      {
        "type": "function",
        "function": {
          "name": "get_weather",
          "description": "Get the current weather for a location",
          "parameters": {
            "type": "object",
            "properties": {
              "location": {
                "type": "string",
                "description": "The city and state, e.g. San Francisco, CA"
              }
            },
            "required": ["location"]
          }
        }
      }
    ]
  }'

The model responds with a tool_calls array:

{
  "choices": [
    {
      "message": {
        "role": "assistant",
        "tool_calls": [
          {
            "id": "call_abc123",
            "type": "function",
            "function": {
              "name": "get_weather",
              "arguments": "{\"location\": \"San Francisco, CA\"}"
            }
          }
        ]
      },
      "finish_reason": "tool_calls"
    }
  ]
}

Step 2: Execute the tool and send the result back:

curl "https://<account-identifier>.snowflakecomputing.com/api/v2/cortex/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <SNOWFLAKE_PAT>" \
  -d '{
    "model": "claude-sonnet-4-5",
    "messages": [
      {"role": "user", "content": "What is the weather like in San Francisco?"},
      {
        "role": "assistant",
        "tool_calls": [
          {
            "id": "call_abc123",
            "type": "function",
            "function": {
              "name": "get_weather",
              "arguments": "{\"location\": \"San Francisco, CA\"}"
            }
          }
        ]
      },
      {
        "role": "tool",
        "tool_call_id": "call_abc123",
        "content": "{\"temperature\": \"69°F\", \"condition\": \"sunny\"}"
      }
    ],
    "tools": [
      {
        "type": "function",
        "function": {
          "name": "get_weather",
          "description": "Get the current weather for a location",
          "parameters": {
            "type": "object",
            "properties": {
              "location": {
                "type": "string",
                "description": "The city and state, e.g. San Francisco, CA"
              }
            },
            "required": ["location"]
          }
        }
      }
    ]
  }'

Messages API tool calling¶

import json
import httpx
import anthropic

PAT = "<SNOWFLAKE_PAT>"

http_client = httpx.Client(
  headers={"Authorization": f"Bearer {PAT}"},
)

client = anthropic.Anthropic(
  api_key="not-used",
  base_url="https://<account-identifier>.snowflakecomputing.com/api/v2/cortex",
  http_client=http_client,
  default_headers={"Authorization": f"Bearer {PAT}"},
)

tools = [
  {
    "name": "get_weather",
    "description": "Get the current weather for a location",
    "input_schema": {
      "type": "object",
      "properties": {
        "location": {
          "type": "string",
          "description": "The city and state, e.g. San Francisco, CA"
        }
      },
      "required": ["location"]
    }
  }
]

messages = [
  {"role": "user", "content": "What is the weather like in San Francisco?"}
]

# Step 1: Send the request with tools
response = client.messages.create(
  model="claude-sonnet-4-5",
  max_tokens=1024,
  messages=messages,
  tools=tools,
)

# Step 2: The model responds with a tool_use block
if response.stop_reason == "tool_use":
    tool_use = next(b for b in response.content if b.type == "tool_use")

    # Step 3: Execute the tool (your implementation)
    result = json.dumps({"temperature": "69°F", "condition": "sunny"})

    # Step 4: Send the tool result back
    messages.append({"role": "assistant", "content": response.content})
    messages.append({
      "role": "user",
      "content": [
        {
          "type": "tool_result",
          "tool_use_id": tool_use.id,
          "content": result,
        }
      ],
    })

    final_response = client.messages.create(
      model="claude-sonnet-4-5",
      max_tokens=1024,
      messages=messages,
      tools=tools,
    )

    print(final_response.content[0].text)

import Anthropic from "@anthropic-ai/sdk";

const PAT = "<SNOWFLAKE_PAT>";

const client = new Anthropic({
  apiKey: "not-used",
  baseURL: "https://<account-identifier>.snowflakecomputing.com/api/v2/cortex",
  defaultHeaders: {
    "Authorization": `Bearer ${PAT}`,
  },
});

const tools = [
  {
    name: "get_weather",
    description: "Get the current weather for a location",
    input_schema: {
      type: "object",
      properties: {
        location: {
          type: "string",
          description: "The city and state, e.g. San Francisco, CA"
        }
      },
      required: ["location"]
    }
  }
];

const messages = [
  { role: "user", content: "What is the weather like in San Francisco?" }
];

// Step 1: Send the request with tools
const response = await client.messages.create({
  model: "claude-sonnet-4-5",
  max_tokens: 1024,
  messages,
  tools,
});

// Step 2: The model responds with a tool_use block
if (response.stop_reason === "tool_use") {
  const toolUse = response.content.find(b => b.type === "tool_use");

  // Step 3: Execute the tool (your implementation)
  const result = JSON.stringify({ temperature: "69°F", condition: "sunny" });

  // Step 4: Send the tool result back
  messages.push({ role: "assistant", content: response.content });
  messages.push({
    role: "user",
    content: [
      {
        type: "tool_result",
        tool_use_id: toolUse.id,
        content: result,
      }
    ],
  });

  const finalResponse = await client.messages.create({
    model: "claude-sonnet-4-5",
    max_tokens: 1024,
    messages,
    tools,
  });

  console.log(finalResponse.content[0].text);
}

Step 1: Send the request with tools:

curl "https://<account-identifier>.snowflakecomputing.com/api/v2/cortex/v1/messages" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <SNOWFLAKE_PAT>" \
  -H "anthropic-version: 2023-06-01" \
  -d '{
    "model": "claude-sonnet-4-5",
    "max_tokens": 1024,
    "messages": [
      {"role": "user", "content": "What is the weather like in San Francisco?"}
    ],
    "tools": [
      {
        "name": "get_weather",
        "description": "Get the current weather for a location",
        "input_schema": {
          "type": "object",
          "properties": {
            "location": {
              "type": "string",
              "description": "The city and state, e.g. San Francisco, CA"
            }
          },
          "required": ["location"]
        }
      }
    ]
  }'

The model responds with a tool_use content block:

{
  "role": "assistant",
  "content": [
    {
      "type": "text",
      "text": "I'll check the weather for you."
    },
    {
      "type": "tool_use",
      "id": "toolu_abc123",
      "name": "get_weather",
      "input": {"location": "San Francisco, CA"}
    }
  ],
  "stop_reason": "tool_use"
}

Step 2: Execute the tool and send the result back:

curl "https://<account-identifier>.snowflakecomputing.com/api/v2/cortex/v1/messages" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <SNOWFLAKE_PAT>" \
  -H "anthropic-version: 2023-06-01" \
  -d '{
    "model": "claude-sonnet-4-5",
    "max_tokens": 1024,
    "messages": [
      {"role": "user", "content": "What is the weather like in San Francisco?"},
      {
        "role": "assistant",
        "content": [
          {"type": "text", "text": "I'\''ll check the weather for you."},
          {
            "type": "tool_use",
            "id": "toolu_abc123",
            "name": "get_weather",
            "input": {"location": "San Francisco, CA"}
          }
        ]
      },
      {
        "role": "user",
        "content": [
          {
            "type": "tool_result",
            "tool_use_id": "toolu_abc123",
            "content": "{\"temperature\": \"69°F\", \"condition\": \"sunny\"}"
          }
        ]
      }
    ],
    "tools": [
      {
        "name": "get_weather",
        "description": "Get the current weather for a location",
        "input_schema": {
          "type": "object",
          "properties": {
            "location": {
              "type": "string",
              "description": "The city and state, e.g. San Francisco, CA"
            }
          },
          "required": ["location"]
        }
      }
    ]
  }'

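Structured output¶

You can request structured JSON output that conforms to a specific schema. This is supported for OpenAI and Claude models through the Chat Completions API. For the Messages API, use the tool_use pattern to enforce structured output.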
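To show how the two approaches relate, the sketch below wraps a single JSON schema in both shapes: a `response_format` value for the Chat Completions API, and a forced tool definition for the Messages API. The helper names `as_response_format` and `as_forced_tool` are hypothetical; the field names mirror the request bodies used on this page.

```python
# Example schema; the same dictionary is reused for both APIs.
PEOPLE_SCHEMA = {
    "type": "object",
    "properties": {
        "people": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "age": {"type": "number"},
                },
                "required": ["name", "age"],
            },
        }
    },
    "required": ["people"],
}


def as_response_format(name: str, schema: dict) -> dict:
    """Chat Completions: pass the result as the response_format field."""
    return {"type": "json_schema", "json_schema": {"name": name, "schema": schema}}


def as_forced_tool(name: str, description: str, schema: dict) -> dict:
    """Messages API: a tool whose input_schema is the output schema,
    plus a tool_choice that forces the model to call it."""
    return {
        "tools": [{"name": name, "description": description, "input_schema": schema}],
        "tool_choice": {"type": "tool", "name": name},
    }


if __name__ == "__main__":
    print(as_response_format("people_data", PEOPLE_SCHEMA))
    print(as_forced_tool("people_data", "Output a list of people.", PEOPLE_SCHEMA))
```

With the SDK clients, the first result would be passed as `response_format=...`, and the second could be unpacked into the call as keyword arguments.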

Chat Completions structured output¶

Use the response_format field with a JSON schema to constrain the model's output.

import json
from openai import OpenAI

client = OpenAI(
  api_key="<SNOWFLAKE_PAT>",
  base_url="https://<account-identifier>.snowflakecomputing.com/api/v2/cortex/v1"
)

response = client.chat.completions.create(
  model="claude-sonnet-4-5",
  messages=[
    {"role": "user", "content": "Create a dataset of 3 people with their names and ages."}
  ],
  response_format={
    "type": "json_schema",
    "json_schema": {
      "name": "people_data",
      "schema": {
        "type": "object",
        "properties": {
          "people": {
            "type": "array",
            "items": {
              "type": "object",
              "properties": {
                "name": {"type": "string"},
                "age": {"type": "number"}
              },
              "required": ["name", "age"]
            }
          }
        },
        "required": ["people"]
      }
    }
  }
)

data = json.loads(response.choices[0].message.content)
print(data)

import OpenAI from "openai";

const client = new OpenAI({
  apiKey: "<SNOWFLAKE_PAT>",
  baseURL: "https://<account-identifier>.snowflakecomputing.com/api/v2/cortex/v1"
});

const response = await client.chat.completions.create({
  model: "claude-sonnet-4-5",
  messages: [
    { role: "user", content: "Create a dataset of 3 people with their names and ages." }
  ],
  response_format: {
    type: "json_schema",
    json_schema: {
      name: "people_data",
      schema: {
        type: "object",
        properties: {
          people: {
            type: "array",
            items: {
              type: "object",
              properties: {
                name: { type: "string" },
                age: { type: "number" }
              },
              required: ["name", "age"]
            }
          }
        },
        required: ["people"]
      }
    }
  }
});

const data = JSON.parse(response.choices[0].message.content);
console.log(data);

curl "https://<account-identifier>.snowflakecomputing.com/api/v2/cortex/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <SNOWFLAKE_PAT>" \
  -d '{
    "model": "claude-sonnet-4-5",
    "messages": [
      {"role": "user", "content": "Create a dataset of 3 people with their names and ages."}
    ],
    "response_format": {
      "type": "json_schema",
      "json_schema": {
        "name": "people_data",
        "schema": {
          "type": "object",
          "properties": {
            "people": {
              "type": "array",
              "items": {
                "type": "object",
                "properties": {
                  "name": {"type": "string"},
                  "age": {"type": "number"}
                },
                "required": ["name", "age"]
              }
            }
          },
          "required": ["people"]
        }
      }
    }
  }'

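Note

Claude models support only ``json_schema`` as the response format type. OpenAI models support additional response format types, as described in the `OpenAI API reference <https://platform.openai.com/docs/api-reference/chat/create>`_.

Messages API structured output¶

The Messages API has no response_format field. Instead, define a tool whose input schema is the output schema you want, and instruct the model to use it. The model's tool_use response contains structured JSON that matches the schema.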

import json
import httpx
import anthropic

PAT = "<SNOWFLAKE_PAT>"

http_client = httpx.Client(
  headers={"Authorization": f"Bearer {PAT}"},
)

client = anthropic.Anthropic(
  api_key="not-used",
  base_url="https://<account-identifier>.snowflakecomputing.com/api/v2/cortex",
  http_client=http_client,
  default_headers={"Authorization": f"Bearer {PAT}"},
)

response = client.messages.create(
  model="claude-sonnet-4-5",
  max_tokens=1024,
  messages=[
    {"role": "user", "content": "Create a dataset of 3 people with their names and ages."}
  ],
  tools=[
    {
      "name": "people_data",
      "description": "Output a list of people with names and ages.",
      "input_schema": {
        "type": "object",
        "properties": {
          "people": {
            "type": "array",
            "items": {
              "type": "object",
              "properties": {
                "name": {"type": "string"},
                "age": {"type": "number"}
              },
              "required": ["name", "age"]
            }
          }
        },
        "required": ["people"]
      }
    }
  ],
  tool_choice={"type": "tool", "name": "people_data"},
)

# Extract the structured data from the tool_use block
tool_use = next(b for b in response.content if b.type == "tool_use")
print(tool_use.input)

import Anthropic from "@anthropic-ai/sdk";

const PAT = "<SNOWFLAKE_PAT>";

const client = new Anthropic({
  apiKey: "not-used",
  baseURL: "https://<account-identifier>.snowflakecomputing.com/api/v2/cortex",
  defaultHeaders: {
    "Authorization": `Bearer ${PAT}`,
  },
});

const response = await client.messages.create({
  model: "claude-sonnet-4-5",
  max_tokens: 1024,
  messages: [
    { role: "user", content: "Create a dataset of 3 people with their names and ages." }
  ],
  tools: [
    {
      name: "people_data",
      description: "Output a list of people with names and ages.",
      input_schema: {
        type: "object",
        properties: {
          people: {
            type: "array",
            items: {
              type: "object",
              properties: {
                name: { type: "string" },
                age: { type: "number" }
              },
              required: ["name", "age"]
            }
          }
        },
        required: ["people"]
      }
    }
  ],
  tool_choice: { type: "tool", name: "people_data" },
});

// Extract the structured data from the tool_use block
const toolUse = response.content.find(b => b.type === "tool_use");
console.log(toolUse.input);

curl "https://<account-identifier>.snowflakecomputing.com/api/v2/cortex/v1/messages" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <SNOWFLAKE_PAT>" \
  -H "anthropic-version: 2023-06-01" \
  -d '{
    "model": "claude-sonnet-4-5",
    "max_tokens": 1024,
    "messages": [
      {"role": "user", "content": "Create a dataset of 3 people with their names and ages."}
    ],
    "tools": [
      {
        "name": "people_data",
        "description": "Output a list of people with names and ages.",
        "input_schema": {
          "type": "object",
          "properties": {
            "people": {
              "type": "array",
              "items": {
                "type": "object",
                "properties": {
                  "name": {"type": "string"},
                  "age": {"type": "number"}
                },
                "required": ["name", "age"]
              }
            }
          },
          "required": ["people"]
        }
      }
    ],
    "tool_choice": {"type": "tool", "name": "people_data"}
  }'

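Image input¶

You can include images in requests to models that support vision. Images must be provided as base64-encoded strings. Images are subject to the maximum request size of 20 MiB and to a limit of 20 images per conversation.

Image input is supported for the following:

Claude models (claude-3-7-sonnet and later)
OpenAI models (openai-gpt-4.1, openai-gpt-5, openai-gpt-5-chat, openai-gpt-5-mini, openai-gpt-5-nano)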
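A client-side pre-check for the limits above might look like the following sketch. The names `to_data_url` and `check_images` are hypothetical, and the size check is only a lower bound: it counts the base64-encoded image bytes and ignores the rest of the JSON request body.

```python
import base64

MAX_REQUEST_BYTES = 20 * 1024 * 1024  # 20 MiB request limit stated above
MAX_IMAGES = 20  # per-conversation image limit stated above


def to_data_url(image_bytes: bytes, media_type: str = "image/png") -> str:
    """Encode raw image bytes as a base64 data URL."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{media_type};base64,{encoded}"


def check_images(images: list[bytes]) -> None:
    """Raise ValueError if the images clearly exceed the documented limits."""
    if len(images) > MAX_IMAGES:
        raise ValueError(f"too many images: {len(images)} > {MAX_IMAGES}")
    # Base64 emits 4 output bytes for every 3 input bytes, rounded up.
    encoded_size = sum(4 * ((len(img) + 2) // 3) for img in images)
    if encoded_size > MAX_REQUEST_BYTES:
        raise ValueError(f"encoded images total {encoded_size} bytes, over the request limit")


if __name__ == "__main__":
    check_images([b"abc"])
    print(to_data_url(b"abc"))
```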

Chat Completions image input¶

import base64
from openai import OpenAI

client = OpenAI(
  api_key="<SNOWFLAKE_PAT>",
  base_url="https://<account-identifier>.snowflakecomputing.com/api/v2/cortex/v1"
)

# Read and encode an image file
with open("image.png", "rb") as f:
    image_data = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
  model="claude-sonnet-4-5",
  messages=[
    {
      "role": "user",
      "content": [
        {
          "type": "image_url",
          "image_url": {
            "url": f"data:image/png;base64,{image_data}"
          }
        },
        {
          "type": "text",
          "text": "What is in this image?"
        }
      ]
    }
  ]
)

print(response.choices[0].message.content)

import OpenAI from "openai";
import fs from "fs";

const client = new OpenAI({
  apiKey: "<SNOWFLAKE_PAT>",
  baseURL: "https://<account-identifier>.snowflakecomputing.com/api/v2/cortex/v1"
});

// Read and encode an image file
const imageData = fs.readFileSync("image.png").toString("base64");

const response = await client.chat.completions.create({
  model: "claude-sonnet-4-5",
  messages: [
    {
      role: "user",
      content: [
        {
          type: "image_url",
          image_url: {
            url: `data:image/png;base64,${imageData}`
          }
        },
        {
          type: "text",
          text: "What is in this image?"
        }
      ]
    }
  ],
});

console.log(response.choices[0].message.content);

curl "https://<account-identifier>.snowflakecomputing.com/api/v2/cortex/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <SNOWFLAKE_PAT>" \
  -d '{
    "model": "claude-sonnet-4-5",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "image_url",
            "image_url": {
              "url": "data:image/png;base64,<BASE64_IMAGE_DATA>"
            }
          },
          {
            "type": "text",
            "text": "What is in this image?"
          }
        ]
      }
    ]
  }'

Messages API image input¶

The Messages API uses a different image format: instead of a data URL, it takes a source block with type, media_type, and data fields.

import base64
import httpx
import anthropic

PAT = "<SNOWFLAKE_PAT>"

http_client = httpx.Client(
  headers={"Authorization": f"Bearer {PAT}"},
)

client = anthropic.Anthropic(
  api_key="not-used",
  base_url="https://<account-identifier>.snowflakecomputing.com/api/v2/cortex",
  http_client=http_client,
  default_headers={"Authorization": f"Bearer {PAT}"},
)

# Read and encode an image file
with open("image.png", "rb") as f:
    image_data = base64.b64encode(f.read()).decode("utf-8")

response = client.messages.create(
  model="claude-sonnet-4-5",
  max_tokens=1024,
  messages=[
    {
      "role": "user",
      "content": [
        {
          "type": "image",
          "source": {
            "type": "base64",
            "media_type": "image/png",
            "data": image_data
          }
        },
        {
          "type": "text",
          "text": "What is in this image?"
        }
      ]
    }
  ],
)

print(response.content[0].text)

import Anthropic from "@anthropic-ai/sdk";
import fs from "fs";

const PAT = "<SNOWFLAKE_PAT>";

const client = new Anthropic({
  apiKey: "not-used",
  baseURL: "https://<account-identifier>.snowflakecomputing.com/api/v2/cortex",
  defaultHeaders: {
    "Authorization": `Bearer ${PAT}`,
  },
});

// Read and encode an image file
const imageData = fs.readFileSync("image.png").toString("base64");

const response = await client.messages.create({
  model: "claude-sonnet-4-5",
  max_tokens: 1024,
  messages: [
    {
      role: "user",
      content: [
        {
          type: "image",
          source: {
            type: "base64",
            media_type: "image/png",
            data: imageData
          }
        },
        {
          type: "text",
          text: "What is in this image?"
        }
      ]
    }
  ],
});

console.log(response.content[0].text);

curl "https://<account-identifier>.snowflakecomputing.com/api/v2/cortex/v1/messages" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <SNOWFLAKE_PAT>" \
  -H "anthropic-version: 2023-06-01" \
  -d '{
    "model": "claude-sonnet-4-5",
    "max_tokens": 1024,
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "image",
            "source": {
              "type": "base64",
              "media_type": "image/png",
              "data": "<BASE64_IMAGE_DATA>"
            }
          },
          {
            "type": "text",
            "text": "What is in this image?"
          }
        ]
      }
    ]
  }'
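
The two image formats differ only in how the same base64 payload is wrapped. As a minimal sketch, the helpers below build each shape from raw bytes; the function names `image_url_part` and `image_source_block` are illustrative and not part of either SDK.

```python
import base64


def image_url_part(image_bytes: bytes, media_type: str = "image/png") -> dict:
    """Chat Completions shape: the image travels inside a data URL."""
    encoded = base64.b64encode(image_bytes).decode("utf-8")
    return {
        "type": "image_url",
        "image_url": {"url": f"data:{media_type};base64,{encoded}"},
    }


def image_source_block(image_bytes: bytes, media_type: str = "image/png") -> dict:
    """Messages API shape: a source block with type, media_type, and data."""
    encoded = base64.b64encode(image_bytes).decode("utf-8")
    return {
        "type": "image",
        "source": {"type": "base64", "media_type": media_type, "data": encoded},
    }
```

Either result can be placed in the content array of a user message, next to a text part, as in the examples above.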

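Prompt caching¶

Prompt caching reduces latency and cost by reusing previously processed context (for example, a large system prompt, documents, or conversation history) across requests.

OpenAI models: Caching is **implicit**. Prompts with 1,024 or more tokens are cached automatically, so no request changes are needed.
Claude models: Caching is **explicit**. Add a cache_control breakpoint to each content block you want to cache. Only the ephemeral cache type is supported, with a **5-minute TTL**. You can set up to 4 cache breakpoints per request.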
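
For Claude models, the breakpoint rules above can be enforced on the client before a request is sent. The following is a minimal sketch, not part of any SDK: `mark_cacheable` is a hypothetical helper that adds an ephemeral `cache_control` entry to selected content blocks and rejects a request that would carry more than four breakpoints.

```python
from collections.abc import Iterable

MAX_CACHE_BREAKPOINTS = 4  # per-request limit stated above


def mark_cacheable(blocks: list[dict], indexes: Iterable[int]) -> list[dict]:
    """Return copies of blocks with an ephemeral cache_control on the chosen indexes."""
    wanted = set(indexes)
    marked = []
    for i, block in enumerate(blocks):
        block = dict(block)  # shallow copy so the caller's blocks stay untouched
        if i in wanted:
            block["cache_control"] = {"type": "ephemeral"}
        marked.append(block)
    # Count breakpoints that were already present as well as the new ones.
    total = sum("cache_control" in b for b in marked)
    if total > MAX_CACHE_BREAKPOINTS:
        raise ValueError(f"{total} cache breakpoints exceed the limit of {MAX_CACHE_BREAKPOINTS}")
    return marked
```

The returned list can be used as the content of a system or user message.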

Chat Completions prompt caching¶

For Claude models through Chat Completions, add ``cache_control`` to content blocks. OpenAI models cache automatically and do not need this field.

from openai import OpenAI

client = OpenAI(
  api_key="<SNOWFLAKE_PAT>",
  base_url="https://<account-identifier>.snowflakecomputing.com/api/v2/cortex/v1"
)

response = client.chat.completions.create(
  model="claude-sonnet-4-5",
  messages=[
    {
      "role": "system",
      "content": [
        {
          "type": "text",
          "text": "<long system prompt to cache>",
          "cache_control": {"type": "ephemeral"}
        }
      ]
    },
    {"role": "user", "content": "Summarize the key points."}
  ]
)

print(response.choices[0].message.content)

import OpenAI from "openai";

const client = new OpenAI({
  apiKey: "<SNOWFLAKE_PAT>",
  baseURL: "https://<account-identifier>.snowflakecomputing.com/api/v2/cortex/v1"
});

const response = await client.chat.completions.create({
  model: "claude-sonnet-4-5",
  messages: [
    {
      role: "system",
      content: [
        {
          type: "text",
          text: "<long system prompt to cache>",
          cache_control: { type: "ephemeral" }
        }
      ]
    },
    { role: "user", content: "Summarize the key points." }
  ],
});

console.log(response.choices[0].message.content);

curl "https://<account-identifier>.snowflakecomputing.com/api/v2/cortex/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <SNOWFLAKE_PAT>" \
  -d '{
    "model": "claude-sonnet-4-5",
    "messages": [
      {
        "role": "system",
        "content": [
          {
            "type": "text",
            "text": "<long system prompt to cache>",
            "cache_control": {"type": "ephemeral"}
          }
        ]
      },
      {"role": "user", "content": "Summarize the key points."}
    ]
  }'

Messages API prompt caching¶

Use ``cache_control`` on system or user content blocks. Only the ``ephemeral`` cache type is supported, with a 5-minute TTL. You can set up to 4 cache breakpoints per request.

import httpx
import anthropic

PAT = "<SNOWFLAKE_PAT>"

http_client = httpx.Client(
  headers={"Authorization": f"Bearer {PAT}"},
)

client = anthropic.Anthropic(
  api_key="not-used",
  base_url="https://<account-identifier>.snowflakecomputing.com/api/v2/cortex",
  http_client=http_client,
  default_headers={"Authorization": f"Bearer {PAT}"},
)

response = client.messages.create(
  model="claude-sonnet-4-5",
  max_tokens=1024,
  system=[
    {
      "type": "text",
      "text": "<long system prompt to cache>",
      "cache_control": {"type": "ephemeral"}
    }
  ],
  messages=[
    {"role": "user", "content": "Summarize the key points."}
  ],
)

print(response.content[0].text)

import Anthropic from "@anthropic-ai/sdk";

const PAT = "<SNOWFLAKE_PAT>";

const client = new Anthropic({
  apiKey: "not-used",
  baseURL: "https://<account-identifier>.snowflakecomputing.com/api/v2/cortex",
  defaultHeaders: {
    "Authorization": `Bearer ${PAT}`,
  },
});

const response = await client.messages.create({
  model: "claude-sonnet-4-5",
  max_tokens: 1024,
  system: [
    {
      type: "text",
      text: "<long system prompt to cache>",
      cache_control: { type: "ephemeral" }
    }
  ],
  messages: [
    { role: "user", content: "Summarize the key points." }
  ],
});

console.log(response.content[0].text);

curl "https://<account-identifier>.snowflakecomputing.com/api/v2/cortex/v1/messages" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <SNOWFLAKE_PAT>" \
  -H "anthropic-version: 2023-06-01" \
  -d '{
    "model": "claude-sonnet-4-5",
    "max_tokens": 1024,
    "system": [
      {
        "type": "text",
        "text": "<long system prompt to cache>",
        "cache_control": {"type": "ephemeral"}
      }
    ],
    "messages": [
      {"role": "user", "content": "Summarize the key points."}
    ]
  }'

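Note

Anthropic prompt caching has a **5-minute TTL**. Cached content that is not accessed within 5 minutes is evicted. OpenAI prompt caching is implicit and managed automatically; it does not require the cache_control field.

Thinking and reasoning¶

Chat Completions thinking¶

For Claude models, use the reasoning object. For OpenAI reasoning models, use the reasoning_effort field (values: minimal, low, medium, high).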

from openai import OpenAI

client = OpenAI(
  api_key="<SNOWFLAKE_PAT>",
  base_url="https://<account-identifier>.snowflakecomputing.com/api/v2/cortex/v1"
)

# Claude models — use the reasoning object
response = client.chat.completions.create(
  model="claude-sonnet-4-5",
  messages=[
    {"role": "user", "content": "Are there an infinite number of prime numbers such that n mod 4 == 3?"}
  ],
  extra_body={
    "reasoning": {"effort": "high"}
  }
)

print(response.choices[0].message.content)

import OpenAI from "openai";

const client = new OpenAI({
  apiKey: "<SNOWFLAKE_PAT>",
  baseURL: "https://<account-identifier>.snowflakecomputing.com/api/v2/cortex/v1"
});

// Claude models — use the reasoning object
const response = await client.chat.completions.create({
  model: "claude-sonnet-4-5",
  messages: [
    { role: "user", content: "Are there an infinite number of prime numbers such that n mod 4 == 3?" }
  ],
  reasoning: { effort: "high" },
});

console.log(response.choices[0].message.content);

# Claude models — use the reasoning object
curl "https://<account-identifier>.snowflakecomputing.com/api/v2/cortex/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <SNOWFLAKE_PAT>" \
  -d '{
    "model": "claude-sonnet-4-5",
    "messages": [
      {"role": "user", "content": "Are there an infinite number of prime numbers such that n mod 4 == 3?"}
    ],
    "reasoning": {
      "effort": "high"
    }
  }'

# OpenAI reasoning models — use reasoning_effort
curl "https://<account-identifier>.snowflakecomputing.com/api/v2/cortex/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <SNOWFLAKE_PAT>" \
  -d '{
    "model": "openai-gpt-5",
    "messages": [
      {"role": "user", "content": "Are there an infinite number of prime numbers such that n mod 4 == 3?"}
    ],
    "reasoning_effort": "high"
  }'
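
Because the field name depends on the model family, an application that switches between models may want to pick it in one place. The sketch below assumes that the family can be recognized from the `claude-` and `openai-` name prefixes used in this document; `reasoning_options` is a hypothetical helper, not an SDK function.

```python
OPENAI_EFFORTS = {"minimal", "low", "medium", "high"}  # values listed above


def reasoning_options(model: str, effort: str = "high") -> dict:
    """Return the extra request fields that enable reasoning for the given model."""
    if model.startswith("claude-"):
        # Claude models take the reasoning object.
        return {"reasoning": {"effort": effort}}
    if model.startswith("openai-"):
        if effort not in OPENAI_EFFORTS:
            raise ValueError(f"unsupported reasoning_effort: {effort!r}")
        # OpenAI reasoning models take the flat reasoning_effort field.
        return {"reasoning_effort": effort}
    # Other model families ignore both fields, so send nothing.
    return {}
```

With the OpenAI Python SDK, the result can be passed as `extra_body=reasoning_options(model)`, which merges the fields into the request body as in the Python example above.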

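Messages API thinking¶

Some Claude models support **adaptive thinking**, in which the model adjusts the amount of reasoning it applies to the complexity of the task. The following models support adaptive thinking:

claude-opus-4-6

For the Messages API, enable adaptive thinking with the ``thinking`` parameter set to ``type: "adaptive"``. The output_config.effort parameter provides high-level control over thinking depth and accepts the following values:


Effort level	Behavior
`max`	Always thinks, with no constraint on thinking depth. Claude Opus 4.6 only.
`high` (default)	Always thinks. Provides deep reasoning for complex tasks.
`medium`	Thinks moderately. May skip thinking for very simple queries.
`low`	Minimizes thinking. Skips thinking for simple tasks where speed matters most.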
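
As a rough sketch of how an effort level could be attached to a request, the helper below builds the two request-body fields involved. It assumes that `output_config` is a top-level field of the Messages API request body, as the parameter name output_config.effort suggests; `adaptive_thinking_fields` is a hypothetical name.

```python
EFFORT_LEVELS = ("low", "medium", "high", "max")  # values from the table above


def adaptive_thinking_fields(effort: str = "high") -> dict:
    """Build the request-body fields for adaptive thinking at a given effort level."""
    if effort not in EFFORT_LEVELS:
        raise ValueError(f"effort must be one of {EFFORT_LEVELS}, got {effort!r}")
    return {
        "thinking": {"type": "adaptive"},
        "output_config": {"effort": effort},  # assumed top-level placement
    }
```

The returned dictionary is meant to be merged into the JSON body alongside model, max_tokens, and messages.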

The following example shows how to make a Messages API call with adaptive thinking enabled.

import httpx
import anthropic

PAT = "<SNOWFLAKE_PAT>"

http_client = httpx.Client(
  headers={"Authorization": f"Bearer {PAT}"},
)

client = anthropic.Anthropic(
  api_key="not-used",
  base_url="https://<account-identifier>.snowflakecomputing.com/api/v2/cortex",
  http_client=http_client,
  default_headers={"Authorization": f"Bearer {PAT}"},
)

response = client.messages.create(
  model="claude-opus-4-6",
  max_tokens=16384,
  thinking={
    "type": "adaptive"
  },
  messages=[
    {"role": "user", "content": "Are there an infinite number of prime numbers such that n mod 4 == 3?"}
  ],
)

# The response includes thinking blocks followed by text
for block in response.content:
    if block.type == "thinking":
        print(f"Thinking: {block.thinking[:100]}...")
    elif block.type == "text":
        print(f"Answer: {block.text}")

import Anthropic from "@anthropic-ai/sdk";

const PAT = "<SNOWFLAKE_PAT>";

const client = new Anthropic({
  apiKey: "not-used",
  baseURL: "https://<account-identifier>.snowflakecomputing.com/api/v2/cortex",
  defaultHeaders: {
    "Authorization": `Bearer ${PAT}`,
  },
});

const response = await client.messages.create({
  model: "claude-opus-4-6",
  max_tokens: 16384,
  thinking: {
    type: "adaptive"
  },
  messages: [
    { role: "user", content: "Are there an infinite number of prime numbers such that n mod 4 == 3?" }
  ],
});

// The response includes thinking blocks followed by text
for (const block of response.content) {
  if (block.type === "thinking") {
    console.log(`Thinking: ${block.thinking.slice(0, 100)}...`);
  } else if (block.type === "text") {
    console.log(`Answer: ${block.text}`);
  }
}

curl "https://<account-identifier>.snowflakecomputing.com/api/v2/cortex/v1/messages" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <SNOWFLAKE_PAT>" \
  -H "anthropic-version: 2023-06-01" \
  -d '{
    "model": "claude-opus-4-6",
    "max_tokens": 16384,
    "thinking": {
      "type": "adaptive"
    },
    "messages": [
      {"role": "user", "content": "Are there an infinite number of prime numbers such that n mod 4 == 3?"}
    ]
  }'

The response contains thinking blocks with summarized thinking and a thinking signature. Pass these blocks back in multi-turn conversations to preserve reasoning context.

{
  "role": "assistant",
  "content": [
    {"type": "thinking", "thinking": "<thinking>", "signature": "<signature>"},
    {"type": "text", "text": "Yes, there are infinitely many primes p where p ≡ 3 (mod 4)..."}
  ]
}
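
A minimal sketch of passing the blocks back, working on plain dictionaries such as the JSON above: `append_assistant_turn` is a hypothetical helper that replays the assistant content unchanged, thinking block and signature included, and then adds the next user message.

```python
def append_assistant_turn(messages: list[dict], assistant_content: list[dict],
                          next_user_text: str) -> list[dict]:
    """Return a new history: prior messages, the unmodified assistant turn, the follow-up."""
    return [
        *messages,
        # Keep every block as returned; the thinking block and its signature go back as-is.
        {"role": "assistant", "content": list(assistant_content)},
        {"role": "user", "content": next_user_text},
    ]
```

The resulting list is sent as messages in the next request.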

For a full description of Messages API support for adaptive thinking, see `Claude API documentation: Adaptive thinking <https://platform.claude.com/docs/en/build-with-claude/adaptive-thinking>`_.

Beta features (Messages API)¶

The Messages API supports Anthropic beta features through the anthropic-beta header. Pass one or more beta header values as a comma-separated string.

Supported beta headers¶
Beta header value	Feature
`token-efficient-tools-2025-02-19`	Token-efficient tools
`interleaved-thinking-2025-05-14`	Interleaved thinking
`output-128k-2025-02-19`	Enables up to 128K output tokens
`dev-full-thinking-2025-05-14`	Developer mode for raw thinking on Claude 4 and later models
`context-1m-2025-08-07`	1-million-token context window
`context-management-2025-06-27`	Context management
`effort-2025-11-24`	Effort parameter for thinking
`tool-search-tool-2025-10-19`	Tool search tool
`tool-examples-2025-10-29`	Tool use examples

The following example enables the 1-million-token context window with ``claude-sonnet-4-6``.

import httpx
import anthropic

PAT = "<SNOWFLAKE_PAT>"

http_client = httpx.Client(
  headers={"Authorization": f"Bearer {PAT}"},
)

client = anthropic.Anthropic(
  api_key="not-used",
  base_url="https://<account-identifier>.snowflakecomputing.com/api/v2/cortex",
  http_client=http_client,
  default_headers={
    "Authorization": f"Bearer {PAT}",
    "anthropic-beta": "context-1m-2025-08-07",
  },
)

response = client.messages.create(
  model="claude-sonnet-4-6",
  max_tokens=8192,
  messages=[
    {"role": "user", "content": "<very long document text>... Summarize the key themes."}
  ],
)

print(response.content[0].text)

import Anthropic from "@anthropic-ai/sdk";

const PAT = "<SNOWFLAKE_PAT>";

const client = new Anthropic({
  apiKey: "not-used",
  baseURL: "https://<account-identifier>.snowflakecomputing.com/api/v2/cortex",
  defaultHeaders: {
    "Authorization": `Bearer ${PAT}`,
    "anthropic-beta": "context-1m-2025-08-07",
  },
});

const response = await client.messages.create({
  model: "claude-sonnet-4-6",
  max_tokens: 8192,
  messages: [
    { role: "user", content: "<very long document text>... Summarize the key themes." }
  ],
});

console.log(response.content[0].text);

curl "https://<account-identifier>.snowflakecomputing.com/api/v2/cortex/v1/messages" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <SNOWFLAKE_PAT>" \
  -H "anthropic-version: 2023-06-01" \
  -H "anthropic-beta: context-1m-2025-08-07" \
  -d '{
    "model": "claude-sonnet-4-6",
    "max_tokens": 8192,
    "messages": [
      {"role": "user", "content": "<very long document text>... Summarize the key themes."}
    ]
  }'

You can combine multiple beta features by passing a comma-separated string.

-H "anthropic-beta: context-1m-2025-08-07,interleaved-thinking-2025-05-14"
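
When the set of beta features is assembled at runtime, the header value can be built and checked against the supported list. This is a small sketch; `beta_header` is a hypothetical helper, and the allowed values are copied from the table of supported beta headers.

```python
SUPPORTED_BETAS = {
    "token-efficient-tools-2025-02-19",
    "interleaved-thinking-2025-05-14",
    "output-128k-2025-02-19",
    "dev-full-thinking-2025-05-14",
    "context-1m-2025-08-07",
    "context-management-2025-06-27",
    "effort-2025-11-24",
    "tool-search-tool-2025-10-19",
    "tool-examples-2025-10-29",
}


def beta_header(*features: str) -> dict:
    """Return an anthropic-beta header dict, or an empty dict when no feature is given."""
    unknown = [f for f in features if f not in SUPPORTED_BETAS]
    if unknown:
        raise ValueError(f"unsupported beta header values: {unknown}")
    if not features:
        return {}
    # dict.fromkeys drops duplicates while keeping the original order.
    return {"anthropic-beta": ",".join(dict.fromkeys(features))}
```

The result can be merged into the default headers of the client, next to Authorization.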

Chat Completions API reference¶

POST /api/v2/cortex/v1/chat/completions¶

Creates a chat completion using the specified model. The request and response formats follow the `OpenAI Chat Completions API specification <https://platform.openai.com/docs/api-reference/chat/create>`_.

POST https://<account_identifier>.snowflakecomputing.com/api/v2/cortex/v1/chat/completions

Required headers¶

Authorization: Bearer token: Authorization for the request. ``token`` is a JSON Web Token (JWT), an OAuth token, or a :doc:`programmatic access token </user-guide/programmatic-access-tokens>`. For details, see :doc:`/developer-guide/snowflake-rest-api/authentication`.
Content-Type: application/json: Specifies that the request body is in JSON format.

Optional headers¶

X-Snowflake-Authorization-Token-Type: type

Defines the type of the authorization token.

If you omit the X-Snowflake-Authorization-Token-Type header, Snowflake determines the token type by examining the token.

Although this header is optional, you can choose to specify it. Set it to one of the following values:

KEYPAIR_JWT (for key-pair authentication)
OAUTH (for OAuth)
PROGRAMMATIC_ACCESS_TOKEN (for programmatic access tokens)

Accept: application/json, text/event-stream

Specifies that the response contains either JSON (in error cases) or server-sent events.
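
For clients that call the endpoint without an SDK, the required and optional headers above can be assembled as in the following sketch. `cortex_headers` is a hypothetical helper; the header names and token-type values are the ones listed in this section.

```python
from typing import Optional

TOKEN_TYPES = {"KEYPAIR_JWT", "OAUTH", "PROGRAMMATIC_ACCESS_TOKEN"}


def cortex_headers(token: str, token_type: Optional[str] = None) -> dict:
    """Build request headers; token_type is optional because Snowflake can infer it."""
    headers = {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
        "Accept": "application/json, text/event-stream",
    }
    if token_type is not None:
        if token_type not in TOKEN_TYPES:
            raise ValueError(f"unknown token type: {token_type!r}")
        headers["X-Snowflake-Authorization-Token-Type"] = token_type
    return headers
```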

Required JSON fields¶


Field	Type	Description
`model`	string	The model to use (see Model availability). You can also use the fully qualified name of a fine-tuned model in the form `database.schema.model`.
`messages`	array	An array of message objects representing the conversation. Each message must have a `role` (`system`, `user`, `assistant`, or `tool`) and `content` (a string or an array of content parts).

Commonly used optional JSON fields¶


Field	Type	Default	Description
`max_completion_tokens`	integer	4096	Maximum number of tokens in the response. The theoretical maximum is 131,072; each model has its own output limit.
`temperature`	number	Model-dependent	Controls randomness. A value between 0 and 2.
`top_p`	number	1.0	Controls diversity through nucleus sampling.
`stream`	boolean	false	Whether to stream partial progress back as server-sent events.
`tools`	array	null	List of tools the model can call. Each tool must have `type: "function"` and a `function` with `name`, `description`, and `parameters`.
`tool_choice`	string or object	`"auto"`	Controls how the model selects tools. Options: `"auto"`, `"required"`, `"none"`, or an object that names a specific function.
`response_format`	object	null	Constrains the output format. For structured output, use `{"type": "json_schema", "json_schema": {…}}`.
`reasoning_effort`	string	null	For OpenAI reasoning models. Values: `"minimal"`, `"low"`, `"medium"`, `"high"`.
`reasoning`	object	null	For Claude models. Set `reasoning.effort` or `reasoning.max_tokens` to enable thinking.

For the full list of fields supported per model family, see the :ref:`detailed compatibility chart <label-cortex_openai_sdk_compatibility>`.

Status codes¶

200 OK: The request completed successfully.
400 invalid options object: An optional argument has an invalid value.
400 unknown model model_name: The specified model does not exist.
400 schema validation failed: The response schema is structured incorrectly.
400 max tokens of count exceeded: The request exceeded the maximum number of tokens supported by the model.
400 all requests were throttled by remote service: The request was throttled. Try again later.
402 budget exceeded: The model consumption budget was exceeded.
403 Not Authorized: The account is not enabled for the REST API, or the calling user's default role does not have the snowflake.cortex_user database role.
429 too many requests: The usage quota was exceeded. Try again later.
503 inference timed out: The request took too long.
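
Only two entries in this list say to try again later: 429, and the 400 response whose message reports throttling by the remote service. A client might separate those from permanent failures as sketched below. `is_retryable` is a hypothetical helper, and it assumes the caller has already extracted the error message text from the response body.

```python
THROTTLED_MESSAGE = "all requests were throttled by remote service"


def is_retryable(status: int, message: str = "") -> bool:
    """Return True for responses that the status code list marks as 'try again later'."""
    if status == 429:
        return True
    # A 400 is normally a client error, except for the throttling variant.
    return status == 400 and THROTTLED_MESSAGE in message.lower()
```

Whether to also retry a 503 timeout is an application decision; the list above does not say that retrying helps, so this sketch leaves it out.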

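Limitations¶

If unset, ``max_completion_tokens`` defaults to 4096. Each model has its own output token limit.
Tool calling is supported only for OpenAI and Claude models.
Audio is not supported.
Image understanding is supported only for OpenAI and Claude models. Images are limited to a maximum request size of 20 MiB and to 20 images per conversation.
Only Claude models support ephemeral cache control points for prompt caching. OpenAI models support implicit caching.
Only Claude models support returning reasoning details in the response.
``max_tokens`` is deprecated. Use ``max_completion_tokens`` instead.
Error messages are generated by Snowflake, not by the model provider.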

Detailed compatibility chart¶

The following tables summarize the request and response fields supported when you use the Chat Completions API with the different Snowflake-hosted model families.

Request fields¶
Field	OpenAI models	Claude models	Other models
`model`	✔ Supported	✔ Supported	✔ Supported
`messages`	See subfields	See subfields	See subfields
`messages[].audio`	❌ Error	❌ Ignored	❌ Ignored
`messages[].role`	✔ Supported	✔ user, assistant, system only	✔ user, assistant, system only
`messages[].content` (string)	✔ Supported	✔ Supported	✔ Supported
`messages[].content[]` (array)	See subfields	See subfields	See subfields
`messages[].content[].text`	✔ Supported	✔ Supported	✔ Supported
`messages[].content[].type`	✔ Supported	✔ Supported	✔ Supported
`messages[].content[].image_url`	✔ Supported	✔ Supported	❌ Error
`messages[].content[].cache_control`	❌ Ignored	✔ Supported (ephemeral only)	❌ Ignored
`messages[].content[].file`	❌ Error	❌ Error	❌ Ignored
`messages[].content[].input_audio`	❌ Error	❌ Ignored	❌ Ignored
`messages[].content[].refusal`	✔ Supported	❌ Ignored	❌ Ignored
`messages[].function_call`	✔ Supported (deprecated)	❌ Ignored	❌ Ignored
`messages[].name`	✔ Supported	❌ Ignored	❌ Ignored
`messages[].refusal`	✔ Supported	❌ Ignored	❌ Ignored
`messages[].tool_call_id`	✔ Supported	✔ Supported	❌ Ignored
`messages[].tool_calls`	✔ Supported	✔ `function` tools only	❌ Ignored
`messages[].reasoning_details`	❌ Ignored	✔ OpenRouter format `reasoning.text`	❌ Ignored
`audio`	❌ Error	❌ Ignored	❌ Ignored
`frequency_penalty`	✔ Supported	❌ Ignored	❌ Ignored
`logit_bias`	✔ Supported	❌ Ignored	❌ Ignored
`logprobs`	✔ Supported	❌ Ignored	❌ Ignored
`max_tokens`	❌ Error (deprecated)	❌ Error (deprecated)	❌ Error (deprecated)
`max_completion_tokens`	✔ Supported (default 4096, max 131072)	✔ Supported (default 4096, max 131072)	✔ Supported (default 4096, max 131072)
`metadata`	❌ Ignored	❌ Ignored	❌ Ignored
`modalities`	❌ Ignored	❌ Ignored	❌ Ignored
`n`	✔ Supported	❌ Ignored	❌ Ignored
`parallel_tool_calls`	✔ Supported	❌ Ignored	❌ Ignored
`prediction`	✔ Supported	❌ Ignored	❌ Ignored
`presence_penalty`	✔ Supported	❌ Ignored	❌ Ignored
`prompt_cache_key`	✔ Supported	❌ Ignored	❌ Ignored
`reasoning_effort`	✔ Supported	❌ Ignored (use the `reasoning` object)	❌ Ignored
`reasoning`	See subfields	See subfields	See subfields
`reasoning.effort`	✔ Supported (overrides `reasoning_effort`)	✔ Converted to `reasoning.max_tokens`	❌ Ignored
`reasoning.max_tokens`	❌ Ignored	✔ Supported	❌ Ignored
`response_format`	✔ Supported	✔ `json_schema` only	❌ Ignored
`safety_identifier`	❌ Ignored	❌ Ignored	❌ Ignored
`service_tier`	❌ Error	❌ Error	❌ Error
`stop`	✔ Supported	❌ Ignored	❌ Ignored
`store`	❌ Error	❌ Error	❌ Error
`stream`	✔ Supported	✔ Supported	✔ Supported
`stream_options`	See subfields	See subfields	See subfields
`stream_options.include_obfuscation`	❌ Ignored	❌ Ignored	❌ Ignored
`stream_options.include_usage`	✔ Supported	✔ Supported	✔ Supported
`temperature`	✔ Supported	✔ Supported	✔ Supported
`tool_choice`	✔ Supported	✔ `function` tools only	❌ Ignored
`tools`	✔ Supported	✔ `function` tools only	❌ Error
`top_logprobs`	✔ Supported	❌ Ignored	❌ Ignored
`top_p`	✔ Supported	✔ Supported	✔ Supported
`verbosity`	✔ Supported	❌ Ignored	❌ Ignored
`web_search_options`	❌ Error	❌ Ignored	❌ Ignored

Response fields¶
Field	OpenAI models	Claude models	Other models
`id`	✔ Supported	✔ Supported	✔ Supported
`object`	✔ Supported	✔ Supported	✔ Supported
`created`	✔ Supported	✔ Supported	✔ Supported
`model`	✔ Supported	✔ Supported	✔ Supported
`choices`	See subfields	See subfields	See subfields
`choices[].index`	✔ Supported	✔ Single choice only	✔ Single choice only
`choices[].finish_reason`	✔ Supported	❌ Not supported	✔ `stop` only
`choices[].logprobs`	✔ Supported	❌ Not supported	❌ Not supported
`choices[].message` (non-streaming)	See subfields	See subfields	See subfields
`choices[].message.content`	✔ Supported	✔ Supported	✔ Supported
`choices[].message.role`	✔ Supported	✔ Supported	✔ Supported
`choices[].message.refusal`	✔ Supported	❌ Not supported	❌ Not supported
`choices[].message.annotations`	❌ Not supported	❌ Not supported	❌ Not supported
`choices[].message.audio`	❌ Not supported	❌ Not supported	❌ Not supported
`choices[].message.function_call`	✔ Supported	❌ Not supported	❌ Not supported
`choices[].message.tool_calls`	✔ Supported	✔ `function` tools only	❌ Not supported
`choices[].message.reasoning`	❌ Not supported	✔ OpenRouter format	❌ Not supported
`choices[].delta` (streaming)	See subfields	See subfields	See subfields
`choices[].delta.content`	✔ Supported	✔ Supported	✔ Supported
`choices[].delta.role`	✔ Supported	❌ Not supported	❌ Not supported
`choices[].delta.refusal`	✔ Supported	❌ Not supported	❌ Not supported
`choices[].delta.function_call`	✔ Supported	❌ Not supported	❌ Not supported
`choices[].delta.tool_calls`	✔ Supported	✔ `function` tools only	❌ Not supported
`choices[].delta.reasoning`	❌ Not supported	✔ OpenRouter format	❌ Not supported
`usage`	See subfields	See subfields	See subfields
`usage.prompt_tokens`	✔ Supported	✔ Supported	✔ Supported
`usage.completion_tokens`	✔ Supported	✔ Supported	✔ Supported
`usage.total_tokens`	✔ Supported	✔ Supported	✔ Supported
`usage.prompt_tokens_details`	See subfields	See subfields	See subfields
`usage.prompt_tokens_details.audio_tokens`	❌ Not supported	❌ Not supported	❌ Not supported
`usage.prompt_tokens_details.cached_tokens`	✔ Cache reads only	✔ Cache reads + writes	❌ Not supported
`usage.completion_tokens_details`	See subfields	See subfields	See subfields
`usage.completion_tokens_details.accepted_prediction_tokens`	✔ Supported	❌ Not supported	❌ Not supported
`usage.completion_tokens_details.audio_tokens`	❌ Not supported	❌ Not supported	❌ Not supported
`usage.completion_tokens_details.reasoning_tokens`	✔ Supported	❌ Not supported	❌ Not supported
`usage.completion_tokens_details.rejected_prediction_tokens`	✔ Supported	❌ Not supported	❌ Not supported
`service_tier`	✔ Supported	❌ Not supported	❌ Not supported
`system_fingerprint`	✔ Supported	❌ Not supported	❌ Not supported

Request headers¶
Header	Support
`Authorization`	✔ Required
`Content-Type`	✔ Supported (`application/json`)
`Accept`	✔ Supported (`application/json`, `text/event-stream`)

Response headers¶
Header	Support
`openai-organization`	❌ Not supported
`openai-version`	❌ Not supported
`openai-processing-ms`	❌ Not supported
`x-ratelimit-limit-requests`	❌ Not supported
`x-ratelimit-limit-tokens`	❌ Not supported
`x-ratelimit-remaining-requests`	❌ Not supported
`x-ratelimit-remaining-tokens`	❌ Not supported
`x-ratelimit-reset-requests`	❌ Not supported
`x-ratelimit-reset-tokens`	❌ Not supported
`retry-after`	❌ Not supported

Learn more¶

For additional usage examples, see the OpenAI Chat Completions API reference or the `OpenAI Cookbook <https://cookbook.openai.com/>`_.

In addition to compatibility with the Chat Completions API, Snowflake supports OpenRouter-compatible features for Claude models. These features are exposed as additional fields on the request.

For prompt caching, use the cache_control field. See the `OpenRouter prompt caching documentation <https://openrouter.ai/docs/features/prompt-caching>`_.
For reasoning tokens, use the reasoning field. See the `OpenRouter reasoning documentation <https://openrouter.ai/docs/use-cases/reasoning-tokens>`_.

Messages API reference¶

POST /api/v2/cortex/v1/messages¶

Generates a response using a Claude model. The request and response formats follow the `Anthropic Messages API specification <https://docs.anthropic.com/en/api/messages>`_.

POST https://<account_identifier>.snowflakecomputing.com/api/v2/cortex/v1/messages

Note

The Messages API supports only Claude models. For other models, use the Chat Completions API.

Required headers¶

Authorization: Bearer token: Authorization for the request. ``token`` is a JSON Web Token (JWT), an OAuth token, or a :doc:`programmatic access token </user-guide/programmatic-access-tokens>`. For details, see :doc:`/developer-guide/snowflake-rest-api/authentication`.
Content-Type: application/json: Specifies that the request body is in JSON format.
anthropic-version: 2023-06-01: The required Anthropic API version header.

Optional headers¶

X-Snowflake-Authorization-Token-Type: type

Defines the type of the authorization token.

If you omit the X-Snowflake-Authorization-Token-Type header, Snowflake determines the token type by examining the token.

Although this header is optional, you can choose to specify it. Set it to one of the following values:

KEYPAIR_JWT (for key-pair authentication)
OAUTH (for OAuth)
PROGRAMMATIC_ACCESS_TOKEN (for programmatic access tokens)

anthropic-beta: feature

Enables beta features. Only Bedrock-compatible beta headers are supported.

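Required JSON fields¶


Field	Type	Description
`model`	string	The Claude model to use (see Model availability).
`max_tokens`	integer	Maximum number of tokens to generate.
`messages`	array	An array of message objects. Each message has a `role` (`user` or `assistant`) and `content` (a string or an array of content blocks).

Supported features¶

The Messages API supports the standard Anthropic Messages API feature set for Claude models, including:

Text generation and multi-turn conversations
Streaming ("stream": true)
System messages (through the top-level system field)
Tool calling (Anthropic format with name, description, and ``input_schema``)
Image input (base64 source blocks)
Prompt caching (cache_control on content blocks)
Extended thinking (the ``thinking`` parameter with ``budget_tokens``)

For full request and response schema details, see the `Anthropic Messages API documentation <https://docs.anthropic.com/en/api/messages>`_.

Limitations¶

Claude models only. OpenAI, Llama, Mistral, and other models are not available through this endpoint.
No flex processing or priority tier. The service_tier field is not supported.
Bedrock beta headers only. Only Bedrock-compatible anthropic-beta header values are supported.
Error messages are generated by Snowflake, not by Anthropic.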

Status codes¶

200 OK: The request completed successfully.
400 invalid_request_error: The request body is malformed or contains invalid values.
400 unknown model model_name: The specified model does not exist or is not a Claude model.
402 budget exceeded: The model consumption budget was exceeded.
403 Not Authorized: The account is not enabled for the REST API, or the default role does not have the snowflake.cortex_user database role.
429 too many requests: The usage quota was exceeded. Try again later.
503 inference timed out: The request took too long.

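Rate limits¶

To ensure a high standard of performance for all Snowflake customers, Cortex REST API requests are subject to rate limits. Requests that exceed a limit may receive an HTTP 429 response. Snowflake may occasionally adjust these limits.

The default limits in the following table apply per account and independently for each model. Retry requests with `exponential backoff <https://platform.openai.com/docs/guides/rate-limits#retrying-with-exponential-backoff>`_ so that your application handles 429 response codes reliably.

If you need a higher limit, contact Snowflake Support.

Cortex REST API rate limits¶
Model	Tokens processed per minute (TPM)	Requests per minute (RPM)	Max output (tokens)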
`claude-3-5-sonnet`	300,000	300	16,384
`claude-3-7-sonnet`	300,000	300	16,384
`claude-sonnet-4-5`	600,000	600	16,384
`claude-haiku-4-5`	600,000	600	16,384
`claude-4-sonnet`	300,000	300	16,384
`claude-4-opus`	75,000	75	16,384
`deepseek-r1`	100,000	100	16,384
`llama3.1-8b`	400,000	400	16,384
`llama3.1-70b`	200,000	200	16,384
`llama3.1-405b`	100,000	100	16,384
`mistral-7b`	400,000	400	16,384
`mistral-large2`	200,000	200	16,384
`openai-gpt-4.1`	300,000	300	16,384
`openai-gpt-5`	300,000	300	16,384
`openai-gpt-5-chat`	300,000	300	16,384
`openai-gpt-5-mini`	1,000,000	1,000	16,384
`openai-gpt-5-nano`	5,000,000	5,000	16,384
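
The exponential backoff recommended above for 429 responses can be sketched as follows. Because the `retry-after` response header is listed as not supported, the client computes its own delays. The names `call_with_backoff` and `RateLimited`, the default delays, and the `send` callable returning a `(status, body)` pair are all assumptions made for this illustration.

```python
import random
import time


class RateLimited(Exception):
    """Raised when every attempt was answered with HTTP 429."""


def call_with_backoff(send, max_attempts=5, base_delay=1.0, max_delay=30.0, sleep=time.sleep):
    """Call send() until it returns a non-429 status, doubling the wait after each 429."""
    for attempt in range(max_attempts):
        status, body = send()
        if status != 429:
            return status, body
        if attempt == max_attempts - 1:
            break  # no point sleeping after the final attempt
        delay = min(max_delay, base_delay * 2 ** attempt)
        # Add up to 50% random jitter so parallel clients do not retry in lockstep.
        sleep(delay + random.uniform(0, delay / 2))
    raise RateLimited(f"still rate limited after {max_attempts} attempts")
```

Here `send` would wrap a single HTTP request, for example a function that posts to the chat completions endpoint and returns the status code together with the parsed body.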

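Rate limit increases with cross-region inference¶

If you set up :doc:`cross-region inference </user-guide/snowflake-cortex/cross-region-inference>` in your Snowflake account, the following models get higher rate limits.

Cortex REST API rate limits with cross-region inference¶
Model	Tokens processed per minute (TPM)	Requests per minute (RPM)	Max output (tokens)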
`claude-3-7-sonnet`	600,000	600	16,384
`claude-haiku-4-5`	600,000	600	16,384
`claude-sonnet-4-5`	600,000	600	16,384
`claude-4-sonnet`	1,200,000	1,200	16,384
`claude-4-opus`	150,000	150	16,384
`llama3.1-8b`	800,000	400	16,384
`llama3.1-70b`	400,000	200	16,384
`llama3.1-405b`	200,000	100	16,384

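Troubleshooting rate limit events¶

A 429 response code is returned when you exceed either the TPM or the RPM limit. If you receive a 429 response even though your REST API usage has not exceeded the requests-per-minute limit, check your token usage rate.

The Cortex REST API implements rate limiting with the `sliding window counter <https://blog.cloudflare.com/counting-things-a-lot-of-different-things/#sliding-windows-to-the-rescue>`_ pattern. Counters are stored in a highly available Redis cluster that only Snowflake Cortex can access, inside Snowflake's private network.

The sliding window counter assumes that client traffic to the API was uniformly distributed over the previous time window. During traffic spikes, this assumption can overestimate the request rate, but it recovers quickly because the window is short. If you are affected by this overestimation and want a limit increase, contact Snowflake Support.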
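
To make the overestimation concrete, the sketch below implements the general sliding window counter estimate described in the linked article. It illustrates the pattern only and is not Snowflake's implementation; the 60-second window and the name `estimated_rate` are assumptions.

```python
def estimated_rate(prev_count: float, curr_count: float, elapsed: float,
                   window: float = 60.0) -> float:
    """Estimate the count over the last `window` seconds from two fixed-window counters.

    prev_count: requests (or tokens) counted in the previous full window
    curr_count: requests (or tokens) counted so far in the current window
    elapsed:    seconds elapsed in the current window
    """
    if not 0 <= elapsed <= window:
        raise ValueError("elapsed must lie within the window")
    # Assume the previous window's traffic was spread evenly, and keep the
    # fraction of it that still overlaps the sliding window.
    overlap = (window - elapsed) / window
    return prev_count * overlap + curr_count
```

For example, with 300 requests in the previous window and 100 requests 15 seconds into the current one, the estimate is 300 × 0.75 + 100 = 325. If those 300 requests all arrived in the first few seconds of the previous window, only the 100 current requests actually fall inside the last 60 seconds, so the estimate is well above the true rate.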

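Known issues¶

Session token expiration¶

We recommend authenticating with one of the three methods defined in :doc:`/developer-guide/snowflake-rest-api/authentication`. However, if you choose to authenticate with a Snowflake session token, you must handle token refresh to ensure uninterrupted API access.

Session tokens expire periodically. When a request is made with an expired session token, the REST API returns a 200 OK response containing error code 390112. In that case, no operation is performed.

To handle this behavior, your application must:

Check each API response for error code ``390112``, even when the HTTP status code is ``200 OK``.
When error code ``390112`` is detected, refresh the session token and retry the request.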
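
The two steps above might be wired together as in this sketch. The exact shape of the error response is not described here, so the code assumes a JSON body with a top-level `code` field; adjust the check to the body your application actually receives. `session_expired`, `call_with_refresh`, `send`, and `refresh_token` are hypothetical names.

```python
SESSION_EXPIRED_CODE = "390112"


def session_expired(status: int, body) -> bool:
    """Detect an expired session token reported inside a 200 OK response.

    Assumption: the error code appears as a top-level "code" field in the JSON body.
    """
    return (
        status == 200
        and isinstance(body, dict)
        and str(body.get("code")) == SESSION_EXPIRED_CODE
    )


def call_with_refresh(send, refresh_token, max_refreshes: int = 1):
    """Call send(); on error 390112, refresh the session token and retry the request."""
    status, body = send()
    refreshes = 0
    while session_expired(status, body):
        if refreshes == max_refreshes:
            raise RuntimeError("session token still expired after refresh")
        refresh_token()
        refreshes += 1
        status, body = send()
    return status, body
```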

Note

This behavior affects only applications that use Snowflake session tokens. If you authenticate with key-pair authentication, OAuth, or :ref:`programmatic access tokens (PATs) <label-sfrest_authenticating_pat>`, you do not need to implement this error handling.

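Cost considerations¶

Snowflake Cortex REST API requests incur compute cost based on the number of tokens processed. For each model's cost in dollars per million tokens, see the `Snowflake Service Consumption Table`_.

A token is the smallest unit of text processed by Snowflake Cortex LLM functions, approximately equal to four characters of text. The equivalence of raw input or output text to tokens can vary by model.

Both input and output tokens incur compute cost. If you use the API to provide a conversational or chat user experience, all previous prompts and responses are processed to generate each new response, with corresponding costs.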
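
The effect of reprocessing the history can be estimated with a back-of-the-envelope sketch like the one below. It uses the four-characters-per-token approximation mentioned above, which varies by model, and contains no pricing data; `estimate_tokens` and `conversation_tokens` are hypothetical names.

```python
import math


def estimate_tokens(text: str) -> int:
    """Rough token count using the four-characters-per-token approximation."""
    return math.ceil(len(text) / 4)


def conversation_tokens(turns: list[tuple[str, str]]) -> tuple[int, int]:
    """Estimate total (input, output) tokens for a chat of (prompt, response) turns.

    Each new request resends the whole history, so earlier turns are counted
    again as input on every later turn.
    """
    history = 0
    input_total = 0
    output_total = 0
    for prompt, response in turns:
        p = estimate_tokens(prompt)
        r = estimate_tokens(response)
        input_total += history + p  # previous turns plus the new prompt
        output_total += r
        history += p + r
    return input_total, output_total
```

With two turns of a 40-character prompt and an 80-character response each, the estimate is 50 input tokens and 40 output tokens: the second request carries the 30 tokens of the first turn in addition to its own 10-token prompt.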