Cortex Analyst REST API

Use this API to answer questions about your data with natural language queries.

Send message

POST /api/v2/cortex/analyst/message

Generates a SQL query for the given question using a semantic model provided in the request. One or more models can be specified; when multiple models are provided, Cortex Analyst chooses the most appropriate one. You can have multi-turn conversations in which follow-up questions build on previous queries. For more information, see Multi-turn conversation in Cortex Analyst.

The request includes a user question; the response includes the user question and the analyst response. Each message in a response can have multiple content blocks of different types. The type field of a content object currently supports three values: text, suggestions, and sql.

Responses can be sent all at once after processing is complete, or incrementally as they are generated.

Request Headers

Header

Description

Authorization

(Required) Authorization token. For more information, see Authenticating to the server.

X-Snowflake-Authorization-Token-Type

Authorization token type. Defaults to OAuth; required if using any other type of token. For more information, see Authenticating to the server.

Content-Type

(Required) application/json

Request Body

The request body contains the role of the speaker (which must be user), the user’s question, and a path to the semantic model YAML file. The user question is contained in a content object, which has two fields: type and text. Currently, text is the only allowed value of the type field.

Field

Description

messages[].role

(Required) The role of the entity that is creating the message. Currently only supports user.

Type: string:enum

Example: user

messages[].content[]

(Required) The content object that is part of a message.

Type: object

Example:

{
  "type": "text",
  "text":  "Which company had the most revenue?"
}

messages[].content[].type

(Required) The content type. Currently only text is supported.

Type: string:enum

Example: text

messages[].content[].text

(Required) The user’s question.

Type: string

Example: Which company had the most revenue?

semantic_model_file

Path to the semantic model YAML file. Must be a fully qualified stage URL including the database and schema. You can instead provide the complete semantic model YAML in the semantic_model field.

Type: string

Example: @my_db.my_schema.my_stage/my_semantic_model.yaml

semantic_model

A string containing the entire semantic model YAML. You can instead pass the semantic model YAML as a staged file by providing its URL in the semantic_model_file field.

Type: string

stream

(Optional) If set to true, the response is streamed to the client using server-sent events as it is generated (see Streaming response). Otherwise the complete response is returned after Cortex Analyst has fully processed the user’s question.

Type: boolean

Important

You must provide either semantic_model_file OR semantic_model in the request body.

Example

{
    "messages": [
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "which company had the most revenue?"
                }
            ]
        }
    ],
    "semantic_model_file": "@my_stage/my_semantic_model.yaml"
}
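
The following is a minimal Python sketch of sending this request with the requests library. The account URL and token are placeholders, and the Bearer authorization shown here is an assumption; use the token type your setup requires (see Request Headers above), adding the X-Snowflake-Authorization-Token-Type header if you are not using OAuth.

import requests

# Placeholder values -- substitute your account URL and a valid authorization token.
ACCOUNT_URL = "https://<account_identifier>.snowflakecomputing.com"
TOKEN = "<authorization_token>"

def send_message(question: str, semantic_model_file: str) -> dict:
    """Send a single-turn question to Cortex Analyst and return the parsed JSON response."""
    body = {
        "messages": [
            {
                "role": "user",
                "content": [{"type": "text", "text": question}],
            }
        ],
        "semantic_model_file": semantic_model_file,
    }
    resp = requests.post(
        f"{ACCOUNT_URL}/api/v2/cortex/analyst/message",
        json=body,
        headers={
            "Authorization": f"Bearer {TOKEN}",
            "Content-Type": "application/json",
        },
    )
    resp.raise_for_status()
    return resp.json()

answer = send_message(
    "which company had the most revenue?",
    "@my_db.my_schema.my_stage/my_semantic_model.yaml",
)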

Non-streaming response

This operation can return the response codes listed below. The response always has the following structure. Currently, three content types are supported in the response: text, suggestion, and sql. The suggestion and sql content types are mutually exclusive: if the response contains a sql content type, it does not contain a suggestion content type, and vice versa. The suggestion content type is included only when the user question was ambiguous and Cortex Analyst could not return a SQL statement for that query.

To ensure forward compatibility, make sure your implementation checks the content type of each block and gracefully handles any types it does not recognize. (A sketch of this approach appears after the example response below.)

Code

Description

200

The statement was executed successfully.

The body of the response contains a message object that contains the following fields:

  • message (object): Represents a message in the conversation between the user and the analyst.

  • message.role (string:enum): The entity that produced the message. One of user or analyst.

  • message.content[] (object): The content object that is part of a message.

  • message.content[].type (string:enum): The content type of the message. One of text, suggestion, or sql.

  • message.content[].text (string): The text of the content. Only returned for content type text.

  • message.content[].statement (string): A SQL statement. Only returned for content type sql.

  • message.content[].confidence (object): Contains confidence-related information. Only returned for the sql content type.

  • message.content[].confidence.verified_query_used (object): Represents the verified query from the Verified Query Repository used in SQL response generation. If no verified query was used, the field value is null.

  • message.content[].confidence.verified_query_used.name (string): The name of the verified query used, extracted from the Verified Query Repository.

  • message.content[].confidence.verified_query_used.question (string): The question that is answered by the verified query, extracted from the Verified Query Repository.

  • message.content[].confidence.verified_query_used.sql (string): The SQL statement of the verified query used, extracted from the Verified Query Repository.

  • message.content[].confidence.verified_query_used.verified_at (integer): The numeric representation of the timestamp when the query was verified, extracted from the Verified Query Repository.

  • message.content[].confidence.verified_query_used.verified_by (string): The person who verified the query, extracted from the Verified Query Repository.

  • message.content[].suggestions (array of strings): If SQL cannot be generated, a list of questions the semantic model can generate SQL for. Only returned for content type suggestion.

  • message.content[].warnings (list): A collection of warnings from the analyst regarding the user’s request and the semantic model.

  • message.content[].warnings[].message (string): Contains a detailed description of one individual warning.

By default, the response is returned all at once after Cortex Analyst has fully processed the user’s question. See Streaming response for the format of streaming mode responses.

{
    "request_id": "75d343ee-699c-483f-83a1-e314609fb563",
    "message": {
        "role": "analyst",
        "content": [
            {
                "type": "text",
                "text": "We interpreted your question as ..."
            },
            {
                "type": "sql",
                "statement": "SELECT * FROM table",
                "confidence": {
                    "verified_query_used": {
                        "name": "My verified query",
                        "question": "What was the total revenue?",
                        "sql": "SELECT * FROM table2",
                        "verified_at": 1714497970,
                        "verified_by": "Jane Doe"
                    }
                }
            }
        ]
    },
    "warnings": [
        {
            "message": "Table table1 has (30) columns, which exceeds the recommended maximum of 10"
        },
        {
            "message": "Table table2 has (40) columns, which exceeds the recommended maximum of 10"
        }
    ]
}
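
As a hedged illustration of the forward-compatibility advice above, the following Python sketch walks the content array of a non-streaming response and dispatches on each block's type. Field names follow the examples on this page; unknown content types are skipped so the client keeps working if new types are introduced later.

def handle_response(response: dict) -> None:
    """Print the pieces of a non-streaming Cortex Analyst response."""
    for block in response["message"]["content"]:
        block_type = block.get("type")
        if block_type == "text":
            print("Interpretation:", block["text"])
        elif block_type == "sql":
            print("Generated SQL:", block["statement"])
        elif block_type == "suggestions":
            for question in block.get("suggestions", []):
                print("Suggested question:", question)
        else:
            continue  # Forward compatibility: ignore unrecognized content types.

    for warning in response.get("warnings", []):
        print("Warning:", warning["message"])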

Streaming response

Streaming mode lets your client receive responses as they are generated by Cortex Analyst, rather than waiting for the entire response to be generated. This improves the perceived responsiveness of your application, especially for long-running queries, because users begin seeing output much sooner. Streaming responses also provide status information that shows where Cortex Analyst is in the process of generating a response, as well as warnings that can help you understand what went wrong when Cortex Analyst does not behave as expected.

To receive a streaming response, set the stream field in the request body to true. Streaming responses use server-sent events.
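
The following is a deliberately simplified, illustrative Python reader for these server-sent events; a production client should use a dedicated SSE library. It assumes each event's data payload can be parsed as JSON, and it uses the same placeholder account URL and token as the earlier request sketch.

import json
import requests

ACCOUNT_URL = "https://<account_identifier>.snowflakecomputing.com"  # placeholder
TOKEN = "<authorization_token>"  # placeholder

def iter_sse_events(response):
    """Yield (event_name, payload) pairs from a streaming HTTP response.

    Simplified server-sent-events parsing for illustration only.
    """
    event_name = None
    data_lines = []
    for line in response.iter_lines(decode_unicode=True):
        if not line:
            # A blank line terminates one event.
            if event_name and data_lines:
                yield event_name, json.loads("\n".join(data_lines))
            event_name, data_lines = None, []
        elif line.startswith("event:"):
            event_name = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data_lines.append(line[len("data:"):].strip())
    if event_name and data_lines:  # Flush a final event with no trailing blank line.
        yield event_name, json.loads("\n".join(data_lines))

# Usage sketch: same body as the non-streaming request, plus "stream": true.
request_body = {
    "messages": [{"role": "user",
                  "content": [{"type": "text", "text": "which company had the most revenue?"}]}],
    "semantic_model_file": "@my_db.my_schema.my_stage/my_semantic_model.yaml",
    "stream": True,
}
resp = requests.post(
    f"{ACCOUNT_URL}/api/v2/cortex/analyst/message",
    json=request_body,
    headers={"Authorization": f"Bearer {TOKEN}", "Content-Type": "application/json"},
    stream=True,
)
for event, payload in iter_sse_events(resp):
    print(event, payload)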

Cortex Analyst sends five distinct types of events in a streaming response:

  • status: Conveys status updates about the SQL generation process.

  • message.content.delta: Contains a piece of the response. This event is sent multiple times.

  • error: Indicates that Cortex Analyst has encountered an error and cannot continue processing the request. No further message.content.delta events will be sent.

  • warnings: Sent at the end of a response to convey any warnings encountered during processing. Warnings do not stop processing.

  • done: Sent to indicate that processing is complete and no further message.content.delta events will be sent.

Of these, the message.content.delta events are the most important to understand, because they contain the actual response content. Each delta contains tokens from some field of the complete response. A single delta event can contain anywhere from one character to the full response, and different deltas can have different lengths. You receive these tokens as they are generated; it is up to you to assemble them into the final response.

Important

Events from different responses (even extremely similar ones) can vary. There is no guarantee that events will be sent in the same order or with the same content.

Simple example

The following is a sample non-streaming response for a simple query:

{
    "message": {
        "role": "analyst",
        "content": [
            {
                "type": "text",
                "text": "This is how we interpreted your question and this is how the sql is generated"
            },
            {
                "type": "sql",
                "statement": "SELECT * FROM table"
            }
        ]
    }
}

And this is one possible series of streaming events for that response (a different series of events is also possible):

event: status
data: { status: "interpreting_question" }

event: message.content.delta
data: {
  index: 0,
  type: "text",
  text_delta: "This is how we interpreted your question"
}

event: status
data: { status: "generating_sql" }

event: status
data: { status: "validating_sql" }

event: message.content.delta
data: {
  index: 0,
  type: "text",
  text_delta: " and this is how the sql is generated"
}

event: message.content.delta
data: {
  index: 1,
  type: "sql",
  statement_delta: "SELECT * FROM table"
}

event: status
data: { status: "done" }

Use the index field in the message.content.delta responses to determine which element of the full response the event is part of. For example, here the first two delta events use index 0, which means they are part of the first element (element 0) of the content array in the non-streaming response. Similarly, the delta event that contains the SQL response uses index 1.
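
As a hedged sketch of that bookkeeping, the following Python function accumulates text and sql deltas keyed by their index. It assumes the events have already been parsed into (event name, payload) pairs, for example by the reader sketched earlier.

def assemble_content(events):
    """Assemble message.content.delta events into the final content array.

    `events` is an iterable of (event_name, payload) pairs, however your SSE
    client produces them. Deltas that share an index belong to the same block.
    """
    blocks = {}  # content index -> content block under construction
    for name, payload in events:
        if name != "message.content.delta":
            continue
        block = blocks.setdefault(payload["index"], {"type": payload["type"]})
        if payload["type"] == "text":
            block["text"] = block.get("text", "") + payload["text_delta"]
        elif payload["type"] == "sql":
            block["statement"] = block.get("statement", "") + payload["statement_delta"]
    # Order by index to match the content array of the non-streaming response.
    return [blocks[i] for i in sorted(blocks)]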

Example with suggestions

This example contains suggested questions for an ambiguous question. The following is the non-streaming response:

{
    "message": {
        "role": "analyst",
        "content": [
            {
                "type": "text",
                "text": "Your question is ambigous, here are some alternatives:"
            },
            {
                "type": "suggestions",
                "suggestions": [
                    "which company had the most revenue?",
                    "which company placed the most orders?"
                ]
            }
        ]
    }
}

And here is a possible series of streaming events that constitute that response:

event: status
data: { status: "interpreting_question" }

event: message.content.delta
data: {
  index: 0,
  type: "text",
  text_delta: "Your question is ambiguous,"
}

event: status
data: { status: "generating_suggestions" }

event: message.content.delta
data: {
  index: 0,
  type: "text",
  text_delta: " here are some alternatives:"
}

event: message.content.delta
data: {
  index: 1,
  type: "suggestions",
  suggestions_delta: {
    index: 0,
    suggestion_delta: "which company had",
  }
}

event: message.content.delta
data: {
  index: 1,
  type: "suggestions",
  suggestions_delta: {
    index: 0,
    suggestion_delta: " the most revenue?",
  }
}

event: message.content.delta
data: {
  index: 1,
  type: "suggestions",
  suggestions_delta: {
    index: 1,
    suggestion_delta: "which company placed",
  }
}

event: message.content.delta
data: {
  index: 1,
  type: "suggestions",
  suggestions_delta: {
    index: 1,
    suggestion_delta: " the most orders?",
  }
}

event: status
data: { status: "done" }

In this example, the content field of the non-streaming response is an array, and one of its elements is itself a suggestions array. The index fields in text and suggestions delta events therefore refer to positions in two different arrays: the outer index identifies the element of the content array, while the index inside suggestions_delta identifies the element of the suggestions array. You need to keep track of these indexes separately when assembling the full response.
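
As an illustrative extension of the earlier assembly sketch, a helper like the following could fold suggestions deltas into their content block; the caller selects the block by the outer index, and the helper tracks the inner index.

def apply_suggestions_delta(block: dict, payload: dict) -> None:
    """Fold one suggestions delta into its content block.

    The outer index (payload["index"]) selects the content block and is handled
    by the caller; the index inside suggestions_delta selects the suggestion
    string being built up.
    """
    suggestions = block.setdefault("suggestions", [])
    inner = payload["suggestions_delta"]["index"]
    while len(suggestions) <= inner:
        suggestions.append("")  # Grow the list until the inner index exists.
    suggestions[inner] += payload["suggestions_delta"]["suggestion_delta"]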

Note

Currently, the generated SQL statement is always sent in a single event. This may not be the case in the future. Your client must be prepared to receive the SQL statement in multiple events.

Other examples

You can find a Streamlit streaming client for Cortex Analyst in the Cortex Analyst GitHub repo <https://github.com/Snowflake-Labs/sfguide-getting-started-with-cortex-analyst/blob/main/cortex_analyst_streaming_demo.py>. This demo must be run locally; Streamlit in Snowflake (SiS) does not currently support streaming.

See the Cortex Analyst playground in the AI/ML Studio (in Snowsight) for an interactive demonstration of streaming response.

Streaming event schemas

The following are the OpenAPI/Swagger schemas of the events sent by Cortex Analyst in a streaming response.

error

StreamingError:
  type: object
  properties:
    message:
      type: string
      description: A description of the error
    code:
      type: string
      description: The Snowflake error code categorizing the error
    request_id:
      type: string
      description: Unique request ID
warnings

Warnings:
  type: object
  description: Warnings found while processing the request
  properties:
    warnings:
      type: array
      items:
        $ref: "#/components/schemas/Warning"

Warning:
  type: object
  title: The warning object
  description: Represents a warning within a chat.
  properties:
    message:
      type: string
      description: A human-readable message describing the warning

Send feedback

POST /api/v2/cortex/analyst/feedback

Provides qualitative end user feedback. Within Snowsight, the feedback is shown in Semantic Model Admins under Monitoring.

Request Headers

Header

Description

Authorization

(Required) Authorization token. For more information, see Authenticating to the server.

Content-Type

(Required) application/json

Request Body

Field

Description

request_id

(Required) The ID of the message request that you previously sent, returned in the request_id field of the /api/v2/cortex/analyst/message response. For more information, see Non-streaming response.

Type: string

Example: 75d343ee-699c-483f-83a1-e314609fb563

positive

(Required) Whether the feedback is positive or negative. true for positive or “thumbs up”, false for negative or “thumbs down”.

Type: boolean

Example: true

feedback_message

(Optional) The feedback message from the user.

Type: string

Example: This is the best answer I've ever seen!

Response

Empty response body with status code 200.
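
The following is a minimal Python sketch of calling this endpoint, using the same placeholder account URL, token, and requests library as the earlier message example.

import requests

# Placeholder values -- substitute your account URL and a valid authorization token.
ACCOUNT_URL = "https://<account_identifier>.snowflakecomputing.com"
TOKEN = "<authorization_token>"

def send_feedback(request_id: str, positive: bool, feedback_message: str = "") -> None:
    """Send end-user feedback for an earlier message request."""
    body = {"request_id": request_id, "positive": positive}
    if feedback_message:
        body["feedback_message"] = feedback_message
    resp = requests.post(
        f"{ACCOUNT_URL}/api/v2/cortex/analyst/feedback",
        json=body,
        headers={"Authorization": f"Bearer {TOKEN}", "Content-Type": "application/json"},
    )
    resp.raise_for_status()  # Success is an empty body with status code 200.

send_feedback(
    "75d343ee-699c-483f-83a1-e314609fb563",
    positive=True,
    feedback_message="This is the best answer I've ever seen!",
)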

Access control requirements

For information on the required privileges, see Access control requirements.

For details about authenticating to the API, see Authenticating Snowflake REST APIs with Snowflake.