Categories:

String & binary functions (Large Language Model)

AI_COMPLETE (Single string)¶

Note

AI_COMPLETE is the updated version of COMPLETE (SNOWFLAKE.CORTEX). For the latest functionality, use AI_COMPLETE.

Generates a response (completion) for a text prompt using a supported language model.

Syntax¶

The function takes two required arguments and three optional arguments. The function can be used with either positional or named argument syntax.

Using AI_COMPLETE with a single string input

AI_COMPLETE(
    <model>, <prompt> [ , <model_parameters>, <response_format>, <show_details> ] )
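
For example, both of the following calls are equivalent; the first uses positional argument syntax and the second uses named argument syntax:

SELECT AI_COMPLETE('snowflake-arctic', 'What are large language models?');

SELECT AI_COMPLETE(
    model => 'snowflake-arctic',
    prompt => 'What are large language models?'
);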

Arguments¶

model

A string specifying the model to be used. Specify one of the following models:

  • claude-4-opus

  • claude-4-sonnet

  • claude-3-7-sonnet

  • claude-3-5-sonnet

  • deepseek-r1

  • gemma-7b

  • jamba-1.5-mini

  • jamba-1.5-large

  • jamba-instruct

  • llama2-70b-chat

  • llama3-8b

  • llama3-70b

  • llama3.1-8b

  • llama3.1-70b

  • llama3.1-405b

  • llama3.2-1b

  • llama3.2-3b

  • llama3.3-70b

  • llama4-maverick

  • llama4-scout

  • mistral-large

  • mistral-large2

  • mistral-7b

  • mixtral-8x7b

  • openai-gpt-4.1

  • openai-o4-mini

  • reka-core

  • reka-flash

  • snowflake-arctic

  • snowflake-llama-3.1-405b

  • snowflake-llama-3.3-70b

Supported models might have different costs.

prompt

A string containing the prompt to be sent to the model.

model_parameters

An object containing zero or more of the following options that affect the model’s hyperparameters. See LLM Settings.

  • temperature: A value from 0 to 1 (inclusive) that controls the randomness of the output of the language model. A higher temperature (for example, 0.7) results in more diverse and random output, while a lower temperature (such as 0.2) makes the output more deterministic and focused.

    Default: 0

  • top_p: A value from 0 to 1 (inclusive) that controls the randomness and diversity of the language model, generally used as an alternative to temperature. The difference is that top_p restricts the set of possible tokens that the model outputs, while temperature influences which tokens are chosen at each step.

    Default: 0

  • max_tokens: Sets the maximum number of output tokens in the response. Small values can result in truncated responses.

    Default: 4096. Maximum allowed value: 8192.

  • guardrails: Filters potentially unsafe and harmful responses from a language model using Cortex Guard. Either TRUE or FALSE.

    Default: FALSE
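
For example, the following call (a minimal sketch; the model choice and prompt are illustrative) sets all four options at once:

SELECT AI_COMPLETE(
    model => 'mistral-large2',
    prompt => 'Summarize the benefits of columnar storage in two sentences.',
    model_parameters => {
        'temperature': 0.2,
        'top_p': 0.9,
        'max_tokens': 256,
        'guardrails': TRUE
    }
);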

response_format

A JSON schema that the response should follow. This is a SQL sub-object, not a string. If response_format is not specified, the returned value is a string containing either the raw response or, when show_details is TRUE, a serialized JSON object containing the response and information about it.

For more information, see AI_COMPLETE Structured Outputs.
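
As a minimal illustration (the schema, model, and prompt here are placeholders), the following call requests a JSON object with a single string property:

SELECT AI_COMPLETE(
    model => 'mistral-large2',
    prompt => 'Summarize this support ticket: the user cannot log in after a password reset.',
    response_format => {
        'type': 'json',
        'schema': {'type': 'object', 'properties': {'summary': {'type': 'string'}}}
    }
);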

show_details

A boolean flag that indicates whether to return a serialized JSON object containing the response and information about it.

Returns¶

When the show_details argument is not specified or set to FALSE and the response_format is not specified or set to NULL, returns a string containing the response.

When the show_details argument is not specified or set to FALSE and the response_format is specified, returns an object following the provided response format.

When the show_details argument is set to TRUE and the response_format is not specified, returns a JSON object containing the following keys.

  • "choices": An array of the model’s responses. (Currently, only one response is provided.) Each response is an object containing a "messages" key whose value is the model’s response to the latest prompt.

  • "created": UNIX timestamp (seconds since midnight, January 1, 1970) when the response was generated.

  • "model": The name of the model that created the response.

  • "usage": An object recording the number of tokens consumed and generated by this completion. Includes the following sub-keys:

    • "completion_tokens": The number of tokens in the generated response.

    • "prompt_tokens": The number of tokens in the prompt.

    • "total_tokens": The total number of tokens consumed, which is the sum of the other two values.

When the show_details argument is set to TRUE and the response_format is specified, returns a JSON object containing the following keys.

  • "structured_output": A JSON object following the specified response format.

  • "created": UNIX timestamp (seconds since midnight, January 1, 1970) when the response was generated.

  • "model": The name of the model that created the response.

  • "usage": An object recording the number of tokens consumed and generated by this completion. Includes the following sub-keys:

    • "completion_tokens": The number of tokens in the generated response.

    • "prompt_tokens": The number of tokens in the prompt.

    • "total_tokens": The total number of tokens consumed, which is the sum of the other two values.

Examples¶

Single response¶

To generate a single response:

SELECT AI_COMPLETE('snowflake-arctic', 'What are large language models?');

Responses from table column¶

The following example generates a response for each row in the reviews table, using the content column as input. Each query result contains a critique of the corresponding review.

SELECT AI_COMPLETE(
    'mistral-large',
        CONCAT('Critique this review in bullet points: <review>', content, '</review>')
) FROM reviews LIMIT 10;

Tip

As shown in this example, you can use tagging in the prompt to control the kind of response generated. See A guide to prompting LLaMA 2 for tips.

Controlling model parameters¶

The following example specifies the model_parameters used to provide a response.

SELECT AI_COMPLETE(
    model => 'llama2-70b-chat',
    prompt => 'how does a snowflake get its unique pattern?',
    model_parameters => {
        'temperature': 0.7,
        'max_tokens': 10
    }
);

The response is a string containing the message from the language model. Note that the response is truncated because of the max_tokens value specified in the model_parameters argument.

"The unique pattern on a snowflake is"

Detailed output¶

The following example shows how you can use the show_details argument to return additional inference details.

SELECT AI_COMPLETE(
    model => 'llama2-70b-chat',
    prompt => 'how does a snowflake get its unique pattern?',
    model_parameters => {
        'temperature': 0.7,
        'max_tokens': 10
    },
    show_details => true
);

The response is a JSON object with the model’s message and related details. The max_tokens value in the model_parameters argument was used to truncate the output.

{
    "choices": [
        {
            "messages": " The unique pattern on a snowflake is"
        }
    ],
    "created": 1708536426,
    "model": "llama2-70b-chat",
    "usage": {
        "completion_tokens": 10,
        "prompt_tokens": 22,
        "guardrail_tokens": 0,
        "total_tokens": 32
    }
}

Specifying a JSON response format¶

This example illustrates the use of the function’s response_format argument to return a structured response.

SELECT AI_COMPLETE(
    model => 'llama2-70b-chat',
    prompt => 'Extract structured data from this customer interaction note: Customer Sarah Jones complained about the mobile app crashing during checkout. She tried to purchase 3 items: a red XL jacket ($89.99), blue running shoes ($129.50), and a fitness tracker ($199.00). The app crashed after she entered her shipping address at 123 Main St, Portland OR, 97201. She has been a premium member since January 2024.',
    model_parameters => {
        'temperature': 0,
        'max_tokens': 4096
    },
    response_format => {
        'type': 'json',
        'schema': {
            'type': 'object',
            'properties': {
                'sentiment_categories': {
                    'type': 'array',
                    'items': {
                        'type': 'object',
                        'properties': {
                            'food_quality': {'type': 'string'},
                            'food_taste': {'type': 'string'},
                            'wait_time': {'type': 'string'},
                            'food_cost': {'type': 'string'}
                        },
                        'required': ['food_quality', 'food_taste', 'wait_time', 'food_cost']
                    }
                }
            }
        }
    }
);

The response is a JSON object following the structured response format.

Response:

{
    "sentiment_categories": [
        {
            "food_cost": "negative",
            "food_quality": "positive",
            "food_taste": "positive",
            "wait_time": "neutral"
        }
    ]
}

Specifying a JSON response format with detailed output¶

This example illustrates the use of the function’s response_format argument to return a structured response, combined with the show_details argument to return additional inference information.

SELECT AI_COMPLETE(
    model => 'llama2-70b-chat',
    prompt => 'Extract structured data from this customer interaction note: Customer Sarah Jones complained about the mobile app crashing during checkout. She tried to purchase 3 items: a red XL jacket ($89.99), blue running shoes ($129.50), and a fitness tracker ($199.00). The app crashed after she entered her shipping address at 123 Main St, Portland OR, 97201. She has been a premium member since January 2024.',
    model_parameters => {
        'temperature': 0,
        'max_tokens': 4096
    },
    response_format => {
        'type': 'json',
        'schema': {
            'type': 'object',
            'properties': {
                'sentiment_categories': {
                    'type': 'array',
                    'items': {
                        'type': 'object',
                        'properties': {
                            'food_quality': {'type': 'string'},
                            'food_taste': {'type': 'string'},
                            'wait_time': {'type': 'string'},
                            'food_cost': {'type': 'string'}
                        },
                        'required': ['food_quality', 'food_taste', 'wait_time', 'food_cost']
                    }
                }
            }
        }
    },
    show_details => true
);

The response is a JSON object containing the structured response along with additional inference metadata.

{
    "created": 1738683744,
    "model": "mistral-large2",
    "structured_output": [
        {
            "raw_message": {
                "sentiment_categories": [
                    {
                        "food_cost": "negative",
                        "food_quality": "positive",
                        "food_taste": "positive",
                        "wait_time": "neutral"
                    }
                ]
            },
            "type": "json"
        }
    ],
    "usage": {
        "completion_tokens": 60,
        "prompt_tokens": 94,
        "total_tokens": 154
    }
}
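
To work with only the structured payload from a detailed response like the one above, you can drill into the structured_output key. The following sketch assumes the response shape shown above, and its schema and prompt are simplified placeholders:

SELECT
    PARSE_JSON(response):structured_output[0]:raw_message AS extracted_data
FROM (
    SELECT AI_COMPLETE(
        model => 'llama2-70b-chat',
        prompt => 'Classify the sentiment of this review: the food was great but the wait was long.',
        response_format => {
            'type': 'json',
            'schema': {'type': 'object', 'properties': {'sentiment': {'type': 'string'}}}
        },
        show_details => TRUE
    ) AS response
);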