- Categories:
String & Binary Functions (Large Language Model)
COMPLETE (SNOWFLAKE.CORTEX)¶
Given a prompt, generates a response (completion) using your choice of supported language model.
Syntax¶
SNOWFLAKE.CORTEX.COMPLETE(
<model>, <prompt_or_history> [ , <options> ] )
Arguments¶
Required:
model
A string specifying the model to be used. This must be one of the following values.
'mistral-large'
'reka-flash'
'mixtral-8x7b'
'llama2-70b-chat'
'mistral-7b'
'gemma-7b'
prompt_or_history
The prompt or conversation history to be used to generate a completion.
If
options
is not present, the prompt given must be a string.If
options
is present, the argument must be an array of objects representing a conversation in chronological order. Each object must contain arole
key and acontent
key. Thecontent
value is a prompt or a response, depending on the role. The role must be one of the following.role
Valuecontent
Valuesystem
An initial plain-English prompt to the language model to provide it with background information and instructions for a response style. For example, “Respond in the style of a pirate.” The model does not generate a response to a system prompt. Only one system prompt may be provided, and if it is present, it must be the first inthe array.
'user'
A prompt provided by the user. Must follow the system prompt (if there is one) or an assistant response.
'assistant'
A response previously provided by the language model. Must follow a user prompt. Past responses can be used to provide a stateful conversational experience; see Usage Notes.
Optional:
options
An object containing zero or more of the following options that affect the model’s hyperparameters. See LLM Settings.
temperature
: A value from 0 to 1 (inclusive) that controls the randomness of the output of the language model. A higher temperature (for example, 0.7) results in more diverse and random output, while a lower temperature (such as 0.2) makes the output more deterministic and focused.top_p
: A value from 0 to 1 (inclusive) that controls the randomness and diversity of the language model, generally used as an alternative totemperature
. The difference is thattop_p
restricts the set of possible tokens that the model outputs, whiletemperature
influences which tokens are chosen at each step.max_tokens
: Sets the maximum number of output tokens in the response. Small values can result in truncated responses.
Specifying the
options
argument, even if it is an empty object ({}
), affects how theprompt
argument is interpreted and how the response is formatted.
Returns¶
When the options
argument is not specified, a string.
When the options
argument is given, a string representation of a JSON object containing the following keys.
"choices"
: An array of the model’s responses. (Currently, only one response is provided.) Each response is an object containing a"messages"
key whose value is the model’s response to the latest prompt."created"
: UNIX timestamp (seconds since midnight, January 1, 1970) when the response was generated."model"
: The name of the model that created the response."usage"
: An object recording the number of tokens consumed and generated by this completion. Includes the following sub-keys:"completion_tokens"
: The number of tokens in the generated response."prompt_tokens"
: The number of tokens in the prompt.total_tokens"
: The total number of tokens consumed, which is the sum of the other two values.
Access Control¶
Users must use a role that has been granted the SNOWFLAKE.CORTEX_USER database role. See Required Privileges for more information on granting this privilege.
Usage Notes¶
COMPLETE does not retain any state from one call to the next. To use the COMPLETE function to provide a stateful,
conversational experience, pass all previous user prompts and model responses in the conversation as part of the prompt_or_history
array. (See Templates for Chat Models.)
Keep in mind that the number of tokens processed increases for each “round,” and costs increase proportionally.
Examples¶
Single Response¶
To generate a single response:
SELECT SNOWFLAKE.CORTEX.COMPLETE('mistral-large', 'What are large language models?');
Responses from Table Column¶
The following example generates a response from each row of a table (in this example, content
is a column from
the reviews
table). The reviews
table contains a column named review_content
containing the text of
reviews submitted by users. The query returns a critique of each review.
SELECT SNOWFLAKE.CORTEX.COMPLETE(
'mistral-large',
CONCAT('Critique this review in bullet points: <review>', content, '</review>')
) FROM reviews LIMIT 10;
Tip
As shown in this example, you can use tagging in the prompt to control the kind of response generated. See A guide to prompting LLaMA 2 for tips.
Controlling Temperature and Tokens¶
This example illustrates the use of the function’s options
argument to control the inference hyperparameters in a
single response. Note that in this form of the function, the prompt must be provided as an array, since this form
supports multiple prompts and responses.
SELECT SNOWFLAKE.CORTEX.COMPLETE(
'llama2-70b-chat',
[
{
'role': 'user',
'content': 'how does a snowflake get its unique pattern?'
}
],
{
'temperature': 0.7,
'max_tokens': 10
}
);
The response is a JSON object containing the message from the language model and other information. Note that the response
is truncated as instructed in the options
argument.
{
"choices": [
{
"messages": " The unique pattern on a snowflake is"
}
],
"created": 1708536426,
"model": "llama2-70b-chat",
"usage": {
"completion_tokens": 10,
"prompt_tokens": 22,
"total_tokens": 32
}
}
Example: Providing a System Prompt¶
This example illustrates the use of a system prompt to provide a sentiment analysis of movie reviews. The prompt
argument here is an array of objects, each having an appropriate role
value.
SELECT SNOWFLAKE.CORTEX.COMPLETE(
'llama2-70b-chat',
[
{'role': 'system', 'content': 'You are a helpful AI assistant. Analyze the movie review text and determine the overall sentiment. Answer with just \"Positive\", \"Negative\", or \"Neutral\"' },
{'role': 'user', 'content': 'this was really good'}
], {}
) as response;
The response is a JSON object containing the response from the language model and other information.
{
"choices": [
{
"messages": " Positive"
}
],
"created": 1708479449,
"model": "llama2-70b-chat",
"usage": {
"completion_tokens": 3,
"prompt_tokens": 64,
"total_tokens": 67
}
}
Legal Notices¶
Snowflake Cortex LLM Functions are powered by machine learning technology, including Meta’s LLaMA 2 and Google’s Gemma 7B models.
The foundation LLaMA 2 model is licensed under the LLaMA 2 Community License and is Copyright (c) Meta Platforms, Inc. All Rights Reserved. Your use of any LLM Functions based on the LLama 2 model is subject to Meta’s Acceptable Use Policy.
The foundation Gemma 7B model is licensed under the Gemma Terms of Use, and use of it is subject to the Gemma Prohibited Use Policy.
Machine learning technology and results provided may be inaccurate, inappropriate, or biased. Decisions based on machine learning outputs, including those built into automatic pipelines, should have human oversight and review processes to ensure model-generated content is accurate.
LLM function queries are treated like any other SQL query and may be considered metadata.
For further information, see Snowflake AI Trust and Safety FAQ.