Large Language Model (LLM) Functions (Snowflake Cortex)¶
Snowflake Cortex gives you instant access to industry-leading large language models (LLMs) trained by researchers at companies like Mistral, Reka, Meta, and Google, including Snowflake Arctic, an open enterprise-grade model developed by Snowflake.
Since these LLMs are fully hosted and managed by Snowflake, using them requires no setup. Your data stays within Snowflake, giving you the performance, scalability, and governance you expect.
Snowflake Cortex features are provided as SQL functions and are also available in Python. The available functions are summarized below.
COMPLETE: Given a prompt, returns a response that completes the prompt. This function accepts either a single prompt or a conversation with multiple prompts and responses.
EXTRACT_ANSWER: Given a question and unstructured data, returns the answer to the question if it can be found in the data.
SENTIMENT: Returns a sentiment score, from -1 to 1, representing the detected positive or negative sentiment of the given text.
SUMMARIZE: Returns a summary of the given text.
TRANSLATE: Translates given text from any supported language to any other.
Required Privileges¶
The CORTEX_USER database role in the SNOWFLAKE database includes the privileges that allow users to call Snowflake Cortex LLM functions. By default, this database role is granted to only the ACCOUNTADMIN role. ACCOUNTADMIN must propagate this role to user roles in order to allow users to access Cortex LLM Functions.
The SNOWFLAKE.CORTEX_USER database role cannot be granted directly to a user. A user with the ACCOUNTADMIN role must first grant it to an account role, and then grant the account role to users. For more information, see Using SNOWFLAKE Database Roles.
In the following example, you assume ACCOUNTADMIN and grant the user some_user the CORTEX_USER database role via the account role cortex_user_role, which you create for this purpose.
USE ROLE ACCOUNTADMIN;
CREATE ROLE cortex_user_role;
GRANT DATABASE ROLE SNOWFLAKE.CORTEX_USER TO ROLE cortex_user_role;
GRANT ROLE cortex_user_role TO USER some_user;
You can also grant access to Snowflake Cortex LLM functions through existing roles commonly used by specific groups of users. (See User roles.) For example, if you have created an analyst role that is used as a default role by analysts in your organization, you can easily grant these users access to Snowflake Cortex LLM functions with a single GRANT statement.
GRANT DATABASE ROLE SNOWFLAKE.CORTEX_USER TO ROLE analyst;
The PUBLIC pseudo-role is automatically granted to all users and roles, so granting the SNOWFLAKE.CORTEX_USER database role to PUBLIC allows all users in your account to use the Snowflake Cortex LLM functions.
GRANT DATABASE ROLE SNOWFLAKE.CORTEX_USER TO ROLE PUBLIC;
Availability¶
Snowflake Cortex LLM functions are currently available in the following regions.
Function (Model) | AWS US West 2 (Oregon) | AWS US East 1 (N. Virginia) | AWS Europe Central 1 (Frankfurt) | Azure East US 2 (Virginia) | Azure West Europe (Netherlands) |
---|---|---|---|---|---|
COMPLETE (snowflake-arctic) | ✔ | | | | |
COMPLETE (reka-flash) | ✔ | ✔ | | | |
COMPLETE (mistral-large) | ✔ | ✔ | ✔ | | |
COMPLETE (mixtral-8x7b) | ✔ | ✔ | ✔ | ✔ | ✔ |
COMPLETE (llama2-70b-chat) | ✔ | ✔ | ✔ | ✔ | ✔ |
COMPLETE (mistral-7b) | ✔ | ✔ | ✔ | ✔ | ✔ |
COMPLETE (gemma-7b) | ✔ | ✔ | ✔ | ✔ | ✔ |
EXTRACT_ANSWER | ✔ | ✔ | ✔ | ✔ | ✔ |
SENTIMENT | ✔ | ✔ | ✔ | ✔ | ✔ |
SUMMARIZE | ✔ | ✔ | ✔ | ✔ | ✔ |
TRANSLATE | ✔ | ✔ | ✔ | ✔ | ✔ |
Cost Considerations¶
Snowflake Cortex LLM functions incur compute cost based on the number of tokens processed. Refer to the consumption table for each function’s cost in credits per million tokens.
Note
Usage of Snowflake Arctic does not incur compute cost through June 3, 2024.
A token is the smallest unit of text processed by Snowflake Cortex LLM functions, approximately equal to four characters of text. The equivalence of raw input or output text to tokens can vary by model.
For functions that generate new text in the response (COMPLETE, SUMMARIZE, and TRANSLATE), both input and output tokens are counted.
For functions that only extract information from the input (EXTRACT_ANSWER and SENTIMENT), only input tokens are counted.
For EXTRACT_ANSWER, the number of billable tokens is the sum of the number of tokens in the from_text and question fields.
SUMMARIZE, TRANSLATE, EXTRACT_ANSWER, and SENTIMENT add a prompt to the input text in order to generate the response. As a result, the input token count is slightly higher than the number of tokens in the text you provide.
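The billing rules above can be turned into a rough estimator. The following is a minimal sketch, not an official calculator: it assumes the four-characters-per-token rule of thumb stated above, and the helper names (estimate_tokens, estimate_billable_tokens, estimate_credits) and the credits_per_million_tokens parameter are hypothetical. Take the actual rate from the consumption table. Because several functions add a prompt to the input, real token counts are likely to be somewhat higher than this estimate.

```python
import math

# Rule of thumb from the text: one token is roughly four characters.
# Real tokenization varies by model, so treat results as estimates only.
CHARS_PER_TOKEN = 4

# Functions that generate new text count input and output tokens.
GENERATIVE = {"COMPLETE", "SUMMARIZE", "TRANSLATE"}
# Functions that only extract information count input tokens.
EXTRACTIVE = {"EXTRACT_ANSWER", "SENTIMENT"}


def estimate_tokens(text: str) -> int:
    """Approximate the token count of a string (rounded up)."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def estimate_billable_tokens(function: str, input_texts: list[str], output_text: str = "") -> int:
    """Estimate billable tokens for one call.

    input_texts holds every input field, for example [from_text, question]
    for EXTRACT_ANSWER. output_text is ignored for extractive functions.
    """
    name = function.upper()
    if name not in GENERATIVE | EXTRACTIVE:
        raise ValueError(f"unknown function: {function}")
    tokens = sum(estimate_tokens(t) for t in input_texts)
    if name in GENERATIVE:
        tokens += estimate_tokens(output_text)
    return tokens


def estimate_credits(tokens: int, credits_per_million_tokens: float) -> float:
    """Convert a token count to credits using a rate you supply (hypothetical parameter)."""
    return tokens / 1_000_000 * credits_per_million_tokens
```

For example, estimate_billable_tokens("EXTRACT_ANSWER", [document, question]) sums both input fields, matching the rule that EXTRACT_ANSWER bills on from_text plus question.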
Snowflake recommends executing queries that call a Snowflake Cortex LLM Function with a smaller warehouse (no larger than MEDIUM) because larger warehouses do not increase performance. The cost associated with keeping a warehouse active will continue to apply when executing a query that calls a Snowflake Cortex LLM Function. For general information on compute costs, see Understanding compute cost.
Usage Quotas¶
To ensure that all Snowflake customers can access LLM capabilities, Snowflake Cortex LLM functions may be subject to throttling during periods of high utilization. Usage quotas are not applied at the account level.
Throttled requests will receive an error response and should be retried later.
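One common way to retry throttled requests is exponential backoff with a little jitter. The sketch below is an illustration under stated assumptions, not a Snowflake-provided utility: ThrottledError is a hypothetical stand-in for whatever exception your client raises when a request is throttled, and call_with_retry, its parameters, and the default delays are likewise illustrative choices.

```python
import random
import time


class ThrottledError(Exception):
    """Hypothetical stand-in for the error your client raises when throttled."""


def call_with_retry(fn, *, max_attempts=5, base_delay=1.0, max_delay=30.0,
                    retry_on=(ThrottledError,), sleep=time.sleep):
    """Call fn() and retry with exponential backoff on the given exceptions.

    The delay doubles after each failed attempt (capped at max_delay) and a
    small random jitter is added so that many clients do not retry in lockstep.
    The last failure is re-raised once max_attempts is reached.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts:
                raise
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(delay + random.uniform(0, delay / 10))
```

In practice, fn would be a zero-argument callable that wraps the LLM function call, and retry_on would name the exception type that your connector or Snowpark session actually raises for a throttled request.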
Note
On-demand Snowflake accounts without a valid payment method (such as trial accounts) are limited to roughly one credit per day in Snowflake Cortex LLM function usage. To remove this restriction, convert your trial account to a paid account.
Managing Costs and Throttling¶
During this preview, Snowflake recommends using a warehouse size no larger than MEDIUM when calling Snowflake Cortex LLM functions. Using a larger warehouse than necessary does not increase performance, but can result in unnecessary costs and a higher risk of throttling. This recommendation may not apply in the future due to upcoming product updates.
Model Restrictions¶
Models used by Snowflake Cortex have limitations on size as described in the table below. Sizes are given in tokens. Tokens generally correspond to words, but not all tokens are words, so the number of words corresponding to a limit is slightly less than the number of tokens. Inputs that exceed the limit result in an error.
Function | Model | Context window (tokens) |
---|---|---|
COMPLETE | | 4,096 |
 | | 32,000 |
 | | 100,000 |
 | | 32,000 |
 | | 4,096 |
 | | 32,000 |
 | | 8,000 |
EXTRACT_ANSWER | Snowflake managed model | 2,048 for text; 64 for question |
SENTIMENT | Snowflake managed model | 512 |
SUMMARIZE | Snowflake managed model | 32,000 |
TRANSLATE | Snowflake managed model | 1,024 |
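Because inputs that exceed a limit result in an error, it can help to estimate input size before calling a function. The sketch below uses the limits listed above for the Snowflake managed models and assumes roughly four characters per token, which is only an approximation that varies by model. The names TOKEN_LIMITS, exceeds_limit, and split_for_limit, and the 0.8 safety margin, are hypothetical choices for illustration.

```python
import math

CHARS_PER_TOKEN = 4  # rough approximation; actual tokenization varies by model

# Limits in tokens, copied from the table above (Snowflake managed models).
TOKEN_LIMITS = {
    "EXTRACT_ANSWER": 2048,          # the text argument
    "EXTRACT_ANSWER_QUESTION": 64,   # the question argument
    "SENTIMENT": 512,
    "SUMMARIZE": 32000,
    "TRANSLATE": 1024,
}


def estimate_tokens(text: str) -> int:
    """Approximate the token count of a string (rounded up)."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def exceeds_limit(limit_name: str, text: str) -> bool:
    """Return True if the estimated token count is above the listed limit."""
    return estimate_tokens(text) > TOKEN_LIMITS[limit_name]


def split_for_limit(limit_name: str, text: str, safety_margin: float = 0.8) -> list[str]:
    """Naively split text into chunks estimated to fit under the limit.

    The safety margin leaves headroom because the estimate is approximate.
    Chunks are cut at fixed character offsets, so a sentence or word may be
    split; a real pipeline would probably cut at sentence boundaries.
    """
    if not 0 < safety_margin <= 1:
        raise ValueError("safety_margin must be in (0, 1]")
    max_chars = max(1, int(TOKEN_LIMITS[limit_name] * CHARS_PER_TOKEN * safety_margin))
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]
```

A caller might check exceeds_limit("TRANSLATE", text) first and only fall back to split_for_limit when the check fails, then process each chunk separately.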
Choosing a Model¶
The Snowflake Cortex COMPLETE function supports multiple models of varying capability, latency, and cost. These models have been carefully chosen to align with common customer use cases. To achieve the best performance per credit, choose a model that’s a good match for the content size and complexity of your task. Here are brief overviews of the available models.
snowflake-arctic is Snowflake’s top-tier enterprise-focused LLM. Arctic excels at enterprise tasks such as SQL generation, coding, and instruction-following benchmarks.
mistral-large is Mistral AI’s most advanced large language model with top-tier reasoning capabilities. Mistral Large is ideal for complex tasks that require large reasoning capabilities or are highly specialized, such as synthetic text generation, code generation, or agents.
reka-flash is a highly capable multilingual language model optimized for fast workloads that require high quality, such as writing product descriptions or blog posts, coding, and extracting answers from documents with hundreds of pages.
mixtral-8x7b is ideal for text generation, classification, and question answering. Mistral models are optimized for low latency with low memory requirements, which translates into higher throughput for enterprise use cases.
llama2-70b-chat is well-suited to complex, large-scale tasks that require a moderate amount of reasoning, like extracting data or helping you to write job descriptions.
mistral-7b is ideal for your simplest summarization, structuring, and question answering tasks that need to be done quickly. It offers low latency and high throughput processing for multiple pages of text with its 32K context window.
gemma-7b is suitable for simple code and text completion tasks. It has a context window of 8,000 tokens but is surprisingly capable within that limit, and quite cost-effective.
If you’re not sure where to start, try the most powerful models first to establish a baseline to evaluate other models. mistral-large is the most capable model offered by Snowflake Cortex, and will give you a good idea what a state-of-the-art model can do.
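The baseline approach described above can be sketched as a small comparison loop. This is a minimal illustration, not a benchmarking tool: compare_models is a hypothetical helper, and complete_fn is any callable that takes a model name and a prompt and returns the response text. Passing the completion function in keeps the sketch runnable without a Snowflake session.

```python
import time
from typing import Callable


def compare_models(prompt: str, models: list[str],
                   complete_fn: Callable[[str, str], str]) -> list[dict]:
    """Run the same prompt through each model and record response and latency.

    List the most capable model first so that its response serves as the
    baseline against which smaller, cheaper models are judged.
    """
    results = []
    for model in models:
        start = time.perf_counter()
        response = complete_fn(model, prompt)
        elapsed = time.perf_counter() - start
        results.append({"model": model, "response": response, "seconds": elapsed})
    return results
```

With Snowpark ML installed and an active session, complete_fn could be the Complete function from snowflake.cortex, called as Complete(model, prompt), and models could be a list such as ["mistral-large", "mixtral-8x7b", "mistral-7b"]. Judging response quality against the baseline is left to you, since it depends on the task.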
To help you decide, the following table provides information on how popular models perform on various benchmarks, including the models offered by Snowflake Cortex COMPLETE as well as a few other popular models.
Model | Context Window (Tokens) | MMLU (Reasoning) | MT-Bench (Instruction Following) | HumanEval (Coding) | Spider 1.0 (SQL) |
---|---|---|---|---|---|
 | 4,096 | 67.3 | - | 64.3 | 79.0 |
 | 32,000 | 86.4 | 8.96 | 67 | 86.6 |
 | 32,000 | 81.2 | - | 45.1 | 81 |
 | 100,000 | 78.5 | 8.06 | 71.2 | - |
 | 100,000 | 73.5 | 8.2 | 65.2 | - |
 | 32,000 | 70.6 | 8.30 | 40.2 | - |
 | 4,097 | 70 | 8.39 | 48.1 | - |
 | 4,096 | 68.9 | 6.86 | 30.5 | - |
 | 32,000 | 62.5 | 6.84 | 26.2 | - |
 | 8,000 | 64.3 | - | 32.3 | - |
 | 4,096 | 45.3 | 6.27 | 12.2 | - |
*Provided for comparison; not available in Snowflake Cortex COMPLETE.
LLM Functions Overview¶
COMPLETE¶
Given a prompt, the instruction-following COMPLETE function generates a response using your choice of language model. In the simplest use case, the prompt is a single string. You may also provide a conversation including multiple prompts and responses for interactive chat-style usage, and in this form of the function you can also specify hyperparameter options to customize the style and size of the output.
The COMPLETE function supports the following models. Different models can have different costs.
snowflake-arctic
mistral-large
reka-flash
mixtral-8x7b
llama2-70b-chat
mistral-7b
gemma-7b
See COMPLETE (SNOWFLAKE.CORTEX) for syntax and examples.
EXTRACT_ANSWER¶
The EXTRACT_ANSWER function extracts an answer to a given question from a text document. The document may be a plain-English document or a string representation of a semi-structured (JSON) data object.
See EXTRACT_ANSWER (SNOWFLAKE.CORTEX) for syntax and examples.
SENTIMENT¶
The SENTIMENT function returns sentiment as a score between -1 and 1 for the given English-language input text, where -1 is the most negative, 1 is the most positive, and values around 0 are neutral.
See SENTIMENT (SNOWFLAKE.CORTEX) for syntax and examples.
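If you need a label rather than a raw score, you can bucket the value returned by SENTIMENT. The sketch below is one possible mapping, not part of the function itself: label_sentiment is a hypothetical helper, and the 0.2 width of the neutral band is an arbitrary assumption that you would tune for your own data.

```python
def label_sentiment(score: float, neutral_band: float = 0.2) -> str:
    """Map a sentiment score in [-1, 1] to a coarse label.

    Scores within neutral_band of zero are treated as neutral. The band
    width is an illustrative assumption, not a documented threshold.
    """
    if not -1.0 <= score <= 1.0:
        raise ValueError("score must be between -1 and 1")
    if score > neutral_band:
        return "positive"
    if score < -neutral_band:
        return "negative"
    return "neutral"
```

For example, a score returned for a strongly favorable review would map to "positive", while a score close to 0 would map to "neutral".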
SUMMARIZE¶
The SUMMARIZE function returns a summary of the given English text.
See SUMMARIZE (SNOWFLAKE.CORTEX) for syntax and examples.
TRANSLATE¶
The TRANSLATE function translates text from the indicated or detected source language to a target language.
See TRANSLATE (SNOWFLAKE.CORTEX) for syntax and examples.
Error Conditions¶
Snowflake Cortex LLM functions can produce the following error messages.
Message | Explanation |
---|---|
 | The request was rejected due to excessive system load. Please try your request again. |
 | The |
 | The model consumption budget was exceeded. |
 | The specified model does not exist. |
 | The specified language is not supported by the TRANSLATE function. |
 | The request exceeded the maximum number of tokens supported by the model (see Model Restrictions). |
 | The request has been throttled due to a high level of usage. Try again later. |
Using Snowflake Cortex LLM Functions with Python¶
Snowflake Cortex LLM functions are available in Snowpark ML version 1.1.2 and later. See Installing Snowpark ML for instructions on setting up Snowpark ML.
If you run your Python script outside of Snowflake, you must create a Snowpark session to use these functions. See Connecting to Snowflake for instructions.
The following Python example illustrates calling Snowflake Cortex LLM functions on single values:
from snowflake.cortex import Complete, ExtractAnswer, Sentiment, Summarize, Translate
text = """
The Snowflake company was co-founded by Thierry Cruanes, Marcin Zukowski,
and Benoit Dageville in 2012 and is headquartered in Bozeman, Montana.
"""
print(Complete("llama2-70b-chat", "how do snowflakes get their unique patterns?"))
print(ExtractAnswer(text, "When was snowflake founded?"))
print(Sentiment("I really enjoyed this restaurant. Fantastic service!"))
print(Summarize(text))
print(Translate(text, "en", "fr"))
You can also call an LLM function on a table column, as shown below. This example requires a session object (stored in session) and a table articles containing a text column abstract_text, and creates a new column abstract_summary containing a summary of the abstract.
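from snowflake.cortex import Summarize
from snowflake.snowpark.functions import col

# Load the articles table as a Snowpark DataFrame.
article_df = session.table("articles")
# Add a column holding a summary of each abstract.
article_df = article_df.withColumn(
    "abstract_summary",
    Summarize(col("abstract_text"))
)
# Execute the query and fetch the resulting rows.
article_df.collect()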
Note
The advanced chat-style (multi-message) form of COMPLETE is not currently supported in Python.
Legal Notices¶
Snowflake Cortex LLM Functions are powered by machine learning technology, including Meta’s LLaMA 2 and Google’s Gemma 7B models.
The foundation LLaMA 2 model is licensed under the LLaMA 2 Community License and is Copyright (c) Meta Platforms, Inc. All Rights Reserved. Your use of any LLM Functions based on the LLaMA 2 model is subject to Meta’s Acceptable Use Policy.
The foundation Gemma 7B model is licensed under the Gemma Terms of Use, and use of it is subject to the Gemma Prohibited Use Policy.
Machine learning technology and results provided may be inaccurate, inappropriate, or biased. Decisions based on machine learning outputs, including those built into automatic pipelines, should have human oversight and review processes to ensure model-generated content is accurate.
LLM function queries are treated like any other SQL query and may be considered metadata.
For further information, see Snowflake AI Trust and Safety FAQ.