Large Language Model (LLM) Functions (Snowflake Cortex)¶
Snowflake Cortex gives you instant access to industry-leading large language models (LLMs) trained by researchers at companies like Mistral, Reka, Meta, and Google, including Snowflake Arctic, an open enterprise-grade model developed by Snowflake.
Since these LLMs are fully hosted and managed by Snowflake, using them requires no setup. Your data stays within Snowflake, giving you the performance, scalability, and governance you expect.
Snowflake Cortex features are provided as SQL functions and are also available in Python. The available functions are summarized below.
COMPLETE: Given a prompt, returns a response that completes the prompt. This function accepts either a single prompt or a conversation with multiple prompts and responses.
EMBED_TEXT_768: Given a piece of text, returns a vector embedding that represents that text.
EXTRACT_ANSWER: Given a question and unstructured data, returns the answer to the question if it can be found in the data.
SENTIMENT: Returns a sentiment score, from -1 to 1, representing the detected positive or negative sentiment of the given text.
SUMMARIZE: Returns a summary of the given text.
TRANSLATE: Translates given text from any supported language to any other.
Required Privileges¶
The CORTEX_USER database role in the SNOWFLAKE database includes the privileges that allow users to call Snowflake Cortex LLM functions. By default, the CORTEX_USER role is granted to the PUBLIC role. The PUBLIC role is automatically granted to all users and roles, so this allows all users in your account to use the Snowflake Cortex LLM functions.
If you don’t want all users to have this privilege, you can revoke access from the PUBLIC role and grant access to specific roles.
To revoke the CORTEX_USER database role from the PUBLIC role, run the following command using the ACCOUNTADMIN role:
REVOKE DATABASE ROLE SNOWFLAKE.CORTEX_USER
FROM ROLE PUBLIC;
You can then selectively provide access to specific roles. The SNOWFLAKE.CORTEX_USER database role cannot be granted directly to a user. For more information, see Using SNOWFLAKE Database Roles.
A user with the ACCOUNTADMIN role can grant this role to a custom role in order to allow users to access Cortex LLM Functions. In the following example, use the ACCOUNTADMIN role and grant the user some_user the CORTEX_USER database role via the account role cortex_user_role, which you create for this purpose.
USE ROLE ACCOUNTADMIN;
CREATE ROLE cortex_user_role;
GRANT DATABASE ROLE SNOWFLAKE.CORTEX_USER TO ROLE cortex_user_role;
GRANT ROLE cortex_user_role TO USER some_user;
You can also grant access to Snowflake Cortex LLM functions through existing roles commonly used by specific groups of users. (See User roles.) For example, if you have created an analyst role that is used as a default role by analysts in your organization, you can easily grant these users access to Snowflake Cortex LLM functions with a single GRANT statement.
GRANT DATABASE ROLE SNOWFLAKE.CORTEX_USER TO ROLE analyst;
Availability¶
Snowflake Cortex LLM functions are currently available in the following regions.
Function (Model) | AWS US West 2 (Oregon) | AWS US East 1 (N. Virginia) | AWS Europe Central 1 (Frankfurt) | Azure East US 2 (Virginia) | Azure West Europe (Netherlands) |
---|---|---|---|---|---|
COMPLETE (llama3-8b) | ✔ | | | | |
COMPLETE (llama3-70b) | ✔ | | | | |
COMPLETE (snowflake-arctic) | ✔ | | | | |
COMPLETE (reka-core) | ✔ | | | | |
COMPLETE (reka-flash) | ✔ | ✔ | | | |
COMPLETE (mistral-large) | ✔ | ✔ | ✔ | | |
COMPLETE (mixtral-8x7b) | ✔ | ✔ | ✔ | ✔ | ✔ |
COMPLETE (llama2-70b-chat) | ✔ | ✔ | ✔ | ✔ | ✔ |
COMPLETE (mistral-7b) | ✔ | ✔ | ✔ | ✔ | ✔ |
COMPLETE (gemma-7b) | ✔ | ✔ | ✔ | ✔ | ✔ |
EMBED_TEXT_768 | ✔ | ✔ | ✔ | ✔ | ✔ |
EXTRACT_ANSWER | ✔ | ✔ | ✔ | ✔ | ✔ |
SENTIMENT | ✔ | ✔ | ✔ | ✔ | ✔ |
SUMMARIZE | ✔ | ✔ | ✔ | ✔ | ✔ |
TRANSLATE | ✔ | ✔ | ✔ | ✔ | ✔ |
Cost Considerations¶
Snowflake Cortex LLM functions incur compute cost based on the number of tokens processed. Refer to the consumption table for each function’s cost in credits per million tokens.
Note
Usage of Snowflake Arctic does not incur compute cost through June 3, 2024.
A token is the smallest unit of text processed by Snowflake Cortex LLM functions, approximately equal to four characters of text. The equivalence of raw input or output text to tokens can vary by model.
For functions that generate new text in the response (COMPLETE, SUMMARIZE, and TRANSLATE), both input and output tokens are counted.
For the EMBED_TEXT_768 function (in preview), only input tokens are counted.
For functions that only extract information from the input (EXTRACT_ANSWER and SENTIMENT), only input tokens are counted.
For EXTRACT_ANSWER, the number of billable tokens is the sum of the number of tokens in the from_text and question fields.
SUMMARIZE, TRANSLATE, EXTRACT_ANSWER, and SENTIMENT add a prompt to the input text in order to generate the response. As a result, the input token count is slightly higher than the number of tokens in the text you provide.
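The token accounting described above can be approximated with a little arithmetic. The following is a rough sketch, not an official calculator: it relies on the four-characters-per-token rule of thumb from this section, the helper names `estimate_tokens` and `estimate_credits` are hypothetical, and the default `credits_per_million_tokens` rate is a placeholder rather than a value from the consumption table. It also ignores the prompt that some functions add to the input, so real counts will be somewhat higher.

```python
import math

CHARS_PER_TOKEN = 4  # rule of thumb from this section; the real ratio varies by model


def estimate_tokens(text: str) -> int:
    """Roughly estimate how many tokens a string contains."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def estimate_credits(input_text: str, output_text: str = "",
                     credits_per_million_tokens: float = 1.0) -> float:
    """Estimate the credits consumed by one call.

    Pass output_text only for functions that bill output tokens
    (COMPLETE, SUMMARIZE, TRANSLATE). The default rate is a placeholder.
    """
    tokens = estimate_tokens(input_text) + estimate_tokens(output_text)
    return tokens * credits_per_million_tokens / 1_000_000


# EXTRACT_ANSWER bills the tokens in from_text plus the tokens in question.
from_text = "Snowflake was founded in 2012." * 10
question = "When was Snowflake founded?"
billable = estimate_tokens(from_text) + estimate_tokens(question)
```

To use the estimate for budgeting, replace the placeholder rate with the rate listed for the function and model in the consumption table.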
Snowflake recommends executing queries that call a Snowflake Cortex LLM Function with a smaller warehouse (no larger than MEDIUM) because larger warehouses do not increase performance. The cost associated with keeping a warehouse active will continue to apply when executing a query that calls a Snowflake Cortex LLM Function. For general information on compute costs, see Understanding compute cost.
Track costs for AI services¶
To track credits used for AI Services including LLM Functions in your account, use the METERING_DAILY_HISTORY view:
SELECT *
FROM snowflake.account_usage.metering_daily_history
WHERE SERVICE_TYPE='AI_SERVICES';
Usage Quotas¶
To ensure that all Snowflake customers can access LLM capabilities, Snowflake Cortex LLM functions may be subject to throttling during periods of high utilization. Usage quotas are not applied at the account level.
Throttled requests will receive an error response and should be retried later.
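One common way to retry a throttled request is exponential backoff. The sketch below is a generic pattern, not part of Snowflake Cortex: `call_with_retry` is a hypothetical helper that wraps any callable, and the default `is_throttled` check assumes that the error text contains the word "throttled", which you should adapt to the errors you actually observe.

```python
import time


def call_with_retry(fn, *args, max_attempts=5, base_delay=1.0,
                    is_throttled=lambda exc: "throttled" in str(exc).lower(),
                    sleep=time.sleep):
    """Call fn(*args), retrying with exponential backoff when throttled.

    The is_throttled default is an assumption about the error text.
    """
    for attempt in range(max_attempts):
        try:
            return fn(*args)
        except Exception as exc:
            # Re-raise errors that are not throttling, and the final failure.
            if not is_throttled(exc) or attempt == max_attempts - 1:
                raise
            # Wait base_delay, then 2x, 4x, ... before the next attempt.
            sleep(base_delay * 2 ** attempt)
```

For example, `call_with_retry(Summarize, text)` would retry a Python call to SUMMARIZE, assuming throttling surfaces as an exception whose message matches the check.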
Note
On-demand Snowflake accounts without a valid payment method (such as trial accounts) are limited to roughly one credit per day in Snowflake Cortex LLM function usage. To remove this restriction, convert your trial account to a paid account.
Managing Costs and Throttling¶
Snowflake recommends using a warehouse size no larger than MEDIUM when calling Snowflake Cortex LLM functions. Using a larger warehouse than necessary does not increase performance, but can result in unnecessary costs and a higher risk of throttling. This recommendation may not apply in the future due to upcoming product updates.
Model Restrictions¶
Models used by Snowflake Cortex have limitations on size as described in the table below. Sizes are given in tokens. Tokens generally correspond to words, but not all tokens are words, so the number of words corresponding to a limit is slightly less than the number of tokens. Inputs that exceed the limit result in an error.
Function | Model | Context window (tokens) |
---|---|---|
COMPLETE | snowflake-arctic | 4,096 |
COMPLETE | mistral-large | 32,000 |
COMPLETE | reka-flash | 100,000 |
COMPLETE | reka-core | 32,000 |
COMPLETE | mixtral-8x7b | 32,000 |
COMPLETE | llama2-70b-chat | 4,096 |
COMPLETE | llama3-8b | 8,000 |
COMPLETE | llama3-70b | 8,000 |
COMPLETE | mistral-7b | 32,000 |
COMPLETE | gemma-7b | 8,000 |
EMBED_TEXT_768 | | 512 |
EXTRACT_ANSWER | Snowflake managed model | 2,048 for text; 64 for question |
SENTIMENT | Snowflake managed model | 512 |
SUMMARIZE | Snowflake managed model | 32,000 |
TRANSLATE | Snowflake managed model | 1,024 |
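Because inputs over the limit result in an error, it can help to check the size of a text before sending it. The sketch below is only an approximation: the limits are copied from the table above, the four-characters-per-token ratio is the rule of thumb given under Cost Considerations, and `fits_limit` and `split_for_limit` are hypothetical helper names. Functions that add their own prompt use part of the budget, so leave some headroom in practice.

```python
import math

# Token limits copied from the table above.
TOKEN_LIMITS = {
    "EMBED_TEXT_768": 512,
    "SENTIMENT": 512,
    "SUMMARIZE": 32_000,
    "TRANSLATE": 1_024,
}


def fits_limit(function_name: str, text: str, chars_per_token: int = 4) -> bool:
    """Return True if the estimated token count is within the function's limit."""
    estimated_tokens = math.ceil(len(text) / chars_per_token)
    return estimated_tokens <= TOKEN_LIMITS[function_name]


def split_for_limit(function_name: str, text: str, chars_per_token: int = 4) -> list:
    """Naively split text into pieces that each fit the estimated limit.

    Splits on character count only, so a piece may end in the middle of a word.
    """
    max_chars = TOKEN_LIMITS[function_name] * chars_per_token
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]
```

A production version would likely split on sentence or paragraph boundaries instead of raw character offsets.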
Choosing a Model¶
The Snowflake Cortex COMPLETE function supports multiple models of varying capability, latency, and cost. These models have been carefully chosen to align with common customer use cases. To achieve the best performance per credit, choose a model that’s a good match for the content size and complexity of your task. Here are brief overviews of the available models.
Large models¶
If you’re not sure where to start, try the most capable models first to establish a baseline to evaluate other models.
reka-core, llama3-70b, and mistral-large are the most capable models offered by Snowflake Cortex, and will give you a good idea what a state-of-the-art model can do.
reka-core is Reka AI’s most advanced large language model with strong reasoning abilities, code generation, and multilingual fluency.
llama3-70b is an open source model that delivers state-of-the-art performance ideal for chat applications, content creation, and enterprise applications.
mistral-large is Mistral AI’s most advanced large language model with top-tier reasoning capabilities. Mistral Large is ideal for complex tasks that require large reasoning capabilities or are highly specialized, such as synthetic text generation, code generation, or agents.
Medium models¶
snowflake-arctic is Snowflake’s top-tier enterprise-focused LLM. Arctic excels at enterprise tasks such as SQL generation, coding and instruction following benchmarks.
reka-flash is a highly capable multilingual language model optimized for fast workloads that require high quality, such as writing product descriptions or blog posts, coding, and extracting answers from documents with hundreds of pages.
mixtral-8x7b is ideal for text generation, classification, and question answering. Mistral models are optimized for low latency with low memory requirements, which translates into higher throughput for enterprise use cases.
llama2-70b-chat is well-suited to tasks that require a low to moderate amount of reasoning, like extracting data or helping you to write job descriptions.
Small models¶
llama3-8b is ideal for tasks that require low to moderate reasoning with better accuracy than the llama2-70b-chat model, like text classification, summarization, and sentiment analysis.
mistral-7b is ideal for your simplest summarization, structuration, and question answering tasks that need to be done quickly. It offers low latency and high throughput processing for multiple pages of text with its 32K context window.
gemma-7b is suitable for simple code and text completion tasks. It has a context window of 8,000 tokens but is surprisingly capable within that limit, and quite cost-effective.
The following table provides information on how popular models perform on various benchmarks, including the models offered by Snowflake Cortex COMPLETE as well as a few other popular models.
Model | Context Window (Tokens) | MMLU (Reasoning) | MT-Bench (Instruction Following) | HumanEval (Coding) | Spider 1.0 (SQL) |
---|---|---|---|---|---|
| 4,096 | 67.3 | - | 64.3 | 79.0 |
| 32,000 | 86.4 | 8.96 | 67 | 86.6 |
| 32,000 | 83.2 | - | 76.8 | - |
| 8,000 | 82 | - | 81.7 | 80.2 |
| 32,000 | 81.2 | - | 45.1 | 81 |
| 100,000 | 78.5 | 8.06 | 71.2 | - |
| 100,000 | 75.9 | 8.2 | 72 | - |
| 32,000 | 70.6 | 8.30 | 40.2 | - |
| 4,097 | 70 | 8.39 | 48.1 | - |
| 4,096 | 68.9 | 6.86 | 30.5 | - |
| 8,000 | 68.4 | - | 62.2 | 69.9 |
| 32,000 | 62.5 | 6.84 | 26.2 | - |
| 8,000 | 64.3 | - | 32.3 | - |
| 4,096 | 45.3 | 6.27 | 12.2 | - |
*Provided for comparison; not available in Snowflake Cortex COMPLETE.
LLM Functions Overview¶
COMPLETE¶
Given a prompt, the instruction-following COMPLETE function generates a response using your choice of language model. In the simplest use case, the prompt is a single string. You may also provide a conversation including multiple prompts and responses for interactive chat-style usage, and in this form of the function you can also specify hyperparameter options to customize the style and size of the output.
The COMPLETE function supports the following models. Different models can have different costs.
snowflake-arctic
mistral-large
reka-flash
reka-core
mixtral-8x7b
llama2-70b-chat
llama3-8b
llama3-70b
mistral-7b
gemma-7b
See COMPLETE (SNOWFLAKE.CORTEX) for syntax and examples.
EMBED_TEXT_768¶
Note
The EMBED_TEXT_768 function is in preview.
The EMBED_TEXT_768 function creates a vector embedding for a given English-language text. To learn more about embeddings and vector comparison functions, see Vector Embeddings.
For syntax and examples, see EMBED_TEXT_768 (SNOWFLAKE.CORTEX).
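To make the idea of comparing embeddings concrete, the sketch below computes cosine similarity, one common way to compare two vectors, in plain Python. It is only an illustration: `cosine_similarity` is a hypothetical helper and not a Snowflake function, and the three-dimensional vectors stand in for the much longer vectors an embedding function returns.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy vectors standing in for embeddings of three short texts.
doc_a = [0.2, 0.7, 0.1]
doc_b = [0.4, 1.4, 0.2]   # same direction as doc_a
doc_c = [0.9, -0.1, 0.3]  # points in a different direction

similar = cosine_similarity(doc_a, doc_b)
different = cosine_similarity(doc_a, doc_c)
```

Values close to 1 indicate vectors that point in nearly the same direction, which for embeddings usually means the texts are semantically similar.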
EXTRACT_ANSWER¶
The EXTRACT_ANSWER function extracts an answer to a given question from a text document. The document may be a plain-English document or a string representation of a semi-structured (JSON) data object.
See EXTRACT_ANSWER (SNOWFLAKE.CORTEX) for syntax and examples.
SENTIMENT¶
The SENTIMENT function returns sentiment as a score between -1 and 1 for the given English-language input text, where -1 is the most negative, 1 is the most positive, and values around 0 are neutral.
See SENTIMENT (SNOWFLAKE.CORTEX) for syntax and examples.
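If you need a category instead of a raw score, you can bucket the score yourself. The sketch below is one possible mapping: `sentiment_label` is a hypothetical helper, and the 0.3 cutoff is an arbitrary assumption to tune against your own data, not a threshold defined by Snowflake.

```python
def sentiment_label(score: float, threshold: float = 0.3) -> str:
    """Map a score in [-1, 1] to a coarse label.

    The default threshold is an arbitrary illustration.
    """
    if not -1.0 <= score <= 1.0:
        raise ValueError("score must be between -1 and 1")
    if score >= threshold:
        return "positive"
    if score <= -threshold:
        return "negative"
    return "neutral"
```

With the default cutoff, a score of 0.8 maps to "positive", -0.5 to "negative", and 0.1 to "neutral".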
SUMMARIZE¶
The SUMMARIZE function returns a summary of the given English text.
See SUMMARIZE (SNOWFLAKE.CORTEX) for syntax and examples.
TRANSLATE¶
The TRANSLATE function translates text from the indicated or detected source language to a target language.
See TRANSLATE (SNOWFLAKE.CORTEX) for syntax and examples.
Error Conditions¶
Snowflake Cortex LLM functions can produce the following error messages.
Message | Explanation |
---|---|
| The request was rejected due to excessive system load. Please try your request again. |
| The |
| The model consumption budget was exceeded. |
| The specified model does not exist. |
| The specified language is not supported by the TRANSLATE function. |
| The request exceeded the maximum number of tokens supported by the model (see Model Restrictions). |
| The request has been throttled due to a high level of usage. Try again later. |
Using Snowflake Cortex LLM Functions with Python¶
Snowflake Cortex LLM functions are available in Snowpark ML version 1.1.2 and later. See Installing Snowpark ML for instructions on setting up Snowpark ML.
If you run your Python script outside of Snowflake, you must create a Snowpark session to use these functions. See Connecting to Snowflake for instructions.
The following Python example illustrates calling Snowflake Cortex LLM functions on single values:
from snowflake.cortex import Complete, ExtractAnswer, Sentiment, Summarize, Translate
text = """
The Snowflake company was co-founded by Thierry Cruanes, Marcin Zukowski,
and Benoit Dageville in 2012 and is headquartered in Bozeman, Montana.
"""
print(Complete("llama2-70b-chat", "how do snowflakes get their unique patterns?"))
print(ExtractAnswer(text, "When was snowflake founded?"))
print(Sentiment("I really enjoyed this restaurant. Fantastic service!"))
print(Summarize(text))
print(Translate(text, "en", "fr"))
You can also call an LLM function on a table column, as shown below. This example requires a session object (stored in session) and a table articles containing a text column abstract_text, and creates a new column abstract_summary containing a summary of the abstract.
from snowflake.cortex import Summarize
from snowflake.snowpark.functions import col
article_df = session.table("articles")
article_df = article_df.withColumn(
    "abstract_summary",
    Summarize(col("abstract_text"))
)
article_df.collect()
Note
The advanced chat-style (multi-message) form of COMPLETE is not currently supported in Python.
Legal Notices¶
Refer to Snowflake AI Features.