Large Language Model (LLM) Functions (Snowflake Cortex)

Snowflake Cortex gives you instant access to industry-leading large language models (LLMs) trained by researchers at companies like Mistral AI, Meta, and Google. It also offers models that Snowflake has fine-tuned for specific use cases.

Since these LLMs are fully hosted and managed by Snowflake, using them requires no setup. Your data stays within Snowflake, giving you the performance, scalability, and governance you expect.

Snowflake Cortex features are provided as SQL functions and are also available in Python. The available functions are summarized below.

  • COMPLETE: Given a prompt, returns a response that completes the prompt. This function accepts either a single prompt or a conversation with multiple prompts and responses.

  • EXTRACT_ANSWER: Given a question and unstructured data, returns the answer to the question if it can be found in the data.

  • SENTIMENT: Returns a sentiment score, from -1 to 1, representing the detected positive or negative sentiment of the given text.

  • SUMMARIZE: Returns a summary of the given text.

  • TRANSLATE: Translates given text from any supported language to any other.

Required Privileges

The CORTEX_USER database role in the SNOWFLAKE database includes the privileges that allow users to call Snowflake Cortex LLM functions. By default, this database role is granted only to the ACCOUNTADMIN role, which must grant it onward to other roles before users can access the Cortex LLM functions.

The SNOWFLAKE.CORTEX_USER database role cannot be granted directly to a user. A user with the ACCOUNTADMIN role must first grant it to an account role, and then grant the account role to users. For more information, see Using SNOWFLAKE Database Roles.

In the following example, you assume ACCOUNTADMIN and grant the user some_user the CORTEX_USER database role via the account role cortex_user_role, which you create for this purpose.

USE ROLE ACCOUNTADMIN;

CREATE ROLE cortex_user_role;
GRANT DATABASE ROLE SNOWFLAKE.CORTEX_USER TO ROLE cortex_user_role;

GRANT ROLE cortex_user_role TO USER some_user;
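
After the grant, a user holding cortex_user_role can verify access with any of the functions. The following is a minimal sketch; it assumes the role also has the usual warehouse and database privileges needed to run queries.

USE ROLE cortex_user_role;
SELECT SNOWFLAKE.CORTEX.SENTIMENT('Access check!');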

You can also grant access to Snowflake Cortex LLM functions through existing roles commonly used by specific groups of users. (See User roles.) For example, if you have created an analyst role that is used as a default role by analysts in your organization, you can easily grant these users access to Snowflake Cortex LLM functions with a single GRANT statement.

GRANT DATABASE ROLE SNOWFLAKE.CORTEX_USER TO ROLE analyst;

The PUBLIC pseudo-role is automatically granted to all users and roles, so granting the SNOWFLAKE.CORTEX_USER database role to PUBLIC allows all users in your account to use the Snowflake Cortex LLM functions.

GRANT DATABASE ROLE SNOWFLAKE.CORTEX_USER TO ROLE PUBLIC;

Tip

Be mindful of the number of users to whom you are granting access and the impact their usage of Snowflake Cortex LLM functions may have on compute consumption. Establish policies around purpose of use (particularly of the most costly models) before granting widespread access to these features.

Expect users to explore the new features, potentially driving a temporary surge in cost, before settling into a more stable usage pattern.

Availability

Snowflake Cortex LLM functions are currently available in the following regions:

  • AWS US East (N. Virginia)

  • AWS US West (Oregon)

  • AWS Europe (Frankfurt)

  • Azure East US 2 (Virginia)

  • Azure West Europe (Netherlands)

Availability varies by function and by model. The functions covered are COMPLETE (with the mistral-large, reka-flash, mixtral-8x7b, llama2-70b-chat, mistral-7b, and gemma-7b models), EXTRACT_ANSWER, SENTIMENT, SUMMARIZE, and TRANSLATE.

Cost Considerations

Snowflake Cortex LLM functions incur compute cost based on the number of tokens processed. Refer to the consumption table for each function’s cost in credits per million tokens.

A token is the smallest unit of text processed by Snowflake Cortex LLM functions, roughly equal to four characters of text; how raw input or output text maps to tokens varies by model. A rough cost estimate based on this approximation appears after the following list.

  • For functions that generate new text in the response (COMPLETE, SUMMARIZE, and TRANSLATE), both input and output tokens are counted.

  • For functions that only extract information from the input (EXTRACT_ANSWER and SENTIMENT), only input tokens are counted.

  • For EXTRACT_ANSWER, the number of billable tokens is the sum of the number of tokens in the from_text and question fields.

  • SUMMARIZE, TRANSLATE, EXTRACT_ANSWER, and SENTIMENT add a prompt to the input text in order to generate the response. As a result, the input token count is slightly higher than the number of tokens in the text you provide.
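
To illustrate the arithmetic, the following sketch estimates input tokens for a text column using the four-characters-per-token approximation above. The articles table and abstract_text column are the same hypothetical ones used in the Python example at the end of this topic, and the 0.10 credits-per-million-tokens rate is a made-up placeholder; substitute the actual rate from the consumption table.

SELECT
    SUM(LENGTH(abstract_text)) / 4.0 AS approx_input_tokens,            -- ~4 characters per token
    SUM(LENGTH(abstract_text)) / 4.0 / 1000000 * 0.10 AS approx_credits -- 0.10 is a placeholder rate
FROM articles;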

Snowflake recommends executing queries that call a Snowflake Cortex LLM function with a warehouse no larger than MEDIUM, because larger warehouses do not increase performance. The cost of keeping a warehouse active continues to apply while a query that calls a Snowflake Cortex LLM function is running. For general information on compute costs, see Understanding compute cost.

Usage Quotas

To ensure a high standard of performance for all Snowflake customers, Snowflake Cortex LLM functions are subject to usage quotas beyond which requests may be throttled. Snowflake may adjust these quotas from time to time. The quotas in the table below are applied per account.

Function (Model)              Tokens processed per minute (TPM)    Rows processed per minute (RPM)
COMPLETE (mistral-large)      200,000                              100
COMPLETE (reka-flash)         300,000                              400
COMPLETE (mixtral-8x7b)       300,000                              400
COMPLETE (llama2-70b-chat)    300,000                              400
COMPLETE (mistral-7b)         300,000                              500
COMPLETE (gemma-7b)           300,000                              500
EXTRACT_ANSWER                1,000,000                            3,000
SENTIMENT                     1,000,000                            5,000
SUMMARIZE                     300,000                              500
TRANSLATE                     1,000,000                            2,000

Note

On-demand Snowflake accounts without a valid payment method (such as trial accounts) are limited to roughly one credit per day in Snowflake Cortex LLM function usage. To remove this restriction, convert your trial account to a paid account.

Managing Costs and Throttling

During this preview, Snowflake recommends using a warehouse size no larger than MEDIUM when calling Snowflake Cortex LLM functions. Using a larger warehouse than necessary does not increase performance, but can result in unnecessary costs and a higher risk of throttling. This recommendation may not apply in the future due to upcoming product updates.

Model Restrictions

Models used by Snowflake Cortex have limits on input size, as described in the table below. Sizes are given in tokens. Tokens generally correspond to words, but not all tokens are words, so the number of words a given limit allows is slightly lower than the number of tokens. Inputs that exceed the limit result in an error.

Function          Model                      Context window (tokens)
COMPLETE          mistral-large              32,000
                  reka-flash                 100,000
                  mixtral-8x7b               32,000
                  llama2-70b-chat            4,096
                  mistral-7b                 32,000
                  gemma-7b                   8,000
EXTRACT_ANSWER    Snowflake managed model    2,048 for text, 64 for question
SENTIMENT         Snowflake managed model    512
SUMMARIZE         Snowflake managed model    32,000
TRANSLATE         Snowflake managed model    1,024
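
Because inputs over the limit produce an error, one option is to pre-filter rows with the rough four-characters-per-token heuristic, leaving some headroom since tokenization varies by model. The following is a sketch against a hypothetical reviews table.

SELECT SNOWFLAKE.CORTEX.SENTIMENT(review_text) AS sentiment
FROM reviews
WHERE LENGTH(review_text) / 4 < 512;  -- SENTIMENT's limit is 512 tokens; ~4 chars/token is approximate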

Choosing a Model

The Snowflake Cortex COMPLETE function supports multiple models of varying capability, latency, and cost. These models have been carefully chosen to align with common customer use cases. To achieve the best performance per credit, choose a model that’s a good match for the content size and complexity of your task. Here are brief overviews of the available models.

  • mistral-large is Mistral AI’s most advanced large language model, with top-tier reasoning capabilities. Mistral Large is ideal for complex tasks that require substantial reasoning or are highly specialized, such as synthetic text generation, code generation, or agents.

  • reka-flash is a highly capable multilingual language model optimized for fast workloads that require high quality, such as writing product descriptions or blog posts, coding, and extracting answers from documents with hundreds of pages.

  • mixtral-8x7b is ideal for text generation, classification, and question answering. Mistral models are optimized for low latency with low memory requirements, which translates into higher throughput for enterprise use cases.

  • llama2-70b-chat is well-suited to complex, large-scale tasks that require a moderate amount of reasoning, like extracting data or helping you to write job descriptions.

  • mistral-7b is ideal for your simplest summarization, structuring, and question answering tasks that need to be done quickly. It offers low latency and high throughput for processing multiple pages of text with its 32K context window.

  • gemma-7b is suitable for simple code and text completion tasks. It has a context window of 8,000 tokens but is surprisingly capable within that limit, and quite cost-effective.

If you’re not sure where to start, try the most powerful model first to establish a baseline for evaluating other models. mistral-large is the most capable model offered by Snowflake Cortex and will give you a good idea of what a state-of-the-art model can do.

To help you decide, the following table shows how the models offered by Snowflake Cortex COMPLETE, along with a few other popular models, perform on various benchmarks.

Model              Context Window   MMLU          MT-Bench                  HumanEval   Spider 1.0
                   (Tokens)         (Reasoning)   (Instruction Following)   (Coding)    (SQL)
GPT 4 Turbo*       32,000           86.4          8.96                      67          86.6
mistral-large      32,000           81.2          -                         45.1        81
Claude 2*          100,000          78.5          8.06                      71.2        -
reka-flash         100,000          73.5          8.2                       65.2        -
mixtral-8x7b       32,000           70.6          8.30                      40.2        -
GPT 3.5 Turbo*     4,097            70            8.39                      48.1        -
llama2-70b-chat    4,096            68.9          6.86                      30.5        -
mistral-7b         32,000           62.5          6.84                      26.2        -
gemma-7b           8,000            64.3          -                         32.3        -
llama2-7b*         4,096            45.3          6.27                      12.2        -

*Provided for comparison; not available in Snowflake Cortex COMPLETE.

LLM Functions Overview

COMPLETE

Given a prompt, the instruction-following COMPLETE function generates a response using your choice of language model. In the simplest use case, the prompt is a single string. You can also provide a conversation containing multiple prompts and responses for interactive chat-style usage; in this form of the function, you can also specify hyperparameter options to customize the style and size of the output.

The COMPLETE function supports the following models. Different models can have different costs and quotas.

  • mistral-large

  • reka-flash

  • mixtral-8x7b

  • llama2-70b-chat

  • mistral-7b

  • gemma-7b

See COMPLETE (SNOWFLAKE.CORTEX) for syntax and examples.
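
For orientation, here is a minimal sketch of both forms; the prompts, model choices, and option values are arbitrary:

-- Simple form: a single prompt string
SELECT SNOWFLAKE.CORTEX.COMPLETE('mistral-7b', 'Explain what a context window is in one sentence.');

-- Chat-style form: a conversation array plus an options object
SELECT SNOWFLAKE.CORTEX.COMPLETE(
    'mistral-large',
    [{'role': 'user', 'content': 'What is a data warehouse?'}],
    {'temperature': 0.2, 'max_tokens': 100}
);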

EXTRACT_ANSWER

The EXTRACT_ANSWER function extracts an answer to a given question from a text document. The document may be a plain-English document or a string representation of a semi-structured (JSON) data object.

See EXTRACT_ANSWER (SNOWFLAKE.CORTEX) for syntax and examples.
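
A minimal sketch, reusing facts from the Python example at the end of this topic; the result includes the extracted answer together with a confidence score:

SELECT SNOWFLAKE.CORTEX.EXTRACT_ANSWER(
    'Snowflake was co-founded by Thierry Cruanes, Marcin Zukowski, and Benoit Dageville in 2012.',
    'Who co-founded Snowflake?'
);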

SENTIMENT

The SENTIMENT function returns a sentiment score between -1 and 1 for the given English-language input text, where -1 is the most negative, 1 is the most positive, and values around 0 are neutral.

See SENTIMENT (SNOWFLAKE.CORTEX) for syntax and examples.
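
A minimal sketch, reusing the example string from the Python section below; the result is a single score:

SELECT SNOWFLAKE.CORTEX.SENTIMENT('I really enjoyed this restaurant. Fantastic service!');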

SUMMARIZE

The SUMMARIZE function returns a summary of the given English text.

See SUMMARIZE (SNOWFLAKE.CORTEX) for syntax and examples.
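
A minimal sketch over a table column; the articles table and abstract_text column are the hypothetical ones used in the Python example below:

SELECT SNOWFLAKE.CORTEX.SUMMARIZE(abstract_text) AS abstract_summary
FROM articles
LIMIT 5;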

TRANSLATE

The TRANSLATE function translates text from the indicated or detected source language to a target language.

See TRANSLATE (SNOWFLAKE.CORTEX) for syntax and examples.
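
A minimal sketch translating French to English, using the same style of language codes as the Python example below:

SELECT SNOWFLAKE.CORTEX.TRANSLATE('Bonjour, comment allez-vous ?', 'fr', 'en');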

Error Conditions

Snowflake Cortex LLM functions can produce the following error messages.

Message                                          Explanation
too many requests                                The request was rejected due to excessive system load. Try your request again.
invalid options object                           The options object passed to the function contains invalid options or values.
budget exceeded                                  The model consumption budget was exceeded.
unknown model "<model name>"                     The specified model does not exist.
invalid language "<language>"                    The specified language is not supported by the TRANSLATE function.
max tokens of <count> exceeded                   The request exceeded the maximum number of tokens supported by the model (see Model Restrictions).
all requests were throttled by remote service    The number of requests exceeds the limit. Try again later.
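
Several of these errors (throttling in particular) are transient, so you may prefer to trap them rather than fail an entire job. The following is a minimal Snowflake Scripting sketch, not a prescribed pattern; the prompt and model are arbitrary.

EXECUTE IMMEDIATE $$
DECLARE
    response STRING;
BEGIN
    response := (SELECT SNOWFLAKE.CORTEX.COMPLETE('mistral-7b', 'Say hello in five words.'));
    RETURN response;
EXCEPTION
    WHEN OTHER THEN
        -- SQLERRM carries the message text, e.g. 'too many requests'; retry later
        RETURN 'Cortex call failed: ' || SQLERRM;
END;
$$;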

Using Snowflake Cortex LLM Functions with Python

Snowflake Cortex LLM functions are available in Snowpark ML version 1.1.2 and later. See Installing Snowpark ML for instructions on setting up Snowpark ML.

If you run your Python script outside of Snowflake, you must create a Snowpark session to use these functions. See Connecting to Snowflake for instructions.

The following Python example illustrates calling Snowflake Cortex LLM functions on single values:

from snowflake.cortex import Complete, ExtractAnswer, Sentiment, Summarize, Translate

text = """
    The Snowflake company was co-founded by Thierry Cruanes, Marcin Zukowski,
    and Benoit Dageville in 2012 and is headquartered in Bozeman, Montana.
"""

print(Complete("llama2-70b-chat", "how do snowflakes get their unique patterns?"))
print(ExtractAnswer(text, "When was snowflake founded?"))
print(Sentiment("I really enjoyed this restaurant. Fantastic service!"))
print(Summarize(text))
print(Translate(text, "en", "fr"))

You can also call an LLM function on a table column, as shown below. This example assumes a session object (stored in session) and a table articles with a text column abstract_text; it adds a new column abstract_summary containing a summary of each abstract.

from snowflake.cortex import Summarize
from snowflake.snowpark.functions import col

# Load the articles table and add a column summarizing each abstract
article_df = session.table("articles")
article_df = article_df.with_column(
    "abstract_summary",
    Summarize(col("abstract_text"))
)
article_df.collect()

Note

The advanced chat-style (multi-message) form of COMPLETE is not currently supported in Python.