Categories:

String & binary functions (Large Language Model)

COUNT_TOKENS (SNOWFLAKE.CORTEX)

Returns the number of tokens in a prompt for the large language model or the task-specific function specified in the argument. This function does not support fine-tuned models.

Syntax

SNOWFLAKE.CORTEX.COUNT_TOKENS( <model_name> , <input_text> )

SNOWFLAKE.CORTEX.COUNT_TOKENS( <function_name> , <input_text> )

Arguments

Required:

model_name

Name of the model you want to base the token count on. Specify one of the following values:

  • snowflake-arctic

  • mistral-large

  • reka-flash

  • reka-core

  • mixtral-8x7b

  • llama2-70b-chat

  • llama3-8b

  • llama3-70b

  • mistral-7b

  • gemma-7b

  • snowflake-arctic-embed-m

  • e5-base-v2

  • nv-embed-qa-4

function_name

Name of one of the task-specific functions. Specify one of the following values:

  • extract_answer

  • sentiment

  • summarize

  • translate

Note that the function names must be lower-case.

input_text

Input text to count the tokens in.

Returns

Returns an INTEGER value (in Snowflake, INT, BIGINT, SMALLINT, TINYINT, and BYTEINT are synonyms for INTEGER) that is the number of tokens in the input text, based on the model or function specified.

Usage notes

  • If a function name is specified, the token count is based on the model used by the function.

  • When specifying a function name, use lower-case letters.
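
As a rough sketch of how the count can be put to use, the following query totals the tokens across every row of a column, which can help estimate the size of a batch before running a task-specific function over it. It assumes a table mydb.myschema.mytable with a text column named prompt (illustrative names, matching those used in the examples on this page); the alias total_tokens is likewise only an illustration.

```sql
-- Sketch only: the table, column, and alias names are assumptions.
-- Sums the per-row token counts, based on the model used by the summarize function.
SELECT SUM(SNOWFLAKE.CORTEX.COUNT_TOKENS('summarize', prompt)) AS total_tokens
FROM mydb.myschema.mytable;
```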

Examples

Get the token count for the prompt "what is a large language model?" based on the snowflake-arctic model:

SELECT SNOWFLAKE.CORTEX.COUNT_TOKENS( 'snowflake-arctic', 'what is a large language model?' );
+---+
| 6 |
+---+

Get the token count for each of the prompts in the prompt column of mytable based on the model used for the SUMMARIZE function:

SELECT SNOWFLAKE.CORTEX.COUNT_TOKENS('summarize', prompt) FROM mydb.myschema.mytable LIMIT 10;
+----+------+
|  1 | 1932 |
|  2 | 2379 |
|  3 | 2185 |
|  4 | 1195 |
|  5 | 2908 |
|  6 | 2601 |
|  7 | 2122 |
|  8 | 1720 |
|  9 | 2512 |
| 10 | 1510 |
+----+------+

Get the token count for text that you want to translate, based on the model used by the TRANSLATE function:

SELECT SNOWFLAKE.CORTEX.COUNT_TOKENS('translate', 'Dies ist ein kurzer Text.');
+---+
| 9 |
+---+
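
Because the count is based on the model specified, the same text may produce different counts for different models. The query below is a minimal sketch of a side-by-side comparison, using two model names from the list of supported values. The column aliases arctic_tokens and llama3_tokens, the subquery alias t, and the column name input_text are illustrative assumptions; no output is shown because the actual counts depend on each model.

```sql
-- Sketch only: aliases and the inline subquery are assumptions for illustration.
-- Counts tokens for the same text under two different models.
SELECT
  SNOWFLAKE.CORTEX.COUNT_TOKENS('snowflake-arctic', t.input_text) AS arctic_tokens,
  SNOWFLAKE.CORTEX.COUNT_TOKENS('llama3-8b', t.input_text) AS llama3_tokens
FROM (SELECT 'what is a large language model?' AS input_text) AS t;
```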