Vector Embeddings

An embedding refers to the reduction of high-dimensional data, such as unstructured text, to a representation with fewer dimensions, such as a vector. Modern deep learning techniques can create vector embeddings, which are structured numerical representations, from unstructured data such as text and images, preserving semantic notions of similarity and dissimilarity in the geometry of the vectors they produce.

The illustration below is a simplified example of the vector embedding and geometric similarity of natural language text. In practice, neural networks produce embedding vectors with hundreds or even thousands of dimensions, not two as shown here, but the concept is the same. Semantically similar text yields vectors that “point” in the same general direction.

A two-dimensional example of vector similarity

Many applications can benefit from the ability to find text or images similar to a target. For example, when a new support case is logged at a help desk, the support team can benefit from the ability to find similar cases that have already been resolved. The advantage of using embedding vectors in this application is that it goes beyond keyword matching to semantic similarity, so related records can be found even if they don’t contain exactly the same words.

Snowflake Cortex offers the EMBED_TEXT_768 (SNOWFLAKE.CORTEX) function to create embeddings as well as several Vector Similarity Functions to compare them for various applications.

About vector similarity functions

The measurement of similarity between vectors is a fundamental operation in semantic comparison. Snowflake Cortex provides three vector similarity functions: VECTOR_INNER_PRODUCT, VECTOR_L2_DISTANCE, and VECTOR_COSINE_SIMILARITY. To learn more about these functions, see Vector Similarity Functions.

For syntax and usage details, see the reference page for each function:

Examples

The following examples use the vector similarity functions.

This SQL example uses the VECTOR_INNER_PRODUCT function to determine which vectors in the table are closest to each other between columns a and b:

CREATE TABLE vectors (a VECTOR(float, 3), b VECTOR(float, 3));
INSERT INTO vectors SELECT [1.1,2.2,3]::VECTOR(FLOAT,3), [1,1,1]::VECTOR(FLOAT,3);
INSERT INTO vectors SELECT [1,2.2,3]::VECTOR(FLOAT,3), [4,6,8]::VECTOR(FLOAT,3);

-- Compute the pairwise inner product between columns a and b
SELECT VECTOR_INNER_PRODUCT(a, b) FROM vectors;
Copy
+------+
| 6.3  |
|------|
| 41.2 |
+------+

This SQL example calls the VECTOR_COSINE_SIMILARITY function to find the vector closes to [1,2,3]:

SELECT a, VECTOR_COSINE_SIMILARITY(a, [1,2,3]::VECTOR(FLOAT, 3)) AS similarity
    FROM vectors
ORDER BY similarity DESC
LIMIT 1;
Copy
+-------------------------+
| [1, 2.2, 3] | 0.9990... |
+-------------------------+

Snowflake Python Connector

These examples show how to use the VECTOR data type and vector similarity functions with the Python Connector.

Note

Support for the VECTOR type was introduced in version 3.6 of the Snowflake Python Connector.

import snowflake.connector

conn = ... # Set up connection
cur = conn.cursor()

# Create a table and insert some vectors
cur.execute("CREATE OR REPLACE TABLE vectors (a VECTOR(FLOAT, 3), b VECTOR(FLOAT, 3))")
values = [([1.1, 2.2, 3], [1, 1, 1]), ([1, 2.2, 3], [4, 6, 8])]
for row in values:
        cur.execute(f"""
            INSERT INTO vectors(a, b)
                SELECT {row[0]}::VECTOR(FLOAT,3), {row[1]}::VECTOR(FLOAT,3)
        """)

# Compute the pairwise inner product between columns a and b
cur.execute("SELECT VECTOR_INNER_PRODUCT(a, b) FROM vectors")
print(cur.fetchall())
Copy
[(6.30...,), (41.2...,)]
# Find the closest vector to [1,2,3]
cur.execute(f"""
    SELECT a, VECTOR_COSINE_SIMILARITY(a, {[1,2,3]}::VECTOR(FLOAT, 3))
        AS similarity
        FROM vectors
        ORDER BY similarity DESC
        LIMIT 1;
""")
print(cur.fetchall())
Copy
[([1.0, 2.2..., 3.0], 0.9990...)]

Snowpark Python

These examples show how to use the VECTOR data type and vector similarity functions with the Snowpark Python Library.

Note

  • Support for the VECTOR type was introduced in version 1.11 of Snowpark Python.

  • The Snowpark Python library does not support the VECTOR_COSINE_SIMILARITY function.

from snowflake.snowpark import Session, Row
session = ... # Set up session
from snowflake.snowpark.types import VectorType, StructType, StructField
from snowflake.snowpark.functions import col, lit, vector_l2_distance
schema = StructType([StructField("vec", VectorType(int, 3))])
data = [Row([1, 2, 3]), Row([4, 5, 6]), Row([7, 8, 9])]
df = session.create_dataframe(data, schema)
df.select(
    "vec",
    vector_l2_distance(df.vec, lit([1, 2, 2]).cast(VectorType(int, 3))).as_("dist"),
).sort("dist").limit(1).show()
Copy
----------------------
|"VEC"      |"DIST"  |
----------------------
|[1, 2, 3]  |1.0     |
----------------------

Create vector embeddings from text

Important

The snowflake-arctic-embed-m and e5-base-v2 models have an input limit of 512 tokens. Some tokens do not represent words, so the number of words supported is somewhat less. You will receive an error message if the text is too long.

To create a vector embedding from a piece of text, use the EMBED_TEXT_768 (SNOWFLAKE.CORTEX) function. This function returns the vector embedding for a given English-language text. This vector can be used with the vector comparison functions to determine the semantic similarity of two documents.

SELECT SNOWFLAKE.CORTEX.EMBED_TEXT_768(model, text)
Copy

Tip

You can use other embedding models through Snowpark Container Services. For more information, see Embed Text Container Service.

Important

EMBED_TEXT_768 is a Cortex LLM Function, so its usage is governed by the same access controls as the other Cortex LLM Functions. For instructions on accessing these functions, see the Cortex LLM Functions Required Privileges.

Example use cases

This section shows how to use embeddings, the vector similarity functions, and VECTOR data type to implement popular use cases such as vector similarity search and retrieval-augmented generation (RAG).

Retrieval-Augmented Generation (RAG)

In retrieval-augmented generation (RAG), a user’s query is used to find similar documents using vector similarity. The top document is then passed to a large language model (LLM) along with the user’s query, providing context for the generative response (completion). This can improve the appropriateness of the response significantly.

In the following example, wiki is a table with a text column content, and query is a single-row table with a text column text.

-- Create embedding vectors for wiki articles (only do once)
ALTER TABLE wiki ADD COLUMN vec VECTOR(FLOAT, 768);
UPDATE wiki SET vec = SNOWFLAKE.CORTEX.EMBED_TEXT_768('snowflake-arctic-embed-m', content);

-- Embed incoming query
SET query = 'in which year was Snowflake Computing founded?';
CREATE OR REPLACE TABLE query_table (query_vec VECTOR(FLOAT, 768));
INSERT INTO query_table SELECT SNOWFLAKE.CORTEX.EMBED_TEXT_768('snowflake-arctic-embed-m', $query);

-- Do a semantic search to find the relevant wiki for the query
WITH result AS (
    SELECT
        w.content,
        $query AS query_text,
        VECTOR_COSINE_SIMILARITY(w.vec, q.query_vec) AS similarity
    FROM wiki w, query_table q
    ORDER BY similarity DESC
    LIMIT 1
)

-- Pass to large language model as context
SELECT SNOWFLAKE.CORTEX.COMPLETE('mistral-7b',
    CONCAT('Answer this question: ', query_text, ' using this text: ', content)) FROM result;
Copy

Cost considerations

Snowflake Cortex LLM Functions, including EMBED_TEXT_768, incur compute cost based on the number of tokens processed. The table below shows the cost in credits per 1 million tokens for each function.

Note

A token is the smallest unit of text processed by Snowflake Cortex LLM Functions, approximately equal to four characters of text. The equivalence of raw input or output text to tokens can vary by model.

  • For the EMBED_TEXT_768 function, only input tokens are counted towards the billable total.

  • Vector similarity functions do not incur token-based costs.

For more information on billing of Cortex LLM Functions, see Cortex LLM Functions Cost Considerations. For general information on compute costs, see Understanding compute cost.