Vector data types¶
This topic describes the vector data types.
Data types¶
Snowflake supports a single vector data type, VECTOR.
Note
The VECTOR data type is only supported in SQL, the Python connector and the Snowpark Python library. No other languages are supported.
VECTOR¶
With the VECTOR data type, Snowflake encodes and processes vectors efficiently. This data type supports semantic vector search and retrieval applications, such as RAG-based applications, and common operations on vectors in vector-processing applications.
To specify a VECTOR type, use the following syntax:
VECTOR( <type>, <dimension> )
Where:
type
is the Snowflake data type of the elements, which can be 32-bit integers or 32-bit floating-point numbers. You can specify one of the following types:
INT
FLOAT
dimension
is the dimension (length) of the vector. This must be a positive integer value with a maximum value of 4096.
Note
Direct vector comparisons (for example, v1 < v2) are byte-wise lexicographic. They are deterministic, but they do not produce the results you would expect from numeric comparisons. You can therefore use VECTOR columns in ORDER BY clauses, but to compare vectors meaningfully, use the vector similarity functions provided.
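To see why a byte-wise comparison can disagree with a numeric one, consider the following minimal Python sketch. The byte layout that Snowflake uses internally is not described in this topic; the little-endian 32-bit IEEE 754 encoding used here is an assumption chosen only for illustration, and pack_floats is a hypothetical helper.

```python
import struct


def pack_floats(values):
    """Encode values as consecutive little-endian 32-bit floats.

    Illustrative layout only; it is an assumption, not Snowflake's
    documented storage format.
    """
    return struct.pack(f"<{len(values)}f", *values)


v1 = [1.0, 5.0]
v2 = [2.0, 0.0]

# Element-by-element numeric comparison: 1.0 < 2.0, so v1 sorts first.
numeric_less = v1 < v2

# Comparison of the raw bytes: 1.0 encodes as 00 00 80 3f and 2.0 as
# 00 00 00 40, so the third byte decides and v1 sorts after v2.
bytewise_less = pack_floats(v1) < pack_floats(v2)

print(numeric_less)   # True
print(bytewise_less)  # False
```

Under this assumed encoding the two orderings disagree, which is the kind of surprise the note warns about.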
The following are examples of valid definitions of vectors:
Define a vector of 256 32-bit floating-point values:
VECTOR(FLOAT, 256)
Define a vector of 16 32-bit integer values:
VECTOR(INT, 16)
The following are examples of invalid definitions of vectors:
A vector definition using an invalid value type:
VECTOR(STRING, 256)
A vector definition using an invalid vector size:
VECTOR(INT, -1)
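The rules above (element type INT or FLOAT, dimension a positive integer no greater than 4096) can be checked on the client before a type string is sent to Snowflake. The following is a minimal Python sketch; the helper name vector_type_sql is hypothetical and is not part of any Snowflake library.

```python
# Hypothetical client-side helper; not part of any Snowflake library.
ALLOWED_ELEMENT_TYPES = {"INT", "FLOAT"}
MAX_DIMENSION = 4096  # maximum dimension stated in this topic


def vector_type_sql(element_type: str, dimension: int) -> str:
    """Return a VECTOR type string, or raise ValueError if it is invalid."""
    normalized = element_type.strip().upper()
    if normalized not in ALLOWED_ELEMENT_TYPES:
        raise ValueError(f"invalid element type: {element_type!r}")
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(dimension, bool) or not isinstance(dimension, int):
        raise ValueError("dimension must be an integer")
    if not 1 <= dimension <= MAX_DIMENSION:
        raise ValueError(f"dimension must be between 1 and {MAX_DIMENSION}")
    return f"VECTOR({normalized}, {dimension})"


print(vector_type_sql("FLOAT", 256))  # VECTOR(FLOAT, 256)
print(vector_type_sql("int", 16))     # VECTOR(INT, 16)

# The two invalid definitions shown above are rejected.
for bad in (("STRING", 256), ("INT", -1)):
    try:
        vector_type_sql(*bad)
    except ValueError as err:
        print("rejected:", err)
```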
Vector conversion¶
This section describes how to convert to and from a VECTOR value. For details on casting, see Data type conversion.
Converting a value to a VECTOR value¶
VECTOR values can be explicitly cast from the following types:
Converting a value from a VECTOR value¶
VECTOR values can be explicitly cast to the following types:
Loading and unloading vector data¶
Directly loading and unloading a VECTOR column is not supported. For VECTOR columns, you must load and unload data as an ARRAY and then cast it to a VECTOR when you use it. To learn how to load and unload ARRAY data types, see Introduction to Loading Semi-structured Data. A common use case for vectors is to generate a vector embedding.
The following example shows how to unload a table with a VECTOR column to an internal stage named mystage:
CREATE TABLE mytable (a VECTOR(float, 3), b VECTOR(float, 3));
INSERT INTO mytable SELECT [1.1,2.2,3]::VECTOR(FLOAT,3), [1,1,1]::VECTOR(FLOAT,3);
INSERT INTO mytable SELECT [1,2.2,3]::VECTOR(FLOAT,3), [4,6,8]::VECTOR(FLOAT,3);
COPY INTO @mystage/unload/
FROM (SELECT TO_ARRAY(a), TO_ARRAY(b) FROM mytable);
The following example shows how to load a table from a stage and then cast the ARRAY columns as VECTOR columns:
CREATE OR REPLACE TABLE arraytable (a ARRAY, b ARRAY);
COPY INTO arraytable
FROM @mystage/unload/mydata.csv.gz;
SELECT a::VECTOR(FLOAT, 3), b::VECTOR(FLOAT, 3)
FROM arraytable;
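An ARRAY column does not carry a fixed length or element type, so a malformed row is likely to surface only at the cast to VECTOR. A client that reads ARRAY values as JSON text can check them first. The following minimal Python sketch assumes the ARRAY value arrives as a JSON string; parse_array_cell is a hypothetical helper, not a Snowflake API.

```python
import json


def parse_array_cell(cell: str, expected_dimension: int) -> list:
    """Parse JSON array text and check it against the target VECTOR dimension.

    Hypothetical helper: assumes the ARRAY value is delivered as JSON text.
    """
    values = json.loads(cell)
    if not isinstance(values, list):
        raise ValueError("cell does not contain a JSON array")
    if len(values) != expected_dimension:
        raise ValueError(
            f"expected {expected_dimension} elements, got {len(values)}"
        )
    result = []
    for value in values:
        # Reject booleans, strings, nulls, and nested arrays.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise ValueError(f"non-numeric element: {value!r}")
        result.append(float(value))
    return result


print(parse_array_cell("[1.1,2.2,3]", 3))  # [1.1, 2.2, 3.0]
```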
Examples¶
Construct a VECTOR by casting a constant ARRAY:
SELECT [1, 2, 3]::VECTOR(FLOAT, 3) as vec;
Add a column with the VECTOR data type:
ALTER TABLE issues ADD COLUMN issue_vec VECTOR(FLOAT, 768);
UPDATE issues
SET issue_vec = SNOWFLAKE.CORTEX.EMBED_TEXT_768('e5-base-v2', issue_text);
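The constant-ARRAY cast in the first example above matters for client code too. As noted under Limitations, server-side binding is not supported, so the cast has to be part of the query text. The following minimal Python sketch builds such a literal; vector_literal is a hypothetical helper, and the target table mytable is reused from the unloading example purely for illustration.

```python
import math


def vector_literal(values, element_type: str, dimension: int) -> str:
    """Render numbers as an ARRAY constant with an explicit VECTOR cast.

    Hypothetical helper; only validated numbers are interpolated, so no
    untrusted text reaches the query string.
    """
    if element_type not in ("INT", "FLOAT"):
        raise ValueError(f"invalid element type: {element_type!r}")
    if len(values) != dimension:
        raise ValueError(f"expected {dimension} elements, got {len(values)}")
    parts = []
    for value in values:
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise ValueError(f"non-numeric element: {value!r}")
        if element_type == "INT":
            if not isinstance(value, int):
                raise ValueError(f"non-integer element: {value!r}")
            parts.append(str(value))
        else:
            as_float = float(value)
            if not math.isfinite(as_float):
                raise ValueError("NaN and infinity cannot be written as constants")
            parts.append(repr(as_float))
    return f"[{', '.join(parts)}]::VECTOR({element_type}, {dimension})"


literal = vector_literal([1.1, 2.2, 3], "FLOAT", 3)
query = f"INSERT INTO mytable SELECT {literal}, {literal}"
print(query)
```

The resulting string can then be passed to whichever driver is in use as ordinary query text, with no bound parameters for the VECTOR values.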
Limitations¶
There is limited language support for the VECTOR data type. Languages not represented in this table are not supported.
Snowflake feature | Python | SQL
UDFs | ✔ | ✔
UDTFs | ✔ | ✔
Drivers/Connectors | ✔ | ✔
Snowpark API | ✔ |
Vectors are not supported in VARIANT columns.
Vectors are not supported as clustering keys.
Server-side binding is not supported. This means that when writing to a VECTOR column through a Snowflake driver, you must cast the VECTOR values in the query before running the query.
Vectors are allowed in hybrid tables but not as primary keys or secondary index keys.
The VECTOR data type is not supported for use with the following Snowflake features: