snowflake.snowpark.functions.ai_similarity
- snowflake.snowpark.functions.ai_similarity(input1: Union[Column, str], input2: Union[Column, str], **kwargs) → Column
Computes a similarity score from the cosine similarity of the two inputs' embedding vectors. Both text and image inputs are currently supported.
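AI_SIMILARITY computes the embeddings and the score inside Snowflake. The pure-Python sketch below only illustrates the cosine-similarity formula applied to two already-computed embedding vectors. The function name `cosine_similarity` and the toy 3-dimensional vectors are illustrative assumptions, not part of the Snowpark API; real embedding models produce much longer vectors.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Illustrative only: cosine of the angle between vectors a and b, in [-1, 1]."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    # Clamp to guard against floating-point drift just outside [-1, 1].
    return max(-1.0, min(1.0, dot / (norm_a * norm_b)))


# Hypothetical toy "embeddings" used only to show the three extremes of the range.
same_direction = cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
orthogonal = cosine_similarity([1.0, 0.0, 0.0], [0.0, 1.0, 0.0])
opposite = cosine_similarity([1.0, 2.0, 3.0], [-1.0, -2.0, -3.0])
print(same_direction, orthogonal, opposite)
```

Because the formula divides by both vector norms, only the direction of the embeddings matters, not their magnitude; that is why the first pair scores (up to rounding) 1 even though one vector is twice as long as the other.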
- Parameters:
input1 – The first input for comparison. Can be a string with text, an image (FILE data type), or a SQL object from prompt().
input2 – The second input for comparison. Can be a string with text, an image (FILE data type), or a SQL object from prompt(). Must be the same type as input1 (both text or both images).
**kwargs – Configuration settings specified as key/value pairs. Supported keys:
- model: The model used to compute the embeddings. For STRING input, defaults to ‘snowflake-arctic-embed-l-v2’.
For IMAGE input, defaults to ‘voyage-multimodal-3’. Supported values include: ‘snowflake-arctic-embed-l-v2’, ‘nv-embed-qa-4’, ‘multilingual-e5-large’, ‘voyage-multilingual-2’, ‘snowflake-arctic-embed-m-v1.5’, ‘snowflake-arctic-embed-m’, ‘e5-base-v2’, ‘voyage-multimodal-3’ (for images).
- Returns:
A float in the range [-1, 1]: the similarity score, computed as the cosine similarity between the embedding vectors of the two inputs.
Note
AI_SIMILARITY does not support computing the similarity between text and image inputs. Both inputs must be of the same type.
Examples:
>>> from snowflake.snowpark.functions import ai_similarity, col, to_file
>>> # Text similarity
>>> df = session.range(1).select(ai_similarity('I like this dish', 'This dish is very good').alias("similarity"))
>>> df.collect()[0][0] > 0.8
True
>>> # Text similarity with custom model
>>> df = session.range(1).select(
...     ai_similarity(
...         'I love programming',
...         '我喜欢编程',
...         model='multilingual-e5-large'
...     ).alias("similarity")
... )
>>> df.collect()[0][0] > 0.8
True
>>> # Using columns
>>> df = session.create_dataframe([
...     ['Hello world', 'Hi there'],
...     ['Good morning', 'Good evening'],
... ], schema=["text1", "text2"])
>>> df = df.select(ai_similarity(col("text1"), col("text2")).alias("similarity"))
>>> result = df.collect()
>>> result[0][0] < 0.6
True
>>> result[1][0] > 0.7
True
>>> # Image similarity
>>> _ = session.sql("CREATE OR REPLACE TEMP STAGE mystage ENCRYPTION = (TYPE = 'SNOWFLAKE_SSE')").collect()
>>> _ = session.file.put("tests/resources/dog.jpg", "@mystage", auto_compress=False)
>>> _ = session.file.put("tests/resources/cat.jpeg", "@mystage", auto_compress=False)
>>> df = session.range(1).select(
...     ai_similarity(
...         to_file("@mystage/dog.jpg"),
...         to_file("@mystage/cat.jpeg")
...     ).alias("similarity")
... )
>>> df.collect()[0][0] < 0.5
True