snowflake.snowpark.DataFrameAIFunctions.similarity
- DataFrameAIFunctions.similarity(input1: Union[snowflake.snowpark.column.Column, str], input2: Union[snowflake.snowpark.column.Column, str], *, output_column: Optional[str] = None, **kwargs) → snowflake.snowpark.DataFrame
Compute similarity scores between two columns using AI-powered embeddings.
The score is the cosine similarity of the two inputs’ embedding vectors. Both text and image inputs are supported.
- Parameters:
input1 – The first column (Column object or column name as string) for comparison. Can contain text strings or images (FILE data type).
input2 – The second column (Column object or column name as string) for comparison. Must be the same type as input1 (both text or both images).
output_column – The name of the output column to be appended. If not provided, a column named AI_SIMILARITY_OUTPUT is appended.
**kwargs – Configuration settings specified as key/value pairs. Supported keys:
model: The embedding model to use. For text input, defaults to ‘snowflake-arctic-embed-l-v2’. For image input, defaults to ‘voyage-multimodal-3’. Supported models include:
Text: ‘snowflake-arctic-embed-l-v2’, ‘nv-embed-qa-4’, ‘multilingual-e5-large’, ‘voyage-multilingual-2’, ‘snowflake-arctic-embed-m-v1.5’, ‘snowflake-arctic-embed-m’, ‘e5-base-v2’
Images: ‘voyage-multimodal-3’
- Returns:
A new DataFrame with an appended output column containing similarity scores. The scores range from -1 to 1, where higher values indicate greater similarity.
Examples:
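A minimal sketch of a text comparison, based on the signature above. It assumes the DataFrame exposes these AI functions through a `df.ai` accessor and that `session` is an already-created `snowflake.snowpark.Session`; the helper names `add_similarity` and `build_example`, the column names, and the sample rows are hypothetical and serve only as illustration.

```python
def add_similarity(df, left, right, output_column="SIMILARITY_SCORE", **kwargs):
    """Append a similarity score column comparing `left` and `right`.

    Assumption: `df.ai` returns the DataFrameAIFunctions object for `df`.
    Extra keyword arguments (for example `model=...`) are passed through.
    """
    return df.ai.similarity(left, right, output_column=output_column, **kwargs)


def build_example(session):
    """Build a two-row text DataFrame and score each pair.

    Assumption: `session` is an existing snowflake.snowpark.Session.
    """
    df = session.create_dataframe(
        [
            ["I like this dish", "This dish is very good"],
            ["The sky is blue", "Stock prices fell today"],
        ],
        schema=["TEXT1", "TEXT2"],
    )
    # Both inputs are text columns, so the default text model applies
    # unless a `model` keyword argument is supplied.
    return add_similarity(df, "TEXT1", "TEXT2")
```

To pick a specific text model, a call such as `add_similarity(df, "TEXT1", "TEXT2", model="multilingual-e5-large")` would forward the `model` key described under `**kwargs`.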
Note
- Both inputs must be of the same type (both text or both images). AI_SIMILARITY does not support computing similarity between text and image inputs.
- Similarity scores range from -1 to 1, where:
  - 1 indicates identical or very similar content
  - 0 indicates no similarity
  - -1 indicates opposite or very dissimilar content
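To make the score range concrete, here is a small pure-Python sketch of cosine similarity on hand-written vectors. It is not how the service computes embeddings; the vectors are made up for illustration, and real embedding vectors have many more dimensions.

```python
import math


def cosine_similarity(u, v):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Same direction: score at the top of the range.
same = cosine_similarity([3.0, 4.0], [6.0, 8.0])
# Perpendicular directions: score in the middle of the range.
unrelated = cosine_similarity([1.0, 0.0], [0.0, 1.0])
# Opposite directions: score at the bottom of the range.
opposite = cosine_similarity([3.0, 4.0], [-3.0, -4.0])

print(same, unrelated, opposite)
```

Only the direction of the vectors matters, not their length, which is why scaling `[3.0, 4.0]` to `[6.0, 8.0]` does not lower the score.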
This function or method is experimental since 1.39.0.