snowflake.snowpark.functions.jarowinkler_similarity

snowflake.snowpark.functions.jarowinkler_similarity(string_expr1: Union[snowflake.snowpark.column.Column, str], string_expr2: Union[snowflake.snowpark.column.Column, str]) → Column

Computes the Jaro-Winkler similarity between two strings. Jaro-Winkler is a string similarity metric based on the number of matching characters and transpositions between two sequences. It is a variant of the Jaro metric that gives higher scores to strings sharing a common prefix.

Parameters:
  • string_expr1 (ColumnOrName) – The first string expression to compare.

  • string_expr2 (ColumnOrName) – The second string expression to compare.

Returns:

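The Jaro-Winkler similarity score as an integer between 0 and 100. Higher values indicate greater similarity, and 100 indicates an exact match. Note that two empty strings score 0, as the example below shows.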

Return type:

Column

Examples:
>>> from snowflake.snowpark.functions import jarowinkler_similarity
>>> df = session.create_dataframe([
...     ("Snowflake", "Oracle"),
...     ("Ich weiß nicht", "Ich wei? nicht"),
...     ("Gute nacht", "Ich weis nicht"),
...     ("święta", "swieta"),
...     ("", ""),
...     ("test", "test")
... ], schema=["s", "t"])
>>> df.select(jarowinkler_similarity(df["s"], df["t"]).alias("similarity")).collect()
[Row(SIMILARITY=61), Row(SIMILARITY=97), Row(SIMILARITY=56), Row(SIMILARITY=77), Row(SIMILARITY=0), Row(SIMILARITY=100)]
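To illustrate how scores like those in the example can arise, here is a minimal pure-Python sketch of the textbook Jaro-Winkler computation. It is not Snowflake's implementation. The helper names (`jaro`, `jaro_winkler`, `score_0_100`) are hypothetical, and the lowercasing, the prefix weight of 0.1 over at most 4 characters, and the truncation to an integer are assumptions chosen because they happen to reproduce the example scores.

```python
def jaro(s1: str, s2: str) -> float:
    """Textbook Jaro similarity in [0.0, 1.0]."""
    if not s1 or not s2:
        return 0.0  # assumption: empty input scores 0, as in the example
    # Characters match only if they are equal and close enough in position.
    window = max(max(len(s1), len(s2)) // 2 - 1, 0)
    used1 = [False] * len(s1)
    used2 = [False] * len(s2)
    matches = 0
    for i, ch in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len(s2), i + window + 1)
        for j in range(lo, hi):
            if not used2[j] and s2[j] == ch:
                used1[i] = used2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Transpositions: matched characters that appear in a different order.
    seq1 = [c for c, u in zip(s1, used1) if u]
    seq2 = [c for c, u in zip(s2, used2) if u]
    transpositions = sum(a != b for a, b in zip(seq1, seq2)) // 2
    return (
        matches / len(s1)
        + matches / len(s2)
        + (matches - transpositions) / matches
    ) / 3


def jaro_winkler(s1: str, s2: str, prefix_scale: float = 0.1,
                 max_prefix: int = 4) -> float:
    """Jaro similarity boosted by the length of the common prefix."""
    base = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1, s2):
        if a != b or prefix == max_prefix:
            break
        prefix += 1
    return base + prefix * prefix_scale * (1.0 - base)


def score_0_100(s1: str, s2: str) -> int:
    """Hypothetical 0-100 scaling: case-insensitive, truncated (assumptions)."""
    return int(jaro_winkler(s1.lower(), s2.lower()) * 100)


pairs = [
    ("Snowflake", "Oracle"),
    ("Ich weiß nicht", "Ich wei? nicht"),
    ("Gute nacht", "Ich weis nicht"),
    ("święta", "swieta"),
    ("", ""),
    ("test", "test"),
]
scores = [score_0_100(s, t) for s, t in pairs]
print(scores)
```

Under these assumptions the sketch yields the same six scores as the example. The shared prefix "Ich " is what lifts the second pair to 97, while a case-sensitive comparison would score "Snowflake" against "Oracle" lower than 61 because "o" and "O" would no longer match.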