snowflake.snowpark.functions.array_except

snowflake.snowpark.functions.array_except(source_array: Union[Column, str], array_of_elements_to_exclude: Union[Column, str], allow_duplicates: bool = True) Column[source]

Returns a new ARRAY that contains the elements from one input ARRAY that are not in another input ARRAY.

The function is NULL-safe, meaning it treats NULLs as known values for comparing equality.

When allow_duplicates is set to True (default), this function is the same as the Snowflake ARRAY_EXCEPT semantic:

This function compares arrays by using multi-set semantics (sometimes called “bag semantics”). If source_array includes multiple copies of a value, the function only removes the number of copies of that value that are specified in array_of_elements_to_exclude.

For example, if source_array contains 5 elements with the value ‘A’ and array_of_elements_to_exclude contains 2 elements with the value ‘A’, the returned array contains 3 elements with the value ‘A’.

When allow_duplicates is set to False:

This function compares arrays by using set semantics. Specifically, it will first do an element deduplication for both arrays, and then compute the array_except result.

For example, if source_array contains 5 elements with the value ‘A’ and array_of_elements_to_exclude contains 2 elements with the value ‘A’, the returned array is empty.

Parameters:
  • source_array – An array that contains elements to be included in the new ARRAY.

  • array_of_elements_to_exclude – An array that contains elements to be excluded from the new ARRAY.

  • allow_duplicates – If True, we use multi-set semantic. Otherwise use set semantic.

Example::
>>> from snowflake.snowpark import Row
>>> df = session.create_dataframe([Row(["A", "B"], ["B", "C"])], schema=["source_array", "array_of_elements_to_exclude"])
>>> df.select(array_except("source_array", "array_of_elements_to_exclude").alias("result")).show()
------------
|"RESULT"  |
------------
|[         |
|  "A"     |
|]         |
------------

>>> df = session.create_dataframe([Row(["A", "B", "B", "B", "C"], ["B"])], schema=["source_array", "array_of_elements_to_exclude"])
>>> df.select(array_except("source_array", "array_of_elements_to_exclude").alias("result")).show()
------------
|"RESULT"  |
------------
|[         |
|  "A",    |
|  "B",    |
|  "B",    |
|  "C"     |
|]         |
------------

>>> df = session.create_dataframe([Row(["A", None, None], ["B", None])], schema=["source_array", "array_of_elements_to_exclude"])
>>> df.select(array_except("source_array", "array_of_elements_to_exclude").alias("result")).show()
------------
|"RESULT"  |
------------
|[         |
|  "A",    |
|  null    |
|]         |
------------

>>> df = session.create_dataframe([Row([{'a': 1, 'b': 2}, 1], [{'a': 1, 'b': 2}, 3])], schema=["source_array", "array_of_elements_to_exclude"])
>>> df.select(array_except("source_array", "array_of_elements_to_exclude").alias("result")).show()
------------
|"RESULT"  |
------------
|[         |
|  1       |
|]         |
------------

>>> df = session.create_dataframe([Row(["A", "B"], None)], schema=["source_array", "array_of_elements_to_exclude"])
>>> df.select(array_except("source_array", "array_of_elements_to_exclude").alias("result")).show()
------------
|"RESULT"  |
------------
|NULL      |
------------

>>> df = session.create_dataframe([Row(["A", "B"], ["B", "C"])], schema=["source_array", "array_of_elements_to_exclude"])
>>> df.select(array_except("source_array", "array_of_elements_to_exclude", False).alias("result")).show()
------------
|"RESULT"  |
------------
|[         |
|  "A"     |
|]         |
------------

>>> df = session.create_dataframe([Row(["A", "B", "B", "B", "C"], ["B"])], schema=["source_array", "array_of_elements_to_exclude"])
>>> df.select(array_except("source_array", "array_of_elements_to_exclude", False).alias("result")).show()
------------
|"RESULT"  |
------------
|[         |
|  "A",    |
|  "C"     |
|]         |
------------

>>> df = session.create_dataframe([Row(["A", None, None], ["B", None])], schema=["source_array", "array_of_elements_to_exclude"])
>>> df.select(array_except("source_array", "array_of_elements_to_exclude", False).alias("result")).show()
------------
|"RESULT"  |
------------
|[         |
|  "A"     |
|]         |
------------
Copy