You are viewing documentation about an older version (1.3.0).

snowflake.snowpark.functions.countDistinct

snowflake.snowpark.functions.countDistinct(*cols: ColumnOrName) → Column

Returns the number of distinct records across the specified columns. A record is not counted if any of the specified columns is NULL. countDistinct is an alias of count_distinct, which the example below calls.

Example

>>> df = session.create_dataframe([[1, 2], [1, 2], [3, None], [2, 3], [3, None], [4, None]], schema=["a", "b"])
>>> df.select(count_distinct(col("a"), col("b")).alias("result")).show()
------------
|"RESULT"  |
------------
|2         |
------------

>>> # The result is 2, from the distinct rows {[1, 2], [2, 3]}; every other row is a duplicate or contains a NULL.
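
To make the counting rule concrete, here is a minimal plain-Python sketch that mimics the behaviour shown above without a Snowflake session. The helper name `count_distinct_rows` is hypothetical and is not part of Snowpark. It assumes NULL is represented by `None` and that a row containing any `None` is skipped, which matches the example's result.

```python
def count_distinct_rows(rows):
    """Count distinct rows, skipping any row that contains None (NULL).

    Hypothetical helper for illustration only; not a Snowpark API.
    """
    seen = set()
    for row in rows:
        # A row with a NULL in any of the counted columns is ignored.
        if any(value is None for value in row):
            continue
        # Tuples are hashable, so duplicates collapse in the set.
        seen.add(tuple(row))
    return len(seen)


rows = [[1, 2], [1, 2], [3, None], [2, 3], [3, None], [4, None]]
result = count_distinct_rows(rows)
print(result)  # 2: only (1, 2) and (2, 3) remain
```

The sketch works on in-memory lists only. In Snowpark itself the count is computed by Snowflake when the DataFrame is evaluated.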