You are viewing documentation about an older version (1.4.0). View latest version

snowflake.snowpark.DataFrame.group_by_grouping_sets¶

DataFrame.group_by_grouping_sets(*grouping_sets: GroupingSets | Iterable[GroupingSets]) → RelationalGroupedDataFrame[source]¶

Performs a SQL GROUP BY GROUPING SETS. on the DataFrame.

GROUP BY GROUPING SETS is an extension of the GROUP BY clause that allows computing multiple GROUP BY clauses in a single statement. The group set is a set of dimension columns.

GROUP BY GROUPING SETS is equivalent to the UNION of two or more GROUP BY operations in the same result set.

Examples:

>>> from snowflake.snowpark import GroupingSets
>>> df = session.create_dataframe([[1, 2, 10], [3, 4, 20], [1, 4, 30]], schema=["A", "B", "C"])
>>> df.group_by_grouping_sets(GroupingSets([col("a")])).count().collect()
[Row(A=1, COUNT=2), Row(A=3, COUNT=1)]
>>> df.group_by_grouping_sets(GroupingSets(col("a"))).count().collect()
[Row(A=1, COUNT=2), Row(A=3, COUNT=1)]
>>> df.group_by_grouping_sets(GroupingSets([col("a")], [col("b")])).count().collect()
[Row(A=1, B=None, COUNT=2), Row(A=3, B=None, COUNT=1), Row(A=None, B=2, COUNT=1), Row(A=None, B=4, COUNT=2)]
>>> df.group_by_grouping_sets(GroupingSets([col("a"), col("b")], [col("c")])).count().collect()
[Row(A=None, B=None, C=10, COUNT=1), Row(A=None, B=None, C=20, COUNT=1), Row(A=None, B=None, C=30, COUNT=1), Row(A=1, B=2, C=None, COUNT=1), Row(A=3, B=4, C=None, COUNT=1), Row(A=1, B=4, C=None, COUNT=1)]

Parameters:: grouping_sets – The list of GroupingSets to group by.