snowflake.snowpark.DataFrame.groupBy¶
- DataFrame.groupBy(*cols: ColumnOrName | Iterable[ColumnOrName]) RelationalGroupedDataFrame [source]¶
Groups rows by the columns specified by expressions (similar to GROUP BY in SQL).
This method returns a
RelationalGroupedDataFrame
that you can use to perform aggregations on each group of data.- Parameters:
*cols – The columns to group by.
Valid inputs are:
Examples
>>> from snowflake.snowpark.functions import col, lit, sum as sum_, max as max_ >>> df = session.create_dataframe([(1, 1),(1, 2),(2, 1),(2, 2),(3, 1),(3, 2)], schema=["a", "b"]) >>> df.group_by().agg(sum_("b")).collect() [Row(SUM(B)=9)] >>> df.group_by("a").agg(sum_("b")).collect() [Row(A=1, SUM(B)=3), Row(A=2, SUM(B)=3), Row(A=3, SUM(B)=3)] >>> df.group_by("a").agg(sum_("b").alias("sum_b"), max_("b").alias("max_b")).collect() [Row(A=1, SUM_B=3, MAX_B=2), Row(A=2, SUM_B=3, MAX_B=2), Row(A=3, SUM_B=3, MAX_B=2)] >>> df.group_by(["a", lit("snow")]).agg(sum_("b")).collect() [Row(A=1, LITERAL()='snow', SUM(B)=3), Row(A=2, LITERAL()='snow', SUM(B)=3), Row(A=3, LITERAL()='snow', SUM(B)=3)] >>> df.group_by("a").agg((col("*"), "count"), max_("b")).collect() [Row(A=1, COUNT(LITERAL())=2, MAX(B)=2), Row(A=2, COUNT(LITERAL())=2, MAX(B)=2), Row(A=3, COUNT(LITERAL())=2, MAX(B)=2)] >>> df.group_by("a").median("b").collect() [Row(A=2, MEDIAN(B)=Decimal('1.500')), Row(A=3, MEDIAN(B)=Decimal('1.500')), Row(A=1, MEDIAN(B)=Decimal('1.500'))] >>> df.group_by("a").function("avg")("b").collect() [Row(A=1, AVG(B)=Decimal('1.500000')), Row(A=2, AVG(B)=Decimal('1.500000')), Row(A=3, AVG(B)=Decimal('1.500000'))]