snowflake.snowpark.modin.plugin.extensions.groupby_overrides.DataFrameGroupBy.aggregate

DataFrameGroupBy.aggregate(func: Optional[Union[Callable, str, list[Union[Callable, str]], MutableMapping[Hashable, Union[Callable, str, list[Union[Callable, str]]]]]] = None, *args: Any, engine: Optional[Literal['cython', 'numba']] = None, engine_kwargs: Optional[dict[str, bool]] = None, **kwargs: Any)

Aggregate using one or more operations over the specified axis.

Parameters:
  • func (function, str, list, or dict) –

    Function to use for aggregating the data. If a function, it must work either when passed a DataFrame or when passed to DataFrame.apply.

    Accepted combinations are:

    • function

    • string function name

    • list of functions and/or function names, e.g. [np.sum, 'mean']

    • dict of axis labels -> functions, function names or list of such.

  • *args – Positional arguments to pass to func.

  • engine (str, default None) –

    • 'cython' : Runs the function through C-extensions from cython.

    • 'numba' : Runs the function through JIT compiled code from numba.

    • None : Defaults to 'cython', or to 'numba' if the global setting compute.use_numba is enabled.

    This parameter is ignored in Snowpark pandas. The execution engine will always be Snowflake.

  • engine_kwargs (dict, default None) –

    • For 'cython' engine, there are no accepted engine_kwargs

    • For the 'numba' engine, engine_kwargs accepts the dictionary keys nopython, nogil and parallel. Each value must be either True or False. The default engine_kwargs for the 'numba' engine is {'nopython': True, 'nogil': False, 'parallel': False}, and it is applied to the function.

    This parameter is ignored in Snowpark pandas. The execution engine will always be Snowflake.

  • **kwargs – Keyword arguments to pass to func.

Return type:

DataFrame

Examples

>>> df = pd.DataFrame(
...     {
...         "A": [1, 1, 2, 2],
...         "B": [1, 2, 3, 4],
...         "C": [0.362838, 0.227877, 1.267767, -0.562860],
...     }
... )
>>> df
   A  B         C
0  1  1  0.362838
1  1  2  0.227877
2  2  3  1.267767
3  2  4 -0.562860

Apply a single aggregation to all columns:

>>> df.groupby('A').agg('min')  
    B         C
A
1  1  0.227877
2  3 -0.562860

Apply multiple aggregations to all columns:

>>> df.groupby('A').agg(['min', 'max']) 
    B             C
    min max       min       max
A
1   1   2  0.227877  0.362838
2   3   4 -0.562860  1.267767

Select a single column and apply aggregations:

>>> df.groupby('A').B.agg(['min', 'max'])   
    min  max
A
1    1    2
2    3    4

Apply different aggregations to specific columns:

>>> df.groupby('A').agg({'B': ['min', 'max'], 'C': 'sum'})  
    B             C
    min max       sum
A
1   1   2  0.590715
2   3   4  0.704907
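
To make the last result concrete, the following plain-Python sketch recomputes the dict-style aggregation {'B': ['min', 'max'], 'C': 'sum'} by hand on the same data. It only illustrates what the aggregation computes; it is not how Snowpark pandas executes the call, since execution is pushed down to Snowflake. The names rows, groups and result are illustrative and not part of any API.

```python
from collections import defaultdict

# Same data as the example DataFrame, one tuple per row: (A, B, C).
rows = [
    (1, 1, 0.362838),
    (1, 2, 0.227877),
    (2, 3, 1.267767),
    (2, 4, -0.562860),
]

# Collect the B and C values of each group, keyed by the value of column A.
groups = defaultdict(lambda: {"B": [], "C": []})
for a, b, c in rows:
    groups[a]["B"].append(b)
    groups[a]["C"].append(c)

# Hand-written equivalent of {'B': ['min', 'max'], 'C': 'sum'}.
# Each inner key is a (column, aggregation) pair, mirroring the
# two-level column header in the output above.
result = {
    key: {
        ("B", "min"): min(cols["B"]),
        ("B", "max"): max(cols["B"]),
        # Rounded to 6 decimals to hide binary floating-point noise.
        ("C", "sum"): round(sum(cols["C"]), 6),
    }
    for key, cols in sorted(groups.items())
}

for key, aggs in result.items():
    print(key, aggs)
```

The per-group values in result should match the rows of the table above: for each value of A, the minimum and maximum of B and the sum of C.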