This documentation is for an older version (1.21.0).

modin.pandas.DataFrameGroupBy.aggregate

DataFrameGroupBy.aggregate(func: Optional[Union[Callable, str, list[Union[Callable, str]], MutableMapping[Hashable, Union[Callable, str, list[Union[Callable, str]]]]]] = None, *args: Any, engine: Optional[Literal['cython', 'numba']] = None, engine_kwargs: Optional[dict[str, bool]] = None, **kwargs: Any)

Aggregate using one or more operations over the specified axis.
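A string such as 'min' names a GroupBy method, so passing it to aggregate is expected to give the same result as calling that method directly. The sketch below checks this with native pandas, an assumption made so that it runs without a Snowflake session; Snowpark pandas is intended to mirror this API, but the equivalence is shown here for native pandas only.

```python
import pandas as pd

# Native pandas stands in for Snowpark pandas here (assumption: same semantics).
df = pd.DataFrame(
    {
        "A": [1, 1, 2, 2],
        "B": [1, 2, 3, 4],
        "C": [0.362838, 0.227877, 1.267767, -0.562860],
    }
)

# A string aggregation name dispatches to the GroupBy method of that name.
via_agg = df.groupby("A").agg("min")
via_method = df.groupby("A").min()

print(via_agg.equals(via_method))  # True
```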

Parameters:

func (function, str, list, or dict) –
Function to use for aggregating the data. If a function, it must work either when passed a DataFrame or when passed to DataFrame.apply.

Accepted combinations are:
- function
- string function name
- list of functions and/or function names, e.g. [np.sum, 'mean']
- dict mapping axis labels to functions, function names, or lists of such.
*args – Positional arguments to pass to func.
engine (str, default None) –
- 'cython' : Runs the function through C-extensions from cython.
- 'numba' : Runs the function through JIT compiled code from numba.
- None : Defaults to 'cython', or to the global setting compute.use_numba
This parameter is ignored in Snowpark pandas. The execution engine will always be Snowflake.
engine_kwargs (dict, default None) –
- For 'cython' engine, there are no accepted engine_kwargs
- For 'numba' engine, the engine can accept nopython, nogil and parallel dictionary keys. The values must be either True or False. The default engine_kwargs for the 'numba' engine are {'nopython': True, 'nogil': False, 'parallel': False}, which are applied to the function.
This parameter is ignored in Snowpark pandas. The execution engine will always be Snowflake.
**kwargs – Keyword arguments to pass to func.
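The func, *args and **kwargs parameters work together: a callable is applied per group, and any extra positional or keyword arguments are forwarded to it. The sketch below illustrates this with native pandas so that it runs locally; whether Snowpark pandas accepts an arbitrary Python callable in the same way is not stated on this page and is an assumption to verify. The helpers value_range and scaled_sum are hypothetical names chosen for illustration.

```python
import pandas as pd

# Native pandas is used so the sketch runs without a Snowflake session.
df = pd.DataFrame(
    {
        "A": [1, 1, 2, 2],
        "B": [1, 2, 3, 4],
        "C": [0.362838, 0.227877, 1.267767, -0.562860],
    }
)
grouped = df.groupby("A")


def value_range(x):
    # Hypothetical helper: spread of the values in one group.
    return x.max() - x.min()


def scaled_sum(x, factor):
    # Hypothetical helper: works whether x is a Series or a DataFrame,
    # because .sum() is defined for both (see the func description above).
    return x.sum() * factor


# A bare callable is applied to every non-grouping column (B and C).
ranges = grouped.agg(value_range)

# Extra positional (*args) and keyword (**kwargs) arguments are forwarded to func.
by_args = grouped.agg(scaled_sum, 10)
by_kwargs = grouped.agg(scaled_sum, factor=10)

print(ranges)
print(by_args)
```

Writing the helper so that it is valid for both a Series and a DataFrame keeps it safe under either calling convention mentioned in the func description.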

Return type:

DataFrame

Examples

>>> df = pd.DataFrame(
...     {
...         "A": [1, 1, 2, 2],
...         "B": [1, 2, 3, 4],
...         "C": [0.362838, 0.227877, 1.267767, -0.562860],
...     }
... )

>>> df
   A  B         C
0  1  1  0.362838
1  1  2  0.227877
2  2  3  1.267767
3  2  4 -0.562860

Apply a single aggregation to all columns:

>>> df.groupby('A').agg('min')
   B         C
A
1  1  0.227877
2  3 -0.562860

Apply multiple aggregations to all columns:

>>> df.groupby('A').agg(['min', 'max'])
    B             C
  min max       min       max
A
1   1   2  0.227877  0.362838
2   3   4 -0.562860  1.267767

Select a single column and apply aggregations:

>>> df.groupby('A').B.agg(['min', 'max'])
   min  max
A
1    1    2
2    3    4

Apply different aggregations to specific columns:

>>> df.groupby('A').agg({'B': ['min', 'max'], 'C': 'sum'})
    B             C
  min max       sum
A
1   1   2  0.590715
2   3   4  0.704907
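Because each dict value may be a function name, a callable, or a list of either, names and callables can be mixed per column. The sketch below shows this with native pandas so that it runs locally; support for a Python lambda in Snowpark pandas is an assumption, not something this page confirms.

```python
import pandas as pd

# Native pandas stands in for Snowpark pandas (assumption: same dict semantics).
df = pd.DataFrame(
    {
        "A": [1, 1, 2, 2],
        "B": [1, 2, 3, 4],
        "C": [0.362838, 0.227877, 1.267767, -0.562860],
    }
)

# Column B uses a string name; column C uses a callable (per-group spread).
result = df.groupby("A").agg(
    {
        "B": "sum",
        "C": lambda s: s.max() - s.min(),
    }
)
print(result)
```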