snowflake.snowpark.modin.plugin.extensions.groupby_overrides.DataFrameGroupBy.aggregate
- DataFrameGroupBy.aggregate(func: Optional[Union[Callable, str, list[Union[Callable, str]], MutableMapping[Hashable, Union[Callable, str, list[Union[Callable, str]]]]]] = None, *args: Any, engine: Optional[Literal['cython', 'numba']] = None, engine_kwargs: Optional[dict[str, bool]] = None, **kwargs: Any)
Aggregate using one or more operations over the specified axis.
- Parameters:
func (function, str, list, or dict) –
Function to use for aggregating the data. If a function, it must work either when passed a DataFrame or when passed to DataFrame.apply.
Accepted combinations are:
- function
- string function name
- list of functions and/or function names, e.g. [np.sum, 'mean']
- dict of axis labels -> functions, function names, or lists of such.
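As an illustration of the callable form, here is a minimal sketch that passes a user-defined function on its own and inside a dict. It uses stock pandas so that it runs without a Snowflake session; the helper name value_range is hypothetical, and the sketch assumes Snowpark pandas accepts the same call.

```python
# Stock pandas is used here so the sketch runs locally. With Snowpark pandas
# you would instead `import modin.pandas as pd` and
# `import snowflake.snowpark.modin.plugin` (assumes an active session).
import pandas as pd

df = pd.DataFrame(
    {
        "A": [1, 1, 2, 2],
        "B": [1, 2, 3, 4],
        "C": [0.362838, 0.227877, 1.267767, -0.562860],
    }
)


def value_range(values):
    """Hypothetical helper: spread between the largest and smallest value."""
    return values.max() - values.min()


# A bare callable is applied to every non-grouping column of each group.
spread = df.groupby("A").agg(value_range)

# A dict can mix callables and string function names per column.
mixed = df.groupby("A").agg({"B": value_range, "C": "mean"})
```

Here spread has one row per value of A and the columns B and C, while mixed applies value_range only to B and the built-in mean to C.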
*args – Positional arguments to pass to func.
engine (str, default None) –
- 'cython': Runs the function through C-extensions from cython.
- 'numba': Runs the function through JIT compiled code from numba.
- None: Defaults to 'cython' or the global setting compute.use_numba.
This parameter is ignored in Snowpark pandas. The execution engine will always be Snowflake.
engine_kwargs (dict, default None) –
- For the 'cython' engine, there are no accepted engine_kwargs.
- For the 'numba' engine, the engine can accept nopython, nogil and parallel dictionary keys. The values must be either True or False. The default engine_kwargs for the 'numba' engine is {'nopython': True, 'nogil': False, 'parallel': False} and will be applied to the function.
This parameter is ignored in Snowpark pandas. The execution engine will always be Snowflake.
**kwargs – Keyword arguments to pass to func.
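To show how *args and **kwargs reach func, the sketch below forwards a quantile level to a user-defined aggregator. It runs on stock pandas; the helper group_quantile is hypothetical, and whether Snowpark pandas forwards extra arguments in the same way is an assumption. The helper is written to work on either a Series or a DataFrame, since func may receive either.

```python
# Stock pandas is used so the sketch runs without a Snowflake session.
import pandas as pd

df = pd.DataFrame(
    {
        "A": [1, 1, 2, 2],
        "B": [1, 2, 3, 4],
        "C": [0.362838, 0.227877, 1.267767, -0.562860],
    }
)


def group_quantile(values, q):
    """Hypothetical helper: the q-th quantile of a Series or DataFrame."""
    return values.quantile(q)


# Extra keyword arguments are forwarded to the callable ...
by_keyword = df.groupby("A").agg(group_quantile, q=0.5)

# ... and so are extra positional arguments.
by_position = df.groupby("A").agg(group_quantile, 0.5)
```

Both calls compute the per-group median of B and C, so the two results should hold the same values.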
- Return type:
DataFrame
Examples
>>> df = pd.DataFrame(
...     {
...         "A": [1, 1, 2, 2],
...         "B": [1, 2, 3, 4],
...         "C": [0.362838, 0.227877, 1.267767, -0.562860],
...     }
... )
>>> df
   A  B         C
0  1  1  0.362838
1  1  2  0.227877
2  2  3  1.267767
3  2  4 -0.562860
Apply a single aggregation to all columns:
>>> df.groupby('A').agg('min')
   B         C
A
1  1  0.227877
2  3 -0.562860
Apply multiple aggregations to all columns:
>>> df.groupby('A').agg(['min', 'max'])
    B             C
  min max       min       max
A
1   1   2  0.227877  0.362838
2   3   4 -0.562860  1.267767
Select a single column and apply aggregations:
>>> df.groupby('A').B.agg(['min', 'max'])
   min  max
A
1    1    2
2    3    4
Apply different aggregations to specific columns:
>>> df.groupby('A').agg({'B': ['min', 'max'], 'C': 'sum'})
    B             C
  min max       sum
A
1   1   2  0.590715
2   3   4  0.704907