This page documents an older version (1.21.0).

modin.pandas.DataFrame.aggregate

DataFrame.aggregate(func: Union[Callable, str, list[Union[Callable, str]], MutableMapping[Hashable, Union[Callable, str, list[Union[Callable, str]]]]] = None, axis: Union[int, Literal['index', 'columns', 'rows']] = 0, *args: Any, **kwargs: Any)

Aggregate using one or more operations over the specified axis.

Parameters:

func (function, str, list or dict) –
Function to use for aggregating the data. If a function, must either work when passed a DataFrame or when passed to DataFrame.apply.

Accepted combinations are:
- function
- string function name
- list of functions and/or function names, e.g. [np.sum, 'mean']
- dict of axis labels -> functions, function names or list of such.
axis ({0 or 'index', 1 or 'columns'}, default 0) – If 0 or ‘index’: apply function to each column. If 1 or ‘columns’: apply function to each row.
*args – Positional arguments to pass to func.
**kwargs – Keyword arguments to pass to func.
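The accepted forms of func listed above can be sketched as follows. This is a minimal illustration that uses plain pandas as a local stand-in, on the assumption that Snowpark pandas accepts the same call shapes once a session is active. The helper value_range is hypothetical and not part of either library.

```python
import pandas as pd  # plain pandas stand-in; Snowpark pandas needs a live session

df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})


def value_range(col):
    """Hypothetical helper: spread between the largest and smallest value."""
    return col.max() - col.min()


# A function: applied to each column in turn.
by_function = df.agg(value_range)

# A string function name.
by_name = df.agg("sum")

# A list of function names: one result row per name.
by_list = df.agg(["sum", "mean"])

# A dict of axis labels -> lists of function names.
by_dict = df.agg({"A": ["sum"], "B": ["min", "max"]})
```

Combinations that a dict does not request for a given label, such as 'sum' for column B here, appear as NaN in the result.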

Returns:

The return can be:

scalar : when Snowpark pandas Series.agg is called with a single function
Snowpark pandas Series : when Snowpark pandas DataFrame.agg is called with a single function
Snowpark pandas DataFrame : when Snowpark pandas DataFrame.agg is called with several functions

Return scalar, Snowpark pandas Series or Snowpark pandas DataFrame.

Return type:

scalar, Snowpark pandas Series or Snowpark pandas DataFrame
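The three return shapes can be checked with a short sketch. Plain pandas is used here as a local stand-in, so the types shown are pandas types. The assumption is that Snowpark pandas returns its own Series and DataFrame equivalents for the same calls.

```python
import pandas as pd  # plain pandas stand-in; Snowpark pandas needs a live session

df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})

# Series.agg with a single function -> scalar
scalar_result = df["A"].agg("sum")

# DataFrame.agg with a single function -> Series indexed by column label
series_result = df.agg("sum")

# DataFrame.agg with several functions -> DataFrame, one row per function
frame_result = df.agg(["sum", "min"])
```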

Notes

The aggregation operations are always performed over an axis, either the index (default) or the column axis. This behavior is different from numpy aggregation functions (mean, median, prod, sum, std, var), where the default is to compute the aggregation of the flattened array, e.g., numpy.mean(arr_2d) as opposed to numpy.mean(arr_2d, axis=0).
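The difference from numpy can be seen on a small array. This sketch uses numpy and plain pandas locally; the variable names are illustrative only, and Snowpark pandas is assumed to follow the same per-axis behavior.

```python
import numpy as np
import pandas as pd  # plain pandas stand-in; Snowpark pandas needs a live session

arr_2d = np.array([[1.0, 2.0],
                   [3.0, 4.0]])

# numpy default: aggregate over the flattened array.
flat_mean = np.mean(arr_2d)

# numpy with an explicit axis: one mean per column.
axis0_mean = np.mean(arr_2d, axis=0)

# DataFrame.agg default (axis=0): one mean per column, no flattening.
frame_mean = pd.DataFrame(arr_2d, columns=["A", "B"]).agg("mean")
```

Here flat_mean is a single number, while axis0_mean and frame_mean each hold one value per column.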

agg is an alias for aggregate. Use the alias.

Functions that mutate the passed object can produce unexpected behavior or errors and are not supported.

A user-defined function is passed a Series for evaluation.
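A short sketch of a user-defined function, again using plain pandas as a local stand-in. The function scaled_total and its factor parameter are hypothetical. They show that each call receives one column as a Series and that extra keyword arguments given to agg are forwarded to the function. The function only reads its input, in line with the note on mutation above.

```python
import pandas as pd  # plain pandas stand-in; Snowpark pandas needs a live session

df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})


def scaled_total(col, factor=1):
    """Hypothetical UDF: sum one column and scale the result."""
    # Each call receives a single column as a Series, not the whole DataFrame.
    assert isinstance(col, pd.Series)
    return col.sum() * factor


# factor=10 is forwarded to scaled_total through **kwargs.
result = df.agg(scaled_total, factor=10)
```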

Examples

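>>> import modin.pandas as pd
>>> import snowflake.snowpark.modin.plugin  # registers the Snowpark pandas backend; an active Snowpark session is assumed
>>> import numpy as np
>>> df = pd.DataFrame([[1, 2, 3],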
...                    [4, 5, 6],
...                    [7, 8, 9],
...                    [np.nan, np.nan, np.nan]],
...                   columns=['A', 'B', 'C'])

Aggregate these functions over the rows.

>>> df.agg(['sum', 'min'])
        A     B     C
sum  12.0  15.0  18.0
min   1.0   2.0   3.0

Different aggregations per column.

>>> df.agg({'A': ['sum', 'min'], 'B': ['min', 'max']})
        A    B
sum  12.0  NaN
min   1.0  2.0
max   NaN  8.0

Aggregate over the columns.

>>> df.agg("max", axis="columns")
0    3.0
1    6.0
2    9.0
3    NaN
dtype: float64

Different aggregations per row.

>>> df.agg({0: ["sum"], 1: ["min"]}, axis=1)
   sum  min
0  6.0  NaN
1  NaN  4.0