modin.pandas.DataFrame.aggregate
- DataFrame.aggregate(func: Union[Callable, str, list[Union[Callable, str]], MutableMapping[Hashable, Union[Callable, str, list[Union[Callable, str]]]]] = None, axis: Union[int, Literal['index', 'columns', 'rows']] = 0, *args: Any, **kwargs: Any)
Aggregate using one or more operations over the specified axis.
- Parameters:
func (function, str, list or dict) –
Function to use for aggregating the data. If a function, must either work when passed a DataFrame or when passed to DataFrame.apply.
Accepted combinations are:
function
string function name
list of functions and/or function names, e.g. [np.sum, 'mean']
dict of axis labels -> functions, function names or list of such.
axis ({0 or 'index', 1 or 'columns'}, default 0) – If 0 or ‘index’: apply function to each column. If 1 or ‘columns’: apply function to each row.
*args – Positional arguments to pass to func.
**kwargs – Keyword arguments to pass to func.
- Returns:
The return can be:
scalar: when Snowpark pandas Series.agg is called with a single function
Snowpark pandas Series: when Snowpark pandas DataFrame.agg is called with a single function
Snowpark pandas DataFrame: when Snowpark pandas DataFrame.agg is called with several functions
Return scalar, Snowpark pandas Series or Snowpark pandas DataFrame.
- Return type:
scalar, Snowpark pandas Series or Snowpark pandas DataFrame
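The three return shapes listed above can be sketched as follows. This is a minimal illustration that uses plain pandas as a stand-in, on the assumption that Snowpark pandas returns the equivalent types; the frame and the variable names are made up for the example, and running it against Snowpark pandas would additionally need an active Snowflake session.

```python
import pandas as pd  # plain pandas stand-in (assumption), not Snowpark pandas

# Hypothetical sample data for illustration only.
df = pd.DataFrame({"A": [1, 4, 7], "B": [2, 5, 8]})

# One function on a DataFrame: one value per column, returned as a Series.
single = df.agg("sum")

# Several functions on a DataFrame: one row per function, returned as a DataFrame.
several = df.agg(["sum", "min"])

# One function on a Series: a single scalar value.
scalar = df["A"].agg("sum")

print(type(single).__name__, type(several).__name__, scalar)
```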
Notes
The aggregation operations are always performed over an axis, either the index (default) or the column axis. This behavior is different from numpy aggregation functions (mean, median, prod, sum, std, var), where the default is to compute the aggregation of the flattened array, e.g., numpy.mean(arr_2d) as opposed to numpy.mean(arr_2d, axis=0).
agg is an alias for aggregate. Use the alias.
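The difference between the numpy default and the axis-based default described in these notes can be seen in a small sketch. It uses plain pandas and numpy as stand-ins (an assumption; Snowpark pandas would need an active session), and the array values are arbitrary.

```python
import numpy as np
import pandas as pd  # plain pandas stand-in (assumption), not Snowpark pandas

# Arbitrary 2x2 sample data for illustration only.
arr_2d = np.array([[1.0, 2.0], [3.0, 4.0]])
df = pd.DataFrame(arr_2d, columns=["A", "B"])

# numpy default: the array is flattened, so a single value comes back.
flat_mean = np.mean(arr_2d)

# numpy with an explicit axis: one value per column.
per_column = np.mean(arr_2d, axis=0)

# DataFrame.agg default (axis=0): one value per column, like axis=0 in numpy.
df_mean = df.agg("mean")

print(flat_mean, per_column, df_mean.tolist())
```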
Functions that mutate the passed object can produce unexpected behavior or errors and are not supported.
A user-defined function passed as func receives a Series for evaluation.
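As a sketch of how a user-defined function receives a Series and how extra keyword arguments are forwarded to it, consider the following. The function `clipped_sum` and its `upper` parameter are hypothetical names introduced for this example, and plain pandas is used as a stand-in on the assumption that Snowpark pandas behaves the same way for this case.

```python
import pandas as pd  # plain pandas stand-in (assumption), not Snowpark pandas


def clipped_sum(col, upper):
    """Hypothetical UDF: cap each value at `upper`, then sum the column."""
    # `col` is the Series for one column; the input is not mutated.
    return col.clip(upper=upper).sum()


# Hypothetical sample data for illustration only.
df = pd.DataFrame({"A": [1, 4, 7], "B": [2, 5, 8]})

# `upper=5` is not consumed by agg itself; it is forwarded to clipped_sum.
result = df.agg(clipped_sum, upper=5)

print(result)
```

The function returns a new value instead of modifying `col`, which keeps it within the supported behavior noted above.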
Examples
>>> df = pd.DataFrame([[1, 2, 3],
...                    [4, 5, 6],
...                    [7, 8, 9],
...                    [np.nan, np.nan, np.nan]],
...                   columns=['A', 'B', 'C'])
Aggregate these functions over the rows.
>>> df.agg(['sum', 'min'])
        A     B     C
sum  12.0  15.0  18.0
min   1.0   2.0   3.0
Different aggregations per column.
>>> df.agg({'A' : ['sum', 'min'], 'B' : ['min', 'max']})
        A    B
sum  12.0  NaN
min   1.0  2.0
max   NaN  8.0
Aggregate over the columns.
>>> df.agg("max", axis="columns")
0    3.0
1    6.0
2    9.0
3    NaN
dtype: float64
Different aggregations per row.
>>> df.agg({ 0: ["sum"], 1: ["min"] }, axis=1)
   sum  min
0  6.0  NaN
1  NaN  4.0