modin.pandas.DataFrameGroupBy.apply
- DataFrameGroupBy.apply(func, *args, include_groups=True, **kwargs)
Apply function func group-wise and combine the results together.
The function passed to apply must take a dataframe or series as its first argument and return a DataFrame, Series or scalar. apply will then take care of combining the results back together into a single dataframe or series. apply is therefore a highly flexible grouping method.
While apply is a very flexible method, its downside is that using it can be quite a bit slower than using more specific methods like agg or transform. pandas offers a wide range of methods that will be much faster than using apply for their specific purposes, so try to use them before reaching for apply.
- Parameters:
func (callable) – A callable that takes a dataframe or series as its first argument, and returns a dataframe, a series or a scalar. In addition the callable may take positional and keyword arguments.
include_groups (bool, default True) – When True, will apply func to the groups in the case that they are columns of the DataFrame.
args (tuple) – Optional positional arguments to pass to func.
kwargs (dict) – Optional keyword arguments to pass to func.
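As a minimal sketch of how extra positional and keyword arguments reach func, the snippet below forwards one of each through apply. It is written against plain pandas as an assumption (with Snowpark pandas the import would be modin.pandas plus an active session), and the helper name scale and its parameters factor and offset are hypothetical, chosen only for illustration.

```python
import pandas as pd

df = pd.DataFrame({"A": ["a", "a", "b"], "B": [1, 2, 3], "C": [4, 6, 5]})

def scale(group, factor, offset=0):
    # Hypothetical helper: receives one group's DataFrame as its first
    # argument and returns a new DataFrame without mutating the input.
    return group * factor + offset

# 10 is forwarded positionally (via *args), offset=1 by keyword (via **kwargs).
result = df.groupby("A", group_keys=False)[["B", "C"]].apply(scale, 10, offset=1)
print(result)
```

Selecting the columns B and C before calling apply keeps the grouping column out of the frame handed to the helper, so the arithmetic only touches numeric data.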
- Return type:
See also
pipe – Apply function to the full GroupBy object instead of to each group.
aggregate – Apply aggregate function to the GroupBy object.
transform – Apply function column-by-column to the GroupBy object.
Series.apply – Apply a function to a Series.
DataFrame.apply – Apply a function to each row or column of a DataFrame.
Notes
Functions that mutate the passed object can produce unexpected behavior or errors and are not supported.
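One way to stay clear of the mutation caveat is to copy the group inside the function and modify only the copy. The sketch below assumes plain pandas rather than a Snowpark session, and the helper name add_share and the column name share are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"A": ["a", "a", "b"], "B": [1, 2, 3]})

def add_share(group):
    # Work on a copy so the object passed in by apply is never modified.
    out = group.copy()
    out["share"] = out["B"] / out["B"].sum()
    return out

result = df.groupby("A", group_keys=False)[["B"]].apply(add_share)
print(result)
print(df.columns.tolist())  # the original frame keeps its original columns
```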
Returning a Series or scalar in func is not yet supported in Snowpark pandas.
Examples
>>> df = pd.DataFrame({'A': 'a a b'.split(),
...                    'B': [1,2,3],
...                    'C': [4,6,5]})
>>> g1 = df.groupby('A', group_keys=False)
>>> g2 = df.groupby('A', group_keys=True)
Notice that g1 and g2 have two groups, a and b, and only differ in their group_keys argument. Calling apply in various ways, we can get different grouping results:
Example 1: below the function passed to apply takes a DataFrame as its argument and returns a DataFrame. apply combines the result for each group together into a new DataFrame:
>>> g1[['B', 'C']].apply(lambda x: x.select_dtypes('number') / x.select_dtypes('number').sum())
          B    C
0  0.333333  0.4
1  0.666667  0.6
2  1.000000  1.0
In the above, the groups are not part of the index. We can have them included by using g2 where group_keys=True:
>>> g2[['B', 'C']].apply(lambda x: x.select_dtypes('number') / x.select_dtypes('number').sum())
            B    C
A
a 0  0.333333  0.4
  1  0.666667  0.6
b 2  1.000000  1.0
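The per-group normalization in Example 1 can also be written with transform, one of the more specific methods mentioned earlier as alternatives to apply. The sketch below assumes plain pandas and only checks that the two spellings agree on this small frame. It does not measure speed, and whether the same call is supported or faster under Snowpark pandas is not something this page states.

```python
import pandas as pd

df = pd.DataFrame({"A": "a a b".split(), "B": [1, 2, 3], "C": [4, 6, 5]})
g1 = df.groupby("A", group_keys=False)

# Flexible form: an arbitrary function is called once per group.
via_apply = g1[["B", "C"]].apply(lambda x: x / x.sum())

# Specific form: transform broadcasts each group's sum back to its rows,
# and the division then happens once over the whole frame.
via_transform = df[["B", "C"]] / g1[["B", "C"]].transform("sum")

# Largest absolute difference between the two results.
max_diff = (via_apply - via_transform).abs().max().max()
print(max_diff)
```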