
modin.pandas.DataFrameGroupBy.transform

DataFrameGroupBy.transform(func: Union[str, Callable], *args: Any, engine: Optional[Literal['cython', 'numba']] = None, engine_kwargs: Optional[dict[str, bool]] = None, **kwargs: Any) → DataFrame

Call function producing a same-indexed DataFrame on each group.

Returns a DataFrame having the same indexes as the original object filled with the transformed values.

Parameters:
  • func (function, str) –

    Function to apply to each group. See the Notes section below for requirements.

    Accepted inputs are:

    • String (needs to be the name of a groupby method you want to use)

    • Python function

  • *args (Any) – Positional arguments to pass to func.

  • engine (str, default None) –

    • 'cython' : Runs the function through C-extensions from cython.

    • 'numba' : Runs the function through JIT compiled code from numba.

    • None : Defaults to 'cython' or the global setting compute.use_numba

    This parameter is ignored in Snowpark pandas, as the execution is always performed in Snowflake.

  • engine_kwargs (dict, default None) –

    • For the 'cython' engine, there are no accepted engine_kwargs.

    • For the 'numba' engine, the engine can accept nopython, nogil and parallel dictionary keys. The values must be either True or False. The default engine_kwargs for the 'numba' engine is {'nopython': True, 'nogil': False, 'parallel': False} and will be applied to the function.

    This parameter is ignored in Snowpark pandas, as the execution is always performed in Snowflake.

  • **kwargs (Any) – Keyword arguments to be passed into func (see the short sketch after this parameter list).
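
The following is a minimal, hypothetical sketch (the demo frame, the offset argument, and the import are not part of this page's examples) of how extra arguments reach func: positional values are forwarded through *args and named values through **kwargs, just as n=2 is in the head(n) example further down.

>>> import modin.pandas as pd  # pd as used throughout these docs; Snowpark pandas session setup is assumed
>>> demo = pd.DataFrame({"key": ["a", "a", "b"], "val": [1, 2, 3]})
>>> # 10 is forwarded positionally to the lambda via *args ...
>>> shifted = demo.groupby("key").transform(lambda g, offset: g + offset, 10)
>>> # ... and the same value can instead be passed by name via **kwargs.
>>> shifted = demo.groupby("key").transform(lambda g, offset: g + offset, offset=10)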

Notes

Functions that mutate the passed object can produce unexpected behavior or errors and are not supported.

Returning a Series or scalar in func is not yet supported in Snowpark pandas.
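
To make the note about mutation concrete, below is a minimal sketch (hypothetical names, not part of this page's examples) of the supported pattern: func computes and returns a new, same-shaped object instead of writing into the group it receives.

>>> def demean(group):
...     # Preferred: build and return a new object; the passed group is left untouched.
...     return group - group.mean()
>>> demo = pd.DataFrame({"k": ["a", "a", "b"], "v": [1, 2, 3]})
>>> result = demo.groupby("k").transform(demean)
>>> # By contrast, a func that assigns into `group` in place (for example
>>> # `group["v"] = 0` before returning it) mutates the passed object and is not supported.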

Examples

>>> df = pd.DataFrame(
...     {
...         "col1": ["Z", None, "X", "Z", "Y", "X", "X", None, "X", "Y"],
...         "col2": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
...         "col3": [40, 50, 60, 10, 20, 30, 40, 80, 90, 10],
...         "col4": [-1, -2, -3, -4, -5, -6, -7, -8, -9, -10],
...     },
...     index=list("abcdefghij")
... )
>>> df
   col1  col2  col3  col4
a     Z     1    40    -1
b  None     2    50    -2
c     X     3    60    -3
d     Z     4    10    -4
e     Y     5    20    -5
f     X     6    30    -6
g     X     7    40    -7
h  None     8    80    -8
i     X     9    90    -9
j     Y    10    10   -10
>>> df.groupby("col1", dropna=True).transform(lambda df, n: df.head(n), n=2)
   col2  col3  col4
a   1.0  40.0  -1.0
b   NaN   NaN   NaN
c   3.0  60.0  -3.0
d   4.0  10.0  -4.0
e   5.0  20.0  -5.0
f   6.0  30.0  -6.0
g   NaN   NaN   NaN
h   NaN   NaN   NaN
i   NaN   NaN   NaN
j  10.0  10.0 -10.0
>>> df.groupby("col1", dropna=False).transform("mean")
   col2  col3  col4
a  2.50  25.0 -2.50
b  5.00  65.0 -5.00
c  6.25  55.0 -6.25
d  2.50  25.0 -2.50
e  7.50  15.0 -7.50
f  6.25  55.0 -6.25
g  6.25  55.0 -6.25
h  5.00  65.0 -5.00
i  6.25  55.0 -6.25
j  7.50  15.0 -7.50