snowflake.snowpark.modin.plugin.extensions.groupby_overrides.DataFrameGroupBy.transform¶

DataFrameGroupBy.transform(func: Union[str, Callable], *args: Any, engine: Optional[Literal['cython', 'numba']] = None, engine_kwargs: Optional[dict[str, bool]] = None, **kwargs: Any) → DataFrame[source]¶

Call a function producing a same-indexed DataFrame on each group.

Returns a DataFrame having the same indexes as the original object, filled with the transformed values.

Parameters:
  • func (function, str) –

    Function to apply to each group. See the Notes section below for requirements.

    Accepted inputs are:

    • String (must be the name of a groupby method you want to use)

    • Python function

  • *args (Any) – Positional arguments to pass to func (see the sketch after this parameter list).

  • engine (str, default None) –

    • 'cython' : Runs the function through C-extensions from cython.

    • 'numba' : Runs the function through JIT compiled code from numba.

    • None : Defaults to 'cython' or the global setting compute.use_numba.

    This parameter is ignored in Snowpark pandas, as the execution is always performed in Snowflake.

  • engine_kwargs (dict, default None) –

    • For the 'cython' engine, there are no accepted engine_kwargs.

    • For the 'numba' engine, the accepted dictionary keys are nopython, nogil, and parallel, and each value must be either True or False. The default engine_kwargs for the 'numba' engine is {'nopython': True, 'nogil': False, 'parallel': False} and will be applied to the function.

    This parameter is ignored in Snowpark pandas, as the execution is always performed in Snowflake.

  • **kwargs (Any) – Keyword arguments to be passed into func.
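
As an illustration of how positional arguments (*args) and keyword arguments (**kwargs) are forwarded to func, here is a minimal sketch. The DataFrame and the offset/scale names are hypothetical and not part of the reference examples below; the usual Snowpark pandas imports (import modin.pandas as pd together with import snowflake.snowpark.modin.plugin) are assumed, and the call is marked as skipped because its output is not reproduced here.

>>> small_df = pd.DataFrame({"key": ["a", "a", "b"], "val": [1, 2, 3]})
>>> # func receives each group's non-grouping columns as a DataFrame;
>>> # 10 is forwarded positionally (offset) and scale=2 as a keyword.
>>> small_df.groupby("key").transform(
...     lambda grp, offset, scale=1: grp * scale + offset, 10, scale=2
... )  # doctest: +SKIP

Each value is transformed as value * scale + offset, so the result is a DataFrame with the same index as small_df.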

Notes

Functions that mutate the passed object can produce unexpected behavior or errors and are not supported.

Returning a Series or scalar in func is not yet supported in Snowpark pandas.
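
For instance, a within-group demeaning that one might naturally write with a Series-returning func can be expressed with a DataFrame-returning func instead. The sketch below uses a small hypothetical DataFrame (tips) rather than the one from the Examples section, and the supported call is marked as skipped because its output is not reproduced here.

>>> tips = pd.DataFrame({"day": ["Thu", "Thu", "Fri"], "bill": [10.0, 20.0, 12.0]})
>>> # Not yet supported in Snowpark pandas: func returning a Series
>>> # tips.groupby("day").transform(lambda grp: grp["bill"] - grp["bill"].mean())
>>> # Supported: func returning a same-indexed DataFrame
>>> tips.groupby("day").transform(lambda grp: grp - grp.mean())  # doctest: +SKIP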

Examples

>>> df = pd.DataFrame(
...     {
...         "col1": ["Z", None, "X", "Z", "Y", "X", "X", None, "X", "Y"],
...         "col2": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
...         "col3": [40, 50, 60, 10, 20, 30, 40, 80, 90, 10],
...         "col4": [-1, -2, -3, -4, -5, -6, -7, -8, -9, -10],
...     },
...     index=list("abcdefghij")
... )
>>> df
   col1  col2  col3  col4
a     Z     1    40    -1
b  None     2    50    -2
c     X     3    60    -3
d     Z     4    10    -4
e     Y     5    20    -5
f     X     6    30    -6
g     X     7    40    -7
h  None     8    80    -8
i     X     9    90    -9
j     Y    10    10   -10
>>> df.groupby("col1", dropna=True).transform(lambda df, n: df.head(n), n=2)  
   col2  col3  col4
a   1.0  40.0  -1.0
b   NaN   NaN   NaN
c   3.0  60.0  -3.0
d   4.0  10.0  -4.0
e   5.0  20.0  -5.0
f   6.0  30.0  -6.0
g   NaN   NaN   NaN
h   NaN   NaN   NaN
i   NaN   NaN   NaN
j  10.0  10.0 -10.0
>>> df.groupby("col1", dropna=False).transform("mean")  
   col2  col3  col4
a  2.50  25.0 -2.50
b  5.00  65.0 -5.00
c  6.25  55.0 -6.25
d  2.50  25.0 -2.50
e  7.50  15.0 -7.50
f  6.25  55.0 -6.25
g  6.25  55.0 -6.25
h  5.00  65.0 -5.00
i  6.25  55.0 -6.25
j  7.50  15.0 -7.50