GroupBy supported APIs

Each table below lists GroupBy methods. The first column contains the method name. The second column is a flag indicating whether Snowpark has an implementation of that method. The remaining columns, where present, list missing parameters and notes on the current implementation.

Note

`Y` stands for yes: the method has a distributed implementation. `N` stands for no: the API raises an error. `P` stands for partial: some parameters may not be supported yet. `D` stands for default: the method falls back to single-node pandas execution via UDF/Sproc. The `engine` and `engine_kwargs` parameters are always ignored in Snowpark pandas; the execution engine is always Snowflake.

Indexing, iteration


GroupBy method	Snowpark implemented? (Y/N/P/D)	Notes for current implementation
`get_group`	P	Implemented for DataFrameGroupBy objects only
`groups`	Y
`indices`	Y
`__iter__`	N
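
As an illustration of the rows above, the following minimal sketch uses `groups`, `indices`, and `get_group` on a DataFrameGroupBy object. It runs against stock pandas so that it needs no Snowflake session; the DataFrame and its column names are made up for the example, and the behavior under Snowpark pandas is assumed to match but is not verified here. Because `__iter__` is marked `N`, the loop walks the keys of `groups` and calls `get_group` instead of iterating over the GroupBy object directly. That is one possible workaround, not a documented recommendation.

```python
import pandas as pd  # stock pandas, used here only so the sketch runs locally

# Hypothetical example data
df = pd.DataFrame({"color": ["red", "blue", "red"], "n": [1, 2, 3]})
grouped = df.groupby("color")  # a DataFrameGroupBy object

# groups: mapping from group key to the row labels in that group
row_labels = grouped.groups

# indices: mapping from group key to the integer positions in that group
row_positions = grouped.indices

# get_group: the rows belonging to a single group
red_rows = grouped.get_group("red")

# Instead of `for key, frame in grouped` (which relies on __iter__),
# look up each group by key.
totals = {key: int(grouped.get_group(key)["n"].sum()) for key in sorted(row_labels)}
```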

Function application


GroupBy method	Snowpark implemented? (Y/N/P/D)	Missing parameters	Notes for current implementation
`agg`	P	`axis` other than 0 is not implemented.	`Y` when the aggregation functions are count, mean, min, max, sum, median, std, or var (both Python and NumPy functions are accepted); otherwise `N`.
`aggregate`	P	`axis` other than 0 is not implemented.	See `agg`
`apply`	P	`axis` other than 0 is not implemented. `include_groups = False` is not implemented.	`Y` if all of the following are true, otherwise `N`: (1) `func` is a callable that always returns either a pandas DataFrame, a pandas Series, or objects that are neither DataFrame nor Series; (2) grouping on axis=0; (3) not applying transform to a dataframe with a non-unique index; (4) not applying `func` that returns two dataframes that have different labels for the column at a given position; (5) not applying `func` that returns two dataframes that have different names for a given index label; (6) not applying `func` that returns two Series that have different labels for the row at a given position; (7) not applying `func` that returns two Series that have different names; (8) not grouping by an “external” by, i.e. an object that is not a label for a column or level of the dataframe.
`filter`	N
`pipe`	N
`transform`	P	`SeriesGroupBy.transform` is not implemented.	`Y` when `func` is a string or callable. A UDTF is created to run `transform` on every group via `apply`. `transform` has the same limitations as `apply` except for string `func` also being valid for `transform`.
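
The calls below are a small sketch of `agg`, `apply`, and `transform` usage chosen to stay within the constraints listed in the table: grouping on axis 0 by a column label, aggregation functions from the supported list, a callable `func` that always returns the same kind of object, and `transform` called on a DataFrameGroupBy rather than a SeriesGroupBy. The example runs against stock pandas with made-up data; whether each call is actually distributed in Snowpark pandas is an assumption based on the table, not something this sketch checks.

```python
import pandas as pd  # stock pandas, used here only so the sketch runs locally

# Hypothetical example data
df = pd.DataFrame(
    {"team": ["a", "a", "b", "b"], "score": [1.0, 3.0, 10.0, 30.0]}
)

# agg with string names taken from the supported list
summary = df.groupby("team").agg(["min", "max", "sum"])

# apply with a callable that always returns a scalar,
# i.e. an object that is neither a DataFrame nor a Series
score_range = df.groupby("team")[["score"]].apply(
    lambda g: g["score"].max() - g["score"].min()
)

# transform with a string func on a DataFrameGroupBy;
# the result has one row per input row
group_means = df.groupby("team").transform("mean")
```

The result of `transform` keeps the original row index, so it can be combined with `df` directly, for example to subtract each group's mean from its rows.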

Computations/descriptive stats


GroupBy method	Snowpark implemented? (Y/N/P/D)	Notes for current implementation
`all`	P	`N` for non-integer/boolean types
`any`	P	`N` for non-integer/boolean types
`bfill`	N
`corr`	N
`corrwith`	N
`count`	Y	SeriesGroupBy does not implement `numeric_only`
`cov`	N
`cumcount`	Y
`cummax`	Y
`cummin`	Y
`cumprod`	N
`cumsum`	Y
`describe`	N
`diff`	N
`ffill`	N
`fillna`	N
`first`	P	Does not support `min_count` parameter
`head`	Y
`idxmax`	P	`N` when GroupBy axis is 1; GroupBy axis = 0 is fully supported.
`idxmin`	P	See `idxmax`
`last`	P	Does not support `min_count` parameter
`max`	Y	See `count`
`mean`	Y	See `count`
`median`	Y	See `count`
`min`	Y	See `count`
`ngroup`	N
`nth`	N
`nunique`	Y
`ohlc`	N
`pct_change`	N
`prod`	N
`quantile`	Y	See `count`
`rank`	Y
`resample`	N
`rolling`	N
`sample`	N
`sem`	N
`shift`	P	`Y` if `axis = 0`, `freq` is None, `level` is None, and `by` is in the columns
`size`	Y
`skew`	N
`std`	P	`N` if `ddof` is not 0 or 1
`sum`	Y	See `count`
`tail`	Y
`take`	N
`value_counts`	P	`N` if `bins` is given for SeriesGroupBy
`var`	P	See `std`
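
To make a few of the notes above concrete, the sketch below calls `std` with the two `ddof` values the table lists as supported, `shift` with default `axis`, `freq`, and `level` while grouping by a column, and `cumsum`. It uses stock pandas and made-up data so that it runs without a Snowflake session; treating these exact calls as supported in Snowpark pandas is an assumption drawn from the table.

```python
import pandas as pd  # stock pandas, used here only so the sketch runs locally

# Hypothetical example data
df = pd.DataFrame(
    {"key": ["x", "x", "x", "y", "y"], "val": [1.0, 2.0, 4.0, 10.0, 14.0]}
)
grouped = df.groupby("key")  # `by` is a column label

# std with ddof=1 (sample) and ddof=0 (population); other ddof values are listed as N
sample_std = grouped.std(ddof=1)
population_std = grouped.std(ddof=0)

# shift within each group, leaving freq and level at their defaults
shifted = grouped.shift(1)

# running total within each group
running_total = grouped.cumsum()
```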

Plotting and visualization


GroupBy method	Snowpark implemented? (Y/N/P/D)	Notes for current implementation
`boxplot`	N
`hist`	N
`plot`	N