This documentation describes an older version (1.22.1).

GroupBy supported APIs

Each table below is structured as follows: the first column gives the method name, and the second column flags whether Snowpark implements that method. Any remaining columns list missing parameters and notes on the current implementation.

Note

Y stands for yes, meaning a distributed implementation is supported. N stands for no, meaning the API simply raises an error. P stands for partial, meaning some parameters may not be supported yet. D stands for defaults to single-node pandas execution via UDF/Sproc. The engine and engine_kwargs parameters are always ignored in Snowpark pandas; the execution engine is always Snowflake.

Indexing, iteration

| GroupBy method | Snowpark implemented? (Y/N/P/D) | Notes for current implementation |
| --- | --- | --- |
| get_group | P | Implemented for DataFrameGroupBy objects only |
| groups | Y | |
| indices | Y | |
| __iter__ | N | |
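The indexing entries above can be sketched as follows. This is a minimal illustration that uses plain pandas as a stand-in so that it runs anywhere; with Snowpark pandas the usual assumption is that you would instead import modin.pandas as pd together with snowflake.snowpark.modin.plugin and have an active Snowflake session. The column names and data are hypothetical, and the get_group loop is only one possible way to work around the missing __iter__.

```python
import pandas as pd  # stand-in; Snowpark pandas would use modin.pandas (assumption)

# Hypothetical example data.
df = pd.DataFrame({"team": ["a", "b", "a", "b"], "score": [1, 2, 3, 4]})
grouped = df.groupby("team")

# groups maps each group key to the index labels of its rows.
group_keys = sorted(grouped.groups)

# indices maps each group key to the integer positions of its rows.
positions = {key: pos.tolist() for key, pos in grouped.indices.items()}

# get_group returns the rows of one group (DataFrameGroupBy only, per the table).
team_a_scores = grouped.get_group("a")["score"].tolist()

# __iter__ is listed as N, so rather than `for key, frame in grouped`,
# this sketch loops over the keys and calls get_group for each one.
totals = {key: int(grouped.get_group(key)["score"].sum()) for key in group_keys}
```

Looping with get_group issues one lookup per key, so it is best suited to a small number of groups.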

Function application

| GroupBy method | Snowpark implemented? (Y/N/P/D) | Missing parameters | Notes for current implementation |
| --- | --- | --- | --- |
| agg | P | axis other than 0 is not implemented. | Y when the function is one of count, mean, min, max, sum, median, std, or var (both Python and NumPy functions are accepted); otherwise N. |
| aggregate | P | axis other than 0 is not implemented. | See agg |
| apply | P | axis other than 0 is not implemented. include_groups = False is not implemented. | Y if all of the conditions listed below the table are true; otherwise N. |
| filter | N | | |
| pipe | N | | |
| transform | P | SeriesGroupBy.transform is not implemented. | Y when func is a string or callable. A UDTF is created to run transform on every group via apply. transform has the same limitations as apply, except that a string func is also valid for transform. |

Conditions for apply:
  • func is a callable that always returns either a pandas DataFrame, a pandas Series, or objects that are neither DataFrame nor Series.
  • grouping on axis=0
  • Not applying transform to a dataframe with a non-unique index
  • Not applying func that returns two dataframes that have different labels for the column at a given position
  • Not applying func that returns two dataframes that have different names for a given index label
  • Not applying func that returns two Series that have different labels for the row at a given position
  • Not applying func that returns two Series that have different names
  • Not grouping by an “external” by, i.e. an object that is not a label for a column or level of the dataframe
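The call shapes described for agg, transform, and apply might look like the sketch below. It uses plain pandas as a stand-in so that it is runnable without a Snowflake session; the assumption is that the same calls work in Snowpark pandas after importing modin.pandas as pd and snowflake.snowpark.modin.plugin. The data, column names, and the score_range helper are hypothetical, and string function names are used for the aggregations purely for brevity.

```python
import pandas as pd  # stand-in; Snowpark pandas would use modin.pandas (assumption)

# Hypothetical example data.
df = pd.DataFrame({"team": ["a", "a", "b", "b"], "score": [1.0, 3.0, 2.0, 6.0]})
grouped = df.groupby("team")  # grouping by a column label, not an external object

# agg with functions from the supported list, using the default axis.
means = grouped.agg("mean")
extremes = grouped.agg({"score": ["min", "max"]})

# transform on a DataFrameGroupBy with a string func; the frame has a unique index.
group_means_per_row = grouped.transform("mean")
centered = (df["score"] - group_means_per_row["score"]).tolist()


def score_range(frame):
    """Hypothetical helper: always returns a scalar, never a DataFrame or Series."""
    return frame["score"].max() - frame["score"].min()


# apply with a callable whose return type is the same for every group.
spread = df.groupby("team")[["score"]].apply(score_range)
```

Selecting the score column before apply keeps the grouping column out of the frames passed to the callable, which sidesteps the include_groups parameter entirely.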

Computations/descriptive stats

| GroupBy method | Snowpark implemented? (Y/N/P/D) | Notes for current implementation |
| --- | --- | --- |
| all | P | N for non-integer/boolean types |
| any | P | N for non-integer/boolean types |
| bfill | N | |
| corr | N | |
| corrwith | N | |
| count | Y | SeriesGroupBy does not implement numeric_only |
| cov | N | |
| cumcount | Y | |
| cummax | Y | |
| cummin | Y | |
| cumprod | N | |
| cumsum | Y | |
| describe | N | |
| diff | N | |
| ffill | N | |
| fillna | N | |
| first | P | Does not support min_count parameter |
| head | Y | |
| idxmax | P | N when the GroupBy axis is 1; fully supported when the GroupBy axis is 0. |
| idxmin | P | See idxmax |
| last | P | Does not support min_count parameter |
| max | Y | See count |
| mean | Y | See count |
| median | Y | See count |
| min | Y | See count |
| ngroup | N | |
| nth | N | |
| nunique | Y | |
| ohlc | N | |
| pct_change | N | |
| prod | N | |
| quantile | Y | See count |
| rank | Y | |
| resample | N | |
| rolling | N | |
| sample | N | |
| sem | N | |
| shift | P | Y if axis = 0, freq is None, level is None, and by is in the columns |
| size | Y | |
| skew | N | |
| std | P | N if ddof is not 0 or 1 |
| sum | Y | See count |
| tail | Y | |
| take | N | |
| value_counts | P | N if bins is given for SeriesGroupBy |
| var | P | See std |
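A few of the partially supported computations above can be illustrated with calls that stay inside the documented limits. As before, this is a sketch in plain pandas with hypothetical data; it assumes the same calls carry over to Snowpark pandas once modin.pandas and snowflake.snowpark.modin.plugin are imported, which this example does not verify.

```python
import pandas as pd  # stand-in; Snowpark pandas would use modin.pandas (assumption)

# Hypothetical example data.
df = pd.DataFrame({"team": ["a", "a", "b", "b"], "score": [1.0, 3.0, 2.0, 6.0]})
grouped = df.groupby("team")  # by is a column label, level is left unset

# std and var: only ddof values of 0 or 1 are listed as supported.
sample_std = grouped.std(ddof=1)
population_std = grouped.std(ddof=0)

# first and last: called without min_count, which the table lists as unsupported.
first_scores = grouped.first()

# shift: default axis and no freq, matching the conditions in the table.
previous_score = grouped.shift(1)

# cumsum and size are listed as Y.
running_total = grouped.cumsum()
group_sizes = grouped.size()
```

Methods marked N in the table, such as diff or pct_change, are deliberately left out of this sketch.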

Plotting and visualization

| GroupBy method | Snowpark implemented? (Y/N/P/D) | Notes for current implementation |
| --- | --- | --- |
| boxplot | N | |
| hist | N | |
| plot | N | |