`pd.DataFrame` supported APIs

Each table below lists one API per row. The first column gives the attribute or method name. The second column flags whether Snowpark implements it. The methods table also has a column for missing parameters, and the last column of each table holds notes on the current implementation.

Note

`Y` stands for yes (a distributed implementation is supported). `N` stands for no (the API raises an error). `P` stands for partial (some parameters may not be supported yet). `D` means the call defaults to single-node pandas execution via UDF/Sproc.
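
The legend above can be kept next to application code as a small lookup. The sketch below is only an illustration: the `Support` enum, the `SUPPORT_FLAGS` excerpt, and `describe_support` are hypothetical names and are not part of Snowpark. The four sample entries are copied from the tables on this page.

```python
from enum import Enum


class Support(Enum):
    """Hypothetical encoding of the Y/N/P/D legend used on this page."""
    Y = "distributed implementation"
    N = "not supported; the API raises an error"
    P = "partial; some parameters may not be supported yet"
    D = "defaults to single-node pandas execution via UDF/Sproc"


# Small excerpt of the tables on this page (method name -> flag).
SUPPORT_FLAGS = {"abs": "Y", "asof": "N", "add": "P", "plot": "D"}


def describe_support(name: str) -> str:
    """Return a readable support description for a listed method."""
    flag = SUPPORT_FLAGS[name]
    return f"{name}: {Support[flag].value}"


print(describe_support("plot"))
```

A fuller version would load every row of the tables instead of this four-entry excerpt.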

Attributes

DataFrame attribute	Snowpark implemented? (Y/N/P/D)	Notes for current implementation
`T`	P	`D` if any column name is not str or tuple of str
`at`	P	`N` for set with MultiIndex
`attrs`	Y
`axes`	Y
`columns`	Y
`dtypes`	Y
`empty`	Y
`flags`	Y
`iat`	Y
`iloc`	Y
`index`	Y	This operation is not recommended since it pulls the index into local memory.
`loc`	P	`N` for set with MultiIndex.
`ndim`	Y
`shape`	Y
`size`	Y
`style`	Y	Performed locally on the client
`values`	Y
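
The note on `T` says the operation falls back to `D` if any column name is not a str or a tuple of str. A minimal sketch of that condition follows. The helper `labels_are_str_or_str_tuples` is a hypothetical name and not a Snowpark function; it only mirrors the wording of the note and uses plain Python so it runs without a Snowflake session.

```python
def labels_are_str_or_str_tuples(column_labels) -> bool:
    """Return True when every label is a str or a tuple of str.

    Mirrors the condition in the note on `T`; when this returns False,
    the table says the transpose defaults to single-node execution.
    """
    def is_supported(label) -> bool:
        if isinstance(label, str):
            return True
        return isinstance(label, tuple) and all(
            isinstance(part, str) for part in label
        )

    return all(is_supported(label) for label in column_labels)


print(labels_are_str_or_str_tuples(["a", ("x", "y")]))
print(labels_are_str_or_str_tuples(["a", 1]))
```

In practice the labels would come from `df.columns`; renaming integer labels to strings before transposing is one way to stay on the distributed path, assuming the note above is the only condition that applies.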

Methods

DataFrame method	Snowpark implemented? (Y/N/P/D)	Missing parameters	Notes for current implementation
`abs`	Y
`add`	P	`level`
`add_prefix`	Y
`add_suffix`	Y
`agg`	P	`margins`, `observed`, `sort`	Check Supported Aggregation Functions for a list of supported functions.
`aggregate`	P	`margins`, `observed`, `sort`	See `agg`
`align`	P	`copy`, `level`, `fill_value`	`N` for MultiIndex, for deprecated parameters `method`, `limit`, `fill_axis`, `broadcast_axis`, or if `fill_value` is not default of np.nan
`all`	P		`N` for non-integer/boolean types
`any`	P		`N` for non-integer/boolean types
`apply`	P		`N` if `func` is not callable, or `result_type` is given, or `args` and `kwargs` contain DataFrame or Series. `N` if `func` maps to different column labels.
`applymap`	P		`N` if `na_action == "ignore"`
`asfreq`	P	`how`, `normalize`, `fill_value`	Only DatetimeIndex is supported and its `freq` will be lost. Only `rule` frequencies ‘s’, ‘min’, ‘h’, and ‘D’ are supported.
`asof`	N
`assign`	Y
`astype`	P		`N` if from string to datetime/timedelta or `errors == "ignore"`
`at_time`	N
`backfill`	P		`N` if param `downcast` is set.
`between_time`	N
`bfill`	P		`N` if param `downcast` is set.
`bool`	N
`boxplot`	Y
`clip`	N
`combine`	N
`combine_first`	N
`compare`	P	`align_axis`, `keep_shape`, `keep_equal`, `result_names`
`convert_dtypes`	N		Not supported since Snowpark pandas is already using nullable datatypes internally.
`copy`	Y
`corr`	P		`N` if `method` is not ‘pearson’
`corrwith`	N
`count`	Y
`cov`	N
`cummax`	Y
`cummin`	Y
`cumprod`	N
`cumsum`	P		`Y` if values are numeric
`describe`	Y
`diff`	Y
`div`	P	`level`
`divide`	P	`level`
`dot`	N
`drop`	Y
`drop_duplicates`	Y
`droplevel`	N
`dropna`	P		`N` if `axis == 1`
`duplicated`	Y
`eq`	P	`level`
`equals`	Y
`eval`	P		No support for dataframes with a row MultiIndex.
`ewm`	N
`expanding`	P	`method` is ignored	`N` if `axis = 1`
`explode`	N
`ffill`	P		`N` if parameter `downcast` is set. `limit` parameter only supported if `method` parameter is used.
`fillna`	P		See `ffill`
`filter`	N
`first`	Y
`first_valid_index`	Y
`floordiv`	P	`level`
`from_dict`	Y
`from_records`	P		`N` if parameter `data` is set to a DataFrame
`ge`	P	`level`
`get`	Y
`groupby`	P	`observed` is ignored since Categoricals are not implemented yet	`Y` if `axis == 0` and `by` is a column label, a Series from the current DataFrame, or a `pd.Grouper` object; otherwise `N`. If a `pd.Grouper` object is passed, only the default values of the `sort`, `closed`, `label`, and `convention` arguments are supported. The `origin` argument currently supports `"start_day"` and `"start"`. Supported functions are agg, count, cumcount, cummax, cummin, cumsum, first, last, max, mean, median, min, quantile, shift, size, std, sum, and var; any other function is `N`.
`gt`	P	`level`
`head`	Y
`hist`	N
`idxmax`	P		`N` for MultiIndex dataframes
`idxmin`	P		`N` for MultiIndex dataframes
`infer_objects`	N
`info`	P		Index is different, zero bytes reported for memory
`insert`	Y
`interpolate`	P		`N` if `axis == 1`, `limit` is set, `limit_area` is “outside”, or `method` is not “linear”, “bfill”, “backfill”, “ffill”, or “pad”. `limit_area="inside"` is supported only when `method` is `linear`.
`isetitem`	N
`isin`	Y
`isna`	Y
`isnull`	Y
`items`	Y
`iterrows`	Y
`itertuples`	Y
`join`	P		`N` if given the `validate` param.
`keys`	Y
`kurt`	N
`kurtosis`	N
`last`	Y
`last_valid_index`	Y
`le`	P	`level`
`lt`	P	`level`
`map`	P		`N` if `na_action == "ignore"`
`mask`	P		`N` if given `axis` when `other` is a `DataFrame`, or if given the `level` parameter; `N` if `cond` or `other` is callable
`max`	Y
`mean`	Y
`median`	Y
`melt`	P	`col_level`	`N` when columns are MultiIndex
`memory_usage`	N
`merge`	P		`N` if param `validate` is given
`min`	Y
`mod`	P	`level`
`mode`	N
`mul`	P	`level`
`multiply`	P	`level`
`ne`	P	`level`
`nlargest`	P		`N` if `keep == "all"`
`notna`	Y
`notnull`	Y
`nsmallest`	P		`N` if `keep == "all"`
`nunique`	P		`N` if `axis == 1`
`pad`	P		See `ffill`
`pct_change`	P	`limit`, `freq`
`pipe`	N
`pivot`	P		See `pivot_table`
`pivot_table`	P	`observed`, `sort`	`N` if `index`, `columns`, or `values` is not str, list of str, or None; or MultiIndex; or any `aggfunc` is not “count”, “mean”, “min”, “max”, or “sum”. `N` if `index` is None, `margins` is True, and `aggfunc` is “count” or “mean” or a dictionary. `N` if `index` is None and `aggfunc` is a dictionary containing lists of aggfuncs to apply. `N` if `aggfunc` is an unsupported aggregation function for pivot.
`plot`	D		Performed locally on the client
`pop`	Y
`pow`	P	`level`
`prod`	N
`product`	N
`quantile`	P		`Y` if `axis == 0`, and `interpolation` is `"linear"` or `"nearest"`, and `method` is `"single"`.
`query`	P		No support for dataframes with a row MultiIndex.
`radd`	P	`level`
`rank`	P		`N` if `axis == 1`
`rdiv`	P	`level`
`reindex`	P		`N` if axis is MultiIndex or method is `nearest`.
`reindex_like`	N
`rename`	P		`N` if `mapper` is callable or the series has a MultiIndex
`rename_axis`	Y
`reorder_levels`	N
`replace`	P	`copy` is ignored, `method`, `limit`
`resample`	P	`axis`, `label`, `convention`, `kind`, `level`, `origin`, `offset`, `group_keys`	Only DatetimeIndex is supported and its `freq` will be lost. `rule` frequencies ‘s’, ‘min’, ‘h’, and ‘D’ are supported. `rule` frequencies ‘W’, ‘ME’, and ‘YE’ are supported with `closed = "left"`.
`reset_index`	Y
`rfloordiv`	P	`level`
`rmod`	P	`level`
`rmul`	P	`level`
`rolling`	P	`method` is ignored, `step`, `win_type`, `closed`, `on`	`N` for non-integer `window`, `axis = 1`, or `min_periods = 0`
`round`	P		`N` if `decimals` is Series
`rpow`	P	`level`
`rsub`	P	`level`
`rtruediv`	P	`level`
`sample`	P		`N` if `weights` is specified when `axis = 0`, or if `random_state` is not either an integer or `None`. Setting `random_state` to a value other than `None` may slow down this method because the `sample` implementation will use a sort instead of the Snowflake warehouse’s built-in SAMPLE construct.
`select_dtypes`	Y
`sem`	N
`set_axis`	Y
`set_flags`	N
`set_index`	Y
`shift`	P	`freq`	No support for `freq != None`.
`skew`	P		`N` if `axis == 1` or `skipna == False` or `numeric_only=False`
`sort_index`	P	`key`	`N` if given the `key` param. `N` if `axis == 1`, or MultiIndex.
`sort_values`	P	`key`, `kind` is ignored	`N` if given the `key` param or `axis == 1`. The `kind` parameter has no effect. Snowpark pandas always uses a stable sort algorithm, while pandas by default does not.
`squeeze`	Y
`stack`	P	`level`, `future_stack` is ignored	`N` for MultiIndex
`std`	P		`N` if `ddof` is not 0 or 1
`sub`	P	`level`
`subtract`	P	`level`
`sum`	Y
`swapaxes`	N
`swaplevel`	N
`tail`	Y
`take`	Y
`to_clipboard`	N
`to_csv`	P		Supports writing to both a local file and a Snowflake stage. A filepath starting with `@` is treated as a Snowflake stage location. Writing to a local file supports all parameters. Writing to a Snowflake stage does not support the `float_format`, `mode`, `encoding`, `quoting`, `quotechar`, `lineterminator`, `doublequote`, and `decimal` parameters.
`to_dict`	Y
`to_excel`	Y
`to_feather`	N
`to_gbq`	N
`to_hdf`	N
`to_html`	Y
`to_json`	N
`to_latex`	N
`to_markdown`	N
`to_numpy`	Y
`to_orc`	N
`to_parquet`	N
`to_period`	N
`to_pickle`	N
`to_records`	N
`to_sql`	N
`to_stata`	N
`to_string`	Y
`to_timestamp`	N
`to_xarray`	N
`to_xml`	N
`transform`	P		`Y` if `func` is callable.
`transpose`	P		See `T`
`truediv`	P	`level`
`truncate`	N
`tz_convert`	P	`axis`, `level`, `copy`	`N` if timezone format is not supported. Only timezones listed in `pytz.all_timezones` are supported. For example, `UTC` is supported but `UTC+/-<offset>`, such as `UTC+09:00`, is not supported.
`tz_localize`	P	`axis`, `level`, `copy`, `ambiguous`, `nonexistent`	`N` if timezone format is not supported. Only timezones listed in `pytz.all_timezones` are supported. For example, `UTC` is supported but `UTC+/-<offset>`, such as `UTC+09:00`, is not supported.
`unstack`	P	`sort`	`N` for non-integer `level`.
`update`	Y
`value_counts`	Y
`var`	P		See `std`
`where`	P		See `mask`
`xs`	N
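
The `to_csv` row describes two rules: a path starting with `@` is treated as a stage location, and stage writes do not support eight of the parameters. The sketch below restates those rules as a pre-flight check. The helper `unsupported_to_csv_params` and the stage name `@my_stage` are hypothetical and not part of Snowpark; the parameter list is copied from the table.

```python
# Parameters the table lists as unsupported when writing to a stage.
STAGE_UNSUPPORTED = frozenset({
    "float_format", "mode", "encoding", "quoting",
    "quotechar", "lineterminator", "doublequote", "decimal",
})


def unsupported_to_csv_params(path, **kwargs):
    """Return the keyword names the table lists as unsupported for `path`.

    Paths that do not start with '@' are treated as local files, for
    which the table says all parameters are supported.
    """
    if not (isinstance(path, str) and path.startswith("@")):
        return []
    return sorted(name for name in kwargs if name in STAGE_UNSUPPORTED)


print(unsupported_to_csv_params("@my_stage/out.csv", sep=";", encoding="utf-8"))
print(unsupported_to_csv_params("out.csv", encoding="utf-8"))
```

Running a check like this before calling `to_csv` lets a script report the offending arguments itself. How the library reacts to them is not stated in the table.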
