modin.pandas.DataFrame
- class modin.pandas.DataFrame(data=None, index=None, columns=None, dtype=None, copy=None, query_compiler=None)
Bases: BasePandasDataset
Snowpark pandas representation of pandas.DataFrame with a lazily-evaluated relational dataset.
A DataFrame is considered lazy because it encapsulates the computation or query required to produce the final dataset. The computation is not performed until the dataset needs to be displayed or an I/O method such as to_pandas or to_snowflake is called.
Internally, the underlying data is stored as a Snowflake table with rows and columns.
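The lazy-evaluation idea described above can be illustrated with a small, self-contained toy. This is only a sketch of the general pattern (record operations, run them on materialization); LazyColumn and its methods are hypothetical names invented for this illustration and are not part of Snowpark pandas or Modin, whose actual implementation translates operations into Snowflake queries.

```python
from typing import Callable, List


class LazyColumn:
    """Toy lazy container (hypothetical): records steps, runs them on demand."""

    def __init__(self, values: List[int]) -> None:
        self._values = values
        self._pending: List[Callable[[int], int]] = []
        self.evaluations = 0  # how many times this object was materialized

    def map(self, func: Callable[[int], int]) -> "LazyColumn":
        # Record the step and return a new lazy object; nothing is computed yet.
        out = LazyColumn(self._values)
        out._pending = self._pending + [func]
        return out

    def to_list(self) -> List[int]:
        # Materialization point, loosely analogous to an I/O call such as to_pandas.
        self.evaluations += 1
        result = []
        for value in self._values:
            for step in self._pending:
                value = step(value)
            result.append(value)
        return result


col = LazyColumn([1, 2, 3])
plan = col.map(lambda x: x + 1).map(lambda x: x * 10)
print(plan.evaluations)  # 0: building the plan did not run anything
print(plan.to_list())    # [20, 30, 40]
print(plan.evaluations)  # 1: the work happened only when results were requested
```

In this toy, chaining map calls only grows the list of pending steps; the loop over the data runs once, when to_list is called.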
- Parameters:
data (DataFrame, Series, pandas.DataFrame, ndarray, Iterable or dict, optional) – Dict can contain Series, arrays, constants, dataclass or list-like objects. If data is a dict, column order follows insertion order.
index (Index or array-like, optional) – Index to use for the resulting frame. Defaults to RangeIndex if the input data carries no indexing information and no index is provided.
columns (Index or array-like, optional) – Column labels to use for the resulting frame. Defaults to RangeIndex if no column labels are provided.
dtype (str, np.dtype, or pandas.ExtensionDtype, optional) – Data type to force. Only a single dtype is allowed. If None, infer.
copy (bool, default: False) – Copy data from inputs. Only affects pandas.DataFrame / 2d ndarray input.
query_compiler (BaseQueryCompiler, optional) – A query compiler object to create the DataFrame from.
Notes
DataFrame can be created either from passed data or from query_compiler. If both parameters are provided, the data source is prioritized in the following order:
1. Modin DataFrame or Series passed with the data parameter.
2. Query compiler from the query_compiler parameter.
3. Various pandas/NumPy/Python data structures passed with the data parameter.
The last option is less desirable because importing such data structures is very inefficient. Prefer previously created Modin structures from the first two options, or import data using the highly efficient Modin IO tools (for example, pd.read_csv).
Examples
Creating a Snowpark pandas DataFrame from a dictionary:
>>> import modin.pandas as pd
>>> import snowflake.snowpark.modin.plugin
>>> d = {'col1': [1, 2], 'col2': [3, 4]}
>>> df = pd.DataFrame(data=d)
>>> df
   col1  col2
0     1     3
1     2     4
Constructing DataFrame from numpy ndarray:
>>> import numpy as np
>>> df2 = pd.DataFrame(np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]),
...                    columns=['a', 'b', 'c'])
>>> df2
   a  b  c
0  1  2  3
1  4  5  6
2  7  8  9
Constructing DataFrame from a numpy ndarray that has labeled columns:
>>> data = np.array([(1, 2, 3), (4, 5, 6), (7, 8, 9)],
...                 dtype=[("a", "i4"), ("b", "i4"), ("c", "i4")])
>>> df3 = pd.DataFrame(data, columns=['c', 'a'])
...
>>> df3
   c  a
0  3  1
1  6  4
2  9  7
Constructing DataFrame from Series/DataFrame:
>>> ser = pd.Series([1, 2, 3], index=["a", "b", "c"], name="s")
>>> df = pd.DataFrame(data=ser, index=["a", "c"])
>>> df
   s
a  1
c  3
>>> df1 = pd.DataFrame([1, 2, 3], index=["a", "b", "c"], columns=["x"])
>>> df2 = pd.DataFrame(data=df1, index=["a", "c"])
>>> df2
   x
a  1
c  3
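As a rough illustration of the source-priority rule described in the Notes, the following sketch mimics the decision order in plain Python. ModinLike and pick_source are hypothetical names made up for this example; they are not Modin or Snowpark pandas APIs, and the real constructor does considerably more than choose a label.

```python
from typing import Any, Optional


class ModinLike:
    """Hypothetical stand-in for an existing Modin DataFrame or Series."""

    def __init__(self, query_compiler: str) -> None:
        self.query_compiler = query_compiler


def pick_source(data: Any = None, query_compiler: Optional[str] = None) -> str:
    """Return a label for the input that would win (illustrative only)."""
    # 1. A Modin object passed as data takes precedence over everything else.
    if isinstance(data, ModinLike):
        return "modin object passed as data"
    # 2. Otherwise an explicit query compiler is used.
    if query_compiler is not None:
        return "query_compiler"
    # 3. Plain pandas/NumPy/Python data is the last (and slowest) option.
    if data is not None:
        return "pandas/NumPy/Python data"
    return "empty frame"


print(pick_source(data=ModinLike("qc-a"), query_compiler="qc-b"))
print(pick_source(data=[1, 2, 3], query_compiler="qc-b"))
print(pick_source(data=[1, 2, 3]))
```

The three calls exercise the three priority levels in order: a Modin object wins over a query compiler, and a query compiler wins over plain Python data.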
Methods
abs
()Return a DataFrame with absolute numeric value of each element.
add
(other[, axis, level, fill_value])Get addition of DataFrame and other, element-wise (binary operator add).
add_prefix
(prefix)Prefix labels with string prefix.
add_suffix
(suffix)Suffix labels with string suffix.
agg
([func, axis])Aggregate using one or more operations over the specified axis.
aggregate
([func, axis])Aggregate using one or more operations over the specified axis.
align
(other[, join, axis, level, copy, ...])Align two objects on their axes with the specified join method.
all
([axis, bool_only, skipna])Return whether all elements are True, potentially over an axis.
any
(*[, axis, bool_only, skipna])Return whether any element is True, potentially over an axis.
apply
(func[, axis, raw, result_type, args])Apply a function along an axis of the DataFrame.
applymap
(func[, na_action])Apply a function to a DataFrame elementwise.
asfreq
(freq[, method, how, normalize, ...])Convert time series to specified frequency.
asof
(where[, subset])Return the last row(s) without any NaNs before where.
assign
(**kwargs)Assign new columns to a DataFrame.
astype
(dtype[, copy, errors])Cast a pandas object to a specified dtype dtype.
at_time
(time[, asof, axis])Select values at particular time of day (e.g., 9:30AM).
backfill
(*[, axis, inplace, limit, downcast])Synonym for DataFrame.fillna with method='bfill'.
between_time
(start_time, end_time[, ...])Select values between particular times of the day (e.g., 9:00-9:30 AM).
bfill
(*[, axis, inplace, limit, limit_area, ...])Fill NA/NaN values by using the next valid observation to fill the gap.
bool
()Return the bool of a single element BasePandasDataset.
boxplot
([column, by, ax, fontsize, rot, ...])Make a box plot from DataFrame columns.
cache_result
([inplace])Persists the current Snowpark pandas DataFrame to a temporary table to improve the latency of subsequent operations.
clip
([lower, upper, axis, inplace])Trim values at input threshold(s).
combine
(other, func[, fill_value, overwrite])Perform column-wise combine with another DataFrame.
combine_first
(other)Update null elements with value in the same location in other.
compare
(other[, align_axis, keep_shape, ...])Compare to another DataFrame and show the differences.
convert_dtypes
([infer_objects, ...])Convert columns to best possible dtypes using dtypes supporting pd.NA.
copy
([deep])Make a copy of this object's indices and data.
corr
([method, min_periods, numeric_only])Compute pairwise correlation of columns, excluding NA/null values.
corrwith
(other[, axis, drop, method, ...])Compute pairwise correlation.
count
([axis, numeric_only])Count non-NA cells for each column or row.
cov
([min_periods, ddof, numeric_only])
cummax
([axis, skipna])Return cumulative maximum over a BasePandasDataset axis.
cummin
([axis, skipna])Return cumulative minimum over a BasePandasDataset axis.
cumprod
([axis, skipna])Return cumulative product over a BasePandasDataset axis.
cumsum
([axis, skipna])Return cumulative sum over a BasePandasDataset axis.
describe
([percentiles, include, exclude])Generate descriptive statistics for columns in the dataset.
diff
([periods, axis])First discrete difference of element.
div
(other[, axis, level, fill_value])Get floating division of DataFrame and other, element-wise (binary operator truediv).
divide
(other[, axis, level, fill_value])Get floating division of DataFrame and other, element-wise (binary operator truediv).
dot
(other)Compute the matrix multiplication between the DataFrame and other.
drop
([labels, axis, index, columns, level, ...])Drop specified labels from rows or columns.
drop_duplicates
([subset, keep, inplace, ...])Return DataFrame with duplicate rows removed.
droplevel
(level[, axis])Return BasePandasDataset with requested index / column level(s) removed.
dropna
(*[, axis, how, thresh, subset, inplace])Remove missing values.
duplicated
([subset, keep])Return boolean Series denoting duplicate rows.
eq
(other[, axis, level])Perform equality comparison of DataFrame and other (binary operator eq).
equals
(other)Test whether two dataframes contain the same elements.
eval
(expr[, inplace])Evaluate a string describing operations on DataFrame columns.
ewm
([com, span, halflife, alpha, ...])Provide exponentially weighted (EW) calculations.
expanding
([min_periods, axis, method])Provide expanding window calculations.
explode
(column[, ignore_index])Transform each element of a list-like to a row.
ffill
(*[, axis, inplace, limit, limit_area, ...])Fill NA/NaN values by propagating the last valid observation to next valid.
fillna
([value, method, axis, inplace, ...])Fill NA/NaN values using the specified method.
filter
([items, like, regex, axis])Subset the BasePandasDataset rows or columns according to the specified index labels.
first
(offset)Select initial periods of time series data based on a date offset.
Return index for first non-NA value or None, if no non-NA value is found.
floordiv
(other[, axis, level, fill_value])Get integer division of DataFrame and other, element-wise (binary operator floordiv).
from_dict
(data[, orient, dtype, columns])Construct DataFrame from dict of array-like or dicts.
from_records
(data[, index, exclude, ...])Convert structured or record ndarray to DataFrame.
ge
(other[, axis, level])Get greater than or equal comparison of DataFrame and other, element-wise (binary operator ge).
get
(key[, default])Get item from object for given key (ex: DataFrame column).
groupby
([by, axis, level, as_index, sort, ...])Group DataFrame using a mapper or by a Series of columns.
gt
(other[, axis, level])Get greater than comparison of DataFrame and other, element-wise (binary operator gt).
head
([n])Return the first n rows.
hist
([column, by, grid, xlabelsize, xrot, ...])Make a histogram of the DataFrame.
idxmax
([axis, skipna, numeric_only])Return index of first occurrence of maximum over requested axis.
idxmin
([axis, skipna, numeric_only])Return index of first occurrence of minimum over requested axis.
infer_objects
([copy])Attempt to infer better dtypes for object columns.
info
([verbose, buf, max_cols, memory_usage, ...])Print a concise summary of the DataFrame.
insert
(loc, column, value[, allow_duplicates])Insert column into DataFrame at specified location.
interpolate
([method, axis, limit, inplace, ...])
isetitem
(loc, value)
isin
(values)Whether each element in the DataFrame is contained in values.
isna
()Detect missing values.
isnull
()DataFrame.isnull is an alias for DataFrame.isna.
items
()Iterate over (column name, Series) pairs.
iterrows
()Iterate over DataFrame rows as (index, Series) pairs.
itertuples
([index, name])Iterate over DataFrame rows as namedtuples.
join
(other[, on, how, lsuffix, rsuffix, ...])Join columns of another DataFrame.
keys
()Get columns of the DataFrame.
kurt
([axis, skipna, numeric_only])Return unbiased kurtosis over requested axis.
kurtosis
([axis, skipna, numeric_only])Return unbiased kurtosis over requested axis.
last
(offset)Select final periods of time series data based on a date offset.
Return index for last non-NA value or None, if no non-NA value is found.
le
(other[, axis, level])Get less than or equal comparison of DataFrame and other, element-wise (binary operator le).
lt
(other[, axis, level])Get less than comparison of DataFrame and other, element-wise (binary operator lt).
map
(func[, na_action])Apply a function to the DataFrame elementwise.
mask
(cond[, other, inplace, axis, level])Replace values where the condition is True.
max
([axis, skipna, numeric_only])Return the maximum of the values over the requested axis.
mean
([axis, skipna, numeric_only])Return the mean of the values over the requested axis.
median
([axis, skipna, numeric_only])Return the median of the values over the requested axis.
melt
([id_vars, value_vars, var_name, ...])Unpivot a DataFrame from wide to long format, optionally leaving identifiers set.
memory_usage
([index, deep])Return the memory usage of each column in bytes.
merge
(right[, how, on, left_on, right_on, ...])Merge DataFrame or named Series objects with a database-style join.
min
([axis, skipna, numeric_only])Return the minimum of the values over the requested axis.
mod
(other[, axis, level, fill_value])Get modulo of DataFrame and other, element-wise (binary operator mod).
mode
([axis, numeric_only, dropna])Get the mode(s) of each element along the selected axis.
mul
(other[, axis, level, fill_value])Get multiplication of DataFrame and other, element-wise (binary operator mul).
multiply
(other[, axis, level, fill_value])Get multiplication of DataFrame and other, element-wise (binary operator mul).
ne
(other[, axis, level])Get not equal comparison of DataFrame and other, element-wise (binary operator ne).
nlargest
(n, columns[, keep])Return the first n rows ordered by columns in descending order.
notna
()Detect non-missing values for an array-like object.
notnull
()Detect non-missing values for an array-like object.
nsmallest
(n, columns[, keep])Return the first n rows ordered by columns in ascending order.
nunique
([axis, dropna])Count number of distinct elements in specified axis.
pad
(*[, axis, inplace, limit, downcast])Fill NA/NaN values by propagating the last valid observation to next valid.
pct_change
([periods, fill_method, limit, freq])Fractional change between the current and a prior element.
pipe
(func, *args, **kwargs)Apply chainable functions that expect BasePandasDataset.
pivot
(*, columns[, index, values])Return reshaped DataFrame organized by given index / column values.
pivot_table
([values, index, columns, ...])Create a spreadsheet-style pivot table as a DataFrame.
pop
(item)Return item and drop from frame.
pow
(other[, axis, level, fill_value])Get exponential power of DataFrame and other, element-wise (binary operator pow).
prod
([axis, skipna, numeric_only, min_count])Return the product of the values over the requested axis.
product
([axis, skipna, numeric_only, min_count])Return the product of the values over the requested axis.
quantile
([q, axis, numeric_only, ...])Return values at the given quantile over requested axis.
query
(expr[, inplace])Query the columns of a DataFrame with a boolean expression.
radd
(other[, axis, level, fill_value])Get addition of DataFrame and other, element-wise (binary operator radd).
rank
([axis, method, numeric_only, ...])Compute numerical data ranks (1 through n) along axis.
rdiv
(other[, axis, level, fill_value])Get floating division of DataFrame and other, element-wise (binary operator rtruediv).
reindex
([labels, index, columns, axis, ...])Conform DataFrame to new index with optional filling logic.
reindex_like
(other[, method, copy, limit, ...])Return an object with matching indices as other object.
rename
([mapper, index, columns, axis, copy, ...])Rename columns or index labels.
rename_axis
([mapper, index, columns, axis, ...])Set the name of the axis for the index or columns.
reorder_levels
(order[, axis])Rearrange index levels using input order.
replace
([to_replace, value, inplace, limit, ...])Replace values given in to_replace with value.
resample
(rule[, axis, closed, label, ...])Resample time-series data.
reset_index
([level, drop, inplace, ...])Reset the index, or a level of it.
rfloordiv
(other[, axis, level, fill_value])Get integer division of DataFrame and other, element-wise (binary operator rfloordiv).
rmod
(other[, axis, level, fill_value])Get modulo of DataFrame and other, element-wise (binary operator rmod).
rmul
(other[, axis, level, fill_value])Get multiplication of DataFrame and other, element-wise (binary operator rmul).
rolling
(window[, min_periods, center, ...])Provide rolling window calculations.
round
([decimals])Round a DataFrame to a variable number of decimal places.
rpow
(other[, axis, level, fill_value])Get exponential power of DataFrame and other, element-wise (binary operator rpow).
rsub
(other[, axis, level, fill_value])Get subtraction of DataFrame and other, element-wise (binary operator rsub).
rtruediv
(other[, axis, level, fill_value])Get floating division of DataFrame and other, element-wise (binary operator rtruediv).
sample
([n, frac, replace, weights, ...])Return a random sample of items from an axis of object.
select_dtypes
([include, exclude])Return a subset of the DataFrame's columns based on the column dtypes.
sem
([axis, skipna, ddof, numeric_only])Return unbiased standard error of the mean over requested axis.
set_axis
(labels, *[, axis, copy])Assign desired index to given axis.
set_flags
(*[, copy, allows_duplicate_labels])Return a new BasePandasDataset with updated flags.
set_index
(keys[, drop, append, inplace, ...])Set the DataFrame index using existing columns.
shift
([periods, freq, axis, fill_value, suffix])Shift data by desired number of periods along axis and replace columns with fill_value (default: None).
skew
([axis, skipna, numeric_only])Return unbiased skew, normalized over n-1
sort_index
(*[, axis, level, ascending, ...])Sort object by labels (along an axis).
sort_values
(by[, axis, ascending, inplace, ...])Sort by the values along either axis.
squeeze
([axis])Squeeze 1 dimensional axis objects into scalars.
stack
([level, dropna, sort, future_stack])Stack the prescribed level(s) from columns to index.
std
([axis, skipna, ddof, numeric_only])Return sample standard deviation over requested axis.
sub
(other[, axis, level, fill_value])Get subtraction of DataFrame and other, element-wise (binary operator sub).
subtract
(other[, axis, level, fill_value])Get subtraction of DataFrame and other, element-wise (binary operator sub).
sum
([axis, skipna, numeric_only, min_count])Return the sum of the values over the requested axis.
swapaxes
(axis1, axis2[, copy])Interchange axes and swap values axes appropriately.
swaplevel
([i, j, axis])Swap levels i and j in a MultiIndex.
tail
([n])Return the last n rows.
take
(indices[, axis])Return the elements in the given positional indices along an axis.
to_clipboard
([excel, sep])Copy object to the system clipboard.
to_csv
([path_or_buf, sep, na_rep, ...])Write object to a comma-separated values (csv) file.
to_dict
([orient, into])Convert the DataFrame to a dictionary.
to_excel
(excel_writer[, sheet_name, na_rep, ...])Write object to an Excel sheet.
to_feather
(path, **kwargs)Write a DataFrame to the binary Feather format.
to_gbq
(destination_table[, project_id, ...])Write a DataFrame to a Google BigQuery table.
to_hdf
(path_or_buf, key[, format])Write the contained data to an HDF5 file using HDFStore.
to_html
([buf, columns, col_space, header, ...])Render a DataFrame as an HTML table.
to_json
([path_or_buf, orient, date_format, ...])Convert the object to a JSON string.
to_latex
([buf, columns, col_space, header, ...])Render object to a LaTeX tabular, longtable, or nested table.
to_markdown
([buf, mode, index, storage_options])Print BasePandasDataset in Markdown-friendly format.
to_numpy
([dtype, copy, na_value])Convert the DataFrame or Series to a NumPy array.
to_orc
([path, engine, index, engine_kwargs])
to_pandas
(*[, statement_params])Convert Snowpark pandas DataFrame to pandas.DataFrame.
to_parquet
([path, engine, compression, ...])
to_period
([freq, axis, copy])Convert DataFrame from DatetimeIndex to PeriodIndex.
to_pickle
(path[, compression, protocol, ...])Pickle (serialize) object to file.
to_records
([index, column_dtypes, index_dtypes])Convert DataFrame to a NumPy record array.
to_snowflake
(name[, if_exists, index, ...])Save the Snowpark pandas DataFrame as a Snowflake table.
to_snowpark
([index, index_label])Convert the Snowpark pandas DataFrame to a Snowpark DataFrame.
to_sql
(name, con[, schema, if_exists, ...])Write records stored in a BasePandasDataset to a SQL database.
to_stata
(path[, convert_dates, write_index, ...])
to_string
([buf, columns, col_space, header, ...])Render a BasePandasDataset to a console-friendly tabular output.
to_timestamp
([freq, how, axis, copy])Cast to DatetimeIndex of timestamps, at beginning of period.
to_xarray
()Return an xarray object from the BasePandasDataset.
to_xml
([path_or_buffer, index, root_name, ...])
transform
(func[, axis])Call func on self producing a Snowpark pandas DataFrame with the same axis shape as self.
transpose
([copy])Transpose index and columns.
truediv
(other[, axis, level, fill_value])Get floating division of DataFrame and other, element-wise (binary operator truediv).
truncate
([before, after, axis, copy])Truncate a BasePandasDataset before and after some index value.
tz_convert
(tz[, axis, level, copy])Convert tz-aware axis to target time zone.
tz_localize
(tz[, axis, level, copy, ...])Localize tz-naive index of a BasePandasDataset to target time zone.
unstack
([level, fill_value, sort])Pivot a level of the (necessarily hierarchical) index labels.
update
(other[, join, overwrite, ...])Modify in place using non-NA values from another DataFrame.
value_counts
([subset, normalize, sort, ...])Return a Series containing the frequency of each distinct row in the DataFrame.
var
([axis, skipna, ddof, numeric_only])Return unbiased variance over requested axis.
where
(cond[, other, inplace, axis, level])Replace values where the condition is False.
xs
(key[, axis, level, drop_level])Return cross-section from the DataFrame.
Attributes
Transpose index and columns.
at
Get a single value for a row/column label pair.
attrs
Return a list representing the axes of the DataFrame.
Get the columns for this Snowpark pandas DataFrame.
Return the dtypes in the DataFrame.
Indicator whether the DataFrame is empty.
flags
iat
Get a single value for a row/column pair by integer position.
Purely integer-location based indexing for selection by position.
Get the index for this Series/DataFrame.
Access a group of rows and columns by label(s) or a boolean array.
modin
Return the number of dimensions of the underlying data, by definition 2.
plot
Make plots of DataFrame.
Return a tuple representing the dimensionality of the DataFrame.
Return an int representing the number of elements in this object.
sparse
style
Return a NumPy representation of the dataset.