modin.pandas.DataFrame

class modin.pandas.DataFrame(data=None, index=None, columns=None, dtype=None, copy=None, query_compiler=None)

Bases: BasePandasDataset

Snowpark pandas representation of pandas.DataFrame with a lazily-evaluated relational dataset.

A DataFrame is considered lazy because it encapsulates the computation or query required to produce the final dataset. The computation is not performed until the dataset needs to be displayed or an I/O method such as to_pandas or to_snowflake is called.
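The idea of deferring work until a result is requested can be illustrated with a small toy class. This is only a sketch of the general concept in plain Python: LazyColumn, map, to_list and evaluations are hypothetical names invented for this illustration, and nothing here reflects how Snowpark pandas builds or runs its queries.

```python
from typing import Callable, List


class LazyColumn:
    """Toy stand-in for a lazily evaluated dataset (hypothetical, not Snowpark pandas code)."""

    def __init__(self, source: List[int]) -> None:
        self._source = source
        self._steps: List[Callable[[int], int]] = []
        self.evaluations = 0  # how many times the recorded plan was actually run

    def map(self, func: Callable[[int], int]) -> "LazyColumn":
        # Only record the step; no values are computed here.
        new = LazyColumn(self._source)
        new._steps = self._steps + [func]
        return new

    def to_list(self) -> List[int]:
        # Materialization point, loosely analogous to calling to_pandas().
        self.evaluations += 1
        out = []
        for value in self._source:
            for step in self._steps:
                value = step(value)
            out.append(value)
        return out


col = LazyColumn([1, 2, 3]).map(lambda v: v + 1).map(lambda v: v * 10)
pending = col.evaluations  # still 0: both map calls only recorded work
result = col.to_list()     # the recorded steps run here
print(pending, result)
```

In this toy, chaining map twice performs no computation; the values are produced only when to_list is called, which mirrors the role that display and I/O methods play for a lazy DataFrame.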

Internally, the underlying data is stored as a Snowflake table with rows and columns.

Parameters:

data (DataFrame, Series, pandas.DataFrame, ndarray, Iterable or dict, optional) – Dict can contain Series, arrays, constants, dataclass or list-like objects. If data is a dict, column order follows insertion-order.
index (Index or array-like, optional) – Index to use for the resulting frame. Defaults to RangeIndex if the input data contains no indexing information and no index is provided.
columns (Index or array-like, optional) – Column labels to use for the resulting frame. Defaults to RangeIndex if no column labels are provided.
dtype (str, np.dtype, or pandas.ExtensionDtype, optional) – Data type to force. Only a single dtype is allowed. If None, infer.
copy (bool, default: False) – Copy data from inputs. Only affects pandas.DataFrame / 2d ndarray input.
query_compiler (BaseQueryCompiler, optional) – A query compiler object to create the DataFrame from.

Notes

A DataFrame can be created either from passed data or from query_compiler. If both parameters are provided, the data source is chosen in the following order of priority:

Modin DataFrame or Series passed with data parameter.
Query compiler from the query_compiler parameter.
Various pandas/NumPy/Python data structures passed with data parameter.

The last option is less desirable because importing such data structures is very inefficient. Prefer previously created Modin structures from the first two options, or import data using the highly efficient Modin IO tools (for example pd.read_csv).
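The priority order described in these notes can be sketched as a small decision function. This is a hedged illustration only: ModinLike and pick_source are hypothetical names made up for this sketch, and the real constructor's logic is not shown in this page and may differ in detail.

```python
from typing import Any, Optional


class ModinLike:
    """Hypothetical marker standing in for an existing Modin DataFrame or Series."""


def pick_source(data: Any = None, query_compiler: Optional[object] = None) -> str:
    """Return which input would win under the documented priority order."""
    # 1. A Modin DataFrame or Series passed as data takes priority.
    if isinstance(data, ModinLike):
        return "modin data"
    # 2. Otherwise an explicit query compiler is used.
    if query_compiler is not None:
        return "query_compiler"
    # 3. Finally, fall back to pandas/NumPy/Python structures passed as data.
    return "python data"


qc = object()  # placeholder for a query compiler object
print(pick_source(ModinLike(), qc))  # modin data
print(pick_source([1, 2, 3], qc))    # query_compiler
print(pick_source([1, 2, 3]))        # python data
```

The third branch is the slow path the note warns about, which is why the first two sources are preferred when they are available.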

Examples

Creating a Snowpark pandas DataFrame from a dictionary:

>>> d = {'col1': [1, 2], 'col2': [3, 4]}
>>> df = pd.DataFrame(data=d)
>>> df
   col1  col2
0     1     3
1     2     4

Constructing DataFrame from numpy ndarray:

>>> df2 = pd.DataFrame(np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]),
...                    columns=['a', 'b', 'c'])
>>> df2
   a  b  c
0  1  2  3
1  4  5  6
2  7  8  9

Constructing DataFrame from a numpy ndarray that has labeled columns:

>>> data = np.array([(1, 2, 3), (4, 5, 6), (7, 8, 9)],
...                 dtype=[("a", "i4"), ("b", "i4"), ("c", "i4")])
>>> df3 = pd.DataFrame(data, columns=['c', 'a'])
>>> df3
   c  a
0  3  1
1  6  4
2  9  7

Constructing DataFrame from Series/DataFrame:

>>> ser = pd.Series([1, 2, 3], index=["a", "b", "c"], name="s")
>>> df = pd.DataFrame(data=ser, index=["a", "c"])
>>> df
   s
a  1
c  3
>>> df1 = pd.DataFrame([1, 2, 3], index=["a", "b", "c"], columns=["x"])
>>> df2 = pd.DataFrame(data=df1, index=["a", "c"])
>>> df2
   x
a  1
c  3

Methods


`abs`()	Return a DataFrame with absolute numeric value of each element.
`add`(other[, axis, level, fill_value])	Get addition of `DataFrame` and other, element-wise (binary operator add).
`add_prefix`(prefix)	Prefix labels with string prefix.
`add_suffix`(suffix)	Suffix labels with string suffix.
`agg`([func, axis])	Aggregate using one or more operations over the specified axis.
`aggregate`([func, axis])	Aggregate using one or more operations over the specified axis.
`apply`(func[, axis, raw, result_type, args])	Apply a function along an axis of the DataFrame.
`applymap`(func[, na_action])	Apply a function to a DataFrame elementwise.
`assign`(**kwargs)	Assign new columns to a `DataFrame`.
`boxplot`([column, by, ax, fontsize, rot, ...])	Make a box plot from `DataFrame` columns.
`cache_result`([inplace])	Persists the current Snowpark pandas DataFrame to a temporary table to improve the latency of subsequent operations.
`combine`(other, func[, fill_value, overwrite])	Perform column-wise combine with another `DataFrame`.
`compare`(other[, align_axis, keep_shape, ...])	Compare to another DataFrame and show the differences.
`corr`([method, min_periods, numeric_only])	Compute pairwise correlation of columns, excluding NA/null values.
`corrwith`(other[, axis, drop, method, ...])	Compute pairwise correlation.
`cov`([min_periods, ddof, numeric_only])
`diff`([periods, axis])	First discrete difference of element.
`div`(other[, axis, level, fill_value])	Get floating division of `DataFrame` and other, element-wise (binary operator truediv).
`divide`(other[, axis, level, fill_value])	Get floating division of `DataFrame` and other, element-wise (binary operator truediv).
`dot`(other)	Compute the matrix multiplication between the `DataFrame` and other.
`drop`([labels, axis, index, columns, level, ...])	Drop specified labels from rows or columns.
`drop_duplicates`([subset, keep, inplace, ...])	Return `DataFrame` with duplicate rows removed.
`dropna`(*[, axis, how, thresh, subset, inplace])	Remove missing values.
`duplicated`([subset, keep])	Return boolean Series denoting duplicate rows.
`eq`(other[, axis, level])	Perform equality comparison of `DataFrame` and other (binary operator eq).
`equals`(other)	Test whether two dataframes contain the same elements.
`eval`(expr[, inplace])	Evaluate a string describing operations on `DataFrame` columns.
`fillna`([value, method, axis, inplace, ...])	Fill NA/NaN values using the specified method.
`floordiv`(other[, axis, level, fill_value])	Get integer division of `DataFrame` and other, element-wise (binary operator floordiv).
`from_dict`(data[, orient, dtype, columns])	Construct `DataFrame` from dict of array-like or dicts.
`from_records`(data[, index, exclude, ...])	Convert structured or record ndarray to `DataFrame`.
`ge`(other[, axis, level])	Get greater than or equal comparison of `DataFrame` and other, element-wise (binary operator ge).
`groupby`([by, axis, level, as_index, sort, ...])	Group DataFrame using a mapper or by a Series of columns.
`gt`(other[, axis, level])	Get greater than comparison of `DataFrame` and other, element-wise (binary operator gt).
`hist`([column, by, grid, xlabelsize, xrot, ...])	Make a histogram of the `DataFrame`.
`info`([verbose, buf, max_cols, memory_usage, ...])	Print a concise summary of the `DataFrame`.
`insert`(loc, column, value[, allow_duplicates])	Insert column into DataFrame at specified location.
`interpolate`([method, axis, limit, inplace, ...])
`isetitem`(loc, value)
`isin`(values)	Whether each element in the DataFrame is contained in values.
`isna`()	Detect missing values.
`isnull`()	DataFrame.isnull is an alias for DataFrame.isna.
`items`()	Iterate over (column name, `Series`) pairs.
`iterrows`()	Iterate over `DataFrame` rows as (index, `Series`) pairs.
`itertuples`([index, name])	Iterate over DataFrame rows as namedtuples.
`join`(other[, on, how, lsuffix, rsuffix, ...])	Join columns of another DataFrame.
`keys`()	Get columns of the `DataFrame`.
`le`(other[, axis, level])	Get less than or equal comparison of `DataFrame` and other, element-wise (binary operator le).
`lt`(other[, axis, level])	Get less than comparison of `DataFrame` and other, element-wise (binary operator lt).
`mask`(cond[, other, inplace, axis, level])	Replace values where the condition is True.
`melt`([id_vars, value_vars, var_name, ...])	Unpivot a `DataFrame` from wide to long format, optionally leaving identifiers set.
`memory_usage`([index, deep])	Return the memory usage of each column in bytes.
`merge`(right[, how, on, left_on, right_on, ...])	Merge DataFrame or named Series objects with a database-style join.
`mod`(other[, axis, level, fill_value])	Get modulo of `DataFrame` and other, element-wise (binary operator mod).
`mul`(other[, axis, level, fill_value])	Get multiplication of `DataFrame` and other, element-wise (binary operator mul).
`multiply`(other[, axis, level, fill_value])	Get multiplication of `DataFrame` and other, element-wise (binary operator mul).
`ne`(other[, axis, level])	Get not equal comparison of `DataFrame` and other, element-wise (binary operator ne).
`nlargest`(n, columns[, keep])	Return the first n rows ordered by columns in descending order.
`nsmallest`(n, columns[, keep])	Return the first n rows ordered by columns in ascending order.
`pivot`(*, columns[, index, values])	Return reshaped DataFrame organized by given index / column values.
`pivot_table`([values, index, columns, ...])	Create a spreadsheet-style pivot table as a `DataFrame`.
`pow`(other[, axis, level, fill_value])	Get exponential power of `DataFrame` and other, element-wise (binary operator pow).
`prod`([axis, skipna, numeric_only, min_count])	Return the product of the values over the requested axis.
`product`([axis, skipna, numeric_only, min_count])	Return the product of the values over the requested axis.
`quantile`([q, axis, numeric_only, ...])	Return values at the given quantile over requested axis.
`query`(expr[, inplace])	Query the columns of a `DataFrame` with a boolean expression.
`radd`(other[, axis, level, fill_value])	Get addition of `DataFrame` and other, element-wise (binary operator radd).
`rdiv`(other[, axis, level, fill_value])	Get floating division of `DataFrame` and other, element-wise (binary operator rtruediv).
`reindex`([labels, index, columns, axis, ...])	Conform DataFrame to new index with optional filling logic.
`rename`([mapper, index, columns, axis, copy, ...])	Rename columns or index labels.
`replace`([to_replace, value, inplace, limit, ...])	Replace values given in to_replace with value.
`rfloordiv`(other[, axis, level, fill_value])	Get integer division of `DataFrame` and other, element-wise (binary operator rfloordiv).
`rmod`(other[, axis, level, fill_value])	Get modulo of `DataFrame` and other, element-wise (binary operator rmod).
`rmul`(other[, axis, level, fill_value])	Get multiplication of `DataFrame` and other, element-wise (binary operator rmul).
`round`([decimals])	Round a DataFrame to a variable number of decimal places.
`rpow`(other[, axis, level, fill_value])	Get exponential power of `DataFrame` and other, element-wise (binary operator rpow).
`rsub`(other[, axis, level, fill_value])	Get subtraction of `DataFrame` and other, element-wise (binary operator rsub).
`rtruediv`(other[, axis, level, fill_value])	Get floating division of `DataFrame` and other, element-wise (binary operator rtruediv).
`select_dtypes`([include, exclude])	Return a subset of the `DataFrame`'s columns based on the column dtypes.
`set_axis`(labels, *[, axis, copy])	Assign desired index to given axis.
`set_index`(keys[, drop, append, inplace, ...])	Set the DataFrame index using existing columns.
`shift`([periods, freq, axis, fill_value, suffix])	Shift data by desired number of periods along axis and replace columns with fill_value (default: None).
`squeeze`([axis])	Squeeze 1 dimensional axis objects into scalars.
`stack`([level, dropna, sort, future_stack])	Stack the prescribed level(s) from columns to index.
`sub`(other[, axis, level, fill_value])	Get subtraction of `DataFrame` and other, element-wise (binary operator sub).
`subtract`(other[, axis, level, fill_value])	Get subtraction of `DataFrame` and other, element-wise (binary operator sub).
`to_dict`([orient, into])	Convert the DataFrame to a dictionary.
`to_feather`(path, **kwargs)	Write a `DataFrame` to the binary Feather format.
`to_gbq`(destination_table[, project_id, ...])	Write a `DataFrame` to a Google BigQuery table.
`to_html`([buf, columns, col_space, header, ...])	Render a `DataFrame` as an HTML table.
`to_orc`([path, engine, index, engine_kwargs])
`to_pandas`(*[, statement_params])	Convert Snowpark pandas DataFrame to pandas.DataFrame
`to_parquet`([path, engine, compression, ...])
`to_period`([freq, axis, copy])	Convert `DataFrame` from `DatetimeIndex` to `PeriodIndex`.
`to_records`([index, column_dtypes, index_dtypes])	Convert `DataFrame` to a NumPy record array.
`to_snowflake`(name[, if_exists, index, ...])	Save the Snowpark pandas DataFrame as a Snowflake table.
`to_snowpark`([index, index_label])	Convert the Snowpark pandas DataFrame to a Snowpark DataFrame.
`to_stata`(path[, convert_dates, write_index, ...])
`to_timestamp`([freq, how, axis, copy])	Cast to DatetimeIndex of timestamps, at beginning of period.
`to_xml`([path_or_buffer, index, root_name, ...])
`transform`(func[, axis])	Call `func` on self producing a Snowpark pandas DataFrame with the same axis shape as self.
`transpose`([copy])	Transpose index and columns.
`truediv`(other[, axis, level, fill_value])	Get floating division of `DataFrame` and other, element-wise (binary operator truediv).
`unstack`([level, fill_value, sort])	Pivot a level of the (necessarily hierarchical) index labels.
`update`(other[, join, overwrite, ...])	Modify in place using non-NA values from another `DataFrame`.
`value_counts`([subset, normalize, sort, ...])	Return a Series containing the frequency of each distinct row in the DataFrame.
`where`(cond[, other, inplace, axis, level])	Replace values where the condition is False.
`xs`(key[, axis, level, drop_level])	Return cross-section from the `DataFrame`.

Attributes


`T`	Transpose index and columns.
`attrs`
`axes`	Return a list representing the axes of the DataFrame.
`columns`	Get the columns for this Snowpark pandas `DataFrame`.
`dtypes`	Return the dtypes in the `DataFrame`.
`empty`	Indicator whether the DataFrame is empty.
`ndim`	Return the number of dimensions of the underlying data, by definition 2.
`plot`	Make plots of `DataFrame`.
`shape`	Return a tuple representing the dimensionality of the `DataFrame`.
`style`