modin.pandas.crosstab

modin.pandas.crosstab(index, columns, values=None, rownames=None, colnames=None, aggfunc=None, margins=False, margins_name: str = 'All', dropna: bool = True, normalize=False) → DataFrame

Compute a simple cross tabulation of two (or more) factors.

By default, computes a frequency table of the factors unless an array of values and an aggregation function are passed.
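As a conceptual illustration of what "frequency table of the factors" means, the sketch below counts how often each (row, column) pair occurs, using only the Python standard library. It is not how the library implements crosstab, and the names rows, cols, and table are made up for this example.

```python
from collections import Counter

# Two factors of equal length; each position is one observation.
rows = ["foo", "foo", "bar", "bar", "foo"]
cols = ["one", "two", "one", "one", "one"]

# Count how often each (row, column) pair occurs.
counts = Counter(zip(rows, cols))

# Lay the counts out as a nested table; pairs that never occur get 0.
row_labels = sorted(set(rows))
col_labels = sorted(set(cols))
table = {r: {c: counts[(r, c)] for c in col_labels} for r in row_labels}

print(table)
# {'bar': {'one': 2, 'two': 0}, 'foo': {'one': 2, 'two': 1}}
```

Each cell of the resulting table is the number of observations that share that row label and column label, which is the default result when neither values nor aggfunc is passed.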

Parameters:

index (array-like, Series, or list of arrays/Series) – Values to group by in the rows.
columns (array-like, Series, or list of arrays/Series) – Values to group by in the columns.
values (array-like, optional) – Array of values to aggregate according to the factors. Requires aggfunc be specified.
rownames (sequence, default None) – If passed, must match number of row arrays passed.
colnames (sequence, default None) – If passed, must match number of column arrays passed.
aggfunc (function, optional) – If specified, requires values be specified as well.
margins (bool, default False) – Add row/column margins (subtotals).
margins_name (str, default 'All') – Name of the row/column that will contain the totals when margins is True.
dropna (bool, default True) – Do not include columns whose entries are all NaN.
normalize (bool, {'all', 'index', 'columns'}, or {0,1}, default False) –
Normalize by dividing all values by the sum of values.
- If passed 'all' or True, will normalize over all values.
- If passed 'index', will normalize over each row.
- If passed 'columns', will normalize over each column.
- If margins is True, will also normalize margin values.
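The three normalize modes differ only in which total each count is divided by. The following is a minimal plain-Python sketch of that arithmetic on a hypothetical table of counts. The helper normalize_counts and the sample data are assumptions made for illustration and are not part of modin.pandas; margins are not modeled.

```python
# Hypothetical table of counts: outer keys are rows, inner keys are columns.
counts = {"bar": {"one": 3, "two": 1}, "foo": {"one": 4, "two": 2}}


def normalize_counts(table, mode):
    """Divide each count by a total selected by mode: 'all', 'index', or 'columns'."""
    grand_total = sum(sum(row.values()) for row in table.values())
    row_totals = {r: sum(row.values()) for r, row in table.items()}
    col_totals = {}
    for row in table.values():
        for c, v in row.items():
            col_totals[c] = col_totals.get(c, 0) + v

    result = {}
    for r, row in table.items():
        result[r] = {}
        for c, v in row.items():
            if mode == "all":
                denom = grand_total      # every cell shares one denominator
            elif mode == "index":
                denom = row_totals[r]    # each row sums to 1
            elif mode == "columns":
                denom = col_totals[c]    # each column sums to 1
            else:
                raise ValueError(f"unsupported mode: {mode!r}")
            result[r][c] = v / denom
    return result


by_all = normalize_counts(counts, "all")
by_index = normalize_counts(counts, "index")
by_columns = normalize_counts(counts, "columns")

print(by_index["bar"])  # {'one': 0.75, 'two': 0.25}
```

With 'all', the whole table sums to 1. With 'index', each row does. With 'columns', each column does.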

Returns:

Cross tabulation of the data.

Return type:

Snowpark pandas DataFrame

Notes

Raises NotImplementedError in either of two cases: aggfunc is not one of “count”, “mean”, “min”, “max”, or “sum”; or margins is True, normalize is True or 'all', and values is passed.

Examples

>>> import numpy as np
>>> import modin.pandas as pd
>>> a = np.array(["foo", "foo", "foo", "foo", "bar", "bar",
...               "bar", "bar", "foo", "foo", "foo"], dtype=object)
>>> b = np.array(["one", "one", "one", "two", "one", "one",
...               "one", "two", "two", "two", "one"], dtype=object)
>>> c = np.array(["dull", "dull", "shiny", "dull", "dull", "shiny",
...               "shiny", "dull", "shiny", "shiny", "shiny"],
...              dtype=object)
>>> pd.crosstab(a, [b, c], rownames=['a'], colnames=['b', 'c']) 
b    one        two
c   dull shiny dull shiny
a
bar    1     2    1     0
foo    2     2    1     2
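When values and aggfunc are passed, each cell holds an aggregate of the values that fall into it rather than a count. The sketch below shows that grouping in plain Python, reusing the a and b factors from the example above as lists. The values list is hypothetical and not part of the original example, and the code illustrates the idea only, not the library's implementation.

```python
from collections import defaultdict

a = ["foo", "foo", "foo", "foo", "bar", "bar",
     "bar", "bar", "foo", "foo", "foo"]
b = ["one", "one", "one", "two", "one", "one",
     "one", "two", "two", "two", "one"]
# Hypothetical values to aggregate, one per observation.
values = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]

# Collect the values that belong to each (row, column) cell.
groups = defaultdict(list)
for row_key, col_key, v in zip(a, b, values):
    groups[(row_key, col_key)].append(v)

# Apply the aggregation, here a sum, to every cell.
sums = {key: sum(vs) for key, vs in groups.items()}

for key in sorted(sums):
    print(key, sums[key])
# ('bar', 'one') 18
# ('bar', 'two') 8
# ('foo', 'one') 17
# ('foo', 'two') 23
```

Under the signature documented above, the corresponding call would pass the same factors together with values=values and aggfunc="sum", which is one of the aggregation names listed in the Notes.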