modin.pandas.DataFrameGroupBy.var
- DataFrameGroupBy.var(ddof: int = 1, engine: Optional[Literal['cython', 'numba']] = None, engine_kwargs: Optional[dict[str, bool]] = None, numeric_only: bool = False)
Compute variance of groups, excluding missing values.
For multiple groupings, the result index will be a MultiIndex.
- Parameters:
ddof (int, default 1) – Degrees of freedom. When ddof is 0 or 1, the operation is executed in Snowflake. Other values are not yet supported.
engine (str, default None) –
In pandas, engine can be configured as 'cython' or 'numba', and None defaults to 'cython' or to the global setting compute.use_numba.
This parameter is ignored in Snowpark pandas, as the execution is always performed in Snowflake.
engine_kwargs (dict, default None) –
Configuration keywords for the configured execution engine.
This parameter is ignored in Snowpark pandas, as the execution is always performed in Snowflake.
numeric_only (bool, default False) – Include only float, int or boolean data columns.
- Returns:
Variance of values within each group.
- Return type:
Examples
For SeriesGroupBy:
>>> lst = ['a', 'a', 'a', 'b', 'b', 'b', 'c']
>>> ser = pd.Series([7, 2, 8, 4, 3, 3, 1], index=lst)
>>> ser
a    7
a    2
a    8
b    4
b    3
b    3
c    1
dtype: int64
>>> ser.groupby(level=0).var()
a    10.333333
b     0.333333
c          NaN
dtype: float64
>>> ser.groupby(level=0).var(ddof=0)
a    6.888889
b    0.222222
c    0.000000
dtype: float64
Note that if the number of elements in a group is less than or equal to ddof, the result for that group will be NaN/None. For example, with the default ddof of 1, the value for group c is NaN when we call ser.groupby(level=0).var(), because that group has only one element.
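The ddof rule in the note can be illustrated with a minimal pure-Python sketch. It is not how Snowpark pandas or Snowflake computes the result; group_var is a hypothetical helper introduced only for illustration, and it assumes the input contains no missing values. It divides each group's sum of squared deviations by (n - ddof) and returns NaN when a group has n <= ddof elements.

```python
import math
from collections import defaultdict


def group_var(keys, values, ddof=1):
    """Hypothetical helper: per-group variance with divisor (n - ddof).

    Assumes no missing values. Illustrative only, not the library's code.
    """
    # Collect the values that belong to each group key.
    groups = defaultdict(list)
    for key, value in zip(keys, values):
        groups[key].append(value)

    result = {}
    for key, vals in groups.items():
        n = len(vals)
        if n <= ddof:
            # Too few elements for this ddof: the variance is undefined.
            result[key] = math.nan
            continue
        mean = sum(vals) / n
        squared_deviations = sum((v - mean) ** 2 for v in vals)
        result[key] = squared_deviations / (n - ddof)
    return result


keys = ['a', 'a', 'a', 'b', 'b', 'b', 'c']
values = [7, 2, 8, 4, 3, 3, 1]

sample = group_var(keys, values)              # ddof=1, group 'c' is NaN
population = group_var(keys, values, ddof=0)  # ddof=0, group 'c' is 0.0
print(sample)
print(population)
```

With the same data as the SeriesGroupBy example, group c has a single element, so it is NaN for ddof=1 and 0.0 for ddof=0, which agrees with the outputs shown above.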
For DataFrameGroupBy:
>>> data = {'a': [1, 3, 5, 7, 7, 8, 3], 'b': [1, 4, 8, 4, 4, 2, 1]}
>>> df = pd.DataFrame(data, index=pd.Index(['dog', 'dog', 'dog',
...                   'mouse', 'mouse', 'mouse', 'mouse'], name='c'))
>>> df
       a  b
c
dog    1  1
dog    3  4
dog    5  8
mouse  7  4
mouse  7  4
mouse  8  2
mouse  3  1
>>> df.groupby('c').var()
              a          b
c
dog    4.000000  12.333333
mouse  4.916667   2.250000
>>> data = {'a': [1, 3, 5, 7, 7, 8, 3], 'b': ['c', 'e', 'd', 'a', 'a', 'b', 'e']}
>>> df = pd.DataFrame(data, index=pd.Index(['dog', 'dog', 'dog',
...                   'mouse', 'mouse', 'mouse', 'mouse'], name='c'))
>>> df
       a  b
c
dog    1  c
dog    3  e
dog    5  d
mouse  7  a
mouse  7  a
mouse  8  b
mouse  3  e
>>> df.groupby('c').var(numeric_only=True)
              a
c
dog    4.000000
mouse  4.916667