snowflake.snowpark.modin.plugin.extensions.groupby_overrides.DataFrameGroupBy.std
- DataFrameGroupBy.std(ddof: int = 1, engine: Optional[Literal['cython', 'numba']] = None, engine_kwargs: Optional[dict[str, bool]] = None, numeric_only: bool = False)
Compute standard deviation of groups, excluding missing values.
For multiple groupings, the result index will be a MultiIndex.
- Parameters:
ddof (int, default 1) –
Degrees of freedom.
Snowpark pandas currently only supports ddof=0 and ddof=1.
engine (str, default None) –
In pandas, engine can be configured as 'cython' or 'numba', and None defaults to 'cython' or globally setting compute.use_numba.
This parameter is ignored in Snowpark pandas, as the execution is always performed in Snowflake.
engine_kwargs (dict, default None) –
Configuration keywords for the configured execution engine.
This parameter is ignored in Snowpark pandas, as the execution is always performed in Snowflake.
numeric_only (bool, default False) – Include only float, int or boolean data columns.
- Returns:
Standard deviation of values within each group.
- Return type:
Examples
For SeriesGroupBy:
>>> lst = ['a', 'a', 'a', 'b', 'b', 'b', 'c']
>>> ser = pd.Series([7, 2, 8, 4, 3, 3, 1], index=lst)
>>> ser
a    7
a    2
a    8
b    4
b    3
b    3
c    1
dtype: int64
>>> ser.groupby(level=0).std()
a    3.21455
b    0.57735
c        NaN
dtype: float64
>>> ser.groupby(level=0).std(ddof=0)
a    2.624669
b    0.471404
c    0.000000
dtype: float64
Note that if the number of elements in a group is less than or equal to ddof, the result for that group will be NaN/None. For example, group c has a single element, so its value is NaN when we call ser.groupby(level=0).std() with the default ddof of 1.
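The effect of ddof can be checked by hand. The following is a minimal sketch in plain Python (standard library only, not Snowpark pandas), assuming the usual definition of the standard deviation as the square root of the sum of squared deviations divided by n - ddof. The helper name group_std is hypothetical and exists only for illustration.

```python
import math
from collections import defaultdict


def group_std(keys, values, ddof=1):
    """Per-group standard deviation with divisor (n - ddof).

    Illustrative helper, not part of any library. Groups with
    n <= ddof yield NaN, mirroring the note above.
    """
    groups = defaultdict(list)
    for key, value in zip(keys, values):
        groups[key].append(value)

    result = {}
    for key, vals in groups.items():
        n = len(vals)
        if n <= ddof:
            # Not enough observations for this ddof.
            result[key] = math.nan
            continue
        mean = sum(vals) / n
        squared_dev = sum((v - mean) ** 2 for v in vals)
        result[key] = math.sqrt(squared_dev / (n - ddof))
    return result


keys = ['a', 'a', 'a', 'b', 'b', 'b', 'c']
values = [7, 2, 8, 4, 3, 3, 1]

sample_std = group_std(keys, values)              # ddof=1 (default)
population_std = group_std(keys, values, ddof=0)  # ddof=0

print(sample_std)
print(population_std)
```

Up to rounding, the two dictionaries match the ddof=1 and ddof=0 results shown for the Series example: group c is NaN under ddof=1 because it has only one element, and 0.0 under ddof=0.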
For DataFrameGroupBy:
>>> data = {'a': [1, 3, 5, 7, 7, 8, 3], 'b': [1, 4, 8, 4, 4, 2, 1]}
>>> df = pd.DataFrame(data, index=pd.Index(['dog', 'dog', 'dog',
...                   'mouse', 'mouse', 'mouse', 'mouse'], name='c'))
>>> df
       a  b
c
dog    1  1
dog    3  4
dog    5  8
mouse  7  4
mouse  7  4
mouse  8  2
mouse  3  1
>>> df.groupby('c').std()
              a         b
c
dog    2.000000  3.511885
mouse  2.217356  1.500000
>>> data = {'a': [1, 3, 5, 7, 7, 8, 3], 'b': ['c', 'e', 'd', 'a', 'a', 'b', 'e']}
>>> df = pd.DataFrame(data, index=pd.Index(['dog', 'dog', 'dog',
...                   'mouse', 'mouse', 'mouse', 'mouse'], name='c'))
>>> df
       a  b
c
dog    1  c
dog    3  e
dog    5  d
mouse  7  a
mouse  7  a
mouse  8  b
mouse  3  e
>>> df.groupby('c').std(numeric_only=True)
              a
c
dog    2.000000
mouse  2.217356
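As a rough illustration of what numeric_only=True does in the last example, the sketch below drops non-numeric columns before computing a per-group sample standard deviation (ddof=1). It uses only the Python standard library, not Snowpark pandas, and the names is_numeric and grouped_std_numeric_only are hypothetical helpers introduced for this sketch.

```python
from statistics import stdev

index = ['dog', 'dog', 'dog', 'mouse', 'mouse', 'mouse', 'mouse']
columns = {
    'a': [1, 3, 5, 7, 7, 8, 3],
    'b': ['c', 'e', 'd', 'a', 'a', 'b', 'e'],
}


def is_numeric(column):
    # bool is a subclass of int, so this accepts float, int and bool values.
    return all(isinstance(v, (int, float)) for v in column)


def grouped_std_numeric_only(index, columns):
    """Sample standard deviation per group, skipping non-numeric columns.

    Illustrative only. statistics.stdev needs at least two values per
    group, which holds for the data above.
    """
    result = {}
    for name, column in columns.items():
        if not is_numeric(column):
            continue  # column 'b' holds strings and is dropped
        groups = {}
        for key, value in zip(index, column):
            groups.setdefault(key, []).append(value)
        result[name] = {key: stdev(vals) for key, vals in groups.items()}
    return result


result = grouped_std_numeric_only(index, columns)
print(result)
```

Only column a survives the filter, and its per-group values agree with the df.groupby('c').std(numeric_only=True) output above up to rounding.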