snowflake.snowpark.DataFrameStatFunctions.sample_by
- DataFrameStatFunctions.sample_by(col: Union[Column, str], fractions: Dict[Union[None, bool, int, float, str, bytearray, Decimal, date, datetime, time, bytes, NaTType, float64, list, tuple, dict], float], seed: Optional[int] = None) -> DataFrame
- Returns a DataFrame containing a stratified sample without replacement, based on a dict that specifies the fraction for each stratum.
- Example:

    >>> df = session.create_dataframe([("Bob", 17), ("Alice", 10), ("Nico", 8), ("Bob", 12)], schema=["name", "age"])
    >>> fractions = {"Bob": 0.5, "Nico": 1.0}
    >>> sample_df = df.stat.sample_by("name", fractions)  # non-deterministic result

- Parameters:
- col – The column, or the name of the column, that defines the strata.
- fractions – A dict that maps each stratum value to the fraction of its rows to include in the sample. If a stratum is not specified in the dict, the method uses 0 as its fraction, so no rows from that stratum are included.
- seed – Specifies a seed value to make the sampling deterministic. Can be any integer between 0 and 2147483647 inclusive. Default value is None. This parameter is only supported for Table; it is ignored when specified for any other DataFrame.
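To make the semantics above concrete without a Snowflake session, here is a minimal pure-Python sketch. It is not how Snowpark implements sample_by: the function name stratified_sample, the list-of-tuples input, the [0, 1] check on fractions, and the independent per-row random draws are all assumptions of this sketch. It mirrors the documented behavior: sampling without replacement, a fraction of 0 for any stratum missing from fractions, and an optional seed (0 to 2147483647) that makes the result repeatable.

```python
import random
from typing import Any, Dict, Hashable, List, Optional, Sequence, Tuple

MAX_SEED = 2147483647  # upper bound documented for the seed parameter


def stratified_sample(
    rows: Sequence[Tuple[Any, ...]],
    key_index: int,
    fractions: Dict[Hashable, float],
    seed: Optional[int] = None,
) -> List[Tuple[Any, ...]]:
    """Hypothetical helper: stratified sample of rows, without replacement.

    Assumption: each row is kept independently with probability
    fractions[stratum]; strata missing from fractions use 0.
    """
    if seed is not None and not 0 <= seed <= MAX_SEED:
        raise ValueError(f"seed must be between 0 and {MAX_SEED} inclusive")
    for stratum, frac in fractions.items():
        # Range check is this sketch's own assumption, not documented above.
        if not 0.0 <= frac <= 1.0:
            raise ValueError(f"fraction for {stratum!r} must be in [0, 1]")
    rng = random.Random(seed)  # same seed -> same sequence of draws
    sample = []
    for row in rows:
        frac = fractions.get(row[key_index], 0.0)  # missing stratum -> 0
        # Each row is visited exactly once, so it can be selected at most once.
        if rng.random() < frac:
            sample.append(row)
    return sample


# Same data and fractions as the Example above.
rows = [("Bob", 17), ("Alice", 10), ("Nico", 8), ("Bob", 12)]
fractions = {"Bob": 0.5, "Nico": 1.0}
print(stratified_sample(rows, 0, fractions, seed=42))
```

With these inputs, "Alice" never appears (her stratum defaults to 0), "Nico" always appears (fraction 1.0), and each "Bob" row appears with probability 0.5. Repeating the call with the same seed returns the same rows.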