snowflake.snowpark.Table.sample
Table.sample(frac: Optional[float] = None, n: Optional[int] = None, *, seed: Optional[int] = None, sampling_method: Optional[str] = None) → DataFrame
Samples rows based on either the number of rows to be returned or a percentage of rows to be returned.
Sampling with a seed is not supported on views or subqueries. Because this method works on tables, it supports seed. This is the main difference between DataFrame.sample() and this method.

Parameters:
- frac – The percentage of rows to sample, expressed as a fraction between 0.0 and 1.0 (for example, 0.1 samples roughly 10% of the rows).
- n – The fixed number of rows to sample, in the range of 0 to 1,000,000 (inclusive). Provide either frac or n, but not both.
- seed – Specifies a seed value to make the sampling deterministic. Can be any integer between 0 and 2147483647 inclusive. Default is None.
- sampling_method – Specifies the sampling method to use:
  - 'BERNOULLI' (or 'ROW'): Includes each row with a probability of p/100, where p is the sampling percentage. Similar to flipping a weighted coin for each row.
  - 'SYSTEM' (or 'BLOCK'): Includes each block of rows with a probability of p/100. Similar to flipping a weighted coin for each block of rows. This method does not support fixed-size sampling.
  Default is None, in which case Snowflake uses 'ROW' sampling.
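The parameter rules above interact: exactly one of frac or n, range limits on n and seed, and no fixed-size sampling with SYSTEM | BLOCK. The sketch below encodes those rules in a hypothetical helper, build_sample_clause, that renders the corresponding Snowflake SAMPLE clause. It is an illustration only, not Snowpark's implementation; treating frac as a fraction that is multiplied by 100 to get the SQL percentage is an assumption, and the helper name is made up for this example.

```python
from typing import Optional

# Method names accepted by the documented sampling_method parameter.
ROW_METHODS = {"BERNOULLI", "ROW"}
BLOCK_METHODS = {"SYSTEM", "BLOCK"}
MAX_SEED = 2147483647
MAX_ROWS = 1_000_000


def build_sample_clause(
    frac: Optional[float] = None,
    n: Optional[int] = None,
    *,
    seed: Optional[int] = None,
    sampling_method: Optional[str] = None,
) -> str:
    """Hypothetical helper: render a Snowflake SAMPLE clause from
    Table.sample-style arguments, enforcing the documented limits."""
    # Exactly one of frac or n must be supplied.
    if (frac is None) == (n is None):
        raise ValueError("provide exactly one of frac or n")

    method = sampling_method.upper() if sampling_method is not None else None
    if method is not None and method not in ROW_METHODS | BLOCK_METHODS:
        raise ValueError(f"unknown sampling_method: {sampling_method!r}")

    if seed is not None and not 0 <= seed <= MAX_SEED:
        raise ValueError(f"seed must be between 0 and {MAX_SEED}")

    if n is not None:
        if not 0 <= n <= MAX_ROWS:
            raise ValueError(f"n must be between 0 and {MAX_ROWS:,}")
        # Fixed-size sampling is not supported with SYSTEM | BLOCK.
        if method in BLOCK_METHODS:
            raise ValueError("fixed-size sampling does not support SYSTEM/BLOCK")
        amount = f"{n} ROWS"
    else:
        # Assumption: frac is a fraction in [0, 1]; the SQL clause takes a percentage.
        if not 0.0 <= frac <= 1.0:
            raise ValueError("frac must be between 0.0 and 1.0")
        amount = str(round(frac * 100, 6))

    parts = ["SAMPLE"]
    if method is not None:
        parts.append(method)
    parts.append(f"({amount})")
    if seed is not None:
        parts.append(f"SEED ({seed})")
    return " ".join(parts)


print(build_sample_clause(frac=0.25, sampling_method="bernoulli", seed=7))
print(build_sample_clause(n=100))
```

With a live Snowpark session, the equivalent documented call would look like session.table("my_table").sample(frac=0.25, seed=7, sampling_method="BERNOULLI"), where "my_table" is a placeholder table name.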
Note
- SYSTEM | BLOCK sampling is often faster than BERNOULLI | ROW sampling.
- Sampling without a seed is often faster than sampling with a seed.
- Fixed-size sampling can be slower than equivalent fraction-based sampling because fixed-size sampling prevents some query optimization.
- Fixed-size sampling doesn't work with SYSTEM | BLOCK sampling.
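To make the row-versus-block distinction concrete, here is a toy in-memory simulation of the two "weighted coin" descriptions. It is not how Snowflake implements sampling; the function names and the block_size parameter are hypothetical, since Snowflake chooses its own block boundaries. It shows that block-level sampling makes one decision per block rather than per row (one intuition for why it can be cheaper), that a fixed seed makes the result repeatable, and that a block-level sample can only grow in whole blocks, which is consistent with SYSTEM | BLOCK not supporting fixed-size sampling.

```python
import random
from typing import List, Optional


def bernoulli_sample(rows: List[int], p: float, seed: Optional[int] = None) -> List[int]:
    """Row-level sampling: one weighted coin flip per row, kept with probability p/100."""
    rng = random.Random(seed)  # a fixed seed makes the flips, and so the sample, repeatable
    return [row for row in rows if rng.random() < p / 100]


def system_sample(rows: List[int], p: float, block_size: int, seed: Optional[int] = None) -> List[int]:
    """Block-level sampling: one weighted coin flip per block; a kept block is kept whole."""
    rng = random.Random(seed)
    kept: List[int] = []
    for start in range(0, len(rows), block_size):
        if rng.random() < p / 100:
            kept.extend(rows[start:start + block_size])
    return kept


rows = list(range(10_000))
row_sample = bernoulli_sample(rows, p=10, seed=1)
block_sample = system_sample(rows, p=10, block_size=100, seed=1)
# Block-level results come in whole blocks, so the sample size moves in steps
# of block_size and an exact row count generally cannot be targeted.
print(len(row_sample), len(block_sample), len(block_sample) % 100)
```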