snowflake.hypothesis_snowpark.dataframe_strategy

snowflake.hypothesis_snowpark.dataframe_strategy(schema: str | DataFrameSchema, session: Session, size: int | None = None) → SearchStrategy[DataFrame]

Create a Hypothesis strategy for generating Snowpark DataFrames based on a given schema.

Parameters:
  • schema – A schema defining the columns, data types and checks that the generated DataFrame should satisfy. This can be a path to a JSON schema file generated by the snowflake.snowpark_checkpoints_collector.collect_dataframe_checkpoint() function when the collection mode is set to SCHEMA, or a Pandera DataFrameSchema object.

  • session – The Snowpark session to use for creating the DataFrames.

  • size – The number of rows to generate for each DataFrame. If not specified, the strategy generates DataFrames of varying sizes.
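
Because schema may be given as a path string, a test module might want to fail early when that file is missing or is not valid JSON, before the strategy is built. The following is a minimal sketch using only the standard library. The helper name resolve_schema_path is hypothetical and is not part of snowflake.hypothesis_snowpark, and the sketch deliberately makes no assumption about the keys inside the schema file.

```python
import json
from pathlib import Path


def resolve_schema_path(path: str) -> str:
    """Return an absolute path to the schema file, or raise if it is unusable.

    Hypothetical helper for illustration; not part of the library.
    """
    schema_file = Path(path).expanduser().resolve()
    if not schema_file.is_file():
        raise FileNotFoundError(f"Schema file not found: {schema_file}")
    # Only confirm that the file parses as JSON; its contents are not inspected.
    with schema_file.open(encoding="utf-8") as handle:
        json.load(handle)  # raises json.JSONDecodeError on malformed input
    return str(schema_file)
```

The returned string could then be passed as schema=resolve_schema_path("path/to/schema.json") in the examples below, so that a missing file surfaces as a clear error when the test module is imported.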

Examples

Generate a Snowpark DataFrame from a JSON schema file:

>>> from hypothesis import given
>>> from snowflake.hypothesis_snowpark import dataframe_strategy
>>> from snowflake.snowpark import DataFrame, Session

>>> @given(
...     df=dataframe_strategy(
...         schema="path/to/schema.json",
...         session=Session.builder.getOrCreate(),
...         size=10,
...     )
... )
... def test_my_function(df: DataFrame):
...     ...

Generate a Snowpark DataFrame from a Pandera DataFrameSchema object:

>>> import pandera as pa
>>> from hypothesis import given
>>> from snowflake.hypothesis_snowpark import dataframe_strategy
>>> from snowflake.snowpark import DataFrame, Session

>>> @given(
...     df=dataframe_strategy(
...         schema=pa.DataFrameSchema(
...             {
...                 "A": pa.Column(pa.Int, checks=pa.Check.in_range(0, 10)),
...                 "B": pa.Column(pa.Bool),
...             }
...         ),
...         session=Session.builder.getOrCreate(),
...         size=10,
...     )
... )
... def test_my_function(df: DataFrame):
...     ...

Use the Hypothesis @settings decorator to control aspects such as the maximum number of test cases, the deadline for each test execution, and the verbosity level:

>>> from datetime import timedelta
>>> from hypothesis import given, settings
>>> from snowflake.hypothesis_snowpark import dataframe_strategy
>>> from snowflake.snowpark import DataFrame, Session

>>> @given(
...     df=dataframe_strategy(
...         schema="path/to/schema.json",
...         session=Session.builder.getOrCreate(),
...         size=10,
...     )
... )
... @settings(
...     deadline=timedelta(milliseconds=800),
...     max_examples=25,
... )
... def test_my_function(df: DataFrame):
...     ...
Returns:

A Hypothesis strategy that generates Snowpark DataFrames.