snowflake.snowpark_checkpoints.check_dataframe_schema

snowflake.snowpark_checkpoints.check_dataframe_schema(df: DataFrame, pandera_schema: DataFrameSchema, checkpoint_name: str, job_context: SnowparkJobContext | None = None, custom_checks: dict[str, list[Check]] | None = None, skip_checks: dict[Any, Any] | None = None, sample_frac: float | None = 1.0, sample_number: int | None = None, sampling_strategy: SamplingStrategy | None = 1, output_path: str | None = None) → tuple[bool, DataFrame] | None

Validate a DataFrame against a given Pandera schema using sampling techniques.

Parameters:

df (SnowparkDataFrame) – The DataFrame to be validated.
pandera_schema (DataFrameSchema) – The Pandera schema to validate against.
checkpoint_name (str) – The name of the checkpoint used to retrieve the schema. Required; the signature declares no default.
job_context (SnowparkJobContext, optional) – Context for job-related operations. Defaults to None.
custom_checks (dict[str, list[Check]], optional) – Custom checks to be added to the schema. Defaults to None.
skip_checks (dict[Any, Any], optional) – Checks to be skipped. Defaults to None.
sample_frac (float, optional) – Fraction of data to sample. Defaults to 1.0.
sample_number (int, optional) – Number of rows to sample. Defaults to None.
sampling_strategy (SamplingStrategy, optional) – Strategy for sampling data. Defaults to SamplingStrategy.RANDOM_SAMPLE (shown as 1 in the signature).
output_path (str, optional) – The output path for the validation results. Defaults to None.
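
Example (a minimal sketch, not taken from the library's own documentation): the wrapper below passes the sampling arguments by their documented keyword names. The helper `run_check`, the alias `CheckResult`, and the checkpoint name `demo_checkpoint` are hypothetical. The validation function is injected as `check_fn` so the sketch can be exercised without a Snowflake session.

```python
from typing import Any, Callable, Optional, Tuple

# Hypothetical alias mirroring the documented return type.
CheckResult = Optional[Tuple[bool, Any]]


def run_check(df: Any, schema: Any, check_fn: Callable[..., CheckResult]) -> CheckResult:
    """Call check_fn using the parameter names from the signature above.

    check_fn is expected to be check_dataframe_schema; it is passed in
    rather than imported so this sketch has no external dependencies.
    """
    return check_fn(
        df,
        schema,
        checkpoint_name="demo_checkpoint",  # hypothetical checkpoint name
        sample_frac=0.1,                    # fraction of data to sample
        sample_number=None,                 # no fixed row count requested
        output_path=None,                   # leave the output path at its default
    )
```

In real use, assuming the package is installed and `df` is a Snowpark DataFrame, this would be called as `run_check(df, schema, check_dataframe_schema)` after `from snowflake.snowpark_checkpoints import check_dataframe_schema`.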

Raises:

SchemaValidationError – If the DataFrame fails schema validation.

Returns:

A tuple containing the validity flag and the pandas DataFrame. Returns None if validation is disabled for that checkpoint.

Return type:

tuple[bool, PandasDataFrame] | None
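
Example (a minimal sketch): because the function returns either a `(bool, DataFrame)` tuple or None, callers need to handle the None case before unpacking. The helper name `describe_result` and the status strings are hypothetical, and the DataFrame is typed as `Any` so the sketch runs without pandas installed.

```python
from typing import Any, Optional, Tuple


def describe_result(result: Optional[Tuple[bool, Any]]) -> str:
    """Map the documented return value to a short status string."""
    if result is None:
        # Documented behavior: None means validation is disabled
        # for that checkpoint.
        return "skipped"
    is_valid, _validated_df = result  # validity flag, pandas DataFrame
    return "passed" if is_valid else "failed"
```

Checking for None first avoids a `TypeError` from unpacking when the checkpoint is disabled.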