snowflake.snowpark.DataFrameAIFunctions.filter

DataFrameAIFunctions.filter(predicate: str, input_columns: Union[List[Column], Dict[str, Column]], *) → snowflake.snowpark.DataFrame

Filter rows using AI-powered boolean classification.

This method uses an AI model to classify each row as True or False according to the provided predicate, and returns a DataFrame containing only the rows classified as True. It supports both text inputs and image (file) inputs.
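
Conceptually, the result keeps exactly the rows whose classification is True. The local sketch below mimics that contract in plain Python so the semantics can be seen without a Snowflake session. `keyword_classifier` is a hypothetical stand-in for the AI model (a naive keyword match), and `filter_rows` is an illustrative helper, not part of Snowpark.

```python
from typing import Callable, Dict, List

Row = Dict[str, str]


def keyword_classifier(text: str) -> bool:
    """Hypothetical stand-in for the AI model: True if the text has a positive keyword."""
    positive_words = ("great", "excellent", "love")
    return any(word in text.lower() for word in positive_words)


def filter_rows(rows: List[Row], column: str, classify: Callable[[str], bool]) -> List[Row]:
    """Keep only the rows whose value in `column` the classifier labels True."""
    return [row for row in rows if classify(row[column])]


reviews = [
    {"REVIEW": "This is great!"},
    {"REVIEW": "This is terrible!"},
    {"REVIEW": "This is okay."},
]
positive = filter_rows(reviews, "REVIEW", keyword_classifier)
print(len(positive))  # 1
```

Unlike this deterministic stand-in, the real classifier is a model, so the counts shown in the examples are expected results rather than guaranteed ones.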

Parameters:
  • predicate – The classification instruction string. Use placeholders like {name} when passing a dict of columns, or {0}, {1} when passing a list. For file-based filtering, this should contain instructions to classify the file as TRUE or FALSE.

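  • input_columns – A list of Columns (referenced in predicate by positional placeholders {0}, {1}, …) or a dict mapping placeholder names to Columns. The signature declares no default value, so pass this argument even when predicate contains no placeholders, as the first example below does.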
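
The placeholder rules above use the same brace syntax as Python's `str.format`. The sketch below renders a predicate locally to show how a list binds to `{0}`, `{1}`, … and a dict binds to `{name}`. It assumes the substitution behaves like `str.format`, which is an illustration only and not a description of how Snowpark builds its query; `render_predicate` is a hypothetical helper, and plain strings stand in for one row's column values.

```python
from typing import Dict, List, Union


def render_predicate(predicate: str, values: Union[List[str], Dict[str, str]]) -> str:
    """Fill placeholders the way str.format would (an assumption for illustration).

    A list fills positional placeholders {0}, {1}, ...;
    a dict fills named placeholders such as {country}.
    """
    if isinstance(values, dict):
        return predicate.format(**values)
    return predicate.format(*values)


# Positional placeholders bound from a list of one row's values.
print(render_predicate("Is {0} located in {1}?", ["Korea", "Asia"]))
# Is Korea located in Asia?

# Named placeholders bound from a dict of one row's values.
print(render_predicate(
    "Is {country} located in {continent}?",
    {"country": "Switzerland", "continent": "Europe"},
))
# Is Switzerland located in Europe?
```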

Examples:

>>> # Simple text filtering without placeholders
>>> df = session.create_dataframe(
...     [["This is great!"], ["This is terrible!"], ["This is okay."]],
...     schema=["review"]
... )
>>> positive_df = df.ai.filter("Is this review positive?", input_columns=[df["review"]])
>>> positive_df.count()  # Should be 1 (only "This is great!")
1

>>> # Text filtering with named placeholders
>>> df = session.create_dataframe(
...     [["Switzerland", "Europe"], ["Korea", "Asia"], ["Brazil", "South America"]],
...     schema=["country", "continent"]
... )
>>> european_df = df.ai.filter(
...     "Is {country} located in {continent} and specifically in Europe?",
...     input_columns={"country": df["country"], "continent": df["continent"]}
... )
>>> european_df.collect()[0]["COUNTRY"]
'Switzerland'

>>> # Image filtering on a file column (no placeholders)
>>> # Upload images to a stage first
>>> _ = session.sql("CREATE OR REPLACE TEMP STAGE mystage ENCRYPTION = (TYPE = 'SNOWFLAKE_SSE')").collect()
>>> _ = session.file.put("tests/resources/dog.jpg", "@mystage", auto_compress=False)
>>> _ = session.file.put("tests/resources/cat.jpeg", "@mystage", auto_compress=False)
>>> df = session.read.file("@mystage")
>>> dog_images_df = df.ai.filter(
...     "Does this image contain a dog?",
...     input_columns=[df["FILE"]]
... )
>>> dog_images_df.count()  # Should be 1 (only dog image)
1
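
To catch a predicate that references a placeholder with no matching column before sending any query, a local check can help. The sketch below is a hypothetical helper, not part of Snowpark: it uses the standard library's `string.Formatter().parse` to list the placeholders in a predicate and compares them with the keys of a dict or the indices of a list. Plain strings stand in for the `Column` objects you would actually pass to `df.ai.filter`.

```python
from string import Formatter
from typing import Dict, List, Set, Union


def placeholder_names(predicate: str) -> Set[str]:
    """Return the placeholder field names in a predicate, e.g. {'country', 'continent'}."""
    return {
        field_name
        for _, field_name, _, _ in Formatter().parse(predicate)
        if field_name is not None
    }


def check_placeholders(predicate: str, input_columns: Union[List[object], Dict[str, object]]) -> None:
    """Hypothetical pre-flight check: raise if a placeholder has no matching column."""
    used = placeholder_names(predicate)
    if isinstance(input_columns, dict):
        expected = set(input_columns)
    else:
        # Positional placeholders are the list indices rendered as strings: "0", "1", ...
        expected = {str(i) for i in range(len(input_columns))}
    missing = used - expected
    if missing:
        raise ValueError(f"Placeholders without a column: {sorted(missing)}")


predicate = "Is {country} located in {continent} and specifically in Europe?"
check_placeholders(predicate, {"country": "COUNTRY", "continent": "CONTINENT"})  # passes

try:
    check_placeholders(predicate, {"country": "COUNTRY"})
except ValueError as err:
    print(err)  # Placeholders without a column: ['continent']
```

The check deliberately ignores columns that no placeholder uses, because the first example passes a column with a predicate that has no placeholders at all.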

This function or method is experimental since 1.39.0.