snowflake.snowpark.DataFrameAIFunctions.filter
- DataFrameAIFunctions.filter(predicate: str, input_columns: Union[List[Column], Dict[str, Column]], *) → snowflake.snowpark.DataFrame
Filter rows using AI-powered boolean classification.
This method applies AI-based filtering to each row. It classifies every row as True or False according to the provided predicate and returns a DataFrame containing only the rows classified as True. Both text-based filtering and image filtering are supported.
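To make the row-by-row semantics concrete, here is a minimal local sketch in plain Python. It assumes rows are plain dicts and replaces the AI model with a hypothetical `classify` callable; `ai_filter_local` and `keyword_classifier` are illustrative names, not part of Snowpark, and the toy keyword rule is not how Snowflake decides. The real method runs the classification inside Snowflake.

```python
from typing import Callable, Dict, List

Row = Dict[str, str]


def ai_filter_local(rows: List[Row], predicate: str, column: str,
                    classify: Callable[[str, str], bool]) -> List[Row]:
    """Keep only the rows whose value in `column` the classifier labels True.

    `classify` is a hypothetical stand-in for the AI model: it receives the
    predicate and one row's value and returns a boolean.
    """
    return [row for row in rows if classify(predicate, row[column])]


def keyword_classifier(predicate: str, text: str) -> bool:
    # Toy rule for illustration only: treat a review as "positive" if it
    # contains the word "great". It ignores `predicate`; a real model would
    # interpret the instruction instead.
    return "great" in text.lower()


reviews = [
    {"review": "This is great!"},
    {"review": "This is terrible!"},
    {"review": "This is okay."},
]
positive = ai_filter_local(reviews, "Is this review positive?", "review", keyword_classifier)
print(len(positive))  # 1
```

As with the real method, the result is a subset of the input rows; rows classified as False are dropped rather than flagged.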
- Parameters:
  - predicate – The classification instruction string. Use named placeholders such as {name} when passing a dict of columns, or positional placeholders {0}, {1} when passing a list. For file-based filtering, the predicate should instruct the model to classify the file as TRUE or FALSE.
  - input_columns – A list of Columns (referenced by positional placeholders {0}, {1}, …) or a dict mapping placeholder names to Columns. Placeholders in predicate are filled from these columns.
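The sketch below illustrates how the two forms of input_columns line up with the two placeholder styles. It rests on several assumptions: the helper `render_predicate` is hypothetical, plain column-name strings stand in for Column objects, and Python's `str.format` stands in for the substitution that Snowflake performs server-side. The actual prompt construction may differ.

```python
from typing import Dict, List, Union

Row = Dict[str, str]


def render_predicate(predicate: str,
                     input_columns: Union[List[str], Dict[str, str]],
                     row: Row) -> str:
    """Fill the predicate's placeholders with one row's values (hypothetical helper).

    Column names (str) stand in for Snowpark Column objects here.
    """
    if isinstance(input_columns, dict):
        # Named placeholders: {name} -> value of the column mapped to that name
        return predicate.format(**{name: row[col] for name, col in input_columns.items()})
    # Positional placeholders: {0}, {1}, ... -> column values in list order
    return predicate.format(*(row[col] for col in input_columns))


row = {"country": "Korea", "continent": "Asia"}

# List form pairs with positional placeholders
print(render_predicate("Is {0} located in {1}?", ["country", "continent"], row))
# Is Korea located in Asia?

# Dict form pairs with named placeholders
print(render_predicate("Is {country} located in {continent}?",
                       {"country": "country", "continent": "continent"}, row))
# Is Korea located in Asia?
```

Mixing the styles (a dict with {0}, or a list with {name}) raises an error in this sketch, which mirrors the pairing rule stated for predicate.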
Examples:
>>> # Simple text filtering without placeholders
>>> df = session.create_dataframe(
...     [["This is great!"], ["This is terrible!"], ["This is okay."]],
...     schema=["review"]
... )
>>> positive_df = df.ai.filter("Is this review positive?", input_columns=[df["review"]])
>>> positive_df.count()  # Should be 1 (only "This is great!")
1
>>> # Text filtering with named placeholders
>>> df = session.create_dataframe(
...     [["Switzerland", "Europe"], ["Korea", "Asia"], ["Brazil", "South America"]],
...     schema=["country", "continent"]
... )
>>> european_df = df.ai.filter(
...     "Is {country} located in {continent} and specifically in Europe?",
...     input_columns={"country": df["country"], "continent": df["continent"]}
... )
>>> european_df.collect()[0]["COUNTRY"]
'Switzerland'
>>> # Image filtering with a list of input columns
>>> from snowflake.snowpark.functions import to_file
>>> # Upload images to a stage first
>>> _ = session.sql("CREATE OR REPLACE TEMP STAGE mystage ENCRYPTION = (TYPE = 'SNOWFLAKE_SSE')").collect()
>>> _ = session.file.put("tests/resources/dog.jpg", "@mystage", auto_compress=False)
>>> _ = session.file.put("tests/resources/cat.jpeg", "@mystage", auto_compress=False)
>>> df = session.read.file("@mystage")
>>> dog_images_df = df.ai.filter(
...     "Does this image contain a dog?",
...     input_columns=[df["FILE"]]
... )
>>> dog_images_df.count()  # Should be 1 (only the dog image)
1
This function or method is experimental since 1.39.0.