snowflake.snowpark.DataFrameAIFunctions.agg

DataFrameAIFunctions.agg(task_description: str, input_column: Union[snowflake.snowpark.column.Column, str], *, output_column: Optional[str] = None) → snowflake.snowpark.DataFrame

Aggregate a column of text data using a natural language task description.

This method reduces a column of text to a single result by applying the aggregation described in the task description. For instance, it can summarize a large set of text rows or extract specific insights from them.

Parameters:
  • task_description – A plain English string that describes the aggregation task, such as “Summarize the product reviews for a blog post targeting consumers” or “Identify the most positive review and translate it into French and Polish, one word only”.

  • input_column – The column (Column object or column name as string) containing the text data on which the aggregation operation is to be performed.

  • output_column – The name of the output column to be appended. If not provided, a column named AI_AGG_OUTPUT is appended.

Examples:

>>> # Aggregate product reviews
>>> df = session.create_dataframe([
...     ["Excellent product, highly recommend!"],
...     ["Great quality and fast shipping"],
...     ["Average product, nothing special"],
...     ["Poor quality, very disappointed"],
... ], schema=["review"])
>>> summary_df = df.ai.agg(
...     task_description="Summarize these product reviews for a blog post targeting consumers",
...     input_column="review",
...     output_column="summary"
... )
>>> summary_df.columns
['SUMMARY']
>>> summary_df.count()
1

>>> # Aggregate with Column object
>>> from snowflake.snowpark.functions import col
>>> df = session.create_dataframe([
...     ["Customer service was excellent"],
...     ["Product arrived damaged"],
...     ["Great value for money"],
...     ["Would buy again"],
... ], schema=["feedback"])
>>> insights_df = df.ai.agg(
...     task_description="Extract the main positive and negative points from customer feedback",
...     input_column=col("feedback"),
...     output_column="insights"
... )
>>> insights_df.count()
1

Note

For optimal performance, follow these guidelines:

  • Use plain English text for the task description.

  • In the task description, describe the text being aggregated. For example, instead of a task description like “summarize”, use “Summarize the phone call transcripts”.

  • Describe the intended use case. For example, instead of “find the best review”, use “Find the most positive and well-written restaurant review to highlight on the restaurant website”.

  • Consider breaking the task description into multiple steps. For example, instead of “Summarize the news articles”, use “You will be provided with news articles from various publishers presenting events from different points of view. Please create a concise and elaborative summary of source texts without missing any crucial information.”
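
The guidelines above can be sketched as a small helper that assembles a task description from its parts. This is only an illustration: build_task_description and its parameters (action, data_description, use_case, steps) are hypothetical names and are not part of Snowpark. The helper produces a plain string, which would then be passed as task_description.

```python
from typing import Optional, Sequence


def build_task_description(
    action: str,
    data_description: str,
    use_case: Optional[str] = None,
    steps: Sequence[str] = (),
) -> str:
    """Assemble a plain-English task description (hypothetical helper, not a Snowpark API)."""
    # Name the kind of text being aggregated, e.g. "phone call transcripts".
    sentence = f"{action.strip()} the {data_description.strip()}"
    # State the intended use so the result is shaped for that purpose.
    if use_case:
        sentence += f" {use_case.strip()}"
    parts = [sentence.rstrip(".") + "."]
    # Append any further instructions as separate sentences, one per step.
    parts.extend(step.strip().rstrip(".") + "." for step in steps if step.strip())
    return " ".join(parts)


task = build_task_description(
    action="Summarize",
    data_description="product reviews",
    use_case="for a blog post targeting consumers",
    steps=["Mention both praise and complaints"],
)
print(task)

# With a Snowpark DataFrame `df` that has a "review" column (requires a live session):
# summary_df = df.ai.agg(task_description=task, input_column="review", output_column="summary")
```

Keeping the data description, use case, and extra steps as separate arguments makes it harder to fall back to a bare instruction such as “summarize”.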

This function or method has been experimental since version 1.39.0.