snowflake.snowpark.RelationalGroupedDataFrame.ai_agg

RelationalGroupedDataFrame.ai_agg(expr: Union[snowflake.snowpark.column.Column, str], task_description: str, **kwargs) → DataFrame

Aggregate a column of text data using a natural language task description.

This method reduces the text column within each group to a single result by applying the natural language aggregation described in the task description. For instance, it can summarize large datasets or extract specific insights per group.

Parameters:
  • expr – The column (Column object or column name as string) containing the text data on which the aggregation operation is to be performed.

  • task_description – A plain English string that describes the aggregation task, such as “Summarize the product reviews for a blog post targeting consumers” or “Identify the most positive review and translate it into French and Polish, one word only”.

Returns:

A DataFrame with one row per group containing the aggregated result.

Example:

>>> df = session.create_dataframe([
...     ["electronics", "Excellent product, highly recommend!"],
...     ["electronics", "Great quality and fast shipping"],
...     ["clothing", "Perfect fit and great material"],
...     ["clothing", "Poor quality, very disappointed"],
... ], schema=["category", "review"])
>>> summary_df = df.group_by("category").ai_agg(
...     expr="review",
...     task_description="Summarize these product reviews for a blog post targeting consumers"
... )
>>> summary_df.count()
2

Note

For optimal performance, follow these guidelines:

  • Use plain English text for the task description.

  • In the task description, describe the text being aggregated. For example, instead of a task description like “summarize”, use “Summarize the phone call transcripts”.

  • Describe the intended use case. For example, instead of “find the best review”, use “Find the most positive and well-written restaurant review to highlight on the restaurant website”.

  • Consider breaking the task description into multiple steps.
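
The guidelines above can be sketched as a small string-building helper. This is only an illustration: build_task_description and its parameters are hypothetical and not part of Snowpark, and the "Step N:" wording is just one possible way to break a task into steps.

```python
def build_task_description(action, text_kind, use_case="", steps=()):
    """Assemble a plain-English task description.

    Hypothetical helper for illustration only; not part of Snowpark.
    """
    # Name the kind of text being aggregated, not just the action.
    description = f"{action} the {text_kind}"
    # State the intended use case when one is given.
    if use_case:
        description += f" {use_case}"
    description += "."
    # Optionally spell the task out as numbered steps.
    for number, step in enumerate(steps, start=1):
        description += f" Step {number}: {step}."
    return description


task = build_task_description(
    action="Summarize",
    text_kind="product reviews",
    use_case="for a blog post targeting consumers",
    steps=("List the most common praise", "List the most common complaints"),
)
print(task)

# The resulting string would then be passed as task_description, e.g.:
# df.group_by("category").ai_agg(expr="review", task_description=task)
```

Keeping the description in one place like this makes it easier to reuse the same wording across groups or tables, but a hand-written string works just as well.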

This function or method has been experimental since 1.39.0.