snowflake.snowpark.RelationalGroupedDataFrame.ai_agg

RelationalGroupedDataFrame.ai_agg(expr: Union[snowflake.snowpark.column.Column, str], task_description: str, **kwargs) → DataFrame

Aggregate a column of text data using a natural language task description.

For each group, this method reduces a column of text to a single result by performing the natural language aggregation described in the task description. For instance, it can summarize large datasets or extract specific insights per group.

Parameters:

expr – The column (Column object or column name as string) containing the text data on which the aggregation operation is to be performed.
task_description – A plain English string that describes the aggregation task, such as “Summarize the product reviews for a blog post targeting consumers” or “Identify the most positive review and translate it into French and Polish, one word only”.

Returns:

A DataFrame with one row per group containing the aggregated result.

Example:

>>> df = session.create_dataframe([
...     ["electronics", "Excellent product, highly recommend!"],
...     ["electronics", "Great quality and fast shipping"],
...     ["clothing", "Perfect fit and great material"],
...     ["clothing", "Poor quality, very disappointed"],
... ], schema=["category", "review"])
>>> summary_df = df.group_by("category").ai_agg(
...     expr="review",
...     task_description="Summarize these product reviews for a blog post targeting consumers"
... )
>>> summary_df.count()
2
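
To read the per-group results back on the client, one option is to collect the rows and index them by position. The helper below is a minimal sketch and not part of Snowpark: `results_by_group` is a hypothetical name, and it assumes the grouping column comes first and the aggregated text last in each row, which is worth confirming against `summary_df.columns` before relying on it.

```python
from typing import Any, Dict, Iterable, Sequence


def results_by_group(rows: Iterable[Sequence[Any]]) -> Dict[Any, Any]:
    """Map each group key to its aggregated text.

    Assumption (not confirmed by the documentation above): each row is
    tuple-like, with the grouping column first and the aggregated
    result last.
    """
    return {row[0]: row[-1] for row in rows}


# Hypothetical usage with the example above (requires an active session):
# summaries = results_by_group(summary_df.collect())
# summaries["electronics"]
```

Positional indexing is used here because the name of the result column is not stated in this section.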

Note

For optimal performance, follow these guidelines:

Use plain English text for the task description.

In the task description, describe the text being aggregated. For example, instead of a task description like “summarize”, use “Summarize the phone call transcripts”.

Describe the intended use case. For example, instead of “find the best review”, use “Find the most positive and well-written restaurant review to highlight on the restaurant website”.

Consider breaking the task description into multiple steps.
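
The guidelines above can be sketched as a small helper that assembles a task description from its parts. This is an illustrative sketch only: `build_task_description` and its parameters are hypothetical names, not part of Snowpark, and the "Step N:" wording is just one possible way to split a task into steps.

```python
from typing import Optional, Sequence


def build_task_description(
    action: str,
    text_kind: str,
    use_case: Optional[str] = None,
    steps: Sequence[str] = (),
) -> str:
    """Compose a plain English task description.

    action    -- what to do, e.g. "Summarize"
    text_kind -- what the text is, e.g. "phone call transcripts"
    use_case  -- optional intended use, e.g. "for a blog post targeting consumers"
    steps     -- optional sub-tasks, emitted as numbered steps
    """
    sentence = f"{action.strip()} the {text_kind.strip()}"
    if use_case:
        sentence += f" {use_case.strip()}"
    sentence = sentence.rstrip(".") + "."
    # Normalize each step to end with exactly one period.
    numbered = [
        f"Step {i}: {s.strip().rstrip('.')}." for i, s in enumerate(steps, 1)
    ]
    return " ".join([sentence, *numbered])


task = build_task_description(
    "Summarize",
    "product reviews",
    use_case="for a blog post targeting consumers",
)
# The resulting string could then be passed as task_description to ai_agg.
```

Keeping the pieces separate makes it harder to end up with a bare verb such as “summarize” as the whole description.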

This method has been experimental since version 1.39.0.