snowflake.snowpark.RelationalGroupedDataFrame.ai_agg
- RelationalGroupedDataFrame.ai_agg(expr: Union[snowflake.snowpark.column.Column, str], task_description: str, **kwargs) -> DataFrame
Aggregate a column of text data using a natural language task description.
For each group, this method reduces a column of text to a single result by applying the natural language aggregation described in the task description. For instance, it can summarize a large set of rows or extract specific insights per group.
- Parameters:
expr – The column (Column object or column name as string) containing the text data on which the aggregation operation is to be performed.
task_description – A plain English string that describes the aggregation task, such as “Summarize the product reviews for a blog post targeting consumers” or “Identify the most positive review and translate it into French and Polish, one word only”.
- Returns:
A DataFrame with one row per group containing the aggregated result.
Example:
>>> df = session.create_dataframe([
...     ["electronics", "Excellent product, highly recommend!"],
...     ["electronics", "Great quality and fast shipping"],
...     ["clothing", "Perfect fit and great material"],
...     ["clothing", "Poor quality, very disappointed"],
... ], schema=["category", "review"])
>>> summary_df = df.group_by("category").ai_agg(
...     expr="review",
...     task_description="Summarize these product reviews for a blog post targeting consumers"
... )
>>> summary_df.count()
2
Note
For optimal performance, follow these guidelines:
Use plain English text for the task description.
In the task description, describe the text being aggregated. For example, instead of a task description like “summarize”, use “Summarize the phone call transcripts”.
Describe the intended use case. For example, instead of “find the best review”, use “Find the most positive and well-written restaurant review to highlight on the restaurant website”.
Consider breaking the task description into multiple steps.
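The guidelines above can be applied mechanically when building the task description string. The following is a minimal sketch, not part of the Snowpark API: the helper name build_task_description and its parameters are hypothetical, and it only assembles a plain string that names the input text, states the intended use case, and lists the steps one by one.

```python
from typing import Sequence


def build_task_description(
    text_kind: str, use_case: str, steps: Sequence[str] = ()
) -> str:
    """Hypothetical helper: compose a task description from the parts
    the guidelines recommend (input text, use case, explicit steps)."""
    parts = [f"The input is a set of {text_kind}.", f"Goal: {use_case}."]
    # Number each step so the task reads as an ordered sequence.
    for i, step in enumerate(steps, start=1):
        parts.append(f"Step {i}: {step}.")
    return " ".join(parts)


task = build_task_description(
    text_kind="restaurant reviews",
    use_case="pick a review to highlight on the restaurant website",
    steps=[
        "Find the most positive and well-written review",
        "Return it verbatim",
    ],
)
print(task)

# With a live Snowpark session, the string could then be passed as:
# df.group_by("category").ai_agg(expr="review", task_description=task)
```

Keeping the description in a small helper like this makes it easier to reuse the same wording across groups or to vary only the use case between calls.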
This function or method is experimental since 1.39.0.