You are viewing documentation about an older version (1.18.0). View latest version

snowflake.snowpark.DataFrameAnalyticsFunctions.cumulative_agg¶

DataFrameAnalyticsFunctions.cumulative_agg(aggs: ~typing.Dict[str, ~typing.List[str]], group_by: ~typing.List[str], order_by: ~typing.List[str], is_forward: bool, col_formatter: ~typing.Callable[[str, str], str] = <function DataFrameAnalyticsFunctions._default_col_formatter>) → DataFrame[source]¶

Applies cummulative aggregations to the specified columns of the DataFrame using defined window direction, and grouping and ordering criteria.

Parameters:
  • aggs – A dictionary where keys are column names and values are lists of the desired aggregation functions.

  • order_by – A list of column names that specify the order in which rows are processed.

  • group_by – A list of column names on which the DataFrame is partitioned for separate window calculations.

  • is_forward – A boolean indicating the direction of accumulation. True for ‘forward’ and False for ‘backward’.

  • col_formatter – An optional function for formatting output column names, defaulting to the format ‘<input_col>_<agg>’. This function takes two arguments: ‘input_col’ (str) for the column name, ‘operation’ (str) for the applied operation, and returns a formatted string for the column name.

Returns:

A Snowflake DataFrame with additional columns corresponding to each specified cumulative aggregation.

Raises:
  • ValueError – If an unsupported value is specified in arguments.

  • TypeError – If an unsupported type is specified in arguments.

  • SnowparkSQLException – If an unsupported aggregration is specified.

Example

>>> sample_data = [
...     ["2023-01-01", 101, 200],
...     ["2023-01-02", 101, 100],
...     ["2023-01-03", 101, 300],
...     ["2023-01-04", 102, 250],
... ]
>>> df = session.create_dataframe(sample_data).to_df(
...     "ORDERDATE", "PRODUCTKEY", "SALESAMOUNT"
... )
>>> res = df.analytics.cumulative_agg(
...     aggs={"SALESAMOUNT": ["SUM", "MIN", "MAX"]},
...     group_by=["PRODUCTKEY"],
...     order_by=["ORDERDATE"],
...     is_forward=True
... )
>>> res.show()
----------------------------------------------------------------------------------------------------------
|"ORDERDATE"  |"PRODUCTKEY"  |"SALESAMOUNT"  |"SALESAMOUNT_SUM"  |"SALESAMOUNT_MIN"  |"SALESAMOUNT_MAX"  |
----------------------------------------------------------------------------------------------------------
|2023-01-03   |101           |300            |300                |300                |300                |
|2023-01-02   |101           |100            |400                |100                |300                |
|2023-01-01   |101           |200            |600                |100                |300                |
|2023-01-04   |102           |250            |250                |250                |250                |
----------------------------------------------------------------------------------------------------------
Copy