You are viewing documentation about an older version (1.16.0). View latest version

snowflake.snowpark.DataFrameAnalyticsFunctions.time_series_agg¶

DataFrameAnalyticsFunctions.time_series_agg(time_col: str, aggs: ~typing.Dict[str, ~typing.List[str]], windows: ~typing.List[str], group_by: ~typing.List[str], sliding_interval: str, col_formatter: ~typing.Callable[[str, str, int], str] = <function DataFrameAnalyticsFunctions._default_col_formatter>) → DataFrame[source]¶

Applies aggregations to the specified columns of the DataFrame over specified time windows, and grouping criteria.

Parameters:
  • aggs – A dictionary where keys are column names and values are lists of the desired aggregation functions.

  • windows – Time windows for aggregations using strings such as ‘7D’ for 7 days, where the units are S: Seconds, M: Minutes, H: Hours, D: Days, W: Weeks, MM: Months, Y: Years. For future-oriented analysis, use positive numbers, and for past-oriented analysis, use negative numbers.

  • sliding_interval – Interval at which the window slides, specified in the same format as the windows.

  • group_by – A list of column names on which the DataFrame is partitioned for separate window calculations.

  • col_formatter – An optional function for formatting output column names, defaulting to the format ‘<input_col>_<agg>_<window>’. This function takes three arguments: ‘input_col’ (str) for the column name, ‘operation’ (str) for the applied operation, and ‘value’ (int) for the window size, and returns a formatted string for the column name.

Returns:

A Snowflake DataFrame with additional columns corresponding to each specified time window aggregation.

Raises:
  • ValueError – If an unsupported value is specified in arguments.

  • TypeError – If an unsupported type is specified in arguments.

  • SnowparkSQLException – If an unsupported aggregration is specified.

Example

>>> sample_data = [
...     ["2023-01-01", 101, 200],
...     ["2023-01-02", 101, 100],
...     ["2023-01-03", 101, 300],
...     ["2023-01-04", 102, 250],
... ]
>>> df = session.create_dataframe(sample_data).to_df(
...     "ORDERDATE", "PRODUCTKEY", "SALESAMOUNT"
... )
>>> df = df.with_column("ORDERDATE", to_timestamp(df["ORDERDATE"]))
>>> def custom_formatter(input_col, agg, window):
...     return f"{agg}_{input_col}_{window}"
>>> res = df.analytics.time_series_agg(
...     time_col="ORDERDATE",
...     group_by=["PRODUCTKEY"],
...     aggs={"SALESAMOUNT": ["SUM", "MAX"]},
...     windows=["1D", "-1D"],
...     sliding_interval="12H",
...     col_formatter=custom_formatter,
... )
>>> res.show()
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|"PRODUCTKEY"  |"SLIDING_POINT"      |"SALESAMOUNT"  |"ORDERDATE"          |"SUM_SALESAMOUNT_1D"  |"MAX_SALESAMOUNT_1D"  |"SUM_SALESAMOUNT_-1D"  |"MAX_SALESAMOUNT_-1D"  |
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|101           |2023-01-01 00:00:00  |200            |2023-01-01 00:00:00  |300                   |200                   |200                    |200                    |
|101           |2023-01-02 00:00:00  |100            |2023-01-02 00:00:00  |400                   |300                   |300                    |200                    |
|101           |2023-01-03 00:00:00  |300            |2023-01-03 00:00:00  |300                   |300                   |400                    |300                    |
|102           |2023-01-04 00:00:00  |250            |2023-01-04 00:00:00  |250                   |250                   |250                    |250                    |
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Copy

This function or method is experimental since 1.12.0.