snowflake.snowpark.DataFrameAnalyticsFunctions.time_series_agg¶
- DataFrameAnalyticsFunctions.time_series_agg(time_col: str, aggs: ~typing.Dict[str, ~typing.List[str]], windows: ~typing.List[str], group_by: ~typing.List[str], sliding_interval: str, col_formatter: ~typing.Callable[[str, str, int], str] = <function DataFrameAnalyticsFunctions._default_col_formatter>) DataFrame [source]¶
Applies aggregations to the specified columns of the DataFrame over specified time windows, and grouping criteria.
- Parameters:
aggs – A dictionary where keys are column names and values are lists of the desired aggregation functions.
windows – Time windows for aggregations using strings such as ‘7D’ for 7 days, where the units are S: Seconds, M: Minutes, H: Hours, D: Days, W: Weeks, MM: Months, Y: Years. For future-oriented analysis, use positive numbers, and for past-oriented analysis, use negative numbers.
sliding_interval – Interval at which the window slides, specified in the same format as the windows.
group_by – A list of column names on which the DataFrame is partitioned for separate window calculations.
col_formatter – An optional function for formatting output column names, defaulting to the format ‘<input_col>_<agg>_<window>’. This function takes three arguments: ‘input_col’ (str) for the column name, ‘operation’ (str) for the applied operation, and ‘value’ (int) for the window size, and returns a formatted string for the column name.
- Returns:
A Snowflake DataFrame with additional columns corresponding to each specified time window aggregation.
- Raises:
ValueError – If an unsupported value is specified in arguments.
TypeError – If an unsupported type is specified in arguments.
SnowparkSQLException – If an unsupported aggregration is specified.
Example
>>> sample_data = [ ... ["2023-01-01", 101, 200], ... ["2023-01-02", 101, 100], ... ["2023-01-03", 101, 300], ... ["2023-01-04", 102, 250], ... ] >>> df = session.create_dataframe(sample_data).to_df( ... "ORDERDATE", "PRODUCTKEY", "SALESAMOUNT" ... ) >>> df = df.with_column("ORDERDATE", to_timestamp(df["ORDERDATE"])) >>> def custom_formatter(input_col, agg, window): ... return f"{agg}_{input_col}_{window}" >>> res = df.analytics.time_series_agg( ... time_col="ORDERDATE", ... group_by=["PRODUCTKEY"], ... aggs={"SALESAMOUNT": ["SUM", "MAX"]}, ... windows=["1D", "-1D"], ... sliding_interval="12H", ... col_formatter=custom_formatter, ... ) >>> res.show() -------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |"PRODUCTKEY" |"SLIDING_POINT" |"SALESAMOUNT" |"ORDERDATE" |"SUM_SALESAMOUNT_1D" |"MAX_SALESAMOUNT_1D" |"SUM_SALESAMOUNT_-1D" |"MAX_SALESAMOUNT_-1D" | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |101 |2023-01-01 00:00:00 |200 |2023-01-01 00:00:00 |300 |200 |200 |200 | |101 |2023-01-02 00:00:00 |100 |2023-01-02 00:00:00 |400 |300 |300 |200 | |101 |2023-01-03 00:00:00 |300 |2023-01-03 00:00:00 |300 |300 |400 |300 | |102 |2023-01-04 00:00:00 |250 |2023-01-04 00:00:00 |250 |250 |250 |250 | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------
This function or method is experimental since 1.12.0.