snowflake.snowpark.functions.window
- snowflake.snowpark.functions.window(time_column: Union[Column, str], window_duration: str, slide_duration: Optional[str] = None, start_time: Optional[str] = None) → Column
Converts a time column into a window object with start and end times. Window start times are inclusive, while end times are exclusive. For example, 9:30 falls in the window [9:30, 10:00) but not in [9:00, 9:30).
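To make the half-open convention concrete, here is a minimal pure-Python model of assigning a timestamp to a fixed-length window. It assumes windows are aligned to the Unix epoch, which is consistent with the examples on this page but is an assumption, not a description of how Snowpark computes windows. `window_bounds` and `in_window` are hypothetical helpers, not part of Snowpark.

```python
from datetime import datetime, timedelta
from typing import Tuple

# Assumed alignment origin; naive timestamps are treated as wall-clock times.
EPOCH = datetime(1970, 1, 1)


def window_bounds(ts: datetime, duration: timedelta) -> Tuple[datetime, datetime]:
    """Return the half-open [start, end) window that contains ts (hypothetical helper)."""
    # Floor division counts whole windows since EPOCH, so a timestamp that
    # sits exactly on a boundary belongs to the window starting there.
    start = EPOCH + ((ts - EPOCH) // duration) * duration
    return start, start + duration


def in_window(ts: datetime, start: datetime, end: datetime) -> bool:
    """Start is inclusive, end is exclusive."""
    return start <= ts < end


t = datetime(2024, 10, 31, 9, 30)
start, end = window_bounds(t, timedelta(minutes=30))
print(f"[{start:%H:%M}, {end:%H:%M})")  # [09:30, 10:00)
print(in_window(t, datetime(2024, 10, 31, 9, 0), t))  # False
```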
- Parameters:
time_column – The column to apply the window transformation to.
window_duration – An interval string that determines the length of each window.
slide_duration – An interval string giving the time between the starts of consecutive windows. This parameter is not supported yet; specifying it raises a NotImplementedError.
start_time – An interval string giving the offset applied to the start of each window. For example, a five-minute window with a start_time of ‘2 minutes’ spans [9:02, 9:07) instead of [9:00, 9:05).
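The effect of start_time can be modeled by shifting the origin of the window grid. This is a sketch under the same assumption of epoch-aligned windows; `offset_window_bounds` is a hypothetical helper, and the offset is passed as a `timedelta` rather than as an interval string.

```python
from datetime import datetime, timedelta
from typing import Tuple

# Assumed alignment origin; naive timestamps are treated as wall-clock times.
EPOCH = datetime(1970, 1, 1)


def offset_window_bounds(
    ts: datetime, duration: timedelta, offset: timedelta
) -> Tuple[datetime, datetime]:
    """Return the [start, end) window containing ts when every window
    boundary is shifted by offset (hypothetical helper)."""
    # Move the grid origin by the offset, then floor to a whole window.
    origin = EPOCH + offset
    start = origin + ((ts - origin) // duration) * duration
    return start, start + duration


ts = datetime(2024, 10, 31, 9, 3)
five = timedelta(minutes=5)
for offset in (timedelta(0), timedelta(minutes=2)):
    s, e = offset_window_bounds(ts, five, offset)
    print(f"offset {offset}: [{s:%H:%M}, {e:%H:%M})")
# offset 0:00:00: [09:00, 09:05)
# offset 0:02:00: [09:02, 09:07)
```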
Note
Interval strings have the form ‘quantity unit’, where quantity is an integer and unit is a supported time unit. This function supports the same time units as dateadd; see supported time units for more information.
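The ‘quantity unit’ grammar is simple enough to sketch. The parser below is hypothetical and deliberately narrow: it accepts only a few fixed-length units (the spellings are assumptions), whereas dateadd also supports calendar units such as months and years that cannot be expressed as a fixed `timedelta`.

```python
from datetime import timedelta

# Hypothetical subset of unit spellings, mapped to fixed-length durations.
# Calendar units (months, years) are left out because their length varies.
_UNITS = {
    "second": timedelta(seconds=1),
    "minute": timedelta(minutes=1),
    "hour": timedelta(hours=1),
    "day": timedelta(days=1),
    "week": timedelta(weeks=1),
}


def parse_interval(text: str) -> timedelta:
    """Parse an interval string of the form 'quantity unit'."""
    parts = text.split()
    if len(parts) != 2:
        raise ValueError(f"expected 'quantity unit', got {text!r}")
    quantity_text, unit = parts
    quantity = int(quantity_text)  # raises ValueError if not an integer
    unit = unit.lower()
    # Accept simple plurals such as 'minutes' as well as 'minute'.
    if unit.endswith("s") and unit[:-1] in _UNITS:
        unit = unit[:-1]
    if unit not in _UNITS:
        raise ValueError(f"unsupported time unit: {unit!r}")
    return quantity * _UNITS[unit]


print(parse_interval("5 minutes"))  # 0:05:00
print(parse_interval("2 hours"))    # 2:00:00
```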
Example:
>>> import datetime
>>> from snowflake.snowpark.functions import window
>>> df = session.createDataFrame(
...     [(datetime.datetime.strptime("2024-10-31 09:05:00.000", "%Y-%m-%d %H:%M:%S.%f"),)],
...     schema=["time"]
... )
>>> df.select(window(df.time, "5 minutes")).show()
----------------------------------------
|"WINDOW"                              |
----------------------------------------
|{                                     |
|  "end": "2024-10-31 09:10:00.000",   |
|  "start": "2024-10-31 09:05:00.000"  |
|}                                     |
----------------------------------------
>>> df.select(window(df.time, "5 minutes", start_time="2 minutes")).show()
----------------------------------------
|"WINDOW"                              |
----------------------------------------
|{                                     |
|  "end": "2024-10-31 09:07:00.000",   |
|  "start": "2024-10-31 09:02:00.000"  |
|}                                     |
----------------------------------------
Example:
>>> import datetime
>>> from snowflake.snowpark.functions import sum, window
>>> df = session.createDataFrame([
...         (datetime.datetime(2024, 10, 31, 1, 0, 0), 1),
...         (datetime.datetime(2024, 10, 31, 2, 0, 0), 1),
...         (datetime.datetime(2024, 10, 31, 3, 0, 0), 1),
...         (datetime.datetime(2024, 10, 31, 4, 0, 0), 1),
...         (datetime.datetime(2024, 10, 31, 5, 0, 0), 1),
...     ], schema=["time", "value"]
... )
>>> df.group_by(window(df.time, "2 hours")).agg(sum(df.value)).show()
-------------------------------------------------------
|"WINDOW"                              |"SUM(VALUE)"  |
-------------------------------------------------------
|{                                     |1             |
|  "end": "2024-10-31 02:00:00.000",   |              |
|  "start": "2024-10-31 00:00:00.000"  |              |
|}                                     |              |
|{                                     |2             |
|  "end": "2024-10-31 04:00:00.000",   |              |
|  "start": "2024-10-31 02:00:00.000"  |              |
|}                                     |              |
|{                                     |2             |
|  "end": "2024-10-31 06:00:00.000",   |              |
|  "start": "2024-10-31 04:00:00.000"  |              |
|}                                     |              |
-------------------------------------------------------
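The 02:00 row is counted in [02:00, 04:00) rather than [00:00, 02:00) because window end times are exclusive. The pure-Python sketch below reproduces this grouping locally without a Snowflake session. It assumes windows are aligned to the Unix epoch, which is consistent with the output of the example but is not a statement about how Snowpark implements window.

```python
from collections import Counter
from datetime import datetime, timedelta

# Same rows as the group_by example: (time, value) for 01:00 through 05:00.
rows = [(datetime(2024, 10, 31, h, 0, 0), 1) for h in range(1, 6)]

EPOCH = datetime(1970, 1, 1)  # assumed window alignment
duration = timedelta(hours=2)

sums = Counter()
for ts, value in rows:
    # Floor each timestamp to the start of its half-open 2-hour window.
    start = EPOCH + ((ts - EPOCH) // duration) * duration
    sums[(start, start + duration)] += value

for (start, end), total in sorted(sums.items()):
    print(f"[{start:%H:%M}, {end:%H:%M}) -> {total}")
# [00:00, 02:00) -> 1
# [02:00, 04:00) -> 2
# [04:00, 06:00) -> 2
```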