snowflake.snowpark.functions.window

snowflake.snowpark.functions.window(time_column: Union[Column, str], window_duration: str, slide_duration: Optional[str] = None, start_time: Optional[str] = None) → Column

Converts a time column into a window object with start and end times. Window start times are inclusive, while end times are exclusive. For example, 9:30 is in the window [9:30, 10:00) but not in [9:00, 9:30).

Parameters:
  • time_column – The column to apply the window transformation to.

  • window_duration – An interval string that determines the length of each window.

  • slide_duration – An interval string representing the amount of time between the starts of consecutive windows. This parameter is not supported yet; specifying it raises a NotImplementedError exception.

  • start_time – An interval string representing the amount of time by which the start of each window is offset. For example, a five-minute window with a start_time of ‘2 minutes’ covers [9:02, 9:07) instead of [9:00, 9:05).
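The effect of window_duration and start_time on the window boundaries can be illustrated in plain Python, without a Snowflake session. This is a minimal sketch, not Snowpark's implementation: the helper name window_bounds is hypothetical, and the sketch assumes that windows are aligned to the Unix epoch, which is consistent with the examples on this page but is not stated by the documentation.

```python
from datetime import datetime, timedelta

def window_bounds(ts, duration, offset=timedelta(0)):
    """Return the half-open [start, end) window that contains ts.

    Hypothetical helper for illustration only. Assumes windows are
    aligned to the Unix epoch and shifted by `offset`.
    """
    epoch = datetime(1970, 1, 1)
    # Time elapsed since the shifted origin.
    elapsed = ts - epoch - offset
    # Floor to a whole number of windows, then shift back by the offset.
    start = epoch + offset + (elapsed // duration) * duration
    return start, start + duration

ts = datetime(2024, 10, 31, 9, 5)
print(window_bounds(ts, timedelta(minutes=5)))
# start 09:05, end 09:10
print(window_bounds(ts, timedelta(minutes=5), offset=timedelta(minutes=2)))
# start 09:02, end 09:07
```

Because the start is inclusive and the end is exclusive, a timestamp that falls exactly on a boundary belongs to the window that starts there.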

Note

Interval strings are of the form ‘quantity unit’, where quantity is an integer and unit is a supported time unit. This function supports the same time units as dateadd. See supported time units for more information.
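To make the ‘quantity unit’ form concrete, the following sketch converts an interval string into a datetime.timedelta. It is an illustration under stated assumptions, not the parser Snowpark uses: the name parse_interval and the unit table are hypothetical, and only units that map to a fixed-length timedelta are handled, whereas dateadd also accepts units such as months and years.

```python
from datetime import timedelta

# Hypothetical subset of units; dateadd supports more than these.
_UNIT_TO_KWARG = {
    "second": "seconds",
    "minute": "minutes",
    "hour": "hours",
    "day": "days",
    "week": "weeks",
}

def parse_interval(text):
    """Convert a 'quantity unit' string such as '5 minutes' to a timedelta."""
    quantity, unit = text.split()  # raises ValueError unless exactly two parts
    unit = unit.lower()
    if unit.endswith("s"):
        unit = unit[:-1]  # accept both 'minute' and 'minutes'
    if unit not in _UNIT_TO_KWARG:
        raise ValueError(f"unsupported unit in interval string: {text!r}")
    return timedelta(**{_UNIT_TO_KWARG[unit]: int(quantity)})

print(parse_interval("5 minutes"))  # 0:05:00
print(parse_interval("2 hours"))    # 2:00:00
```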

Example:

>>> import datetime
>>> from snowflake.snowpark.functions import window
>>> df = session.createDataFrame(
...      [(datetime.datetime.strptime("2024-10-31 09:05:00.000", "%Y-%m-%d %H:%M:%S.%f"),)],
...      schema=["time"]
... )
>>> df.select(window(df.time, "5 minutes")).show()
----------------------------------------
|"WINDOW"                              |
----------------------------------------
|{                                     |
|  "end": "2024-10-31 09:10:00.000",   |
|  "start": "2024-10-31 09:05:00.000"  |
|}                                     |
----------------------------------------

>>> df.select(window(df.time, "5 minutes", start_time="2 minutes")).show()
----------------------------------------
|"WINDOW"                              |
----------------------------------------
|{                                     |
|  "end": "2024-10-31 09:07:00.000",   |
|  "start": "2024-10-31 09:02:00.000"  |
|}                                     |
----------------------------------------

Example:

>>> import datetime
>>> from snowflake.snowpark.functions import sum, window
>>> df = session.createDataFrame([
...         (datetime.datetime(2024, 10, 31, 1, 0, 0), 1),
...         (datetime.datetime(2024, 10, 31, 2, 0, 0), 1),
...         (datetime.datetime(2024, 10, 31, 3, 0, 0), 1),
...         (datetime.datetime(2024, 10, 31, 4, 0, 0), 1),
...         (datetime.datetime(2024, 10, 31, 5, 0, 0), 1),
...     ], schema=["time", "value"]
... )
>>> df.group_by(window(df.time, "2 hours")).agg(sum(df.value)).show()
-------------------------------------------------------
|"WINDOW"                              |"SUM(VALUE)"  |
-------------------------------------------------------
|{                                     |1             |
|  "end": "2024-10-31 02:00:00.000",   |              |
|  "start": "2024-10-31 00:00:00.000"  |              |
|}                                     |              |
|{                                     |2             |
|  "end": "2024-10-31 04:00:00.000",   |              |
|  "start": "2024-10-31 02:00:00.000"  |              |
|}                                     |              |
|{                                     |2             |
|  "end": "2024-10-31 06:00:00.000",   |              |
|  "start": "2024-10-31 04:00:00.000"  |              |
|}                                     |              |
-------------------------------------------------------