Categories:: Query syntax

RESAMPLE¶

Returns a data set that includes both input rows and generated rows for missing data points, given a user-defined time-based granularity.

Syntax¶

FROM <object_reference> [ [ AS ] <alias_name> ]
  RESAMPLE(
    USING <time_series_column>
    INCREMENT BY <time_series_constant>
    [ PARTITION BY <partition_column> [ , ... ] ]
    [ METADATA_COLUMNS
        { IS_GENERATED() | BUCKET_START() } [ [ AS ] <alias_name> ] [ , ... ] ]
    )

Copy

Required parameters¶

FROM object_reference

Specifies the name of a table or another object reference that contains the input data set, such as a subquery. For more information about object references, see FROM.

USING time_series_column

Specifies the column that contains time-based values in the time series. The column must be a date and time data type or a numeric data type. For example, UNIX timestamp values might be stored in NUMBER(38,0) columns, where 1743447600 is equivalent to 2025-3-31 12:00:00.

INCREMENT BY time_series_constant

Specifies an INTERVAL constant or a numeric constant, depending on the data type of the USING column. This constant represents the width of each time interval. The slices are aligned relative to midnight on January 1, 1970 (1970-01-01 00:00:00). The TIME_SLICE function uses the same alignment; for more information, see the TIME_SLICE usage notes.

When the USING parameter specifies a date or time column, the INCREMENT BY expression must be an INTERVAL constant.
When the USING parameter specifies a numeric column, the INCREMENT BY expression must also be numeric.

The starting point for a generated time series is based on the minimum time of the time_series_constant.

If this constant is a numeric constant, it must be positive (greater than 0).

Optional parameters¶

[ AS ] alias_name

Specifies an alternative name for the object reference. The alias can be used in any other subclause within the FROM clause. Alias names must follow the rules for Object identifiers.

PARTITION BY partition_column

Partitions the result set on one or more input columns and generates new rows within each partition.

METADATA_COLUMNS {function} [ [ AS ] {alias_name} ]

Adds one or more metadata columns to the resampled result set. To add the columns, call one or both of the following functions:

IS_GENERATED(): Adds an is_generated column to the result set that marks which rows are new (generated by the RESAMPLE operation) and which rows already existed.
BUCKET_START(): Adds a bucket_start column to the result set. This column returns the value that marks the beginning of the current bucket or interval that the RESAMPLE operation produces, based on the values in the column specified in the USING clause. You can use the BUCKET_START column to identify which interval a particular row belongs to after resampling.

If you specify both metadata columns, separate them with a comma.

Generated columns can have aliases. Alias names must follow the rules for Object identifiers.

Usage notes¶

An INTERVAL constant in the INCREMENT BY clause has the following requirements:
- The constant must be equal to or greater than one second. Smaller units (millisecond, microsecond, nanosecond) aren’t supported.
- When the USING column is a DATE data type, you can’t specify a unit in the interval that is more granular than day (hour, minute, second). For example, the constants 'INTERVAL 1 day, 2 hours' and 'INTERVAL 25 hours' aren’t allowed.
- To avoid ambiguity, certain date parts can’t be mixed. The supported date parts fall into three discrete groups:
  - year, quarter, month
  - week
  - day, hour, minute, second
  For example, the following intervals, which cross these group boundaries, aren’t allowed:
  - INTERVAL '1 week, 3 days'
  - INTERVAL '2 weeks, 12 hours'
  - INTERVAL '3 months, 1 week'

With respect to joins, the RESAMPLE construct works in a similar way to the SAMPLE / TABLESAMPLE construct. Resampling applies to only one table, not all preceding tables or the entire expression prior to the RESAMPLE clause. To resample the result of a join, use a subquery for the join, then resample the resulting table. See Sampling with joins.

The RESAMPLE clause is evaluated before WHERE clause conditions are applied. If you want to resample a filtered data set, filter it first (for example, by creating a new table that you can resample or by using a subquery that is computed first inside the main RESAMPLE query). The following query resamples the whole table, then discards everything but the rows for Atlanta and Boston.

SELECT *
  FROM heavy_weather
    RESAMPLE(
      USING start_time
      INCREMENT BY INTERVAL '1 day')
  WHERE city IN('Atlanta','Boston')
  ORDER BY start_time, city, county;

Copy

A potential rewrite with a subquery would be:

SELECT *
  FROM (SELECT * FROM heavy_weather WHERE city IN('Atlanta','Boston'))
    RESAMPLE(
      USING start_time
      INCREMENT BY INTERVAL '1 day')
  ORDER BY start_time, city, county;

Copy

Examples¶

The following examples show how to use the RESAMPLE construct in queries.

RESAMPLE example that uses a numeric column¶

The following example has a UNIX timestamp in the source table. This numeric column is specified in the RESAMPLE clause as the USING column. Create and load the following table:

CREATE OR REPLACE TABLE sensor_data_unixtime (device_id VARCHAR(10), unixtime NUMBER(38,0), avg_temp NUMBER(6,4), vibration NUMBER (5,4), motor_rpm INT);

INSERT INTO sensor_data_unixtime VALUES
  ('DEVICE3', 1696150802, 36.1103, 0.4226, 1560),
  ('DEVICE3', 1696150803, 35.2987, 0.4326, 1561),
  ('DEVICE3', 1696150804, 40.0001, 0.3221, 1562),
  ('DEVICE3', 1696150805, 38.0422, 0.3333, 1589),
  ('DEVICE3', 1696150807, 33.1524, 0.4865, 1499),
  ('DEVICE3', 1696150808, 32.0422, 0.4221, 1498),
  ('DEVICE3', 1696150809, 31.1519, 0.4751, 1600),
  ('DEVICE3', 1696150810, 29.1524, 0.4639, 1605),
  ('DEVICE3', 1696150812, 35.2987, 0.4336, 1585),
  ('DEVICE3', 1696150813, 40.0000, 0.4226, 1560)
;

Copy

Now run the following RESAMPLE query:

SELECT * FROM sensor_data_unixtime
  RESAMPLE(USING unixtime INCREMENT BY 1) ORDER BY unixtime;

Copy

+-----------+------------+----------+-----------+-----------+
| DEVICE_ID |   UNIXTIME | AVG_TEMP | VIBRATION | MOTOR_RPM |
|-----------+------------+----------+-----------+-----------|
| DEVICE3   | 1696150802 |  36.1103 |    0.4226 |      1560 |
| DEVICE3   | 1696150803 |  35.2987 |    0.4326 |      1561 |
| DEVICE3   | 1696150804 |  40.0001 |    0.3221 |      1562 |
| DEVICE3   | 1696150805 |  38.0422 |    0.3333 |      1589 |
| DEVICE3   | 1696150806 |     NULL |      NULL |      NULL |
| DEVICE3   | 1696150807 |  33.1524 |    0.4865 |      1499 |
| DEVICE3   | 1696150808 |  32.0422 |    0.4221 |      1498 |
| DEVICE3   | 1696150809 |  31.1519 |    0.4751 |      1600 |
| DEVICE3   | 1696150810 |  29.1524 |    0.4639 |      1605 |
| DEVICE3   | 1696150811 |     NULL |      NULL |      NULL |
| DEVICE3   | 1696150812 |  35.2987 |    0.4336 |      1585 |
| DEVICE3   | 1696150813 |  40.0000 |    0.4226 |      1560 |
+-----------+------------+----------+-----------+-----------+

The following query fails because the INCREMENT BY expression must be a positive numeric constant when the USING column is numeric:

SELECT * FROM sensor_data_unixtime
  RESAMPLE(USING unixtime INCREMENT BY INTERVAL '1 second') ORDER BY unixtime;

Copy

009954 (42601): SQL compilation error:
RESAMPLE INCREMENT BY has to be numeric type when USING parameter is numeric.

RESAMPLE example that returns generated rows only¶

The following example resamples the march_temps table (as created in Using the RESAMPLE clause) and includes metadata columns named generated_row and bucket_start in the result:

CREATE OR REPLACE TABLE march_temps_every_five_mins AS
  SELECT * FROM march_temps
    RESAMPLE(
      USING observed
      INCREMENT BY INTERVAL '5 minutes'
      PARTITION BY city, county
      METADATA_COLUMNS IS_GENERATED() AS generated_row, BUCKET_START()
      )
  ORDER BY observed;

Copy

The following query returns only the generated rows from the march_temps_every_five_mins table:

SELECT * FROM march_temps_every_five_mins
  WHERE generated_row = 'True';

Copy

+-------------------------+-------------+------------------+----------------+---------------+-------------------------+
| OBSERVED                | TEMPERATURE | CITY             | COUNTY         | GENERATED_ROW | BUCKET_START            |
|-------------------------+-------------+------------------+----------------+---------------+-------------------------|
| 2025-03-15 09:45:00.000 |        NULL | Big Bear City    | San Bernardino | True          | 2025-03-15 09:45:00.000 |
| 2025-03-15 09:50:00.000 |        NULL | Big Bear City    | San Bernardino | True          | 2025-03-15 09:50:00.000 |
| 2025-03-15 10:00:00.000 |        NULL | South Lake Tahoe | El Dorado      | True          | 2025-03-15 10:00:00.000 |
| 2025-03-15 10:00:00.000 |        NULL | Big Bear City    | San Bernardino | True          | 2025-03-15 10:00:00.000 |
| 2025-03-15 10:05:00.000 |        NULL | South Lake Tahoe | El Dorado      | True          | 2025-03-15 10:05:00.000 |
| 2025-03-15 10:05:00.000 |        NULL | Big Bear City    | San Bernardino | True          | 2025-03-15 10:05:00.000 |
| 2025-03-15 10:15:00.000 |        NULL | Big Bear City    | San Bernardino | True          | 2025-03-15 10:15:00.000 |
+-------------------------+-------------+------------------+----------------+---------------+-------------------------+

RESAMPLE example that uses BUCKET_START() to aggregate resampled rows¶

The following example uses the bucket_start metadata column to aggregate resampled rows. The query counts the number of observations per city that have the same bucket start time, given a resampled result set that is incremented by a 1-day interval. To run this example, create the march_temps table, as described in Using the RESAMPLE clause.

SELECT bucket_start, county, COUNT(*)
  FROM march_temps
    RESAMPLE(
      USING observed
      INCREMENT BY INTERVAL '1 day'
      METADATA_COLUMNS IS_GENERATED(), BUCKET_START()
      )
  WHERE IS_GENERATED = 'False'
  GROUP BY bucket_start, county;

Copy

+-------------------------+----------------+----------+
| BUCKET_START            | COUNTY         | COUNT(*) |
|-------------------------+----------------+----------|
| 2025-03-15 00:00:00.000 | El Dorado      |        4 |
| 2025-03-15 00:00:00.000 | San Bernardino |        4 |
+-------------------------+----------------+----------+

RESAMPLE example that uses BUCKET_START() to filter out non-uniform rows¶

You can use the bucket_start metadata column to filter out non-uniform data from a resampled result set. For example:

SELECT *
  FROM march_temps
    RESAMPLE(
      USING observed
      INCREMENT BY INTERVAL '5 minutes'
      METADATA_COLUMNS BUCKET_START() AS bucket_first_row
      )
  WHERE observed = bucket_first_row
  ORDER BY observed;

Copy

This query resamples the table, then removes two original rows that don’t conform to the 5-minute interval (those with values 09:49:00 and 10:18:00).