- Categories:
RESAMPLE¶
Returns a data set that includes both input rows and generated rows for missing data points, given a user-defined time-based granularity.
Syntax¶
FROM <object_reference> [ [ AS ] <alias_name> ]
RESAMPLE(
USING <time_series_column>
INCREMENT BY <time_series_constant>
[ PARTITION BY <partition_column> [ , ... ] ]
[ METADATA_COLUMNS
{ IS_GENERATED() | BUCKET_START() } [ [ AS ] <alias_name> ] [ , ... ] ]
)
Required parameters¶
FROM object_reference
Specifies the name of a table or another object reference that contains the input data set, such as a subquery. For more information about object references, see FROM.
USING time_series_column
Specifies the column that contains time-based values in the time series. The column must be a date and time data type or a numeric data type. For example, UNIX timestamp values might be stored in NUMBER(38,0) columns, where
1743447600
is equivalent to2025-3-31 12:00:00
.INCREMENT BY time_series_constant
Specifies an INTERVAL constant or a numeric constant, depending on the data type of the USING column. This constant represents the width of each time interval. The slices are aligned relative to midnight on January 1, 1970 (
1970-01-01 00:00:00
). The TIME_SLICE function uses the same alignment; for more information, see the TIME_SLICE usage notes.When the USING parameter specifies a date or time column, the INCREMENT BY expression must be an INTERVAL constant.
When the USING parameter specifies a numeric column, the INCREMENT BY expression must also be numeric.
The starting point for a generated time series is based on the minimum time of the
time_series_constant
.If this constant is a numeric constant, it must be positive (greater than 0).
Optional parameters¶
[ AS ] alias_name
Specifies an alternative name for the object reference. The alias can be used in any other subclause within the FROM clause. Alias names must follow the rules for Object identifiers.
PARTITION BY partition_column
Partitions the result set on one or more input columns and generates new rows within each partition.
METADATA_COLUMNS {function} [ [ AS ] {alias_name} ]
Adds one or more metadata columns to the resampled result set. To add the columns, call one or both of the following functions:
IS_GENERATED()
Adds an
is_generated
column to the result set that marks which rows are new (generated by the RESAMPLE operation) and which rows already existed.BUCKET_START()
Adds a
bucket_start
column to the result set. This column returns the value that marks the beginning of the current bucket or interval that the RESAMPLE operation produces, based on the values in the column specified in the USING clause. You can use the BUCKET_START column to identify which interval a particular row belongs to after resampling.
If you specify both metadata columns, separate them with a comma.
Generated columns can have aliases. Alias names must follow the rules for Object identifiers.
Usage notes¶
An INTERVAL constant in the INCREMENT BY clause has the following requirements:
The constant must be equal to or greater than one
second
. Smaller units (millisecond
,microsecond
,nanosecond
) aren’t supported.When the USING column is a DATE data type, you can’t specify a unit in the interval that is more granular than
day
(hour
,minute
,second
). For example, the constants'INTERVAL 1 day, 2 hours'
and'INTERVAL 25 hours'
aren’t allowed.To avoid ambiguity, certain date parts can’t be mixed. The supported date parts fall into three discrete groups:
year
,quarter
,month
week
day
,hour
,minute
,second
For example, the following intervals, which cross these group boundaries, aren’t allowed:
INTERVAL '1 week, 3 days'
INTERVAL '2 weeks, 12 hours'
INTERVAL '3 months, 1 week'
With respect to joins, the RESAMPLE construct works in a similar way to the SAMPLE / TABLESAMPLE construct. Resampling applies to only one table, not all preceding tables or the entire expression prior to the RESAMPLE clause. To resample the result of a join, use a subquery for the join, then resample the resulting table. See Sampling with joins.
The RESAMPLE clause is evaluated before WHERE clause conditions are applied. If you want to resample a filtered data set, filter it first (for example, by creating a new table that you can resample or by using a subquery that is computed first inside the main RESAMPLE query). The following query resamples the whole table, then discards everything but the rows for
Atlanta
andBoston
.SELECT * FROM heavy_weather RESAMPLE( USING start_time INCREMENT BY INTERVAL '1 day') WHERE city IN('Atlanta','Boston') ORDER BY start_time, city, county;
A potential rewrite with a subquery would be:
SELECT * FROM (SELECT * FROM heavy_weather WHERE city IN('Atlanta','Boston')) RESAMPLE( USING start_time INCREMENT BY INTERVAL '1 day') ORDER BY start_time, city, county;
Examples¶
The following examples show how to use the RESAMPLE construct in queries.
RESAMPLE example that uses a numeric column¶
The following example has a UNIX timestamp in the source table. This numeric column is specified in the RESAMPLE clause as the USING column. Create and load the following table:
CREATE OR REPLACE TABLE sensor_data_unixtime (device_id VARCHAR(10), unixtime NUMBER(38,0), avg_temp NUMBER(6,4), vibration NUMBER (5,4), motor_rpm INT);
INSERT INTO sensor_data_unixtime VALUES
('DEVICE3', 1696150802, 36.1103, 0.4226, 1560),
('DEVICE3', 1696150803, 35.2987, 0.4326, 1561),
('DEVICE3', 1696150804, 40.0001, 0.3221, 1562),
('DEVICE3', 1696150805, 38.0422, 0.3333, 1589),
('DEVICE3', 1696150807, 33.1524, 0.4865, 1499),
('DEVICE3', 1696150808, 32.0422, 0.4221, 1498),
('DEVICE3', 1696150809, 31.1519, 0.4751, 1600),
('DEVICE3', 1696150810, 29.1524, 0.4639, 1605),
('DEVICE3', 1696150812, 35.2987, 0.4336, 1585),
('DEVICE3', 1696150813, 40.0000, 0.4226, 1560)
;
Now run the following RESAMPLE query:
SELECT * FROM sensor_data_unixtime
RESAMPLE(USING unixtime INCREMENT BY 1) ORDER BY unixtime;
+-----------+------------+----------+-----------+-----------+
| DEVICE_ID | UNIXTIME | AVG_TEMP | VIBRATION | MOTOR_RPM |
|-----------+------------+----------+-----------+-----------|
| DEVICE3 | 1696150802 | 36.1103 | 0.4226 | 1560 |
| DEVICE3 | 1696150803 | 35.2987 | 0.4326 | 1561 |
| DEVICE3 | 1696150804 | 40.0001 | 0.3221 | 1562 |
| DEVICE3 | 1696150805 | 38.0422 | 0.3333 | 1589 |
| DEVICE3 | 1696150806 | NULL | NULL | NULL |
| DEVICE3 | 1696150807 | 33.1524 | 0.4865 | 1499 |
| DEVICE3 | 1696150808 | 32.0422 | 0.4221 | 1498 |
| DEVICE3 | 1696150809 | 31.1519 | 0.4751 | 1600 |
| DEVICE3 | 1696150810 | 29.1524 | 0.4639 | 1605 |
| DEVICE3 | 1696150811 | NULL | NULL | NULL |
| DEVICE3 | 1696150812 | 35.2987 | 0.4336 | 1585 |
| DEVICE3 | 1696150813 | 40.0000 | 0.4226 | 1560 |
+-----------+------------+----------+-----------+-----------+
The following query fails because the INCREMENT BY expression must be a positive numeric constant when the USING column is numeric:
SELECT * FROM sensor_data_unixtime
RESAMPLE(USING unixtime INCREMENT BY INTERVAL '1 second') ORDER BY unixtime;
009954 (42601): SQL compilation error:
RESAMPLE INCREMENT BY has to be numeric type when USING parameter is numeric.
RESAMPLE example that returns generated rows only¶
The following example resamples the march_temps
table (as created in Using the RESAMPLE clause) and includes
metadata columns named generated_row
and bucket_start
in the result:
CREATE OR REPLACE TABLE march_temps_every_five_mins AS
SELECT * FROM march_temps
RESAMPLE(
USING observed
INCREMENT BY INTERVAL '5 minutes'
PARTITION BY city, county
METADATA_COLUMNS IS_GENERATED() AS generated_row, BUCKET_START()
)
ORDER BY observed;
The following query returns only the generated rows from the march_temps_every_five_mins
table:
SELECT * FROM march_temps_every_five_mins
WHERE generated_row = 'True';
+-------------------------+-------------+------------------+----------------+---------------+-------------------------+
| OBSERVED | TEMPERATURE | CITY | COUNTY | GENERATED_ROW | BUCKET_START |
|-------------------------+-------------+------------------+----------------+---------------+-------------------------|
| 2025-03-15 09:45:00.000 | NULL | Big Bear City | San Bernardino | True | 2025-03-15 09:45:00.000 |
| 2025-03-15 09:50:00.000 | NULL | Big Bear City | San Bernardino | True | 2025-03-15 09:50:00.000 |
| 2025-03-15 10:00:00.000 | NULL | South Lake Tahoe | El Dorado | True | 2025-03-15 10:00:00.000 |
| 2025-03-15 10:00:00.000 | NULL | Big Bear City | San Bernardino | True | 2025-03-15 10:00:00.000 |
| 2025-03-15 10:05:00.000 | NULL | South Lake Tahoe | El Dorado | True | 2025-03-15 10:05:00.000 |
| 2025-03-15 10:05:00.000 | NULL | Big Bear City | San Bernardino | True | 2025-03-15 10:05:00.000 |
| 2025-03-15 10:15:00.000 | NULL | Big Bear City | San Bernardino | True | 2025-03-15 10:15:00.000 |
+-------------------------+-------------+------------------+----------------+---------------+-------------------------+
RESAMPLE example that uses BUCKET_START() to aggregate resampled rows¶
The following example uses the bucket_start
metadata column to aggregate resampled rows. The query counts the number of observations
per city that have the same bucket start time, given a resampled result set that is incremented by a 1-day interval. To run this
example, create the march_temps
table, as described in Using the RESAMPLE clause.
SELECT bucket_start, county, COUNT(*)
FROM march_temps
RESAMPLE(
USING observed
INCREMENT BY INTERVAL '1 day'
METADATA_COLUMNS IS_GENERATED(), BUCKET_START()
)
WHERE IS_GENERATED = 'False'
GROUP BY bucket_start, county;
+-------------------------+----------------+----------+
| BUCKET_START | COUNTY | COUNT(*) |
|-------------------------+----------------+----------|
| 2025-03-15 00:00:00.000 | El Dorado | 4 |
| 2025-03-15 00:00:00.000 | San Bernardino | 4 |
+-------------------------+----------------+----------+
RESAMPLE example that uses BUCKET_START() to filter out non-uniform rows¶
You can use the bucket_start
metadata column to filter out non-uniform data from a resampled result set. For example:
SELECT *
FROM march_temps
RESAMPLE(
USING observed
INCREMENT BY INTERVAL '5 minutes'
METADATA_COLUMNS BUCKET_START() AS bucket_first_row
)
WHERE observed = bucket_first_row
ORDER BY observed;
This query resamples the table, then removes two original rows that don’t conform to the 5-minute interval (those with
values 09:49:00
and 10:18:00
).