Step 3. Flatten Data
FLATTEN is a table function that produces a lateral view of a VARIANT, OBJECT, or ARRAY column. In this step, you use this function to explore different levels of flattening.
Flatten array objects in a variant column
You can flatten the event objects in the events array into separate rows using the FLATTEN function. The function output includes a VALUE column that stores these individual events.
You can then use the LATERAL modifier to join the FLATTEN function output with any information outside of the object; in this example, the device_type and version.
Query the data for each event:
SELECT src:device_type::string,
       src:version::string,
       VALUE
FROM
    raw_source,
    LATERAL FLATTEN( INPUT => src:events );
The query returns the following result:
+-------------------------+---------------------+-------------------------------------------------------------------------------+
| SRC:DEVICE_TYPE::STRING | SRC:VERSION::STRING | VALUE                                                                         |
|-------------------------+---------------------+-------------------------------------------------------------------------------|
| server                  | 2.6                 | {                                                                             |
|                         |                     |   "f": 83,                                                                    |
|                         |                     |   "rv": "15219.64,783.63,48674.48,84679.52,27499.78,2178.83,0.42,74900.19",   |
|                         |                     |   "t": 1437560931139,                                                         |
|                         |                     |   "v": {                                                                      |
|                         |                     |     "ACHZ": 42869,                                                            |
|                         |                     |     "ACV": 709489,                                                            |
|                         |                     |     "DCA": 232,                                                               |
|                         |                     |     "DCV": 62287,                                                             |
|                         |                     |     "ENJR": 2599,                                                             |
|                         |                     |     "ERRS": 205,                                                              |
|                         |                     |     "MXEC": 487,                                                              |
|                         |                     |     "TMPI": 9                                                                 |
|                         |                     |   },                                                                          |
|                         |                     |   "vd": 54,                                                                   |
|                         |                     |   "z": 1437644222811                                                          |
|                         |                     | }                                                                             |
| server                  | 2.6                 | {                                                                             |
|                         |                     |   "f": 1000083,                                                               |
|                         |                     |   "rv": "8070.52,54470.71,85331.27,9.10,70825.85,65191.82,46564.53,29422.22", |
|                         |                     |   "t": 1437036965027,                                                         |
|                         |                     |   "v": {                                                                      |
|                         |                     |     "ACHZ": 6953,                                                             |
|                         |                     |     "ACV": 346795,                                                            |
|                         |                     |     "DCA": 250,                                                               |
|                         |                     |     "DCV": 46066,                                                             |
|                         |                     |     "ENJR": 9033,                                                             |
|                         |                     |     "ERRS": 615,                                                              |
|                         |                     |     "MXEC": 0,                                                                |
|                         |                     |     "TMPI": 112                                                               |
|                         |                     |   },                                                                          |
|                         |                     |   "vd": 626,                                                                  |
|                         |                     |   "z": 1437660796958                                                          |
|                         |                     | }                                                                             |
+-------------------------+---------------------+-------------------------------------------------------------------------------+
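VALUE is not the only column that FLATTEN returns; its output also includes SEQ, KEY, PATH, INDEX, and THIS. As a minimal sketch, the following query shows each event's position in the events array next to two of its keys. The alias e and the output column names event_index, f, and t are chosen here for illustration and are not part of the tutorial.

```sql
-- Sketch only: the alias "e" and the output column names are illustrative.
SELECT
    src:device_type::string AS device_type,
    e.index                 AS event_index,  -- zero-based position in the events array
    e.value:f::number       AS f,
    e.value:t::number       AS t
FROM
    raw_source,
    LATERAL FLATTEN( INPUT => src:events ) e
ORDER BY event_index;
```

Giving the FLATTEN output an alias makes it clear which columns come from the function and which come from raw_source.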
Use a CREATE TABLE AS statement to store the preceding query result in a table:
CREATE OR REPLACE TABLE flattened_source AS
    SELECT src:device_type::string AS device_type,
           src:version::string     AS version,
           VALUE                   AS src
    FROM
        raw_source,
        LATERAL FLATTEN( INPUT => src:events );
Query the resulting table.
SELECT * FROM flattened_source;
The query returns the following result:
+-------------+---------+-------------------------------------------------------------------------------+
| DEVICE_TYPE | VERSION | SRC                                                                           |
|-------------+---------+-------------------------------------------------------------------------------|
| server      | 2.6     | {                                                                             |
|             |         |   "f": 83,                                                                    |
|             |         |   "rv": "15219.64,783.63,48674.48,84679.52,27499.78,2178.83,0.42,74900.19",   |
|             |         |   "t": 1437560931139,                                                         |
|             |         |   "v": {                                                                      |
|             |         |     "ACHZ": 42869,                                                            |
|             |         |     "ACV": 709489,                                                            |
|             |         |     "DCA": 232,                                                               |
|             |         |     "DCV": 62287,                                                             |
|             |         |     "ENJR": 2599,                                                             |
|             |         |     "ERRS": 205,                                                              |
|             |         |     "MXEC": 487,                                                              |
|             |         |     "TMPI": 9                                                                 |
|             |         |   },                                                                          |
|             |         |   "vd": 54,                                                                   |
|             |         |   "z": 1437644222811                                                          |
|             |         | }                                                                             |
| server      | 2.6     | {                                                                             |
|             |         |   "f": 1000083,                                                               |
|             |         |   "rv": "8070.52,54470.71,85331.27,9.10,70825.85,65191.82,46564.53,29422.22", |
|             |         |   "t": 1437036965027,                                                         |
|             |         |   "v": {                                                                      |
|             |         |     "ACHZ": 6953,                                                             |
|             |         |     "ACV": 346795,                                                            |
|             |         |     "DCA": 250,                                                               |
|             |         |     "DCV": 46066,                                                             |
|             |         |     "ENJR": 9033,                                                             |
|             |         |     "ERRS": 615,                                                              |
|             |         |     "MXEC": 0,                                                                |
|             |         |     "TMPI": 112                                                               |
|             |         |   },                                                                          |
|             |         |   "vd": 626,                                                                  |
|             |         |   "z": 1437660796958                                                          |
|             |         | }                                                                             |
+-------------+---------+-------------------------------------------------------------------------------+
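Because the src column of flattened_source is still a VARIANT, individual keys remain reachable with the same path notation used earlier. The query below is a sketch of that idea; the output column names and the ERRS threshold of 500 are arbitrary choices for illustration.

```sql
-- Sketch only: the threshold and output column names are illustrative.
-- Key names inside a VARIANT are case-sensitive, so ACV and ERRS must
-- match the case used in the event objects.
SELECT
    device_type,
    version,
    src:t::number      AS t,
    src:v.ACV::number  AS acv,
    src:v.ERRS::number AS errs
FROM flattened_source
WHERE src:v.ERRS::number > 500;
```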
Flatten object keys in separate columns
In the preceding example, you flattened the event objects in the events array into separate rows. The resulting flattened_source table retained the event structure in the src column of the VARIANT type. One benefit of retaining the event objects in the src column of the VARIANT type is that when the event format changes, you don’t have to recreate and repopulate such tables. But you also have the option to copy individual keys in the event object into separate typed columns, as shown in the following query.
The following CREATE TABLE AS SELECT statement creates a new table, EVENTS, with the event object keys stored in separate columns. Each value is cast to a data type that is appropriate for the value, using a double-colon (::) followed by the type. If you omit the cast, the column assumes the VARIANT data type, which can hold any value:
create or replace table events as
  select
    src:device_type::string as device_type
  , src:version::string     as version
  , value:f::number         as f
  , value:rv::variant       as rv
  , value:t::number         as t
  , value:v.ACHZ::number    as achz
  , value:v.ACV::number     as acv
  , value:v.DCA::number     as dca
  , value:v.DCV::number     as dcv
  , value:v.ENJR::number    as enjr
  , value:v.ERRS::number    as errs
  , value:v.MXEC::number    as mxec
  , value:v.TMPI::number    as tmpi
  , value:vd::number        as vd
  , value:z::number         as z
  from
    raw_source
  , lateral flatten ( input => src:events );
The statement flattens the nested data in the v key of each event object, adding a separate column for each value. The statement outputs one row for each event. The following output shows the first two records in the new EVENTS table:
SELECT * FROM EVENTS;
+-------------+---------+---------+----------------------------------------------------------------------+---------------+-------+--------+-----+-------+------+------+------+------+-----+---------------+
| DEVICE_TYPE | VERSION | F | RV | T | ACHZ | ACV | DCA | DCV | ENJR | ERRS | MXEC | TMPI | VD | Z |
|-------------+---------+---------+----------------------------------------------------------------------+---------------+-------+--------+-----+-------+------+------+------+------+-----+---------------|
| server | 2.6 | 83 | "15219.64,783.63,48674.48,84679.52,27499.78,2178.83,0.42,74900.19" | 1437560931139 | 42869 | 709489 | 232 | 62287 | 2599 | 205 | 487 | 9 | 54 | 1437644222811 |
| server | 2.6 | 1000083 | "8070.52,54470.71,85331.27,9.10,70825.85,65191.82,46564.53,29422.22" | 1437036965027 | 6953 | 346795 | 250 | 46066 | 9033 | 615 | 0 | 112 | 626 | 1437660796958 |
+-------------+---------+---------+----------------------------------------------------------------------+---------------+-------+--------+-----+-------+------+------+------+------+-----+---------------+
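If you would rather have one row per key/value pair of the nested v object instead of one column per key, a second FLATTEN can be chained onto the first. The following is a sketch under that assumption; the aliases e and kv and the output column names metric and reading are hypothetical.

```sql
-- Sketch only: aliases and output column names are illustrative.
-- The first FLATTEN yields one row per event; the second yields one row
-- per key in that event's v object (ACHZ, ACV, DCA, ...).
SELECT
    e.value:t::number AS t,
    kv.key            AS metric,
    kv.value::number  AS reading
FROM
    raw_source,
    LATERAL FLATTEN( INPUT => src:events ) e,
    LATERAL FLATTEN( INPUT => e.value:v ) kv;
```

When FLATTEN is applied to an OBJECT rather than an ARRAY, the KEY column holds the key name and VALUE holds the corresponding value.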
Next: Step 4. Update Data