Step 3. Flatten Data¶
FLATTEN is a table function that produces a lateral view of a VARIANT, OBJECT, or ARRAY column. In this step, we’ll use the function to create two tables with different levels of flattening.
Maintaining Object Data in a Variant Column¶
You can use a LATERAL JOIN and a FLATTEN function to separate events into individual JSON objects while preserving the global data. FLATTEN returns a row for each object, and the LATERAL modifier joins the data with any information outside of the object — in this example, the device type and version. The function outputs a VALUE
column that includes the value of the flattened object.
Query the data for each event:
SELECT src:device_type::string , src:version::String , VALUE FROM raw_source , LATERAL FLATTEN( INPUT => SRC:events );
Snowflake returns the following output:
+-------------------------+---------------------+-------------------------------------------------------------------------------+ | SRC:DEVICE_TYPE::STRING | SRC:VERSION::STRING | VALUE | |-------------------------+---------------------+-------------------------------------------------------------------------------| | server | 2.6 | { | | | | "f": 83, | | | | "rv": "15219.64,783.63,48674.48,84679.52,27499.78,2178.83,0.42,74900.19", | | | | "t": 1437560931139, | | | | "v": { | | | | "ACHZ": 42869, | | | | "ACV": 709489, | | | | "DCA": 232, | | | | "DCV": 62287, | | | | "ENJR": 2599, | | | | "ERRS": 205, | | | | "MXEC": 487, | | | | "TMPI": 9 | | | | }, | | | | "vd": 54, | | | | "z": 1437644222811 | | | | } | | server | 2.6 | { | | | | "f": 1000083, | | | | "rv": "8070.52,54470.71,85331.27,9.10,70825.85,65191.82,46564.53,29422.22", | | | | "t": 1437036965027, | | | | "v": { | | | | "ACHZ": 6953, | | | | "ACV": 346795, | | | | "DCA": 250, | | | | "DCV": 46066, | | | | "ENJR": 9033, | | | | "ERRS": 615, | | | | "MXEC": 0, | | | | "TMPI": 112 | | | | }, | | | | "vd": 626, | | | | "z": 1437660796958 | | | | } | +-------------------------+---------------------+-------------------------------------------------------------------------------+
Use a CREATE TABLE AS statement to store the above results in a table:
CREATE OR REPLACE TABLE flattened_source AS SELECT src:device_type::string AS device_type , src:version::string AS version , VALUE AS src FROM raw_source , LATERAL FLATTEN( INPUT => SRC:events ); SELECT * FROM flattened_source; +-------------+---------+-------------------------------------------------------------------------------+ | DEVICE_TYPE | VERSION | SRC | |-------------+---------+-------------------------------------------------------------------------------| | server | 2.6 | { | | | | "f": 83, | | | | "rv": "15219.64,783.63,48674.48,84679.52,27499.78,2178.83,0.42,74900.19", | | | | "t": 1437560931139, | | | | "v": { | | | | "ACHZ": 42869, | | | | "ACV": 709489, | | | | "DCA": 232, | | | | "DCV": 62287, | | | | "ENJR": 2599, | | | | "ERRS": 205, | | | | "MXEC": 487, | | | | "TMPI": 9 | | | | }, | | | | "vd": 54, | | | | "z": 1437644222811 | | | | } | | server | 2.6 | { | | | | "f": 1000083, | | | | "rv": "8070.52,54470.71,85331.27,9.10,70825.85,65191.82,46564.53,29422.22", | | | | "t": 1437036965027, | | | | "v": { | | | | "ACHZ": 6953, | | | | "ACV": 346795, | | | | "DCA": 250, | | | | "DCV": 46066, | | | | "ENJR": 9033, | | | | "ERRS": 615, | | | | "MXEC": 0, | | | | "TMPI": 112 | | | | }, | | | | "vd": 626, | | | | "z": 1437660796958 | | | | } | +-------------+---------+-------------------------------------------------------------------------------+
Extracting Key-Values to Separate Columns¶
As shown in this tutorial, Snowflake supports querying JSON directly. One benefit of retaining the JSON structure is that when formats change, you don’t have to recreate and re-populate large tables. However, you can optionally copy JSON key values into typed columns.
The following statement creates a new table, EVENTS
, and copies the values for each event in the RAW_SOURCE
table into separate columns. Each value is cast to a data type that is appropriate for the value, using a double-colon (::
) followed by the type. If you omit the casting, the column assumes the VARIANT data type, which can hold any value:
create or replace table events as select src:device_type::string as device_type , src:version::string as version , value:f::number as f , value:rv::variant as rv , value:t::number as t , value:v.ACHZ::number as achz , value:v.ACV::number as acv , value:v.DCA::number as dca , value:v.DCV::number as dcv , value:v.ENJR::number as enjr , value:v.ERRS::number as errs , value:v.MXEC::number as mxec , value:v.TMPI::number as tmpi , value:vd::number as vd , value:z::number as z from raw_source , lateral flatten ( input => SRC:events );
The statement flattens the nested data in the EVENTS.SRC:V key, adding a separate column for each value. The statement outputs a row for each key/value pair. The following output shows the first two records in the new EVENTS
table:
SELECT * FROM EVENTS; +-------------+---------+---------+----------------------------------------------------------------------+---------------+-------+--------+-----+-------+------+------+------+------+-----+---------------+ | DEVICE_TYPE | VERSION | F | RV | T | ACHZ | ACV | DCA | DCV | ENJR | ERRS | MXEC | TMPI | VD | Z | |-------------+---------+---------+----------------------------------------------------------------------+---------------+-------+--------+-----+-------+------+------+------+------+-----+---------------| | server | 2.6 | 83 | "15219.64,783.63,48674.48,84679.52,27499.78,2178.83,0.42,74900.19" | 1437560931139 | 42869 | 709489 | 232 | 62287 | 2599 | 205 | 487 | 9 | 54 | 1437644222811 | | server | 2.6 | 1000083 | "8070.52,54470.71,85331.27,9.10,70825.85,65191.82,46564.53,29422.22" | 1437036965027 | 6953 | 346795 | 250 | 46066 | 9033 | 615 | 0 | 112 | 626 | 1437660796958 | +-------------+---------+---------+----------------------------------------------------------------------+---------------+-------+--------+-----+-------+------+------+------+------+-----+---------------+
Next: Step 4. Update Data