Step 4. Update Data¶
So far in this tutorial, you did the following:
Copied sample JSON event data from an S3 bucket into the
RAW_SOURCE
table and explored simple queries.You also explored the FLATTEN function to flatten the JSON data and obtain a relational representation of the data. For example, you extracted event keys and stored the keys in separate columns in another EVENTS table.
At the beginning, the tutorial explains the application scenario where multiple sources generate
events and a web endpoint saves it to your S3 bucket. As new events are added to the S3 bucket,
you might use a script to continuously copy new data into the RAW_SOURCE
table.
But how do insert only new event data into the EVENTS
table.
There are numerous ways to maintain data consistency. This section explains two options.
Use Primary Key Columns for Comparison¶
In this section you add a primary key to the EVENTS
table. The primary key then guarantees uniqueness.
Examine your JSON data for any values that are naturally unique and would be good candidates for a primary key. For example, assume that the combination of
src:device_type
andvalue:rv
can be a primary key. These two JSON keys correspond to theDEVICE_TYPE
andRV
columns in theEVENTS
table.Note
Snowflake does not enforce the primary key constraint. Rather, the constraint serves as metadata that identifies the natural key in the Information Schema.
Add the primary key constraint to the
EVENTS
table:ALTER TABLE events ADD CONSTRAINT pk_DeviceType PRIMARY KEY (device_type, rv);
Insert a new JSON event record into the
RAW_SOURCE
table:insert into raw_source select PARSE_JSON ('{ "device_type": "cell_phone", "events": [ { "f": 79, "rv": "786954.67,492.68,3577.48,40.11,343.00,345.8,0.22,8765.22", "t": 5769784730576, "v": { "ACHZ": 75846, "ACV": 098355, "DCA": 789, "DCV": 62287, "ENJR": 2234, "ERRS": 578, "MXEC": 999, "TMPI": 9 }, "vd": 54, "z": 1437644222811 } ], "version": 3.2 }');
Insert the new record that you added to the
RAW_SOURCE
table into theEVENTS
table based on a comparison of the primary key values:insert into events select src:device_type::string , src:version::string , value:f::number , value:rv::variant , value:t::number , value:v.ACHZ::number , value:v.ACV::number , value:v.DCA::number , value:v.DCV::number , value:v.ENJR::number , value:v.ERRS::number , value:v.MXEC::number , value:v.TMPI::number , value:vd::number , value:z::number from raw_source , lateral flatten( input => src:events ) where not exists (select 'x' from events where events.device_type = src:device_type and events.rv = value:rv);
Querying the
EVENTS
table shows the added row:select * from EVENTS;
The query returns the following result:
+-------------+---------+---------+----------------------------------------------------------------------+---------------+-------+--------+-----+-------+------+------+------+------+-----+---------------+ | DEVICE_TYPE | VERSION | F | RV | T | ACHZ | ACV | DCA | DCV | ENJR | ERRS | MXEC | TMPI | VD | Z | |-------------+---------+---------+----------------------------------------------------------------------+---------------+-------+--------+-----+-------+------+------+------+------+-----+---------------| | server | 2.6 | 83 | "15219.64,783.63,48674.48,84679.52,27499.78,2178.83,0.42,74900.19" | 1437560931139 | 42869 | 709489 | 232 | 62287 | 2599 | 205 | 487 | 9 | 54 | 1437644222811 | | server | 2.6 | 1000083 | "8070.52,54470.71,85331.27,9.10,70825.85,65191.82,46564.53,29422.22" | 1437036965027 | 6953 | 346795 | 250 | 46066 | 9033 | 615 | 0 | 112 | 626 | 1437660796958 | | cell_phone | 3.2 | 79 | "786954.67,492.68,3577.48,40.11,343.00,345.8,0.22,8765.22" | 5769784730576 | 75846 | 98355 | 789 | 62287 | 2234 | 578 | 999 | 9 | 54 | 1437644222811 | +-------------+---------+---------+----------------------------------------------------------------------+---------------+-------+--------+-----+-------+------+------+------+------+-----+---------------+
Use All Columns for Comparison¶
If the JSON data does not have fields that can be primary key candidates, you
could compare all repeating JSON keys in the RAW_SOURCE
table with the
corresponding column values in the EVENTS
table.
No changes to your existing EVENTS
table are required.
Insert a new JSON event record into the
RAW_SOURCE
table:insert into raw_source select parse_json ('{ "device_type": "web_browser", "events": [ { "f": 79, "rv": "122375.99,744.89,386.99,12.45,78.08,43.7,9.22,8765.43", "t": 5769784730576, "v": { "ACHZ": 768436, "ACV": 9475, "DCA": 94835, "DCV": 88845, "ENJR": 8754, "ERRS": 567, "MXEC": 823, "TMPI": 0 }, "vd": 55, "z": 8745598047355 } ], "version": 8.7 }');
Insert the new record in the
RAW_SOURCE
table into theEVENTS
table based on a comparison of all repeating key values:insert into events select src:device_type::string , src:version::string , value:f::number , value:rv::variant , value:t::number , value:v.ACHZ::number , value:v.ACV::number , value:v.DCA::number , value:v.DCV::number , value:v.ENJR::number , value:v.ERRS::number , value:v.MXEC::number , value:v.TMPI::number , value:vd::number , value:z::number from raw_source , lateral flatten( input => src:events ) where not exists (select 'x' from events where events.device_type = src:device_type and events.version = src:version and events.f = value:f and events.rv = value:rv and events.t = value:t and events.achz = value:v.ACHZ and events.acv = value:v.ACV and events.dca = value:v.DCA and events.dcv = value:v.DCV and events.enjr = value:v.ENJR and events.errs = value:v.ERRS and events.mxec = value:v.MXEC and events.tmpi = value:v.TMPI and events.vd = value:vd and events.z = value:z);
Querying the
EVENTS
table shows the added row:select * from EVENTS;
The query returns the following result:
+-------------+---------+---------+----------------------------------------------------------------------+---------------+--------+--------+-------+-------+------+------+------+------+-----+---------------+ | DEVICE_TYPE | VERSION | F | RV | T | ACHZ | ACV | DCA | DCV | ENJR | ERRS | MXEC | TMPI | VD | Z | |-------------+---------+---------+----------------------------------------------------------------------+---------------+--------+--------+-------+-------+------+------+------+------+-----+---------------| | server | 2.6 | 83 | "15219.64,783.63,48674.48,84679.52,27499.78,2178.83,0.42,74900.19" | 1437560931139 | 42869 | 709489 | 232 | 62287 | 2599 | 205 | 487 | 9 | 54 | 1437644222811 | | server | 2.6 | 1000083 | "8070.52,54470.71,85331.27,9.10,70825.85,65191.82,46564.53,29422.22" | 1437036965027 | 6953 | 346795 | 250 | 46066 | 9033 | 615 | 0 | 112 | 626 | 1437660796958 | | cell_phone | 3.2 | 79 | "786954.67,492.68,3577.48,40.11,343.00,345.8,0.22,8765.22" | 5769784730576 | 75846 | 98355 | 789 | 62287 | 2234 | 578 | 999 | 9 | 54 | 1437644222811 | | web_browser | 8.7 | 79 | "122375.99,744.89,386.99,12.45,78.08,43.7,9.22,8765.43" | 5769784730576 | 768436 | 9475 | 94835 | 88845 | 8754 | 567 | 823 | 0 | 55 | 8745598047355 | +-------------+---------+---------+----------------------------------------------------------------------+---------------+--------+--------+-------+-------+------+------+------+------+-----+---------------+
Next: Step 5. Clean Up