Step 4. Update Data

So far in this tutorial, you did the following:

  • Copied sample JSON event data from an S3 bucket into the RAW_SOURCE table and explored simple queries.

  • You also explored the FLATTEN function to flatten the JSON data and obtain a relational representation of the data. For example, you extracted event keys and stored the keys in separate columns in another EVENTS table.

At the beginning, the tutorial explains the application scenario where multiple sources generate events and a web endpoint saves it to your S3 bucket. As new events are added to the S3 bucket, you might use a script to continuously copy new data into the RAW_SOURCE table. But how do insert only new event data into the EVENTS table.

There are numerous ways to maintain data consistency. This section explains two options.

Use Primary Key Columns for Comparison

In this section you add a primary key to the EVENTS table. The primary key then guarantees uniqueness.

  1. Examine your JSON data for any values that are naturally unique and would be good candidates for a primary key. For example, assume that the combination of src:device_type and value:rv can be a primary key. These two JSON keys correspond to the DEVICE_TYPE and RV columns in the EVENTS table.

    Note

    Snowflake does not enforce the primary key constraint. Rather, the constraint serves as metadata that identifies the natural key in the Information Schema.

  2. Add the primary key constraint to the EVENTS table:

    ALTER TABLE events ADD CONSTRAINT pk_DeviceType PRIMARY KEY (device_type, rv);
    
    Copy
  3. Insert a new JSON event record into the RAW_SOURCE table:

    insert into raw_source
      select
      PARSE_JSON ('{
        "device_type": "cell_phone",
        "events": [
          {
            "f": 79,
            "rv": "786954.67,492.68,3577.48,40.11,343.00,345.8,0.22,8765.22",
            "t": 5769784730576,
            "v": {
              "ACHZ": 75846,
              "ACV": 098355,
              "DCA": 789,
              "DCV": 62287,
              "ENJR": 2234,
              "ERRS": 578,
              "MXEC": 999,
              "TMPI": 9
            },
            "vd": 54,
            "z": 1437644222811
          }
        ],
        "version": 3.2
      }');
    
    Copy
  4. Insert the new record that you added to the RAW_SOURCE table into the EVENTS table based on a comparison of the primary key values:

    insert into events
    select
          src:device_type::string
        , src:version::string
        , value:f::number
        , value:rv::variant
        , value:t::number
        , value:v.ACHZ::number
        , value:v.ACV::number
        , value:v.DCA::number
        , value:v.DCV::number
        , value:v.ENJR::number
        , value:v.ERRS::number
        , value:v.MXEC::number
        , value:v.TMPI::number
        , value:vd::number
        , value:z::number
        from
          raw_source
        , lateral flatten( input => src:events )
        where not exists
        (select 'x'
          from events
          where events.device_type = src:device_type
          and events.rv = value:rv);
    
    Copy

    Querying the EVENTS table shows the added row:

    select * from EVENTS;
    
    Copy

    The query returns the following result:

    +-------------+---------+---------+----------------------------------------------------------------------+---------------+-------+--------+-----+-------+------+------+------+------+-----+---------------+
    | DEVICE_TYPE | VERSION |       F | RV                                                                   |             T |  ACHZ |    ACV | DCA |   DCV | ENJR | ERRS | MXEC | TMPI |  VD |             Z |
    |-------------+---------+---------+----------------------------------------------------------------------+---------------+-------+--------+-----+-------+------+------+------+------+-----+---------------|
    | server      | 2.6     |      83 | "15219.64,783.63,48674.48,84679.52,27499.78,2178.83,0.42,74900.19"   | 1437560931139 | 42869 | 709489 | 232 | 62287 | 2599 |  205 |  487 |    9 |  54 | 1437644222811 |
    | server      | 2.6     | 1000083 | "8070.52,54470.71,85331.27,9.10,70825.85,65191.82,46564.53,29422.22" | 1437036965027 |  6953 | 346795 | 250 | 46066 | 9033 |  615 |    0 |  112 | 626 | 1437660796958 |
    | cell_phone  | 3.2     |      79 | "786954.67,492.68,3577.48,40.11,343.00,345.8,0.22,8765.22"           | 5769784730576 | 75846 |  98355 | 789 | 62287 | 2234 |  578 |  999 |    9 |  54 | 1437644222811 |
    +-------------+---------+---------+----------------------------------------------------------------------+---------------+-------+--------+-----+-------+------+------+------+------+-----+---------------+
    
    Copy

Use All Columns for Comparison

If the JSON data does not have fields that can be primary key candidates, you could compare all repeating JSON keys in the RAW_SOURCE table with the corresponding column values in the EVENTS table.

No changes to your existing EVENTS table are required.

  1. Insert a new JSON event record into the RAW_SOURCE table:

    insert into raw_source
      select
      parse_json ('{
        "device_type": "web_browser",
        "events": [
          {
            "f": 79,
            "rv": "122375.99,744.89,386.99,12.45,78.08,43.7,9.22,8765.43",
            "t": 5769784730576,
            "v": {
              "ACHZ": 768436,
              "ACV": 9475,
              "DCA": 94835,
              "DCV": 88845,
              "ENJR": 8754,
              "ERRS": 567,
              "MXEC": 823,
              "TMPI": 0
            },
            "vd": 55,
            "z": 8745598047355
          }
        ],
        "version": 8.7
      }');
    
    Copy
  2. Insert the new record in the RAW_SOURCE table into the EVENTS table based on a comparison of all repeating key values:

    insert into events
    select
          src:device_type::string
        , src:version::string
        , value:f::number
        , value:rv::variant
        , value:t::number
        , value:v.ACHZ::number
        , value:v.ACV::number
        , value:v.DCA::number
        , value:v.DCV::number
        , value:v.ENJR::number
        , value:v.ERRS::number
        , value:v.MXEC::number
        , value:v.TMPI::number
        , value:vd::number
        , value:z::number
        from
          raw_source
        , lateral flatten( input => src:events )
        where not exists
        (select 'x'
          from events
          where events.device_type = src:device_type
          and events.version = src:version
          and events.f = value:f
          and events.rv = value:rv
          and events.t = value:t
          and events.achz = value:v.ACHZ
          and events.acv = value:v.ACV
          and events.dca = value:v.DCA
          and events.dcv = value:v.DCV
          and events.enjr = value:v.ENJR
          and events.errs = value:v.ERRS
          and events.mxec = value:v.MXEC
          and events.tmpi = value:v.TMPI
          and events.vd = value:vd
          and events.z = value:z);
    
    Copy

    Querying the EVENTS table shows the added row:

    select * from EVENTS;
    
    Copy

    The query returns the following result:

    +-------------+---------+---------+----------------------------------------------------------------------+---------------+--------+--------+-------+-------+------+------+------+------+-----+---------------+
    | DEVICE_TYPE | VERSION |       F | RV                                                                   |             T |   ACHZ |    ACV |   DCA |   DCV | ENJR | ERRS | MXEC | TMPI |  VD |             Z |
    |-------------+---------+---------+----------------------------------------------------------------------+---------------+--------+--------+-------+-------+------+------+------+------+-----+---------------|
    | server      | 2.6     |      83 | "15219.64,783.63,48674.48,84679.52,27499.78,2178.83,0.42,74900.19"   | 1437560931139 |  42869 | 709489 |   232 | 62287 | 2599 |  205 |  487 |    9 |  54 | 1437644222811 |
    | server      | 2.6     | 1000083 | "8070.52,54470.71,85331.27,9.10,70825.85,65191.82,46564.53,29422.22" | 1437036965027 |   6953 | 346795 |   250 | 46066 | 9033 |  615 |    0 |  112 | 626 | 1437660796958 |
    | cell_phone  | 3.2     |      79 | "786954.67,492.68,3577.48,40.11,343.00,345.8,0.22,8765.22"           | 5769784730576 |  75846 |  98355 |   789 | 62287 | 2234 |  578 |  999 |    9 |  54 | 1437644222811 |
    | web_browser | 8.7     |      79 | "122375.99,744.89,386.99,12.45,78.08,43.7,9.22,8765.43"              | 5769784730576 | 768436 |   9475 | 94835 | 88845 | 8754 |  567 |  823 |    0 |  55 | 8745598047355 |
    +-------------+---------+---------+----------------------------------------------------------------------+---------------+--------+--------+-------+-------+------+------+------+------+-----+---------------+
    
    Copy

Next: Step 5. Clean Up