Loading Objects Larger Than 16 MB in Size

In the past, an error occurred when you attempted to load an object larger than 16 MB from a file on a stage into a column. Because the maximum size of an object stored in a column is 16 MB, the following error occurred:

Max LOB size (16777216) exceeded

This error also occurred if you attempted to query a file on a stage that contained objects larger than 16 MB.

Although you still cannot store objects larger than 16 MB in a column, you can now query objects up to 128 MB in files on a stage and reduce their size before storing them in columns. An error no longer occurs when you query a file that contains objects larger than 16 MB but smaller than 128 MB.

For example, you can split large objects across multiple columns or rows, transform nested JSON into a tabular format, or simplify complex geometries.
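Before deciding how to split an object, it can help to inspect its size directly on the stage. As a sketch (the stage name @mystage and the file format name my_parquet_format are hypothetical), you can query the staged file and measure the large field:

```sql
-- Hypothetical stage and file format names.
-- Querying the staged file directly succeeds even when the objects
-- are larger than 16 MB (up to 128 MB).
SELECT
  $1:ID AS id,
  LENGTH($1:CustomerDetails::VARCHAR) AS customer_details_size
FROM @mystage (FILE_FORMAT => 'my_parquet_format');
```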

Example: Loading and Splitting JSON Objects from a Parquet File

Suppose that you are loading a Parquet file from a stage, and the Parquet file contains a JSON object that is larger than 16 MB in size:

{
  "ID": 1,
  "CustomerDetails": {
    "RegistrationDate": 158415500,
    "FirstName": "John",
    "LastName": "Doe",
    "Events": [
      {
        "Type": "LOGIN",
        "Time": 1584158401,
        "EventID": "NZ0000000001"
      },
      ...
      /* this array contains thousands of elements */
      /* with total size exceeding 16 MB */
      ...
      {
        "Type": "LOGOUT",
        "Time": 1584158402,
        "EventID": "NZ0000000002"
      }
    ]
  }
}

The following example creates a table to store the data from the file and loads the data into the table. Because the size of the array of events can exceed 16 MB, the example expands the array of events into separate rows (one for each array element).

CREATE OR REPLACE TABLE mytable AS
  SELECT
    t1.$1:ID AS id,
    t1.$1:CustomerDetails:RegistrationDate::VARCHAR AS registration_date,
    t1.$1:CustomerDetails:FirstName::VARCHAR AS first_name,
    t1.$1:CustomerDetails:LastName::VARCHAR AS last_name,
    t2.value AS event
  FROM @json t1,
    TABLE(FLATTEN(input => $1:CustomerDetails:Events)) t2;

The following is an example of the contents of the resulting table.

+----+-------------------+------------+------------+------------------------------+
| ID | REGISTRATION_DATE | FIRST_NAME | LAST_NAME  | EVENT                        |
|----+-------------------+------------+------------+------------------------------|
| 1  | 158415500         | John       | Doe        | {                            |
|    |                   |            |            |   "EventID": "NZ0000000001", |
|    |                   |            |            |   "Time": 1584158401,        |
|    |                   |            |            |   "Type": "LOGIN"            |
|    |                   |            |            | }                            |
|    |                ... thousands of rows ...    |                              |
| 1  | 158415500         | John       | Doe        | {                            |
|    |                   |            |            |   "EventID": "NZ0000000002", |
|    |                   |            |            |   "Time": 1584158402,        |
|    |                   |            |            |   "Type": "LOGOUT"           |
|    |                   |            |            | }                            |
+----+-------------------+------------+------------+------------------------------+
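After the split, you can confirm that no stored value approaches the 16 MB column limit. The following check is a sketch against the mytable created above:

```sql
-- Largest serialized event stored in the table, in bytes;
-- each value must stay under the 16 MB (16777216-byte) column limit.
SELECT MAX(OCTET_LENGTH(event::VARCHAR)) AS max_event_size
FROM mytable;
```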

Inserting FLATTEN Results to an Existing Table

To insert the result of FLATTEN into an existing table, use an INSERT statement. For example:

CREATE OR REPLACE TABLE mytable (
  id VARCHAR,
  registration_date VARCHAR,
  first_name VARCHAR,
  last_name VARCHAR,
  event VARCHAR
);

INSERT INTO mytable
  SELECT
    t1.$1:ID,
    t1.$1:CustomerDetails:RegistrationDate::VARCHAR,
    t1.$1:CustomerDetails:FirstName::VARCHAR,
    t1.$1:CustomerDetails:LastName::VARCHAR,
    t2.value
  FROM @json t1,
    TABLE(FLATTEN(input => $1:CustomerDetails:Events)) t2;

Example: Loading and Splitting XML

Suppose that you are loading an XML file from a stage, and the file contains an XML object that is larger than 16 MB in size:

<?xml version='1.0' encoding='UTF-8'?>
<osm version="0.6" generator="osmium/1.14.0">
  <node id="197798" version="17" timestamp="2021-09-06T17:01:27Z" />
  <node id="197824" version="7" timestamp="2021-08-04T23:17:18Z" >
    <tag k="highway" v="traffic_signals"/>
  </node>
  <!--  thousands of node elements with total size exceeding 16MB -->
  <node id="197826" version="4" timestamp="2021-08-04T16:43:28Z" />
</osm>

The following example creates a table to store the data from the file and loads the data into the table. Because the size of the XML can exceed 16 MB, the example expands the node elements into separate rows (one for each element).

CREATE OR REPLACE TABLE mytable AS
  SELECT
    value:"@id" AS id,
    value:"@version" AS version,
    value:"@timestamp"::datetime AS timestamp,
    value:"$" AS tags
  FROM @mystage,
  LATERAL FLATTEN(input => $1:"$")
  WHERE value:"@" = 'node';

The following is an example of the contents of the resulting table.

+--------+---------+-------------------------+---------------------------------------------+
| ID     | VERSION | TIMESTAMP               | TAGS                                        |
|--------+---------+-------------------------+---------------------------------------------|
| 197798 | 17      | 2021-09-06 17:01:27.000 | ""                                          |
| 197824 | 7       | 2021-08-04 23:17:18.000 | <tag k="highway" v="traffic_signals"></tag> |
|        |          ... thousands of rows ...                                              |
| 197826 | 4       | 2021-08-04 16:43:28.000 | ""                                          |
+--------+---------+-------------------------+---------------------------------------------+

Example: Loading and Simplifying Large Geospatial Objects Before Storing Them

Suppose that you are loading a Parquet file from a stage, and the Parquet file contains a geospatial object that is larger than 16 MB in size. You can load the file from the stage and simplify the geospatial object (by using ST_SIMPLIFY) before you store the object:

CREATE OR REPLACE TABLE mytable AS
  SELECT
    ST_SIMPLIFY($1:geo, 10) AS geo
  FROM @mystage;
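If the simplified geometry is still too large, you can increase the tolerance passed to ST_SIMPLIFY. As a rough sketch (the stage name @mystage and the geo field are the same assumptions as above), you can compare the point counts before and after simplification to gauge the effect of a given tolerance:

```sql
-- Compare vertex counts before and after simplification with a
-- tolerance of 10 meters; a larger tolerance removes more points.
SELECT
  ST_NPOINTS(TO_GEOGRAPHY($1:geo)) AS original_points,
  ST_NPOINTS(ST_SIMPLIFY(TO_GEOGRAPHY($1:geo), 10)) AS simplified_points
FROM @mystage;
```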

Example: Using COPY INTO <table>

If you need to use COPY INTO <table> to load data from a file on a stage, you cannot use FLATTEN to split up large objects. Instead, use a SELECT statement in the COPY INTO command to extract the individual elements, each of which is smaller than 16 MB. For example:

CREATE OR REPLACE TABLE mytable (
  id VARCHAR,
  registration_date VARCHAR,
  first_name VARCHAR,
  last_name VARCHAR
);

COPY INTO mytable (
  id,
  registration_date,
  first_name,
  last_name
) FROM (
  SELECT
      $1:ID,
      $1:CustomerDetails::OBJECT:RegistrationDate::VARCHAR,
      $1:CustomerDetails::OBJECT:FirstName::VARCHAR,
      $1:CustomerDetails::OBJECT:LastName::VARCHAR
  FROM @mystage
);