- Categories:
GENERATE_COLUMN_DESCRIPTION¶
Generates a list of columns from a set of staged files that contain semi-structured data using the INFER_SCHEMA function output.
The output from this function can be used as input when manually creating a table, external table, or view (using the appropriate CREATE <object> command) based on the column definitions of the staged files.
Alternatively, the CREATE TABLE command with the USING TEMPLATE clause can be used to create a new table with the column definitions derived from the same INFER_SCHEMA function output.
Syntax¶
GENERATE_COLUMN_DESCRIPTION( <expr> , '<string>' )
Arguments¶
expr
Output of the INFER_SCHEMA function formatted as an array.
'string'
Type of object that could be created from the column list. The appropriate formatting for this type is applied to the output.
Possible values are
table
,external_table
, orview
.
Returns¶
The function returns the list of columns in a set of staged files, which can be used as input when creating an object of the type identified in the second argument.
Examples¶
Detect, format, and output the set of column definitions in a set of Parquet files staged in the mystage
stage. The output columns are
formatted for creating a table.
This example builds on an example in the INFER_SCHEMA topic:
-- Create a file format that sets the file type as Parquet.
CREATE FILE FORMAT my_parquet_format
TYPE = parquet;
-- Query the GENERATE_COLUMN_DESCRIPTION function.
SELECT GENERATE_COLUMN_DESCRIPTION(ARRAY_AGG(OBJECT_CONSTRUCT(*)), 'table') AS COLUMNS
FROM TABLE (
INFER_SCHEMA(
LOCATION=>'@mystage',
FILE_FORMAT=>'my_parquet_format'
)
);
+--------------------+
| COLUMN_DESCRIPTION |
|--------------------|
| "country" VARIANT, |
| "continent" TEXT |
+--------------------+
-- The function output can be used to define the columns in a table.
CREATE TABLE mytable ("country" VARIANT, "continent" TEXT);
Same as the previous example, but generates a set of columns formatted for creating an external table:
-- Query the GENERATE_COLUMN_DESCRIPTION function.
SELECT GENERATE_COLUMN_DESCRIPTION(ARRAY_AGG(OBJECT_CONSTRUCT(*)), 'external_table') AS COLUMNS
FROM TABLE (
INFER_SCHEMA(
LOCATION=>'@mystage',
FILE_FORMAT=>'my_parquet_format'
)
);
+---------------------------------------------+
| COLUMN_DESCRIPTION |
|---------------------------------------------|
| "country" VARIANT AS ($1:country::VARIANT), |
| "continent" TEXT AS ($1:continent::TEXT) |
+---------------------------------------------+
Same as the previous examples, but generates a set of columns formatted for creating a view:
-- Query the GENERATE_COLUMN_DESCRIPTION function.
SELECT GENERATE_COLUMN_DESCRIPTION(ARRAY_AGG(OBJECT_CONSTRUCT(*)), 'view') AS COLUMNS
FROM TABLE (
INFER_SCHEMA(
LOCATION=>'@mystage',
FILE_FORMAT=>'my_parquet_format'
)
);
+--------------------+
| COLUMN_DESCRIPTION |
|--------------------|
| "country" , |
| "continent" |
+--------------------+
Note
Using *
for ARRAY_AGG(OBJECT_CONSTRUCT())
may result in an error if the returned result is larger than 16MB. It is recommended to avoid using *
for larger result sets, and only use the required columns, COLUMN NAME
, TYPE
, and NULLABLE
, for the query. Optional column ORDER_ID
can be included when using WITHIN GROUP (ORDER BY order_id)
.