- Categories:
INFER_SCHEMA¶
Automatically detects the file metadata schema in a set of staged data files that contain semi-structured data and retrieves the column definitions.
The GENERATE_COLUMN_DESCRIPTION function builds on the INFER_SCHEMA function output to simplify the creation of new tables, external tables, or views (using the appropriate CREATE <object> command) based on the column definitions of the staged files.
The CREATE TABLE command with the USING TEMPLATE clause can be executed to create a new table with the column definitions derived from the INFER_SCHEMA function output.
Note
This feature is currently limited to Apache Parquet, Apache Avro, and ORC files.
Syntax¶
INFER_SCHEMA(
LOCATION => '{ internalStage | externalStage }'
, FILE_FORMAT => '<format_name>'
)
Where:
internalStage ::= @[<namespace>.]<int_stage_name>[/<path>][/<filename>] | @~[/<path>][/<filename>]externalStage ::= @[<namespace>.]<ext_stage_name>[/<path>][/<filename>]
Arguments¶
LOCATION => '...'
Name of the internal or external stage where the files are stored. Optionally include a path to one or more files in the cloud storage location; otherwise, the INFER_SCHEMA function scans files in all subdirectories in the stage:
@[namespace.]int_stage_name[/path][/filename]
Files are in the specified named internal stage.
@[namespace.]ext_stage_name[/path][/filename]
Files are in the specified named external stage.
@~[/path][/filename]
Files are in the stage for the current user.
Note
This SQL function supports named stages (internal or external) and user stages only. It does not support table stages.
FILE_FORMAT => 'format_name'
Name of the file format object that describes the data contained in the staged files. For more information, see CREATE FILE FORMAT.
Output¶
The function returns the following columns:
Column Name |
Data Type |
Description |
---|---|---|
COLUMN_NAME |
TEXT |
Name of a column in the staged files. |
TYPE |
TEXT |
Data type of the column. |
NULLABLE |
BOOLEAN |
Specifies whether rows in the column can store NULL instead of a value. Currently, the inferred nullability of a column can apply to one data file but not others in the scanned set. |
EXPRESSION |
TEXT |
Expression of the column in the format |
FILENAMES |
TEXT |
Names of the files that contain the column. |
Examples¶
Retrieve the column definitions for Parquet files in the mystage
stage:
-- Create a file format that sets the file type as Parquet.
CREATE FILE FORMAT my_parquet_format
TYPE = parquet;
-- Query the INFER_SCHEMA function.
SELECT *
FROM TABLE(
INFER_SCHEMA(
LOCATION=>'@mystage'
, FILE_FORMAT=>'my_parquet_format'
)
);
+-------------+---------+----------+---------------------+--------------------------+
| COLUMN_NAME | TYPE | NULLABLE | EXPRESSION | FILENAMES |
|-------------+---------+----------+---------------------+--------------------------|
| continent | TEXT | True | $1:continent::TEXT | geography/cities.parquet |
| country | VARIANT | True | $1:country::VARIANT | geography/cities.parquet |
+-------------+---------+----------+---------------------+--------------------------+
Similar to the previous example, but specify a single Parquet file in the mystage
stage:
-- Query the INFER_SCHEMA function.
SELECT *
FROM TABLE(
INFER_SCHEMA(
LOCATION=>'@mystage/geography/cities.parquet'
, FILE_FORMAT=>'my_parquet_format'
)
);
+-------------+---------+----------+---------------------+--------------------------+
| COLUMN_NAME | TYPE | NULLABLE | EXPRESSION | FILENAMES |
|-------------+---------+----------+---------------------+--------------------------|
| continent | TEXT | True | $1:continent::TEXT | geography/cities.parquet |
| country | VARIANT | True | $1:country::VARIANT | geography/cities.parquet |
+-------------+---------+----------+---------------------+--------------------------+