Categories:

Table Functions

INFER_SCHEMA

Automatically detects the file metadata schema in a set of staged data files that contain semi-structured data and retrieves the column definitions.

The GENERATE_COLUMN_DESCRIPTION function builds on the INFER_SCHEMA function output to simplify the creation of new tables, external tables, or views (using the appropriate CREATE <object> command) based on the column definitions of the staged files.

The CREATE TABLE command with the USING TEMPLATE clause can be executed to create a new table with the column definitions derived from the INFER_SCHEMA function output.

Note

This feature is currently limited to Apache Parquet, Apache Avro, and ORC files.

Syntax

INFER_SCHEMA(
  LOCATION => '{ internalStage | externalStage }'
  , FILE_FORMAT => '<format_name>'
)

Where:

internalStage ::=
    @[<namespace>.]<int_stage_name>[/<path>][/<filename>]
  | @~[/<path>][/<filename>]
externalStage ::=
  @[<namespace>.]<ext_stage_name>[/<path>][/<filename>]

Arguments

LOCATION => '...'

Name of the internal or external stage where the files are stored. Optionally include a path to one or more files in the cloud storage location; otherwise, the INFER_SCHEMA function scans files in all subdirectories in the stage:

@[namespace.]int_stage_name[/path][/filename]

Files are in the specified named internal stage.

@[namespace.]ext_stage_name[/path][/filename]

Files are in the specified named external stage.

@~[/path][/filename]

Files are in the stage for the current user.

Note

This SQL function supports named stages (internal or external) and user stages only. It does not support table stages.

FILE_FORMAT => 'format_name'

Name of the file format object that describes the data contained in the staged files. For more information, see CREATE FILE FORMAT.

Output

The function returns the following columns:

Column Name

Data Type

Description

COLUMN_NAME

TEXT

Name of a column in the staged files.

TYPE

TEXT

Data type of the column.

NULLABLE

BOOLEAN

Specifies whether rows in the column can store NULL instead of a value. Currently, the inferred nullability of a column can apply to one data file but not others in the scanned set.

EXPRESSION

TEXT

Expression of the column in the format $1:COLUMN_NAME::TYPE (primarily for external tables).

FILENAMES

TEXT

Names of the files that contain the column.

Examples

Retrieve the column definitions for Parquet files in the mystage stage:

-- Create a file format that sets the file type as Parquet.
CREATE FILE FORMAT my_parquet_format
  TYPE = parquet;

-- Query the INFER_SCHEMA function.
SELECT *
  FROM TABLE(
    INFER_SCHEMA(
      LOCATION=>'@mystage'
      , FILE_FORMAT=>'my_parquet_format'
      )
    );

+-------------+---------+----------+---------------------+--------------------------+
| COLUMN_NAME | TYPE    | NULLABLE | EXPRESSION          | FILENAMES                |
|-------------+---------+----------+---------------------+--------------------------|
| continent   | TEXT    | True     | $1:continent::TEXT  | geography/cities.parquet |
| country     | VARIANT | True     | $1:country::VARIANT | geography/cities.parquet |
+-------------+---------+----------+---------------------+--------------------------+

Similar to the previous example, but specify a single Parquet file in the mystage stage:

-- Query the INFER_SCHEMA function.
SELECT *
  FROM TABLE(
    INFER_SCHEMA(
      LOCATION=>'@mystage/geography/cities.parquet'
      , FILE_FORMAT=>'my_parquet_format'
      )
    );

+-------------+---------+----------+---------------------+--------------------------+
| COLUMN_NAME | TYPE    | NULLABLE | EXPRESSION          | FILENAMES                |
|-------------+---------+----------+---------------------+--------------------------|
| continent   | TEXT    | True     | $1:continent::TEXT  | geography/cities.parquet |
| country     | VARIANT | True     | $1:country::VARIANT | geography/cities.parquet |
+-------------+---------+----------+---------------------+--------------------------+
Back to top