Categories:

Table Functions

INFER_SCHEMA

Automatically detects the file metadata schema in a set of staged data files that contain semi-structured data and retrieves the column definitions. Use the column definitions to simplify the creation of a landing table or external table to query the data.

Note

This feature is currently limited to Apache Parquet, Apache Avro, and ORC files.

Syntax

INFER_SCHEMA(
  LOCATION => '{ internalStage | externalStage }'
  , FILE_FORMAT => '<format_name>'
)

Where:

internalStage ::=
    @[<namespace>.]<int_stage_name>[/<path>][/<filename>]
  | @~[/<path>][/<filename>]
externalStage ::=
  @[<namespace>.]<ext_stage_name>[/<path>][/<filename>]

Arguments

LOCATION => '...'

Name of the internal or external stage where the files are stored. Optionally include a path to one or more files in the cloud storage location; otherwise, the INFER_SCHEMA function scans files in all subdirectories in the stage:

@[namespace.]int_stage_name[/path][/filename]

Files are in the specified named internal stage.

@[namespace.]ext_stage_name[/path][/filename]

Files are in the specified named external stage.

@~[/path][/filename]

Files are in the stage for the current user.

Note

This SQL function supports named stages (internal or external) and user stages only. It does not support table stages.

FILE_FORMAT => 'format_name'

Name of the file format object that describes the data contained in the staged files. For more information, see CREATE FILE FORMAT.

Usage Notes

None.

Output

The function returns the following columns:

Column Name

Data Type

Description

COLUMN_NAME

TEXT

Name of a column in the staged files.

TYPE

TEXT

Data type of the column.

NULLABLE

BOOLEAN

Specifies whether rows in the column can store NULL instead of a value. Currently, the inferred nullability of a column can apply to one data file but not others in the scanned set.

EXPRESSION

TEXT

Expression of the column in the format $1:COLUMN_NAME::TYPE (primarily for external tables).

FILENAMES

TEXT

Names of the files that contain the column.

Examples

Retrieve the column definitions for Parquet files in the mystage stage:

-- Create a file format that sets the file type as Parquet.
CREATE FILE FORMAT my_parquet_format
  TYPE = parquet;

-- Query the INFER_SCHEMA function.
SELECT *
  FROM TABLE(
    INFER_SCHEMA(
      LOCATION=>'@mystage'
      , FILE_FORMAT=>'my_parquet_format'
      )
    );

+-------------+---------+----------+---------------------+--------------------------+
| COLUMN_NAME | TYPE    | NULLABLE | EXPRESSION          | FILENAMES                |
|-------------+---------+----------+---------------------+--------------------------|
| continent   | TEXT    | True     | $1:continent::TEXT  | geography/cities.parquet |
| country     | VARIANT | True     | $1:country::VARIANT | geography/cities.parquet |
+-------------+---------+----------+---------------------+--------------------------+

Similar to the previous example, but specify a single Parquet file in the mystage stage:

-- Query the INFER_SCHEMA function.
SELECT *
  FROM TABLE(
    INFER_SCHEMA(
      LOCATION=>'@mystage/geography/cities.parquet'
      , FILE_FORMAT=>'my_parquet_format'
      )
    );

+-------------+---------+----------+---------------------+--------------------------+
| COLUMN_NAME | TYPE    | NULLABLE | EXPRESSION          | FILENAMES                |
|-------------+---------+----------+---------------------+--------------------------|
| continent   | TEXT    | True     | $1:continent::TEXT  | geography/cities.parquet |
| country     | VARIANT | True     | $1:country::VARIANT | geography/cities.parquet |
+-------------+---------+----------+---------------------+--------------------------+