Categories:

Table Functions

INFER_SCHEMA

Automatically detects the file metadata schema in a set of staged data files that contain semi-structured data and retrieves the column definitions.

The GENERATE_COLUMN_DESCRIPTION function builds on the INFER_SCHEMA function output to simplify the creation of new tables, external tables, or views (using the appropriate CREATE <object> command) based on the column definitions of the staged files.

The CREATE TABLE or CREATE EXTERNAL TABLE command with the USING TEMPLATE clause can be executed to create a new table or external table with the column definitions derived from the INFER_SCHEMA function output.

Note

This feature is currently limited to Apache Parquet, Apache Avro, and ORC files.

Syntax

INFER_SCHEMA(
  LOCATION => '{ internalStage | externalStage }'
  , FILE_FORMAT => '<file_format_name>'
  , FILES => '<file_name>' [ , '<file_name>' ] [ , ... ]
  , IGNORE_CASE => TRUE | FALSE
)

Where:

internalStage ::=
    @[<namespace>.]<int_stage_name>[/<path>][/<filename>]
  | @~[/<path>][/<filename>]
externalStage ::=
  @[<namespace>.]<ext_stage_name>[/<path>][/<filename>]

Arguments

LOCATION => '...'

Name of the internal or external stage where the files are stored. Optionally include a path to one or more files in the cloud storage location; otherwise, the INFER_SCHEMA function scans files in all subdirectories in the stage:

@[namespace.]int_stage_name[/path][/filename]

Files are in the specified named internal stage.

@[namespace.]ext_stage_name[/path][/filename]

Files are in the specified named external stage.

@~[/path][/filename]

Files are in the stage for the current user.

Note

This SQL function supports named stages (internal or external) and user stages only. It does not support table stages.

FILES => '<file_name>' [ , '<file_name>' ] [ , ... ]

Specifies a list of one or more files (separated by commas) in a set of staged files that contain semi-structured data. The files must already have been staged in either the Snowflake internal location or external location specified in the command. If any of the specified files cannot be found, the query will be aborted.

The maximum number of files names that can be specified is 1000.

Note

For external stages only (Amazon S3, Google Cloud Storage, or Microsoft Azure), the file path is set by concatenating the URL in the stage definition and the list of resolved file names.

However, Snowflake doesn’t insert a separator implicitly between the path and file names. You must explicitly include a separator (/) either at the end of the URL in the stage definition or at the beginning of each file name specified in this parameter.

FILE_FORMAT => '<file_format_name>'

Name of the file format object that describes the data contained in the staged files. For more information, see CREATE FILE FORMAT.

IGNORE_CASE => TRUE | FALSE

Specifies whether column names detected from stage files are treated as case sensitive. By default, the value is FALSE, which means that Snowflake preserves the case of alphabetic characters when retrieving column names. If you specify the value as TRUE, column names are treated as case-insensitive and all column names are retrieved as uppercase letters.

Output

The function returns the following columns:

Column Name

Data Type

Description

COLUMN_NAME

TEXT

Name of a column in the staged files.

TYPE

TEXT

Data type of the column.

NULLABLE

BOOLEAN

Specifies whether rows in the column can store NULL instead of a value. Currently, the inferred nullability of a column can apply to one data file but not others in the scanned set.

EXPRESSION

TEXT

Expression of the column in the format $1:COLUMN_NAME::TYPE (primarily for external tables). If IGNORE_CASE is specified as TRUE, the expression of the column will be in the format GET_IGNORE_CASE ($1, COLUMN_NAME)::TYPE.

FILENAMES

TEXT

Names of the files that contain the column.

ORDER_ID

NUMBER

Column order in the staged files.

Examples

Retrieve the column definitions for Parquet files in the mystage stage:

-- Create a file format that sets the file type as Parquet.
CREATE FILE FORMAT my_parquet_format
  TYPE = parquet;

-- Query the INFER_SCHEMA function.
SELECT *
  FROM TABLE(
    INFER_SCHEMA(
      LOCATION=>'@mystage'
      , FILE_FORMAT=>'my_parquet_format'
      )
    );

+-------------+---------+----------+---------------------+--------------------------+----------+
| COLUMN_NAME | TYPE    | NULLABLE | EXPRESSION          | FILENAMES                | ORDER_ID |
|-------------+---------+----------+---------------------+--------------------------|----------+
| continent   | TEXT    | True     | $1:continent::TEXT  | geography/cities.parquet | 0        |
| country     | VARIANT | True     | $1:country::VARIANT | geography/cities.parquet | 1        |
| COUNTRY     | VARIANT | True     | $1:COUNTRY::VARIANT | geography/cities.parquet | 2        |
+-------------+---------+----------+---------------------+--------------------------+----------+

Similar to the previous example, but specify a single Parquet file in the mystage stage:

-- Query the INFER_SCHEMA function.
SELECT *
  FROM TABLE(
    INFER_SCHEMA(
      LOCATION=>'@mystage/geography/cities.parquet'
      , FILE_FORMAT=>'my_parquet_format'
      )
    );

+-------------+---------+----------+---------------------+--------------------------+----------+
| COLUMN_NAME | TYPE    | NULLABLE | EXPRESSION          | FILENAMES                | ORDER_ID |
|-------------+---------+----------+---------------------+--------------------------|----------+
| continent   | TEXT    | True     | $1:continent::TEXT  | geography/cities.parquet | 0        |
| country     | VARIANT | True     | $1:country::VARIANT | geography/cities.parquet | 1        |
| COUNTRY     | VARIANT | True     | $1:COUNTRY::VARIANT | geography/cities.parquet | 2        |
+-------------+---------+----------+---------------------+--------------------------+----------+

Retrieve the column definitions for Parquet files in the mystage stage with IGNORE_CASE specified as TRUE. In the returned output, all column names are retrieved as uppercase letters.

-- Query the INFER_SCHEMA function.
SELECT *
  FROM TABLE(
    INFER_SCHEMA(
      LOCATION=>'@mystage'
      , FILE_FORMAT=>'my_parquet_format'
      , IGNORE_CASE=>TRUE
      )
    );

+-------------+---------+----------+----------------------------------------+--------------------------+----------+
| COLUMN_NAME | TYPE    | NULLABLE | EXPRESSION                             | FILENAMES                | ORDER_ID |
|-------------+---------+----------+---------------------+---------------------------------------------|----------+
| CONTINENT   | TEXT    | True     | GET_IGNORE_CASE ($1, CONTINENT)::TEXT  | geography/cities.parquet | 0        |
| COUNTRY     | VARIANT | True     | GET_IGNORE_CASE ($1, COUNTRY)::VARIANT | geography/cities.parquet | 1        |
+-------------+---------+----------+---------------------+---------------------------------------------+----------+
Back to top