Categories:

Table, View, & Sequence DDL

CREATE EXTERNAL TABLE

Creates a new external table in the current/specified schema or replaces an existing external table. When queried, an external table reads data from a set of one or more files in a specified external stage and outputs the data in a single VARIANT column.

Additional columns can be defined, with each column definition consisting of a name, data type, and optionally whether the column requires a value (NOT NULL) or has any referential integrity constraints (primary key, foreign key, etc.). See the usage notes for more information.
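
For example, the following minimal sketch (hypothetical stage, table, and column names) defines an external table over JSON files with one additional virtual column derived from the VALUE column:

CREATE EXTERNAL TABLE my_ext_table (
  -- hypothetical virtual column parsed out of the VALUE (VARIANT) column
  event_date date AS (value:event_date::date))
  WITH LOCATION = @my_ext_stage/events/
  FILE_FORMAT = (TYPE = JSON);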

See also:

ALTER EXTERNAL TABLE, DROP EXTERNAL TABLE, SHOW EXTERNAL TABLES

Syntax

CREATE [ OR REPLACE ] EXTERNAL TABLE [ IF NOT EXISTS ]
  <table_name>
    ( [ <col_name> <col_type> AS <expr> | <part_col_name> <col_type> AS <part_expr> ]
      [ inlineConstraint ]
      [ , <col_name> <col_type> AS <expr> | <part_col_name> <col_type> AS <part_expr> ... ]
      [ , ... ] )
  cloudProviderParams
  [ PARTITION BY ( <part_col_name> [, <part_col_name> ... ] ) ]
  [ WITH ] LOCATION = externalStage
  [ REFRESH_ON_CREATE =  { TRUE | FALSE } ]
  [ AUTO_REFRESH = { TRUE | FALSE } ]
  [ PATTERN = '<regex_pattern>' ]
  FILE_FORMAT = ( { FORMAT_NAME = '<file_format_name>' | TYPE = { CSV | JSON | AVRO | ORC | PARQUET } [ formatTypeOptions ] } )
  [ AWS_SNS_TOPIC = <string> ]
  [ COPY GRANTS ]
  [ COMMENT = '<string_literal>' ]

Where:

inlineConstraint ::=
  [ NOT NULL ]
  [ CONSTRAINT <constraint_name> ]
  { UNIQUE | PRIMARY KEY | { [ FOREIGN KEY ] REFERENCES <ref_table_name> [ ( <ref_col_name> ) ] } }
  [ <constraint_properties> ]

For additional inline constraint details, see CREATE | ALTER TABLE … CONSTRAINT.

cloudProviderParams (for Microsoft Azure) ::=
  [ INTEGRATION = '<integration_name>' ]

externalStage ::=
  @[<namespace>.]<ext_stage_name>[/<path>]

formatTypeOptions ::=
-- If FILE_FORMAT = ( TYPE = CSV ... )
     COMPRESSION = AUTO | GZIP | BZ2 | BROTLI | ZSTD | DEFLATE | RAW_DEFLATE | NONE
     RECORD_DELIMITER = '<character>' | NONE
     FIELD_DELIMITER = '<character>' | NONE
     SKIP_HEADER = <integer>
     SKIP_BLANK_LINES = TRUE | FALSE
-- If FILE_FORMAT = ( TYPE = JSON ... )
     COMPRESSION = AUTO | GZIP | BZ2 | BROTLI | ZSTD | DEFLATE | RAW_DEFLATE | NONE
-- If FILE_FORMAT = ( TYPE = AVRO ... )
     COMPRESSION = AUTO | GZIP | BZ2 | BROTLI | ZSTD | DEFLATE | RAW_DEFLATE | NONE
-- If FILE_FORMAT = ( TYPE = PARQUET ... )
     COMPRESSION = AUTO | SNAPPY | NONE

Required Parameters

table_name

String that specifies the identifier (i.e. name) for the table; must be unique for the schema in which the table is created.

In addition, the identifier must start with an alphabetic character and cannot contain spaces or special characters unless the entire identifier string is enclosed in double quotes (e.g. "My object"). Identifiers enclosed in double quotes are also case-sensitive.

For more details, see Identifier Requirements.

[ WITH ] LOCATION =

Specifies the external stage where the files containing data to be read are staged:

@[namespace.]ext_stage_name[/path]

Files are in the specified named external stage.

Where:

  • namespace is the database and/or schema in which the external stage resides, in the form of database_name.schema_name or schema_name. It is optional if a database and schema are currently in use within the user session; otherwise, it is required.

  • path is an optional case-sensitive path for files in the cloud storage location (i.e. files have names that begin with a common string) that limits the set of files to load. Paths are alternatively called prefixes or folders by different cloud storage services.

    Note that the external table appends this path to any path specified in the stage definition. To view the stage definition, execute DESC STAGE stage_name and check the url property value. For example, if the stage URL includes path a and the external table location includes path b, then the external table reads files staged in stage/a/b.
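
For example, a minimal sketch of the path composition described above (hypothetical names): the stage URL already ends in path a, and the external table location adds path b, so the table reads files under s3://mybucket/a/b/:

CREATE STAGE my_stage URL='s3://mybucket/a/';

DESC STAGE my_stage;   -- the url property shows the path included in the stage definition

CREATE EXTERNAL TABLE my_ext_table
  WITH LOCATION = @my_stage/b/      -- appended to the stage path, i.e. a/b
  FILE_FORMAT = (TYPE = PARQUET);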

FILE_FORMAT = ( FORMAT_NAME = 'file_format_name' ) or FILE_FORMAT = ( TYPE = CSV | JSON | AVRO | ORC | PARQUET [ ... ] )

String (constant) that specifies the file format:

FORMAT_NAME = file_format_name

Specifies an existing named file format that describes the staged data files to scan. The named file format determines the format type (CSV, JSON, etc.), as well as any other format options, for data files.

TYPE = CSV | JSON | AVRO | ORC | PARQUET [ ... ]

Specifies the format type of the staged data files to scan when querying the external table.

If a file format type is specified, additional format-specific options can be specified. For more details, see Format Type Options (in this topic).

The file format options can be configured at either the external table or stage level. Any settings specified at the external table level take precedence. Any settings not specified at either level assume the default values.

Default: TYPE = CSV.

Important

The external table does not inherit the file format, if any, in the stage definition. You must explicitly specify any file format options for the external table using the FILE_FORMAT parameter.

Note

FORMAT_NAME and TYPE are mutually exclusive; to avoid unintended behavior, you should only specify one or the other when creating an external table.
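
For example, either of the following sketches (hypothetical stage and format names) is valid on its own, but the two parameters should not be combined:

-- Reference an existing named file format:
CREATE EXTERNAL TABLE ext_csv_a
  WITH LOCATION = @my_stage/data/
  FILE_FORMAT = (FORMAT_NAME = 'my_csv_format');

-- Or specify the format type (and any format options) inline:
CREATE EXTERNAL TABLE ext_csv_b
  WITH LOCATION = @my_stage/data/
  FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1);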

Optional Parameters

col_name

String that specifies the column identifier (i.e. name). All the requirements for table identifiers also apply to column identifiers.

External table columns are virtual columns, which are defined using an explicit expression.

For more details, see Identifier Requirements.

col_type

String (constant) that specifies the data type for the column. The data type must match the result of expr for the column.

For details about the data types that can be specified for table columns, see Data Types.

expr

String that specifies the expression for the column. When queried, the column returns results derived from this expression.

CONSTRAINT ...

String that defines an inline or out-of-line constraint for the specified column(s) in the table.

For syntax details, see CREATE | ALTER TABLE … CONSTRAINT. For more information about constraints, see Constraints.

REFRESH_ON_CREATE = TRUE | FALSE

Specifies whether to automatically refresh the external table metadata once, immediately after the external table is created. Refreshing the external table metadata synchronizes the metadata with the current list of data files in the specified stage path. This action is required for the metadata to register any existing data files in the named external stage specified in the [ WITH ] LOCATION = setting.

TRUE

Snowflake automatically refreshes the external table metadata once after creation.

FALSE

Snowflake does not automatically refresh the external table metadata. To register any existing data files in the stage, you must manually refresh the external table metadata once using ALTER EXTERNAL TABLE … REFRESH.

Default: TRUE
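
For example, a minimal sketch (hypothetical names) that defers the initial refresh and then registers the existing data files manually:

CREATE EXTERNAL TABLE my_ext_table
  WITH LOCATION = @my_stage/data/
  REFRESH_ON_CREATE = FALSE
  FILE_FORMAT = (TYPE = JSON);

-- Register the data files currently in the stage path:
ALTER EXTERNAL TABLE my_ext_table REFRESH;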

AUTO_REFRESH = TRUE | FALSE

Specifies whether Snowflake should enable triggering automatic refreshes of the external table metadata when new or updated data files are available in the named external stage specified in the [ WITH ] LOCATION = setting.

Note

  • You must configure an event notification for your storage location (Amazon S3 or Microsoft Azure) to notify Snowflake when new or updated data is available to read into the external table metadata. For more information, see Refreshing External Tables Automatically for Amazon S3 (S3) or Refreshing External Tables Automatically for Azure Blob Storage (Azure).

  • Currently, the ability to automatically refresh the metadata is not available for external tables that reference Google Cloud Storage stages.

    As a workaround, we suggest following our best practices for staging your data files and periodically executing an ALTER EXTERNAL TABLE … REFRESH statement to register any missed files. For satisfactory performance, we also recommend using a selective path prefix with ALTER EXTERNAL TABLE … REFRESH to reduce the number of files that need to be listed and checked for prior registration (e.g. bucket_name/YYYY/MM/DD/ or even bucket_name/YYYY/MM/DD/HH/ depending on your volume); see the sketch after this parameter description.

  • When an external table is created, its metadata is refreshed automatically once unless REFRESH_ON_CREATE = FALSE.

TRUE

Snowflake enables triggering automatic refreshes of the external table metadata.

FALSE

Snowflake does not enable triggering automatic refreshes of the external table metadata. You must manually refresh the external table metadata periodically using ALTER EXTERNAL TABLE … REFRESH to synchronize the metadata with the current list of files in the stage path.

Default: TRUE
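
The following sketch illustrates the workaround described in the note above for Google Cloud Storage stages (hypothetical table name and path): periodically refresh the metadata for a selective path prefix rather than the entire stage location:

-- Refresh only the files under a specific relative path in the stage location:
ALTER EXTERNAL TABLE my_gcs_ext_table REFRESH '2018/08/05/';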

PATTERN = 'regex_pattern'

A regular expression pattern string, enclosed in single quotes, specifying the file names and/or paths on the external stage to match.

Tip

For the best performance, try to avoid applying patterns that filter on a large number of files.

Note

Currently, this parameter is only supported when the external table metadata is refreshed manually by executing an ALTER EXTERNAL TABLE ... REFRESH statement to register files. The parameter is not supported when the metadata is refreshed using event notifications.

AWS_SNS_TOPIC = string

Required only when configuring AUTO_REFRESH for Amazon S3 stages using Amazon Simple Notification Service (SNS). Specifies the Amazon Resource Name (ARN) for the SNS topic for your S3 bucket. The CREATE EXTERNAL TABLE statement subscribes the Amazon Simple Queue Service (SQS) queue to the specified SNS topic. Event notifications via the SNS topic trigger metadata refreshes. For more information, see Refreshing External Tables Automatically for Amazon S3.

COPY GRANTS

Specifies to retain the access permissions from the original table when an external table is recreated using the CREATE OR REPLACE EXTERNAL TABLE variant. The parameter copies all permissions, except OWNERSHIP, from the existing table to the new table. By default, the role that executes the CREATE EXTERNAL TABLE command owns the new external table.

Note

The operation to copy grants occurs atomically in the CREATE EXTERNAL TABLE command (i.e. within the same transaction).
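
For example, a minimal sketch (hypothetical names) that recreates an external table while retaining the privileges granted on the original table:

CREATE OR REPLACE EXTERNAL TABLE ext_table
  WITH LOCATION = @my_stage/path1/
  FILE_FORMAT = (TYPE = JSON)
  COPY GRANTS;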

COMMENT = 'string_literal'

String (literal) that specifies a comment for the external table.

Default: No value

Partitioning Parameters

Use these parameters to partition your external table.

part_col_name col_type AS part_expr

Required for partitioning the data in an external table.

Specifies one or more partition columns in the external table.

A partition column must evaluate as an expression that parses the path and/or filename information in the METADATA$FILENAME pseudocolumn. Partition columns optimize query performance by pruning out the data files that do not need to be scanned (i.e. partitioning the external table). A partition consists of all data files that match the path and/or filename in the expression for the partition column.

part_col_name

String that specifies the partition column identifier (i.e. name). All the requirements for table identifiers also apply to column identifiers.

col_type

String (constant) that specifies the data type for the column. The data type must match the result of part_expr for the column.

part_expr

String that specifies the expression for the column. The expression must include the METADATA$FILENAME pseudocolumn.

External tables currently support only a subset of functions in partition expressions.

After defining any partition columns for the table, identify these columns using the PARTITION BY clause.

[ PARTITION BY ( part_col_name [, part_col_name ... ] ) ]

Specifies any partition columns to evaluate for the external table.

Usage

When querying an external table, include one or more partition columns in a WHERE clause, e.g.:

... WHERE part_col_name = 'filter_value'

Snowflake filters on the partition columns to restrict the set of data files to scan. Note that all rows in these files are scanned. If a WHERE clause includes non-partition columns, those filters are evaluated after the data files have been filtered.

A common practice is to partition the data files based on increments of time; or, if the data files are staged from multiple sources, to partition by a data source identifier and date or timestamp.
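
For example, the following sketch assumes a hypothetical layout in which file paths start with a four-character source identifier followed by a date, e.g. srcA/2018/08/05/log.parquet, and defines partition columns for both:

CREATE EXTERNAL TABLE exttable_by_source(
  -- first 4 characters of the path identify the data source (hypothetical layout)
  source_id varchar AS substr(metadata$filename, 1, 4),
  -- characters 6-15 of the path hold the date (YYYY/MM/DD)
  date_part date AS to_date(substr(metadata$filename, 6, 10), 'YYYY/MM/DD'),
  col2 varchar AS (value:col2::varchar))
  PARTITION BY (source_id, date_part)
  LOCATION = @my_stage/
  FILE_FORMAT = (TYPE = PARQUET);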

Cloud Provider Parameters (cloudProviderParams)

Microsoft Azure

INTEGRATION = integration_name

Specifies the name of the notification integration used to automatically refresh the external table metadata using Azure Event Grid notifications. A notification integration is a Snowflake object that provides an interface between Snowflake and third-party cloud message queuing services.

This parameter is required to enable auto-refresh operations for the external table. For instructions on configuring the auto-refresh capability, see Refreshing External Tables Automatically for Azure Blob Storage.

Format Type Options (formatTypeOptions)

Format type options are used for loading data into and unloading data out of tables.

Depending on the file format type specified (FILE_FORMAT = ( TYPE = ... )), you can include one or more of the following format-specific options (separated by blank spaces, commas, or new lines):

TYPE = CSV

COMPRESSION = AUTO | GZIP | BZ2 | BROTLI | ZSTD | DEFLATE | RAW_DEFLATE | NONE

String (constant) that specifies the current compression algorithm for the data files to be loaded. Snowflake uses this option to detect how already-compressed data files were compressed so that the compressed data in the files can be extracted for loading.

Supported values:

  • AUTO: Compression algorithm detected automatically, except for Brotli-compressed files, which cannot currently be detected automatically. If loading Brotli-compressed files, explicitly use BROTLI instead of AUTO.

  • GZIP

  • BZ2

  • BROTLI: Must be used if loading Brotli-compressed files.

  • ZSTD: Zstandard v0.8 (and higher) supported.

  • DEFLATE: Deflate-compressed files (with zlib header, RFC1950).

  • RAW_DEFLATE: Raw Deflate-compressed files (without header, RFC1951).

  • NONE: Data files to load have not been compressed.

Default: AUTO

RECORD_DELIMITER = 'character' | NONE

One or more singlebyte or multibyte characters that separate records in an input file.

Accepts common escape sequences, octal values (prefixed by \\), or hex values (prefixed by 0x). For example, for records delimited by the thorn (Þ) character, specify the octal (\\336) or hex (0xDE) value. Also accepts a value of NONE.

The specified delimiter must be a valid UTF-8 character and not a random sequence of bytes.

Multiple-character delimiters are also supported; however, the delimiter for RECORD_DELIMITER or FIELD_DELIMITER cannot be a substring of the delimiter for the other file format option (e.g. FIELD_DELIMITER = 'aa' RECORD_DELIMITER = 'aabb'). The delimiter is limited to a maximum of 20 characters.

Default: New line character. Note that “new line” is logical such that \r\n will be understood as a new line for files on a Windows platform.

FIELD_DELIMITER = 'character' | NONE

One or more singlebyte or multibyte characters that separate fields in an input file.

Accepts common escape sequences, octal values (prefixed by \\), or hex values (prefixed by 0x). For example, for fields delimited by the thorn (Þ) character, specify the octal (\\336) or hex (0xDE) value. Also accepts a value of NONE.

The specified delimiter must be a valid UTF-8 character and not a random sequence of bytes.

Multiple-character delimiters are also supported; however, the delimiter for RECORD_DELIMITER or FIELD_DELIMITER cannot be a substring of the delimiter for the other file format option (e.g. FIELD_DELIMITER = 'aa' RECORD_DELIMITER = 'aabb'). The delimiter is limited to a maximum of 20 characters.

Default: comma (,)

SKIP_HEADER = integer

Number of lines at the start of the file to skip.

Note that SKIP_HEADER does not use the RECORD_DELIMITER or FIELD_DELIMITER values to determine what a header line is; rather, it simply skips the specified number of CRLF (Carriage Return, Line Feed)-delimited lines in the file. RECORD_DELIMITER and FIELD_DELIMITER are then used to determine the rows of data to load.

Default: 0

SKIP_BLANK_LINES = TRUE | FALSE

Boolean that specifies to skip any blank lines encountered in the data files; otherwise, blank lines produce an end-of-record error (default behavior).

Default: FALSE
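
For example, a sketch (hypothetical stage and table names) combining several of the CSV options above in the FILE_FORMAT clause:

CREATE EXTERNAL TABLE csv_ext_table
  WITH LOCATION = @my_stage/csv/
  FILE_FORMAT = (TYPE = CSV
                 COMPRESSION = GZIP
                 FIELD_DELIMITER = '|'
                 SKIP_HEADER = 1
                 SKIP_BLANK_LINES = TRUE);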

TYPE = JSON

COMPRESSION = AUTO | GZIP | BZ2 | BROTLI | ZSTD | DEFLATE | RAW_DEFLATE | NONE

String (constant) that specifies the current compression algorithm for the data files to be loaded. Snowflake uses this option to detect how already-compressed data files were compressed so that the compressed data in the files can be extracted for loading.

Supported values:

  • AUTO: Compression algorithm detected automatically, except for Brotli-compressed files, which cannot currently be detected automatically. If loading Brotli-compressed files, explicitly use BROTLI instead of AUTO.

  • GZIP

  • BZ2

  • BROTLI

  • ZSTD

  • DEFLATE: Deflate-compressed files (with zlib header, RFC1950).

  • RAW_DEFLATE: Raw Deflate-compressed files (without header, RFC1951).

  • NONE: Indicates the files for loading data have not been compressed.

Default: AUTO

TYPE = AVRO

COMPRESSION = AUTO | GZIP | BZ2 | BROTLI | ZSTD | DEFLATE | RAW_DEFLATE | NONE

String (constant) that specifies the current compression algorithm for the data files to be loaded. Snowflake uses this option to detect how already-compressed data files were compressed so that the compressed data in the files can be extracted for loading.

Supported values:

  • AUTO: Compression algorithm detected automatically, except for Brotli-compressed files, which cannot currently be detected automatically. If loading Brotli-compressed files, explicitly use BROTLI instead of AUTO.

  • GZIP

  • BZ2

  • BROTLI

  • ZSTD

  • DEFLATE: Deflate-compressed files (with zlib header, RFC1950).

  • RAW_DEFLATE: Raw Deflate-compressed files (without header, RFC1951).

  • NONE: Data files to load have not been compressed.

Default: AUTO

TYPE = PARQUET

COMPRESSION = AUTO | SNAPPY | NONE

String (constant) that specifies the current compression algorithm for columns in the Parquet files.

Supported values:

  • AUTO: Compression algorithm detected automatically. Supports the following compression algorithms: Brotli, gzip, Lempel–Ziv–Oberhumer (LZO), LZ4, Snappy, or Zstandard v0.8 (and higher).

  • SNAPPY

  • NONE: Data files to load have not been compressed.

Default: AUTO

Usage Notes

  • External tables support external (i.e. S3, Azure, or GCS) stages only; internal (i.e. Snowflake) stages are not supported.

  • Every external table has a column named VALUE of type VARIANT. Additional columns might be specified. All of the columns are treated as virtual columns.

    • The VALUE column structures rows in a CSV data file as JSON objects with elements identified by column position, e.g. {c1: col_1_value, c2: col_2_value, c3: col_3_value ...} (see the query sketch after these notes).

  • No referential integrity constraints on external tables are enforced by Snowflake. This differs from the behavior for normal tables, whereby the NOT NULL constraint on columns is enforced.

  • External tables include the following metadata column:

    • METADATA$FILENAME: Name of each staged data file included in the external table. Includes the path to the data file in the stage.

  • The following are not supported for external tables:

    • Clustering keys

    • Cloning

    • Data in XML format

  • Time Travel is not supported for external tables.
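
As noted above, for CSV data files the VALUE column exposes each record as a JSON object keyed by column position, so individual fields can be extracted by path. A minimal query sketch (hypothetical table name):

SELECT value:c1::varchar AS first_col,
       value:c2::number  AS second_col,
       metadata$filename
FROM csv_ext_table;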

Examples

Simple External Table

  1. Create an external stage named mystage for the storage location where a set of Parquet data files is stored. For more information, see CREATE STAGE.

    Amazon S3

    Create an external stage using a private/protected S3 bucket named mybucket with a folder path named files:

    CREATE OR REPLACE STAGE mystage
      URL='s3://mybucket/files/'
      ..
      ;
    

    Google Cloud Storage

    Create an external stage using a Google Cloud Storage bucket named mybucket with a folder path named files:

    CREATE OR REPLACE STAGE mystage
      URL='gcs://mybucket/files'
      ..
      ;
    

    Microsoft Azure

    Create an external stage using an Azure storage account named myaccount and a container named mycontainer with a folder path named files:

    CREATE OR REPLACE STAGE mystage
      URL='azure://myaccount.blob.core.windows.net/mycontainer/files'
      ..
      ;
    

    Note

    Use the blob.core.windows.net endpoint for all supported types of Azure blob storage accounts, including Data Lake Storage Gen2.

  2. Create an external table named ext_twitter_feed that references the Parquet files in the mystage external stage. The stage reference includes a folder path named daily. The external table appends this path to the stage definition, i.e. the external table references the data files in @mystage/files/daily.

    The SQL command specifies Parquet as the file format type. In addition, file pattern matching is applied to include only Parquet files whose names include the string sales:

    Amazon S3

    CREATE OR REPLACE EXTERNAL TABLE ext_twitter_feed
      WITH LOCATION = @mystage/daily/
      AUTO_REFRESH = true
      FILE_FORMAT = (TYPE = PARQUET)
      PATTERN='.*sales.*[.]parquet';
    

    Google Cloud Storage

    CREATE OR REPLACE EXTERNAL TABLE ext_twitter_feed
      WITH LOCATION = @mystage/daily/
      FILE_FORMAT = (TYPE = PARQUET)
      PATTERN='.*sales.*[.]parquet';
    

    Microsoft Azure

    CREATE OR REPLACE EXTERNAL TABLE ext_twitter_feed
     INTEGRATION = 'MY_AZURE_INT'
     WITH LOCATION = @mystage/daily/
     AUTO_REFRESH = true
     FILE_FORMAT = (TYPE = PARQUET)
     PATTERN='.*sales.*[.]parquet';
    
  3. Refresh the external table metadata:

    ALTER EXTERNAL TABLE ext_twitter_feed REFRESH;
    

Partitioned External Table

Create a partitioned external table that partitions data by the logical, granular details in the stage path.

In the following example, the data files are organized in cloud storage with the following structure: logs/YYYY/MM/DD/HH24, e.g.:

  • logs/2018/08/05/0524/

  • logs/2018/08/27/1408/

  1. Create an external stage named exttable_part_stage for the storage location where the data files are stored. For more information, see CREATE STAGE.

    The stage definition includes the path /files/logs/:

    Amazon S3

    CREATE STAGE exttable_part_stage
      URL='s3://mybucket/files/logs/'
      ..
      ;
    

    Google Cloud Storage

    CREATE STAGE exttable_part_stage
      URL='gcs://mybucket/files/logs/'
      ..
      ;
    

    Microsoft Azure

    CREATE STAGE exttable_part_stage
      URL='azure://myaccount.blob.core.windows.net/mycontainer/files/logs/'
      ..
      ;
    
  2. Query the METADATA$FILENAME pseudocolumn in the staged data. Use the results to develop your partition column(s):

    SELECT metadata$filename FROM @exttable_part_stage/;
    
    +----------------------------------------+
    | METADATA$FILENAME                      |
    |----------------------------------------|
    | files/logs/2018/08/05/0524/log.parquet |
    | files/logs/2018/08/27/1408/log.parquet |
    +----------------------------------------+
    
  3. Create the partitioned external table.

    The partition column date_part casts YYYY/MM/DD in the METADATA$FILENAME pseudocolumn as a date using the TO_DATE function.

    The SQL command also specifies Parquet as the file format type:

    Amazon S3

    CREATE EXTERNAL TABLE exttable_part(
      date_part date as to_date(substr(metadata$filename, 12, 10), 'YYYY/MM/DD'),
      timestamp bigint AS (value:timestamp::bigint),
      col2 varchar AS (value:col2::varchar))
      PARTITION BY (date_part)
      LOCATION=@exttable_part_stage/logs/
      AUTO_REFRESH = true
      FILE_FORMAT = (TYPE = PARQUET);
    

    Google Cloud Storage

    CREATE EXTERNAL TABLE exttable_part(
      date_part date as to_date(substr(metadata$filename, 12, 10), 'YYYY/MM/DD'),
      timestamp bigint AS (value:timestamp::bigint),
      col2 varchar AS (value:col2::varchar))
      PARTITION BY (date_part)
      LOCATION=@exttable_part_stage/logs/
      AUTO_REFRESH = true
      FILE_FORMAT = (TYPE = PARQUET);
    

    Microsoft Azure

    CREATE EXTERNAL TABLE exttable_part(
     date_part date as to_date(substr(metadata$filename, 12, 10), 'YYYY/MM/DD'),
     timestamp bigint AS (value:timestamp::bigint),
     col2 varchar AS (value:col2::varchar))
     INTEGRATION = 'MY_INT'
     PARTITION BY (date_part)
     LOCATION=@exttable_part_stage/logs/
     AUTO_REFRESH = true
     FILE_FORMAT = (TYPE = PARQUET);
    
  4. Refresh the external table metadata:

    ALTER EXTERNAL TABLE exttable_part REFRESH;
    

When querying the external table, filter the data by the partition column using a WHERE clause:

SELECT timestamp, col2 FROM exttable_part WHERE date_part = to_date('2018/08/05', 'YYYY/MM/DD');

Simple External Table: AUTO_REFRESH Using Amazon SNS

Create a non-partitioned external table in the current schema whose metadata is refreshed automatically when triggered by event notifications received from Amazon SNS:

CREATE OR REPLACE EXTERNAL TABLE ext_table
     WITH LOCATION = @mystage/path1/
     FILE_FORMAT = (TYPE = JSON)
     AWS_SNS_TOPIC = 'arn:aws:sns:us-west-2:001234567890:s3_mybucket';

Materialized View on an External Table

Create a materialized view based on a subquery of the columns in the external table created in the Partitioned External Table example:

CREATE MATERIALIZED VIEW exttable_part_mv
  AS
  SELECT col2 FROM exttable_part;

For general syntax, usage notes, and further examples for this SQL command, see CREATE MATERIALIZED VIEW.