Categories:

Table, View, & Sequence DDL

CREATE EXTERNAL TABLE

Creates a new external table in the current/specified schema or replaces an existing external table. When queried, an external table reads data from a set of one or more files in a specified external stage and outputs the data in a single VARIANT column.

Additional columns can be defined, with each column definition consisting of a name, data type, and optionally whether the column requires a value (NOT NULL) or has any referential integrity constraints (primary key, foreign key, etc.). See the usage notes for more information.

See also:

ALTER EXTERNAL TABLE , DROP EXTERNAL TABLE , SHOW EXTERNAL TABLES , DESCRIBE EXTERNAL TABLE

In this Topic:

Syntax

-- Partitions computed from expressions
CREATE [ OR REPLACE ] EXTERNAL TABLE [IF NOT EXISTS]
  <table_name>
    ( [ <col_name> <col_type> AS <expr> | <part_col_name> <col_type> AS <part_expr> ]
      [ inlineConstraint ]
      [ , <col_name> <col_type> AS <expr> | <part_col_name> <col_type> AS <part_expr> ... ]
      [ , ... ] )
  cloudProviderParams
  [ PARTITION BY ( <part_col_name> [, <part_col_name> ... ] ) ]
  [ WITH ] LOCATION = externalStage
  [ REFRESH_ON_CREATE =  { TRUE | FALSE } ]
  [ AUTO_REFRESH = { TRUE | FALSE } ]
  [ PATTERN = '<regex_pattern>' ]
  FILE_FORMAT = ( { FORMAT_NAME = '<file_format_name>' | TYPE = { CSV | JSON | AVRO | ORC | PARQUET } [ formatTypeOptions ] } )
  [ AWS_SNS_TOPIC = '<string>' ]
  [ TABLE_FORMAT = DELTA ]
  [ COPY GRANTS ]
  [ [ WITH ] ROW ACCESS POLICY <policy_name> ON (VALUE) ]
  [ [ WITH ] TAG ( <tag_name> = '<tag_value>' [ , <tag_name> = '<tag_value>' , ... ] ) ]
  [ COMMENT = '<string_literal>' ]

-- Partitions added and removed manually
CREATE [ OR REPLACE ] EXTERNAL TABLE [IF NOT EXISTS]
  <table_name>
    ( [ <col_name> <col_type> AS <expr> | <part_col_name> <col_type> AS <part_expr> ]
      [ inlineConstraint ]
      [ , <col_name> <col_type> AS <expr> | <part_col_name> <col_type> AS <part_expr> ... ]
      [ , ... ] )
  cloudProviderParams
  [ PARTITION BY ( <part_col_name> [, <part_col_name> ... ] ) ]
  [ WITH ] LOCATION = externalStage
  PARTITION_TYPE = USER_SPECIFIED
  FILE_FORMAT = ( { FORMAT_NAME = '<file_format_name>' | TYPE = { CSV | JSON | AVRO | ORC | PARQUET } [ formatTypeOptions ] } )
  [ COPY GRANTS ]
  [ [ WITH ] ROW ACCESS POLICY <policy_name> ON (VALUE) ]
  [ [ WITH ] TAG ( <tag_name> = '<tag_value>' [ , <tag_name> = '<tag_value>' , ... ] ) ]
  [ COMMENT = '<string_literal>' ]

Where:

inlineConstraint ::=
  [ NOT NULL ]
  [ CONSTRAINT <constraint_name> ]
  { UNIQUE | PRIMARY KEY | [ FOREIGN KEY ] REFERENCES <ref_table_name> [ ( <ref_col_name> [ , <ref_col_name> ] ) ] }
  [ <constraint_properties> ]

For additional inline constraint details, see CREATE | ALTER TABLE … CONSTRAINT.

cloudProviderParams (for Google Cloud Storage) ::=
  [ INTEGRATION = '<integration_name>' ]

cloudProviderParams (for Microsoft Azure) ::=
  [ INTEGRATION = '<integration_name>' ]
externalStage ::=
  @[<namespace>.]<ext_stage_name>[/<path>]
formatTypeOptions ::=
-- If FILE_FORMAT = ( TYPE = CSV ... )
     COMPRESSION = AUTO | GZIP | BZ2 | BROTLI | ZSTD | DEFLATE | RAW_DEFLATE | NONE
     RECORD_DELIMITER = '<character>' | NONE
     FIELD_DELIMITER = '<character>' | NONE
     SKIP_HEADER = <integer>
     SKIP_BLANK_LINES = TRUE | FALSE
-- If FILE_FORMAT = ( TYPE = JSON ... )
     COMPRESSION = AUTO | GZIP | BZ2 | BROTLI | ZSTD | DEFLATE | RAW_DEFLATE | NONE
-- If FILE_FORMAT = ( TYPE = AVRO ... )
     COMPRESSION = AUTO | GZIP | BZ2 | BROTLI | ZSTD | DEFLATE | RAW_DEFLATE | NONE
-- If FILE_FORMAT = ( TYPE = ORC ... )
     TRIM_SPACE = TRUE | FALSE
     NULL_IF = ( '<string>' [ , '<string>' ... ]
-- If FILE_FORMAT = ( TYPE = PARQUET ... )
     COMPRESSION = AUTO | SNAPPY | NONE

Required Parameters

table_name

String that specifies the identifier (i.e. name) for the table; must be unique for the schema in which the table is created.

In addition, the identifier must start with an alphabetic character and cannot contain spaces or special characters unless the entire identifier string is enclosed in double quotes (e.g. "My object"). Identifiers enclosed in double quotes are also case-sensitive.

For more details, see Identifier Requirements.

[ WITH ] LOCATION =

Specifies the external stage and optional path where the files containing data to be read are staged:

@[namespace.]ext_stage_name[/path]

Files are in the specified named external stage.

Where:

  • namespace is the database and/or schema in which the external stage resides, in the form of database_name.schema_name or schema_name. It is optional if a database and schema are currently in use within the user session; otherwise, it is required.

  • path is an optional case-sensitive path for files in the cloud storage location (i.e. files have names that begin with a common string) that limits the set of files to load. Paths are alternatively called prefixes or folders by different cloud storage services.

    The external table appends this path to any path specified in the stage definition. To view the stage definition, execute DESC STAGE stage_name and check the url property value. For example, if the stage URL includes path a and the external table location includes path b, then the external table reads files staged in stage/a/b.

    Note that the [ WITH ] LOCATION value cannot reference specific filenames. To point an external table to individual staged files, use the PATTERN parameter.

FILE_FORMAT = ( FORMAT_NAME = 'file_format_name' ) or . FILE_FORMAT = ( TYPE = CSV | JSON | AVRO | ORC | PARQUET [ ... ] )

String (constant) that specifies the file format:

FORMAT_NAME = file_format_name

Specifies an existing named file format that describes the staged data files to scan. The named file format determines the format type (CSV, JSON, etc.), as well as any other format options, for data files.

TYPE = CSV | JSON | AVRO | ORC | PARQUET [ ... ]

Specifies the format type of the staged data files to scan when querying the external table.

If a file format type is specified, additional format-specific options can be specified. For more details, see Format Type Options (in this topic).

The file format options can be configured at either the external table or stage level. Any settings specified at the external table level take precedence. Any settings not specified at either level assume the default values.

Default: TYPE = CSV.

Important

The external table does not inherit the file format, if any, in the stage definition. You must explicitly specify any file format options for the external table using the FILE_FORMAT parameter.

Note

FORMAT_NAME and TYPE are mutually exclusive; to avoid unintended behavior, you should only specify one or the other when creating an external table.

Optional Parameters

col_name

String that specifies the column identifier (i.e. name). All the requirements for table identifiers also apply to column identifiers.

For more details, see Identifier Requirements.

col_type

String (constant) that specifies the data type for the column. The data type must match the result of expr for the column.

For details about the data types that can be specified for table columns, see Data Types.

expr

String that specifies the expression for the column. When queried, the column returns results derived from this expression.

External table columns are virtual columns, which are defined using an explicit expression. Add virtual columns as expressions using the VALUE column and/or the METADATA$FILENAME pseudocolumn:

VALUE

A VARIANT type column that represents a single row in the external file.

CSV

The VALUE column structures each row as an object with elements identified by column position (i.e. {c1: <column_1_value>, c2: <column_2_value>, c3: <column_1_value> ...}).

For example, add a VARCHAR column named mycol that references the first column in the staged CSV files:

mycol varchar as (value:c1::varchar)
Semi-structured data

Enclose element names and values in double-quotes. Traverse the path in the VALUE column using dot notation.

For example, suppose the following represents a single row of semi-structured data in a staged file:

{ "a":"1", "b": { "c":"2", "d":"3" } }

Add a VARCHAR column named mycol that references the nested repeating c element in the staged file:

mycol varchar as (value:"b"."c"::varchar)
Apache Parquet

Either reference the element names using the semi-structured data notation or reference the Parquet field ID for a specific column.

For example, using the field ID, add a VARCHAR column named mycol that references the second column in the staged Parquet files:

mycol VARCHAR as (value:"2"::VARCHAR)
METADATA$FILENAME

A pseudocolumn that identifies the name of each staged data file included in the external table, including its path in the stage. For an example, see Partitions Added Automatically From Partition Column Expressions (in this topic).

CONSTRAINT ...

String that defines an inline or out-of-line constraint for the specified column(s) in the table.

For syntax details, see CREATE | ALTER TABLE … CONSTRAINT. For more information about constraints, see Constraints.

REFRESH_ON_CREATE = TRUE | FALSE

Specifies whether to automatically refresh the external table metadata once, immediately after the external table is created. Refreshing the external table metadata synchronizes the metadata with the current list of data files in the specified stage path. This action is required for the metadata to register any existing data files in the named external stage specified in the [ WITH ] LOCATION = setting.

TRUE

Snowflake automatically refreshes the external table metadata once after creation.

Note

If the specified location contains close to 1 million files or more, we recommend that you set REFRESH_ON_CREATE = FALSE. After creating the external table, refresh the metadata incrementally by executing ALTER EXTERNAL TABLE … REFRESH statements that specify subpaths in the location (i.e. subsets of files to include in the refresh) until the metadata includes all of the files in the location.

FALSE

Snowflake does not automatically refresh the external table metadata. To register any existing data files in the stage, you must manually refresh the external table metadata once using ALTER EXTERNAL TABLE … REFRESH.

Default: TRUE

AUTO_REFRESH = TRUE | FALSE

Specifies whether Snowflake should enable triggering automatic refreshes of the external table metadata when new or updated data files are available in the named external stage specified in the [ WITH ] LOCATION = setting.

Note

TRUE

Snowflake enables triggering automatic refreshes of the external table metadata.

FALSE

Snowflake does not enable triggering automatic refreshes of the external table metadata. You must manually refresh the external table metadata periodically using ALTER EXTERNAL TABLE … REFRESH to synchronize the metadata with the current list of files in the stage path.

Default: TRUE

PATTERN = 'regex_pattern'

A regular expression pattern string, enclosed in single quotes, specifying the filenames and/or paths on the external stage to match.

Tip

For the best performance, try to avoid applying patterns that filter on a large number of files.

AWS_SNS_TOPIC = 'string'

Required only when configuring AUTO_REFRESH for Amazon S3 stages using Amazon Simple Notification Service (SNS). Specifies the Amazon Resource Name (ARN) for the SNS topic for your S3 bucket. The CREATE EXTERNAL TABLE statement subscribes the Amazon Simple Queue Service (SQS) queue to the specified SNS topic. Event notifications via the SNS topic trigger metadata refreshes. For more information, see Refreshing External Tables Automatically for Amazon S3.

TABLE_FORMAT = DELTA

Identifies the external table as referencing a Delta Lake on the cloud storage location. A Delta Lake on Amazon S3, Google Cloud Storage, or Microsoft Azure cloud storage is supported.

When this parameter is set, the external table scans for Delta Lake transaction log files in the [ WITH ] LOCATION location. Delta log files have names like _delta_log/00000000000000000000.json, _delta_log/00000000000000000010.checkpoint.parquet, etc.

When the metadata for an external table is refreshed, Snowflake parses the Delta Lake transaction logs and determines which Parquet files are current. In the background, the refresh performs add and remove file operations to keep the external table metadata in sync.

Note

  • The ordering of event notifications triggered by DDL operations in cloud storage is not guaranteed. Therefore, the ability to automatically refresh is not available for external tables that reference Delta Lake files. Both REFRESH_ON_CREATE and AUTO_REFRESH must be set to FALSE.

    Periodically execute an ALTER EXTERNAL TABLE … REFRESH statement to register any added or removed files.

  • The FILE_FORMAT value must specify Parquet as the file type.

  • For optimal performance, we recommend defining partition columns for the external table.

COPY GRANTS

Specifies to retain the access permissions from the original table when an external table is recreated using the CREATE OR REPLACE TABLE variant. The parameter copies all permissions, except OWNERSHIP, from the existing table to the new table. By default, the role that executes the CREATE EXTERNAL TABLE command owns the new external table.

Note:

The operation to copy grants occurs atomically in the CREATE EXTERNAL TABLE command (i.e. within the same transaction).

ROW ACCESS POLICY <policy_name> ON (VALUE)

Specifies the row access policy to set on the table.

Specify the VALUE column when applying a row access policy to an external table.

TAG ( tag_name = 'tag_value' [ , tag_name = 'tag_value' , ... ] )

Specifies the tag name (i.e. the key) and the tag value.

The tag value is always a string, and the maximum number of characters for the tag value is 256. The maximum number of unique tag keys that can be set on an object is 20.

Note

For a table or view and its columns, the total number of unique tag keys that can be set is 20.

For example, if a single column in a table has 10 unique tag keys set on the column, Snowflake allows 10 additional unique tag keys to be set on either that column, other columns in the table, the table itself, or some combination of the table and its columns. Once the limit of 20 unique tag keys is met, no additional tag keys can be set on the table or its columns.

COMMENT = 'string_literal'

String (literal) that specifies a comment for the external table.

Default: No value

Partitioning Parameters

Use these parameters to partition your external table.

part_col_name col_type AS part_expr

Defines one or more partition columns in the external table.

The format of a partition column definition differs depending on whether partitions are computed and added automatically from an expression in each partition column or the partitions are added manually.

Added from an expression

A partition column must evaluate as an expression that parses the path and/or filename information in the METADATA$FILENAME pseudocolumn. Partition columns optimize query performance by pruning out the data files that do not need to be scanned (i.e. partitioning the external table). A partition consists of all data files that match the path and/or filename in the expression for the partition column.

part_col_name

String that specifies the partition column identifier (i.e. name). All the requirements for table identifiers also apply to column identifiers.

col_type

String (constant) that specifies the data type for the column. The data type must match the result of part_expr for the column.

part_expr

String that specifies the expression for the column. The expression must include the METADATA$FILENAME pseudocolumn.

External tables currently support the following subset of functions in partition expressions:

List of supported functions:

Added manually

Required: Also set the PARTITION_TYPE parameter value to USER_SPECIFIED.

A partition column definition is an expression that parses the column metadata in the internal (hidden) METADATA$EXTERNAL_TABLE_PARTITION column. Essentially, the definition only defines the data type for the column. The format of the partition column definition is as follows:

part_col_name col_type AS ( PARSE_JSON (METADATA$EXTERNALTABLE_PARTITION:part_col_name::data_type )

For example, suppose columns col1, col2, and col3 contain varchar, number, and timestamp (time zone) data, respectively:

col1 varchar as (parse_json(metadata$external_table_partition):col1::varchar),
col2 number as (parse_json(metadata$external_table_partition):col2::number),
col3 timestamp_tz as (parse_json(metadata$external_table_partition):col3::timestamp_tz)

After defining any partition columns for the table, identify these columns using the PARTITION BY clause.

PARTITION_TYPE = USER_SPECIFIED

Defines the partition type for the external table as user-defined. The owner of the external table (i.e. the role that has the OWNERSHIP privilege on the external table) must add partitions to the external metadata manually by executing ALTER EXTERNAL TABLE … ADD PARTITION statements.

Do not set this parameter if partitions are added to the external table metadata automatically upon evaluation of expressions in the partition columns.

[ PARTITION BY ( part_col_name [, part_col_name ... ] ) ]

Specifies any partition columns to evaluate for the external table.

Usage

When querying an external table, include one or more partition columns in a WHERE clause, e.g.:

... WHERE part_col_name = 'filter_value'

Snowflake filters on the partition columns to restrict the set of data files to scan. Note that all rows in these files are scanned. If a WHERE clause includes non-partition columns, those filters are evaluated after the data files have been filtered.

A common practice is to partition the data files based on increments of time; or, if the data files are staged from multiple sources, to partition by a data source identifier and date or timestamp.

Cloud Provider Parameters (cloudProviderParams)

Google Cloud Storage

INTEGRATION = integration_name

Specifies the name of the notification integration used to automatically refresh the external table metadata using Google Pub/Sub event notifications. A notification integration is a Snowflake object that provides an interface between Snowflake and third-party cloud message queuing services.

This parameter is required to enable auto-refresh operations for the external table. For instructions on configuring the auto-refresh capability, see Refreshing External Tables Automatically for Google Cloud Storage.

Microsoft Azure

INTEGRATION = integration_name

Specifies the name of the notification integration used to automatically refresh the external table metadata using Azure Event Grid notifications. A notification integration is a Snowflake object that provides an interface between Snowflake and third-party cloud message queuing services.

This parameter is required to enable auto-refresh operations for the external table. For instructions on configuring the auto-refresh capability, see Refreshing External Tables Automatically for Azure Blob Storage.

Format Type Options (formatTypeOptions)

Format type options are used for loading data into and unloading data out of tables.

Depending on the file format type specified (FILE_FORMAT = ( TYPE = ... )), you can include one or more of the following format-specific options (separated by blank spaces, commas, or new lines):

TYPE = CSV

COMPRESSION = AUTO | GZIP | BZ2 | BROTLI | ZSTD | DEFLATE | RAW_DEFLATE | NONE

String (constant) that specifies the current compression algorithm for the data files to be loaded. Snowflake uses this option to detect how already-compressed data files were compressed so that the compressed data in the files can be extracted for loading.

Supported Values

Notes

AUTO

Compression algorithm detected automatically, except for Brotli-compressed files, which cannot currently be detected automatically. If loading Brotli-compressed files, explicitly use BROTLI instead of AUTO.

GZIP

BZ2

BROTLI

Must be specified when loading Brotli-compressed files.

ZSTD

Zstandard v0.8 (and higher) supported.

DEFLATE

Deflate-compressed files (with zlib header, RFC1950).

RAW_DEFLATE

Raw Deflate-compressed files (without header, RFC1951).

NONE

Data files to load have not been compressed.

RECORD_DELIMITER = 'character' | NONE

One or more characters that separate records in an input file. Accepts common escape sequences or the following singlebyte or multibyte characters:

Singlebyte characters

Octal values (prefixed by \\) or hex values (prefixed by 0x or \x). For example, for records delimited by the circumflex accent (^) character, specify the octal (\\5e) or hex (0x5e) value.

Multibyte characters

Hex values (prefixed by \x). For example, for records delimited by the cent (¢) character, specify the hex (\xC2\xA2) value.

The delimiter for RECORD_DELIMITER or FIELD_DELIMITER cannot be a substring of the delimiter for the other file format option (e.g. FIELD_DELIMITER = 'aa' RECORD_DELIMITER = 'aabb').

The specified delimiter must be a valid UTF-8 character and not a random sequence of bytes. Also note that the delimiter is limited to a maximum of 20 characters.

Also accepts a value of NONE.

Default: New line character. Note that “new line” is logical such that \r\n is understood as a new line for files on a Windows platform.

FIELD_DELIMITER = 'character' | NONE

One or more singlebyte or multibyte characters that separate fields in an input file. Accepts common escape sequences or the following singlebyte or multibyte characters:

Singlebyte characters

Octal values (prefixed by \\) or hex values (prefixed by 0x or \x). For example, for records delimited by the circumflex accent (^) character, specify the octal (\\5e) or hex (0x5e) value.

Multibyte characters

Hex values (prefixed by \x). For example, for records delimited by the cent (¢) character, specify the hex (\xC2\xA2) value.

The delimiter for RECORD_DELIMITER or FIELD_DELIMITER cannot be a substring of the delimiter for the other file format option (e.g. FIELD_DELIMITER = 'aa' RECORD_DELIMITER = 'aabb').

The specified delimiter must be a valid UTF-8 character and not a random sequence of bytes. Also note that the delimiter is limited to a maximum of 20 characters.

Also accepts a value of NONE.

Default: comma (,)

SKIP_HEADER = integer

Number of lines at the start of the file to skip.

Note that SKIP_HEADER does not use the RECORD_DELIMITER or FIELD_DELIMITER values to determine what a header line is; rather, it simply skips the specified number of CRLF (Carriage Return, Line Feed)-delimited lines in the file. RECORD_DELIMITER and FIELD_DELIMITER are then used to determine the rows of data to load.

Default: 0

SKIP_BLANK_LINES = TRUE | FALSE
Use

Data loading only

Definition

Boolean that specifies to skip any blank lines encountered in the data files; otherwise, blank lines produce an end-of-record error (default behavior).

Default: FALSE

TYPE = JSON

COMPRESSION = AUTO | GZIP | BZ2 | BROTLI | ZSTD | DEFLATE | RAW_DEFLATE | NONE

String (constant) that specifies the current compression algorithm for the data files to be loaded. Snowflake uses this option to detect how already-compressed data files were compressed so that the compressed data in the files can be extracted for loading.

Supported Values

Notes

AUTO

Compression algorithm detected automatically, except for Brotli-compressed files, which cannot currently be detected automatically. If loading Brotli-compressed files, explicitly use BROTLI instead of AUTO.

GZIP

BZ2

BROTLI

ZSTD

DEFLATE

Deflate-compressed files (with zlib header, RFC1950).

RAW_DEFLATE

Raw Deflate-compressed files (without header, RFC1951).

NONE

Indicates the files for loading data have not been compressed.

Default: AUTO

TYPE = AVRO

COMPRESSION = AUTO | GZIP | BZ2 | BROTLI | ZSTD | DEFLATE | RAW_DEFLATE | NONE

String (constant) that specifies the current compression algorithm for the data files to be loaded. Snowflake uses this option to detect how already-compressed data files were compressed so that the compressed data in the files can be extracted for loading.

Supported Values

Notes

AUTO

Compression algorithm detected automatically, except for Brotli-compressed files, which cannot currently be detected automatically. If loading Brotli-compressed files, explicitly use BROTLI instead of AUTO.

GZIP

BZ2

BROTLI

ZSTD

DEFLATE

Deflate-compressed files (with zlib header, RFC1950).

RAW_DEFLATE

Raw Deflate-compressed files (without header, RFC1951).

NONE

Data files to load have not been compressed.

Default: AUTO

TYPE = ORC

TRIM_SPACE = TRUE | FALSE

Boolean that specifies whether to remove leading and trailing white space from strings.

For example, if your external database software encloses fields in quotes, but inserts a leading space, Snowflake reads the leading space rather than the opening quotation character as the beginning of the field (i.e. the quotation marks are interpreted as part of the string of field data). Set this option to TRUE to remove undesirable spaces during the data load.

This file format option is applied to the following actions only:

  • Querying object values in staged ORC data files.

  • Loading ORC data into separate columns using the MATCH_BY_COLUMN_NAME copy option.

  • Loading ORC data into separate columns by specifying a query in the COPY statement (i.e. COPY transformation).

Default: FALSE

NULL_IF = ( 'string1' [ , 'string2' , ... ] )

String used to convert to and from SQL NULL. Snowflake replaces these strings in the data load source with SQL NULL. To specify more than one string, enclose the list of strings in parentheses and use commas to separate each value.

Note that Snowflake converts all instances of the value to NULL, regardless of the data type. For example, if 2 is specified as a value, all instances of 2 as either a string or number are converted.

For example:

NULL_IF = ('\\N', 'NULL', 'NUL', '')

Note that this option can include empty strings.

This file format option is applied to the following actions only:

  • Querying object values in staged ORC data files.

  • Loading ORC data into separate columns using the MATCH_BY_COLUMN_NAME copy option.

  • Loading ORC data into separate columns by specifying a query in the COPY statement (i.e. COPY transformation).

Default: \\N (i.e. NULL, which assumes the ESCAPE_UNENCLOSED_FIELD value is \\)

TYPE = PARQUET

COMPRESSION = AUTO | SNAPPY | NONE

String (constant) that specifies the current compression algorithm for columns in the Parquet files.

Supported Values

Notes

AUTO

Compression algorithm detected automatically. Supports the following compression algorithms: Brotli, gzip, Lempel–Ziv–Oberhumer (LZO), LZ4, Snappy, or Zstandard v0.8 (and higher).

SNAPPY

NONE

Data files to load have not been compressed.

Default: AUTO

Access Control Requirements

A role used to execute this SQL command must have the following privileges at a minimum:

Privilege

Object

Notes

CREATE EXTERNAL TABLE

Schema

CREATE STAGE

Schema

Required if creating a new stage.

USAGE

Stage

Required if referencing an existing stage.

Note that operating on any object in a schema also requires the USAGE privilege on the parent database and schema.

For instructions on creating a custom role with a specified set of privileges, see Creating Custom Roles.

For general information about roles and privilege grants for performing SQL actions on securable objects, see Access Control in Snowflake.

Usage Notes

  • External tables support external (i.e. S3, Azure, or GCS) stages only; internal (i.e. Snowflake) stages are not supported.

    You cannot access data held in archival cloud storage classes that requires restoration before it can be retrieved. These archival storage classes include, for example, the Amazon S3 Glacier Flexible Retrieval or Glacier Deep Archive storage class, or Microsoft Azure Archive Storage.

  • Snowflake does not enforce integrity constraints on external tables. In particular, unlike normal tables, Snowflake does not enforce NOT NULL constraints.

  • External tables include the following metadata column:

    • METADATA$FILENAME: Name of each staged data file included in the external table. Includes the path to the data file in the stage.

    • METADATA$FILE_ROW_NUMBER: Row number for each record in the staged data file.

  • The following are not supported for external tables:

    • Clustering keys

    • Cloning

    • Data in XML format

  • Time Travel is not supported for external tables.

  • Policies:

    • You cannot add a masking policy to an external table column while creating the external table because a masking policy cannot be attached to a virtual column.

    • You can add a row access policy to an external table while creating the external table.

  • Using OR REPLACE is the equivalent of using DROP EXTERNAL TABLE on the existing external table and then creating a new external table with the same name. The drop and create actions occur in a single atomic operation. This means that any queries concurrent with the CREATE OR REPLACE EXTERNAL TABLE operation use either the old or new external table version.

  • Regarding metadata:

    Attention

    Customers should ensure that no personal data (other than for a User object), sensitive data, export-controlled data, or other regulated data is entered as metadata when using the Snowflake service. For more information, see Metadata Fields in Snowflake.

  • When creating an external table with a row access policy added to the external table, use the POLICY_CONTEXT function to simulate a query on the external table protected by a row access policy.

Examples

Partitions Added Automatically From Partition Column Expressions

Create an external table with partitions computed from expressions in the partition column definitions.

In the following example, the data files are organized in cloud storage with the following structure: logs/YYYY/MM/DD/HH24. For example:

  • logs/2018/08/05/0524/

  • logs/2018/08/27/1408/

  1. Create an external stage named s1 for the storage location where the data files are stored. For more information, see CREATE STAGE.

    The stage definition includes the path /files/logs/:

    Amazon S3

    CREATE STAGE s1
      URL='s3://mybucket/files/logs/'
      ...
      ;
    

    Google Cloud Storage

    CREATE STAGE s1
      URL='gcs://mybucket/files/logs/'
      ...
      ;
    

    Microsoft Azure

    CREATE STAGE s1
      URL='azure://mycontainer/files/logs/'
      ...
      ;
    
  2. Query the METADATA$FILENAME pseudocolumn in the staged data. Use the results to develop your partition column(s):

    SELECT metadata$filename FROM @s1/;
    
    +----------------------------------------+
    | METADATA$FILENAME                      |
    |----------------------------------------|
    | files/logs/2018/08/05/0524/log.parquet |
    | files/logs/2018/08/27/1408/log.parquet |
    +----------------------------------------+
    
  3. Create the partitioned external table.

    The partition column date_part casts YYYY/MM/DD in the METADATA$FILENAME pseudocolumn as a date using TO_DATE , DATE. The SQL command also specifies Parquet as the file format type.

    The external tables for Amazon S3 and Microsoft Azure cloud storage include the parameter required to refresh the metadata automatically when triggered by event notifications from the respective cloud messaging service:

    Amazon S3

    CREATE EXTERNAL TABLE et1(
     date_part date AS TO_DATE(SPLIT_PART(metadata$filename, '/', 3)
       || '/' || SPLIT_PART(metadata$filename, '/', 4)
       || '/' || SPLIT_PART(metadata$filename, '/', 5), 'YYYY/MM/DD'),
     timestamp bigint AS (value:timestamp::bigint),
     col2 varchar AS (value:col2::varchar))
     PARTITION BY (date_part)
     LOCATION=@s1/logs/
     AUTO_REFRESH = true
     FILE_FORMAT = (TYPE = PARQUET)
     AWS_SNS_TOPIC = 'arn:aws:sns:us-west-2:001234567890:s3_mybucket';
    

    Google Cloud Storage

    CREATE EXTERNAL TABLE et1(
      date_part date AS TO_DATE(SPLIT_PART(metadata$filename, '/', 3)
        || '/' || SPLIT_PART(metadata$filename, '/', 4)
        || '/' || SPLIT_PART(metadata$filename, '/', 5), 'YYYY/MM/DD'),
      timestamp bigint AS (value:timestamp::bigint),
      col2 varchar AS (value:col2::varchar))
      PARTITION BY (date_part)
      LOCATION=@s1/logs/
      AUTO_REFRESH = true
      FILE_FORMAT = (TYPE = PARQUET);
    

    Microsoft Azure

    CREATE EXTERNAL TABLE et1(
      date_part date AS TO_DATE(SPLIT_PART(metadata$filename, '/', 3)
        || '/' || SPLIT_PART(metadata$filename, '/', 4)
        || '/' || SPLIT_PART(metadata$filename, '/', 5), 'YYYY/MM/DD'),
      timestamp bigint AS (value:timestamp::bigint),
      col2 varchar AS (value:col2::varchar))
      PARTITION BY (date_part)
      INTEGRATION = 'MY_INT'
      LOCATION=@s1/logs/
      AUTO_REFRESH = true
      FILE_FORMAT = (TYPE = PARQUET);
    
  4. Refresh the external table metadata:

    ALTER EXTERNAL TABLE et1 REFRESH;
    

When querying the external table, filter the data by the partition column using a WHERE clause. Snowflake only scans the files in the specified partitions that match the filter conditions:

SELECT timestamp, col2 FROM et1 WHERE date_part = to_date('08/05/2018');

Partitions Added Manually

Create an external table with user-defined partitions (i.e. the partitions are added manually by the external table owner).

  1. Create an external stage named s2 for the storage location where the data files are stored:

    The stage definition includes the path /files/logs/:

    Amazon S3

    CREATE STAGE s2
      URL='s3://mybucket/files/logs/'
      ...
      ;
    

    Google Cloud Storage

    CREATE STAGE s2
      URL='gcs://mybucket/files/logs/'
      ...
      ;
    

    Microsoft Azure

    CREATE STAGE s2
      URL='azure://mycontainer/files/logs/'
      ...
      ;
    
  2. Create the partitioned external table. The external table includes three partition columns with different data types.

    Note that the syntax for each of the three cloud storage services (Amazon S3, Google Cloud Storage, and Microsoft Azure) is identical because the external table metadata is not refreshed:

    CREATE EXTERNAL TABLE et2(
      col1 DATE AS (PARSE_JSON(METADATA$EXTERNAL_TABLE_PARTITION):col1::DATE),
      col2 VARCHAR AS (PARSE_JSON(METADATA$EXTERNAL_TABLE_PARTITION):col2::VARCHAR),
      col3 NUMBER AS (PARSE_JSON(METADATA$EXTERNAL_TABLE_PARTITION):col3::NUMBER))
      PARTITION BY (col1,col2,col3)
      LOCATION=@s2/logs/
      PARTITION_TYPE = USER_SPECIFIED
      FILE_FORMAT = (TYPE = PARQUET);
    
  3. Add partitions for the partition columns:

    ALTER EXTERNAL TABLE et2 ADD PARTITION(col1='2022-01-24', col2='a', col3='12') LOCATION '2022/01';
    

    Snowflake adds the partitions to the metadata for the external table. The operation also adds any new data files in the specified location to the metadata:

    +---------------------------------------+----------------+-------------------------------+
    |                       file            |     status     |          description          |
    +---------------------------------------+----------------+-------------------------------+
    | mycontainer/files/logs/2022/01/24.csv | REGISTERED_NEW | File registered successfully. |
    | mycontainer/files/logs/2022/01/25.csv | REGISTERED_NEW | File registered successfully. |
    +---------------------------------------+----------------+-------------------------------+
    

When querying the external table, filter the data by the partition columns using a WHERE clause. This example returns the records in the order they are stored in the staged data files:

SELECT col1, col2, col3 FROM et1 WHERE col1 = TO_DATE('2022-01-24') AND col2 = 'a' ORDER BY METADATA$FILE_ROW_NUMBER;

External Table that References Files in a Delta Lake

Create a partitioned external table named ext_twitter_feed that references the Delta Lake files in Parquet format in the mystage external stage and daily path.

The partition column date_part casts YYYY/MM/DD in the METADATA$FILENAME pseudocolumn as a date using TO_DATE , DATE:

Amazon S3

CREATE EXTERNAL TABLE ext_twitter_feed(
 date_part date AS TO_DATE(SPLIT_PART(metadata$filename, '/', 3)
   || '/' || SPLIT_PART(metadata$filename, '/', 4)
   || '/' || SPLIT_PART(metadata$filename, '/', 5), 'YYYY/MM/DD'),
 timestamp bigint AS (value:timestamp::bigint),
 col2 varchar AS (value:col2::varchar))
 PARTITION BY (date_part)
 LOCATION=@mystage/daily/
 REFRESH_ON_CREATE =  FALSE
 AUTO_REFRESH = FALSE
 FILE_FORMAT = (TYPE = PARQUET)
 TABLE_FORMAT = DELTA;

Google Cloud Storage

CREATE EXTERNAL TABLE ext_twitter_feed(
  date_part date AS TO_DATE(SPLIT_PART(metadata$filename, '/', 3)
    || '/' || SPLIT_PART(metadata$filename, '/', 4)
    || '/' || SPLIT_PART(metadata$filename, '/', 5), 'YYYY/MM/DD'),
  timestamp bigint AS (value:timestamp::bigint),
  col2 varchar AS (value:col2::varchar))
  PARTITION BY (date_part)
  LOCATION=@mystage/daily/
  REFRESH_ON_CREATE =  FALSE
  AUTO_REFRESH = FALSE
  FILE_FORMAT = (TYPE = PARQUET)
  TABLE_FORMAT = DELTA;

Microsoft Azure

CREATE EXTERNAL TABLE ext_twitter_feed(
  date_part date AS TO_DATE(SPLIT_PART(metadata$filename, '/', 3)
    || '/' || SPLIT_PART(metadata$filename, '/', 4)
    || '/' || SPLIT_PART(metadata$filename, '/', 5), 'YYYY/MM/DD'),
  timestamp bigint AS (value:timestamp::bigint),
  col2 varchar AS (value:col2::varchar))
  PARTITION BY (date_part)
  LOCATION=@mystage/daily/
  REFRESH_ON_CREATE =  FALSE
  AUTO_REFRESH = FALSE
  FILE_FORMAT = (TYPE = PARQUET)
  TABLE_FORMAT = DELTA;

Materialized View on an External Table

Create a materialized view based on a subquery of the columns in the external table created in the Partitions Added Automatically From Partition Column Expressions example:

CREATE MATERIALIZED VIEW et1_mv
  AS
  SELECT col2 FROM et1;

For general syntax, usage notes, and further examples for this SQL command, see CREATE MATERIALIZED VIEW.

Back to top