- Categories:
CREATE EXTERNAL TABLE¶
Creates a new external table in the current/specified schema or replaces an existing external table. When queried, an external table reads data from a set of one or more files in a specified external stage and outputs the data in a single VARIANT column.
Additional columns can be defined, with each column definition consisting of a name, data type, and optionally whether the column requires a value (NOT NULL) or has any referential integrity constraints (primary key, foreign key, etc.). See the usage notes for more information.
- See also:
ALTER EXTERNAL TABLE , DROP EXTERNAL TABLE , SHOW EXTERNAL TABLES , DESCRIBE EXTERNAL TABLE
In this Topic:
Syntax¶
-- Partitions computed from expressions
CREATE [ OR REPLACE ] EXTERNAL TABLE [IF NOT EXISTS]
<table_name>
( [ <col_name> <col_type> AS <expr> | <part_col_name> <col_type> AS <part_expr> ]
[ inlineConstraint ]
[ , <col_name> <col_type> AS <expr> | <part_col_name> <col_type> AS <part_expr> ... ]
[ , ... ] )
cloudProviderParams
[ PARTITION BY ( <part_col_name> [, <part_col_name> ... ] ) ]
[ WITH ] LOCATION = externalStage
[ REFRESH_ON_CREATE = { TRUE | FALSE } ]
[ AUTO_REFRESH = { TRUE | FALSE } ]
[ PATTERN = '<regex_pattern>' ]
FILE_FORMAT = ( { FORMAT_NAME = '<file_format_name>' | TYPE = { CSV | JSON | AVRO | ORC | PARQUET } [ formatTypeOptions ] } )
[ AWS_SNS_TOPIC = '<string>' ]
[ TABLE_FORMAT = DELTA ]
[ COPY GRANTS ]
[ [ WITH ] ROW ACCESS POLICY <policy_name> ON (VALUE) ]
[ [ WITH ] TAG ( <tag_name> = '<tag_value>' [ , <tag_name> = '<tag_value>' , ... ] ) ]
[ COMMENT = '<string_literal>' ]
-- Partitions added and removed manually
CREATE [ OR REPLACE ] EXTERNAL TABLE [IF NOT EXISTS]
<table_name>
( [ <col_name> <col_type> AS <expr> | <part_col_name> <col_type> AS <part_expr> ]
[ inlineConstraint ]
[ , <col_name> <col_type> AS <expr> | <part_col_name> <col_type> AS <part_expr> ... ]
[ , ... ] )
cloudProviderParams
[ PARTITION BY ( <part_col_name> [, <part_col_name> ... ] ) ]
[ WITH ] LOCATION = externalStage
PARTITION_TYPE = USER_SPECIFIED
FILE_FORMAT = ( { FORMAT_NAME = '<file_format_name>' | TYPE = { CSV | JSON | AVRO | ORC | PARQUET } [ formatTypeOptions ] } )
[ COPY GRANTS ]
[ [ WITH ] ROW ACCESS POLICY <policy_name> ON (VALUE) ]
[ [ WITH ] TAG ( <tag_name> = '<tag_value>' [ , <tag_name> = '<tag_value>' , ... ] ) ]
[ COMMENT = '<string_literal>' ]
Where:
inlineConstraint ::= [ NOT NULL ] [ CONSTRAINT <constraint_name> ] { UNIQUE | PRIMARY KEY | [ FOREIGN KEY ] REFERENCES <ref_table_name> [ ( <ref_col_name> [ , <ref_col_name> ] ) ] } [ <constraint_properties> ]For additional inline constraint details, see CREATE | ALTER TABLE … CONSTRAINT.
cloudProviderParams (for Google Cloud Storage) ::= [ INTEGRATION = '<integration_name>' ] cloudProviderParams (for Microsoft Azure) ::= [ INTEGRATION = '<integration_name>' ]externalStage ::= @[<namespace>.]<ext_stage_name>[/<path>]formatTypeOptions ::= -- If FILE_FORMAT = ( TYPE = CSV ... ) COMPRESSION = AUTO | GZIP | BZ2 | BROTLI | ZSTD | DEFLATE | RAW_DEFLATE | NONE RECORD_DELIMITER = '<character>' | NONE FIELD_DELIMITER = '<character>' | NONE SKIP_HEADER = <integer> SKIP_BLANK_LINES = TRUE | FALSE -- If FILE_FORMAT = ( TYPE = JSON ... ) COMPRESSION = AUTO | GZIP | BZ2 | BROTLI | ZSTD | DEFLATE | RAW_DEFLATE | NONE -- If FILE_FORMAT = ( TYPE = AVRO ... ) COMPRESSION = AUTO | GZIP | BZ2 | BROTLI | ZSTD | DEFLATE | RAW_DEFLATE | NONE -- If FILE_FORMAT = ( TYPE = ORC ... ) TRIM_SPACE = TRUE | FALSE NULL_IF = ( '<string>' [ , '<string>' ... ] -- If FILE_FORMAT = ( TYPE = PARQUET ... ) COMPRESSION = AUTO | SNAPPY | NONE
Required Parameters¶
table_name
String that specifies the identifier (i.e. name) for the table; must be unique for the schema in which the table is created.
In addition, the identifier must start with an alphabetic character and cannot contain spaces or special characters unless the entire identifier string is enclosed in double quotes (e.g.
"My object"
). Identifiers enclosed in double quotes are also case-sensitive.For more details, see Identifier Requirements.
[ WITH ] LOCATION =
Specifies the external stage and optional path where the files containing data to be read are staged:
@[namespace.]ext_stage_name[/path]
Files are in the specified named external stage.
Where:
namespace
is the database and/or schema in which the external stage resides, in the form ofdatabase_name.schema_name
orschema_name
. It is optional if a database and schema are currently in use within the user session; otherwise, it is required.path
is an optional case-sensitive path for files in the cloud storage location (i.e. files have names that begin with a common string) that limits the set of files to load. Paths are alternatively called prefixes or folders by different cloud storage services.The external table appends this path to any path specified in the stage definition. To view the stage definition, execute
DESC STAGE stage_name
and check theurl
property value. For example, if the stage URL includes patha
and the external table location includes pathb
, then the external table reads files staged instage/a/b
.Note that the
[ WITH ] LOCATION
value cannot reference specific filenames. To point an external table to individual staged files, use thePATTERN
parameter.
FILE_FORMAT = ( FORMAT_NAME = 'file_format_name' )
or .FILE_FORMAT = ( TYPE = CSV | JSON | AVRO | ORC | PARQUET [ ... ] )
String (constant) that specifies the file format:
FORMAT_NAME = file_format_name
Specifies an existing named file format that describes the staged data files to scan. The named file format determines the format type (CSV, JSON, etc.), as well as any other format options, for data files.
TYPE = CSV | JSON | AVRO | ORC | PARQUET [ ... ]
Specifies the format type of the staged data files to scan when querying the external table.
If a file format type is specified, additional format-specific options can be specified. For more details, see Format Type Options (in this topic).
The file format options can be configured at either the external table or stage level. Any settings specified at the external table level take precedence. Any settings not specified at either level assume the default values.
Default:
TYPE = CSV
.Important
The external table does not inherit the file format, if any, in the stage definition. You must explicitly specify any file format options for the external table using the FILE_FORMAT parameter.
Note
FORMAT_NAME
andTYPE
are mutually exclusive; to avoid unintended behavior, you should only specify one or the other when creating an external table.
Optional Parameters¶
col_name
String that specifies the column identifier (i.e. name). All the requirements for table identifiers also apply to column identifiers.
For more details, see Identifier Requirements.
col_type
String (constant) that specifies the data type for the column. The data type must match the result of
expr
for the column.For details about the data types that can be specified for table columns, see Data Types.
expr
String that specifies the expression for the column. When queried, the column returns results derived from this expression.
External table columns are virtual columns, which are defined using an explicit expression. Add virtual columns as expressions using the VALUE column and/or the METADATA$FILENAME pseudocolumn:
- VALUE
A VARIANT type column that represents a single row in the external file.
- CSV
The VALUE column structures each row as an object with elements identified by column position (i.e.
{c1: <column_1_value>, c2: <column_2_value>, c3: <column_1_value> ...}
).For example, add a VARCHAR column named
mycol
that references the first column in the staged CSV files:mycol varchar as (value:c1::varchar)
- Semi-structured data
Enclose element names and values in double-quotes. Traverse the path in the VALUE column using dot notation.
For example, suppose the following represents a single row of semi-structured data in a staged file:
{ "a":"1", "b": { "c":"2", "d":"3" } }
Add a VARCHAR column named
mycol
that references the nested repeatingc
element in the staged file:mycol varchar as (value:"b"."c"::varchar)
- Apache Parquet
Either reference the element names using the semi-structured data notation or reference the Parquet field ID for a specific column.
For example, using the field ID, add a VARCHAR column named
mycol
that references the second column in the staged Parquet files:mycol VARCHAR as (value:"2"::VARCHAR)
- METADATA$FILENAME
A pseudocolumn that identifies the name of each staged data file included in the external table, including its path in the stage. For an example, see Partitions Added Automatically From Partition Column Expressions (in this topic).
CONSTRAINT ...
String that defines an inline or out-of-line constraint for the specified column(s) in the table.
For syntax details, see CREATE | ALTER TABLE … CONSTRAINT. For more information about constraints, see Constraints.
REFRESH_ON_CREATE = TRUE | FALSE
Specifies whether to automatically refresh the external table metadata once, immediately after the external table is created. Refreshing the external table metadata synchronizes the metadata with the current list of data files in the specified stage path. This action is required for the metadata to register any existing data files in the named external stage specified in the
[ WITH ] LOCATION =
setting.TRUE
Snowflake automatically refreshes the external table metadata once after creation.
Note
If the specified location contains close to 1 million files or more, we recommend that you set
REFRESH_ON_CREATE = FALSE
. After creating the external table, refresh the metadata incrementally by executing ALTER EXTERNAL TABLE … REFRESH statements that specify subpaths in the location (i.e. subsets of files to include in the refresh) until the metadata includes all of the files in the location.FALSE
Snowflake does not automatically refresh the external table metadata. To register any existing data files in the stage, you must manually refresh the external table metadata once using ALTER EXTERNAL TABLE … REFRESH.
Default:
TRUE
AUTO_REFRESH = TRUE | FALSE
Specifies whether Snowflake should enable triggering automatic refreshes of the external table metadata when new or updated data files are available in the named external stage specified in the
[ WITH ] LOCATION =
setting.Note
You must configure an event notification for your storage location to notify Snowflake when new or updated data is available to read into the external table metadata. For more information, see the instructions for your cloud storage service:
- Google Cloud Storage
Refreshing External Tables Automatically for Google Cloud Storage
- Microsoft Azure
Refreshing External Tables Automatically for Azure Blob Storage
When an external table is created, its metadata is refreshed automatically once unless
REFRESH_ON_CREATE = FALSE
.
TRUE
Snowflake enables triggering automatic refreshes of the external table metadata.
FALSE
Snowflake does not enable triggering automatic refreshes of the external table metadata. You must manually refresh the external table metadata periodically using ALTER EXTERNAL TABLE … REFRESH to synchronize the metadata with the current list of files in the stage path.
Default:
TRUE
PATTERN = 'regex_pattern'
A regular expression pattern string, enclosed in single quotes, specifying the filenames and/or paths on the external stage to match.
Tip
For the best performance, try to avoid applying patterns that filter on a large number of files.
AWS_SNS_TOPIC = 'string'
Required only when configuring AUTO_REFRESH for Amazon S3 stages using Amazon Simple Notification Service (SNS). Specifies the Amazon Resource Name (ARN) for the SNS topic for your S3 bucket. The CREATE EXTERNAL TABLE statement subscribes the Amazon Simple Queue Service (SQS) queue to the specified SNS topic. Event notifications via the SNS topic trigger metadata refreshes. For more information, see Refreshing External Tables Automatically for Amazon S3.
TABLE_FORMAT = DELTA
Identifies the external table as referencing a Delta Lake on the cloud storage location. A Delta Lake on Amazon S3, Google Cloud Storage, or Microsoft Azure cloud storage is supported.
When this parameter is set, the external table scans for Delta Lake transaction log files in the
[ WITH ] LOCATION
location. Delta log files have names like_delta_log/00000000000000000000.json
,_delta_log/00000000000000000010.checkpoint.parquet
, etc.When the metadata for an external table is refreshed, Snowflake parses the Delta Lake transaction logs and determines which Parquet files are current. In the background, the refresh performs add and remove file operations to keep the external table metadata in sync.
Note
The ordering of event notifications triggered by DDL operations in cloud storage is not guaranteed. Therefore, the ability to automatically refresh is not available for external tables that reference Delta Lake files. Both
REFRESH_ON_CREATE
andAUTO_REFRESH
must be set to FALSE.Periodically execute an ALTER EXTERNAL TABLE … REFRESH statement to register any added or removed files.
The
FILE_FORMAT
value must specify Parquet as the file type.For optimal performance, we recommend defining partition columns for the external table.
COPY GRANTS
Specifies to retain the access permissions from the original table when an external table is recreated using the CREATE OR REPLACE TABLE variant. The parameter copies all permissions, except OWNERSHIP, from the existing table to the new table. By default, the role that executes the CREATE EXTERNAL TABLE command owns the new external table.
Note:
The operation to copy grants occurs atomically in the CREATE EXTERNAL TABLE command (i.e. within the same transaction).
ROW ACCESS POLICY <policy_name> ON (VALUE)
Specifies the row access policy to set on the table.
Specify the VALUE column when applying a row access policy to an external table.
TAG ( tag_name = 'tag_value' [ , tag_name = 'tag_value' , ... ] )
Specifies the tag name (i.e. the key) and the tag value.
The tag value is always a string, and the maximum number of characters for the tag value is 256. The maximum number of unique tag keys that can be set on an object is 20.
Note
For a table or view and its columns, the total number of unique tag keys that can be set is 20.
For example, if a single column in a table has 10 unique tag keys set on the column, Snowflake allows 10 additional unique tag keys to be set on either that column, other columns in the table, the table itself, or some combination of the table and its columns. Once the limit of 20 unique tag keys is met, no additional tag keys can be set on the table or its columns.
COMMENT = 'string_literal'
String (literal) that specifies a comment for the external table.
Default: No value
Partitioning Parameters¶
Use these parameters to partition your external table.
part_col_name col_type AS part_expr
Defines one or more partition columns in the external table.
The format of a partition column definition differs depending on whether partitions are computed and added automatically from an expression in each partition column or the partitions are added manually.
- Added from an expression
A partition column must evaluate as an expression that parses the path and/or filename information in the METADATA$FILENAME pseudocolumn. Partition columns optimize query performance by pruning out the data files that do not need to be scanned (i.e. partitioning the external table). A partition consists of all data files that match the path and/or filename in the expression for the partition column.
part_col_name
String that specifies the partition column identifier (i.e. name). All the requirements for table identifiers also apply to column identifiers.
col_type
String (constant) that specifies the data type for the column. The data type must match the result of
part_expr
for the column.part_expr
String that specifies the expression for the column. The expression must include the METADATA$FILENAME pseudocolumn.
External tables currently support the following subset of functions in partition expressions:
List of supported functions:
=
,<>
,>
,>=
,<
,<=
||
+
,-
-
(negate)*
AND
,OR
NOT
- Added manually
Required: Also set the
PARTITION_TYPE
parameter value toUSER_SPECIFIED
.A partition column definition is an expression that parses the column metadata in the internal (hidden) METADATA$EXTERNAL_TABLE_PARTITION column. Essentially, the definition only defines the data type for the column. The format of the partition column definition is as follows:
part_col_name col_type AS ( PARSE_JSON (METADATA$EXTERNALTABLE_PARTITION:part_col_name::data_type )
For example, suppose columns
col1
,col2
, andcol3
contain varchar, number, and timestamp (time zone) data, respectively:col1 varchar as (parse_json(metadata$external_table_partition):col1::varchar), col2 number as (parse_json(metadata$external_table_partition):col2::number), col3 timestamp_tz as (parse_json(metadata$external_table_partition):col3::timestamp_tz)
After defining any partition columns for the table, identify these columns using the PARTITION BY clause.
PARTITION_TYPE = USER_SPECIFIED
Defines the partition type for the external table as user-defined. The owner of the external table (i.e. the role that has the OWNERSHIP privilege on the external table) must add partitions to the external metadata manually by executing ALTER EXTERNAL TABLE … ADD PARTITION statements.
Do not set this parameter if partitions are added to the external table metadata automatically upon evaluation of expressions in the partition columns.
[ PARTITION BY ( part_col_name [, part_col_name ... ] ) ]
Specifies any partition columns to evaluate for the external table.
- Usage
When querying an external table, include one or more partition columns in a WHERE clause, e.g.:
... WHERE part_col_name = 'filter_value'
Snowflake filters on the partition columns to restrict the set of data files to scan. Note that all rows in these files are scanned. If a WHERE clause includes non-partition columns, those filters are evaluated after the data files have been filtered.
A common practice is to partition the data files based on increments of time; or, if the data files are staged from multiple sources, to partition by a data source identifier and date or timestamp.
Cloud Provider Parameters (cloudProviderParams
)¶
Google Cloud Storage
INTEGRATION = integration_name
Specifies the name of the notification integration used to automatically refresh the external table metadata using Google Pub/Sub event notifications. A notification integration is a Snowflake object that provides an interface between Snowflake and third-party cloud message queuing services.
This parameter is required to enable auto-refresh operations for the external table. For instructions on configuring the auto-refresh capability, see Refreshing External Tables Automatically for Google Cloud Storage.
Microsoft Azure
INTEGRATION = integration_name
Specifies the name of the notification integration used to automatically refresh the external table metadata using Azure Event Grid notifications. A notification integration is a Snowflake object that provides an interface between Snowflake and third-party cloud message queuing services.
This parameter is required to enable auto-refresh operations for the external table. For instructions on configuring the auto-refresh capability, see Refreshing External Tables Automatically for Azure Blob Storage.
Format Type Options (formatTypeOptions
)¶
Format type options are used for loading data into and unloading data out of tables.
Depending on the file format type specified (FILE_FORMAT = ( TYPE = ... )
), you can include one or more of the following format-specific options (separated by blank spaces, commas, or new lines):
TYPE = CSV¶
COMPRESSION = AUTO | GZIP | BZ2 | BROTLI | ZSTD | DEFLATE | RAW_DEFLATE | NONE
String (constant) that specifies the current compression algorithm for the data files to be loaded. Snowflake uses this option to detect how already-compressed data files were compressed so that the compressed data in the files can be extracted for loading.
Supported Values
Notes
AUTO
Compression algorithm detected automatically, except for Brotli-compressed files, which cannot currently be detected automatically. If loading Brotli-compressed files, explicitly use
BROTLI
instead ofAUTO
.GZIP
BZ2
BROTLI
Must be specified when loading Brotli-compressed files.
ZSTD
Zstandard v0.8 (and higher) supported.
DEFLATE
Deflate-compressed files (with zlib header, RFC1950).
RAW_DEFLATE
Raw Deflate-compressed files (without header, RFC1951).
NONE
Data files to load have not been compressed.
RECORD_DELIMITER = 'character' | NONE
One or more characters that separate records in an input file. Accepts common escape sequences or the following singlebyte or multibyte characters:
- Singlebyte characters
Octal values (prefixed by
\\
) or hex values (prefixed by0x
or\x
). For example, for records delimited by the circumflex accent (^
) character, specify the octal (\\5e
) or hex (0x5e
) value.- Multibyte characters
Hex values (prefixed by
\x
). For example, for records delimited by the cent (¢
) character, specify the hex (\xC2\xA2
) value.The delimiter for RECORD_DELIMITER or FIELD_DELIMITER cannot be a substring of the delimiter for the other file format option (e.g.
FIELD_DELIMITER = 'aa' RECORD_DELIMITER = 'aabb'
).
The specified delimiter must be a valid UTF-8 character and not a random sequence of bytes. Also note that the delimiter is limited to a maximum of 20 characters.
Also accepts a value of
NONE
.Default: New line character. Note that “new line” is logical such that
\r\n
is understood as a new line for files on a Windows platform.FIELD_DELIMITER = 'character' | NONE
One or more singlebyte or multibyte characters that separate fields in an input file. Accepts common escape sequences or the following singlebyte or multibyte characters:
- Singlebyte characters
Octal values (prefixed by
\\
) or hex values (prefixed by0x
or\x
). For example, for records delimited by the circumflex accent (^
) character, specify the octal (\\5e
) or hex (0x5e
) value.- Multibyte characters
Hex values (prefixed by
\x
). For example, for records delimited by the cent (¢
) character, specify the hex (\xC2\xA2
) value.The delimiter for RECORD_DELIMITER or FIELD_DELIMITER cannot be a substring of the delimiter for the other file format option (e.g.
FIELD_DELIMITER = 'aa' RECORD_DELIMITER = 'aabb'
).
The specified delimiter must be a valid UTF-8 character and not a random sequence of bytes. Also note that the delimiter is limited to a maximum of 20 characters.
Also accepts a value of
NONE
.Default: comma (
,
)SKIP_HEADER = integer
Number of lines at the start of the file to skip.
Note that SKIP_HEADER does not use the RECORD_DELIMITER or FIELD_DELIMITER values to determine what a header line is; rather, it simply skips the specified number of CRLF (Carriage Return, Line Feed)-delimited lines in the file. RECORD_DELIMITER and FIELD_DELIMITER are then used to determine the rows of data to load.
Default:
0
SKIP_BLANK_LINES = TRUE | FALSE
- Use
Data loading only
- Definition
Boolean that specifies to skip any blank lines encountered in the data files; otherwise, blank lines produce an end-of-record error (default behavior).
Default:
FALSE
TYPE = JSON¶
COMPRESSION = AUTO | GZIP | BZ2 | BROTLI | ZSTD | DEFLATE | RAW_DEFLATE | NONE
String (constant) that specifies the current compression algorithm for the data files to be loaded. Snowflake uses this option to detect how already-compressed data files were compressed so that the compressed data in the files can be extracted for loading.
Supported Values
Notes
AUTO
Compression algorithm detected automatically, except for Brotli-compressed files, which cannot currently be detected automatically. If loading Brotli-compressed files, explicitly use
BROTLI
instead ofAUTO
.GZIP
BZ2
BROTLI
ZSTD
DEFLATE
Deflate-compressed files (with zlib header, RFC1950).
RAW_DEFLATE
Raw Deflate-compressed files (without header, RFC1951).
NONE
Indicates the files for loading data have not been compressed.
Default:
AUTO
TYPE = AVRO¶
COMPRESSION = AUTO | GZIP | BZ2 | BROTLI | ZSTD | DEFLATE | RAW_DEFLATE | NONE
String (constant) that specifies the current compression algorithm for the data files to be loaded. Snowflake uses this option to detect how already-compressed data files were compressed so that the compressed data in the files can be extracted for loading.
Supported Values
Notes
AUTO
Compression algorithm detected automatically, except for Brotli-compressed files, which cannot currently be detected automatically. If loading Brotli-compressed files, explicitly use
BROTLI
instead ofAUTO
.GZIP
BZ2
BROTLI
ZSTD
DEFLATE
Deflate-compressed files (with zlib header, RFC1950).
RAW_DEFLATE
Raw Deflate-compressed files (without header, RFC1951).
NONE
Data files to load have not been compressed.
Default:
AUTO
TYPE = ORC¶
TRIM_SPACE = TRUE | FALSE
Boolean that specifies whether to remove leading and trailing white space from strings.
For example, if your external database software encloses fields in quotes, but inserts a leading space, Snowflake reads the leading space rather than the opening quotation character as the beginning of the field (i.e. the quotation marks are interpreted as part of the string of field data). Set this option to
TRUE
to remove undesirable spaces during the data load.This file format option is applied to the following actions only:
Querying object values in staged ORC data files.
Loading ORC data into separate columns using the MATCH_BY_COLUMN_NAME copy option.
Loading ORC data into separate columns by specifying a query in the COPY statement (i.e. COPY transformation).
Default:
FALSE
NULL_IF = ( 'string1' [ , 'string2' , ... ] )
String used to convert to and from SQL NULL. Snowflake replaces these strings in the data load source with SQL NULL. To specify more than one string, enclose the list of strings in parentheses and use commas to separate each value.
Note that Snowflake converts all instances of the value to NULL, regardless of the data type. For example, if
2
is specified as a value, all instances of2
as either a string or number are converted.For example:
NULL_IF = ('\\N', 'NULL', 'NUL', '')
Note that this option can include empty strings.
This file format option is applied to the following actions only:
Querying object values in staged ORC data files.
Loading ORC data into separate columns using the MATCH_BY_COLUMN_NAME copy option.
Loading ORC data into separate columns by specifying a query in the COPY statement (i.e. COPY transformation).
Default:
\\N
(i.e. NULL, which assumes theESCAPE_UNENCLOSED_FIELD
value is\\
)
TYPE = PARQUET¶
COMPRESSION = AUTO | SNAPPY | NONE
String (constant) that specifies the current compression algorithm for columns in the Parquet files.
Supported Values
Notes
AUTO
Compression algorithm detected automatically. Supports the following compression algorithms: Brotli, gzip, Lempel–Ziv–Oberhumer (LZO), LZ4, Snappy, or Zstandard v0.8 (and higher).
SNAPPY
NONE
Data files to load have not been compressed.
Default:
AUTO
Access Control Requirements¶
A role used to execute this SQL command must have the following privileges at a minimum:
Privilege |
Object |
Notes |
---|---|---|
CREATE EXTERNAL TABLE |
Schema |
|
CREATE STAGE |
Schema |
Required if creating a new stage. |
USAGE |
Stage |
Required if referencing an existing stage. |
Note that operating on any object in a schema also requires the USAGE privilege on the parent database and schema.
For instructions on creating a custom role with a specified set of privileges, see Creating Custom Roles.
For general information about roles and privilege grants for performing SQL actions on securable objects, see Access Control in Snowflake.
Usage Notes¶
External tables support external (i.e. S3, Azure, or GCS) stages only; internal (i.e. Snowflake) stages are not supported.
You cannot access data held in archival cloud storage classes that requires restoration before it can be retrieved. These archival storage classes include, for example, the Amazon S3 Glacier Flexible Retrieval or Glacier Deep Archive storage class, or Microsoft Azure Archive Storage.
Snowflake does not enforce integrity constraints on external tables. In particular, unlike normal tables, Snowflake does not enforce NOT NULL constraints.
External tables include the following metadata column:
METADATA$FILENAME: Name of each staged data file included in the external table. Includes the path to the data file in the stage.
METADATA$FILE_ROW_NUMBER: Row number for each record in the staged data file.
The following are not supported for external tables:
Clustering keys
Cloning
Data in XML format
Time Travel is not supported for external tables.
Policies:
You cannot add a masking policy to an external table column while creating the external table because a masking policy cannot be attached to a virtual column.
You can add a row access policy to an external table while creating the external table.
Using
OR REPLACE
is the equivalent of using DROP EXTERNAL TABLE on the existing external table and then creating a new external table with the same name. The drop and create actions occur in a single atomic operation. This means that any queries concurrent with the CREATE OR REPLACE EXTERNAL TABLE operation use either the old or new external table version.Regarding metadata:
Attention
Customers should ensure that no personal data (other than for a User object), sensitive data, export-controlled data, or other regulated data is entered as metadata when using the Snowflake service. For more information, see Metadata Fields in Snowflake.
When creating an external table with a row access policy added to the external table, use the POLICY_CONTEXT function to simulate a query on the external table protected by a row access policy.
Examples¶
Partitions Added Automatically From Partition Column Expressions¶
Create an external table with partitions computed from expressions in the partition column definitions.
In the following example, the data files are organized in cloud storage with the following structure: logs/YYYY/MM/DD/HH24
.
For example:
logs/2018/08/05/0524/
logs/2018/08/27/1408/
Create an external stage named
s1
for the storage location where the data files are stored. For more information, see CREATE STAGE.The stage definition includes the path
/files/logs/
:Amazon S3
CREATE STAGE s1 URL='s3://mybucket/files/logs/' ... ;
Google Cloud Storage
CREATE STAGE s1 URL='gcs://mybucket/files/logs/' ... ;
Microsoft Azure
CREATE STAGE s1 URL='azure://mycontainer/files/logs/' ... ;
Query the METADATA$FILENAME pseudocolumn in the staged data. Use the results to develop your partition column(s):
SELECT metadata$filename FROM @s1/; +----------------------------------------+ | METADATA$FILENAME | |----------------------------------------| | files/logs/2018/08/05/0524/log.parquet | | files/logs/2018/08/27/1408/log.parquet | +----------------------------------------+
Create the partitioned external table.
The partition column
date_part
castsYYYY/MM/DD
in the METADATA$FILENAME pseudocolumn as a date using TO_DATE , DATE. The SQL command also specifies Parquet as the file format type.The external tables for Amazon S3 and Microsoft Azure cloud storage include the parameter required to refresh the metadata automatically when triggered by event notifications from the respective cloud messaging service:
Amazon S3
CREATE EXTERNAL TABLE et1( date_part date AS TO_DATE(SPLIT_PART(metadata$filename, '/', 3) || '/' || SPLIT_PART(metadata$filename, '/', 4) || '/' || SPLIT_PART(metadata$filename, '/', 5), 'YYYY/MM/DD'), timestamp bigint AS (value:timestamp::bigint), col2 varchar AS (value:col2::varchar)) PARTITION BY (date_part) LOCATION=@s1/logs/ AUTO_REFRESH = true FILE_FORMAT = (TYPE = PARQUET) AWS_SNS_TOPIC = 'arn:aws:sns:us-west-2:001234567890:s3_mybucket';
Google Cloud Storage
CREATE EXTERNAL TABLE et1( date_part date AS TO_DATE(SPLIT_PART(metadata$filename, '/', 3) || '/' || SPLIT_PART(metadata$filename, '/', 4) || '/' || SPLIT_PART(metadata$filename, '/', 5), 'YYYY/MM/DD'), timestamp bigint AS (value:timestamp::bigint), col2 varchar AS (value:col2::varchar)) PARTITION BY (date_part) LOCATION=@s1/logs/ AUTO_REFRESH = true FILE_FORMAT = (TYPE = PARQUET);
Microsoft Azure
CREATE EXTERNAL TABLE et1( date_part date AS TO_DATE(SPLIT_PART(metadata$filename, '/', 3) || '/' || SPLIT_PART(metadata$filename, '/', 4) || '/' || SPLIT_PART(metadata$filename, '/', 5), 'YYYY/MM/DD'), timestamp bigint AS (value:timestamp::bigint), col2 varchar AS (value:col2::varchar)) PARTITION BY (date_part) INTEGRATION = 'MY_INT' LOCATION=@s1/logs/ AUTO_REFRESH = true FILE_FORMAT = (TYPE = PARQUET);
Refresh the external table metadata:
ALTER EXTERNAL TABLE et1 REFRESH;
When querying the external table, filter the data by the partition column using a WHERE clause. Snowflake only scans the files in the specified partitions that match the filter conditions:
SELECT timestamp, col2 FROM et1 WHERE date_part = to_date('08/05/2018');
Partitions Added Manually¶
Create an external table with user-defined partitions (i.e. the partitions are added manually by the external table owner).
Create an external stage named
s2
for the storage location where the data files are stored:The stage definition includes the path
/files/logs/
:Amazon S3
CREATE STAGE s2 URL='s3://mybucket/files/logs/' ... ;
Google Cloud Storage
CREATE STAGE s2 URL='gcs://mybucket/files/logs/' ... ;
Microsoft Azure
CREATE STAGE s2 URL='azure://mycontainer/files/logs/' ... ;
Create the partitioned external table. The external table includes three partition columns with different data types.
Note that the syntax for each of the three cloud storage services (Amazon S3, Google Cloud Storage, and Microsoft Azure) is identical because the external table metadata is not refreshed:
CREATE EXTERNAL TABLE et2( col1 DATE AS (PARSE_JSON(METADATA$EXTERNAL_TABLE_PARTITION):col1::DATE), col2 VARCHAR AS (PARSE_JSON(METADATA$EXTERNAL_TABLE_PARTITION):col2::VARCHAR), col3 NUMBER AS (PARSE_JSON(METADATA$EXTERNAL_TABLE_PARTITION):col3::NUMBER)) PARTITION BY (col1,col2,col3) LOCATION=@s2/logs/ PARTITION_TYPE = USER_SPECIFIED FILE_FORMAT = (TYPE = PARQUET);
Add partitions for the partition columns:
ALTER EXTERNAL TABLE et2 ADD PARTITION(col1='2022-01-24', col2='a', col3='12') LOCATION '2022/01';
Snowflake adds the partitions to the metadata for the external table. The operation also adds any new data files in the specified location to the metadata:
+---------------------------------------+----------------+-------------------------------+ | file | status | description | +---------------------------------------+----------------+-------------------------------+ | mycontainer/files/logs/2022/01/24.csv | REGISTERED_NEW | File registered successfully. | | mycontainer/files/logs/2022/01/25.csv | REGISTERED_NEW | File registered successfully. | +---------------------------------------+----------------+-------------------------------+
When querying the external table, filter the data by the partition columns using a WHERE clause. This example returns the records in the order they are stored in the staged data files:
SELECT col1, col2, col3 FROM et1 WHERE col1 = TO_DATE('2022-01-24') AND col2 = 'a' ORDER BY METADATA$FILE_ROW_NUMBER;
External Table that References Files in a Delta Lake¶
Create a partitioned external table named ext_twitter_feed
that references the Delta Lake files in Parquet format in the mystage
external stage and daily
path.
The partition column date_part
casts YYYY/MM/DD
in the METADATA$FILENAME pseudocolumn as a date using
TO_DATE , DATE:
Amazon S3
CREATE EXTERNAL TABLE ext_twitter_feed( date_part date AS TO_DATE(SPLIT_PART(metadata$filename, '/', 3) || '/' || SPLIT_PART(metadata$filename, '/', 4) || '/' || SPLIT_PART(metadata$filename, '/', 5), 'YYYY/MM/DD'), timestamp bigint AS (value:timestamp::bigint), col2 varchar AS (value:col2::varchar)) PARTITION BY (date_part) LOCATION=@mystage/daily/ REFRESH_ON_CREATE = FALSE AUTO_REFRESH = FALSE FILE_FORMAT = (TYPE = PARQUET) TABLE_FORMAT = DELTA;
Google Cloud Storage
CREATE EXTERNAL TABLE ext_twitter_feed( date_part date AS TO_DATE(SPLIT_PART(metadata$filename, '/', 3) || '/' || SPLIT_PART(metadata$filename, '/', 4) || '/' || SPLIT_PART(metadata$filename, '/', 5), 'YYYY/MM/DD'), timestamp bigint AS (value:timestamp::bigint), col2 varchar AS (value:col2::varchar)) PARTITION BY (date_part) LOCATION=@mystage/daily/ REFRESH_ON_CREATE = FALSE AUTO_REFRESH = FALSE FILE_FORMAT = (TYPE = PARQUET) TABLE_FORMAT = DELTA;
Microsoft Azure
CREATE EXTERNAL TABLE ext_twitter_feed( date_part date AS TO_DATE(SPLIT_PART(metadata$filename, '/', 3) || '/' || SPLIT_PART(metadata$filename, '/', 4) || '/' || SPLIT_PART(metadata$filename, '/', 5), 'YYYY/MM/DD'), timestamp bigint AS (value:timestamp::bigint), col2 varchar AS (value:col2::varchar)) PARTITION BY (date_part) LOCATION=@mystage/daily/ REFRESH_ON_CREATE = FALSE AUTO_REFRESH = FALSE FILE_FORMAT = (TYPE = PARQUET) TABLE_FORMAT = DELTA;
Materialized View on an External Table¶
Create a materialized view based on a subquery of the columns in the external table created in the Partitions Added Automatically From Partition Column Expressions example:
CREATE MATERIALIZED VIEW et1_mv
AS
SELECT col2 FROM et1;
For general syntax, usage notes, and further examples for this SQL command, see CREATE MATERIALIZED VIEW.