CREATE EXTERNAL TABLE¶
Creates a new external table in the current/specified schema or replaces an existing external table. When queried, an external table reads data from a set of one or more files in a specified external stage and outputs the data in a single VARIANT column.
Additional columns can be defined, with each column definition consisting of a name, data type, and optionally whether the column requires a value (NOT NULL) or has any referential integrity constraints (primary key, foreign key, etc.). See the usage notes for more information.
- See also:
ALTER EXTERNAL TABLE, DROP EXTERNAL TABLE, SHOW EXTERNAL TABLES
Syntax¶
CREATE [ OR REPLACE ] EXTERNAL TABLE [ IF NOT EXISTS ]
<table_name>
( ( <col_name> <col_type> AS <expr> ) | ( <part_col_name> <col_type> AS <part_expr> )
[ inlineConstraint ]
[ , ( <col_name> <col_type> AS <expr> ) | ( <part_col_name> <col_type> AS <part_expr> ) ... ]
[ , ... ] )
cloudProviderParams
[ PARTITION BY ( <identifier> [, <identifier> ... ] ) ]
[ WITH ] LOCATION = externalStage
[ REFRESH_ON_CREATE = { TRUE | FALSE } ]
[ AUTO_REFRESH = { TRUE | FALSE } ]
[ PATTERN = '<regex_pattern>' ]
FILE_FORMAT = ( { FORMAT_NAME = '<file_format_name>' | TYPE = { CSV | JSON | AVRO | ORC | PARQUET } [ formatTypeOptions ] } )
[ COPY GRANTS ]
[ COMMENT = '<string_literal>' ]
Where:
inlineConstraint ::=
  [ NOT NULL ]
  [ CONSTRAINT <constraint_name> ]
  { UNIQUE | PRIMARY KEY | { [ FOREIGN KEY ] REFERENCES <ref_table_name> [ ( <ref_col_name> ) ] } }
  [ <constraint_properties> ]
For additional inline constraint details, see CREATE | ALTER TABLE … CONSTRAINT.
cloudProviderParams (for Microsoft Azure containers)
cloudProviderParams ::=
  [ INTEGRATION = '<integration_name>' ]
externalStage ::=
  @[<namespace>.]<ext_stage_name>[/<path>]
formatTypeOptions ::=
-- If TYPE = CSV
  COMPRESSION = AUTO | GZIP | BZ2 | BROTLI | ZSTD | DEFLATE | RAW_DEFLATE | NONE
  RECORD_DELIMITER = '<character>' | NONE
  FIELD_DELIMITER = '<character>' | NONE
  FILE_EXTENSION = '<string>'
  SKIP_HEADER = <integer>
  SKIP_BLANK_LINES = TRUE | FALSE
  DATE_FORMAT = '<string>' | AUTO
  TIME_FORMAT = '<string>' | AUTO
  TIMESTAMP_FORMAT = '<string>' | AUTO
  BINARY_FORMAT = HEX | BASE64 | UTF8
  ESCAPE = '<character>' | NONE
  ESCAPE_UNENCLOSED_FIELD = '<character>' | NONE
  TRIM_SPACE = TRUE | FALSE
  FIELD_OPTIONALLY_ENCLOSED_BY = '<character>' | NONE
  NULL_IF = ( '<string>' [ , '<string>' ... ] )
  ERROR_ON_COLUMN_COUNT_MISMATCH = TRUE | FALSE
  REPLACE_INVALID_CHARACTERS = TRUE | FALSE
  VALIDATE_UTF8 = TRUE | FALSE
  EMPTY_FIELD_AS_NULL = TRUE | FALSE
  SKIP_BYTE_ORDER_MARK = TRUE | FALSE
  ENCODING = '<string>' | UTF8
-- If TYPE = JSON
  COMPRESSION = AUTO | GZIP | BZ2 | BROTLI | ZSTD | DEFLATE | RAW_DEFLATE | NONE
  FILE_EXTENSION = '<string>'
  ENABLE_OCTAL = TRUE | FALSE
  ALLOW_DUPLICATE = TRUE | FALSE
  STRIP_OUTER_ARRAY = TRUE | FALSE
  STRIP_NULL_VALUES = TRUE | FALSE
  IGNORE_UTF8_ERRORS = TRUE | FALSE
  SKIP_BYTE_ORDER_MARK = TRUE | FALSE
-- If TYPE = AVRO
  COMPRESSION = AUTO | GZIP | BZ2 | BROTLI | ZSTD | DEFLATE | RAW_DEFLATE | NONE
-- If TYPE = ORC
  -- none (no options are supported)
-- If TYPE = PARQUET
  COMPRESSION = AUTO | LZO | SNAPPY | NONE
  BINARY_AS_TEXT = TRUE | FALSE
  SNAPPY_COMPRESSION = TRUE | FALSE
-- If TYPE = XML
  COMPRESSION = AUTO | GZIP | BZ2 | BROTLI | ZSTD | DEFLATE | RAW_DEFLATE | NONE
  IGNORE_UTF8_ERRORS = TRUE | FALSE
  PRESERVE_SPACE = TRUE | FALSE
  STRIP_OUTER_ELEMENT = TRUE | FALSE
  DISABLE_SNOWFLAKE_DATA = TRUE | FALSE
  DISABLE_AUTO_CONVERT = TRUE | FALSE
  SKIP_BYTE_ORDER_MARK = TRUE | FALSE
Required Parameters¶
table_name
String that specifies the identifier (i.e. name) for the table; must be unique for the schema in which the table is created.
In addition, the identifier must start with an alphabetic character and cannot contain spaces or special characters unless the entire identifier string is enclosed in double quotes (e.g. "My object"). Identifiers enclosed in double quotes are also case-sensitive.
For more details, see Identifier Requirements.
[ WITH ] LOCATION =
Specifies the external stage where the files containing data to be read are staged:
@[namespace.]ext_stage_name[/path]
Files are in the specified named external stage.
Where:
namespace is the database and/or schema in which the external stage resides, in the form of database_name.schema_name or schema_name. It is optional if a database and schema are currently in use within the user session; otherwise, it is required.
The optional path parameter restricts the set of files being queried to the files under the folder prefix. If path is specified, but no file is explicitly named in the path, all data files in the path are queried.
Note that the external table appends this path to any path specified in the stage definition. To view the stage definition, execute DESC STAGE stage_name and check the url property value. For example, if the stage URL includes path a and the external table location includes path b, then the external table reads files staged in stage/a/b.
FILE_FORMAT = ( FORMAT_NAME = 'file_format_name' ) or FILE_FORMAT = ( TYPE = CSV | JSON | AVRO | ORC | PARQUET [ ... ] )
String (constant) that specifies the file format:
FORMAT_NAME = file_format_name
Specifies an existing named file format that describes the staged data files to scan. The named file format determines the format type (CSV, JSON, etc.), as well as any other format options, for data files.
TYPE = CSV | JSON | AVRO | ORC | PARQUET [ ... ]
Specifies the format type of the staged data files to scan when querying the external table.
If a file format type is specified, additional format-specific options can be specified. For more details, see Format Type Options (in this topic).
The file format options can be configured at either the external table or stage level. Any settings specified at the external table level take precedence. Any settings not specified at either level assume the default values.
Default: TYPE = CSV
Important
The external table does not inherit the file format, if any, in the stage definition. You must explicitly specify any file format options for the external table using the FILE_FORMAT parameter.
Note
FORMAT_NAME and TYPE are mutually exclusive; to avoid unintended behavior, specify only one or the other when creating an external table.
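For illustration, a minimal statement combining the required parameters might look like the following; the stage and table names here are hypothetical, not taken from this page:

```sql
-- Minimal sketch: my_ext_table and mystage are assumed names.
CREATE OR REPLACE EXTERNAL TABLE my_ext_table
  WITH LOCATION = @mystage/daily/
  FILE_FORMAT = (TYPE = JSON);

-- When queried, the table exposes each record in a single VARIANT column:
SELECT value FROM my_ext_table;
```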
Optional Parameters¶
col_name
String that specifies the column identifier (i.e. name). All the requirements for table identifiers also apply to column identifiers.
External table columns are virtual columns, which are defined using an explicit expression.
For more details, see Identifier Requirements.
col_type
String (constant) that specifies the data type for the column. The data type must match the result of expr for the column.
For details about the data types that can be specified for table columns, see Data Types.
expr
String that specifies the expression for the column. When queried, the column returns results derived from this expression.
CONSTRAINT ...
String that defines an inline or out-of-line constraint for the specified column(s) in the table.
For syntax details, see CREATE | ALTER TABLE … CONSTRAINT. For more information about constraints, see Constraints.
REFRESH_ON_CREATE = TRUE | FALSE
Specifies whether to automatically refresh the external table metadata once, immediately after the external table is created. Refreshing the external table metadata synchronizes the metadata with the current list of data files in the specified stage path. This action is required for the metadata to register any existing data files in the named external stage specified in the [ WITH ] LOCATION = setting.
TRUE: Snowflake automatically refreshes the external table metadata once after creation.
FALSE: Snowflake does not automatically refresh the external table metadata. To register any existing data files in the stage, you must manually refresh the external table metadata once using ALTER EXTERNAL TABLE … REFRESH.
Default: TRUE
AUTO_REFRESH = TRUE | FALSE
Specifies whether Snowflake should enable triggering automatic refreshes of the external table metadata when new or updated data files are available in the named external stage specified in the [ WITH ] LOCATION = setting.
Note
You must configure an event notification for your storage location (Amazon S3 or Microsoft Azure) to notify Snowflake when new or updated data is available to read into the external table metadata. For more information, see Refreshing External Tables Automatically for Amazon S3 (S3) or Refreshing External Tables Automatically for Azure Blob Storage (Azure).
Currently, the ability to automatically refresh the metadata is not available for external tables that reference Google Cloud Storage stages.
As a workaround, we suggest following our best practices for staging your data files and periodically executing an ALTER EXTERNAL TABLE … REFRESH statement to register any missed files. For satisfactory performance, we also recommend using a selective path prefix with ALTER EXTERNAL TABLE to reduce the number of files that need to be listed and checked for whether they have already been registered (e.g. bucket_name/YYYY/MM/DD/ or even bucket_name/YYYY/MM/DD/HH/, depending on your volume).
When an external table is created, its metadata is refreshed automatically once unless REFRESH_ON_CREATE = FALSE.
TRUE: Snowflake enables triggering automatic refreshes of the external table metadata.
FALSE: Snowflake does not enable triggering automatic refreshes of the external table metadata. You must manually refresh the external table metadata periodically using ALTER EXTERNAL TABLE … REFRESH to synchronize the metadata with the current list of files in the stage path.
Default: TRUE
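As a sketch of the manual-refresh workflow described above (the table, stage, and path names are assumptions for illustration):

```sql
-- Create without registering or auto-refreshing metadata.
CREATE EXTERNAL TABLE sales_ext
  WITH LOCATION = @sales_stage/2019/
  FILE_FORMAT = (TYPE = PARQUET)
  REFRESH_ON_CREATE = FALSE
  AUTO_REFRESH = FALSE;

-- Register existing files; a selective path prefix reduces the files listed:
ALTER EXTERNAL TABLE sales_ext REFRESH;
ALTER EXTERNAL TABLE sales_ext REFRESH '2019/08/';
```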
PATTERN = 'regex_pattern'
A regular expression pattern string, enclosed in single quotes, specifying the file names and/or paths on the external stage to match.
Tip
For the best performance, try to avoid applying patterns that filter on a large number of files.
Note
Currently, this parameter is only supported when the external table metadata is refreshed manually by executing an ALTER EXTERNAL TABLE ... REFRESH statement to register files. The parameter is not supported when the metadata is refreshed using event notifications.
COPY GRANTS
Specifies to retain the access permissions from the original table when an external table is recreated using the CREATE OR REPLACE EXTERNAL TABLE variant. The parameter copies all permissions, except OWNERSHIP, from the existing table to the new table. By default, the role that executes the CREATE EXTERNAL TABLE command owns the new external table.
Note:
The operation to copy grants occurs atomically in the CREATE EXTERNAL TABLE command (i.e. within the same transaction).
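To illustrate the PATTERN parameter described above (stage and table names are hypothetical), a pattern restricts the table to files matching a regular expression; recall that the pattern currently only takes effect on manual refreshes:

```sql
CREATE EXTERNAL TABLE logs_ext
  WITH LOCATION = @log_stage/
  PATTERN = '.*[.]json[.]gz'   -- match only gzipped JSON files
  FILE_FORMAT = (TYPE = JSON)
  AUTO_REFRESH = FALSE;

ALTER EXTERNAL TABLE logs_ext REFRESH;  -- pattern is applied during manual refresh
```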
COMMENT = 'string_literal'
String (literal) that specifies a comment for the external table.
Default: No value
Partitioning Parameters¶
Use these parameters to partition your external table.
part_col_name col_type AS part_expr
Required for partitioning the data in an external table.
Specifies one or more partition columns in the external table.
A partition column must evaluate as an expression that parses the path and/or filename information in the METADATA$FILENAME pseudocolumn. Partition columns optimize query performance by pruning out the data files that do not need to be scanned (i.e. partitioning the external table). A partition consists of all data files that match the path and/or filename in the expression for the partition column.
part_col_name
String that specifies the partition column identifier (i.e. name). All the requirements for table identifiers also apply to column identifiers.
col_type
String (constant) that specifies the data type for the column. The data type must match the result of part_expr for the column.
part_expr
String that specifies the expression for the column. The expression must include the METADATA$FILENAME pseudocolumn.
External tables currently support the following subset of functions in partition expressions:
List of supported functions
Comparison operators: =, <>, >, >=, <, <=
Concatenation: ||
Arithmetic: +, -, - (negate), *
Logical: AND, OR, NOT
After defining any partition columns for the table, identify these columns using the PARTITION BY clause.
[ PARTITION BY ( part_col_name [, part_col_name ... ] ) ]
Specifies any partition columns to evaluate for the external table.
- Usage
When querying an external table, include one or more partition columns in a WHERE clause, e.g.:
... WHERE part_col_name = 'filter_value'
Snowflake filters on the partition columns to restrict the set of data files to scan. Note that all rows in these files are scanned. If a WHERE clause includes non-partition columns, those filters are evaluated after the data files have been filtered.
A common practice is to partition the data files based on increments of time; or, if the data files are staged from multiple sources, to partition by a data source identifier and date or timestamp.
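The partitioning workflow above can be sketched as follows; the stage layout (date-based paths under a hypothetical @exttable_stage) and all object names are assumptions for illustration:

```sql
-- Derive a partition column by parsing METADATA$FILENAME (e.g. logs/2019/08/05/file.json).
CREATE EXTERNAL TABLE logs (
  date_part DATE AS TO_DATE(
      SPLIT_PART(METADATA$FILENAME, '/', 2) || '/'
   || SPLIT_PART(METADATA$FILENAME, '/', 3) || '/'
   || SPLIT_PART(METADATA$FILENAME, '/', 4), 'YYYY/MM/DD'),
  ts TIMESTAMP AS (value:time::TIMESTAMP)
)
PARTITION BY (date_part)
WITH LOCATION = @exttable_stage/logs/
FILE_FORMAT = (TYPE = JSON);

-- Filtering on the partition column prunes data files before scanning:
SELECT ts, value FROM logs WHERE date_part = TO_DATE('2019-08-05');
```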
Cloud Provider Parameters (cloudProviderParams)¶
Microsoft Azure
INTEGRATION = integration_name
Specifies the name of the notification integration used to automatically refresh the external table metadata using Azure Event Grid notifications. A notification integration is a Snowflake object that provides an interface between Snowflake and third-party cloud message queuing services.
This parameter is required to enable auto-refresh operations for the external table. For instructions on configuring the auto-refresh capability, see Refreshing External Tables Automatically for Azure Blob Storage.
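A sketch of enabling auto-refresh on Azure; the integration, stage, and table names are hypothetical, and the notification integration must already exist (see the linked topic for the full setup):

```sql
-- Assumes a previously created notification integration for Azure Event Grid.
CREATE EXTERNAL TABLE azure_ext
  INTEGRATION = 'MY_AZURE_NOTIFICATION_INT'
  WITH LOCATION = @azure_stage/data/
  FILE_FORMAT = (TYPE = CSV)
  AUTO_REFRESH = TRUE;
```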
Format Type Options (formatTypeOptions)¶
Format type options are used for loading data into and unloading data out of tables.
Depending on the file format type specified (FILE_FORMAT = ( TYPE = ... )), you can include one or more of the following format-specific options (separated by blank spaces, commas, or new lines):
TYPE = CSV¶
File format options
COMPRESSION = AUTO | GZIP | BZ2 | BROTLI | ZSTD | DEFLATE | RAW_DEFLATE | NONE
- Use
Data loading and unloading
- Definition
When loading data, specifies the current compression algorithm for the data file. Snowflake uses this option to detect how an already-compressed data file was compressed so that the compressed data in the file can be extracted for loading.
When unloading data, compresses the data file using the specified compression algorithm.
- Values
| Supported Values | Notes |
| --- | --- |
| AUTO | When loading data, compression algorithm detected automatically, except for Brotli-compressed files, which cannot currently be detected automatically. When unloading data, files are automatically compressed using the default, which is gzip. |
| GZIP | |
| BZ2 | |
| BROTLI | Must be specified if loading/unloading Brotli-compressed files. |
| ZSTD | Zstandard v0.8 (and higher) is supported. |
| DEFLATE | Deflate-compressed files (with zlib header, RFC1950). |
| RAW_DEFLATE | Raw Deflate-compressed files (without header, RFC1951). |
| NONE | When loading data, indicates that the files have not been compressed. When unloading data, specifies that the unloaded files are not compressed. |
- Default
AUTO
RECORD_DELIMITER = 'character' | NONE
- Use
Data loading and unloading
- Definition
One or more singlebyte or multibyte characters that separate records in an input file (data loading) or unloaded file (data unloading).
Accepts common escape sequences, octal values (prefixed by \\), or hex values (prefixed by 0x). For example, for records delimited by the thorn (Þ) character, specify the octal (\\336) or hex (0xDE) value. Also accepts a value of NONE.
The specified delimiter must be a valid UTF-8 character and not a random sequence of bytes.
Multiple-character delimiters are also supported; however, the delimiter for RECORD_DELIMITER or FIELD_DELIMITER cannot be a substring of the delimiter for the other file format option (e.g. FIELD_DELIMITER = 'aa' RECORD_DELIMITER = 'aabb'). The delimiter is limited to a maximum of 20 characters. Do not specify characters used for other file format options such as ESCAPE or ESCAPE_UNENCLOSED_FIELD.
- Default
- Data loading
New line character. Note that "new line" is logical such that \r\n will be understood as a new line for files on a Windows platform.
- Data unloading
New line character (\n).
FIELD_DELIMITER = 'character' | NONE
- Use
Data loading and unloading
- Definition
One or more singlebyte or multibyte characters that separate fields in an input file (data loading) or unloaded file (data unloading).
Accepts common escape sequences, octal values (prefixed by \\), or hex values (prefixed by 0x). For example, for fields delimited by the thorn (Þ) character, specify the octal (\\336) or hex (0xDE) value. Also accepts a value of NONE.
The specified delimiter must be a valid UTF-8 character and not a random sequence of bytes.
Multiple-character delimiters are also supported; however, the delimiter for RECORD_DELIMITER or FIELD_DELIMITER cannot be a substring of the delimiter for the other file format option (e.g. FIELD_DELIMITER = 'aa' RECORD_DELIMITER = 'aabb'). The delimiter is limited to a maximum of 20 characters. Do not specify characters used for other file format options such as ESCAPE or ESCAPE_UNENCLOSED_FIELD.
- Default
comma (,)
FILE_EXTENSION = 'string' | NONE
- Use
Data unloading only
- Definition
Specifies the extension for files unloaded to a stage. Accepts any extension. The user is responsible for specifying a file extension that can be read by any desired software or services.
- Default
null, meaning the file extension is determined by the format type: .csv[compression], where compression is the extension added by the compression method, if COMPRESSION is set.
Note
If the SINGLE copy option is TRUE, then the COPY command unloads a file without a file extension by default. To specify a file extension, provide a file name and extension in the internal_location or external_location path (e.g. copy into @stage/data.csv).
SKIP_HEADER = integer
- Use
Data loading only
- Definition
Number of lines at the start of the file to skip.
Note that SKIP_HEADER does not use the RECORD_DELIMITER or FIELD_DELIMITER values to determine what a header line is; rather, it simply skips the specified number of CRLF (Carriage Return, Line Feed)-delimited lines in the file. RECORD_DELIMITER and FIELD_DELIMITER are then used to determine the rows of data to load.
- Default
0
SKIP_BLANK_LINES = TRUE | FALSE
- Use
Data loading only
- Definition
Boolean that specifies to skip any blank lines encountered in the data files; otherwise, blank lines produce an end-of-record error (default behavior).
- Default
FALSE
DATE_FORMAT = 'string' | AUTO
- Use
Data loading and unloading
- Definition
Defines the format of date values in the data files (data loading) or table (data unloading). If a value is not specified or is AUTO, the value for the DATE_INPUT_FORMAT (data loading) or DATE_OUTPUT_FORMAT (data unloading) parameter is used.
- Default
AUTO
TIME_FORMAT = 'string' | AUTO
- Use
Data loading and unloading
- Definition
Defines the format of time values in the data files (data loading) or table (data unloading). If a value is not specified or is AUTO, the value for the TIME_INPUT_FORMAT (data loading) or TIME_OUTPUT_FORMAT (data unloading) parameter is used.
- Default
AUTO
TIMESTAMP_FORMAT = 'string' | AUTO
- Use
Data loading and unloading
- Definition
Defines the format of timestamp values in the data files (data loading) or table (data unloading). If a value is not specified or is AUTO, the value for the TIMESTAMP_INPUT_FORMAT (data loading) or TIMESTAMP_OUTPUT_FORMAT (data unloading) parameter is used.
- Default
AUTO
BINARY_FORMAT = HEX | BASE64 | UTF8
- Use
Data loading and unloading
- Definition
Defines the encoding format for binary input or output. The option can be used when loading data into or unloading data from binary columns in a table.
- Default
HEX
ESCAPE = 'character' | NONE
- Use
Data loading and unloading
- Definition
Single character string used as the escape character for any field values. Accepts common escape sequences, octal values, or hex values. Also accepts a value of NONE.
When loading data, specifies the escape character for enclosed fields. Specify the character used to enclose fields by setting FIELD_OPTIONALLY_ENCLOSED_BY.
When unloading data, if this option is set, it overrides the escape character set for ESCAPE_UNENCLOSED_FIELD.
- Default
NONE
ESCAPE_UNENCLOSED_FIELD = 'character' | NONE
- Use
Data loading and unloading
- Definition
Single character string used as the escape character for unenclosed field values only. Accepts common escape sequences, octal values, or hex values. Also accepts a value of NONE.
When unloading data, if ESCAPE is set, the escape character set for that option overrides this option.
- Default
backslash (\\)
TRIM_SPACE = TRUE | FALSE
- Use
Data loading only
- Definition
Boolean that specifies whether to remove white space from fields.
For example, if your external database software encloses fields in quotes, but inserts a leading space, Snowflake reads the leading space rather than the opening quotation character as the beginning of the field (i.e. the quotation marks are interpreted as part of the string of field data). Set this option to TRUE to remove undesirable spaces during the data load.
As another example, if leading or trailing spaces surround quotes that enclose strings, you can remove the surrounding spaces using this option and the quote character using the FIELD_OPTIONALLY_ENCLOSED_BY option. Note that any spaces within the quotes are preserved. For example, assuming FIELD_DELIMITER = '|' and FIELD_OPTIONALLY_ENCLOSED_BY = '"':
|"Hello world"|    /* loads as */  >Hello world<
|" Hello world "|  /* loads as */  > Hello world <
| "Hello world" |  /* loads as */  >Hello world<
(the brackets in this example are not loaded; they are used to demarcate the beginning and end of the loaded strings)
- Default
FALSE
FIELD_OPTIONALLY_ENCLOSED_BY = 'character' | NONE
- Use
Data loading and unloading
- Definition
Character used to enclose strings. Value can be NONE, single quote character ('), or double quote character ("). To use the single quote character, use the octal or hex representation (0x27) or the double single-quoted escape ('').
When a field contains this character, escape it using the same character. For example, if the value is the double quote character and a field contains the string A "B" C, escape the double quotes as follows: A ""B"" C
- Default
NONE
NULL_IF = ( 'string1' [ , 'string2' , ... ] )
- Use
Data loading and unloading
- Definition
String used to convert to and from SQL NULL:
When loading data, Snowflake replaces these strings in the data load source with SQL NULL. To specify more than one string, enclose the list of strings in parentheses and use commas to separate each value.
For example:
NULL_IF = ('\\N', 'NULL', 'NUL', '')
Note that this option can include empty strings and only applies to columns that are nullable.
When unloading data, Snowflake converts SQL NULL values to the first value in the list.
- Default
\\N (i.e. NULL, which assumes the ESCAPE_UNENCLOSED_FIELD value is \\)
ERROR_ON_COLUMN_COUNT_MISMATCH = TRUE | FALSE
- Use
Data loading only
- Definition
Boolean that specifies whether to generate a parsing error if the number of delimited columns (i.e. fields) in an input file does not match the number of columns in the corresponding table.
If set to FALSE, an error is not generated and the load continues. If the file is successfully loaded:
If the input file contains records with more fields than columns in the table, the matching fields are loaded in order of occurrence in the file and the remaining fields are not loaded.
If the input file contains records with fewer fields than columns in the table, the non-matching columns in the table are loaded with NULL values.
This option assumes all the records within the input file are the same length (i.e. a file containing records of varying length returns an error regardless of the value specified for this parameter).
- Default
TRUE
Note
When transforming data during loading (i.e. using a query as the source for the COPY command), this option is ignored. There is no requirement for your data files to have the same number and ordering of columns as your target table.
REPLACE_INVALID_CHARACTERS = TRUE | FALSE
- Use
Data loading only
- Definition
Boolean that specifies whether to replace invalid UTF-8 characters with the Unicode replacement character (�).
If set to TRUE, Snowflake replaces invalid UTF-8 characters with the Unicode replacement character.
If set to FALSE, the load operation produces an error when invalid UTF-8 character encoding is detected.
- Default
FALSE
VALIDATE_UTF8 = TRUE | FALSE
- Use
Data loading only
- Definition
Boolean that specifies whether to validate UTF-8 character encoding in string column data.
If set to TRUE, Snowflake validates UTF-8 character encoding in string column data. When invalid UTF-8 character encoding is detected, the COPY command produces an error.
- Default
TRUE
Important
This option is provided only to ensure backward compatibility with earlier versions of Snowflake. You should not disable this option unless instructed by Snowflake Support.
EMPTY_FIELD_AS_NULL = TRUE | FALSE
- Use
Data loading and unloading
- Definition
When loading data, specifies whether to insert SQL NULL for empty fields in an input file, which are represented by two successive delimiters (e.g. ,,).
If set to FALSE, Snowflake attempts to cast an empty field to the corresponding column type. An empty string is inserted into columns of type STRING. For other column types, the COPY command produces an error.
When unloading data, this option is used in combination with FIELD_OPTIONALLY_ENCLOSED_BY. When FIELD_OPTIONALLY_ENCLOSED_BY = NONE, setting EMPTY_FIELD_AS_NULL = FALSE specifies to unload empty strings in tables to empty string values without quotes enclosing the field values.
If set to TRUE, FIELD_OPTIONALLY_ENCLOSED_BY must specify a character to enclose strings.
- Default
TRUE
SKIP_BYTE_ORDER_MARK = TRUE | FALSE
- Use
Data loading only
- Definition
Boolean that specifies whether to skip the BOM (byte order mark), if present in a data file. A BOM is a character code at the beginning of a data file that defines the byte order and encoding form.
If set to FALSE, Snowflake recognizes any BOM in data files, which could result in the BOM either causing an error or being merged into the first column in the table.
- Default
TRUE
ENCODING = 'string'
- Use
Data loading only
- Definition
String (constant) that specifies the character set of the source data when loading data into a table.
| Character Set | ENCODING Value | Supported Languages | Notes |
| --- | --- | --- | --- |
| Big5 | BIG5 | Traditional Chinese | |
| EUC-JP | EUCJP | Japanese | |
| EUC-KR | EUCKR | Korean | |
| GB18030 | GB18030 | Chinese | |
| IBM420 | IBM420 | Arabic | |
| IBM424 | IBM424 | Hebrew | |
| ISO-2022-CN | ISO2022CN | Simplified Chinese | |
| ISO-2022-JP | ISO2022JP | Japanese | |
| ISO-2022-KR | ISO2022KR | Korean | |
| ISO-8859-1 | ISO88591 | Danish, Dutch, English, French, German, Italian, Norwegian, Portuguese, Swedish | |
| ISO-8859-2 | ISO88592 | Czech, Hungarian, Polish, Romanian | |
| ISO-8859-5 | ISO88595 | Russian | |
| ISO-8859-6 | ISO88596 | Arabic | |
| ISO-8859-7 | ISO88597 | Greek | |
| ISO-8859-8 | ISO88598 | Hebrew | |
| ISO-8859-9 | ISO88599 | Turkish | |
| KOI8-R | KOI8R | Russian | |
| Shift_JIS | SHIFTJIS | Japanese | |
| UTF-8 | UTF8 | All languages | For loading data from delimited files (CSV, TSV, etc.), UTF-8 is the default. For loading data from all other supported file formats (JSON, Avro, etc.), as well as unloading data, UTF-8 is the only supported character set. |
| UTF-16 | UTF16 | All languages | |
| UTF-16BE | UTF16BE | All languages | |
| UTF-16LE | UTF16LE | All languages | |
| UTF-32 | UTF32 | All languages | |
| UTF-32BE | UTF32BE | All languages | |
| UTF-32LE | UTF32LE | All languages | |
| windows-1250 | WINDOWS1250 | Czech, Hungarian, Polish, Romanian | |
| windows-1251 | WINDOWS1251 | Russian | |
| windows-1252 | WINDOWS1252 | Danish, Dutch, English, French, German, Italian, Norwegian, Portuguese, Swedish | |
| windows-1253 | WINDOWS1253 | Greek | |
| windows-1254 | WINDOWS1254 | Turkish | |
| windows-1255 | WINDOWS1255 | Hebrew | |
| windows-1256 | WINDOWS1256 | Arabic | |
- Default
UTF8
Note
Snowflake stores all data internally in the UTF-8 character set. The data is converted into UTF-8 before it is loaded into Snowflake. If the VALIDATE_UTF8 file format option is TRUE, Snowflake validates the UTF-8 character encoding in string column data after it is converted from its original character encoding.
TYPE = JSON¶
File format options
COMPRESSION = AUTO | GZIP | BZ2 | BROTLI | ZSTD | DEFLATE | RAW_DEFLATE | NONE
- Use
Data loading and unloading
- Definition
When loading data, specifies the current compression algorithm for the data file. Snowflake uses this option to detect how an already-compressed data file was compressed so that the compressed data in the file can be extracted for loading.
When unloading data, compresses the data file using the specified compression algorithm.
- Values
| Supported Values | Notes |
| --- | --- |
| AUTO | When loading data, compression algorithm detected automatically, except for Brotli-compressed files, which cannot currently be detected automatically. When unloading data, files are automatically compressed using the default, which is gzip. |
| GZIP | |
| BZ2 | |
| BROTLI | Must be specified if loading/unloading Brotli-compressed files. |
| ZSTD | Zstandard v0.8 (and higher) is supported. |
| DEFLATE | Deflate-compressed files (with zlib header, RFC1950). |
| RAW_DEFLATE | Raw Deflate-compressed files (without header, RFC1951). |
| NONE | When loading data, indicates that the files have not been compressed. When unloading data, specifies that the unloaded files are not compressed. |
- Default
AUTO
DATE_FORMAT = 'string' | AUTO
- Use
Data loading only
- Definition
Applied only when loading JSON data into separate columns (i.e. using the MATCH_BY_COLUMN_NAME copy option or a COPY transformation). Defines the format of date string values in the data files. If a value is not specified or is AUTO, the value for the DATE_INPUT_FORMAT parameter is used.
This file format option is currently a Preview Feature.
- Default
AUTO
TIME_FORMAT = 'string' | AUTO
- Use
Data loading only
- Definition
Applied only when loading JSON data into separate columns (i.e. using the MATCH_BY_COLUMN_NAME copy option or a COPY transformation). Defines the format of time string values in the data files. If a value is not specified or is AUTO, the value for the TIME_INPUT_FORMAT parameter is used.
This file format option is currently a Preview Feature.
- Default
AUTO
TIMESTAMP_FORMAT = 'string' | AUTO
- Use
Data loading only
- Definition
Applied only when loading JSON data into separate columns (i.e. using the MATCH_BY_COLUMN_NAME copy option or a COPY transformation). Defines the format of timestamp string values in the data files. If a value is not specified or is AUTO, the value for the TIMESTAMP_INPUT_FORMAT parameter is used.
This file format option is currently a Preview Feature.
- Default
AUTO
BINARY_FORMAT = HEX | BASE64 | UTF8
- Use
Data loading only
- Definition
Applied only when loading JSON data into separate columns (i.e. using the MATCH_BY_COLUMN_NAME copy option or a COPY transformation). Defines the encoding format for binary string values in the data files. The option can be used when loading data into binary columns in a table.
This file format option is currently a Preview Feature.
- Default
HEX
TRIM_SPACE = TRUE | FALSE
- Use
Data loading only
- Definition
Applied only when loading JSON data into separate columns (i.e. using the MATCH_BY_COLUMN_NAME copy option or a COPY transformation). Boolean that specifies whether to remove leading and trailing white space from strings.
For example, if your external database software encloses fields in quotes, but inserts a leading space, Snowflake reads the leading space rather than the opening quotation character as the beginning of the field (i.e. the quotation marks are interpreted as part of the string of field data). Set this option to TRUE to remove undesirable spaces during the data load.
This file format option is currently a Preview Feature.
- Default
FALSE
NULL_IF = ( 'string1' [ , 'string2' , ... ] )
- Use
Data loading only
- Definition
Applied only when loading JSON data into separate columns (i.e. using the MATCH_BY_COLUMN_NAME copy option or a COPY transformation). String used to convert to and from SQL NULL. Snowflake replaces these strings in the data load source with SQL NULL. To specify more than one string, enclose the list of strings in parentheses and use commas to separate each value.
For example:
NULL_IF = ('\\N', 'NULL', 'NUL', '') Note that this option can include empty strings and only applies to columns that are nullable.
This file format option is currently a Preview Feature.
- Default
\\N (i.e. NULL, which assumes the ESCAPE_UNENCLOSED_FIELD value is \\)
FILE_EXTENSION = 'string' | NONE- Use
Data unloading only
- Definition
Specifies the extension for files unloaded to a stage. Accepts any extension. The user is responsible for specifying a file extension that can be read by any desired software or services.
- Default
null, meaning the file extension is determined by the format type:
.json[compression], where compression is the extension added by the compression method, if COMPRESSION is set.
ENABLE_OCTAL = TRUE | FALSE- Use
Data loading only
- Definition
Boolean that enables parsing of octal numbers.
- Default
FALSE
ALLOW_DUPLICATE = TRUE | FALSE- Use
Data loading only
- Definition
Boolean that specifies to allow duplicate object field names (only the last one will be preserved).
- Default
FALSE
STRIP_OUTER_ARRAY = TRUE | FALSE- Use
Data loading only
- Definition
Boolean that instructs the JSON parser to remove outer brackets (i.e. [ ]).
- Default
FALSE
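As an illustration (the table and stage path here are hypothetical), a JSON file whose entire contents are one outer array, such as [{"a":1},{"a":2}], would normally load as a single row; setting STRIP_OUTER_ARRAY removes the brackets so that each element becomes its own row:

```sql
-- Hypothetical sketch: load each element of an outer JSON array as a separate row.
CREATE OR REPLACE EXTERNAL TABLE ext_json_events
  WITH LOCATION = @mystage/events/
  FILE_FORMAT = (TYPE = JSON STRIP_OUTER_ARRAY = TRUE);
```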
STRIP_NULL_VALUES = TRUE | FALSE- Use
Data loading only
- Definition
Boolean that instructs the JSON parser to remove object fields or array elements containing null values. For example, when set to TRUE:
Before: [null]                                   After: []
Before: [null,null,3]                            After: [,,3]
Before: {"a":null,"b":null,"c":123}              After: {"c":123}
Before: {"a":[1,null,2],"b":{"x":null,"y":88}}   After: {"a":[1,,2],"b":{"y":88}}
- Default
FALSE
IGNORE_UTF8_ERRORS = TRUE | FALSE- Use
Data loading only
- Definition
Boolean that specifies whether UTF-8 encoding errors produce error conditions. If set to
TRUE, any invalid UTF-8 sequences are silently replaced with Unicode character U+FFFD (i.e. "replacement character").
- Default
FALSE
SKIP_BYTE_ORDER_MARK = TRUE | FALSE- Use
Data loading only
- Definition
Boolean that specifies whether to skip the BOM (byte order mark), if present in a data file. A BOM is a character code at the beginning of a data file that defines the byte order and encoding form.
If set to
FALSE, Snowflake recognizes any BOM in data files, which could result in the BOM either causing an error or being merged into the first column in the table.
- Default
TRUE
TYPE = AVRO¶
File format options
COMPRESSION = AUTO | GZIP | BZ2 | BROTLI | ZSTD | DEFLATE | RAW_DEFLATE | NONE- Use
Data loading and unloading
- Definition
When loading data, specifies the current compression algorithm for the data file. Snowflake uses this option to detect how an already-compressed data file was compressed so that the compressed data in the file can be extracted for loading.
When unloading data, compresses the data file using the specified compression algorithm.
- Values
Supported Values
Notes
AUTO: When loading data, compression algorithm detected automatically, except for Brotli-compressed files, which cannot currently be detected automatically. When unloading data, files are automatically compressed using the default, which is gzip.
GZIP
BZ2
BROTLI: Must be specified if loading/unloading Brotli-compressed files.
ZSTD: Zstandard v0.8 (and higher) is supported.
DEFLATE: Deflate-compressed files (with zlib header, RFC1950).
RAW_DEFLATE: Raw Deflate-compressed files (without header, RFC1951).
NONE: When loading data, indicates that the files have not been compressed. When unloading data, specifies that the unloaded files are not compressed.
- Default
AUTO
TRIM_SPACE = TRUE | FALSE- Use
Data loading only
- Definition
Applied only when loading Avro data into separate columns (i.e. using the MATCH_BY_COLUMN_NAME copy option or a COPY transformation). Boolean that specifies whether to remove leading and trailing white space from strings.
For example, if your external database software encloses fields in quotes, but inserts a leading space, Snowflake reads the leading space rather than the opening quotation character as the beginning of the field (i.e. the quotation marks are interpreted as part of the string of field data). Set this option to
TRUE to remove undesirable spaces during the data load. This file format option is currently a Preview Feature.
- Default
FALSE
NULL_IF = ( 'string1' [ , 'string2' , ... ] )- Use
Data loading only
- Definition
Applied only when loading Avro data into separate columns (i.e. using the MATCH_BY_COLUMN_NAME copy option or a COPY transformation). String used to convert to and from SQL NULL. Snowflake replaces these strings in the data load source with SQL NULL. To specify more than one string, enclose the list of strings in parentheses and use commas to separate each value.
For example:
NULL_IF = ('\\N', 'NULL', 'NUL', '') Note that this option can include empty strings and only applies to columns that are nullable.
This file format option is currently a Preview Feature.
- Default
\\N (i.e. NULL, which assumes the ESCAPE_UNENCLOSED_FIELD value is \\)
TYPE = ORC¶
File format options
TRIM_SPACE = TRUE | FALSE- Use
Data loading only
- Definition
Applied only when loading ORC data into separate columns (i.e. using the MATCH_BY_COLUMN_NAME copy option or a COPY transformation). Boolean that specifies whether to remove leading and trailing white space from strings.
For example, if your external database software encloses fields in quotes, but inserts a leading space, Snowflake reads the leading space rather than the opening quotation character as the beginning of the field (i.e. the quotation marks are interpreted as part of the string of field data). Set this option to
TRUE to remove undesirable spaces during the data load. This file format option is currently a Preview Feature.
- Default
FALSE
NULL_IF = ( 'string1' [ , 'string2' , ... ] )- Use
Data loading only
- Definition
Applied only when loading ORC data into separate columns (i.e. using the MATCH_BY_COLUMN_NAME copy option or a COPY transformation). String used to convert to and from SQL NULL. Snowflake replaces these strings in the data load source with SQL NULL. To specify more than one string, enclose the list of strings in parentheses and use commas to separate each value.
For example:
NULL_IF = ('\\N', 'NULL', 'NUL', '') Note that this option can include empty strings and only applies to columns that are nullable.
This file format option is currently a Preview Feature.
- Default
\\N (i.e. NULL, which assumes the ESCAPE_UNENCLOSED_FIELD value is \\)
TYPE = PARQUET¶
File format options
COMPRESSION = AUTO | LZO | SNAPPY | NONE- Use
Data loading and unloading
- Definition
When loading data, specifies the current compression algorithm for the data file. Snowflake uses this option to detect how an already-compressed data file was compressed so that the compressed data in the file can be extracted for loading.
When unloading data, compresses the data file using the specified compression algorithm.
- Values
Supported Values
Notes
AUTO: When loading data, compression algorithm detected automatically. When unloading data, unloaded files are compressed using the Snappy compression algorithm by default.
LZO: Supported for data unloading only. Must be specified if unloading LZO-compressed files.
SNAPPY: May be specified if unloading Snappy-compressed files.
NONE: When loading data, indicates that the files have not been compressed. When unloading data, specifies that the unloaded files are not compressed.
- Default
AUTO
SNAPPY_COMPRESSION = TRUE | FALSE- Use
Data unloading only
- Definition
Boolean that specifies whether unloaded file(s) are compressed using the SNAPPY algorithm.
Note
Deprecated. Use
COMPRESSION = SNAPPY instead.
- Limitations
Only supported for data unloading operations.
- Default
TRUE
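Since SNAPPY_COMPRESSION is deprecated, the preferred spelling uses the COMPRESSION option instead. A minimal sketch (the file format name is hypothetical):

```sql
-- Hypothetical sketch: Snappy compression via the non-deprecated option.
CREATE OR REPLACE FILE FORMAT my_parquet_unload_fmt
  TYPE = PARQUET
  COMPRESSION = SNAPPY;
```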
BINARY_AS_TEXT = TRUE | FALSE- Use
Data loading only
- Definition
Boolean that specifies whether to interpret columns with no defined logical data type as UTF-8 text. When set to
FALSE, Snowflake interprets these columns as binary data.
- Limitations
Only supported for data loading operations.
- Default
TRUE
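For example (table and stage names are hypothetical), to keep Parquet columns with no defined logical type as binary rather than interpreting them as UTF-8 text, the option can be set inline in the external table's FILE_FORMAT clause:

```sql
-- Hypothetical sketch: treat untyped Parquet columns as binary data.
CREATE OR REPLACE EXTERNAL TABLE ext_parquet_raw
  WITH LOCATION = @mystage/raw/
  FILE_FORMAT = (TYPE = PARQUET BINARY_AS_TEXT = FALSE);
```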
TRIM_SPACE = TRUE | FALSE- Use
Data loading only
- Definition
Applied only when loading Parquet data into separate columns (i.e. using the MATCH_BY_COLUMN_NAME copy option or a COPY transformation). Boolean that specifies whether to remove leading and trailing white space from strings.
For example, if your external database software encloses fields in quotes, but inserts a leading space, Snowflake reads the leading space rather than the opening quotation character as the beginning of the field (i.e. the quotation marks are interpreted as part of the string of field data). Set this option to
TRUE to remove undesirable spaces during the data load. This file format option is currently a Preview Feature.
- Default
FALSE
NULL_IF = ( 'string1' [ , 'string2' , ... ] )- Use
Data loading only
- Definition
Applied only when loading Parquet data into separate columns (i.e. using the MATCH_BY_COLUMN_NAME copy option or a COPY transformation). String used to convert to and from SQL NULL. Snowflake replaces these strings in the data load source with SQL NULL. To specify more than one string, enclose the list of strings in parentheses and use commas to separate each value.
For example:
NULL_IF = ('\\N', 'NULL', 'NUL', '') Note that this option can include empty strings and only applies to columns that are nullable.
This file format option is currently a Preview Feature.
- Default
\\N (i.e. NULL, which assumes the ESCAPE_UNENCLOSED_FIELD value is \\)
TYPE = XML¶
File format options
COMPRESSION = AUTO | GZIP | BZ2 | BROTLI | ZSTD | DEFLATE | RAW_DEFLATE | NONE- Use
Data loading and unloading
- Definition
When loading data, specifies the current compression algorithm for the data file. Snowflake uses this option to detect how an already-compressed data file was compressed so that the compressed data in the file can be extracted for loading.
When unloading data, compresses the data file using the specified compression algorithm.
- Values
Supported Values
Notes
AUTO: When loading data, compression algorithm detected automatically, except for Brotli-compressed files, which cannot currently be detected automatically. When unloading data, files are automatically compressed using the default, which is gzip.
GZIP
BZ2
BROTLI: Must be specified if loading/unloading Brotli-compressed files.
ZSTD: Zstandard v0.8 (and higher) is supported.
DEFLATE: Deflate-compressed files (with zlib header, RFC1950).
RAW_DEFLATE: Raw Deflate-compressed files (without header, RFC1951).
NONE: When loading data, indicates that the files have not been compressed. When unloading data, specifies that the unloaded files are not compressed.
- Default
AUTO
IGNORE_UTF8_ERRORS = TRUE | FALSE- Use
Data loading only
- Definition
Boolean that specifies whether UTF-8 encoding errors produce error conditions. If set to
TRUE, any invalid UTF-8 sequences are silently replaced with Unicode character U+FFFD (i.e. "replacement character").
- Default
FALSE
PRESERVE_SPACE = TRUE | FALSE- Use
Data loading only
- Definition
Boolean that specifies whether the XML parser preserves leading and trailing spaces in element content.
- Default
FALSE
STRIP_OUTER_ELEMENT = TRUE | FALSE- Use
Data loading only
- Definition
Boolean that specifies whether the XML parser strips out the outer XML element, exposing 2nd level elements as separate documents.
- Default
FALSE
DISABLE_SNOWFLAKE_DATA = TRUE | FALSE- Use
Data loading only
- Definition
Boolean that specifies whether the XML parser disables recognition of Snowflake semi-structured data tags.
- Default
FALSE
DISABLE_AUTO_CONVERT = TRUE | FALSE- Use
Data loading only
- Definition
Boolean that specifies whether the XML parser disables automatic conversion of numeric and Boolean values from text to native representation.
- Default
FALSE
SKIP_BYTE_ORDER_MARK = TRUE | FALSE- Use
Data loading only
- Definition
Boolean that specifies whether to skip any BOM (byte order mark) present in an input file. A BOM is a character code at the beginning of a data file that defines the byte order and encoding form.
If set to
FALSE, Snowflake recognizes any BOM in data files, which could result in the BOM either causing an error or being merged into the first column in the table.
- Default
TRUE
TRIM_SPACE = TRUE | FALSE- Use
Data loading only
- Definition
Applied only when loading XML data into separate columns (i.e. using the MATCH_BY_COLUMN_NAME copy option or a COPY transformation). Boolean that specifies whether to remove leading and trailing white space from strings.
For example, if your external database software encloses fields in quotes, but inserts a leading space, Snowflake reads the leading space rather than the opening quotation character as the beginning of the field (i.e. the quotation marks are interpreted as part of the string of field data). Set this option to
TRUE to remove undesirable spaces during the data load. This file format option is currently a Preview Feature.
- Default
FALSE
NULL_IF = ( 'string1' [ , 'string2' , ... ] )- Use
Data loading only
- Definition
Applied only when loading XML data into separate columns (i.e. using the MATCH_BY_COLUMN_NAME copy option or a COPY transformation). String used to convert to and from SQL NULL. Snowflake replaces these strings in the data load source with SQL NULL. To specify more than one string, enclose the list of strings in parentheses and use commas to separate each value.
For example:
NULL_IF = ('\\N', 'NULL', 'NUL', '') Note that this option can include empty strings and only applies to columns that are nullable.
This file format option is currently a Preview Feature.
- Default
\\N (i.e. NULL, which assumes the ESCAPE_UNENCLOSED_FIELD value is \\)
Usage Notes¶
External tables support external (i.e. S3, Azure, or GCS) stages only; internal (i.e. Snowflake) stages are not supported.
Every external table has a column named VALUE of type VARIANT. Additional columns might be specified. All of the columns are treated as virtual columns.
The VALUE column structures rows in a CSV data file as JSON objects with elements identified by column position, e.g.
{c1: col_1_value, c2: col_2_value, c3: col_3_value ...}.
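To illustrate, a hypothetical external table over CSV files could expose individual fields by casting positional paths in the VALUE column:

```sql
-- Hypothetical sketch: extract positional CSV fields from the VARIANT VALUE column.
SELECT value:c1::VARCHAR AS first_col,
       value:c2::NUMBER  AS second_col
FROM ext_csv_table;
```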
No referential integrity constraints on external tables are enforced by Snowflake. This differs from the behavior for normal tables, whereby the
NOT NULL constraint on columns is enforced.
External tables include the following metadata column:
METADATA$FILENAME: Name of each staged data file included in the external table. Includes the path to the data file in the stage.
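For example (the table name and path pattern here are hypothetical), the column can be used to restrict a query to rows that came from particular staged files:

```sql
-- Hypothetical sketch: filter external table rows by source file path.
SELECT value
FROM ext_csv_table
WHERE metadata$filename LIKE 'files/2018/08/%';
```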
The following are not supported for external tables:
Clustering keys
Cloning
Data sharing; i.e. external tables are not included in shared databases
Data in XML format
Time Travel is not supported for external tables.
Examples¶
Simple External Table¶
Create an external stage named
mystage for the storage location where a set of Parquet data files are stored. For more information, see CREATE STAGE.
Amazon S3
Create an external stage using a private/protected S3 bucket named
mybucket with a folder path named files:
CREATE OR REPLACE STAGE mystage URL='s3://mybucket/files/' .. ;
Google Cloud Storage
Create an external stage using a Google Cloud Storage bucket named
mybucket with a folder path named files:
CREATE OR REPLACE STAGE mystage URL='gcs://mybucket/files' .. ;
Microsoft Azure
Create an external stage using an Azure storage account named
myaccount and a container named mycontainer with a folder path named files:
CREATE OR REPLACE STAGE mystage URL='azure://myaccount.blob.core.windows.net/mycontainer/files' .. ;
Create an external table named
ext_twitter_feed that references the Parquet files in the mystage external stage. The stage reference includes a folder path named daily. The external table appends this path to the stage definition, i.e. the external table references the data files in @mystage/files/daily. The SQL command specifies Parquet as the file format type. In addition, file pattern matching is applied to include only Parquet files whose names include the string
sales:
Amazon S3
CREATE OR REPLACE EXTERNAL TABLE ext_twitter_feed WITH LOCATION = @mystage/daily/ AUTO_REFRESH = true FILE_FORMAT = (TYPE = PARQUET) PATTERN='.*sales.*[.]parquet';
Google Cloud Storage
CREATE OR REPLACE EXTERNAL TABLE ext_twitter_feed WITH LOCATION = @mystage/daily/ FILE_FORMAT = (TYPE = PARQUET) PATTERN='.*sales.*[.]parquet';
Microsoft Azure
CREATE OR REPLACE EXTERNAL TABLE ext_twitter_feed INTEGRATION = 'MY_AZURE_INT' WITH LOCATION = @mystage/daily/ AUTO_REFRESH = true FILE_FORMAT = (TYPE = PARQUET) PATTERN='.*sales.*[.]parquet';
Refresh the external table metadata:
ALTER EXTERNAL TABLE ext_twitter_feed REFRESH;
Partitioned External Table¶
Create a partitioned external table that partitions data by the logical, granular details in the stage path.
In the following example, the data files are organized in cloud storage with the following structure: logs/YYYY/MM/DD/HH24, e.g.:
logs/2018/08/05/0524/
logs/2018/08/27/1408/
Create an external stage named
mystage for the storage location where the data files are stored. For more information, see CREATE STAGE. The stage definition includes the path
/files/logs/:
Amazon S3
CREATE STAGE mystage URL='s3://mybucket/files/logs/' .. ;
Google Cloud Storage
CREATE STAGE mystage URL='gcs://mybucket/files/logs/' .. ;
Microsoft Azure
CREATE STAGE mystage URL='azure://mycontainer/files/logs/' .. ;
Query the METADATA$FILENAME pseudocolumn in the staged data. Use the results to develop your partition column(s):
SELECT metadata$filename FROM @mystage/; +----------------------------------------+ | METADATA$FILENAME | |----------------------------------------| | files/logs/2018/08/05/0524/log.parquet | | files/logs/2018/08/27/1408/log.parquet | +----------------------------------------+
Create the partitioned external table.
The partition column
date_part casts YYYY/MM/DD in the METADATA$FILENAME pseudocolumn as a date using TO_DATE, DATE. The SQL command also specifies Parquet as the file format type:
Amazon S3
CREATE EXTERNAL TABLE exttable_part( date_part date as to_date(substr(metadata$filename, 12, 10), 'YYYY/MM/DD'), timestamp bigint AS (value:timestamp::bigint), col2 varchar AS (value:col2::varchar)) PARTITION BY (date_part) LOCATION=@my_ext_stage/logs/ AUTO_REFRESH = true FILE_FORMAT = (TYPE = PARQUET);
Google Cloud Storage
CREATE EXTERNAL TABLE exttable_part( date_part date as to_date(substr(metadata$filename, 12, 10), 'YYYY/MM/DD'), timestamp bigint AS (value:timestamp::bigint), col2 varchar AS (value:col2::varchar)) PARTITION BY (date_part) LOCATION=@my_ext_stage/logs/ AUTO_REFRESH = true FILE_FORMAT = (TYPE = PARQUET);
Microsoft Azure
CREATE EXTERNAL TABLE exttable_part( date_part date as to_date(substr(metadata$filename, 12, 10), 'YYYY/MM/DD'), timestamp bigint AS (value:timestamp::bigint), col2 varchar AS (value:col2::varchar)) INTEGRATION = 'MY_INT' PARTITION BY (date_part) LOCATION=@my_ext_stage/logs/ AUTO_REFRESH = true FILE_FORMAT = (TYPE = PARQUET);
Refresh the external table metadata:
ALTER EXTERNAL TABLE exttable_part REFRESH;
When querying the external table, filter the data by the partition column using a WHERE clause:
SELECT timestamp, col2 FROM exttable_part WHERE date_part = TO_DATE('08/05/2018', 'MM/DD/YYYY');
Create a Materialized View on an External Table¶
Create a materialized view based on a subquery of the columns in the external table created in the Partitioned External Table example:
CREATE MATERIALIZED VIEW exttable_part_mv
AS
SELECT col2 FROM exttable_part;
For general syntax, usage notes, and further examples for this SQL command, see CREATE MATERIALIZED VIEW.
