Categories:

# CREATE FILE FORMAT¶

In this Topic:

## Syntax¶

CREATE [ OR REPLACE ] FILE FORMAT [ IF NOT EXISTS ] <name>
TYPE = { CSV | JSON | AVRO | ORC | PARQUET | XML } [ formatTypeOptions ]
[ COMMENT = '<string_literal>' ]


Where:

formatTypeOptions ::=
-- If TYPE = CSV
COMPRESSION = AUTO | GZIP | BZ2 | BROTLI | ZSTD | DEFLATE | RAW_DEFLATE | NONE
RECORD_DELIMITER = '<character>' | NONE
FIELD_DELIMITER = '<character>' | NONE
FILE_EXTENSION = '<string>'
SKIP_BLANK_LINES = TRUE | FALSE
DATE_FORMAT = '<string>' | AUTO
TIME_FORMAT = '<string>' | AUTO
TIMESTAMP_FORMAT = '<string>' | AUTO
BINARY_FORMAT = HEX | BASE64 | UTF8
ESCAPE = '<character>' | NONE
ESCAPE_UNENCLOSED_FIELD = '<character>' | NONE
TRIM_SPACE = TRUE | FALSE
FIELD_OPTIONALLY_ENCLOSED_BY = '<character>' | NONE
NULL_IF = ( '<string>' [ , '<string>' ... ] )
ERROR_ON_COLUMN_COUNT_MISMATCH = TRUE | FALSE
REPLACE_INVALID_CHARACTERS = TRUE | FALSE
VALIDATE_UTF8 = TRUE | FALSE
EMPTY_FIELD_AS_NULL = TRUE | FALSE
SKIP_BYTE_ORDER_MARK = TRUE | FALSE
ENCODING = '<string>' | UTF8
-- If TYPE = JSON
COMPRESSION = AUTO | GZIP | BZ2 | BROTLI | ZSTD | DEFLATE | RAW_DEFLATE | NONE
DATE_FORMAT = '<string>' | AUTO
TIME_FORMAT = '<string>' | AUTO
TIMESTAMP_FORMAT = '<string>' | AUTO
BINARY_FORMAT = HEX | BASE64 | UTF8
TRIM_SPACE = TRUE | FALSE
NULL_IF = ( '<string>' [ , '<string>' ... ] )
FILE_EXTENSION = '<string>'
ENABLE_OCTAL = TRUE | FALSE
ALLOW_DUPLICATE = TRUE | FALSE
STRIP_OUTER_ARRAY = TRUE | FALSE
STRIP_NULL_VALUES = TRUE | FALSE
IGNORE_UTF8_ERRORS = TRUE | FALSE
SKIP_BYTE_ORDER_MARK = TRUE | FALSE
-- If TYPE = AVRO
COMPRESSION = AUTO | GZIP | BZ2 | BROTLI | ZSTD | DEFLATE | RAW_DEFLATE | NONE
TRIM_SPACE = TRUE | FALSE
NULL_IF = ( '<string>' [ , '<string>' ... ] )
-- If TYPE = ORC
TRIM_SPACE = TRUE | FALSE
NULL_IF = ( '<string>' [ , '<string>' ... ] )
-- If TYPE = PARQUET
COMPRESSION = AUTO | LZO | SNAPPY | NONE
SNAPPY_COMPRESSION = TRUE | FALSE
BINARY_AS_TEXT = TRUE | FALSE
TRIM_SPACE = TRUE | FALSE
NULL_IF = ( '<string>' [ , '<string>' ... ] )
-- If TYPE = XML
COMPRESSION = AUTO | GZIP | BZ2 | BROTLI | ZSTD | DEFLATE | RAW_DEFLATE | NONE
IGNORE_UTF8_ERRORS = TRUE | FALSE
PRESERVE_SPACE = TRUE | FALSE
STRIP_OUTER_ELEMENT = TRUE | FALSE
DISABLE_SNOWFLAKE_DATA = TRUE | FALSE
DISABLE_AUTO_CONVERT = TRUE | FALSE
SKIP_BYTE_ORDER_MARK = TRUE | FALSE
TRIM_SPACE = TRUE | FALSE
NULL_IF = ( '<string>' [ , '<string>' ... ] )


## Required Parameters¶

name

Specifies the identifier for the file format; must be unique for the schema in which the file format is created.

The identifier value must start with an alphabetic character and cannot contain spaces or special characters unless the entire identifier string is enclosed in double quotes (e.g. "My object"), Identifiers enclosed in double quotes are also case-sensitive.

For more details, see Identifier Requirements.

TYPE = CSV | JSON | AVRO | ORC | PARQUET | XML [ ... ]

Specifies the format of the input files (for data loading) or output files (for data unloading). Depending on the format type, additional format-specific options can be specified. For more details, see Format Type Options (in this topic).

CSV (for loading or unloading)

Any flat, delimited plain text file that uses specific characters as:

• Separators for fields within records (e.g. commas).

• Separators for records (e.g. new line characters).

Although the name, CSV, suggests comma-separated values, any valid character can be used as a field separator.

JSON (for loading or unloading)

Any plain text file consisting of one or more JSON documents (objects, arrays, etc). JSON is a semi-structured file format. The documents can be comma-separated (and optionally enclosed in a big array). A single JSON document may span multiple lines.

Note:

• When loading data from files into tables, Snowflake supports either NDJSON (“Newline Delimited JSON”) standard format or comma-separated JSON format.

• When unloading table data to files, Snowflake outputs only to NDJSON format.

AVRO (for loading only; data cannot be unloaded to AVRO format)

Binary file in AVRO format.

ORC (for loading only; data cannot be unloaded to ORC format)

Binary file in ORC format.

PARQUET (for loading or unloading)

Binary file in PARQUET format.

XML (for loading only; data cannot be unloaded from tables to XML format)

Plain text file containing XML elements.

For more details about CSV, see Usage Notes in this topic. For more details about JSON and the other semi-structured file formats, see Introduction to Semi-structured Data.

## Optional Parameters¶

COMMENT = 'string_literal'

Specifies a comment for the file format.

Default: No value

## Format Type Options (formatTypeOptions)¶

Depending on the file format type specified (TYPE = ...), you can include one or more of the following format-specific options (separated by blank spaces, commas, or new lines):

### TYPE = CSV¶

COMPRESSION = AUTO | GZIP | BZ2 | BROTLI | ZSTD | DEFLATE | RAW_DEFLATE | NONE
Use

Definition
• When loading data, specifies the current compression algorithm for the data file. Snowflake uses this option to detect how an already-compressed data file was compressed so that the compressed data in the file can be extracted for loading.

• When unloading data, compresses the data file using the specified compression algorithm.

Values

Supported Values

Notes

AUTO

When loading data, compression algorithm detected automatically, except for Brotli-compressed files, which cannot currently be detected automatically. When unloading data, files are automatically compressed using the default, which is gzip.

GZIP

BZ2

BROTLI

ZSTD

Zstandard v0.8 (and higher) is supported.

DEFLATE

Deflate-compressed files (with zlib header, RFC1950).

RAW_DEFLATE

Raw Deflate-compressed files (without header, RFC1951).

NONE

Default

AUTO

RECORD_DELIMITER = 'character' | NONE
Use

Definition

Accepts common escape sequences, octal values (prefixed by \\), or hex values (prefixed by 0x). For example, for records delimited by the thorn (Þ) character, specify the octal (\\336) or hex (0xDE) value. Also accepts a value of NONE.

The specified delimiter must be a valid UTF-8 character and not a random sequence of bytes.

Multiple-character delimiters are also supported; however, the delimiter for RECORD_DELIMITER or FIELD_DELIMITER cannot be a substring of the delimiter for the other file format option (e.g. FIELD_DELIMITER = 'aa' RECORD_DELIMITER = 'aabb'). The delimiter is limited to a maximum of 20 characters. Do not specify characters used for other file format options such as ESCAPE or ESCAPE_UNENCLOSED_FIELD.

Default

New line character. Note that “new line” is logical such that \r\n will be understood as a new line for files on a Windows platform.

New line character (\n).

FIELD_DELIMITER = 'character' | NONE
Use

Definition

Accepts common escape sequences, octal values (prefixed by \\), or hex values (prefixed by 0x). For example, for fields delimited by the thorn (Þ) character, specify the octal (\\336) or hex (0xDE) value. Also accepts a value of NONE.

The specified delimiter must be a valid UTF-8 character and not a random sequence of bytes.

Multiple-character delimiters are also supported; however, the delimiter for RECORD_DELIMITER or FIELD_DELIMITER cannot be a substring of the delimiter for the other file format option (e.g. FIELD_DELIMITER = 'aa' RECORD_DELIMITER = 'aabb'). The delimiter is limited to a maximum of 20 characters. Do not specify characters used for other file format options such as ESCAPE or ESCAPE_UNENCLOSED_FIELD.

Default

comma (,)

FILE_EXTENSION = 'string' | NONE
Use

Definition

Specifies the extension for files unloaded to a stage. Accepts any extension. The user is responsible for specifying a file extension that can be read by any desired software or services.

Default

null, meaning the file extension is determined by the format type: .csv[compression], where compression is the extension added by the compression method, if COMPRESSION is set.

Note

If the SINGLE copy option is TRUE, then the COPY command unloads a file without a file extension by default. To specify a file extension, provide a file name and extension in the internal_location or external_location path (e.g. copy into @stage/data.csv).

SKIP_HEADER = integer
Use

Definition

Number of lines at the start of the file to skip.

Note that SKIP_HEADER does not use the RECORD_DELIMITER or FIELD_DELIMITER values to determine what a header line is; rather, it simply skips the specified number of CRLF (Carriage Return, Line Feed)-delimited lines in the file. RECORD_DELIMITER and FIELD_DELIMITER are then used to determine the rows of data to load.

Default

0

SKIP_BLANK_LINES = TRUE | FALSE
Use

Definition

Boolean that specifies to skip any blank lines encountered in the data files; otherwise, blank lines produce an end-of-record error (default behavior).

Default: FALSE

DATE_FORMAT = 'string' | AUTO
Use

Definition

Defines the format of date values in the data files (data loading) or table (data unloading). If a value is not specified or is AUTO, the value for the DATE_INPUT_FORMAT (data loading) or DATE_OUTPUT_FORMAT (data unloading) parameter is used.

Default

AUTO

TIME_FORMAT = 'string' | AUTO
Use

Definition

Defines the format of time values in the data files (data loading) or table (data unloading). If a value is not specified or is AUTO, the value for the TIME_INPUT_FORMAT (data loading) or TIME_OUTPUT_FORMAT (data unloading) parameter is used.

Default

AUTO

TIMESTAMP_FORMAT = string' | AUTO
Use

Definition

Defines the format of timestamp values in the data files (data loading) or table (data unloading). If a value is not specified or is AUTO, the value for the TIMESTAMP_INPUT_FORMAT (data loading) or TIMESTAMP_OUTPUT_FORMAT (data unloading) parameter is used.

Default

AUTO

BINARY_FORMAT = HEX | BASE64 | UTF8
Use

Definition

Defines the encoding format for binary input or output. The option can be used when loading data into or unloading data from binary columns in a table.

Default

HEX

ESCAPE = 'character' | NONE
Use

Definition

Single character string used as the escape character for field values. An escape character invokes an alternative interpretation on subsequent characters in a character sequence. You can use the ESCAPE character to interpret instances of the FIELD_DELIMITER, RECORD_DELIMITER, or FIELD_OPTIONALLY_ENCLOSED_BY characters in the data as literals. The escape character can also be used to escape instances of itself in the data.

Accepts common escape sequences, octal values, or hex values.

• When loading data, specifies the escape character for enclosed fields. Specify the character used to enclose fields by setting FIELD_OPTIONALLY_ENCLOSED_BY.

• When unloading data, if this option is set, it overrides the escape character set for ESCAPE_UNENCLOSED_FIELD.

Default

NONE

ESCAPE_UNENCLOSED_FIELD = 'character' | NONE
Use

Definition

Single character string used as the escape character for unenclosed field values only. An escape character invokes an alternative interpretation on subsequent characters in a character sequence. You can use the ESCAPE character to interpret instances of the FIELD_DELIMITER or RECORD_DELIMITER characters in the data as literals. The escape character can also be used to escape instances of itself in the data.

Accepts common escape sequences, octal values, or hex values.

Note that when unloading data, if ESCAPE is set, the escape character set for that file format option overrides this option.

Default

backslash (\\)

TRIM_SPACE = TRUE | FALSE
Use

Definition

Boolean that specifies whether to remove white space from fields.

For example, if your external database software encloses fields in quotes, but inserts a leading space, Snowflake reads the leading space rather than the opening quotation character as the beginning of the field (i.e. the quotation marks are interpreted as part of the string of field data). Set this option to TRUE to remove undesirable spaces during the data load.

As another example, if leading or trailing spaces surround quotes that enclose strings, you can remove the surrounding spaces using this option and the quote character using the FIELD_OPTIONALLY_ENCLOSED_BY option. Note that any spaces within the quotes are preserved. For example, assuming FIELD_DELIMITER = '|' and FIELD_OPTIONALLY_ENCLOSED_BY = '"':

|"Hello world"|    /* loads as */  >Hello world<
|" Hello world "|  /* loads as */  > Hello world <
| "Hello world" |  /* loads as */  >Hello world<


(the brackets in this example are not loaded; they are used to demarcate the beginning and end of the loaded strings)

Default

FALSE

FIELD_OPTIONALLY_ENCLOSED_BY = 'character' | NONE
Use

Definition

Character used to enclose strings. Value can be NONE, single quote character ('), or double quote character ("). To use the single quote character, use the octal or hex representation (0x27) or the double single-quoted escape ('').

When a field contains this character, escape it using the same character. For example, if the value is the double quote character and a field contains the string A "B" C, escape the double quotes as follows:

A ""B"" C

Default

NONE

NULL_IF = ( 'string1' [ , 'string2' , ... ] )
Use

Definition

String used to convert to and from SQL NULL:

• When loading data, Snowflake replaces these strings in the data load source with SQL NULL. To specify more than one string, enclose the list of strings in parentheses and use commas to separate each value.

For example:

NULL_IF = ('\\N', 'NULL', 'NUL', '')

Note that this option can include empty strings.

• When unloading data, Snowflake converts SQL NULL values to the first value in the list.

Default

\\N (i.e. NULL, which assumes the ESCAPE_UNENCLOSED_FIELD value is \\)

ERROR_ON_COLUMN_COUNT_MISMATCH = TRUE | FALSE
Use

Definition

Boolean that specifies whether to generate a parsing error if the number of delimited columns (i.e. fields) in an input file does not match the number of columns in the corresponding table.

If set to FALSE, an error is not generated and the load continues. If the file is successfully loaded:

• If the input file contains records with more fields than columns in the table, the matching fields are loaded in order of occurrence in the file and the remaining fields are not loaded.

• If the input file contains records with fewer fields than columns in the table, the non-matching columns in the table are loaded with NULL values.

This option assumes all the records within the input file are the same length (i.e. a file containing records of varying length return an error regardless of the value specified for this parameter).

Default

TRUE

Note

When transforming data during loading (i.e. using a query as the source for the COPY command), this option is ignored. There is no requirement for your data files to have the same number and ordering of columns as your target table.

REPLACE_INVALID_CHARACTERS = TRUE | FALSE
Use

Definition

Boolean that specifies whether to replace invalid UTF-8 characters with the Unicode replacement character ().

If set to TRUE, Snowflake replaces invalid UTF-8 characters with the Unicode replacement character.

If set to FALSE, the load operation produces an error when invalid UTF-8 character encoding is detected.

Default

FALSE

VALIDATE_UTF8 = TRUE | FALSE
Use

Definition

Boolean that specifies whether to validate UTF-8 character encoding in string column data.

If set to TRUE, Snowflake validates UTF-8 character encoding in string column data. When invalid UTF-8 character encoding is detected, the COPY command produces an error.

Default

TRUE

Important

This option is provided only to ensure backward compatibility with earlier versions of Snowflake. You should not disable this option unless instructed by Snowflake Support.

EMPTY_FIELD_AS_NULL = TRUE | FALSE
Use

Definition
• When loading data, specifies whether to insert SQL NULL for empty fields in an input file, which are represented by two successive delimiters (e.g. ,,).

If set to FALSE, Snowflake attempts to cast an empty field to the corresponding column type. An empty string is inserted into columns of type STRING. For other column types, the COPY command produces an error.

• When unloading data, this option is used in combination with FIELD_OPTIONALLY_ENCLOSED_BY. When FIELD_OPTIONALLY_ENCLOSED_BY = NONE, setting EMPTY_FIELD_AS_NULL = FALSE specifies to unload empty strings in tables to empty string values without quotes enclosing the field values.

If set to TRUE, FIELD_OPTIONALLY_ENCLOSED_BY must specify a character to enclose strings.

Default

TRUE

SKIP_BYTE_ORDER_MARK = TRUE | FALSE
Use

Definition

Boolean that specifies whether to skip the BOM (byte order mark), if present in a data file. A BOM is a character code at the beginning of a data file that defines the byte order and encoding form.

If set to FALSE, Snowflake recognizes any BOM in data files, which could result in the BOM either causing an error or being merged into the first column in the table.

Default

TRUE

ENCODING = 'string'
Use

Definition

String (constant) that specifies the character set of the source data when loading data into a table.

Character Set

ENCODING Value

Supported Languages

Notes

Big5

BIG5

EUC-JP

EUCJP

Japanese

EUC-KR

EUCKR

Korean

GB18030

GB18030

Chinese

IBM420

IBM420

Arabic

IBM424

IBM424

Hebrew

ISO-2022-CN

ISO2022CN

Simplified Chinese

ISO-2022-JP

ISO2022JP

Japanese

ISO-2022-KR

ISO2022KR

Korean

ISO-8859-1

ISO88591

Danish, Dutch, English, French, German, Italian, Norwegian, Portuguese, Swedish

ISO-8859-2

ISO88592

Czech, Hungarian, Polish, Romanian

ISO-8859-5

ISO88595

Russian

ISO-8859-6

ISO88596

Arabic

ISO-8859-7

ISO88597

Greek

ISO-8859-8

ISO88598

Hebrew

ISO-8859-9

ISO88599

Turkish

KOI8-R

KOI8R

Russian

Shift_JIS

SHIFTJIS

Japanese

UTF-8

UTF8

All languages

For loading data from delimited files (CSV, TSV, etc.), UTF-8 is the default. . . For loading data from all other supported file formats (JSON, Avro, etc.), as well as unloading data, UTF-8 is the only supported character set.

UTF-16

UTF16

All languages

UTF-16BE

UTF16BE

All languages

UTF-16LE

UTF16LE

All languages

UTF-32

UTF32

All languages

UTF-32BE

UTF32BE

All languages

UTF-32LE

UTF32LE

All languages

windows-1250

WINDOWS1250

Czech, Hungarian, Polish, Romanian

windows-1251

WINDOWS1251

Russian

windows-1252

WINDOWS1252

Danish, Dutch, English, French, German, Italian, Norwegian, Portuguese, Swedish

windows-1253

WINDOWS1253

Greek

windows-1254

WINDOWS1254

Turkish

windows-1255

WINDOWS1255

Hebrew

windows-1256

WINDOWS1256

Arabic

Default

UTF8

Note

Snowflake stores all data internally in the UTF-8 character set. The data is converted into UTF-8 before it is loaded into Snowflake. If the VALIDATE_UTF8 file format option is TRUE, Snowflake validates the UTF-8 character encoding in string column data after it is converted from its original character encoding.

### TYPE = JSON¶

COMPRESSION = AUTO | GZIP | BZ2 | BROTLI | ZSTD | DEFLATE | RAW_DEFLATE | NONE
Use

Definition
• When loading data, specifies the current compression algorithm for the data file. Snowflake uses this option to detect how an already-compressed data file was compressed so that the compressed data in the file can be extracted for loading.

• When unloading data, compresses the data file using the specified compression algorithm.

Values

Supported Values

Notes

AUTO

When loading data, compression algorithm detected automatically, except for Brotli-compressed files, which cannot currently be detected automatically. When unloading data, files are automatically compressed using the default, which is gzip.

GZIP

BZ2

BROTLI

ZSTD

Zstandard v0.8 (and higher) is supported.

DEFLATE

Deflate-compressed files (with zlib header, RFC1950).

RAW_DEFLATE

Raw Deflate-compressed files (without header, RFC1951).

NONE

Default

AUTO

DATE_FORMAT = 'string' | AUTO
Use

Definition

Applied only when loading JSON data into separate columns (i.e. using the MATCH_BY_COLUMN_NAME copy option or a COPY transformation). Defines the format of date string values in the data files. If a value is not specified or is AUTO, the value for the DATE_INPUT_FORMAT parameter is used.

This file format option is currently a Preview Feature.

Default

AUTO

TIME_FORMAT = 'string' | AUTO
Use

Definition

Applied only when loading JSON data into separate columns (i.e. using the MATCH_BY_COLUMN_NAME copy option or a COPY transformation). Defines the format of time string values in the data files. If a value is not specified or is AUTO, the value for the TIME_INPUT_FORMAT parameter is used.

This file format option is currently a Preview Feature.

Default

AUTO

TIMESTAMP_FORMAT = string' | AUTO
Use

Definition

Applied only when loading JSON data into separate columns (i.e. using the MATCH_BY_COLUMN_NAME copy option or a COPY transformation). Defines the format of timestamp string values in the data files. If a value is not specified or is AUTO, the value for the TIMESTAMP_INPUT_FORMAT parameter is used.

This file format option is currently a Preview Feature.

Default

AUTO

BINARY_FORMAT = HEX | BASE64 | UTF8
Use

Definition

Applied only when loading JSON data into separate columns (i.e. using the MATCH_BY_COLUMN_NAME copy option or a COPY transformation). Defines the encoding format for binary string values in the data files. The option can be used when loading data into binary columns in a table.

This file format option is currently a Preview Feature.

Default

HEX

TRIM_SPACE = TRUE | FALSE
Use

Definition

Applied only when loading JSON data into separate columns (i.e. using the MATCH_BY_COLUMN_NAME copy option or a COPY transformation). Boolean that specifies whether to remove leading and trailing white space from strings.

For example, if your external database software encloses fields in quotes, but inserts a leading space, Snowflake reads the leading space rather than the opening quotation character as the beginning of the field (i.e. the quotation marks are interpreted as part of the string of field data). Set this option to TRUE to remove undesirable spaces during the data load.

This file format option is currently a Preview Feature.

Default

FALSE

NULL_IF = ( 'string1' [ , 'string2' , ... ] )
Use

Definition

Applied only when loading JSON data into separate columns (i.e. using the MATCH_BY_COLUMN_NAME copy option or a COPY transformation). String used to convert to and from SQL NULL. Snowflake replaces these strings in the data load source with SQL NULL. To specify more than one string, enclose the list of strings in parentheses and use commas to separate each value.

For example:

NULL_IF = ('\\N', 'NULL', 'NUL', '')

Note that this option can include empty strings.

This file format option is currently a Preview Feature.

Default

\\N (i.e. NULL, which assumes the ESCAPE_UNENCLOSED_FIELD value is \\)

FILE_EXTENSION = 'string' | NONE
Use

Definition

Specifies the extension for files unloaded to a stage. Accepts any extension. The user is responsible for specifying a file extension that can be read by any desired software or services.

Default

null, meaning the file extension is determined by the format type: .json[compression], where compression is the extension added by the compression method, if COMPRESSION is set.

ENABLE_OCTAL = TRUE | FALSE
Use

Definition

Boolean that enables parsing of octal numbers.

Default

FALSE

ALLOW_DUPLICATE = TRUE | FALSE
Use

Definition

Boolean that specifies to allow duplicate object field names (only the last one will be preserved).

Default

FALSE

STRIP_OUTER_ARRAY = TRUE | FALSE
Use

Definition

Boolean that instructs the JSON parser to remove outer brackets (i.e [ ]).

Default

FALSE

STRIP_NULL_VALUES = TRUE | FALSE
Use

Definition

Boolean that instructs the JSON parser to remove object fields or array elements containing null values. For example, when set to TRUE:

Before

After

[null]

[]

[null,null,3]

[,,3]

{"a":null,"b":null,"c":123}

{"c":123}

{"a":[1,null,2],"b":{"x":null,"y":88}}

{"a":[1,,2],"b":{"y":88}}

Default

FALSE

IGNORE_UTF8_ERRORS = TRUE | FALSE
Use

Definition

Boolean that specifies whether UTF-8 encoding errors produce error conditions. If set to TRUE, any invalid UTF-8 sequences are silently replaced with Unicode character U+FFFD (i.e. “replacement character”).

Default

FALSE

SKIP_BYTE_ORDER_MARK = TRUE | FALSE
Use

Definition

Boolean that specifies whether to skip the BOM (byte order mark), if present in a data file. A BOM is a character code at the beginning of a data file that defines the byte order and encoding form.

If set to FALSE, Snowflake recognizes any BOM in data files, which could result in the BOM either causing an error or being merged into the first column in the table.

Default

TRUE

### TYPE = AVRO¶

COMPRESSION = AUTO | GZIP | BZ2 | BROTLI | ZSTD | DEFLATE | RAW_DEFLATE | NONE
Use

Definition
• When loading data, specifies the current compression algorithm for the data file. Snowflake uses this option to detect how an already-compressed data file was compressed so that the compressed data in the file can be extracted for loading.

• When unloading data, compresses the data file using the specified compression algorithm.

Values

Supported Values

Notes

AUTO

When loading data, compression algorithm detected automatically, except for Brotli-compressed files, which cannot currently be detected automatically. When unloading data, files are automatically compressed using the default, which is gzip.

GZIP

BZ2

BROTLI

ZSTD

Zstandard v0.8 (and higher) is supported.

DEFLATE

Deflate-compressed files (with zlib header, RFC1950).

RAW_DEFLATE

Raw Deflate-compressed files (without header, RFC1951).

NONE

Default

AUTO

TRIM_SPACE = TRUE | FALSE
Use

Definition

Applied only when loading Avro data into separate columns (i.e. using the MATCH_BY_COLUMN_NAME copy option or a COPY transformation). Boolean that specifies whether to remove leading and trailing white space from strings.

For example, if your external database software encloses fields in quotes, but inserts a leading space, Snowflake reads the leading space rather than the opening quotation character as the beginning of the field (i.e. the quotation marks are interpreted as part of the string of field data). Set this option to TRUE to remove undesirable spaces during the data load.

This file format option is currently a Preview Feature.

Default

FALSE

NULL_IF = ( 'string1' [ , 'string2' , ... ] )
Use

Definition

Applied only when loading Avro data into separate columns (i.e. using the MATCH_BY_COLUMN_NAME copy option or a COPY transformation). String used to convert to and from SQL NULL. Snowflake replaces these strings in the data load source with SQL NULL. To specify more than one string, enclose the list of strings in parentheses and use commas to separate each value.

For example:

NULL_IF = ('\\N', 'NULL', 'NUL', '')

Note that this option can include empty strings.

This file format option is currently a Preview Feature.

Default

\\N (i.e. NULL, which assumes the ESCAPE_UNENCLOSED_FIELD value is \\)

### TYPE = ORC¶

TRIM_SPACE = TRUE | FALSE
Use

Definition

Applied only when loading ORC data into separate columns (i.e. using the MATCH_BY_COLUMN_NAME copy option or a COPY transformation). Boolean that specifies whether to remove leading and trailing white space from strings.

For example, if your external database software encloses fields in quotes, but inserts a leading space, Snowflake reads the leading space rather than the opening quotation character as the beginning of the field (i.e. the quotation marks are interpreted as part of the string of field data). Set this option to TRUE to remove undesirable spaces during the data load.

This file format option is currently a Preview Feature.

Default

FALSE

NULL_IF = ( 'string1' [ , 'string2' , ... ] )
Use

Definition

Applied only when loading ORC data into separate columns (i.e. using the MATCH_BY_COLUMN_NAME copy option or a COPY transformation). String used to convert to and from SQL NULL. Snowflake replaces these strings in the data load source with SQL NULL. To specify more than one string, enclose the list of strings in parentheses and use commas to separate each value.

For example:

NULL_IF = ('\\N', 'NULL', 'NUL', '')

Note that this option can include empty strings.

This file format option is currently a Preview Feature.

Default

\\N (i.e. NULL, which assumes the ESCAPE_UNENCLOSED_FIELD value is \\)

### TYPE = PARQUET¶

COMPRESSION = AUTO | SNAPPY | NONE
Use

Definition

• When loading data, specifies the current compression algorithm for columns in the Parquet files.

• When unloading data, compresses the data file using the specified compression algorithm.

Values

Supported Values

Notes

AUTO

When loading data, compression algorithm detected automatically. Supports the following compression algorithms: Brotli, gzip, Lempel–Ziv–Oberhumer (LZO), LZ4, Snappy, or Zstandard v0.8 (and higher). . When unloading data, unloaded files are compressed using the Snappy compression algorithm by default.

SNAPPY

NONE

Default

AUTO

SNAPPY_COMPRESSION = TRUE | FALSE
Use

AUTO | Unloaded files are compressed using the Snappy compression algorithm by default. SNAPPY | May be specified if unloading Snappy-compressed files. NONE | When loading data, indicates that the files have not been compressed. When unloading data, specifies that the unloaded files are not compressed.

Definition

Boolean that specifies whether unloaded file(s) are compressed using the SNAPPY algorithm.

Note

Deprecated. Use COMPRESSION = SNAPPY instead.

Limitations

Default

TRUE

BINARY_AS_TEXT = TRUE | FALSE
Use

Definition

Boolean that specifies whether to interpret columns with no defined logical data type as UTF-8 text. When set to FALSE, Snowflake interprets these columns as binary data.

Limitations

Default

TRUE

TRIM_SPACE = TRUE | FALSE
Use

Definition

Applied only when loading Parquet data into separate columns (i.e. using the MATCH_BY_COLUMN_NAME copy option or a COPY transformation). Boolean that specifies whether to remove leading and trailing white space from strings.

For example, if your external database software encloses fields in quotes, but inserts a leading space, Snowflake reads the leading space rather than the opening quotation character as the beginning of the field (i.e. the quotation marks are interpreted as part of the string of field data). Set this option to TRUE to remove undesirable spaces during the data load.

This file format option is currently a Preview Feature.

Default

FALSE

NULL_IF = ( 'string1' [ , 'string2' , ... ] )
Use

Definition

Applied only when loading Parquet data into separate columns (i.e. using the MATCH_BY_COLUMN_NAME copy option or a COPY transformation). String used to convert to and from SQL NULL. Snowflake replaces these strings in the data load source with SQL NULL. To specify more than one string, enclose the list of strings in parentheses and use commas to separate each value.

For example:

NULL_IF = ('\\N', 'NULL', 'NUL', '')

Note that this option can include empty strings.

This file format option is currently a Preview Feature.

Default

\\N (i.e. NULL, which assumes the ESCAPE_UNENCLOSED_FIELD value is \\)

### TYPE = XML¶

COMPRESSION = AUTO | GZIP | BZ2 | BROTLI | ZSTD | DEFLATE | RAW_DEFLATE | NONE
Use

Definition
• When loading data, specifies the current compression algorithm for the data file. Snowflake uses this option to detect how an already-compressed data file was compressed so that the compressed data in the file can be extracted for loading.

• When unloading data, compresses the data file using the specified compression algorithm.

Values

Supported Values

Notes

AUTO

When loading data, compression algorithm detected automatically, except for Brotli-compressed files, which cannot currently be detected automatically. When unloading data, files are automatically compressed using the default, which is gzip.

GZIP

BZ2

BROTLI

ZSTD

Zstandard v0.8 (and higher) is supported.

DEFLATE

Deflate-compressed files (with zlib header, RFC1950).

RAW_DEFLATE

Raw Deflate-compressed files (without header, RFC1951).

NONE

Default

AUTO

IGNORE_UTF8_ERRORS = TRUE | FALSE
Use

Definition

Boolean that specifies whether UTF-8 encoding errors produce error conditions. If set to TRUE, any invalid UTF-8 sequences are silently replaced with Unicode character U+FFFD (i.e. “replacement character”).

Default

FALSE

PRESERVE_SPACE = TRUE | FALSE
Use

Definition

Boolean that specifies whether the XML parser preserves leading and trailing spaces in element content.

Default

FALSE

STRIP_OUTER_ELEMENT = TRUE | FALSE
Use

Definition

Boolean that specifies whether the XML parser strips out the outer XML element, exposing 2nd level elements as separate documents.

Default

FALSE

DISABLE_SNOWFLAKE_DATA = TRUE | FALSE
Use

Definition

Boolean that specifies whether the XML parser disables recognition of Snowflake semi-structured data tags.

Default

FALSE

DISABLE_AUTO_CONVERT = TRUE | FALSE
Use

Definition

Boolean that specifies whether the XML parser disables automatic conversion of numeric and Boolean values from text to native representation.

Default

FALSE

SKIP_BYTE_ORDER_MARK = TRUE | FALSE
Use

Definition

Boolean that specifies whether to skip any BOM (byte order mark) present in an input file. A BOM is a character code at the beginning of a data file that defines the byte order and encoding form.

If set to FALSE, Snowflake recognizes any BOM in data files, which could result in the BOM either causing an error or being merged into the first column in the table.

Default

TRUE

TRIM_SPACE = TRUE | FALSE
Use

Definition

Applied only when loading XML data into separate columns (i.e. using the MATCH_BY_COLUMN_NAME copy option or a COPY transformation). Boolean that specifies whether to remove leading and trailing white space from strings.

For example, if your external database software encloses fields in quotes, but inserts a leading space, Snowflake reads the leading space rather than the opening quotation character as the beginning of the field (i.e. the quotation marks are interpreted as part of the string of field data). Set this option to TRUE to remove undesirable spaces during the data load.

This file format option is currently a Preview Feature.

Default

FALSE

NULL_IF = ( 'string1' [ , 'string2' , ... ] )
Use

Definition

Applied only when loading XML data into separate columns (i.e. using the MATCH_BY_COLUMN_NAME copy option or a COPY transformation). String used to convert to and from SQL NULL. Snowflake replaces these strings in the data load source with SQL NULL. To specify more than one string, enclose the list of strings in parentheses and use commas to separate each value.

For example:

NULL_IF = ('\\N', 'NULL', 'NUL', '')

Note that this option can include empty strings.

This file format option is currently a Preview Feature.

Default

\\N (i.e. NULL, which assumes the ESCAPE_UNENCLOSED_FIELD value is \\)

## Usage Notes¶

• A field can be optionally enclosed by double quotes and, within the field, all special characters are automatically escaped except the double quote itself needs to be escaped by having two double quotes right next to each other (""). For unenclosed fields, backslash (\) is the default escape character.

• The FIELD_DELIMITER, RECORD_DELIMITER, ESCAPE, and ESCAPE_UNENCLOSED_FIELD format options support the following characters:

• Octal (prefixed by \\) or hex representations (prefixed by 0x).

• ASCII characters, including high-order characters. Specify the octal (prefixed by \\) or hex representation (prefixed by 0x) of the character. For example, for the cents (¢) character, specify the octal (\\242) or hex (0xA2) value.

• Common escape sequences (e.g. \t for tab, \n for newline, \r for carriage return, \\ for backslash).

• The ESCAPE_UNENCLOSED_FIELD default value is \\. When loading data, if a row in a data file ends in the backslash (\) character, this character escapes the newline or carriage return character specified for the RECORD_DELIMITER file format option. As a result, the load operation treats this row and the next row as a single row of data. To avoid this issue, set ESCAPE_UNENCLOSED_FIELD = NONE.

• Conflicting file format values in a SQL statement produce an error. A conflict occurs when the same option is specified multiple times with different values (e.g. ...TYPE = 'CSV' ... TYPE = 'JSON'...).

## Examples¶

Create a CSV file format named my_csv_format that defines the following rules for data files:

• Fields are delimited using the pipe character (|).

• Files include a single header line that will be skipped.

• The strings NULL and null will be replaced with NULL values.

• Empty strings will be interpreted as NULL values.

• Files will be compressed/decompressed using GZIP compression.

CREATE OR REPLACE FILE FORMAT my_csv_format
TYPE = CSV
FIELD_DELIMITER = '|'
NULL_IF = ('NULL', 'null')
EMPTY_FIELD_AS_NULL = true
COMPRESSION = gzip;


Create a JSON file format named my_json_format that uses all the default JSON format options:

CREATE OR REPLACE FILE FORMAT my_json_format
TYPE = JSON;


Create a PARQUET file format named my_parquet_format that does not compress unloaded data files using the Snappy algorithm:

CREATE OR REPLACE FILE FORMAT my_parquet_format
TYPE = PARQUET
COMPRESSION = SNAPPY;