CREATE ICEBERG TABLE

Creates a new Iceberg table in the current or specified schema, or replaces an existing Iceberg table. Iceberg tables combine features that are standard in Snowflake tables (such as fast SQL processing, security and authorization, and data governance) with open Apache Iceberg metadata and storage.

In addition, this command supports the following variants for Iceberg tables that use Snowflake as the catalog:

  • CREATE ICEBERG TABLE … AS SELECT (creates a populated table; also referred to as CTAS)

  • CREATE ICEBERG TABLE … LIKE (creates an empty copy of an existing table)

This topic refers to Iceberg tables as simply “tables” except where specifying Iceberg tables avoids confusion.

Note

Prior to creating a table, you must create the external volume where the Iceberg metadata and data files are stored. For instructions, see Configure an external volume for Iceberg tables.

If you use an external Iceberg catalog, or no catalog at all, you must also create a catalog integration for the table. To learn more, see Configure a catalog integration for Iceberg tables.

See also:

ALTER ICEBERG TABLE, DROP ICEBERG TABLE, SHOW ICEBERG TABLES, DESCRIBE ICEBERG TABLE

Syntax

Snowflake as the Iceberg catalog

-- Snowflake as the Iceberg catalog
CREATE [ OR REPLACE ] ICEBERG TABLE [ IF NOT EXISTS ] <table_name> (
    -- Column definition
    <col_name> <col_type>
      [ inlineConstraint ]
      [ NOT NULL ]
      [ COLLATE '<collation_specification>' ]
      [ { DEFAULT <expr>
          | { AUTOINCREMENT | IDENTITY }
            [ { ( <start_num> , <step_num> )
                | START <num> INCREMENT <num>
              } ]
        } ]
      [ [ WITH ] MASKING POLICY <policy_name> [ USING ( <col_name> , <cond_col1> , ... ) ] ]
      [ [ WITH ] TAG ( <tag_name> = '<tag_value>' [ , <tag_name> = '<tag_value>' , ... ] ) ]
      [ COMMENT '<string_literal>' ]

    -- Additional column definitions
    [ , <col_name> <col_type> [ ... ] ]

    -- Out-of-line constraints
    [ , outoflineConstraint [ ... ] ]
  )
  [ CLUSTER BY ( <expr> [ , <expr> , ... ] ) ]
  [ EXTERNAL_VOLUME = '<external_volume_name>' ]
  [ CATALOG = 'SNOWFLAKE' ]
  BASE_LOCATION = '<relative_path_from_external_volume>'
  [ STAGE_FILE_FORMAT = (
    { FORMAT_NAME = '<file_format_name>'
      | TYPE = { CSV | JSON | AVRO | ORC | PARQUET | XML } [ formatTypeOptions ]
    } ) ]
  [ STAGE_COPY_OPTIONS = ( copyOptions ) ]
  [ DATA_RETENTION_TIME_IN_DAYS = <integer> ]
  [ MAX_DATA_EXTENSION_TIME_IN_DAYS = <integer> ]
  [ CHANGE_TRACKING = { TRUE | FALSE } ]
  [ DEFAULT_DDL_COLLATION = '<collation_specification>' ]
  [ COPY GRANTS ]
  [ COMMENT = '<string_literal>' ]
  [ [ WITH ] ROW ACCESS POLICY <policy_name> ON ( <col_name> [ , <col_name> ... ] ) ]
  [ [ WITH ] TAG ( <tag_name> = '<tag_value>' [ , <tag_name> = '<tag_value>' , ... ] ) ]

Where:

inlineConstraint ::=
  [ CONSTRAINT <constraint_name> ]
  { UNIQUE
    | PRIMARY KEY
    | [ FOREIGN KEY ] REFERENCES <ref_table_name> [ ( <ref_col_name> ) ]
  }
  [ <constraint_properties> ]

For additional inline constraint details, see CREATE | ALTER TABLE … CONSTRAINT.

outoflineConstraint ::=
  [ CONSTRAINT <constraint_name> ]
  { UNIQUE [ ( <col_name> [ , <col_name> , ... ] ) ]
    | PRIMARY KEY [ ( <col_name> [ , <col_name> , ... ] ) ]
    | [ FOREIGN KEY ] [ ( <col_name> [ , <col_name> , ... ] ) ]
      REFERENCES <ref_table_name> [ ( <ref_col_name> [ , <ref_col_name> , ... ] ) ]
  }
  [ <constraint_properties> ]

For additional out-of-line constraint details, see CREATE | ALTER TABLE … CONSTRAINT.

formatTypeOptions ::=
-- If TYPE = CSV
     COMPRESSION = AUTO | GZIP | BZ2 | BROTLI | ZSTD | DEFLATE | RAW_DEFLATE | NONE
     RECORD_DELIMITER = '<character>' | NONE
     FIELD_DELIMITER = '<character>' | NONE
     FILE_EXTENSION = '<string>'
     PARSE_HEADER = TRUE | FALSE
     SKIP_HEADER = <integer>
     SKIP_BLANK_LINES = TRUE | FALSE
     DATE_FORMAT = '<string>' | AUTO
     TIME_FORMAT = '<string>' | AUTO
     TIMESTAMP_FORMAT = '<string>' | AUTO
     BINARY_FORMAT = HEX | BASE64 | UTF8
     ESCAPE = '<character>' | NONE
     ESCAPE_UNENCLOSED_FIELD = '<character>' | NONE
     TRIM_SPACE = TRUE | FALSE
     FIELD_OPTIONALLY_ENCLOSED_BY = '<character>' | NONE
     NULL_IF = ( '<string>' [ , '<string>' ... ] )
     ERROR_ON_COLUMN_COUNT_MISMATCH = TRUE | FALSE
     REPLACE_INVALID_CHARACTERS = TRUE | FALSE
     EMPTY_FIELD_AS_NULL = TRUE | FALSE
     SKIP_BYTE_ORDER_MARK = TRUE | FALSE
     ENCODING = '<string>' | UTF8
-- If TYPE = JSON
     COMPRESSION = AUTO | GZIP | BZ2 | BROTLI | ZSTD | DEFLATE | RAW_DEFLATE | NONE
     DATE_FORMAT = '<string>' | AUTO
     TIME_FORMAT = '<string>' | AUTO
     TIMESTAMP_FORMAT = '<string>' | AUTO
     BINARY_FORMAT = HEX | BASE64 | UTF8
     TRIM_SPACE = TRUE | FALSE
     NULL_IF = ( '<string>' [ , '<string>' ... ] )
     FILE_EXTENSION = '<string>'
     ENABLE_OCTAL = TRUE | FALSE
     ALLOW_DUPLICATE = TRUE | FALSE
     STRIP_OUTER_ARRAY = TRUE | FALSE
     STRIP_NULL_VALUES = TRUE | FALSE
     REPLACE_INVALID_CHARACTERS = TRUE | FALSE
     IGNORE_UTF8_ERRORS = TRUE | FALSE
     SKIP_BYTE_ORDER_MARK = TRUE | FALSE
-- If TYPE = AVRO
     COMPRESSION = AUTO | GZIP | BROTLI | ZSTD | DEFLATE | RAW_DEFLATE | NONE
     TRIM_SPACE = TRUE | FALSE
     REPLACE_INVALID_CHARACTERS = TRUE | FALSE
     NULL_IF = ( '<string>' [ , '<string>' ... ] )
-- If TYPE = ORC
     TRIM_SPACE = TRUE | FALSE
     REPLACE_INVALID_CHARACTERS = TRUE | FALSE
     NULL_IF = ( '<string>' [ , '<string>' ... ] )
-- If TYPE = PARQUET
     COMPRESSION = AUTO | LZO | SNAPPY | NONE
     SNAPPY_COMPRESSION = TRUE | FALSE
     BINARY_AS_TEXT = TRUE | FALSE
     USE_LOGICAL_TYPE = TRUE | FALSE
     TRIM_SPACE = TRUE | FALSE
     REPLACE_INVALID_CHARACTERS = TRUE | FALSE
     NULL_IF = ( '<string>' [ , '<string>' ... ] )
-- If TYPE = XML
     COMPRESSION = AUTO | GZIP | BZ2 | BROTLI | ZSTD | DEFLATE | RAW_DEFLATE | NONE
     IGNORE_UTF8_ERRORS = TRUE | FALSE
     PRESERVE_SPACE = TRUE | FALSE
     STRIP_OUTER_ELEMENT = TRUE | FALSE
     DISABLE_SNOWFLAKE_DATA = TRUE | FALSE
     DISABLE_AUTO_CONVERT = TRUE | FALSE
     REPLACE_INVALID_CHARACTERS = TRUE | FALSE
     SKIP_BYTE_ORDER_MARK = TRUE | FALSE
copyOptions ::=
     ON_ERROR = { CONTINUE | SKIP_FILE | SKIP_FILE_<num> | 'SKIP_FILE_<num>%' | ABORT_STATEMENT }
     SIZE_LIMIT = <num>
     PURGE = TRUE | FALSE
     RETURN_FAILED_ONLY = TRUE | FALSE
     MATCH_BY_COLUMN_NAME = CASE_SENSITIVE | CASE_INSENSITIVE | NONE
     ENFORCE_LENGTH = TRUE | FALSE
     TRUNCATECOLUMNS = TRUE | FALSE
     FORCE = TRUE | FALSE

External Iceberg catalog

-- External Iceberg catalog
CREATE [ OR REPLACE ] ICEBERG TABLE [ IF NOT EXISTS ] <table_name>
  [ EXTERNAL_VOLUME = '<external_volume_name>' ]
  [ CATALOG = '<catalog_integration_name>' ]
  externalCatalogParams
  [ COMMENT = '<string_literal>' ]
  [ [ WITH ] TAG ( <tag_name> = '<tag_value>' [ , <tag_name> = '<tag_value>' , ... ] ) ]

Where:

externalCatalogParams (for AWS Glue Data Catalog) ::=
  CATALOG_TABLE_NAME = '<catalog_table_name>'
  [ CATALOG_NAMESPACE = '<catalog_namespace>' ]
externalCatalogParams (for Iceberg files in object storage) ::=
  METADATA_FILE_PATH = '<metadata_file_path>'

Variant syntax

The following variant syntax is supported for creating Iceberg tables that use Snowflake as the catalog.

CREATE ICEBERG TABLE … AS SELECT (also referred to as CTAS)

Creates a new table populated with the data returned by a query:

CREATE [ OR REPLACE ] ICEBERG TABLE <table_name> [ ( <col_name> [ <col_type> ] , <col_name> [ <col_type> ] , ... ) ]
  [ CLUSTER BY ( <expr> [ , <expr> , ... ] ) ]
  [ COPY GRANTS ]
  AS SELECT <query>
  [ ... ]

A masking policy can be applied to a column in a CTAS statement. Specify the masking policy after the column data type. Similarly, a row access policy can be applied to the table. For example:

CREATE ICEBERG TABLE <table_name> ( <col1> <data_type> [ WITH ] MASKING POLICY <policy_name> [ , ... ] )
  ...
  [ WITH ] ROW ACCESS POLICY <policy_name> ON ( <col1> [ , ... ] )
  AS SELECT <query>
  [ ... ]

Note

In a CTAS, the COPY GRANTS clause is valid only when combined with the OR REPLACE clause. COPY GRANTS copies privileges from the table being replaced with CREATE OR REPLACE (if it already exists), not from the source table(s) being queried in the SELECT statement. CTAS with COPY GRANTS allows you to overwrite a table with a new set of data while keeping existing grants on that table.

For more details about COPY GRANTS, see COPY GRANTS in this document.

CREATE ICEBERG TABLE … LIKE

Creates a new table with the same column definitions as an existing table, but without copying data from the existing table. Column names, types, defaults, and constraints are copied to the new table:

CREATE [ OR REPLACE ] ICEBERG TABLE <table_name> LIKE <source_table>
  [ CLUSTER BY ( <expr> [ , <expr> , ... ] ) ]
  [ COPY GRANTS ]
  [ ... ]

For more details about COPY GRANTS, see COPY GRANTS in this document.

Note

CREATE TABLE … LIKE for a table with an auto-increment sequence accessed through a data share is currently not supported.

Required parameters

table_name

Specifies the identifier (name) for the table; must be unique for the schema in which the table is created.

In addition, the identifier must start with an alphabetic character and cannot contain spaces or special characters unless the entire identifier string is enclosed in double quotes (for example, "My object"). Identifiers enclosed in double quotes are also case-sensitive.

For more details, see Identifier requirements.

col_name

Specifies the column identifier (name). All the requirements for table identifiers also apply to column identifiers.

For more details, see Identifier requirements and Reserved & Limited Keywords.

Note

In addition to the standard reserved keywords, the following keywords cannot be used as column identifiers because they are reserved for ANSI-standard context functions:

  • CURRENT_DATE

  • CURRENT_ROLE

  • CURRENT_TIME

  • CURRENT_TIMESTAMP

  • CURRENT_USER

For the list of reserved keywords, see Reserved & Limited Keywords.
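As an illustration of these rules, the following Python sketch checks a proposed unquoted column name. It is a hypothetical helper, not a Snowflake API. It assumes the general unquoted-identifier pattern from Identifier requirements (a letter or underscore first, then letters, digits, underscores, or dollar signs) and checks only the five context-function names listed above, not the full reserved keyword list.

```python
import re
from typing import Optional

# The ANSI context-function names listed above (not the full reserved list).
RESERVED_CONTEXT_FUNCTIONS = {
    "CURRENT_DATE",
    "CURRENT_ROLE",
    "CURRENT_TIME",
    "CURRENT_TIMESTAMP",
    "CURRENT_USER",
}

# Assumed pattern for unquoted identifiers (see Identifier requirements).
_UNQUOTED_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_$]*")


def check_unquoted_column_name(name: str) -> Optional[str]:
    """Return a reason the unquoted name is rejected, or None if it looks usable."""
    if not _UNQUOTED_IDENTIFIER.fullmatch(name):
        return "needs double quotes (space, special character, or bad first character)"
    # Unquoted identifiers are case-insensitive, so compare in uppercase.
    if name.upper() in RESERVED_CONTEXT_FUNCTIONS:
        return "reserved for an ANSI context function"
    return None


print(check_unquoted_column_name("current_date"))  # reserved for an ANSI context function
```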

col_type

Specifies the data type for the column.

For details about the data types that can be specified for table columns, see Iceberg table data types.

query

Subquery that calls the INFER_SCHEMA function and formats the output as an array.

Optional parameters

EXTERNAL_VOLUME = 'external_volume_name'

Specifies the identifier (name) for the external volume where the Iceberg table stores its metadata files and data in Parquet format. Iceberg metadata and manifest files store the table schema, partitions, snapshots, and other metadata.

If you do not specify this parameter, the Iceberg table defaults to the external volume for the schema, database, or account. The schema takes precedence over the database, and the database takes precedence over the account.
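The fallback order works like a first-match lookup. The following Python sketch models it; the function and argument names are hypothetical, and it only restates the documented precedence (the same order is documented for the default catalog integration under CATALOG later in this topic).

```python
from typing import Optional


def resolve_external_volume(
    table_level: Optional[str],
    schema_level: Optional[str],
    database_level: Optional[str],
    account_level: Optional[str],
) -> Optional[str]:
    """Return the external volume an Iceberg table would use.

    An explicit EXTERNAL_VOLUME on the table wins; otherwise the schema default,
    then the database default, then the account default. Returns None when no
    level defines a volume.
    """
    for candidate in (table_level, schema_level, database_level, account_level):
        if candidate is not None:
            return candidate
    return None


# The schema-level default overrides the database and account defaults.
print(resolve_external_volume(None, "schema_vol", "db_vol", "acct_vol"))  # schema_vol
```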

CONSTRAINT ...

Defines an inline or out-of-line constraint for the specified column(s) in the table.

For syntax details, see CREATE | ALTER TABLE … CONSTRAINT. For more information about constraints, see Constraints.

COLLATE 'collation_specification'

Specifies the collation to use for column operations such as string comparison. This option applies only to text columns (VARCHAR, STRING, TEXT, etc.). For more details, see Collation Specifications.

DEFAULT ... or AUTOINCREMENT ...

Specifies whether a default value is automatically inserted in the column if a value is not explicitly specified via an INSERT or CREATE TABLE AS SELECT statement:

DEFAULT expr

The column default value is defined by the specified expression, which can be any of the following:

  • Constant value.

  • Sequence reference (seq_name.NEXTVAL).

  • Simple expression that returns a scalar value.

    The simple expression can include a SQL UDF (user-defined function) if the UDF is not a secure UDF.

    Note

    If a default expression refers to a SQL UDF, then the function is replaced by its definition at table creation time. If the user-defined function is redefined in the future, this does not update the column’s default expression.

    The simple expression cannot contain references to:

    • Subqueries.

    • Aggregates.

    • Window functions.

    • Secure UDFs.

    • UDFs written in languages other than SQL (e.g. Java, JavaScript).

    • External functions.

{ AUTOINCREMENT | IDENTITY } [ { ( start_num , step_num ) | START num INCREMENT num } ]

AUTOINCREMENT and IDENTITY are synonymous. When either is used, the default value for the column starts with a specified number and each successive value automatically increments by the specified amount.

Caution

Snowflake uses a sequence to generate the values for an auto-incremented column. Sequences have limitations; see Sequence Semantics.

The default value for both start and step/increment is 1.

AUTOINCREMENT and IDENTITY can be used only for columns with numeric data types.

Default: No value (the column has no default value)

Note

DEFAULT and AUTOINCREMENT are mutually exclusive; only one can be specified for a column.
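In the ideal case, the generated values form an arithmetic progression. The following Python sketch models START/INCREMENT (equivalently, the ( start_num , step_num ) form) with the documented defaults of 1. It is illustrative only: the function name is hypothetical, and the underlying sequence can produce gaps (see Sequence Semantics).

```python
from itertools import islice
from typing import Iterator


def autoincrement_values(start: int = 1, step: int = 1) -> Iterator[int]:
    """Yield start, start + step, start + 2 * step, ... (an idealized, gap-free model)."""
    value = start
    while True:
        yield value
        value += step


# AUTOINCREMENT START 100 INCREMENT 10, or equivalently AUTOINCREMENT (100, 10)
print(list(islice(autoincrement_values(100, 10), 3)))  # [100, 110, 120]
```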

MASKING POLICY policy_name

Specifies the masking policy to set on a column.

COMMENT 'string_literal'

Specifies a comment for the column.

(Note that comments can be specified at the column level or the table level. The syntax for each is slightly different.)

USING ( col_name , cond_col1 , ... )

Specifies the arguments to pass into the conditional masking policy SQL expression.

The first column in the list specifies the column for the policy conditions to mask or tokenize the data and must match the column to which the masking policy is set.

The additional columns specify the columns to evaluate to determine whether to mask or tokenize the data in each row of the query result when a query is made on the first column.

If the USING clause is omitted, Snowflake treats the conditional masking policy as a normal masking policy.

CLUSTER BY ( expr [ , expr , ... ] )

Specifies one or more columns or column expressions in the table as the clustering key. For more details, see Clustering Keys & Clustered Tables.

Default: No value (no clustering key is defined for the table)

Important

Clustering keys are not intended or recommended for all tables; they typically benefit very large (i.e. multi-terabyte) tables.

Before you specify a clustering key for a table, you should understand micro-partitions. For more information, see Understanding Snowflake Table Structures.

STAGE_FILE_FORMAT = ( FORMAT_NAME = 'file_format_name' ) or STAGE_FILE_FORMAT = ( TYPE = CSV | JSON | AVRO | ORC | PARQUET | XML [ ... ] )

Specifies the default file format for the table (for data loading and unloading), which can be either:

FORMAT_NAME = 'file_format_name'

Specifies an existing named file format to use for loading/unloading data into the table. The named file format determines the format type (CSV, JSON, etc.), as well as any other format options, for data files. For more details, see CREATE FILE FORMAT.

TYPE = CSV | JSON | AVRO | ORC | PARQUET | XML [ ... ]

Specifies the type of files to load into or unload from the table.

If a file format type is specified, additional format-specific options can be specified. For more details, see Format Type Options (in this topic).

Default: TYPE = CSV

Note

FORMAT_NAME and TYPE are mutually exclusive; to avoid unintended behavior, you should only specify one or the other when creating a table.

STAGE_COPY_OPTIONS = ( ... )

Specifies one (or more) options to use when loading data into the table. For more details, see Copy Options (in this topic).

DATA_RETENTION_TIME_IN_DAYS = integer

Specifies the retention period for the table so that Time Travel actions (SELECT, CLONE, UNDROP) can be performed on historical data in the table. For more details, see Understanding & Using Time Travel and Working with Temporary and Transient Tables.

For a detailed description of this object-level parameter, as well as more information about object parameters, see Parameters.

Values:

  • Standard Edition: 0 or 1

  • Enterprise Edition:

    • 0 to 90 for permanent tables

    • 0 or 1 for temporary and transient tables

Default:

  • Standard Edition: 1

  • Enterprise Edition (or higher): 1 (unless a different default value was specified at the schema, database, or account level)

Note

A value of 0 effectively disables Time Travel for the table.
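The allowed values listed above can be summarized as a lookup. This Python sketch is a hypothetical restatement of that list, not a Snowflake function; it models only "standard" and "enterprise" editions and raises an error for anything it does not cover.

```python
def allowed_retention_days(edition: str, table_kind: str) -> range:
    """Return the DATA_RETENTION_TIME_IN_DAYS values listed for an edition and table kind."""
    edition = edition.lower()
    table_kind = table_kind.lower()
    if table_kind not in ("permanent", "temporary", "transient"):
        raise ValueError(f"unknown table kind: {table_kind}")
    if edition == "standard":
        return range(0, 2)  # 0 or 1
    if edition == "enterprise":
        # 0 to 90 for permanent tables; 0 or 1 for temporary and transient tables
        return range(0, 91) if table_kind == "permanent" else range(0, 2)
    raise ValueError(f"edition not modeled: {edition}")


print(list(allowed_retention_days("standard", "permanent")))  # [0, 1]
```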

MAX_DATA_EXTENSION_TIME_IN_DAYS = integer

Object parameter that specifies the maximum number of days for which Snowflake can extend the data retention period for the table to prevent streams on the table from becoming stale.

For a detailed description of this parameter, see MAX_DATA_EXTENSION_TIME_IN_DAYS.

CHANGE_TRACKING = { TRUE | FALSE }

Specifies whether to enable change tracking on the table.

  • TRUE enables change tracking on the table. This setting adds a pair of hidden columns to the source table and begins storing change tracking metadata in the columns. These columns consume a small amount of storage.

    The change tracking metadata can be queried using the CHANGES clause for SELECT statements, or by creating and querying one or more streams on the table.

  • FALSE does not enable change tracking on the table.

Default: FALSE

DEFAULT_DDL_COLLATION = 'collation_specification'

Specifies a default collation specification for the columns in the table, including columns added to the table in the future.

For more details about the parameter, see DEFAULT_DDL_COLLATION.

COPY GRANTS

Specifies to retain the access privileges from the original table when a new table is created using any of the following CREATE TABLE variants:

  • CREATE OR REPLACE TABLE

  • CREATE TABLE … LIKE

  • CREATE TABLE … CLONE

The parameter copies all privileges, except OWNERSHIP, from the existing table to the new table. The new table does not inherit any future grants defined for the object type in the schema. By default, the role that executes the CREATE TABLE statement owns the new table.

If the parameter is not included in the CREATE ICEBERG TABLE statement, then the new table does not inherit any explicit access privileges granted on the original table, but does inherit any future grants defined for the object type in the schema.
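The difference between the two cases can be modeled with plain set operations. In the following Python sketch, a grant is a hypothetical (privilege, role) pair and the role names are made up; ownership of the new table (held by the creating role by default) is deliberately left out of the model.

```python
from typing import Set, Tuple

# A grant is modeled as (privilege, grantee_role).
Grant = Tuple[str, str]


def grants_on_new_table(
    original_grants: Set[Grant],
    future_grants: Set[Grant],
    copy_grants: bool,
) -> Set[Grant]:
    """Model which explicit grants the new table ends up with.

    With COPY GRANTS, every privilege except OWNERSHIP is copied from the
    original table and future grants are not applied. Without it, only the
    schema's future grants apply.
    """
    if copy_grants:
        return {g for g in original_grants if g[0] != "OWNERSHIP"}
    return set(future_grants)


original = {("SELECT", "ANALYST"), ("INSERT", "LOADER"), ("OWNERSHIP", "ADMIN")}
future = {("SELECT", "REPORTING")}
print(sorted(grants_on_new_table(original, future, copy_grants=True)))
# [('INSERT', 'LOADER'), ('SELECT', 'ANALYST')]
```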

Note

  • With data sharing:

    • If the existing table was shared to another account, the replacement table is also shared.

    • If the existing table was shared with your account as a data consumer, and access was further granted to other roles in the account (using GRANT IMPORTED PRIVILEGES on the parent database), access is also granted to the replacement table.

  • The SHOW GRANTS output for the replacement table lists the grantee for the copied privileges as the role that executed the CREATE ICEBERG TABLE statement, with the current timestamp when the statement was executed.

  • The operation to copy grants occurs atomically in the CREATE ICEBERG TABLE command (i.e. within the same transaction).

COMMENT = 'string_literal'

Specifies a comment for the table.

Default: No value

(Note that comments can be specified at the column level or the table level. The syntax for each is slightly different.)

ROW ACCESS POLICY policy_name ON ( col_name [ , col_name ... ] )

Specifies the row access policy to set on a table.

TAG ( tag_name = 'tag_value' [ , tag_name = 'tag_value' , ... ] )

Specifies the tag name and the tag string value.

The tag value is always a string, and the maximum number of characters for the tag value is 256.

For details about specifying tags in a statement, refer to Tag quotas for objects and columns.

Snowflake catalog parameters

CATALOG = 'SNOWFLAKE'

Specifies Snowflake as the Iceberg catalog. Snowflake handles all life-cycle maintenance, such as compaction, for the table.

BASE_LOCATION = 'relative_path_from_external_volume'

Specifies a relative path from the table’s EXTERNAL_VOLUME location to a directory where Snowflake can write table data and metadata.

This parameter cannot be changed after you create a table.

External catalog parameters (externalCatalogParams)

CATALOG = 'catalog_integration_name'

Specifies the identifier (name) of the catalog integration for this table.

If not specified, the Iceberg table defaults to the catalog integration for the schema, database, or account. The schema takes precedence over the database, and the database takes precedence over the account.

AWS Glue

CATALOG_TABLE_NAME = 'catalog_table_name'

Specifies the table name as recognized by your AWS Glue Data Catalog. For an example of using CATALOG_TABLE_NAME when you create an Iceberg table, see Examples (in this topic). This parameter cannot be changed after you create the table.

CATALOG_NAMESPACE = 'catalog_namespace'

Optionally specifies the namespace (for example, my_glue_database) for the AWS Glue Data Catalog source, and overrides the default catalog namespace specified with the catalog integration. By specifying a namespace at the table level, you can use a single catalog integration for AWS Glue to create Iceberg tables across different databases.

If not specified, the table uses the default catalog namespace associated with the catalog integration.

Iceberg files in object storage

METADATA_FILE_PATH = 'metadata_file_path'

Specifies the relative path of the Iceberg metadata file to use for column definitions. For example, if s3://mybucket_us_east_1/metadata/v1.metadata.json is the full path to your metadata file, and the external volume storage location is s3://mybucket_us_east_1/, specify metadata/v1.metadata.json as the value for METADATA_FILE_PATH.

Prior to Snowflake version 7.34, this parameter was called METADATA_FILE_NAME.
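If you start from a full object URI, the value for METADATA_FILE_PATH is whatever follows the external volume's storage location. The following Python sketch does that string arithmetic using the paths from the example above; it is a hypothetical helper (not a Snowflake function) and assumes both arguments use the same scheme and bucket spelling.

```python
def metadata_file_path(full_path: str, volume_location: str) -> str:
    """Strip the external volume's storage location from a full metadata file URI."""
    prefix = volume_location if volume_location.endswith("/") else volume_location + "/"
    if not full_path.startswith(prefix):
        raise ValueError(f"{full_path!r} is not under {volume_location!r}")
    return full_path[len(prefix):]


print(metadata_file_path(
    "s3://mybucket_us_east_1/metadata/v1.metadata.json",
    "s3://mybucket_us_east_1/",
))  # metadata/v1.metadata.json
```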

Note

Prior to Snowflake version 7.34, a parameter named BASE_LOCATION (also referred to as FILE_PATH in previous versions) was required to create a table from Iceberg files in object storage. The parameter specified a relative path from the table’s EXTERNAL_VOLUME location. With Snowflake versions 7.34 and later, you do not specify a BASE_LOCATION to create a table from Iceberg files in object storage.

You can continue to execute a script or statement that uses the earlier version of the CREATE ICEBERG TABLE syntax. However, doing so affects the value that you specify as the metadata-file-relative-path when you refresh the table. For more information, see ALTER ICEBERG TABLE … REFRESH.

Format Type Options (formatTypeOptions)

Format type options are used for loading data into and unloading data out of tables.

Depending on the file format type specified (STAGE_FILE_FORMAT = ( TYPE = ... )), you can include one or more of the following format-specific options (separated by blank spaces, commas, or new lines):

TYPE = CSV

COMPRESSION = AUTO | GZIP | BZ2 | BROTLI | ZSTD | DEFLATE | RAW_DEFLATE | NONE
Use

Data loading, data unloading, and external tables

Definition
  • When loading data, specifies the current compression algorithm for the data file. Snowflake uses this option to detect how an already-compressed data file was compressed so that the compressed data in the file can be extracted for loading.

  • When unloading data, compresses the data file using the specified compression algorithm.

Values

  • AUTO: When loading data, the compression algorithm is detected automatically, except for Brotli-compressed files, which cannot currently be detected automatically. When unloading data, files are automatically compressed using the default, which is gzip.

  • GZIP

  • BZ2

  • BROTLI: Must be specified when loading/unloading Brotli-compressed files.

  • ZSTD: Zstandard v0.8 (and higher) is supported.

  • DEFLATE: Deflate-compressed files (with zlib header, RFC1950).

  • RAW_DEFLATE: Raw Deflate-compressed files (without header, RFC1951).

  • NONE: When loading data, indicates that the files have not been compressed. When unloading data, specifies that the unloaded files are not compressed.

Default

AUTO
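One reason Brotli is the exception to AUTO: gzip, bzip2, and Zstandard streams begin with fixed signature bytes, while a Brotli stream has no such signature. The following Python sketch shows signature-based detection in general. It is an assumption about the kind of check AUTO can perform, not a description of Snowflake's detector, and the function name is hypothetical.

```python
import bz2
import gzip
from typing import Optional

# Leading signature ("magic") bytes for a few formats from the list above.
_SIGNATURES = (
    (b"\x1f\x8b", "GZIP"),
    (b"BZh", "BZ2"),
    (b"\x28\xb5\x2f\xfd", "ZSTD"),
)


def sniff_compression(data: bytes) -> Optional[str]:
    """Return the format whose signature starts the data, or None if none matches."""
    for signature, name in _SIGNATURES:
        if data.startswith(signature):
            return name
    return None  # uncompressed, raw deflate, Brotli, or something else


print(sniff_compression(gzip.compress(b"a,b\n")))  # GZIP
print(sniff_compression(bz2.compress(b"a,b\n")))   # BZ2
```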

RECORD_DELIMITER = 'character' | NONE
Use

Data loading, data unloading, and external tables

Definition

One or more singlebyte or multibyte characters that separate records in an input file (data loading) or unloaded file (data unloading). Accepts common escape sequences or the following singlebyte or multibyte characters:

Singlebyte characters

Octal values (prefixed by \\) or hex values (prefixed by 0x or \x). For example, for records delimited by the circumflex accent (^) character, specify the octal (\\136) or hex (0x5e) value.

Multibyte characters

Hex values (prefixed by \x). For example, for records delimited by the cent (¢) character, specify the hex (\xC2\xA2) value.

The delimiter for RECORD_DELIMITER or FIELD_DELIMITER cannot be a substring of the delimiter for the other file format option (e.g. FIELD_DELIMITER = 'aa' RECORD_DELIMITER = 'aabb').

The specified delimiter must be a valid UTF-8 character and not a random sequence of bytes. Also note that the delimiter is limited to a maximum of 20 characters.

Also accepts a value of NONE.
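To work out the value to specify, you can derive the octal and hex spellings from the character's UTF-8 bytes. The following Python helper is hypothetical (not part of Snowflake); it reproduces the ^ and ¢ examples above, returning the 0x form for single-byte characters and the \x form for multibyte ones.

```python
from typing import Dict


def delimiter_escapes(ch: str) -> Dict[str, str]:
    r"""Return the octal/hex spellings of a delimiter, as written in the examples above.

    Single-byte (ASCII) characters get an octal form (\\NNN) and a 0x hex form.
    Characters that need more than one UTF-8 byte get a \xHH hex sequence only.
    """
    encoded = ch.encode("utf-8")
    if len(encoded) == 1:
        code = encoded[0]
        return {"octal": f"\\\\{code:o}", "hex": f"0x{code:02x}"}
    return {"hex": "".join(f"\\x{byte:02X}" for byte in encoded)}


print(delimiter_escapes("^")["octal"])  # \\136
print(delimiter_escapes("¢")["hex"])    # \xC2\xA2
```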

Default
Data loading

New line character. Note that “new line” is logical such that \r\n will be understood as a new line for files on a Windows platform.

Data unloading

New line character (\n).

FIELD_DELIMITER = 'character' | NONE
Use

Data loading, data unloading, and external tables

Definition

One or more singlebyte or multibyte characters that separate fields in an input file (data loading) or unloaded file (data unloading). Accepts common escape sequences or the following singlebyte or multibyte characters:

Singlebyte characters

Octal values (prefixed by \\) or hex values (prefixed by 0x or \x). For example, for fields delimited by the circumflex accent (^) character, specify the octal (\\136) or hex (0x5e) value.

Multibyte characters

Hex values (prefixed by \x). For example, for fields delimited by the cent (¢) character, specify the hex (\xC2\xA2) value.

The delimiter for RECORD_DELIMITER or FIELD_DELIMITER cannot be a substring of the delimiter for the other file format option (e.g. FIELD_DELIMITER = 'aa' RECORD_DELIMITER = 'aabb').

Note

For non-ASCII characters, you must use the hex byte sequence value to get deterministic behavior.

The specified delimiter must be a valid UTF-8 character and not a random sequence of bytes. Also note that the delimiter is limited to a maximum of 20 characters.

Also accepts a value of NONE.
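The delimiter rules above can be checked before you create the table. The following Python sketch is a hypothetical pre-flight check, not a Snowflake function; it takes the delimiters as already-decoded Python strings (so the UTF-8 validity rule holds by construction) and does not model NONE.

```python
from typing import List

MAX_DELIMITER_LENGTH = 20  # documented maximum, in characters


def check_delimiters(field_delimiter: str, record_delimiter: str) -> List[str]:
    """Return the documented delimiter rules that a pair of delimiters breaks."""
    problems: List[str] = []
    for name, value in (
        ("FIELD_DELIMITER", field_delimiter),
        ("RECORD_DELIMITER", record_delimiter),
    ):
        if len(value) > MAX_DELIMITER_LENGTH:
            problems.append(f"{name} is longer than {MAX_DELIMITER_LENGTH} characters")
    # Neither delimiter may be a substring of the other.
    if field_delimiter in record_delimiter or record_delimiter in field_delimiter:
        problems.append("one delimiter is a substring of the other")
    return problems


print(check_delimiters("aa", "aabb"))  # ['one delimiter is a substring of the other']
```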

Default

comma (,)

FILE_EXTENSION = 'string' | NONE
Use

Data unloading only

Definition

Specifies the extension for files unloaded to a stage. Accepts any extension. The user is responsible for specifying a file extension that can be read by any desired software or services.

Default

null, meaning the file extension is determined by the format type: .csv[compression], where compression is the extension added by the compression method, if COMPRESSION is set.

Note

If the SINGLE copy option is TRUE, then the COPY command unloads a file without a file extension by default. To specify a file extension, provide a file name and extension in the internal_location or external_location path (e.g. copy into @stage/data.csv).

PARSE_HEADER = TRUE | FALSE
Use

Data loading only

Definition

Boolean that specifies whether to use the first row headers in the data files to determine column names.

This file format option is applied to the following actions only:

  • Automatically detecting column definitions by using the INFER_SCHEMA function.

  • Loading CSV data into separate columns by using the INFER_SCHEMA function and MATCH_BY_COLUMN_NAME copy option.

If the option is set to TRUE, the first row headers will be used to determine column names. The default value FALSE will return column names as c*, where * is the position of the column.

Note that the SKIP_HEADER option is not supported with PARSE_HEADER = TRUE.

Default: FALSE

SKIP_HEADER = integer
Use

Data loading and external tables

Definition

Number of lines at the start of the file to skip.

Note that SKIP_HEADER does not use the RECORD_DELIMITER or FIELD_DELIMITER values to determine what a header line is; rather, it simply skips the specified number of CRLF (Carriage Return, Line Feed)-delimited lines in the file. RECORD_DELIMITER and FIELD_DELIMITER are then used to determine the rows of data to load.

Default

0
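To make the interaction concrete, the following Python sketch models the behavior described above: it drops the first N newline-terminated lines regardless of RECORD_DELIMITER, and only then splits the remainder into records. It is a simplified, hypothetical model (no quoting, escaping, or blank-line handling), not Snowflake's loader.

```python
from typing import List


def records_after_skip_header(text: str, skip_header: int, record_delimiter: str) -> List[str]:
    """Skip N newline-terminated lines, then split the rest on RECORD_DELIMITER."""
    rest = text
    for _ in range(skip_header):
        newline = rest.find("\n")  # also ends a CRLF line; the \r is dropped with it
        if newline == -1:
            return []  # fewer lines than SKIP_HEADER: nothing is left to split
        rest = rest[newline + 1:]
    records = rest.split(record_delimiter)
    if records and records[-1] == "":
        records.pop()  # a trailing delimiter does not start a new record
    return records


# The header ends with a newline even though records end with ";".
print(records_after_skip_header("id,name\n1,a;2,b;", 1, ";"))  # ['1,a', '2,b']
```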

SKIP_BLANK_LINES = TRUE | FALSE
Use

Data loading and external tables

Definition

Boolean that specifies to skip any blank lines encountered in the data files; otherwise, blank lines produce an end-of-record error (default behavior).

Default: FALSE

DATE_FORMAT = 'string' | AUTO
Use

Data loading and unloading

Definition

Defines the format of date values in the data files (data loading) or table (data unloading). If a value is not specified or is AUTO, the value for the DATE_INPUT_FORMAT (data loading) or DATE_OUTPUT_FORMAT (data unloading) parameter is used.

Default

AUTO

TIME_FORMAT = 'string' | AUTO
Use

Data loading and unloading

Definition

Defines the format of time values in the data files (data loading) or table (data unloading). If a value is not specified or is AUTO, the value for the TIME_INPUT_FORMAT (data loading) or TIME_OUTPUT_FORMAT (data unloading) parameter is used.

Default

AUTO

TIMESTAMP_FORMAT = 'string' | AUTO
Use

Data loading and unloading

Definition

Defines the format of timestamp values in the data files (data loading) or table (data unloading). If a value is not specified or is AUTO, the value for the TIMESTAMP_INPUT_FORMAT (data loading) or TIMESTAMP_OUTPUT_FORMAT (data unloading) parameter is used.

Default

AUTO

BINARY_FORMAT = HEX | BASE64 | UTF8
Use

Data loading and unloading

Definition

Defines the encoding format for binary input or output. The option can be used when loading data into or unloading data from binary columns in a table.

Default

HEX
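The three values correspond to standard byte-to-text conversions. The following Python sketch shows what each produces for the same bytes; the function name is hypothetical, and the use of uppercase hexadecimal digits for HEX is an assumption of the sketch.

```python
import base64


def format_binary(value: bytes, binary_format: str = "HEX") -> str:
    """Render binary data as text the way each BINARY_FORMAT value describes."""
    fmt = binary_format.upper()
    if fmt == "HEX":
        return value.hex().upper()  # uppercase is an assumption
    if fmt == "BASE64":
        return base64.b64encode(value).decode("ascii")
    if fmt == "UTF8":
        return value.decode("utf-8")  # fails for bytes that are not valid UTF-8
    raise ValueError(f"unsupported BINARY_FORMAT: {binary_format}")


print(format_binary(b"Snow", "BASE64"))  # U25vdw==
```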

ESCAPE = 'character' | NONE
Use

Data loading and unloading

Definition

A singlebyte character string used as the escape character for enclosed or unenclosed field values. An escape character invokes an alternative interpretation on subsequent characters in a character sequence. You can use the ESCAPE character to interpret instances of the FIELD_OPTIONALLY_ENCLOSED_BY character in the data as literals.

Accepts common escape sequences, octal values, or hex values.

Loading data

Specifies the escape character for enclosed fields only. Specify the character used to enclose fields by setting FIELD_OPTIONALLY_ENCLOSED_BY.

Note

This file format option supports singlebyte characters only. Note that UTF-8 character encoding represents high-order ASCII characters as multibyte characters. If your data file is encoded with the UTF-8 character set, you cannot specify a high-order ASCII character as the option value.

In addition, if you specify a high-order ASCII character, we recommend that you set the ENCODING = 'string' file format option as the character encoding for your data files to ensure the character is interpreted correctly.

Unloading data

If this option is set, it overrides the escape character set for ESCAPE_UNENCLOSED_FIELD.

Default

NONE

ESCAPE_UNENCLOSED_FIELD = 'character' | NONE
Use

Data loading, data unloading, and external tables

Definition

A singlebyte character string used as the escape character for unenclosed field values only. An escape character invokes an alternative interpretation on subsequent characters in a character sequence. You can use the ESCAPE character to interpret instances of the FIELD_DELIMITER or RECORD_DELIMITER characters in the data as literals. The escape character can also be used to escape instances of itself in the data.

Accepts common escape sequences, octal values, or hex values.

Loading data

Specifies the escape character for unenclosed fields only.

Note

  • The default value is \\. If a row in a data file ends in the backslash (\) character, this character escapes the newline or carriage return character specified for the RECORD_DELIMITER file format option. As a result, the load operation treats this row and the next row as a single row of data. To avoid this issue, set the value to NONE.

  • This file format option supports single-byte characters only. Note that UTF-8 character encoding represents high-order ASCII characters as multibyte characters. If your data file is encoded with the UTF-8 character set, you cannot specify a high-order ASCII character as the option value.

    In addition, if you specify a high-order ASCII character, we recommend that you set the ENCODING = 'string' file format option to the character encoding of your data files to ensure the character is interpreted correctly.

Unloading data

If ESCAPE is set, the escape character set for that file format option overrides this option.

Default

backslash (\\)
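To avoid the trailing-backslash problem described in the loading note for this option, you can disable the escape character for unenclosed fields. A minimal sketch using CREATE FILE FORMAT, with a hypothetical format name. Because the default NULL_IF value (\\N) assumes a backslash escape character, you may also want to review NULL_IF when you change this option.

```sql
-- Hypothetical file format name.
-- With ESCAPE_UNENCLOSED_FIELD = NONE, a backslash at the end of a row is treated as data
-- instead of escaping the record delimiter and merging the row with the next one.
CREATE OR REPLACE FILE FORMAT my_csv_no_unenclosed_escape
  TYPE = CSV
  ESCAPE_UNENCLOSED_FIELD = NONE;
```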

TRIM_SPACE = TRUE | FALSE
Use

Data loading and external tables

Definition

Boolean that specifies whether to remove white space from fields.

For example, if your external database software encloses fields in quotes, but inserts a leading space, Snowflake reads the leading space rather than the opening quotation character as the beginning of the field (i.e. the quotation marks are interpreted as part of the string of field data). Set this option to TRUE to remove undesirable spaces during the data load.

As another example, if leading or trailing spaces surround quotes that enclose strings, you can remove the surrounding spaces using this option and the quote character using the FIELD_OPTIONALLY_ENCLOSED_BY option. Note that any spaces within the quotes are preserved. For example, assuming FIELD_DELIMITER = '|' and FIELD_OPTIONALLY_ENCLOSED_BY = '"':

|"Hello world"|    /* loads as */  >Hello world<
|" Hello world "|  /* loads as */  > Hello world <
| "Hello world" |  /* loads as */  >Hello world<

(The angle brackets > and < in this example are not loaded; they mark the beginning and end of each loaded string.)

Default

FALSE
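The pipe-delimited example for TRIM_SPACE corresponds to a file format along these lines. This is a sketch using CREATE FILE FORMAT; the format name is hypothetical.

```sql
-- Hypothetical file format name.
CREATE OR REPLACE FILE FORMAT my_pipe_csv
  TYPE = CSV
  FIELD_DELIMITER = '|'
  FIELD_OPTIONALLY_ENCLOSED_BY = '"'
  TRIM_SPACE = TRUE;  -- strips spaces around the quotes; spaces inside the quotes are kept
```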

FIELD_OPTIONALLY_ENCLOSED_BY = 'character' | NONE
Use

Data loading, data unloading, and external tables

Definition

Character used to enclose strings. Value can be NONE, single quote character ('), or double quote character ("). To use the single quote character, use the octal or hex representation (0x27) or the double single-quoted escape ('').

When a field contains this character, escape it using the same character. For example, if the value is the double quote character and a field contains the string A "B" C, escape the double quotes as follows:

A ""B"" C

Default

NONE
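For example, the following sketch encloses fields in single quotes by using the doubled single-quote escape for FIELD_OPTIONALLY_ENCLOSED_BY. It uses CREATE FILE FORMAT, and the format name is hypothetical.

```sql
-- Hypothetical file format name.
-- '''' is a string literal that contains exactly one single-quote character.
CREATE OR REPLACE FILE FORMAT my_single_quoted_csv
  TYPE = CSV
  FIELD_OPTIONALLY_ENCLOSED_BY = '''';
```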

NULL_IF = ( 'string1' [ , 'string2' , ... ] )
Use

Data loading, data unloading, and external tables

Definition

String used to convert to and from SQL NULL:

  • When loading data, Snowflake replaces these values in the data load source with SQL NULL. To specify more than one string, enclose the list of strings in parentheses and use commas to separate each value.

    Note that Snowflake converts all instances of the value to NULL, regardless of the data type. For example, if 2 is specified as a value, all instances of 2 as either a string or number are converted.

    For example:

    NULL_IF = ('\\N', 'NULL', 'NUL', '')

    Note that this option can include empty strings.

  • When unloading data, Snowflake converts SQL NULL values to the first value in the list.

Default

\\N (i.e. NULL, which assumes the ESCAPE_UNENCLOSED_FIELD value is \\)
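A sketch of a CSV file format, created with CREATE FILE FORMAT under a hypothetical name, that treats several markers as SQL NULL when loading. When the same format is used for unloading, SQL NULL values are written as the first entry in the list.

```sql
-- Hypothetical file format name.
-- Loading: each listed string, including the empty string, becomes SQL NULL.
-- Unloading: SQL NULL is written as the first value, '\\N' (the two characters \N).
CREATE OR REPLACE FILE FORMAT my_csv_nulls
  TYPE = CSV
  NULL_IF = ('\\N', 'NULL', 'NUL', '');
```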

ERROR_ON_COLUMN_COUNT_MISMATCH = TRUE | FALSE
Use

Data loading only

Definition

Boolean that specifies whether to generate a parsing error if the number of delimited columns (i.e. fields) in an input file does not match the number of columns in the corresponding table.

If set to FALSE, an error is not generated and the load continues. If the file is successfully loaded:

  • If the input file contains records with more fields than columns in the table, the matching fields are loaded in order of occurrence in the file and the remaining fields are not loaded.

  • If the input file contains records with fewer fields than columns in the table, the non-matching columns in the table are loaded with NULL values.

This option assumes all the records within the input file are the same length (i.e. a file containing records of varying length returns an error regardless of the value specified for this parameter).

Default

TRUE

Note

When transforming data during loading (i.e. using a query as the source for the COPY command), this option is ignored. There is no requirement for your data files to have the same number and ordering of columns as your target table.
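If every record in a file has the same number of fields, but that number differs from the column count of the target table, a sketch of the relevant setting looks like this (CREATE FILE FORMAT with a hypothetical name). Extra fields are then dropped and missing columns are filled with NULL, as described above.

```sql
-- Hypothetical file format name.
CREATE OR REPLACE FILE FORMAT my_lenient_csv
  TYPE = CSV
  ERROR_ON_COLUMN_COUNT_MISMATCH = FALSE;
```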

REPLACE_INVALID_CHARACTERS = TRUE | FALSE
Use

Data loading only

Definition

Boolean that specifies whether to replace invalid UTF-8 characters with the Unicode replacement character (U+FFFD).

If set to TRUE, Snowflake replaces invalid UTF-8 characters with the Unicode replacement character.

If set to FALSE, the load operation produces an error when invalid UTF-8 character encoding is detected.

Default

FALSE

EMPTY_FIELD_AS_NULL = TRUE | FALSE
Use

Data loading, data unloading, and external tables

Definition
  • When loading data, specifies whether to insert SQL NULL for empty fields in an input file, which are represented by two successive delimiters (e.g. ,,).

    If set to FALSE, Snowflake attempts to cast an empty field to the corresponding column type. An empty string is inserted into columns of type STRING. For other column types, the COPY command produces an error.

  • When unloading data, this option is used in combination with FIELD_OPTIONALLY_ENCLOSED_BY. When FIELD_OPTIONALLY_ENCLOSED_BY = NONE, setting EMPTY_FIELD_AS_NULL = FALSE unloads empty strings in tables as empty string values, without quotes enclosing the field values.

    If set to TRUE, FIELD_OPTIONALLY_ENCLOSED_BY must specify a character to enclose strings.

Default

TRUE
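For unloading, a sketch of a COPY INTO <location> statement that writes empty strings without enclosing quotes, following the EMPTY_FIELD_AS_NULL description above. The stage (@my_stage) and table (my_table) names are hypothetical.

```sql
-- Hypothetical stage and table names.
COPY INTO @my_stage/unload/
  FROM my_table
  FILE_FORMAT = (
    TYPE = CSV
    FIELD_OPTIONALLY_ENCLOSED_BY = NONE  -- no enclosing character
    EMPTY_FIELD_AS_NULL = FALSE          -- empty strings are unloaded as empty, unquoted fields
  );
```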

SKIP_BYTE_ORDER_MARK = TRUE | FALSE
Use

Data loading only

Definition

Boolean that specifies whether to skip the BOM (byte order mark), if present in a data file. A BOM is a character code at the beginning of a data file that defines the byte order and encoding form.

If set to FALSE, Snowflake recognizes any BOM in data files, which could result in the BOM either causing an error or being merged into the first column in the table.

Default

TRUE

ENCODING = 'string'
Use

Data loading and external tables

Definition

String (constant) that specifies the character set of the source data when loading data into a table.

Character Set   ENCODING Value   Supported Languages
-------------   --------------   -------------------
Big5            BIG5             Traditional Chinese
EUC-JP          EUCJP            Japanese
EUC-KR          EUCKR            Korean
GB18030         GB18030          Chinese
IBM420          IBM420           Arabic
IBM424          IBM424           Hebrew
IBM949          IBM949           Korean
ISO-2022-CN     ISO2022CN        Simplified Chinese
ISO-2022-JP     ISO2022JP        Japanese
ISO-2022-KR     ISO2022KR        Korean
ISO-8859-1      ISO88591         Danish, Dutch, English, French, German, Italian, Norwegian, Portuguese, Swedish
ISO-8859-2      ISO88592         Czech, Hungarian, Polish, Romanian
ISO-8859-5      ISO88595         Russian
ISO-8859-6      ISO88596         Arabic
ISO-8859-7      ISO88597         Greek
ISO-8859-8      ISO88598         Hebrew
ISO-8859-9      ISO88599         Turkish
ISO-8859-15     ISO885915        Danish, Dutch, English, French, German, Italian, Norwegian, Portuguese, Swedish (1)
KOI8-R          KOI8R            Russian
Shift_JIS       SHIFTJIS         Japanese
UTF-8           UTF8             All languages (2)
UTF-16          UTF16            All languages
UTF-16BE        UTF16BE          All languages
UTF-16LE        UTF16LE          All languages
UTF-32          UTF32            All languages
UTF-32BE        UTF32BE          All languages
UTF-32LE        UTF32LE          All languages
windows-949     WINDOWS949       Korean
windows-1250    WINDOWS1250      Czech, Hungarian, Polish, Romanian
windows-1251    WINDOWS1251      Russian
windows-1252    WINDOWS1252      Danish, Dutch, English, French, German, Italian, Norwegian, Portuguese, Swedish
windows-1253    WINDOWS1253      Greek
windows-1254    WINDOWS1254      Turkish
windows-1255    WINDOWS1255      Hebrew
windows-1256    WINDOWS1256      Arabic

Notes:

(1) Identical to ISO-8859-1 except for 8 characters, including the Euro currency symbol.

(2) For loading data from delimited files (CSV, TSV, etc.), UTF-8 is the default. For loading data from all other supported file formats (JSON, Avro, etc.), as well as unloading data, UTF-8 is the only supported character set.

Default

UTF8

Note

Snowflake stores all data internally in the UTF-8 character set. The data is converted into UTF-8 before it is loaded into Snowflake.
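For example, a sketch of a file format for CSV files exported in Latin-1, created with CREATE FILE FORMAT under a hypothetical name. Snowflake converts the data to UTF-8 during the load.

```sql
-- Hypothetical file format name; the ENCODING value comes from the table above.
CREATE OR REPLACE FILE FORMAT my_latin1_csv
  TYPE = CSV
  ENCODING = 'ISO88591';
```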

TYPE = JSON

COMPRESSION = AUTO | GZIP | BZ2 | BROTLI | ZSTD | DEFLATE | RAW_DEFLATE | NONE
Use

Data loading and external tables

Definition
  • When loading data, specifies the current compression algorithm for the data file. Snowflake uses this option to detect how an already-compressed data file was compressed so that the compressed data in the file can be extracted for loading.

  • When unloading data, compresses the data file using the specified compression algorithm.

Values

Supported Values   Notes
----------------   -----
AUTO               When loading data, the compression algorithm is detected automatically, except for Brotli-compressed files, which cannot currently be detected automatically. When unloading data, files are automatically compressed using the default, which is gzip.
GZIP
BZ2
BROTLI             Must be specified if loading/unloading Brotli-compressed files.
ZSTD               Zstandard v0.8 (and higher) is supported.
DEFLATE            Deflate-compressed files (with zlib header, RFC1950).
RAW_DEFLATE        Raw Deflate-compressed files (without header, RFC1951).
NONE               When loading data, indicates that the files have not been compressed. When unloading data, specifies that the unloaded files are not compressed.

Default

AUTO
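Because Brotli compression is not detected automatically, a file format for Brotli-compressed JSON files must name it explicitly. A sketch using CREATE FILE FORMAT with a hypothetical name:

```sql
-- Hypothetical file format name.
CREATE OR REPLACE FILE FORMAT my_brotli_json
  TYPE = JSON
  COMPRESSION = BROTLI;  -- AUTO would not recognize Brotli-compressed files
```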

DATE_FORMAT = 'string' | AUTO
Use

Data loading only

Definition

Defines the format of date string values in the data files. If a value is not specified or is AUTO, the value for the DATE_INPUT_FORMAT parameter is used.

This file format option is applied to the following actions only:

  • Loading JSON data into separate columns using the MATCH_BY_COLUMN_NAME copy option.

  • Loading JSON data into separate columns by specifying a query in the COPY statement (i.e. COPY transformation).

Default

AUTO

TIME_FORMAT = 'string' | AUTO
Use

Data loading only

Definition

Defines the format of time string values in the data files. If a value is not specified or is AUTO, the value for the TIME_INPUT_FORMAT parameter is used.

This file format option is applied to the following actions only:

  • Loading JSON data into separate columns using the MATCH_BY_COLUMN_NAME copy option.

  • Loading JSON data into separate columns by specifying a query in the COPY statement (i.e. COPY transformation).

Default

AUTO

TIMESTAMP_FORMAT = 'string' | AUTO
Use

Data loading only

Definition

Defines the format of timestamp string values in the data files. If a value is not specified or is AUTO, the value for the TIMESTAMP_INPUT_FORMAT parameter is used.

This file format option is applied to the following actions only:

  • Loading JSON data into separate columns using the MATCH_BY_COLUMN_NAME copy option.

  • Loading JSON data into separate columns by specifying a query in the COPY statement (i.e. COPY transformation).

Default

AUTO
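The JSON DATE_FORMAT, TIME_FORMAT, and TIMESTAMP_FORMAT options take effect only when values are loaded into separate columns. The following sketch pairs such a format with MATCH_BY_COLUMN_NAME; the format, table, and stage names and the format strings are assumptions about your data.

```sql
-- Hypothetical names and format strings; match them to the values in your files.
CREATE OR REPLACE FILE FORMAT my_json_dates
  TYPE = JSON
  DATE_FORMAT = 'YYYY-MM-DD'
  TIME_FORMAT = 'HH24:MI:SS'
  TIMESTAMP_FORMAT = 'YYYY-MM-DD HH24:MI:SS';

-- The format options above apply because MATCH_BY_COLUMN_NAME loads into separate columns.
COPY INTO my_table
  FROM @my_stage/events/
  FILE_FORMAT = (FORMAT_NAME = 'my_json_dates')
  MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE;
```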

BINARY_FORMAT = HEX | BASE64 | UTF8
Use

Data loading only

Definition

Defines the encoding format for binary string values in the data files. The option can be used when loading data into binary columns in a table.

This file format option is applied to the following actions only:

  • Loading JSON data into separate columns using the MATCH_BY_COLUMN_NAME copy option.

  • Loading JSON data into separate columns by specifying a query in the COPY statement (i.e. COPY transformation).

Default

HEX

TRIM_SPACE = TRUE | FALSE
Use

Data loading only

Definition

Boolean that specifies whether to remove leading and trailing white space from strings.

For example, if your external database software encloses fields in quotes, but inserts a leading space, Snowflake reads the leading space rather than the opening quotation character as the beginning of the field (i.e. the quotation marks are interpreted as part of the string of field data). Set this option to TRUE to remove undesirable spaces during the data load.

This file format option is applied only when loading JSON data into separate columns using the MATCH_BY_COLUMN_NAME copy option.

Default

FALSE

NULL_IF = ( 'string1' [ , 'string2' , ... ] )
Use

Data loading only

Definition

String used to convert to and from SQL NULL. Snowflake replaces these strings in the data load source with SQL NULL. To specify more than one string, enclose the list of strings in parentheses and use commas to separate each value.

This file format option is applied only when loading JSON data into separate columns using the MATCH_BY_COLUMN_NAME copy option.

Note that Snowflake converts all instances of the value to NULL, regardless of the data type. For example, if 2 is specified as a value, all instances of 2 as either a string or number are converted.

For example:

NULL_IF = ('\\N', 'NULL', 'NUL', '')

Note that this option can include empty strings.

Default

\\N (i.e. NULL, which assumes the ESCAPE_UNENCLOSED_FIELD value is \\)

FILE_EXTENSION = 'string' | NONE
Use

Data unloading only

Definition

Specifies the extension for files unloaded to a stage. Accepts any extension. The user is responsible for specifying a file extension that can be read by any desired software or services.

Default

null, meaning the file extension is determined by the format type: .json[compression], where compression is the extension added by the compression method, if COMPRESSION is set.

ENABLE_OCTAL = TRUE | FALSE
Use

Data loading only

Definition

Boolean that enables parsing of octal numbers.

Default

FALSE

ALLOW_DUPLICATE = TRUE | FALSE
Use

Data loading and external tables

Definition

Boolean that specifies whether to allow duplicate object field names (only the last one is preserved).

Default

FALSE

STRIP_OUTER_ARRAY = TRUE | FALSE
Use

Data loading and external tables

Definition

Boolean that instructs the JSON parser to remove outer brackets (i.e. [ ]).

Default

FALSE
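For example, if a file contains a single top-level array such as [{"id": 1}, {"id": 2}], setting STRIP_OUTER_ARRAY = TRUE lets each array element load as a separate row. A sketch using CREATE FILE FORMAT with a hypothetical name:

```sql
-- Hypothetical file format name.
CREATE OR REPLACE FILE FORMAT my_json_array
  TYPE = JSON
  STRIP_OUTER_ARRAY = TRUE;  -- remove the outer [ ] so each element is its own row
```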

STRIP_NULL_VALUES = TRUE | FALSE
Use

Data loading and external tables

Definition

Boolean that instructs the JSON parser to remove object fields or array elements containing null values. For example, when set to TRUE:

Before                                  After
------                                  -----
[null]                                  []
[null,null,3]                           [,,3]
{"a":null,"b":null,"c":123}             {"c":123}
{"a":[1,null,2],"b":{"x":null,"y":88}}  {"a":[1,,2],"b":{"y":88}}

Default

FALSE
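A sketch of a JSON file format that applies the STRIP_NULL_VALUES transformations listed in the Before/After table, using CREATE FILE FORMAT with a hypothetical name:

```sql
-- Hypothetical file format name.
CREATE OR REPLACE FILE FORMAT my_json_no_nulls
  TYPE = JSON
  STRIP_NULL_VALUES = TRUE;  -- e.g. {"a":null,"c":123} is loaded as {"c":123}
```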

REPLACE_INVALID_CHARACTERS = TRUE | FALSE
Use

Data loading and external tables

Definition

Boolean that specifies whether to replace invalid UTF-8 characters with the Unicode replacement character (U+FFFD). This option performs a one-to-one character replacement.

Values

If set to TRUE, Snowflake replaces invalid UTF-8 characters with the Unicode replacement character.

If set to FALSE, the load operation produces an error when invalid UTF-8 character encoding is detected.

Default

FALSE

IGNORE_UTF8_ERRORS = TRUE | FALSE
Use

Data loading and external tables

Definition

Boolean that specifies whether UTF-8 encoding errors produce error conditions. It is an alternative syntax for REPLACE_INVALID_CHARACTERS.

Values

If set to TRUE, any invalid UTF-8 sequences are silently replaced with the Unicode character U+FFFD (i.e. “replacement character”).

If set to FALSE, the load operation produces an error when invalid UTF-8 character encoding is detected.

Default

FALSE

SKIP_BYTE_ORDER_MARK = TRUE | FALSE
Use

Data loading only

Definition

Boolean that specifies whether to skip the BOM (byte order mark), if present in a data file. A BOM is a character code at the beginning of a data file that defines the byte order and encoding form.

If set to FALSE, Snowflake recognizes any BOM in data files, which could result in the BOM either causing an error or being merged into the first column in the table.

Default

TRUE

TYPE = AVRO

COMPRESSION = AUTO | GZIP | BROTLI | ZSTD | DEFLATE | RAW_DEFLATE | NONE
Use

Data loading only

Definition
  • When loading data, specifies the current compression algorithm for the data file. Snowflake uses this option to detect how an already-compressed data file was compressed so that the compressed data in the file can be extracted for loading.

  • When unloading data, compresses the data file using the specified compression algorithm.

Values

Supported Values   Notes
----------------   -----
AUTO               When loading data, the compression algorithm is detected automatically, except for Brotli-compressed files, which cannot currently be detected automatically. When unloading data, files are automatically compressed using the default, which is gzip.
GZIP
BROTLI             Must be specified if loading/unloading Brotli-compressed files.
ZSTD               Zstandard v0.8 (and higher) is supported.
DEFLATE            Deflate-compressed files (with zlib header, RFC1950).
RAW_DEFLATE        Raw Deflate-compressed files (without header, RFC1951).
NONE               When loading data, indicates that the files have not been compressed. When unloading data, specifies that the unloaded files are not compressed.

Default

AUTO

Note

We recommend that you use the default AUTO option because it will determine both the file and codec compression. Specifying a compression option refers to the compression of files, not the compression of blocks (codecs).

TRIM_SPACE = TRUE | FALSE
Use

Data loading only

Definition

Boolean that specifies whether to remove leading and trailing white space from strings.

For example, if your external database software encloses fields in quotes, but inserts a leading space, Snowflake reads the leading space rather than the opening quotation character as the beginning of the field (i.e. the quotation marks are interpreted as part of the string of field data). Set this option to TRUE to remove undesirable spaces during the data load.

This file format option is applied only when loading Avro data into separate columns using the MATCH_BY_COLUMN_NAME copy option.

Default

FALSE

REPLACE_INVALID_CHARACTERS = TRUE | FALSE
Use

Data loading and external tables

Definition

Boolean that specifies whether to replace invalid UTF-8 characters with the Unicode replacement character (U+FFFD). This option performs a one-to-one character replacement.

Values

If set to TRUE, Snowflake replaces invalid UTF-8 characters with the Unicode replacement character.

If set to FALSE, the load operation produces an error when invalid UTF-8 character encoding is detected.

Default

FALSE

NULL_IF = ( 'string1' [ , 'string2' , ... ] )
Use

Data loading only

Definition

String used to convert to and from SQL NULL. Snowflake replaces these strings in the data load source with SQL NULL. To specify more than one string, enclose the list of strings in parentheses and use commas to separate each value.

This file format option is applied only when loading Avro data into separate columns using the MATCH_BY_COLUMN_NAME copy option.

Note that Snowflake converts all instances of the value to NULL, regardless of the data type. For example, if 2 is specified as a value, all instances of 2 as either a string or number are converted.

For example:

NULL_IF = ('\\N', 'NULL', 'NUL', '')

Note that this option can include empty strings.

Default

\\N (i.e. NULL, which assumes the ESCAPE_UNENCLOSED_FIELD value is \\)
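For Avro, TRIM_SPACE and NULL_IF take effect only together with MATCH_BY_COLUMN_NAME. A sketch with hypothetical format, table, and stage names:

```sql
-- Hypothetical names.
CREATE OR REPLACE FILE FORMAT my_avro
  TYPE = AVRO
  TRIM_SPACE = TRUE
  NULL_IF = ('NULL', '');

-- TRIM_SPACE and NULL_IF apply here because values are loaded into separate columns.
COPY INTO my_table
  FROM @my_stage/avro/
  FILE_FORMAT = (FORMAT_NAME = 'my_avro')
  MATCH_BY_COLUMN_NAME = CASE_SENSITIVE;
```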

TYPE = ORC

TRIM_SPACE = TRUE | FALSE
Use

Data loading and external tables

Definition

Boolean that specifies whether to remove leading and trailing white space from strings.

For example, if your external database software encloses fields in quotes, but inserts a leading space, Snowflake reads the leading space rather than the opening quotation character as the beginning of the field (i.e. the quotation marks are interpreted as part of the string of field data). Set this option to TRUE to remove undesirable spaces during the data load.

This file format option is applied only when loading ORC data into separate columns using the MATCH_BY_COLUMN_NAME copy option.

Default

FALSE

REPLACE_INVALID_CHARACTERS = TRUE | FALSE
Use

Data loading and external tables

Definition

Boolean that specifies whether to replace invalid UTF-8 characters with the Unicode replacement character (U+FFFD). This option performs a one-to-one character replacement.

Values

If set to TRUE, Snowflake replaces invalid UTF-8 characters with the Unicode replacement character.

If set to FALSE, the load operation produces an error when invalid UTF-8 character encoding is detected.

Default

FALSE

NULL_IF = ( 'string1' [ , 'string2' , ... ] )
Use

Data loading and external tables

Definition

String used to convert to and from SQL NULL. Snowflake replaces these strings in the data load source with SQL NULL. To specify more than one string, enclose the list of strings in parentheses and use commas to separate each value.

This file format option is applied only when loading ORC data into separate columns using the MATCH_BY_COLUMN_NAME copy option.

Note that Snowflake converts all instances of the value to NULL, regardless of the data type. For example, if 2 is specified as a value, all instances of 2 as either a string or number are converted.

For example:

NULL_IF = ('\\N', 'NULL', 'NUL', '')

Note that this option can include empty strings.

Default

\\N (i.e. NULL, which assumes the ESCAPE_UNENCLOSED_FIELD value is \\)

TYPE = PARQUET

COMPRESSION = AUTO | LZO | SNAPPY | NONE
Use

Data loading, data unloading, and external tables

Definition

  • When loading data, specifies the current compression algorithm for columns in the Parquet files.

  • When unloading data, compresses the data file using the specified compression algorithm.

Values

Supported Values   Notes
----------------   -----
AUTO               When loading data, the compression algorithm is detected automatically. The following compression algorithms are supported: Brotli, gzip, Lempel-Ziv-Oberhumer (LZO), LZ4, Snappy, and Zstandard v0.8 (and higher). When unloading data, unloaded files are compressed using the Snappy compression algorithm by default.
LZO                When unloading data, files are compressed using the Snappy algorithm by default. If unloading data to LZO-compressed files, specify this value.
SNAPPY             When unloading data, files are compressed using the Snappy algorithm by default. You can optionally specify this value.
NONE               When loading data, indicates that the files have not been compressed. When unloading data, specifies that the unloaded files are not compressed.

Default

AUTO
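When unloading, a sketch of a COPY INTO <location> statement that requests Snappy-compressed Parquet files explicitly (Snappy is already the default). The stage and table names are hypothetical.

```sql
-- Hypothetical stage and table names.
COPY INTO @my_stage/unload/
  FROM my_table
  FILE_FORMAT = (TYPE = PARQUET COMPRESSION = SNAPPY);
```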

SNAPPY_COMPRESSION = TRUE | FALSE
Use

Data unloading only

Supported Values   Notes
----------------   -----
AUTO               Unloaded files are compressed using the Snappy compression algorithm by default.
SNAPPY             May be specified if unloading Snappy-compressed files.
NONE               When loading data, indicates that the files have not been compressed. When unloading data, specifies that the unloaded files are not compressed.

Definition

Boolean that specifies whether unloaded file(s) are compressed using the SNAPPY algorithm.

Note

Deprecated. Use COMPRESSION = SNAPPY instead.

Limitations

Only supported for data unloading operations.

Default

TRUE

BINARY_AS_TEXT = TRUE | FALSE
Use

Data loading and external tables

Definition

Boolean that specifies whether to interpret columns with no defined logical data type as UTF-8 text. When set to FALSE, Snowflake interprets these columns as binary data.

Default

TRUE

Note

Snowflake recommends that you set BINARY_AS_TEXT to FALSE to avoid any potential conversion issues.

TRIM_SPACE = TRUE | FALSE
Use

Data loading only

Definition

Boolean that specifies whether to remove leading and trailing white space from strings.

For example, if your external database software encloses fields in quotes, but inserts a leading space, Snowflake reads the leading space rather than the opening quotation character as the beginning of the field (i.e. the quotation marks are interpreted as part of the string of field data). Set this option to TRUE to remove undesirable spaces during the data load.

This file format option is applied only when loading Parquet data into separate columns using the MATCH_BY_COLUMN_NAME copy option.

Default

FALSE

USE_LOGICAL_TYPE = TRUE | FALSE
Use

Data loading, data querying in staged files, and schema detection.

Definition

Boolean that specifies whether to use Parquet logical types. With this file format option, Snowflake can interpret Parquet logical types during data loading. For more information, see Parquet Logical Type Definitions. To enable Parquet logical types, set USE_LOGICAL_TYPE to TRUE when you create a new file format.

Limitations

Not supported for data unloading.
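A sketch of a Parquet file format that follows the BINARY_AS_TEXT recommendation and enables logical types, using CREATE FILE FORMAT with a hypothetical name. Because USE_LOGICAL_TYPE is not supported for unloading, such a format is intended for loading and querying staged files.

```sql
-- Hypothetical file format name.
CREATE OR REPLACE FILE FORMAT my_parquet
  TYPE = PARQUET
  BINARY_AS_TEXT = FALSE    -- read columns without a logical type as binary, not UTF-8 text
  USE_LOGICAL_TYPE = TRUE;  -- interpret Parquet logical types
```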

REPLACE_INVALID_CHARACTERS = TRUE | FALSE
Use

Data loading and external tables

Definition

Boolean that specifies whether to replace invalid UTF-8 characters with the Unicode replacement character (U+FFFD). This option performs a one-to-one character replacement.

Values

If set to TRUE, Snowflake replaces invalid UTF-8 characters with the Unicode replacement character.

If set to FALSE, the load operation produces an error when invalid UTF-8 character encoding is detected.

Default

FALSE

NULL_IF = ( 'string1' [ , 'string2' , ... ] )
Use

Data loading only

Definition

String used to convert to and from SQL NULL. Snowflake replaces these strings in the data load source with SQL NULL. To specify more than one string, enclose the list of strings in parentheses and use commas to separate each value.

This file format option is applied only when loading Parquet data into separate columns using the MATCH_BY_COLUMN_NAME copy option.

Note that Snowflake converts all instances of the value to NULL, regardless of the data type. For example, if 2 is specified as a value, all instances of 2 as either a string or number are converted.

For example:

NULL_IF = ('\\N', 'NULL', 'NUL', '')

Note that this option can include empty strings.

Default

\\N (i.e. NULL, which assumes the ESCAPE_UNENCLOSED_FIELD value is \\)

TYPE = XML

COMPRESSION = AUTO | GZIP | BZ2 | BROTLI | ZSTD | DEFLATE | RAW_DEFLATE | NONE
Use

Data loading only

Definition
  • When loading data, specifies the current compression algorithm for the data file. Snowflake uses this option to detect how an already-compressed data file was compressed so that the compressed data in the file can be extracted for loading.

  • When unloading data, compresses the data file using the specified compression algorithm.

Values

Supported Values   Notes
----------------   -----
AUTO               When loading data, the compression algorithm is detected automatically, except for Brotli-compressed files, which cannot currently be detected automatically. When unloading data, files are automatically compressed using the default, which is gzip.
GZIP
BZ2
BROTLI             Must be specified if loading/unloading Brotli-compressed files.
ZSTD               Zstandard v0.8 (and higher) is supported.
DEFLATE            Deflate-compressed files (with zlib header, RFC1950).
RAW_DEFLATE        Raw Deflate-compressed files (without header, RFC1951).
NONE               When loading data, indicates that the files have not been compressed. When unloading data, specifies that the unloaded files are not compressed.

Default

AUTO

IGNORE_UTF8_ERRORS = TRUE | FALSE
Use

Data loading and external tables

Definition

Boolean that specifies whether UTF-8 encoding errors produce error conditions. It is an alternative syntax for REPLACE_INVALID_CHARACTERS.

Values

If set to TRUE, any invalid UTF-8 sequences are silently replaced with the Unicode character U+FFFD (i.e. “replacement character”).

If set to FALSE, the load operation produces an error when invalid UTF-8 character encoding is detected.

Default

FALSE

PRESERVE_SPACE = TRUE | FALSE
Use

Data loading only

Definition

Boolean that specifies whether the XML parser preserves leading and trailing spaces in element content.

Default

FALSE

STRIP_OUTER_ELEMENT = TRUE | FALSE
Use

Data loading only

Definition

Boolean that specifies whether the XML parser strips out the outer XML element, exposing second-level elements as separate documents.

Default

FALSE
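For example, if each XML file wraps many records in a single root element, a sketch of a format that exposes each second-level element as its own document (CREATE FILE FORMAT, hypothetical name):

```sql
-- Hypothetical file format name.
CREATE OR REPLACE FILE FORMAT my_xml
  TYPE = XML
  STRIP_OUTER_ELEMENT = TRUE;  -- drop the root element; each child becomes a separate document
```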

DISABLE_SNOWFLAKE_DATA = TRUE | FALSE
Use

Data loading only

Definition

Boolean that specifies whether the XML parser disables recognition of Snowflake semi-structured data tags.

Default

FALSE

DISABLE_AUTO_CONVERT = TRUE | FALSE
Use

Data loading only

Definition

Boolean that specifies whether the XML parser disables automatic conversion of numeric and Boolean values from text to native representation.

Default

FALSE

REPLACE_INVALID_CHARACTERS = TRUE | FALSE
Use

Data loading and external tables

Definition

Boolean that specifies whether to replace invalid UTF-8 characters with the Unicode replacement character (U+FFFD). This option performs a one-to-one character replacement.

Values

If set to TRUE, Snowflake replaces invalid UTF-8 characters with the Unicode replacement character.

If set to FALSE, the load operation produces an error when invalid UTF-8 character encoding is detected.

Default

FALSE

SKIP_BYTE_ORDER_MARK = TRUE | FALSE
Use

Data loading only

Definition

Boolean that specifies whether to skip any BOM (byte order mark) present in an input file. A BOM is a character code at the beginning of a data file that defines the byte order and encoding form.

If set to FALSE, Snowflake recognizes any BOM in data files, which could result in the BOM either causing an error or being merged into the first column in the table.

Default

TRUE

Copy Options (copyOptions)

Copy options are used for loading data into and unloading data out of tables.

You can specify one or more of the following copy options (separated by blank spaces, commas, or new lines):

STAGE_COPY_OPTIONS = ( ... )

ON_ERROR = CONTINUE | SKIP_FILE | SKIP_FILE_num | 'SKIP_FILE_num%' | ABORT_STATEMENT
Use

Data loading only

Definition

String (constant) that specifies the error handling for the load operation.

Important

Carefully consider the ON_ERROR copy option value. The default value is appropriate in common scenarios, but is not always the best option.

Values
  • CONTINUE

    Continue to load the file if errors are found. The COPY statement returns an error message for a maximum of one error found per data file.

    Note that the difference between the ROWS_PARSED and ROWS_LOADED column values represents the number of rows that include detected errors. However, each of these rows could include multiple errors. To view all errors in the data files, use the VALIDATION_MODE parameter or query the VALIDATE function.

  • SKIP_FILE

    Skip a file when an error is found.

    Note that the SKIP_FILE action buffers an entire file whether errors are found or not. For this reason, SKIP_FILE is slower than either CONTINUE or ABORT_STATEMENT. Skipping large files due to a small number of errors could result in delays and wasted credits. When loading large numbers of records from files that have no logical delineation (e.g. the files were generated automatically at rough intervals), consider specifying CONTINUE instead.

    Additional patterns:

    SKIP_FILE_num (e.g. SKIP_FILE_10)

    Skip a file when the number of error rows found in the file is equal to or exceeds the specified number.

    'SKIP_FILE_num%' (e.g. 'SKIP_FILE_10%')

    Skip a file when the percentage of error rows found in the file exceeds the specified percentage.

  • ABORT_STATEMENT

    Abort the load operation if any error is found in a data file.

    Note that the load operation is not aborted if the data file cannot be found (e.g. because it does not exist or cannot be accessed), except when data files explicitly specified in the FILES parameter cannot be found.
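As an illustration of the percentage pattern, the following sketch of a COPY INTO <table> statement skips any file in which more than 10% of the rows contain errors. The table and stage names are hypothetical; note that the percentage form is written as a quoted string.

```sql
-- Hypothetical table and stage names.
COPY INTO my_table
  FROM @my_stage/data/
  FILE_FORMAT = (TYPE = CSV)
  ON_ERROR = 'SKIP_FILE_10%';
```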

Default

  • Bulk loading using COPY: ABORT_STATEMENT

  • Snowpipe: SKIP_FILE

SIZE_LIMIT = num
Use

Data loading only

Definition

Number (> 0) that specifies the maximum size (in bytes) of data to be loaded for a given COPY statement. When the threshold is exceeded, the COPY operation discontinues loading files. This option is commonly used to load a common group of files using multiple COPY statements. For each statement, the data load continues until the specified SIZE_LIMIT is exceeded, before moving on to the next statement.

For example, suppose a set of files in a stage path were each 10 MB in size. If multiple COPY statements set SIZE_LIMIT to 25000000 (25 MB), each would load 3 files: after two files (20 MB) the threshold has not yet been exceeded, so a third file is loaded (30 MB), and the COPY operation then stops because the threshold has been exceeded.

Note that at least one file is loaded regardless of the value specified for SIZE_LIMIT, unless there is no file to be loaded.

Default

null (no size limit)
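A minimal sketch of the 10 MB / 25 MB scenario described above, assuming a hypothetical table my_table and a hypothetical stage path @my_stage/batch/ that holds many files of roughly 10 MB each:

```sql
-- Hypothetical objects: my_table and @my_stage/batch/ (many ~10 MB files).
-- The statement stops once the cumulative size first exceeds 25,000,000 bytes,
-- so each run loads about 3 files of this size.
COPY INTO my_table
  FROM @my_stage/batch/
  FILE_FORMAT = (TYPE = 'CSV')
  SIZE_LIMIT = 25000000;
```

Running the same statement again normally picks up the next group of files, because COPY skips files that its load metadata records as already loaded (unless FORCE = TRUE).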

PURGE = TRUE | FALSE
Use

Data loading only

Definition

Boolean that specifies whether to remove the data files from the stage automatically after the data is loaded successfully.

If this option is set to TRUE, Snowflake makes a best effort to remove successfully loaded data files. If the purge operation fails for any reason, no error is currently returned. We recommend that you periodically list staged files (using LIST) and manually remove any successfully loaded files that remain (using REMOVE).

Default

FALSE
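The following sketch pairs PURGE = TRUE with the periodic LIST and REMOVE check recommended above. The table, stage path, and file name are hypothetical.

```sql
-- Hypothetical objects: my_table and @my_stage/incoming/.
-- Ask Snowflake to delete each staged file after it loads successfully.
COPY INTO my_table
  FROM @my_stage/incoming/
  FILE_FORMAT = (TYPE = 'CSV')
  PURGE = TRUE;

-- Purge failures are not reported, so periodically check for leftover files...
LIST @my_stage/incoming/;

-- ...and remove any successfully loaded file that is still staged
-- (the file name here is hypothetical).
REMOVE @my_stage/incoming/data_2024_01_01.csv;
```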

RETURN_FAILED_ONLY = TRUE | FALSE
Use

Data loading only

Definition

Boolean that specifies whether to return only files that have failed to load in the statement result.

Default

FALSE
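As a sketch, RETURN_FAILED_ONLY is most useful when the statement is allowed to finish despite errors. This example assumes ON_ERROR = SKIP_FILE so that files with errors are skipped (and therefore fail to load) instead of aborting the statement. The table and stage path are hypothetical.

```sql
-- Hypothetical objects: my_table and @my_stage/daily/.
-- Files containing errors are skipped; the statement result lists only
-- the files that failed to load instead of one row per file processed.
COPY INTO my_table
  FROM @my_stage/daily/
  FILE_FORMAT = (TYPE = 'CSV')
  ON_ERROR = SKIP_FILE
  RETURN_FAILED_ONLY = TRUE;
```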

MATCH_BY_COLUMN_NAME = CASE_SENSITIVE | CASE_INSENSITIVE | NONE
Use

Data loading only

Definition

String that specifies whether to load semi-structured data into columns in the target table that match corresponding columns represented in the data.

This copy option is supported for the following data formats:

  • JSON

  • Avro

  • ORC

  • Parquet

For a column to match, the following criteria must be true:

  • The column represented in the data must have the same name as the column in the table. Whether the name comparison is case-sensitive depends on the option value (CASE_SENSITIVE or CASE_INSENSITIVE). Column order does not matter.

  • The column in the table must have a data type that is compatible with the values in the column represented in the data. For example, string, number, and Boolean values can all be loaded into a VARIANT column.

Values
CASE_SENSITIVE | CASE_INSENSITIVE

Load semi-structured data into columns in the target table that match corresponding columns represented in the data. Column names are either case-sensitive (CASE_SENSITIVE) or case-insensitive (CASE_INSENSITIVE).

The COPY operation verifies that at least one column in the target table matches a column represented in the data files. If a match is found, the values in the data files are loaded into the column or columns. If no match is found, a set of NULL values for each record in the files is loaded into the table.

Note

  • If additional non-matching columns are present in the data files, the values in these columns are not loaded.

  • If additional non-matching columns are present in the target table, the COPY operation inserts NULL values into these columns. These columns must support NULL values.

  • The COPY statement does not allow specifying a query to further transform the data during the load (i.e. COPY transformation).

NONE

The COPY operation loads the semi-structured data into a VARIANT column or, if a query is included in the COPY statement, transforms the data.

Note

The following limitations currently apply:

  • MATCH_BY_COLUMN_NAME cannot be used with the VALIDATION_MODE parameter in a COPY statement to validate the staged data rather than load it into the target table.

  • Parquet data only. When MATCH_BY_COLUMN_NAME is set to CASE_SENSITIVE or CASE_INSENSITIVE, an empty column value (e.g. "col1": "") produces an error.

Default

NONE
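A minimal sketch of loading Parquet files by column name, assuming a hypothetical table my_table whose column names correspond to field names in the files under the hypothetical stage path @my_stage/parquet/:

```sql
-- Hypothetical objects: my_table and @my_stage/parquet/.
-- Parquet fields are matched to table columns by name, ignoring case.
-- Table columns with no matching field receive NULL (so they must be nullable),
-- and fields with no matching table column are not loaded.
-- No transformation query is allowed together with MATCH_BY_COLUMN_NAME.
COPY INTO my_table
  FROM @my_stage/parquet/
  FILE_FORMAT = (TYPE = 'PARQUET')
  MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE;
```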

ENFORCE_LENGTH = TRUE | FALSE
Use

Data loading only

Definition

Alternative syntax for TRUNCATECOLUMNS with reverse logic (for compatibility with other systems).

Boolean that specifies whether to enforce the target column length for text strings, that is, whether to reject rather than truncate strings that exceed it:

  • If TRUE, the COPY statement produces an error if a loaded string exceeds the target column length.

  • If FALSE, strings are automatically truncated to the target column length.

This copy option supports CSV data, as well as string values in semi-structured data when loaded into separate columns in relational tables.

Note

  • If the length of the target string column is set to the maximum (e.g. VARCHAR(16777216)), an incoming string cannot exceed this length; otherwise, the COPY command produces an error.

  • This parameter is functionally equivalent to TRUNCATECOLUMNS, but has the opposite behavior. It is provided for compatibility with other databases. It is only necessary to include one of these two parameters in a COPY statement to produce the desired output.

Default

TRUE

TRUNCATECOLUMNS = TRUE | FALSE
Use

Data loading only

Definition

Alternative syntax for ENFORCE_LENGTH with reverse logic (for compatibility with other systems).

Boolean that specifies whether to truncate text strings that exceed the target column length:

  • If TRUE, strings are automatically truncated to the target column length.

  • If FALSE, the COPY statement produces an error if a loaded string exceeds the target column length.

This copy option supports CSV data, as well as string values in semi-structured data when loaded into separate columns in relational tables.

Note

  • If the length of the target string column is set to the maximum (e.g. VARCHAR(16777216)), an incoming string cannot exceed this length; otherwise, the COPY command produces an error.

  • This parameter is functionally equivalent to ENFORCE_LENGTH, but has the opposite behavior. It is provided for compatibility with other databases. It is only necessary to include one of these two parameters in a COPY statement to produce the desired output.

Default

FALSE
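Because ENFORCE_LENGTH and TRUNCATECOLUMNS are mirror images, the two statements below are a sketch of equivalent ways to request truncation. They assume a hypothetical table my_table with a VARCHAR(10) column and a hypothetical stage path @my_stage/csv/.

```sql
-- Hypothetical objects: my_table (with a VARCHAR(10) column) and @my_stage/csv/.
-- Strings longer than 10 characters are truncated instead of causing an error.
COPY INTO my_table
  FROM @my_stage/csv/
  FILE_FORMAT = (TYPE = 'CSV')
  TRUNCATECOLUMNS = TRUE;

-- Equivalent alternative; include only one of the two options in a statement.
COPY INTO my_table
  FROM @my_stage/csv/
  FILE_FORMAT = (TYPE = 'CSV')
  ENFORCE_LENGTH = FALSE;
```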

FORCE = TRUE | FALSE
Use

Data loading only

Definition

Boolean that specifies whether to load all files, regardless of whether they have been loaded previously and have not changed since they were loaded. Note that this option reloads files, potentially duplicating data in a table.

Default

FALSE
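As a sketch, FORCE is often paired with the FILES parameter so that only a specific file is reloaded. The table, stage path, and file name below are hypothetical.

```sql
-- Hypothetical objects: my_table, @my_stage/daily/, and the file name below.
-- Reload one file even though the load metadata shows it was already loaded.
-- Rows from the earlier load are NOT removed, so this can create duplicates.
COPY INTO my_table
  FROM @my_stage/daily/
  FILES = ('data_2024_01_01.csv')
  FILE_FORMAT = (TYPE = 'CSV')
  FORCE = TRUE;
```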

Access control requirements

A role used to execute this SQL command must have the following privileges at a minimum:

Privilege                Object               Notes
CREATE ICEBERG TABLE     Schema
CREATE EXTERNAL VOLUME   Account              Required to create a new external volume.
USAGE                    External volume      Required to reference an existing external volume.
CREATE INTEGRATION       Account              Required to create a new catalog integration.
USAGE                    Catalog integration  Required to reference an existing catalog integration.

Note that operating on any object in a schema also requires the USAGE privilege on the parent database and schema.

For instructions on creating a custom role with a specified set of privileges, see Creating Custom Roles.

For general information about roles and privilege grants for performing SQL actions on securable objects, see Overview of Access Control.
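A minimal sketch of granting the privileges listed above to a custom role. All object names are hypothetical, and the statements must be run by a role that is allowed to grant these privileges.

```sql
-- Hypothetical names: role iceberg_admin, database my_db, schema my_db.my_schema,
-- external volume my_ext_vol, and catalog integration my_catalog_int.
CREATE ROLE IF NOT EXISTS iceberg_admin;

-- USAGE on the parent database and schema.
GRANT USAGE ON DATABASE my_db TO ROLE iceberg_admin;
GRANT USAGE ON SCHEMA my_db.my_schema TO ROLE iceberg_admin;

-- Create Iceberg tables in the schema.
GRANT CREATE ICEBERG TABLE ON SCHEMA my_db.my_schema TO ROLE iceberg_admin;

-- Reference an existing external volume.
GRANT USAGE ON EXTERNAL VOLUME my_ext_vol TO ROLE iceberg_admin;

-- Reference an existing catalog integration (only needed when the table uses
-- an external Iceberg catalog or no catalog).
GRANT USAGE ON INTEGRATION my_catalog_int TO ROLE iceberg_admin;
```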

Usage notes

  • A schema cannot contain a table and a view with the same name. When creating a table:

    • If a view with the same name already exists in the schema, an error is returned and the table is not created.

    • If a table with the same name already exists in the schema, an error is returned and the table is not created, unless the optional OR REPLACE keyword is included in the command.

  • CREATE OR REPLACE <object> statements are atomic. That is, when an object is replaced, the old object is deleted and the new object is created in a single transaction.

    This means that any queries concurrent with the CREATE OR REPLACE ICEBERG TABLE operation use either the old or new table version.

  • Similar to reserved keywords, ANSI-reserved function names (CURRENT_DATE, CURRENT_TIMESTAMP, etc.) cannot be used as column names.

  • Recreating a table (using the optional OR REPLACE keyword) drops its history, which makes any stream on the table stale. A stale stream is unreadable.

  • Cross-cloud and cross-region Iceberg tables are not currently supported. If CREATE ICEBERG TABLE returns an error message like "External volume <volume_name> must have a STORAGE_LOCATION defined in the local region ...", make sure that your external volume specifies a storage location in the same region as your Snowflake account.

  • If you created your external volume or catalog integration using a double-quoted identifier, you must specify the identifier exactly as created (including the double quotes) in your CREATE ICEBERG TABLE statement. Failure to include the quotes might result in an Object does not exist error (or similar type of error).

    For an example, see Specify an external volume or catalog integration with a double-quoted identifier (in this topic).

  • Prior to Snowflake version 7.34, a parameter named BASE_LOCATION (also referred to as FILE_PATH in previous versions) was required to create a table from Iceberg files in object storage. The parameter specified a relative path from the table’s EXTERNAL_VOLUME location. With Snowflake versions 7.34 and later, you do not specify a BASE_LOCATION to create a table from Iceberg files in object storage.

    You can continue to execute a script or statement that uses the earlier version of the CREATE ICEBERG TABLE syntax. However, doing so affects the value that you specify as the metadata-file-relative-path when you refresh the table. For more information, see ALTER ICEBERG TABLE … REFRESH.

  • CREATE ICEBERG TABLE … LIKE:

    • If the source table has clustering keys, the new table has clustering keys. By default, Automatic Clustering is not suspended for the new table, even if Automatic Clustering was suspended for the source table.

  • CREATE ICEBERG TABLE … AS SELECT (CTAS):

    • When clustering keys are specified in a CTAS statement:

      • Column definitions are required and must be explicitly specified in the statement.

      • By default, Automatic Clustering is not suspended for the new table, even if Automatic Clustering is suspended for the source table.

    • The ORDER BY sub-clause in a CTAS statement does not affect the order of the rows returned by future SELECT statements on that table. To specify the order of rows in future SELECT statements, use an ORDER BY sub-clause in those statements.

  • Regarding metadata:

    Attention

    Customers should ensure that no personal data (other than for a User object), sensitive data, export-controlled data, or other regulated data is entered as metadata when using the Snowflake service. For more information, see Metadata Fields in Snowflake.
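To check the cross-region restriction described in the usage notes above before running CREATE ICEBERG TABLE, you can compare the account region with the storage locations defined on the external volume. This is a sketch; the external volume name myIcebergVolume is hypothetical, and the exact output columns of DESCRIBE may vary.

```sql
-- Region of the current Snowflake account (returned as a region identifier).
SELECT CURRENT_REGION();

-- Storage locations configured on the external volume (hypothetical name).
-- Confirm that a storage location is in the same region as the account.
DESCRIBE EXTERNAL VOLUME myIcebergVolume;
```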

Examples

Create an Iceberg table with Snowflake as the catalog

This example creates an Iceberg table with Snowflake as the Iceberg catalog. The resulting table is managed by Snowflake and supports read and write access. The statement specifies a value for the BASE_LOCATION parameter. This tells Snowflake where to write table data and metadata on the external volume.

CREATE ICEBERG TABLE myTable (amount NUMBER)
  CATALOG='SNOWFLAKE'
  EXTERNAL_VOLUME='myIcebergVolume'
  BASE_LOCATION='relative_path_from_external_volume';
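The following sketch applies the CTAS usage notes to a table with Snowflake as the catalog. It assumes a hypothetical existing table named sourceTable with ID and AMOUNT columns. Because a CLUSTER BY clause is included, the column definitions are listed explicitly.

```sql
-- Hypothetical names: myCtasTable (new table) and sourceTable (existing table).
-- Column definitions are required here because CLUSTER BY is specified.
CREATE ICEBERG TABLE myCtasTable (id NUMBER, amount NUMBER)
  CLUSTER BY (id)
  EXTERNAL_VOLUME = 'myIcebergVolume'
  CATALOG = 'SNOWFLAKE'
  BASE_LOCATION = 'my_ctas_table'
  AS SELECT id, amount FROM sourceTable;
```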

Create an Iceberg table with AWS Glue as the catalog

This example creates an Iceberg table that uses the AWS Glue Data Catalog. To override the default catalog namespace and set a catalog namespace for the table, the statement uses the optional CATALOG_NAMESPACE parameter.

CREATE ICEBERG TABLE myGlueTable
  EXTERNAL_VOLUME='glueCatalogVolume'
  CATALOG='glueCatalogInt'
  CATALOG_TABLE_NAME='myGlueTable'
  CATALOG_NAMESPACE='icebergcatalogdb2';

Create an Iceberg table from Iceberg metadata in object storage

This example creates an Iceberg table from Iceberg metadata stored in external cloud storage. The METADATA_FILE_PATH parameter specifies the path to the table's metadata file, relative to the external volume location.

CREATE ICEBERG TABLE myIcebergTable
  EXTERNAL_VOLUME='icebergMetadataVolume'
  CATALOG='icebergCatalogInt'
  METADATA_FILE_PATH='path/to/metadata/v1.metadata.json';
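When a newer metadata file is written for the same table, the table can be pointed at it with ALTER ICEBERG TABLE … REFRESH, as mentioned in the usage notes. A minimal sketch, assuming a hypothetical newer metadata file at a path relative to the same external volume location:

```sql
-- Hypothetical path to a newer metadata file, relative to the external volume location.
ALTER ICEBERG TABLE myIcebergTable REFRESH 'path/to/metadata/v2.metadata.json';
```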

Specify an external volume or catalog integration with a double-quoted identifier

This example creates an Iceberg table with an external volume and catalog integration that were created with double-quoted identifiers. Identifiers enclosed in double quotes are case-sensitive and can contain special characters.

The identifiers "exvol_lower" and "catint_lower" are specified exactly as created (including the double quotes). Failure to include the quotes might result in an Object does not exist error (or similar type of error).

To learn more, see Double-quoted identifiers.

CREATE OR REPLACE ICEBERG TABLE itable_with_quoted_catalog
  EXTERNAL_VOLUME = '"exvol_lower"'
  CATALOG = '"catint_lower"'
  METADATA_FILE_PATH='path/to/metadata/v1.metadata.json';