Categories:

DML Commands - Data Unloading

COPY INTO <location>

Unloads data from a table (or query) into one or more files in one of the following locations:

  • Named internal stage (or table/user stage). The files can then be downloaded from the stage/location using the GET command.

  • Named external stage that references an external location (Amazon S3, Google Cloud Storage, or Microsoft Azure).

  • External location (Amazon S3, Google Cloud Storage, or Microsoft Azure).

See also:

COPY INTO <table>

In this Topic:

Syntax

COPY INTO { internalStage | externalStage | externalLocation }
     FROM { [<namespace>.]<table_name> | ( <query> ) }
[ PARTITION BY <expr> ]
[ FILE_FORMAT = ( { FORMAT_NAME = '[<namespace>.]<file_format_name>' |
                    TYPE = { CSV | JSON | PARQUET } [ formatTypeOptions ] } ) ]
[ copyOptions ]
[ HEADER ]

Where:

internalStage ::=
    @[<namespace>.]<int_stage_name>[/<path>]
  | @[<namespace>.]%<table_name>[/<path>]
  | @~[/<path>]
externalStage ::=
  @[<namespace>.]<ext_stage_name>[/<path>]
externalLocation (for Amazon S3) ::=
  's3://<bucket>[/<path>]'
  [ { STORAGE_INTEGRATION = <integration_name> } | { CREDENTIALS = ( {  { AWS_KEY_ID = '<string>' AWS_SECRET_KEY = '<string>' [ AWS_TOKEN = '<string>' ] } } ) } ]
  [ ENCRYPTION = ( [ TYPE = 'AWS_CSE' ] [ MASTER_KEY = '<string>' ] |
                   [ TYPE = 'AWS_SSE_S3' ] |
                   [ TYPE = 'AWS_SSE_KMS' [ KMS_KEY_ID = '<string>' ] |
                   [ TYPE = NONE ] ) ]
externalLocation (for Google Cloud Storage) ::=
  'gcs://<bucket>[/<path>]'
  [ STORAGE_INTEGRATION = <integration_name> ]
  [ ENCRYPTION = ( [ TYPE = 'GCS_SSE_KMS' ] [ KMS_KEY_ID = '<string>' ] | [ TYPE = NONE ] ) ]
externalLocation (for Microsoft Azure) ::=
  'azure://<account>.blob.core.windows.net/<container>[/<path>]'
  [ { STORAGE_INTEGRATION = <integration_name> } | { CREDENTIALS = ( [ AZURE_SAS_TOKEN = <string> ] ) } ]
  [ ENCRYPTION = ( [ TYPE = { 'AZURE_CSE' | NONE } ] [ MASTER_KEY = '<string>' ] ) ]
formatTypeOptions ::=
-- If FILE_FORMAT = ( TYPE = CSV ... )
     COMPRESSION = AUTO | GZIP | BZ2 | BROTLI | ZSTD | DEFLATE | RAW_DEFLATE | NONE
     RECORD_DELIMITER = '<character>' | NONE
     FIELD_DELIMITER = '<character>' | NONE
     FILE_EXTENSION = '<string>'
     ESCAPE = '<character>' | NONE
     ESCAPE_UNENCLOSED_FIELD = '<character>' | NONE
     DATE_FORMAT = '<string>' | AUTO
     TIME_FORMAT = '<string>' | AUTO
     TIMESTAMP_FORMAT = '<string>' | AUTO
     BINARY_FORMAT = HEX | BASE64 | UTF8
     FIELD_OPTIONALLY_ENCLOSED_BY = '<character>' | NONE
     NULL_IF = ( '<string1>' [ , '<string2>' , ... ] )
     EMPTY_FIELD_AS_NULL = TRUE | FALSE
-- If FILE_FORMAT = ( TYPE = JSON ... )
     COMPRESSION = AUTO | GZIP | BZ2 | DEFLATE | RAW_DEFLATE | NONE
     FILE_EXTENSION = '<string>'
-- If FILE_FORMAT = ( TYPE = PARQUET ... )
     COMPRESSION = AUTO | SNAPPY | NONE
     SNAPPY_COMPRESSION = TRUE | FALSE
copyOptions ::=
     OVERWRITE = TRUE | FALSE
     SINGLE = TRUE | FALSE
     MAX_FILE_SIZE = <num>
     INCLUDE_QUERY_ID = TRUE | FALSE
     DETAILED_OUTPUT = TRUE | FALSE

Required Parameters

INTO ...

Specifies the internal or external location where the data files are unloaded:

@[namespace.]int_stage_name[/path]

Files are unloaded to the specified named internal stage.

@[namespace.]ext_stage_name[/path]

Files are unloaded to the specified named external stage.

@[namespace.]%table_name[/path]

Files are unloaded to the stage for the specified table.

@~[/path]

Files are unloaded to the stage for the current user.

's3://bucket[/path]'

Files are unloaded to the specified external location (S3 bucket). Additional parameters could be required. For details, see Additional Cloud Provider Parameters (in this topic). . . Note that S3 buckets in non-public AWS regions, such as government regions, are not supported.

'azure://account.blob.core.windows.net/container[/path]'

Files are unloaded to the specified external location (Azure container). Additional parameters could be required. For details, see Additional Cloud Provider Parameters (in this topic).

Where:

  • namespace is the database and/or schema in which the internal or external stage resides, in the form of database_name.schema_name or schema_name. It is optional if a database and schema are currently in use within the user session; otherwise, it is required.

  • The optional path parameter specifies a folder and filename prefix for the file(s) containing unloaded data. If a filename prefix is not included in path, the filenames for the generated data files are prefixed with data_.

    Relative path modifiers such as /./ and /../ are interpreted literally, because “paths” are literal prefixes for a name. For example:

    -- S3 bucket
    COPY INTO 's3://mybucket/./../a.csv' FROM mytable;
    
    -- Azure container
    COPY INTO 'azure://myaccount.blob.core.windows.net/mycontainer/./../a.csv' FROM mytable;
    

    In these COPY statements, Snowflake creates a file that is literally named ./../a.csv in the storage location.

Note

If the internal or external stage or path name includes special characters, including spaces, enclose the INTO ... string in single quotes.

FROM ...

Specifies the source of the data to be unloaded, which can either be a table or a query:

[namespace.]table_name

Specifies the name of the table from which data is unloaded.

Namespace optionally specifies the database and/or schema in which the table resides, in the form of database_name.schema_name or schema_name. It is optional if a database and schema are currently in use within the user session; otherwise, it is required.

( query )

SELECT statement that returns data to be unloaded into files. You can limit the number of rows returned by specifying a LIMIT / FETCH clause in the query.

Note

When casting column values to a data type using the CAST , :: function, verify the data type supports all of the column values. Values too long for the specified data type could be truncated.

Additional Cloud Provider Parameters

STORAGE_INTEGRATION = integration_name or . CREDENTIALS = ( cloud_specific_credentials )

For use in ad hoc COPY statements (statements that do not reference a named external stage). Specifies the security credentials for connecting to the cloud provider and accessing the private storage container where the unloaded files are staged.

Required only for unloading into an external private cloud storage location; not required for public buckets/containers

Amazon S3

STORAGE_INTEGRATION = integration_name

Specifies the name of the storage integration used to delegate authentication responsibility for external cloud storage to a Snowflake identity and access management (IAM) entity. For more details, see CREATE STORAGE INTEGRATION.

Note

We highly recommend the use of storage integrations. This option avoids the need to supply cloud storage credentials using the CREDENTIALS parameter when creating stages or loading data.

CREDENTIALS = ( AWS_KEY_ID = 'string' AWS_SECRET_KEY = 'string' [ AWS_TOKEN = 'string' ] ) or . CREDENTIALS = ( AWS_ROLE = 'string' )

Specifies the security credentials for connecting to AWS and accessing the private S3 bucket where the unloaded files are staged. For more information, see Configuring Secure Access to Amazon S3.

The credentials you specify depend on whether you associated the Snowflake access permissions for the bucket with an AWS IAM (Identity & Access Management) user or role:

  • IAM user: Temporary IAM credentials are required. Temporary (aka “scoped”) credentials are generated by AWS Security Token Service (STS) and consist of three components:

    • AWS_KEY_ID

    • AWS_SECRET_KEY

    • AWS_TOKEN

    All three are required to access a private bucket. After a designated period of time, temporary credentials expire and can no longer be used. You must then generate a new set of valid temporary credentials.

    Important

    COPY commands contain complex syntax and sensitive information, such as credentials. In addition, they are executed frequently and are often stored in scripts or worksheets, which could lead to sensitive information being inadvertently exposed. The COPY command allows permanent (aka “long-term”) credentials to be used; however, for security reasons, do not use permanent credentials in COPY commands. Instead, use temporary credentials.

    If you must use permanent credentials, use external stages, for which credentials are entered once and securely stored, minimizing the potential for exposure.

  • IAM role: Omit the security credentials and access keys and, instead, identify the role using AWS_ROLE and specify the AWS role ARN (Amazon Resource Name).

    Important

    The ability to use an AWS IAM role to access a private S3 bucket to load or unload data is now deprecated (i.e. support will be removed in a future release, TBD). We highly recommend modifying any existing S3 stages that use this feature to instead reference storage integration objects. For instructions, see Option 1: Configuring a Snowflake Storage Integration to Access Amazon S3.

Google Cloud Storage

STORAGE_INTEGRATION = integration_name

Specifies the name of the storage integration used to delegate authentication responsibility for external cloud storage to a Snowflake identity and access management (IAM) entity. For more details, see CREATE STORAGE INTEGRATION.

Microsoft Azure

STORAGE_INTEGRATION = integration_name

Specifies the name of the storage integration used to delegate authentication responsibility for external cloud storage to a Snowflake identity and access management (IAM) entity. For more details, see CREATE STORAGE INTEGRATION.

Note

We highly recommend the use of storage integrations. This option avoids the need to supply cloud storage credentials using the CREDENTIALS parameter when creating stages or loading data.

CREDENTIALS = ( AZURE_SAS_TOKEN = 'string' )

Specifies the SAS (shared access signature) token for connecting to Azure and accessing the private container where the files containing data are staged. Credentials are generated by Azure.

ENCRYPTION = ( cloud_specific_encryption )

For use in ad hoc COPY statements (statements that do not reference a named external stage). Required only for unloading data to files in encrypted storage locations

Amazon S3

ENCRYPTION = ( [ TYPE = 'AWS_CSE' ] [ MASTER_KEY = '<string>' ] | [ TYPE = 'AWS_SSE_S3' ] | [ TYPE = 'AWS_SSE_KMS' [ KMS_KEY_ID = '<string>' ] | [ TYPE = NONE ] )

TYPE = ...

Specifies the encryption type used. Possible values are:

  • AWS_CSE: Client-side encryption (requires a MASTER_KEY value). Currently, the client-side master key you provide can only be a symmetric key. Note that, when a MASTER_KEY value is provided, Snowflake assumes TYPE = AWS_CSE (i.e. when a MASTER_KEY value is provided, TYPE is not required).

  • AWS_SSE_S3: Server-side encryption that requires no additional encryption settings.

  • AWS_SSE_KMS: Server-side encryption that accepts an optional KMS_KEY_ID value.

For more information about the encryption types, see the AWS documentation for client-side encryption or server-side encryption.

MASTER_KEY = 'string' (applies to AWS_CSE encryption only)

Specifies the client-side master key used to encrypt the files in the bucket. The master key must be a 128-bit or 256-bit key in Base64-encoded form.

KMS_KEY_ID = 'string' (applies to AWS_SSE_KMS encryption only)

Optionally specifies the ID for the AWS KMS-managed key used to encrypt files unloaded into the bucket. If no value is provided, your default KMS key ID is used to encrypt files on unload.

Note that this value is ignored for data loading.

Google Cloud Storage

ENCRYPTION = ( [ TYPE = 'GCS_SSE_KMS' ] [ KMS_KEY_ID = '<string>' ] | [ TYPE = NONE ] )

TYPE = ...

Specifies the encryption type used. Possible values are:

  • GCS_SSE_KMS: Server-side encryption that accepts an optional KMS_KEY_ID value.

For more information, see the Google Cloud Platform documentation:

KMS_KEY_ID = 'string' (applies to GCS_SSE_KMS encryption only)

Optionally specifies the ID for the Cloud KMS-managed key that is used to encrypt files unloaded into the bucket. If no value is provided, your default KMS key ID set on the bucket is used to encrypt files on unload.

Note that this value is ignored for data loading. The load operation should succeed if the service account has sufficient permissions to decrypt data in the bucket.

Microsoft Azure

ENCRYPTION = ( [ TYPE = 'AZURE_CSE' | NONE ] [ MASTER_KEY = 'string' ] )

TYPE = ...

Specifies the encryption type used. Possible values are:

  • AZURE_CSE: Client-side encryption (requires a MASTER_KEY value). For information, see the Client-side encryption information in the Microsoft Azure documentation.

  • NONE: No encryption.

MASTER_KEY = 'string' (applies to AZURE_CSE encryption only)

Specifies the client-side master key used to encrypt files. The master key must be a 128-bit or 256-bit key in Base64-encoded form.

Optional Parameters

PARTITION BY expr

Specifies an expression used to partition the unloaded table rows into separate files. Supports any SQL expression that evaluates to a string.

The unload operation splits the table rows based on the partition expression and determines the number of files to create based on the amount of data and number of parallel operations, distributed among the compute resources in the warehouse.

Filenames include the partition column values. Individual filenames in each partition are identified with a universally unique identifier (UUID). The UUID is the query ID of the COPY statement used to unload the data files.

Caution

COPY INTO <location> statements write partition column values to the unloaded file names. We strongly recommend partitioning your data on common data types such as dates or timestamps rather than potentially sensitive string or integer values.

Note that file URLs are included in the internal logs that Snowflake maintains to aid in debugging issues when customers create Support cases. As a result, data in columns referenced in a PARTITION BY expression is also indirectly stored in internal logs. These logs might be processed outside of your deployment region. Hence, as a best practice, only include dates, timestamps, and Boolean data types in PARTITION BY expressions.

If you prefer to disable the PARTITION BY parameter in COPY INTO <location> statements for your account, please contact Snowflake Support.

Note that Snowflake provides a set of parameters to further restrict data unloading operations:

  • PREVENT_UNLOAD_TO_INLINE_URL prevents ad hoc data unload operations to external cloud storage locations (i.e. COPY INTO <location> statements that specify the cloud storage URL and access settings directly in the statement).

  • PREVENT_UNLOAD_TO_INTERNAL_STAGES prevents data unload operations to any internal stage, including user stages, table stages, or named internal stages.

For an example, see Partitioning Unloaded Rows to Parquet Files (in this topic).

Note

  • The following copy options are not supported in combination with PARTITION BY:

    • OVERWRITE

    • SINGLE

    • INCLUDE_QUERY_ID

  • If the PARTITION BY expression evaluates to NULL, the partition path in the output filename is _NULL_ (e.g. mystage/_NULL_/data_01234567-0123-1234-0000-000000001234_01_0_0.snappy.parquet).

  • When unloading to files of type PARQUET:

    • Small data files unloaded by parallel execution threads are merged automatically into a single file that matches the MAX_FILE_SIZE copy option value as closely as possible.

    • All row groups are 128 MB in size. A row group is a logical horizontal partitioning of the data into rows. There is no physical structure that is guaranteed for a row group. A row group consists of a column chunk for each column in the dataset.

    • The unload operation attempts to produce files as close in size to the MAX_FILE_SIZE copy option setting as possible. The default value for this copy option is 16 MB. Note that this behavior applies only when unloading data to Parquet files.

  • There is no option to omit the columns in the partition expression from the unloaded data files.

FILE_FORMAT = ( FORMAT_NAME = 'file_format_name' ) or . FILE_FORMAT = ( TYPE = CSV | JSON | PARQUET [ ... ] )

Specifies the format of the data files containing unloaded data:

FORMAT_NAME = 'file_format_name'

Specifies an existing named file format to use for unloading data from the table. The named file format determines the format type (CSV, JSON, PARQUET), as well as any other format options, for the data files. For more information, see CREATE FILE FORMAT.

TYPE = CSV | JSON | PARQUET [ ... ]

Specifies the type of files unloaded from the table.

If a format type is specified, additional format-specific options can be specified. For more details, see Format Type Options (in this topic).

Note

  • JSON can only be used to unload data from columns of type VARIANT (i.e. columns containing JSON data).

  • Currently, nested data in VARIANT columns cannot be unloaded successfully in Parquet format.

copyOptions

Specifies one or more copy options for the unloaded data. For more details, see Copy Options (in this topic).

HEADER = TRUE | FALSE

Specifies whether to include the table column headings in the output files.

  • Set this option to TRUE to include the table column headings to the output files.

    Note that if the COPY operation unloads the data to multiple files, the column headings are included in every file.

    When unloading data in Parquet format, the table column names are retained in the output files.

  • Set this option to FALSE to specify the following behavior:

    CSV

    Do not include table column headings in the output files.

    Parquet

    Include generic column headings (e.g. col1, col2, etc.) in the output files.

Default: FALSE

Format Type Options (formatTypeOptions)

Depending on the file format type specified (FILE_FORMAT = ( TYPE = ... )), you can include one or more of the following format-specific options (separated by blank spaces, commas, or new lines):

TYPE = CSV

COMPRESSION = AUTO | GZIP | BZ2 | BROTLI | ZSTD | DEFLATE | RAW_DEFLATE | NONE

String (constant) that specifies to compresses the unloaded data files using the specified compression algorithm.

Supported Values

Notes

AUTO

Unloaded files are automatically compressed using the default, which is gzip.

GZIP

BZ2

BROTLI

Must be specified when loading Brotli-compressed files.

ZSTD

Zstandard v0.8 (and higher) supported.

DEFLATE

Unloaded files are compressed using Deflate (with zlib header, RFC1950).

RAW_DEFLATE

Unloaded files are compressed using Raw Deflate (without header, RFC1951).

NONE

Unloaded files are not compressed.

Default: AUTO

RECORD_DELIMITER = 'character' | NONE

One or more singlebyte or multibyte characters that separate records in an unloaded file. Accepts common escape sequences or the following singlebyte or multibyte characters:

Singlebyte characters

Octal values (prefixed by \\) or hex values (prefixed by 0x or \x). For example, for records delimited by the circumflex accent (^) character, specify the octal (\\5e) or hex (0x5e) value.

Multibyte characters

Hex values (prefixed by \x). For example, for records delimited by the cent (¢) character, specify the hex (\xC2\xA2) value.

The delimiter for RECORD_DELIMITER or FIELD_DELIMITER cannot be a substring of the delimiter for the other file format option (e.g. FIELD_DELIMITER = 'aa' RECORD_DELIMITER = 'aabb').

The specified delimiter must be a valid UTF-8 character and not a random sequence of bytes. Also note that the delimiter is limited to a maximum of 20 characters.

Also accepts a value of NONE.

Default: New line character. Note that “new line” is logical such that \r\n is understood as a new line for files on a Windows platform.

FIELD_DELIMITER = 'character' | NONE

One or more singlebyte or multibyte characters that separate fields in an unloaded file. Accepts common escape sequences or the following singlebyte or multibyte characters:

Singlebyte characters

Octal values (prefixed by \\) or hex values (prefixed by 0x or \x). For example, for records delimited by the circumflex accent (^) character, specify the octal (\\5e) or hex (0x5e) value.

Multibyte characters

Hex values (prefixed by \x). For example, for records delimited by the cent (¢) character, specify the hex (\xC2\xA2) value.

The delimiter for RECORD_DELIMITER or FIELD_DELIMITER cannot be a substring of the delimiter for the other file format option (e.g. FIELD_DELIMITER = 'aa' RECORD_DELIMITER = 'aabb').

The specified delimiter must be a valid UTF-8 character and not a random sequence of bytes. Also note that the delimiter is limited to a maximum of 20 characters.

Also accepts a value of NONE.

Default: comma (,)

FILE_EXTENSION = 'string' | NONE

String that specifies the extension for files unloaded to a stage. Accepts any extension. The user is responsible for specifying a valid file extension that can be read by the desired software or service.

Note

If the SINGLE copy option is TRUE, then the COPY command unloads a file without a file extension by default. To specify a file extension, provide a file name and extension in the internal_location or external_location path. For example:

copy into @stage/data.csv ...

Default: null, meaning the file extension is determined by the format type, e.g. .csv[compression], where compression is the extension added by the compression method, if COMPRESSION is set.

DATE_FORMAT = 'string' | AUTO

String that defines the format of date values in the unloaded data files. If a value is not specified or is set to AUTO, the value for the DATE_OUTPUT_FORMAT parameter is used.

Default: AUTO

TIME_FORMAT = 'string' | AUTO

String that defines the format of time values in the unloaded data files. If a value is not specified or is set to AUTO, the value for the TIME_OUTPUT_FORMAT parameter is used.

Default: AUTO

TIMESTAMP_FORMAT = 'string' | AUTO

String that defines the format of timestamp values in the unloaded data files. If a value is not specified or is set to AUTO, the value for the TIMESTAMP_OUTPUT_FORMAT parameter is used.

Default: AUTO

BINARY_FORMAT = HEX | BASE64 | UTF8

String (constant) that defines the encoding format for binary output. The option can be used when unloading data from binary columns in a table.

Default: HEX

ESCAPE = 'character' | NONE
Use

Data loading and unloading

Definition

Single character string used as the escape character for enclosed or unenclosed field values. An escape character invokes an alternative interpretation on subsequent characters in a character sequence. You can use the ESCAPE character to interpret instances of the FIELD_OPTIONALLY_ENCLOSED_BY character in the data as literals. The escape character can also be used to escape instances of itself in the data.

Accepts common escape sequences, octal values, or hex values.

Specify the character used to enclose fields by setting FIELD_OPTIONALLY_ENCLOSED_BY.

If this option is set, it overrides the escape character set for ESCAPE_UNENCLOSED_FIELD.

Default

NONE

ESCAPE_UNENCLOSED_FIELD = 'character' | NONE
Use

Data loading and unloading

Definition

Single character string used as the escape character for unenclosed field values only. An escape character invokes an alternative interpretation on subsequent characters in a character sequence. You can use the ESCAPE character to interpret instances of the FIELD_DELIMITER or RECORD_DELIMITER characters in the data as literals. The escape character can also be used to escape instances of itself in the data.

Accepts common escape sequences, octal values, or hex values.

If ESCAPE is set, the escape character set for that file format option overrides this option.

Default

backslash (\\)

FIELD_OPTIONALLY_ENCLOSED_BY = 'character' | NONE

Character used to enclose strings. Value can be NONE, single quote character ('), or double quote character ("). To use the single quote character, use the octal or hex representation (0x27) or the double single-quoted escape ('').

When a field contains this character, escape it using the same character. For example, if the value is the double quote character and a field contains the string A "B" C, escape the double quotes as follows:

A ""B"" C

Default: NONE

NULL_IF = ( 'string1' [ , 'string2' ... ] )

String used to convert from SQL NULL. Snowflake converts SQL NULL values to the first value in the list.

Default: \\N (i.e. NULL, assuming ESCAPE_UNENCLOSED_FIELD=\\)

EMPTY_FIELD_AS_NULL = TRUE | FALSE

Used in combination with FIELD_OPTIONALLY_ENCLOSED_BY. When FIELD_OPTIONALLY_ENCLOSED_BY = NONE, setting EMPTY_FIELD_AS_NULL = FALSE specifies to unload empty strings in tables to empty string values without quotes enclosing the field values.

If set to TRUE, FIELD_OPTIONALLY_ENCLOSED_BY must specify a character to enclose strings.

Default: TRUE

TYPE = JSON

COMPRESSION = AUTO | GZIP | BZ2 | BROTLI | ZSTD | DEFLATE | RAW_DEFLATE | NONE

String (constant). Compresses the data file using the specified compression algorithm.

Supported Values

Notes

AUTO

Unloaded files are automatically compressed using the default, which is gzip.

GZIP

BZ2

BROTLI

ZSTD

DEFLATE

Unloaded files are compressed using Deflate (with zlib header, RFC1950).

RAW_DEFLATE

Unloaded files are compressed using Raw Deflate (without header, RFC1951).

NONE

Unloaded files are not compressed.

Default: AUTO

FILE_EXTENSION = 'string' | NONE

String that specifies the extension for files unloaded to a stage. Accepts any extension. The user is responsible for specifying a valid file extension that can be read by the desired software or service.

Default: null, meaning the file extension is determined by the format type (e.g. .csv[compression]), where compression is the extension added by the compression method, if COMPRESSION is set.

TYPE = PARQUET

COMPRESSION = AUTO | LZO | SNAPPY | NONE

String (constant). Compresses the data file using the specified compression algorithm.

Supported Values

Notes

AUTO

Files are compressed using Snappy, the default compression algorithm.

LZO

Files are compressed using the Snappy algorithm by default. If applying Lempel–Ziv–Oberhumer (LZO) compression instead, specify this value.

SNAPPY

Files are compressed using the Snappy algorithm by default. You can optionally specify this value.

NONE

Specifies that the unloaded files are not compressed.

Default: AUTO

SNAPPY_COMPRESSION = TRUE | FALSE

Boolean that specifies whether the unloaded file(s) are compressed using the SNAPPY algorithm.

Note

Deprecated. Use COMPRESSION = SNAPPY instead.

Default: TRUE

Copy Options (copyOptions)

You can specify one or more of the following copy options (separated by blank spaces, commas, or new lines):

OVERWRITE = TRUE | FALSE
Definition

Boolean that specifies whether the COPY command overwrites existing files with matching names, if any, in the location where files are stored. The option does not remove any existing files that do not match the names of the files that the COPY command unloads.

In many cases, enabling this option helps prevent data duplication in the target stage when the same COPY INTO <location> statement is executed multiple times. However, when an unload operation writes multiple files to a stage, Snowflake appends a suffix that ensures each file name is unique across parallel execution threads (e.g. data_0_1_0). The number of parallel execution threads can vary between unload operations. If the files written by an unload operation do not have the same filenames as files written by a previous operation, SQL statements that include this copy option cannot replace the existing files, resulting in duplicate files.

In addition, in the rare event of a machine or network failure, the unload job is retried. In that scenario, the unload operation writes additional files to the stage without first removing any files that were previously written by the first attempt.

To avoid data duplication in the target stage, we recommend setting the INCLUDE_QUERY_ID = TRUE copy option instead of OVERWRITE = TRUE and removing all data files in the target stage and path (or using a different path for each unload operation) between each unload job.

Default

FALSE

SINGLE = TRUE | FALSE
Definition

Boolean that specifies whether to generate a single file or multiple files. If FALSE, a filename prefix must be included in path.

Important

If SINGLE = TRUE, then COPY ignores the FILE_EXTENSION file format option and outputs a file simply named data. To specify a file extension, provide a filename and extension in the internal or external location path. For example:

COPY INTO @mystage/data.csv ...

In addition, if the COMPRESSION file format option is also explicitly set to one of the supported compression algorithms (e.g. GZIP), then the specified internal or external location path must end in a filename with the corresponding file extension (e.g. gz) so that the file can be uncompressed using the appropriate tool. For example:

COPY INTO @mystage/data.gz ...

COPY INTO @mystage/data.csv.gz ...
Default

FALSE

MAX_FILE_SIZE = num
Definition

Number (> 0) that specifies the upper size limit (in bytes) of each file to be generated in parallel per thread. Note that the actual file size and number of files unloaded are determined by the total amount of data and number of nodes available for parallel processing.

Snowflake utilizes parallel execution to optimize performance. The number of threads cannot be modified.

Maximum: 5 GB (Amazon S3 , Google Cloud Storage, or Microsoft Azure stage)

Note

The COPY command unloads one set of table rows at a time. If you set a very small MAX_FILE_SIZE value, the amount of data in a set of rows could exceed the specified size.

Default

16777216 (16 MB)

INCLUDE_QUERY_ID = TRUE | FALSE
Definition

Boolean that specifies whether to uniquely identify unloaded files by including a universally unique identifier (UUID) in the filenames of unloaded data files. This option helps ensure that concurrent COPY statements do not overwrite unloaded files accidentally.

Values

If TRUE, a UUID is added to the names of unloaded files. The UUID is the query ID of the COPY statement used to unload the data files. The UUID is a segment of the filename: <path>/data_<uuid>_<name>.<extension>.

If FALSE, then a UUID is not added to the unloaded data files.

Note

  • INCLUDE_QUERY_ID = TRUE is not supported when any one of the following copy options is set:

    • SINGLE = TRUE

    • OVERWRITE = TRUE

    • PARTITION BY <expr>

  • In the rare event of a machine or network failure, the unload job is retried. In that scenario, the unload operation removes any files that were written to the stage with the UUID of the current query ID and then attempts to unload the data again. Any new files written to the stage have the retried query ID as the UUID.

Default

FALSE

DETAILED_OUTPUT = TRUE | FALSE
Definition

Boolean that specifies whether the command output should describe the unload operation or the individual files unloaded as a result of the operation.

Values
  • If TRUE, the command output includes a row for each file unloaded to the specified stage. Columns show the path and name for each file, its size, and the number of rows that were unloaded to the file.

  • If FALSE, the command output consists of a single row that describes the entire unload operation. Columns show the total amount of data unloaded from tables, before and after compression (if applicable), and the total number of rows that were unloaded.

Default

FALSE

Usage Notes

  • STORAGE_INTEGRATION or CREDENTIALS only applies if you are unloading directly into a private storage location (Amazon S3, Google Cloud Storage, or Microsoft Azure). If you are unloading into a public bucket, secure access is not required, and if you are unloading into a named external stage, the stage provides all the credential information required for accessing the bucket.

  • If referencing a file format in the current namespace, you can omit the single quotes around the format identifier.

  • JSON can be specified for TYPE only when unloading data from VARIANT columns in tables.

  • When unloading to files of type PARQUET:

    • VARIANT columns are converted into simple JSON strings in the output file.

    • Unloading TIMESTAMP_TZ or TIMESTAMP_LTZ data produces an error.

  • If the source table contains 0 rows, then the COPY operation does not unload a data file.

  • This SQL command does not return a warning when unloading into a non-empty storage location. To avoid unexpected behaviors when files in a storage location are consumed by data pipelines, we recommend only writing to empty storage locations.

  • A failed unload operation can still result in unloaded data files; for example, if the statement exceeds its timeout limit and is canceled. Also, a failed unload operation to cloud storage in a different region results in data transfer costs.

  • If a Column-level Security masking policy is set on a column, the masking policy is applied to the data resulting in unauthorized users seeing masked data in the column.

Examples

Unloading Data from a Table to Files in a Table Stage

Unload data from the orderstiny table into the table’s stage using a folder/filename prefix (result/data_), a named file format (myformat), and gzip compression:

COPY INTO @%orderstiny/result/data_
  FROM orderstiny FILE_FORMAT = (FORMAT_NAME ='myformat' COMPRESSION='GZIP');

Unloading Data from a Query to Files in a Named Internal Stage

Unload the result of a query into a named internal stage (my_stage) using a folder/filename prefix (result/data_), a named file format (myformat), and gzip compression:

COPY INTO @my_stage/result/data_ FROM (SELECT * FROM orderstiny)
   file_format=(format_name='myformat' compression='gzip');

Note that the above example is functionally equivalent to the first example, except the file containing the unloaded data is stored in the stage location for my_stage rather than the table location for orderstiny.

Unloading Data from a Table Directly to Files in an External Location

Unload all data in a table into a storage location using a named my_csv_format file format:

Amazon S3

Access the referenced S3 bucket using a referenced storage integration named myint:

COPY INTO 's3://mybucket/unload/'
  FROM mytable
  STORAGE_INTEGRATION = myint
  FILE_FORMAT = (FORMAT_NAME = my_csv_format);

Access the referenced S3 bucket using supplied credentials:

COPY INTO 's3://mybucket/unload/'
  FROM mytable
  CREDENTIALS = (AWS_KEY_ID='xxxx' AWS_SECRET_KEY='xxxxx' AWS_TOKEN='xxxxxx')
  FILE_FORMAT = (FORMAT_NAME = my_csv_format);

Google Cloud Storage

Access the referenced GCS bucket using a referenced storage integration named myint:

COPY INTO 'gcs://mybucket/unload/'
  FROM mytable
  STORAGE_INTEGRATION = myint
  FILE_FORMAT = (FORMAT_NAME = my_csv_format);

Microsoft Azure

Access the referenced container using a referenced storage integration named myint:

COPY INTO 'azure://myaccount.blob.core.windows.net/unload/'
  FROM mytable
  STORAGE_INTEGRATION = myint
  FILE_FORMAT = (FORMAT_NAME = my_csv_format);

Access the referenced container using supplied credentials:

COPY INTO 'azure://myaccount.blob.core.windows.net/mycontainer/unload/'
  FROM mytable
  CREDENTIALS=(AZURE_SAS_TOKEN='xxxx')
  FILE_FORMAT = (FORMAT_NAME = my_csv_format);

Partitioning Unloaded Rows to Parquet Files

The following example partitions unloaded rows into Parquet files by the values in two columns: a date column and a time column. The example specifies a maximum size for each unloaded file:

CREATE or replace TABLE t1 (
  dt date,
  ts time
  )
AS
  SELECT TO_DATE($1)
        ,TO_TIME($2)
    FROM VALUES
           ('2020-01-28', '18:05')
          ,('2020-01-28', '22:57')
          ,('2020-01-28', NULL)
          ,('2020-01-29', '02:15')
;

SELECT * FROM t1;

+------------+----------+
| DT         | TS       |
|------------+----------|
| 2020-01-28 | 18:05:00 |
| 2020-01-28 | 22:57:00 |
| 2020-01-28 | 22:32:00 |
| 2020-01-29 | 02:15:00 |
+------------+----------+

-- Partition the unloaded data by date and hour. Set ``32000000`` (32 MB) as the upper size limit of each file to be generated in parallel per thread.
COPY INTO @%t1
  FROM t1
  PARTITION BY ('date=' || to_varchar(dt, 'YYYY-MM-DD') || '/hour=' || to_varchar(date_part(hour, ts))) -- Concatenate labels and column values to output meaningful filenames
  FILE_FORMAT = (TYPE=parquet)
  MAX_FILE_SIZE = 32000000
  HEADER=true;

LIST @%t1;

+------------------------------------------------------------------------------------------+------+----------------------------------+------------------------------+
| name                                                                                     | size | md5                              | last_modified                |
|------------------------------------------------------------------------------------------+------+----------------------------------+------------------------------|
| __NULL__/data_019c059d-0502-d90c-0000-438300ad6596_006_4_0.snappy.parquet                |  512 | 1c9cb460d59903005ee0758d42511669 | Wed, 5 Aug 2020 16:58:16 GMT |
| date=2020-01-28/hour=18/data_019c059d-0502-d90c-0000-438300ad6596_006_4_0.snappy.parquet |  592 | d3c6985ebb36df1f693b52c4a3241cc4 | Wed, 5 Aug 2020 16:58:16 GMT |
| date=2020-01-28/hour=22/data_019c059d-0502-d90c-0000-438300ad6596_006_6_0.snappy.parquet |  592 | a7ea4dc1a8d189aabf1768ed006f7fb4 | Wed, 5 Aug 2020 16:58:16 GMT |
| date=2020-01-29/hour=2/data_019c059d-0502-d90c-0000-438300ad6596_006_0_0.snappy.parquet  |  592 | 2d40ccbb0d8224991a16195e2e7e5a95 | Wed, 5 Aug 2020 16:58:16 GMT |
+------------------------------------------------------------------------------------------+------+----------------------------------+------------------------------+

Retaining NULL/Empty Field Data in Unloaded Files

Retain SQL NULL and empty fields in unloaded files:

-- View the table column values

select * from HOME_SALES;

+------------+-------+-------+-------------+--------+-----------+
| CITY       | STATE | ZIP   | TYPE        | PRICE  | SALE_DATE |
|------------+-------+-------+-------------+--------+-----------|
| Lexington  | MA    | 95815 | Residential | 268880 | 3/28/17   |
| Belmont    | MA    | 95815 | Residential |        | 2/21/17   |
| Winchester | MA    | NULL  | Residential |        | 1/31/17   |
+------------+-------+-------+-------------+--------+-----------+

-- Unload the table data into the current user's personal stage. The file format options retain both the NULL value and the empty values in the output file

copy into @~ from HOME_SALES
file_format=(type=csv null_if = ('NULL', 'null')
empty_field_as_null=false);

-- Contents of the output file

95815,MA-Lexington,268880,3/28/17
95815,MA-Belmont,,2/21/17
NULL,MA-Winchester,389921,1/31/17

Unloading Data to a Single File

Unload all rows to a single data file using the SINGLE copy option:

copy into @~ from HOME_SALES
single = true;

Including the UUID in the Unloaded Filenames

Include the UUID in the names of unloaded files by setting the INCLUDE_QUERY_ID copy option to TRUE:

-- Unload rows from the T1 table into the T1 table stage:
COPY INTO @%t1
  FROM t1
  FILE_FORMAT=(TYPE=parquet)
  INCLUDE_QUERY_ID=true;

-- Retrieve the query ID for the COPY INTO location statement.
-- This optional step enables you to see that the query ID for the COPY INTO location statement
-- is identical to the UUID in the unloaded files.
SELECT last_query_id();
+--------------------------------------+
| LAST_QUERY_ID()                      |
|--------------------------------------|
| 019260c2-00c0-f2f2-0000-4383001cf046 |
+--------------------------------------+

LS @%t1;
+----------------------------------------------------------------+------+----------------------------------+-------------------------------+
| name                                                           | size | md5                              | last_modified                 |
|----------------------------------------------------------------+------+----------------------------------+-------------------------------|
| data_019260c2-00c0-f2f2-0000-4383001cf046_0_0_0.snappy.parquet |  544 | eb2215ec3ccce61ffa3f5121918d602e | Thu, 20 Feb 2020 16:02:17 GMT |
+----------------------------------------------------------------+------+----------------------------------+-------------------------------+

Validating Data to be Unloaded (from a Query)

Execute COPY in validation mode to return the result of a query and view the data that will be unloaded from the orderstiny table if COPY is executed in normal mode:

COPY INTO @my_stage
FROM (SELECT * FROM orderstiny LIMIT 5)
VALIDATION_MODE='RETURN_ROWS';

----+--------+----+-----------+------------+----------+-----------------+----+---------------------------------------------------------------------------+
 C1 |   C2   | C3 |    C4     |     C5     |    C6    |       C7        | C8 |                                    C9                                     |
----+--------+----+-----------+------------+----------+-----------------+----+---------------------------------------------------------------------------+
 1  | 36901  | O  | 173665.47 | 1996-01-02 | 5-LOW    | Clerk#000000951 | 0  | nstructions sleep furiously among                                         |
 2  | 78002  | O  | 46929.18  | 1996-12-01 | 1-URGENT | Clerk#000000880 | 0  |  foxes. pending accounts at the pending\, silent asymptot                 |
 3  | 123314 | F  | 193846.25 | 1993-10-14 | 5-LOW    | Clerk#000000955 | 0  | sly final accounts boost. carefully regular ideas cajole carefully. depos |
 4  | 136777 | O  | 32151.78  | 1995-10-11 | 5-LOW    | Clerk#000000124 | 0  | sits. slyly regular warthogs cajole. regular\, regular theodolites acro   |
 5  | 44485  | F  | 144659.20 | 1994-07-30 | 5-LOW    | Clerk#000000925 | 0  | quickly. bold deposits sleep slyly. packages use slyly                    |
----+--------+----+-----------+------------+----------+-----------------+----+---------------------------------------------------------------------------+