Copying Data from a Google Cloud Storage Stage

Load data from your staged files into the target table.

In this Topic:

Loading Your Data

Execute COPY INTO <table> to load your data into the target table.

Note

Loading data requires a warehouse. If you are using a warehouse that is not configured to auto resume, execute ALTER WAREHOUSE to resume the warehouse. Note that starting the warehouse could take up to five minutes.

ALTER WAREHOUSE mywarehouse RESUME;

Important

The list of objects returned for an external stage may include one or more “directory blobs”; essentially, paths that end in a forward slash character (/), e.g.:

LIST @my_gcs_stage;

+---------------------------------------+------+----------------------------------+-------------------------------+
| name                                  | size | md5                              | last_modified                 |
|---------------------------------------+------+----------------------------------+-------------------------------|
| my_gcs_stage/load/                    |  12  | 12348f18bcb35e7b6b628ca12345678c | Mon, 11 Sep 2019 16:57:43 GMT |
| my_gcs_stage/load/data_0_0_0.csv.gz   |  147 | 9765daba007a643bdff4eae10d43218y | Mon, 11 Sep 2019 18:13:07 GMT |
+---------------------------------------+------+----------------------------------+-------------------------------+

These blobs are listed when directories are created in the Google Cloud Platform Console rather than using any other tool provided by Google.

COPY statements that reference a stage can fail when the object list includes directory blobs. To avoid errors, we recommend using file pattern matching to identify the files for inclusion (i.e. the PATTERN clause) when the file list for a stage includes directory blobs. For an example, see Loading Data Using Pattern Matching (in this topic). Alternatively, set ON_ERROR = SKIP_FILE in the COPY statement.

Loading Data Using Pattern Matching

The following example loads data from files in the named my_gcs_stage stage created in Configuring an Integration for Google Cloud Storage. Using pattern matching, the statement only loads files whose names start with the string sales:

COPY INTO mytable
  FROM @my_gcs_stage
  PATTERN='.*sales.*.csv';

Note that file format options are not specified because a named file format was included in the stage definition.

Loading Data Using a Path / Prefix

The following example loads all files with the data/files path (i.e. prefix) in your Cloud Storage bucket using the named my_csv_format file format created in Preparing to Load Data. Note that a path can be combined with pattern matching:

COPY INTO mytable
  FROM gcs://mybucket/data/files
  FILE_FORMAT = (FORMAT_NAME = my_csv_format);

Loading Data Using Ad hoc File Format Options

The following ad hoc example loads data from all files in the Cloud Storage bucket. The COPY command specifies file format options instead of referencing a named file format. This example loads CSV files with a pipe (|) field delimiter. The COPY command skips the first line in the data files:

COPY INTO mytable
  FROM gcs://mybucket/data/files
  STORAGE_INTEGRATION = myint
  FILE_FORMAT = (TYPE = CSV FIELD_DELIMITER = '|' SKIP_HEADER = 1);

Validating Your Data

Before loading your data, you can validate that the data in the uploaded files will load correctly.

To validate data in an uploaded file, execute COPY INTO <table> in validation mode using the VALIDATION_MODE parameter. The VALIDATION_MODE parameter returns errors that it encounters in the file. You can then modify the data in the file to ensure it loads without error.

In addition, COPY INTO <table> provides the ON_ERROR copy option to specify an action to perform if errors are encountered in a file during loading.

Monitoring Data Loads

Snowflake retains historical data for COPY INTO commands executed within the previous 14 days. The metadata can be used to monitor and manage the loading process, including deleting files after upload completes:

  • Monitor the status of each COPY INTO <table> command on the History History tab page of the Snowflake web interface.

  • Use the LOAD_HISTORY Information Schema view to retrieve the history of data loaded into tables using the COPY INTO command.