Overview of data loading

This topic provides an overview of the main options available to load data into Snowflake.

To easily and accurately measure the ingestion latency of your data pipelines, use row timestamps. For more information, see Use row timestamps to measure latency in your pipelines.

Supported file locations

Snowflake refers to the location of data files in cloud storage as a stage. The COPY INTO <table> command used for both bulk and continuous data loads (Snowpipe) supports cloud storage accounts managed by your business entity (external stages) as well as cloud storage contained in your Snowflake account (internal stages).

External stages

Loading data from any of the following cloud storage services is supported regardless of the cloud platform that hosts your Snowflake account:

  • Amazon S3

  • Google Cloud Storage

  • Microsoft Azure

You cannot access data held in archival cloud storage classes that require restoration before the data can be retrieved. Examples of these archival storage classes include Amazon S3 Glacier Flexible Retrieval, S3 Glacier Deep Archive, and Microsoft Azure Archive Storage.

Upload (i.e. stage) files to your cloud storage account using the tools provided by the cloud storage service.

A named external stage is a database object created in a schema. This object stores the URL to files in cloud storage, the settings used to access the cloud storage account, and convenience settings such as the options that describe the format of staged files. Create stages using the CREATE STAGE command.
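For example, a named external stage over an S3 bucket might be created as follows. This is a sketch: the bucket path, storage integration, and format options are placeholders for illustration.

```sql
-- Hypothetical names: the bucket, integration, and format settings are placeholders.
CREATE STAGE my_ext_stage
  URL = 's3://mybucket/load/files/'
  STORAGE_INTEGRATION = my_storage_integration
  FILE_FORMAT = (TYPE = CSV FIELD_OPTIONALLY_ENCLOSED_BY = '"');
```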

Note

Some data transfer billing charges may apply when loading data from files in a cloud storage service in a different region or cloud platform from your Snowflake account. For more information, see Understanding data transfer cost.

Internal stages

Snowflake maintains the following stage types in your account:

User:

A user stage is allocated to each user for storing files. This stage type is designed to store files that are staged and managed by a single user but can be loaded into multiple tables. User stages cannot be altered or dropped.

Table:

A table stage is available for each table created in Snowflake. This stage type is designed to store files that are staged and managed by one or more users but only loaded into a single table. Table stages cannot be altered or dropped.

Note that a table stage is not a separate database object; rather, it is an implicit stage tied to the table itself. A table stage has no grantable privileges of its own. To stage files to a table stage, list the files, query them on the stage, or drop them, you must be the table owner (have the role with the OWNERSHIP privilege on the table).

Named:

A named internal stage is a database object created in a schema. This stage type can store files that are staged and managed by one or more users and loaded into one or more tables. Because named stages are database objects, the ability to create, modify, use, or drop them can be controlled using security access control privileges. Create stages using the CREATE STAGE command.

Upload files to any of the internal stage types from your local file system using the PUT command.
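For example, the following statements sketch uploading a local file to each internal stage type. The file path, stage name, and table name are illustrative; note the @, @%, and @~ prefixes that distinguish named, table, and user stages.

```sql
-- Hypothetical names and paths for illustration.
CREATE STAGE my_int_stage;

-- Upload a local file to the named internal stage.
PUT file:///tmp/data/mydata.csv @my_int_stage;

-- Upload to the table stage for the table mytable (@% prefix).
PUT file:///tmp/data/mydata.csv @%mytable;

-- Upload to your user stage (@~ prefix).
PUT file:///tmp/data/mydata.csv @~;
```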

Bulk vs continuous loading

Snowflake provides the following main solutions for data loading. The best solution may depend upon the volume of data to load and the frequency of loading.

Bulk loading using the COPY command

This option enables loading batches of data from files already available in cloud storage, or copying (i.e. staging) data files from a local machine to an internal (i.e. Snowflake) cloud storage location before loading the data into tables using the COPY command.

Compute resources

Bulk loading relies on user-provided virtual warehouses, which are specified in the COPY statement. Users are required to size the warehouse appropriately to accommodate expected loads.

Simple transformations during a load

Snowflake supports transforming data while loading it into a table using the COPY command. Options include:

  • Column reordering

  • Column omission

  • Casts

  • Truncating text strings that exceed the target column length

There is no requirement for your data files to have the same number and ordering of columns as your target table.
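As a sketch, the following COPY statement reorders columns, omits file columns, casts a value, and truncates a string while loading. The stage, file, and table names are placeholders for illustration.

```sql
-- Hypothetical stage, file, and table names.
COPY INTO mytable (id, name)
  FROM (
    SELECT $2::NUMBER,        -- cast the second file column to NUMBER
           SUBSTR($1, 1, 50)  -- truncate the first file column to 50 characters
      FROM @my_int_stage/mydata.csv
  )
  FILE_FORMAT = (TYPE = CSV);
```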

Continuous loading using Snowpipe

This option is designed to load small volumes of data (i.e. micro-batches) and incrementally make them available for analysis. Snowpipe loads data within minutes after files are added to a stage and submitted for ingestion. This ensures users have the latest results, as soon as the raw data is available.

Compute resources

Snowpipe uses compute resources provided by Snowflake (i.e. a serverless compute model). These Snowflake-provided resources are automatically resized and scaled up or down as required, and are charged and itemized using per-second billing. Data ingestion is charged based upon the actual workloads.

Simple transformations during a load

The COPY statement in a pipe definition supports the same COPY transformation options as when bulk loading data.
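For example, a pipe definition wraps a COPY statement as follows. The names are placeholders, and AUTO_INGEST assumes event notifications have been configured for the external stage.

```sql
-- Hypothetical pipe over an external stage.
CREATE PIPE my_pipe
  AUTO_INGEST = TRUE
  AS
  COPY INTO mytable
    FROM @my_ext_stage
    FILE_FORMAT = (TYPE = JSON);
```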

In addition, data pipelines can leverage Snowpipe to continuously load micro-batches of data into staging tables for transformation and optimization using automated tasks and the change data capture (CDC) information in streams.

Continuous loading using Snowpipe Streaming

The Snowpipe Streaming API writes rows of data directly to Snowflake tables without the requirement of staging files. This architecture results in lower load latencies with corresponding lower costs for loading any volume of data, which makes it a powerful tool for handling near real-time data streams.

Snowpipe Streaming is also available for the Snowflake Connector for Kafka, which offers an easy upgrade path to take advantage of the lower latency and lower cost loads.

For more information, refer to Snowpipe Streaming.

Loading data from Apache Kafka topics

The Snowflake Connector for Kafka enables users to connect to an Apache Kafka server, read data from one or more topics, and load that data into Snowflake tables.

DML error logging

When you execute a set of DML statements and one of the statements fails with an error, the DML operation ends and the changes made by the DML statement are rolled back. If you want to continue to execute the rest of the DML statements and log the error that occurred, you can turn on DML error logging for the table. The table for which DML error logging is turned on is called the base table. Errors are logged in an error table that is associated with the base table.

DML error logging is turned on for a table only when both of the following conditions are met:

  • The ERROR_LOGGING property is set to TRUE for the table.

  • The OPT_OUT_ERROR_LOGGING parameter is set to FALSE for the current session.

DML error logging is turned off for a table when either of the following conditions is met:

  • The ERROR_LOGGING property is set to FALSE for the table.

  • The OPT_OUT_ERROR_LOGGING parameter is set to TRUE for the current session.
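Taken together, these conditions mean error logging can be controlled at both the table and the session level. For example (the table name is a placeholder):

```sql
-- Turn on error logging at the table level.
ALTER TABLE mytable SET ERROR_LOGGING = TRUE;

-- Opt the current session out of error logging, overriding the table property.
ALTER SESSION SET OPT_OUT_ERROR_LOGGING = TRUE;

-- Re-enable error logging for the session.
ALTER SESSION SET OPT_OUT_ERROR_LOGGING = FALSE;
```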

The following sections provide more information about DML error logging:

Use cases for DML error logging

You might use DML error logging to avoid failures on errors for the following use cases:

  • Migration of third-party data that relies on DML error logging, such as data from an Oracle database.

  • Enforcement of some table constraints, such as NOT NULL constraints, during data ingestion.

Configure DML error logging for a table

You can turn on or turn off DML error logging for a standard Snowflake table or a Snowflake-managed Iceberg table when you create or alter the table.

To turn on or turn off error logging for a table, set the ERROR_LOGGING property for the table by using the CREATE TABLE command (when you create the table) or the ALTER TABLE command (for an existing table).

The following examples configure DML error logging for tables and show how errors are logged in error tables:

Log errors when inserting rows directly

The following example logs errors when inserting rows directly into a table:

  1. Create a table and turn on DML error logging for it:

    CREATE TABLE test_dml_error_logging(
      n NUMBER(4, 0) NOT NULL,
      t VARCHAR(5)
      )
      ERROR_LOGGING = true;
    
  2. Run an INSERT statement that tries to insert several rows, including both valid and invalid values:

    INSERT INTO test_dml_error_logging
      VALUES
        ('invalid_cast', '1'),
        (10, 'valid'),
        (NULL, 'toolong');
    
    +-------------------------+
    | number of rows inserted |
    |-------------------------|
    |                       1 |
    +-------------------------+
    
  3. Query the table to confirm that one valid row was inserted:

    SELECT * FROM test_dml_error_logging;
    
    +----+-------+
    |  N | T     |
    |----+-------|
    | 10 | valid |
    +----+-------+
    
  4. Query the error table for the test_dml_error_logging base table to view the errors that were logged:

    SELECT * FROM ERROR_TABLE(test_dml_error_logging);
    
    +-------------------------------+--------------------------------------+------------+----------------------------------------------------------------------+--------------------+
    | TIMESTAMP                     | QUERY_ID                             | ERROR_CODE | ERROR_METADATA                                                       | ERROR_DATA         |
    |-------------------------------+--------------------------------------+------------+----------------------------------------------------------------------+--------------------|
    | 2026-03-12 12:18:39.470 -0700 | 01c2fc06-000e-6668-0000-76b90170a28e |     100038 | {                                                                    | {                  |
    |                               |                                      |            |   "error_code": 100038,                                              |   "N": [           |
    |                               |                                      |            |   "error_message": "Numeric value 'invalid_cast' is not recognized", |     "invalid_cast" |
    |                               |                                      |            |   "error_source": "N",                                               |   ],               |
    |                               |                                      |            |   "sql_state": "22018"                                               |   "T": "1"         |
    |                               |                                      |            | }                                                                    | }                  |
    | 2026-03-12 12:18:39.470 -0700 | 01c2fc06-000e-6668-0000-76b90170a28e |     100072 | {                                                                    | {                  |
    |                               |                                      |            |   "error_code": 100072,                                              |   "N": [           |
    |                               |                                      |            |   "error_message": "NULL result in a non-nullable column",           |     null           |
    |                               |                                      |            |   "error_source": "N",                                               |   ],               |
    |                               |                                      |            |   "sql_state": "22000"                                               |   "T": [           |
    |                               |                                      |            | }                                                                    |     "toolong"      |
    |                               |                                      |            |                                                                      |   ]                |
    |                               |                                      |            |                                                                      | }                  |
    +-------------------------------+--------------------------------------+------------+----------------------------------------------------------------------+--------------------+
    
  5. Turn off DML error logging for the test_dml_error_logging table:

    ALTER TABLE test_dml_error_logging
      SET ERROR_LOGGING = false;
    
  6. Attempt the same INSERT statement that you ran previously. An error is returned and no errors are logged in an error table:

    INSERT INTO test_dml_error_logging
      VALUES
        ('invalid_cast', '1'),
        (10, 'valid'),
        (NULL, 'toolong');
    
    100038 (22018): DML operation to table TEST_DML_ERROR_LOGGING failed on column N with error: Numeric value 'invalid_cast' is not recognized
    

Log errors when inserting rows from one table into another table

The following example logs errors when inserting rows from one table into another table:

  1. Create a source table and insert values:

    CREATE TABLE dml_error_logging_source(col1 INT);
    
    INSERT INTO dml_error_logging_source VALUES (1), (0), (-1);
    
  2. Create a target table with the same definition as the source table:

    CREATE TABLE dml_error_logging_target(col1 INT);
    
  3. Turn on DML error logging on the dml_error_logging_target table:

    ALTER TABLE dml_error_logging_target
      SET ERROR_LOGGING = true;
    
  4. Insert values into the target table by querying the source table so that one of the inserts results in a division by zero error:

    INSERT INTO dml_error_logging_target(col1)
      SELECT 1/col1 FROM dml_error_logging_source;
    
    +-------------------------+
    | number of rows inserted |
    |-------------------------|
    |                       2 |
    +-------------------------+
    
  5. Query the table to confirm that two valid rows were inserted:

    SELECT * FROM dml_error_logging_target;
    
    +------+
    | COL1 |
    |------|
    |    1 |
    |   -1 |
    +------+
    
  6. Query the error table for the dml_error_logging_target base table to view the errors that were logged:

    SELECT * FROM ERROR_TABLE(dml_error_logging_target);
    
    +-------------------------------+--------------------------------------+------------+----------------------------------------+-------------+
    | TIMESTAMP                     | QUERY_ID                             | ERROR_CODE | ERROR_METADATA                         | ERROR_DATA  |
    |-------------------------------+--------------------------------------+------------+----------------------------------------+-------------|
    | 2026-03-12 12:25:56.297 -0700 | 01c2fc0d-000e-6696-0000-76b90170b64a |     100051 | {                                      | {           |
    |                               |                                      |            |   "error_code": 100051,                |   "COL1": [ |
    |                               |                                      |            |   "error_message": "Division by zero", |     1,      |
    |                               |                                      |            |   "error_source": "COL1",              |     0       |
    |                               |                                      |            |   "sql_state": "22012"                 |   ]         |
    |                               |                                      |            | }                                      | }           |
    +-------------------------------+--------------------------------------+------------+----------------------------------------+-------------+
    

Error logging and error tables

When error logging is turned on for a table, Snowflake automatically creates an error table that is associated with the base table. DML operations that encounter supported errors log the errors in the error table instead of failing.

When DML error logging is turned on for a table, the following types of DML statements are logged:

  • Single-table INSERT

  • UPDATE

  • MERGE

Error tables have a fixed definition and can only be accessed by the owner of the base table or a user with a role that has been granted the SELECT ERROR TABLE privilege on the base table. The only supported direct operations on an error table are SELECT and TRUNCATE statements. You can’t run other types of statements directly on error tables. Error tables can’t be used indirectly in materialized views or dynamic tables.

You can copy the data out of the error table to other tables. You can remove the data in an error table by running the TRUNCATE command.

The following sections provide more information about error logging and error tables:

Error table definition

Snowflake creates error tables with a standard definition that can’t be modified.

When you turn off DML error logging for a base table or drop a base table that has an error table, the error table associated with the base table is dropped automatically.

An error table has the following columns:

timestamp (TIMESTAMP):

The timestamp of the statement that triggered the error.

query_id (VARCHAR):

The unique ID of the statement that triggered the error.

error_code (NUMBER):

The error code. When multiple columns in one row contain errors, this column only captures the first error that is encountered.

error_metadata (OBJECT):

The error metadata. When multiple columns in one row contain errors, this column only captures the first error that is encountered.

The OBJECT values have the following structure:

{
  "error_code": <value>,
  "error_message": "<value>",
  "error_source": "<value>",
  "sql_state": "<value>"
}

The OBJECT values contain the following key-value pairs:

  • error_code: The error code.

  • error_message: The error message.

  • error_source: The origin of the error, such as a column name.

  • sql_state: A five-character code that is modeled on the ANSI SQL standard SQLSTATE. Snowflake uses additional values beyond those in the ANSI SQL standard.

error_data (OBJECT):

The data that caused the error.

The OBJECT values have the following structure:

{
  "<column_name>": [
    <invalid_column_values>
  ],
  "<column_name>": <valid_column_values>,
  ...
}

The OBJECT values contain key-value pairs that represent each column in the base table. The key is the column name. For invalid column values that caused the DML operation to fail, the value in the key-value pair is an array that contains the values. Valid values are shown directly; that is, they aren't shown in arrays.

If the data can't be represented in an OBJECT value, the value is NULL.

Interact with error tables

You can run SELECT statements and TRUNCATE statements on error tables by using the following syntax:

SELECT ... FROM ERROR_TABLE( <base_table_name> )

TRUNCATE [ TABLE ] [ IF EXISTS ] ERROR_TABLE( <base_table_name> )

Where:

base_table_name

The name of the table for which the error table was created.

For example, if the name of the base table is my_table, the following statement queries the error table for this base table:

SELECT * FROM ERROR_TABLE(my_table);

The following statement truncates the error table:

TRUNCATE ERROR_TABLE(my_table);

Access control requirements for error tables

Any role that can insert into a base table can trigger inserts into its error table. Regardless of the current role, direct inserts into an error table aren’t allowed.

The following users can run SELECT statements on an error table:

  • The owner of the error table’s base table.

  • Users who have been granted the SELECT ERROR TABLE privilege on the base table, either through a role or directly.

    To grant SELECT ERROR TABLE privilege on a base table, run a GRANT <privileges> … TO ROLE statement or a GRANT <privileges> … TO USER statement.

    These statements use the following syntax:

    GRANT SELECT ERROR TABLE ON TABLE <base_table_name> TO ROLE <role_name>
    
    GRANT SELECT ERROR TABLE ON TABLE <base_table_name> TO USER <user_name>
    

    For example, to grant SELECT ERROR TABLE privilege on a base table named mybasetable to a role named myrole, run the following statement:

    GRANT SELECT ERROR TABLE ON TABLE mybasetable TO ROLE myrole;
    

Alternatively, to grant other roles access to an error table, the base table owner can also create a view based on the error table and grant access to that view.
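For example, the base table owner might grant broader access through a view as follows. All names here are illustrative.

```sql
-- Hypothetical names: create a view over the error table and share it.
CREATE VIEW mybasetable_errors_v AS
  SELECT * FROM ERROR_TABLE(mybasetable);

GRANT SELECT ON VIEW mybasetable_errors_v TO ROLE analyst_role;
```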

Metadata for error logging

To determine whether error logging is turned on for a table, you can run the GET_DDL function and pass in the name of the base table:

SELECT GET_DDL('TABLE', '[<namespace>.]<base_table_name>');

For example, for a base table named test_dml_error_logging in the current schema, run the following statement:

SELECT GET_DDL('TABLE', 'test_dml_error_logging');
+--------------------------------------------------+
| GET_DDL('TABLE', 'TEST_DML_ERROR_LOGGING')       |
|--------------------------------------------------|
| create or replace TABLE TEST_DML_ERROR_LOGGING ( |
|     N NUMBER(4,0) NOT NULL,                      |
|     T VARCHAR(5)                                 |
| ) ERROR_LOGGING = true                           |
| ;                                                |
+--------------------------------------------------+

Metrics for error tables are recorded in the following views:

Streams on error tables

Streams aren’t supported directly on error tables. To enable change tracking on error tables, first create a view on the error table, and then create a stream on the view.

The following example shows you how to enable change tracking on error tables:

  1. Run the CREATE VIEW command to create a view on the error table:

    CREATE VIEW my_error_view AS
      SELECT timestamp,
             query_id,
             error_code,
             error_metadata,
             error_data
        FROM ERROR_TABLE(test_dml_error_logging);
    
  2. Run the CREATE STREAM command to create a stream on the view:

    CREATE STREAM my_error_stream ON VIEW my_error_view;
    

DML error logging usage notes

The following usage notes apply when error logging is turned on for a table:

  • Only errors directly related to the base table are logged.

  • The following types of errors are logged:

    • NOT NULL table constraint violations.

    • Type conversion errors that occur when attempting to convert a value to the data type of the base table column.

    • Incompatible precision and scale values.

    • Incompatible length for string and binary types.

    • Some expression evaluation failures, such as division by zero or PARSE_JSON function failures.

  • Multi-table INSERT and CREATE TABLE … AS SELECT (CTAS) statements run normally. They fail on DML errors and don’t log them.

  • If you try to run a COPY INTO statement on a table with error logging enabled, the error "Error logging is not supported in statement 'COPY INTO'" is returned at compilation time.

  • Errors that aren’t supported by DML error logging cause the DML operation to fail directly.

  • If a SQL statement results in a compilation error, the operation ends and no errors are logged in the error table.

  • Failures that occur in other ingestion paths, such as COPY, Snowpipe, and Snowpipe streaming, aren’t logged in error tables.

  • The following are considerations related to DML error logging and performance:

    • When DML error logging is enabled for a base table, and there are no errors in a DML statement that is run on the base table, no performance difference or very little performance difference is expected.

    • When DML error logging is enabled for a base table, and there are errors in a DML statement that is run on the base table, additional time is required to complete the DML statement because the error information is inserted into the error table.

  • When a base table with an associated error table is cloned, the behavior is as follows:

    • The base table’s schema and content are cloned.

    • The error table’s content isn’t cloned.

    • The cloned base table has the ERROR_LOGGING property turned on, which implicitly creates an empty error table for it.
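For example, this cloning behavior can be observed as follows. The table names are placeholders.

```sql
-- Clone a base table that has ERROR_LOGGING = TRUE.
CREATE TABLE mytable_clone CLONE mytable;

-- The clone inherits the ERROR_LOGGING property, but its error table starts empty.
SELECT * FROM ERROR_TABLE(mytable_clone);
```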

Schema detection of column definitions from staged semi-structured data files

Semi-structured data can include thousands of columns. Snowflake provides robust solutions for handling this data. Options include referencing the data directly in cloud storage using external tables, loading the data into a single column of type VARIANT, or transforming and loading the data into separate columns in a standard relational table. All of these options require some knowledge of the column definitions in the data.

A different solution involves automatically detecting the schema in a set of staged semi-structured data files and retrieving the column definitions, including the names, data types, and ordering of columns in the files. You can then generate syntax in a format suitable for creating Snowflake standard tables, external tables, or views.

Note

This feature supports Apache Parquet, Apache Avro, ORC, JSON, and CSV files.

This support is implemented through the following SQL functions:

INFER_SCHEMA

Detects the column definitions in a set of staged data files and retrieves the metadata in a format suitable for creating Snowflake objects.

GENERATE_COLUMN_DESCRIPTION

Generates a list of columns from a set of staged files using the INFER_SCHEMA function output.

These SQL functions support both internal and external stages.

Create tables or external tables with the column definitions derived from a set of staged files using the CREATE TABLE … USING TEMPLATE or CREATE EXTERNAL TABLE … USING TEMPLATE syntax. The USING TEMPLATE clause accepts an expression that calls the INFER_SCHEMA SQL function to detect the column definitions in the files. After the table is created, you can then use a COPY statement with the MATCH_BY_COLUMN_NAME option to load files directly into the structured table.
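For example, the following sketch creates a table from the column definitions detected in staged Parquet files and then loads the files by column name. The stage and file format names are placeholders and must already exist.

```sql
-- Hypothetical stage and file format names.
CREATE TABLE mytable
  USING TEMPLATE (
    SELECT ARRAY_AGG(OBJECT_CONSTRUCT(*))
      FROM TABLE(
        INFER_SCHEMA(
          LOCATION => '@my_parquet_stage',
          FILE_FORMAT => 'my_parquet_format'
        )
      )
  );

-- Load the staged files into the new table by matching column names.
COPY INTO mytable
  FROM @my_parquet_stage
  FILE_FORMAT = (FORMAT_NAME = 'my_parquet_format')
  MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE;
```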

Schema detection can also be used in conjunction with table schema evolution, where the structure of tables evolves automatically to support the structure of new data received from the data sources.

Alternatives to loading data

You can use the following option to query your data in cloud storage without loading it into Snowflake tables.

External tables (data lake)

External tables enable querying existing data stored in external cloud storage for analysis without first loading it into Snowflake. The source of truth for the data remains in the external cloud storage. Data sets materialized in Snowflake via materialized views are read-only.

This solution is especially beneficial to accounts that have a large amount of data stored in external cloud storage and only want to query a portion of the data; for example, the most recent data. Users can create materialized views on subsets of this data for improved query performance.
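As a sketch, an external table over staged Parquet files with a materialized view on it might look like the following. All names are placeholders, and the external stage is assumed to exist; external table data is exposed through the VALUE variant column.

```sql
-- Hypothetical external table over an existing external stage.
CREATE EXTERNAL TABLE ext_events
  LOCATION = @my_ext_stage/events/
  FILE_FORMAT = (TYPE = PARQUET);

-- Materialized view over the external table for improved query performance.
CREATE MATERIALIZED VIEW events_mv AS
  SELECT value:event_id::NUMBER AS event_id,
         value:event_ts::TIMESTAMP_NTZ AS event_ts
    FROM ext_events;
```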

Working with Amazon S3-compatible storage

You can create external stages and tables in Snowflake to access storage in an application or device that is Amazon S3-compatible. This feature lets you manage, govern, and analyze your data, regardless of where the data is stored. For information, see Work with Amazon S3-compatible storage.