Introduction to Snowpipe

Snowpipe enables loading data from files as soon as they’re available in a stage. This means you can load data from files in micro-batches, making it available to users within minutes, rather than manually executing COPY statements on a schedule to load larger batches.

In this Topic:

How Does Snowpipe Work?

Snowpipe loads data from files as soon as they are available in a stage. The data is loaded according to the COPY statement defined in a referenced pipe.

A pipe is a named, first-class Snowflake object that contains a COPY statement used by Snowpipe. The COPY statement identifies the source location of the data files (i.e., a stage) and a target table. All data types are supported, including semi-structured data types such as JSON and Avro.

Different mechanisms for detecting the staged files are available:

Automating Snowpipe using cloud messaging

Automated data loads leverage event notifications for cloud storage to inform Snowpipe of the arrival of new data files to load. Snowpipe copies the files into a queue, from which they are loaded into the target table in a continuous, serverless fashion based on parameters defined in a specified pipe object.

Snowflake currently supports the following storage account types:

  • Amazon Web Services (AWS)

    • Amazon S3

  • Microsoft Azure

    • Blob storage

    • Data Lake Storage Gen2 — Supported as a preview feature.

    • General-purpose v2

For more information, see Automating Continuous Data Loading Using Cloud Messaging.

Calling Snowpipe REST endpoints

Your client application calls a public REST endpoint with the name of a pipe object and a list of data filenames. If new data files matching the list are discovered in the stage referenced by the pipe object, they are queued for loading. Snowflake-provided compute resources load data from the queue into a Snowflake table based on parameters defined in the pipe.

  • Amazon Web Services (AWS)

    • Amazon S3

  • Google Cloud Platform

    • Cloud Storage

  • Microsoft Azure

    • Blob storage

    • Data Lake Storage Gen2 — Supported as a preview feature.

    • General-purpose v2

For more information, see Calling Snowpipe REST Endpoints to Load Data.

How Is Snowpipe Different from Bulk Data Loading?

This section briefly describes the primary differences between Snowpipe and a bulk data load workflow using the COPY command. Additional details are provided throughout the Snowpipe documentation.

Authentication

Bulk data load

Relies on the security options supported by the client for authenticating and initiating a user session.

Snowpipe

When calling the REST endpoints: Requires key pair authentication with JSON Web Token (JWT). JWTs are signed using a public/private key pair with RSA encryption.

Load History

Bulk data load

Stored in the metadata of the target table for 64 days. Available upon completion of the COPY statement as the statement output.

Snowpipe

Stored in the metadata of the pipe for 14 days. Must be requested from Snowflake via a REST endpoint, SQL table function, or ACCOUNT_USAGE view.

Important

To avoid reloading files (and duplicating data), we recommend loading data from a specific set of files using either bulk data loading or Snowpipe but not both.

Transactions

Bulk data load

Loads are always performed in a single transaction. Data is inserted into table alongside any other SQL statements submitted manually by users.

Snowpipe

Loads are combined or split into a single or multiple transactions based on the number and size of the rows in each data file. Rows of partially loaded files (based on the ON_ERROR copy option setting) can also be combined or split into one or more transactions.

Compute Resources

Bulk data load

Requires a user-specified warehouse to execute COPY statements.

Snowpipe

Uses Snowflake-supplied compute resources.

Cost

Bulk data load

Billed for the amount of time each virtual warehouse is active.

Snowpipe

Billed according to the compute resources used in the Snowpipe warehouse while loading the files.

Load Order of Data Files

For each pipe object, Snowflake establishes a single queue to sequence data files awaiting loading. As new data files are discovered in a stage, Snowpipe appends them to the queue. However, multiple processes pull files from the queue; and so, while Snowpipe generally loads older files first, there is no guarantee that files are loaded in the same order they are staged.

Data Duplication

Snowpipe uses file loading metadata associated with each pipe object to prevent reloading the same files (and duplicating data) in a table. This metadata stores the path (i.e. prefix) and name of each loaded file, and prevents loading files with the same name even if they were later modified (i.e. have a different eTag).

Estimating Snowpipe Latency

Given the number of factors that can differentiate Snowpipe loads, it is very difficult for Snowflake to estimate latency. File formats and sizes, and the complexity of COPY statements (including SELECT statement used for transformations), all impact the amount of time required for a Snowpipe load.

We suggest that you experiment by performing a typical set of loads to estimate average latency.

Snowpipe DDL

To support creating and managing pipes, Snowflake provides the following set of special DDL commands:

In addition, providers can view, grant, or revoke access to the necessary database objects for Snowpipe using the following standard access control DDL: