Snowpipe Streaming: High-Performance Architecture¶

The high-performance architecture for Snowpipe Streaming is engineered for modern, data-intensive organizations requiring near real-time insights. This next-generation architecture significantly advances throughput, efficiency, and flexibility for real-time ingestion into Snowflake.

For information about the classic architecture, see Snowpipe Streaming - Classic Architecture. For differences between the classic SDK and the high-performance SDK, see Comparison between the classic SDK and the high-performance SDK.

Key features¶

  • Throughput and latency:

    • High throughput: Designed to support ingest speeds of up to 10 GB/s per table.

    • Near-real-time insights: Achieves end-to-end, ingest-to-query latency of 5 to 10 seconds.

  • Billing: Transparent, predictable, throughput-based pricing for streaming ingestion.

  • Flexible ingestion:

    • Java SDK: Utilizes the new snowpipe-streaming SDK, with a Rust-based client core for improved client-side performance and lower resource usage.

    • REST API: Provides a direct ingestion path, simplifying integration for lightweight workloads, IoT device data, and edge deployments.

  • Optimized data handling:

    • In-flight transformations: Supports data cleansing and reshaping during ingestion using COPY command syntax within the PIPE object (see the sketch after this list).

    • Enhanced channel visibility: Provides improved insight into ingestion status, primarily through the channel history view in Snowsight and the new GET_CHANNEL_STATUS API.
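
For example, the following is a minimal sketch of a transforming PIPE definition. The table, pipe, and field names are hypothetical, and the sketch assumes the streaming data source is referenced through the DATA_SOURCE(TYPE => 'STREAMING') table function, with each incoming row exposed as a single variant column ($1) per COPY transformation conventions; see the CREATE PIPE documentation for the authoritative syntax.

```sql
-- Hypothetical target table for the sketch.
CREATE OR REPLACE TABLE t_events (
  device_id   STRING,
  reading     DOUBLE,
  ingested_at TIMESTAMP_NTZ
);

-- PIPE that reshapes rows in flight using COPY transformation syntax:
-- casts, a computed expression, and a filter on the incoming variant row.
CREATE OR REPLACE PIPE t_events_pipe AS
  COPY INTO t_events (device_id, reading, ingested_at)
  FROM (
    SELECT $1:device_id::STRING,
           $1:reading::DOUBLE,
           CURRENT_TIMESTAMP()
    FROM TABLE(DATA_SOURCE(TYPE => 'STREAMING'))
    WHERE $1:reading IS NOT NULL
  );
```

Client channels would then be opened against t_events_pipe rather than against the table itself, so every row passes through this definition before being committed.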

This architecture is recommended for:

  • Consistent ingestion of high-volume streaming workloads.

  • Powering real-time analytics and dashboards for time-sensitive decision-making.

  • Efficient integration of data from IoT devices and edge deployments.

  • Organizations seeking transparent, predictable, and throughput-based pricing for streaming ingestion.

New concepts: The PIPE object¶

While inheriting core concepts like channels and offset tokens from Snowpipe Streaming Classic, this architecture introduces the PIPE object as a central component.

The PIPE object is a named Snowflake object that acts as the entry point and definition layer for all ingested streaming data. It provides the following:

  • Data processing definition: Defines how streaming data is processed before being committed to the target table, including server-side buffering for transformations or schema mapping.

  • Transformation support: Allows in-flight data manipulation (for example, filtering, column reordering, simple expressions) through COPY command transformation syntax.

  • Table feature support: Handles ingestion into tables with clustering keys, DEFAULT value columns, and AUTOINCREMENT (or IDENTITY) columns (see the example below).

  • Schema management: Helps define the expected schema of incoming streaming data and its mapping to target table columns, enabling server-side schema validation.
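
As an illustration of these capabilities, the following sketch (hypothetical names, with the same assumed DATA_SOURCE(TYPE => 'STREAMING') form as in the earlier example) maps incoming fields onto a subset of target columns and leaves the AUTOINCREMENT and DEFAULT columns to be populated server-side:

```sql
-- Hypothetical target table with AUTOINCREMENT and DEFAULT columns.
CREATE OR REPLACE TABLE readings (
  id        NUMBER AUTOINCREMENT,
  source    STRING DEFAULT 'streaming',
  device_id STRING,
  reading   DOUBLE
);

-- The PIPE maps incoming fields onto device_id and reading only;
-- id and source fall back to their AUTOINCREMENT / DEFAULT definitions.
-- The declared expressions also give Snowflake the expected schema of
-- incoming rows for server-side validation.
CREATE OR REPLACE PIPE readings_pipe AS
  COPY INTO readings (device_id, reading)
  FROM (
    SELECT $1:device_id::STRING, $1:reading::DOUBLE
    FROM TABLE(DATA_SOURCE(TYPE => 'STREAMING'))
  );
```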

Differences from Snowpipe Streaming Classic¶

For users familiar with the classic architecture, the high-performance architecture introduces the following changes:

  • New SDK and APIs: Requires the new snowpipe-streaming Java SDK or the REST API, so migration involves updating client code.

  • PIPE object requirement: All data ingestion, configuration (for example, transformations), and schema definitions are managed through the server-side PIPE object, a shift from Classic’s more client-driven configuration.

  • Channel association: Client applications open channels against a specific PIPE object, not directly against a target table.

  • Schema validation: Moves from primarily client-side (Classic SDK) to server-side enforcement by Snowflake, based on the PIPE object.

  • Migration requirements: Requires modifying client application code for the new SDK and defining PIPE objects in Snowflake.

Limitations and considerations¶

  • Snowpipe ON_ERROR option: For streaming ingestion, the ON_ERROR option supports only CONTINUE.

  • Supported architectures (Rust core): Arm64 macOS, Windows, Arm64 Linux, and x86_64 Linux.

  • Linux requirements: On Linux, the SDK requires glibc version 2.18 or later.

  • Deployment environment: Only AWS deployments are supported.

  • PrivateLink: PrivateLink is not supported.

  • Clustered tables (ingest): Clustered tables can be used as target tables, but no clustering occurs during the ingest process.

  • Replication: Replication is not supported.

  • ALTER PIPE SET PIPE_EXECUTION_PAUSED = true: When a pipe is paused, openChannel calls fail, but ingestion on already-open channels might not stop immediately (see the example after this list).

  • Authorization role: The user's default role is used for authorization. Support for specifying other roles is planned.

  • Timezone: The SDK always uses UTC; this setting cannot be changed.

  • Empty payload restriction: The SDK and the REST API do not accept rowsets with an empty payload; each submission must include at least one data row.

  • Error message visibility: While error messages are available in the channel status response, they are not displayed in the new channel history view.
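
To illustrate the pause behavior noted above, the following reuses the hypothetical readings_pipe from the earlier sketch. Pausing causes new openChannel calls to fail, while rows already in flight on open channels might still be committed before ingestion stops:

```sql
-- Pause the pipe: subsequent openChannel calls fail, but ingestion on
-- already-open channels might not stop immediately.
ALTER PIPE readings_pipe SET PIPE_EXECUTION_PAUSED = true;

-- Resume ingestion.
ALTER PIPE readings_pipe SET PIPE_EXECUTION_PAUSED = false;
```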