Snowpipe Streaming: High-Performance Architecture¶
The high-performance architecture for Snowpipe Streaming is engineered for modern, data-intensive organizations requiring near real-time insights. This next-generation architecture significantly advances throughput, efficiency, and flexibility for real-time ingestion into Snowflake.
For information about the classic architecture, see Snowpipe Streaming - Classic Architecture. For differences between the classic SDK and the high-performance SDK, see Comparison between the classic SDK and the high-performance SDK.
Key features¶
Throughput and latency:
High throughput: Designed to support ingest speeds of up to 10 GB/s per table.
Near-real-time insights: Achieves end-to-end, ingest-to-query latency within 5 to 10 seconds.
Billing:
Simplified, transparent, throughput-based billing. For more information, see Snowpipe Streaming High-Performance Architecture: Understanding your costs.
Flexible ingestion:
Java SDK: Utilizes the new snowpipe-streaming SDK, with a Rust-based client core for improved client-side performance and lower resource usage.
REST API: Provides a direct ingestion path, simplifying integration for lightweight workloads, IoT device data, and edge deployments.
Optimized data handling:
In-flight transformations: Supports data cleansing and reshaping during ingestion using COPY command syntax within the PIPE object.
Enhanced channel visibility: Improved insight into ingestion status, primarily through the channel history view in Snowsight and a new GET_CHANNEL_STATUS API.
This architecture is recommended for:
Consistent ingestion of high-volume streaming workloads.
Powering real-time analytics and dashboards for time-sensitive decision-making.
Efficient integration of data from IoT devices and edge deployments.
Organizations seeking transparent, predictable, and throughput-based pricing for streaming ingestion.
New concepts: The PIPE object¶
While inheriting core concepts like channels and offset tokens from Snowpipe Streaming Classic, this architecture introduces the PIPE object as a central component.
The PIPE object is a named Snowflake object that acts as the entry point and definition layer for all ingested streaming data. It provides the following:
Data Processing Definition: Defines how streaming data is processed before being committed to the target table, including server-side buffering for transformations or schema mapping.
Enabling Transformations: Allows for in-flight data manipulation (for example, filtering, column reordering, simple expressions) by incorporating COPY command transformation syntax (see the sketch following this list).
Table Features Support: Handles ingestion into tables with defined clustering keys, DEFAULT value columns, and AUTOINCREMENT (or IDENTITY) columns.
Schema Management: Helps define the expected schema of incoming streaming data and its mapping to target table columns, enabling server-side schema validation.
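To make these concepts concrete, here is a minimal sketch of a target table and a pipe definition. All object and column names are hypothetical, and the transformation is modeled on the COPY command transformation syntax described above; see CREATE PIPE in the SQL reference for the authoritative syntax.

-- Hypothetical target table illustrating supported table features:
-- an AUTOINCREMENT column, a DEFAULT value column, and a clustering key.
CREATE OR REPLACE TABLE my_db.public.sensor_readings (
  reading_id   NUMBER AUTOINCREMENT,
  device_id    VARCHAR,
  temperature  NUMBER(6,2),
  reading_date DATE,
  ingested_at  TIMESTAMP_NTZ DEFAULT CURRENT_TIMESTAMP()
)
CLUSTER BY (reading_date);

-- Hypothetical pipe definition. The SELECT list applies an in-flight
-- transformation (column mapping and a simple cast expression) to incoming
-- streaming rows before they are committed to the target table.
CREATE OR REPLACE PIPE my_db.public.sensor_pipe AS
  COPY INTO my_db.public.sensor_readings (device_id, temperature, reading_date)
  FROM (
    SELECT $1:device_id, $1:temperature::NUMBER(6,2), $1:reading_date
    FROM TABLE(DATA_SOURCE(TYPE => 'STREAMING'))
  );

Client applications then open channels against sensor_pipe rather than against the target table itself.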
Differences from Snowpipe Streaming Classic¶
For users familiar with the classic architecture, the high-performance architecture introduces the following changes:
New SDK and APIs: Requires the new snowpipe-streaming SDK (Java SDK and REST API), necessitating client code updates for migration.
PIPE object requirement: All data ingestion, configuration (for example, transformations), and schema definitions are managed through the server-side PIPE object, a shift from Classic’s more client-driven configuration (see the inspection example following this list).
Channel association: Client applications open channels against a specific PIPE object, not directly against a target table.
Schema validation: Moves from primarily client-side (Classic SDK) to server-side enforcement by Snowflake, based on the PIPE object.
Migration requirements: Requires modifying client application code for the new SDK and defining PIPE objects in Snowflake.
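One practical consequence of this server-side shift is that a pipe’s configuration can be inspected directly with standard SQL commands (pipe name hypothetical, continuing the earlier sketch):

-- List the pipes in a schema, then inspect one pipe's definition.
SHOW PIPES IN SCHEMA my_db.public;
DESCRIBE PIPE my_db.public.sensor_pipe;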
Limitations and considerations¶
Snowpipe ON_ERROR option: The Snowpipe ON_ERROR option for streaming supports only CONTINUE.
Supported architectures (Rust core): Arm64 macOS, Windows, Arm64 Linux, and x86_64 Linux.
Linux requirements: If you’re using the SDK on Linux, your system must have glibc version 2.18 or later installed.
Deployment environment: Only AWS deployments are supported.
PrivateLink: PrivateLink is not supported.
Clustered tables (Ingest): While clustered tables can be target tables, no clustering will occur during the ingest process.
Replication: Replication is not supported.
ALTER PIPE SET PIPE_EXECUTION_PAUSED = true: While openChannel fails when the pipe is paused, ingestion might not stop immediately (see the example following this list).
Authorization role: The default role is used for authorization. The ability to specify other roles is planned for the future.
Timezone: The SDK automatically uses UTC; this setting can’t be changed.
Empty payload restriction: The SDK and the REST API do not support the submission of rowsets containing an empty payload. Submissions must include at least one data row for successful ingestion.
Error message visibility: While error messages are available in the channel status response, they are not displayed in the new channel history view.
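As an illustration of the pause behavior noted above, a pipe can be paused and resumed with ALTER PIPE (pipe name hypothetical, continuing the earlier sketch):

-- Pause: subsequent openChannel calls fail, but ingestion that is already
-- in flight might not stop immediately.
ALTER PIPE my_db.public.sensor_pipe SET PIPE_EXECUTION_PAUSED = true;

-- Resume ingestion on the pipe.
ALTER PIPE my_db.public.sensor_pipe SET PIPE_EXECUTION_PAUSED = false;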