Snowflake High Performance connector for Kafka¶
This topic describes the basic concepts of the Snowflake High Performance connector for Kafka, its use cases, benefits, key features, and limitations.
Note
The Snowflake High Performance connector for Kafka is a sink connector that reads data from Kafka topics and loads that data into Snowflake tables. For more information about Kafka Connect and its framework, see Apache Kafka and the Kafka Connect framework.
Benefits¶
The Snowflake High Performance connector for Kafka leverages Snowflake’s high-performance Snowpipe Streaming architecture, which is engineered for modern, data-intensive organizations requiring near real-time insights. This next-generation architecture significantly advances throughput, efficiency, and flexibility for real-time ingestion into Snowflake.
The high-performance architecture offers several key advantages:
- Superior throughput and latency: Designed to support ingest speeds of up to 10 GB/s per table with end-to-end ingest-to-query latencies of 5 to 10 seconds, enabling near-real-time analytics.
- Simplified billing: Provides transparent, throughput-based billing that makes costs more predictable and easier to understand.
- Enhanced performance: Uses a Rust-based client core that delivers improved client-side performance and lower resource usage compared to previous implementations.
- In-flight transformations: Supports data cleansing and reshaping during ingestion using COPY command syntax within the PIPE object, allowing you to transform data before it reaches the target table.
- Server-side schema validation: Moves schema validation from the client side to the server side through the PIPE object, ensuring data quality and reducing client complexity.
- Pre-clustering capability: Can cluster data during ingestion when the target table has clustering keys defined, improving query performance without requiring post-ingestion maintenance.
The connector uses Snowflake PIPE objects as the central component for managing ingestion. The PIPE object acts as the entry point and definition layer for all streaming data, defining how data is processed, transformed, and validated before being committed to the target table. For more information about how the connector works with tables and pipes, see How the connector works with tables and pipes.
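To make this concrete, the following is a minimal sketch of a streaming PIPE that performs an in-flight transformation, assuming the COPY-style streaming data source syntax used by Snowpipe Streaming pipes; the pipe, table, and column names are hypothetical.

```sql
-- Minimal sketch of a streaming PIPE with an in-flight transformation.
-- All identifiers are hypothetical; adjust the SELECT list to your schema.
CREATE OR REPLACE PIPE kafka_events_pipe AS
  COPY INTO kafka_events (event_id, event_type, payload, ingested_at)
  FROM (
    SELECT
      $1:id::NUMBER,             -- cast during ingestion
      UPPER($1:type::VARCHAR),   -- example cleansing step
      $1:payload,                -- keep the raw payload
      CURRENT_TIMESTAMP()
    FROM TABLE(DATA_SOURCE(TYPE => 'STREAMING'))
  );
```

Because the transformation and any schema validation are defined on the PIPE, they run server side; the connector only has to deliver rows to the pipe.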
Choosing a connector version¶
The Kafka connector runs in a Kafka Connect cluster, reading data from Kafka topics and writing it to Snowflake tables.
Snowflake provides two versions of the connector. Both versions of the connector provide the same core functionality for streaming data from Kafka to Snowflake.
Confluent version of the connector
The Snowflake High Performance connector for Kafka is not yet available on Confluent Cloud. If you are using Confluent Cloud, you must install the connector manually as a custom plugin connector.
The Confluent version is packaged as a zip file for installation through Confluent Hub or Confluent Control Center and includes all external libraries required to run the connector.
Choose this version if you’re using the Confluent Platform or Confluent Cloud.
Contact Snowflake Support to obtain and install the Confluent version of the connector.
For more information, see Kafka Connect.
OSS Apache Kafka version of the connector
Available for the open source software (OSS) Apache Kafka distribution.
The Apache Kafka version is distributed as a standard fat JAR file and requires manual installation into your Apache Kafka Connect cluster. This version depends on the Bouncy Castle cryptography libraries, which must be downloaded separately.
For more information, see Apache Kafka.
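For either distribution, the Kafka Connect worker must be able to find the installed plugin. The following is a minimal sketch using the standard plugin.path worker property; the directory shown is hypothetical.

```properties
# Fragment of a Kafka Connect worker configuration (for example,
# connect-distributed.properties). plugin.path is a standard Kafka Connect
# property; the directory below is a hypothetical example.
# Place the connector JAR (and, for the OSS build, the separately downloaded
# Bouncy Castle JARs) in a directory on this path, then restart the worker.
plugin.path=/opt/kafka/connect-plugins
```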
Using the connector with Apache Iceberg™ tables¶
The connector can ingest data into Snowflake-managed Apache Iceberg™ tables. Before you configure the Kafka connector for Iceberg table ingestion, you must create an Iceberg table. For more information, see Create an Apache Iceberg™ table for ingestion.
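As a reference point, here is a minimal sketch of creating a Snowflake-managed Iceberg table; the table, column, external volume, and base location identifiers are hypothetical, and the column list must match the data you plan to ingest.

```sql
-- Minimal sketch: a Snowflake-managed Iceberg table as an ingestion target.
-- All identifiers are hypothetical; the external volume must already exist.
CREATE ICEBERG TABLE kafka_events_iceberg (
  record_content VARCHAR,
  ingested_at    TIMESTAMP_LTZ
)
  CATALOG = 'SNOWFLAKE'                   -- Snowflake-managed catalog
  EXTERNAL_VOLUME = 'my_external_volume'
  BASE_LOCATION = 'kafka_events_iceberg/';
```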
Limitations¶
The Snowflake High Performance connector for Kafka has the following limitations.
- Apache Iceberg™ tables and schema evolution
The connector does not support schema evolution for Apache Iceberg™ tables.
- Migration of existing pipelines from version 3.x and earlier
The connector does not support migration of existing pipelines from version 3.x and earlier. You must migrate existing pipelines to the new connector manually. Ensure that existing pipelines don’t rely on any features that are not yet available with this connector.
- Single Message Transformations (SMTs)
Most Single Message Transformations (SMTs) are supported when using community converters, with the exception of regex.router, which is currently not supported. For more information about SMTs, see Kafka Connect Single Message Transform Reference for Confluent Cloud or Confluent Platform.
- Not all broken records are sent to the Dead Letter Queue (DLQ) by the connector
With errors.tolerance=all and errors.deadletterqueue.topic.name configured, the connector guarantees at-most-once delivery. Only non-convertible records are sent to the DLQ by Kafka Connect. Records that fail Snowflake ingestion are not routed there; Snowpipe Streaming can detect that records failed, but not which specific records. A configuration sketch follows this list.
- Broken records that fail to be ingested must be retried manually
When errors.tolerance=none and rows_error_count increases, the connector task fails. To retry broken records, review the channel history to find them. For more information about troubleshooting broken records and ingestion errors, see error handling. You can also use the gap-finding technique described in Detect and recover from errors using metadata offsets. The Kafka offset information needed for this technique is available in the RECORD_METADATA column.
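The following configuration fragment is a minimal sketch of the framework-level settings discussed above. Every property shown is a standard Kafka Connect sink setting; the DLQ topic name and the InsertField transform are illustrative choices, not requirements of the connector.

```properties
# Illustrative Kafka Connect sink configuration fragment; values are hypothetical.

# Tolerate conversion errors and route non-convertible records to a DLQ.
# Note: records that fail Snowflake ingestion are NOT sent to this topic.
errors.tolerance=all
errors.deadletterqueue.topic.name=kafka_connector_dlq
errors.deadletterqueue.topic.replication.factor=1
errors.deadletterqueue.context.headers.enable=true

# Example of a supported community SMT (InsertField), which stamps each
# record's Kafka offset into a field. Topic-routing transforms such as
# regex.router are not supported by this connector.
transforms=addOffset
transforms.addOffset.type=org.apache.kafka.connect.transforms.InsertField$Value
transforms.addOffset.offset.field=kafka_offset
```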
Limitations of fault tolerance with the connector¶
Kafka topics can be configured with a limit on storage space or retention time.
If the system is offline for longer than the retention time, expired records are not loaded. Similarly, if Kafka’s storage space limit is exceeded, some messages are not delivered.
If messages in the Kafka topic are deleted, these deletions are not reflected in the Snowflake table.
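As an illustration, these standard topic-level Kafka settings bound how long and how much data a topic retains; the values are hypothetical.

```properties
# Standard Kafka topic-level retention settings (hypothetical values).
# retention.ms: 604800000 ms = 7 days; retention.bytes applies per partition.
# Records older than retention.ms, or evicted once retention.bytes is
# exceeded, can no longer be delivered to the connector.
retention.ms=604800000
retention.bytes=1073741824
```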
Snowflake support for the connector¶
The following table describes the supported versions, including pre-release and release-candidate status.
| Release Series | Status | Notes |
|---|---|---|
| 4.x.x | Private Preview | Early access. Migration from 3.x and 2.x is currently not supported. |
| 3.x.x | Officially supported | Latest version and strongly recommended. |
| 2.x.x | Officially supported | Upgrade recommended. |
| 1.x.x | Not supported | The following features are not supported: |
Breaking changes in the Preview version¶
See the release notes for the Preview versions for a list of breaking changes.
Next steps¶
- Review the How the connector works topic for more information about how the connector works with tables and pipes.
- Review the Set up tasks for the Snowflake High Performance connector for Kafka topic for the steps to set up the connector.