Snowflake High Performance connector for Kafka

This topic describes the basic concepts of the Snowflake High Performance connector for Kafka, its use cases, benefits, key features, and limitations.

Note

The Snowflake High Performance connector for Kafka is a sink connector that reads data from Kafka topics and loads that data into Snowflake tables. For more information about Kafka Connect and its framework, see Apache Kafka and the Kafka Connect framework.

Benefits

The Snowflake High Performance connector for Kafka leverages Snowflake’s high-performance Snowpipe Streaming architecture, which is engineered for modern, data-intensive organizations requiring near real-time insights. This next-generation architecture significantly advances throughput, efficiency, and flexibility for real-time ingestion into Snowflake.

The high-performance architecture offers several key advantages:

  • Superior throughput and latency: Designed to support ingest speeds of up to 10 GB/s per table with end-to-end ingest-to-query latencies within 5 to 10 seconds, enabling near-real-time analytics.

  • Simplified billing: Provides transparent, throughput-based billing that makes costs more predictable and easier to understand.

  • Enhanced performance: Uses a Rust-based client core that delivers improved client-side performance and lower resource usage compared to previous implementations.

  • In-flight transformations: Supports data cleansing and reshaping during ingestion using COPY command syntax within the PIPE object, allowing you to transform data before it reaches the target table.

  • Server-side schema validation: Moves schema validation from the client side to the server side through the PIPE object, ensuring data quality and reducing client complexity.

  • Pre-clustering capability: Can cluster data during ingestion when the target table has clustering keys defined, improving query performance without requiring post-ingestion maintenance.

The connector uses Snowflake PIPE objects as the central component for managing ingestion. The PIPE object acts as the entry point and definition layer for all streaming data, defining how data is processed, transformed, and validated before being committed to the target table. For more information about how the connector works with tables and pipes, see How the connector works with tables and pipes.
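
To make the role of the PIPE concrete, the following is a minimal, hypothetical sketch of a PIPE that applies an in-flight transformation and relies on server-side type validation. The database, schema, table, and column names are placeholders, and the streaming source clause and $1 column references are assumptions about the high-performance Snowpipe Streaming syntax; consult the Snowpipe Streaming documentation for the exact CREATE PIPE syntax.

-- Hypothetical sketch: names are placeholders and the streaming source clause is an assumption.
CREATE OR REPLACE PIPE my_db.my_schema.events_pipe
  AS COPY INTO my_db.my_schema.events (event_id, event_name, event_ts)
  FROM (
    SELECT
      $1:event_id::NUMBER,            -- server-side schema and type validation
      UPPER($1:event_name::STRING),   -- in-flight transformation during ingestion
      $1:event_ts::TIMESTAMP_NTZ
    FROM TABLE(DATA_SOURCE(TYPE => 'STREAMING'))
  );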

Choosing a connector version

The Kafka connector runs in a Kafka Connect cluster, reading data from Kafka topics and writing it into Snowflake tables.

Snowflake provides two versions of the connector. Both versions of the connector provide the same core functionality for streaming data from Kafka to Snowflake.

  • Confluent version of the connector

    The Snowflake High Performance connector for Kafka is not yet available on Confluent Cloud. If you are using Confluent Cloud, you must install the connector manually as a custom connector plugin.

    The Confluent version is packaged for easy installation through Confluent Hub or Confluent Control Center and includes optimizations for the Confluent Platform environment.

    Choose this version if you’re using Confluent Platform, the Confluent Kafka Docker image, or Confluent Cloud.

    Contact Snowflake Support to obtain and install the Confluent version of the connector.

    For more information about Kafka Connect, see https://docs.confluent.io/current/connect/.

  • Open source software (OSS) Apache Kafka version of the connector, available as a package from https://mvnrepository.com/artifact/com.snowflake/snowflake-kafka-connector/.

    The Apache version is distributed as a standard JAR file and requires manual installation into your Apache Kafka Connect cluster (see the sketch after this list). Choose this version if you’re running Apache Kafka.

    For more information about Apache Kafka, see https://kafka.apache.org/.
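
As a sketch of what manual installation typically involves, you place the connector JAR on the Kafka Connect worker’s plugin path and restart the workers. The directory paths below are hypothetical examples.

# Kafka Connect worker configuration (for example, connect-distributed.properties).
# The plugin directory below is a hypothetical example.
plugin.path=/opt/kafka/connect-plugins

# Copy the connector JAR into /opt/kafka/connect-plugins/snowflake-kafka-connector/
# and restart the Connect workers so the plugin is discovered.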

Limitations

The Snowflake High Performance connector for Kafka has the following limitations.

Table creation:

Destination tables must be manually created before starting the connector. The connector does not create tables automatically.
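
For example, a table created before starting the connector might look like the following minimal sketch. The names are placeholders, and the two-VARIANT-column layout is an assumption based on earlier connector versions; the actual column layout must match your converter configuration and PIPE definition.

-- Hypothetical names; the column layout must match your converter and PIPE definition.
CREATE TABLE IF NOT EXISTS my_db.my_schema.events (
  record_metadata VARIANT,   -- Kafka metadata such as topic, partition, and offset
  record_content VARIANT     -- the record payload
);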

Migration from version 3.x and earlier

You can manually migrate your existing pipelines from version 3.x and earlier to the new connector. Make sure your existing pipelines don’t rely on any features that are not yet available with the new connector.

Configuration stability

Configuration parameter names are subject to change during the Private Preview phase. Any configuration parameters you use may be renamed or restructured before Public Preview. Snowflake will provide migration guidance when parameter names change.

Kafka connector limitations

Migration of existing pipelines from version 3.x and earlier

The connector does not support automatic migration of existing pipelines from version 3.x and earlier. You must manually migrate existing pipelines to the new connector.

Single Message Transformations (SMTs):

Most Single Message Transformations (SMTs) are supported when using community converters, with the exception of regex.router, which is currently not supported.

For more information about SMTs, see Kafka Connect Single Message Transform Reference for Confluent Cloud or Confluent Platform.
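
As an illustration, a typical supported SMT configured in the connector properties might look like the following sketch. It assumes community converters are in use; the transform alias and field name are arbitrary, and the rest of the connector configuration is omitted.

# Insert the source topic name into each record value (standard Kafka Connect SMT).
transforms=addKafkaTopic
transforms.addKafkaTopic.type=org.apache.kafka.connect.transforms.InsertField$Value
transforms.addKafkaTopic.topic.field=kafka_topic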

Supported Kafka connector versions

Important

Only certain versions of the connector are supported. See the table below for the supported versions and information about pre-release and release candidates.

Release Series | Status               | Notes
4.x.x          | Private Preview      | Early access. Migration from 3.x and 2.x versions is currently not supported.
3.x.x          | Officially supported | Latest version and strongly recommended.
2.x.x          | Officially supported | Upgrade recommended.
1.x.x          | Not supported        | Do not use this release series.

Unsupported features

The following features are not supported:

Schema evolution

Schema evolution is not supported. You must manage schema changes manually. See Schema evolution for more information.

Iceberg tables

Ingestion into Iceberg tables is not supported.

Automatic table creation

The connector does not create tables automatically. You must manually create tables before starting the connector.

Broken records are not sent to Dead Letter Queue (DLQ) by the connector

If you set errors.tolerance=all and errors.deadletterqueue.topic.name, only non-convertible records are sent to the DLQ by the Kafka Connect level error handler. If a record is passed to the connector and fails to be ingested into Snowflake, it is not sent to the DLQ. This is an existing Snowpipe Streaming High Performance limitation: the connector cannot detect which records were not ingested into Snowflake, only that a certain number of records were not ingested. Because of this, with the errors.tolerance=all parameter the connector only guarantees at-most-once delivery.
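
For reference, the relevant Kafka Connect error-handling properties look like the following sketch; the topic name and replication factor are placeholders. Keep in mind that with this configuration only records that fail conversion reach the DLQ, not records that fail ingestion into Snowflake.

errors.tolerance=all
errors.deadletterqueue.topic.name=my_connector_dlq
errors.deadletterqueue.topic.replication.factor=1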

Broken records that fail to be ingested must be retried manually

If you set errors.tolerance=none, the connector fails the task as soon as it detects that rows_error_count is greater than 0 in the channel status. To retry the broken records, you need to find them by looking at the channel history. For more information about troubleshooting broken records and ingestion errors, see Error handling. You can also use the gap-finding technique described in Detect and recover from errors using metadata offsets. The Kafka offset information needed for this technique is available in the RECORD_METADATA column.
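
As an illustration of the gap-finding approach, the following sketch compares ingested row counts with the offset range per partition. The table name is a placeholder, and it assumes RECORD_METADATA exposes partition and offset fields as in earlier connector versions.

-- Hypothetical table name; assumes RECORD_METADATA contains partition and offset fields.
SELECT
  RECORD_METADATA:partition::NUMBER AS kafka_partition,
  MIN(RECORD_METADATA:offset::NUMBER) AS min_offset,
  MAX(RECORD_METADATA:offset::NUMBER) AS max_offset,
  COUNT(*) AS ingested_rows
FROM my_db.my_schema.events
GROUP BY kafka_partition
ORDER BY kafka_partition;
-- A partition has missing records when max_offset - min_offset + 1 > ingested_rows.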

Externalizing secrets

Snowflake strongly recommends externalizing secrets such as the private key and storing them in an encrypted form or in a key management service such as AWS Key Management Service (KMS), Microsoft Azure Key Vault, or HashiCorp Vault. This can be accomplished by using a ConfigProvider implementation on your Kafka Connect cluster.

For more information, see the Confluent description of this service.
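
As a sketch, the FileConfigProvider that ships with Apache Kafka can be used as follows. The file path and property key are hypothetical, the snowflake.private.key parameter name is taken from earlier connector versions and may differ in this release, and a KMS- or Vault-backed ConfigProvider is configured analogously.

# Kafka Connect worker configuration: register the provider.
config.providers=file
config.providers.file.class=org.apache.kafka.common.config.provider.FileConfigProvider

# Connector configuration: resolve the secret from an external file at startup
# (hypothetical path and key; the parameter name may differ in this connector version).
snowflake.private.key=${file:/etc/kafka/secrets/snowflake.properties:private.key}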

Cache considerations for testing and prototyping

The connector caches table and pipe existence checks to improve performance during partition rebalances. However, during testing and prototyping, this caching behavior can cause the connector to not immediately detect manually created tables or pipes.

Issue: When you manually create a table or pipe while the connector is running, the connector may continue to use cached existence check results (which may indicate the object doesn’t exist) for up to 5 minutes by default. This can lead to unexpected errors or behavior during testing.

Recommendation for testing: To avoid cache-related issues during testing and prototyping, configure both cache expiration parameters to their minimum value of 1 millisecond or disable the caching:

snowflake.cache.table.exists.expire.ms=1
snowflake.cache.pipe.exists.expire.ms=1

This configuration ensures that the connector performs fresh existence checks on every partition rebalance, allowing you to see the effects of manually created tables and pipes immediately.

Important

These minimal cache settings are recommended only for testing and prototyping. In production environments, use the default cache expiration values (5 minutes or greater) to minimize metadata queries to Snowflake and improve rebalance performance, especially when handling many partitions.

Breaking changes in the Private Preview version

See the release notes for the Private Preview versions for a list of breaking changes.

Next steps

Review the Set up tasks for the Snowflake High Performance connector for Kafka topic for the steps to set up the connector. Review the How the connector works topic for more information about how the connector works with tables and pipes.