Using the delivery.guarantee Property to Avoid Duplicate Data

Important

The private preview functionality for delivery.guarantee in the Snowflake Connector for Kafka is no longer supported. All customers should instead use Snowpipe Streaming, which offers exactly-once delivery.

Data duplication can occur in cases where Kafka offsets are not ingested into Snowflake. Although Snowpipe avoids ingesting duplicate data based on file names, duplication can still occur when overlapping Kafka offsets are split across two files with different names.
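The following Python sketch illustrates why deduplication by file name alone is not enough. It is a simplified model, not the connector's actual code: the function ingest_by_file_name, the file names, and the record layout are all hypothetical and chosen only for illustration.

```python
def ingest_by_file_name(files, seen_names, table):
    """Load each file at most once, using only its name as the dedup key."""
    for name, records in files:
        if name in seen_names:
            continue  # a file with this exact name was already loaded
        seen_names.add(name)
        table.extend(records)
    return table


# Two hypothetical files with different names whose Kafka offset ranges
# overlap at offsets 3 and 4.
files = [
    ("file_a.json", [(offset, f"msg-{offset}") for offset in range(0, 5)]),
    ("file_b.json", [(offset, f"msg-{offset}") for offset in range(3, 8)]),
]

table = ingest_by_file_name(files, set(), [])
offsets = [offset for offset, _ in table]

# Offsets that were loaded more than once.
duplicates = sorted({o for o in offsets if offsets.count(o) > 1})
print(duplicates)
```

Because the two file names differ, both files are loaded, and the records in the overlapping offset range appear twice in the table.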

You can configure the delivery.guarantee property to avoid ingesting duplicate data. This optional property is set in the Kafka configuration file, as shown below:

{
  "name": "SnowflakeSinkConnector_JP",
  "config": {
    "connector.class": "com.snowflake.kafka.connector.SnowflakeSinkConnector",
    "name": "SnowflakeSinkConnector_JP",
    ...
    "jmx": "true",
    "delivery.guarantee": "<value>"
  }
}

Where the value of delivery.guarantee is one of the following:

Value            Description
EXACTLY_ONCE     Ensures that an entry is unique within a destination table. Duplicate values are not allowed.
AT_LEAST_ONCE    Indicates that duplicates may occur.

If you do not specify a value for this property, the default value is AT_LEAST_ONCE.
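As a rough illustration of the default described above, the Python sketch below resolves the effective setting from a connector configuration dictionary. The helper resolve_delivery_guarantee is hypothetical and is not part of the connector; in particular, how the connector itself reacts to an unrecognized value is an assumption here.

```python
VALID_GUARANTEES = {"EXACTLY_ONCE", "AT_LEAST_ONCE"}
DEFAULT_GUARANTEE = "AT_LEAST_ONCE"


def resolve_delivery_guarantee(config):
    """Return the effective delivery.guarantee value for a config dict.

    Falls back to AT_LEAST_ONCE when the property is absent. Rejecting
    unknown values is an assumption made for this sketch.
    """
    value = config.get("delivery.guarantee", DEFAULT_GUARANTEE)
    if value not in VALID_GUARANTEES:
        raise ValueError(f"Unsupported delivery.guarantee value: {value!r}")
    return value


# Property omitted: the default applies.
print(resolve_delivery_guarantee({"jmx": "true"}))

# Property set explicitly.
print(resolve_delivery_guarantee({"delivery.guarantee": "EXACTLY_ONCE"}))
```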

This feature uses the Snowpipe client API, which determines the status of data ingestion based on the Kafka offset number. It also inserts the offsets into Snowflake’s internal data store.
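The idea of tracking ingestion status by offset can be sketched in Python as follows. This is a conceptual model only, not the Snowpipe client API: the function ingest_by_offset, the committed_offset variable standing in for the stored offset, and the file names are all hypothetical.

```python
def ingest_by_offset(files, committed_offset, table):
    """Load only records whose offset is beyond the last committed offset.

    Returns the new committed offset after processing all files.
    """
    for _name, records in files:
        for offset, value in records:
            if offset <= committed_offset:
                continue  # already ingested, skip regardless of file name
            table.append((offset, value))
            committed_offset = offset  # record progress alongside the data
    return committed_offset


# Two hypothetical files whose offset ranges overlap at offsets 3 and 4.
files = [
    ("file_a.json", [(offset, f"msg-{offset}") for offset in range(0, 5)]),
    ("file_b.json", [(offset, f"msg-{offset}") for offset in range(3, 8)]),
]

table = []
last_offset = ingest_by_offset(files, -1, table)  # -1 means nothing ingested yet
print(last_offset, len(table))
```

In this model the overlapping records in the second file are skipped because their offsets are not greater than the stored offset, so each offset lands in the table once.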