Using the delivery.guarantee Property to Avoid Duplicate Data
Important
The private preview of the delivery.guarantee property in the Snowflake Connector for Kafka is no longer supported. All customers should instead use Snowpipe Streaming, which offers exactly-once delivery.
Data duplication can occur when Kafka offsets are not ingested into Snowflake. Although Snowpipe avoids ingesting duplicate data based on file names, duplication can still occur when overlapping Kafka offsets are split across two files with different names.
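The following is a minimal sketch of the overlap problem described above, not the connector's actual logic. The function `ingest_by_file_name` and the file names are hypothetical; each staged file is modeled as a name plus the list of Kafka offsets it contains.

```python
# Hypothetical model: a staged file is (file_name, [Kafka offsets it contains]).
def ingest_by_file_name(files):
    """Skip any file whose name was already loaded; return all ingested offsets."""
    seen_names = set()
    ingested = []
    for name, offsets in files:
        if name in seen_names:
            continue  # deduplication by file name only
        seen_names.add(name)
        ingested.extend(offsets)
    return ingested


files = [
    ("partition0_0_4.json", [0, 1, 2, 3, 4]),
    ("partition0_0_4.json", [0, 1, 2, 3, 4]),  # same name: skipped
    ("partition0_3_7.json", [3, 4, 5, 6, 7]),  # new name, overlapping offsets
]

rows = ingest_by_file_name(files)
duplicates = sorted({offset for offset in rows if rows.count(offset) > 1})
print(duplicates)  # [3, 4]
```

The second file is skipped because its name repeats, but offsets 3 and 4 are still loaded twice because they arrive in a file with a different name.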
You can configure the delivery.guarantee property to avoid ingesting duplicate data. This property is optional in the Kafka configuration file, as shown below:
{
  "name": "SnowflakeSinkConnector_JP",
  "config": {
    "connector.class": "com.snowflake.kafka.connector.SnowflakeSinkConnector",
    "name": "SnowflakeSinkConnector_JP",
    ...
    "jmx": "true",
    "delivery.guarantee": "<value>"
  }
}
Where the value of delivery.guarantee is one of the following:
Value | Description |
---|---|
EXACTLY_ONCE | Ensures that an entry is unique within a destination table. Duplicate values are not allowed. |
AT_LEAST_ONCE | Indicates that duplicates may occur. |
If you do not specify a value for this property, the default value is AT_LEAST_ONCE.
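As an illustration of the default behavior, the sketch below resolves the effective value from a connector's "config" mapping. The helper `resolve_delivery_guarantee` is hypothetical and is not part of the connector. It assumes values must match the two names in the table exactly; how the connector itself validates the property is not stated here.

```python
# Hypothetical helper, not part of the Snowflake Connector for Kafka.
ALLOWED_VALUES = ("EXACTLY_ONCE", "AT_LEAST_ONCE")
DEFAULT_VALUE = "AT_LEAST_ONCE"


def resolve_delivery_guarantee(config):
    """Return the effective delivery.guarantee for a connector config dict."""
    value = config.get("delivery.guarantee", DEFAULT_VALUE)
    if value not in ALLOWED_VALUES:
        # Assumption: anything outside the documented values is rejected.
        raise ValueError(f"Unsupported delivery.guarantee: {value!r}")
    return value


print(resolve_delivery_guarantee({"jmx": "true"}))
print(resolve_delivery_guarantee({"delivery.guarantee": "EXACTLY_ONCE"}))
```

The first call omits the property and so falls back to AT_LEAST_ONCE; the second returns the explicitly configured value.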
This feature uses the Snowpipe client API, which determines the status of data ingestion based on the Kafka offset number. It also inserts offsets in Snowflake’s internal data store.
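The idea of deciding ingestion status from the offset number can be sketched as follows. This is a conceptual model only, not the Snowpipe client API: `ingest_by_offset` and the file names are hypothetical, and the sketch assumes a single partition whose offsets arrive in increasing order.

```python
# Conceptual model only: track the highest committed offset for one partition
# and skip any record at or below it, regardless of which file carries it.
def ingest_by_offset(files, committed_offset=-1):
    """Return (ingested offsets, new committed offset)."""
    ingested = []
    for _name, offsets in files:
        for offset in offsets:
            if offset <= committed_offset:
                continue  # already ingested according to the stored offset
            ingested.append(offset)
            committed_offset = offset
    return ingested, committed_offset


files = [
    ("partition0_0_4.json", [0, 1, 2, 3, 4]),
    ("partition0_3_7.json", [3, 4, 5, 6, 7]),  # overlaps offsets 3 and 4
]

rows, last_offset = ingest_by_offset(files)
print(rows)         # [0, 1, 2, 3, 4, 5, 6, 7]
print(last_offset)  # 7
```

Because the decision depends on the stored offset and not on the file name, the overlapping offsets in the second file are ingested only once.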