Idempotent Kafka Producer
What is an Idempotent Kafka producer? How do they help avoid duplication?
Retrying to send a failed message often includes a small risk that both messages were successfully written to the broker, leading to duplicates. This can happen as illustrated below.
Kafka producer sends a message to Kafka
The message was successfully written and replicated
Network issues prevented the broker acknowledgment from reaching the producer
The producer will treat the lack of acknowledgment as a temporary network issue and will retry sending the message (since it can’t know that it was received).
In that case, the broker will end up having the same message twice.
Producer idempotence can be enabled for Kafka versions >= 0.11
Producer idempotence ensures that duplicates are not introduced due to unexpected retries.
How does it work internally? (for the curious)
enable.idempotence is set to
true, each producer gets assigned a Producer Id (PID) and the PIDis included every time a producer sends messages to a broker. Additionally, each message gets a monotonically increasing sequence number (different from the offset - used only for protocol purposes). A separate sequence is maintained for each topic partition that a producer sends messages to. On the broker side, on a per partition basis, it keeps track of the largest PID-Sequence Number combination that is successfully written. When a lower sequence number is received, it is discarded.
Starting with Kafka 3.0, producer are by default having
acks=all. See KIP-679 for more details.
If you already use
acks=all then you should enable this feature. All you need to do to turn this feature on is use the producer configuration
Overall, we recommend to enable producer idempotence for all your Kafka Producers. If you are not using the Java SDK by default, ensure your client library has support for this feature!