# Idempotent Kafka producer

*Learn in 10 minutes how idempotent producers prevent duplicate messages*

Idempotent producers ensure that retries do not write duplicate messages: within a single producer session, each message is written exactly once to its partition. This is essential for building reliable data pipelines where duplicates cause problems.

**What you'll learn:**
- How idempotent producers prevent duplicates
- The mechanisms Kafka uses for deduplication
- Configuration requirements and best practices
- Limitations to be aware of

## What is producer idempotency?

Producer idempotency means that when the producer retries a send, exactly one copy of the message is written to the Kafka topic partition, even in the presence of failures. Deduplication applies to retries of the same send, not to an application calling `send()` twice with identical content.

![Producer retries without idempotency create duplicates, while idempotency uses PID and sequence numbers to deduplicate on the broker](https://www.conduktor.io/assets/kafka/diagrams/idempotent-kafka-producer.svg)

```mermaid
flowchart LR
    subgraph Without["Without idempotency"]
        P1["Producer"] -->|"Send"| B1["Broker"]
        B1 -->|"Network failure"| P1
        P1 -->|"Retry"| B1
        B1 --> D1["Duplicate!"]
    end

    subgraph With["With idempotency"]
        P2["Producer<br/>PID=123, Seq=42"] -->|"Send"| B2["Broker"]
        B2 -->|"Network failure"| P2
        P2 -->|"Retry (same Seq)"| B2
        B2 -->|"Deduplicate"| OK["Single message"]
    end
```

## Enable idempotent producers

Idempotent producers are enabled by default in Kafka 3.0+. For older versions, enable idempotence explicitly:

```properties
enable.idempotence=true
```

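When idempotency is enabled, Kafka applies the following settings unless you override them. If you explicitly set a conflicting value alongside `enable.idempotence=true`, the producer fails at startup with a `ConfigException`:
- `retries=Integer.MAX_VALUE` (must be greater than 0)
- `max.in.flight.requests.per.connection=5` (must be 5 or less)
- `acks=all`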

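> **Default in Kafka 3.0+**
> Idempotence is the default from Kafka 3.0 onward, as long as no conflicting setting is present. In 3.0.0 and 3.1.0 a bug (KAFKA-13598) kept the default from being applied, so set `enable.idempotence=true` explicitly on those releases. The fix shipped in 3.0.1, 3.1.1, and 3.2.0.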

## How Kafka achieves idempotency

Kafka uses two key mechanisms to ensure idempotency:

### 1. Producer ID (PID)

Each producer instance gets a unique Producer ID from the broker:
- Assigned when producer starts up
- Valid for the lifetime of the producer session
- Used to track message sequences

### 2. Sequence numbers

Each message gets a sequence number per topic-partition:
- Starts at 0 for each producer-topic-partition combination
- Incremented for each message sent
- Used by broker to detect duplicates

![Kafka Idempotent Producer Sequence Numbers](https://www.conduktor.io/assets/kafka/Adv-Idempotent-Producer-1.png)
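
The numbering rules above can be sketched from the producer's side. This is a simplified model rather than Kafka's client code: the class `SequenceAssigner`, its methods, and the topic name `orders` are hypothetical, and the real client tracks sequences internally per batch.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of producer-side sequence numbering (not Kafka's client code).
class SequenceAssigner {
    private final long producerId; // PID handed out by the broker at startup
    private final Map<String, Integer> nextSequence = new HashMap<>();

    SequenceAssigner(long producerId) {
        this.producerId = producerId;
    }

    long producerId() {
        return producerId;
    }

    // Returns the sequence for a new message and advances that topic-partition's counter.
    int assign(String topic, int partition) {
        String key = topic + "-" + partition;
        int sequence = nextSequence.getOrDefault(key, 0); // each topic-partition starts at 0
        nextSequence.put(key, sequence + 1);
        return sequence;
    }

    public static void main(String[] args) {
        SequenceAssigner producer = new SequenceAssigner(123L);
        System.out.println(producer.assign("orders", 0)); // 0
        System.out.println(producer.assign("orders", 0)); // 1
        System.out.println(producer.assign("orders", 1)); // 0, separate counter per partition
    }
}
```

A retry resends the message with the sequence number it already holds. In this model, that means a retry never calls `assign` again.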

## How deduplication works

When a broker receives a message, it compares the message's sequence number with the last one it wrote for that producer and partition:

| Scenario | Action |
|----------|--------|
| Expected sequence | Message is written normally |
| Duplicate sequence | Message is discarded, success response sent |
| Out-of-order sequence | Message is rejected; the producer receives an `OutOfOrderSequenceException` |

```
Broker state: Producer 123, Partition 0, Last sequence: 42

Incoming message: Sequence 43 ✅ Accept
Incoming message: Sequence 42 ⚠️ Duplicate (ignore)
Incoming message: Sequence 45 ❌ Out of order (reject)
```
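
The three cases in the table can be modeled in a few lines. This is a minimal sketch under simplifying assumptions, not the broker's implementation: `BrokerDedupSketch` and its `Result` enum are hypothetical names, and the model keeps a single last-sequence value per producer and partition and treats every older sequence as a duplicate.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified model of the broker-side sequence check (hypothetical, not Kafka's broker code).
class BrokerDedupSketch {
    enum Result { ACCEPTED, DUPLICATE, OUT_OF_ORDER }

    // Last sequence written, keyed by "producerId:partition".
    private final Map<String, Integer> lastSequence = new HashMap<>();

    Result append(long producerId, int partition, int sequence) {
        String key = producerId + ":" + partition;
        int last = lastSequence.getOrDefault(key, -1); // -1 means nothing written yet
        if (sequence == last + 1) {
            lastSequence.put(key, sequence); // expected sequence: write and advance
            return Result.ACCEPTED;
        }
        if (sequence <= last) {
            return Result.DUPLICATE; // already written: acknowledge without writing
        }
        return Result.OUT_OF_ORDER; // gap in the sequence: reject
    }

    public static void main(String[] args) {
        BrokerDedupSketch broker = new BrokerDedupSketch();
        for (int sequence = 0; sequence <= 42; sequence++) {
            broker.append(123L, 0, sequence); // bring the state to "last sequence: 42"
        }
        System.out.println(broker.append(123L, 0, 43)); // ACCEPTED
        System.out.println(broker.append(123L, 0, 42)); // DUPLICATE
        System.out.println(broker.append(123L, 0, 45)); // OUT_OF_ORDER
    }
}
```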

## Configuration requirements

### Required settings

```properties
enable.idempotence=true
# Applied automatically when idempotence is enabled
acks=all
# Applied automatically (Integer.MAX_VALUE)
retries=2147483647
# Maximum value allowed with idempotence
max.in.flight.requests.per.connection=5
```
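
As an illustration of these constraints, the sketch below checks a `java.util.Properties` object before it is handed to a producer. The class `IdempotenceConfigCheck` is a hypothetical helper, and the fallback values assume Kafka 3.0+ defaults. The Kafka client performs its own validation when the producer is constructed, so a check like this is optional.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical pre-flight check for the settings that idempotence requires.
class IdempotenceConfigCheck {
    static List<String> problems(Properties props) {
        List<String> problems = new ArrayList<>();

        // Fallbacks below assume the Kafka 3.0+ producer defaults.
        String acks = props.getProperty("acks", "all");
        if (!acks.equals("all") && !acks.equals("-1")) {
            problems.add("acks must be all (or -1), found " + acks);
        }

        int retries = Integer.parseInt(props.getProperty("retries", "2147483647"));
        if (retries <= 0) {
            problems.add("retries must be greater than 0, found " + retries);
        }

        int inFlight = Integer.parseInt(
                props.getProperty("max.in.flight.requests.per.connection", "5"));
        if (inFlight > 5) {
            problems.add("max.in.flight.requests.per.connection must be at most 5, found " + inFlight);
        }
        return problems;
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("enable.idempotence", "true");
        props.setProperty("acks", "1");
        props.setProperty("max.in.flight.requests.per.connection", "10");
        // Prints one line for acks and one for the in-flight limit.
        problems(props).forEach(System.out::println);
    }
}
```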

### Recommended production configuration

```properties
# Complete idempotent producer configuration
enable.idempotence=true
acks=all
retries=2147483647
max.in.flight.requests.per.connection=5
delivery.timeout.ms=120000
compression.type=snappy
batch.size=32768
linger.ms=5
```
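
In a Java application the same settings are usually assembled in code. The sketch below builds them with plain string keys so that it compiles without the Kafka client on the classpath. The class name `RecommendedProducerConfig`, the `localhost:9092` address, and the choice of string serializers are assumptions for illustration.

```java
import java.util.Properties;

// Builds the recommended settings as a Properties object (hypothetical helper class).
class RecommendedProducerConfig {
    static Properties build(String bootstrapServers) {
        Properties props = new Properties();
        // Connection and serializers: assumed values, adjust to your cluster and data types.
        props.setProperty("bootstrap.servers", bootstrapServers);
        props.setProperty("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.setProperty("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        // Idempotence and the settings it depends on.
        props.setProperty("enable.idempotence", "true");
        props.setProperty("acks", "all");
        props.setProperty("retries", String.valueOf(Integer.MAX_VALUE));
        props.setProperty("max.in.flight.requests.per.connection", "5");

        // Delivery timeout, compression, and batching from the configuration above.
        props.setProperty("delivery.timeout.ms", "120000");
        props.setProperty("compression.type", "snappy");
        props.setProperty("batch.size", "32768");
        props.setProperty("linger.ms", "5");
        return props;
    }

    public static void main(String[] args) {
        Properties props = build("localhost:9092");
        props.stringPropertyNames().stream().sorted()
                .forEach(name -> System.out.println(name + "=" + props.getProperty(name)));
    }
}
```

With the Kafka client library available, the resulting object can be passed to `new KafkaProducer<>(props)`.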

## Performance implications

| Configuration | Throughput | Latency | Duplicates |
|---------------|------------|---------|------------|
| No idempotency, retries=0 | Highest | Lowest | None (data loss possible) |
| No idempotency, retries>0 | High | Medium | Possible |
| Idempotent producer | Medium-High | Medium | None |

### Trade-offs

**Benefits:**
- Exactly-once writes per partition within a producer session
- Simplified error handling
- Better reliability

**Costs:**
- Memory overhead on broker for sequence state
- Slightly higher latency for sequence checking
- Max 5 in-flight requests per connection

## Error handling

### Retriable errors (automatic)

Idempotent producers automatically retry the following until `delivery.timeout.ms` expires:
- `TimeoutException`
- Other `RetriableException` subclasses
- Network connectivity issues
- Broker leadership changes

### Non-retriable errors (require handling)

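```java
import java.util.concurrent.ExecutionException;
import org.apache.kafka.common.errors.OutOfOrderSequenceException;
import org.apache.kafka.common.errors.UnknownProducerIdException;

try {
    producer.send(record).get(); // block until the broker responds
} catch (ExecutionException e) {
    // Send failures surface as the cause of the ExecutionException
    Throwable cause = e.getCause();
    if (cause instanceof OutOfOrderSequenceException
            || cause instanceof UnknownProducerIdException) {
        // Sequence state is invalid or the producer ID expired:
        // close this producer, then create a new one
        producer.close();
    } else {
        throw new RuntimeException(cause);
    }
} catch (InterruptedException e) {
    Thread.currentThread().interrupt(); // preserve the interrupt flag
}
```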

## Limitations

> **Producer restarts**
> If a producer application restarts, it gets a new Producer ID and its sequence numbers restart at 0. The broker cannot match messages from the new session against the old one, so duplicates are possible across application restarts, even with idempotency enabled.

| Limitation | Description |
|------------|-------------|
| **Session-based** | Idempotency is only guaranteed within a single producer session |
| **Partition scope** | No deduplication across different partitions |
| **Topic scope** | No deduplication across different topics |
| **Memory** | Brokers maintain state per producer-partition |
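
The session and partition limits follow from what the broker keys its state on: producer ID, partition, and sequence number, never the message content. The toy model below illustrates this. `RestartDuplicateSketch`, the PIDs 123 and 456, and the payload `order-1` are hypothetical.

```java
import java.util.HashSet;
import java.util.Set;

// Toy model: the broker remembers which (producerId, partition, sequence) triples it has written.
class RestartDuplicateSketch {
    private final Set<String> seen = new HashSet<>();
    private int written = 0;

    // Returns true if the message was written, false if it was deduplicated.
    // The payload plays no part in the decision.
    boolean append(long producerId, int partition, int sequence, String payload) {
        boolean isNew = seen.add(producerId + ":" + partition + ":" + sequence);
        if (isNew) {
            written++;
        }
        return isNew;
    }

    int written() {
        return written;
    }

    public static void main(String[] args) {
        RestartDuplicateSketch broker = new RestartDuplicateSketch();
        System.out.println(broker.append(123L, 0, 0, "order-1")); // true: first write
        System.out.println(broker.append(123L, 0, 0, "order-1")); // false: retry in the same session
        System.out.println(broker.append(456L, 0, 0, "order-1")); // true: new PID after a restart
        System.out.println(broker.append(456L, 1, 0, "order-1")); // true: different partition
        System.out.println(broker.written());                     // 3
    }
}
```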

## When to use idempotent producers

| Use case | Recommendation |
|----------|----------------|
| Production applications | Always recommended |
| Financial data | Essential |
| Audit logs | Essential |
| Metrics/logs | Recommended |
| Development/testing | Optional |

> **See it in practice with Conduktor**
> [Conduktor Console](https://docs.conduktor.io/guide/manage-kafka/kafka-resources/topics) lets you monitor topic messages and verify no duplicates are written. Use the message browser to inspect message headers and validate your idempotent producer configuration.

## Next steps

- [Configure message compression](https://www.conduktor.io/kafka/kafka-message-compression) to cut network and storage costs
- [Optimize producer batching](https://www.conduktor.io/kafka/kafka-producer-batching) for throughput
- [Configure producer retries](https://www.conduktor.io/kafka/kafka-producer-retries) for error handling
