# Build Idempotent Kafka Consumers: Patterns That Actually Work

Your consumer will see the same message twice. This isn't a bug—it's how Kafka works.

I've debugged countless production incidents where teams assumed at-least-once meant exactly-once. The producer retries after a timeout. The consumer crashes before committing. A rebalance triggers reprocessing. Each scenario creates duplicates, and each team learns the hard way.

The solution isn't to fight Kafka's delivery semantics. It's to make your consumer idempotent: process the same message twice, get the same result once.

> *We stopped chasing exactly-once and embraced idempotent design. Our duplicate rate dropped to zero and our code got simpler.*
>
> *Senior Engineer at a European fintech*

## Why Duplicates Happen

Three scenarios cause duplicates:

**Producer retries:** A producer sends a message, the broker writes it, but the acknowledgment gets lost. The producer retries, creating a duplicate.

**Consumer crashes:** Your consumer processes a message, calls an external API, then crashes before committing the offset. After restart, Kafka redelivers.

**Rebalancing:** During [consumer group](https://docs.conduktor.io/guide/monitor-brokers-apps) rebalances, partitions move between consumers. Messages processed but not committed get redelivered.

Kafka's idempotent producer (`enable.idempotence=true`) only prevents duplicates at the broker level. Consumer-side duplicates are your problem.

## The Pattern

An idempotent consumer tracks which messages it has processed:

```java
public void consume(ConsumerRecord<String, OrderEvent> record) {
    String key = extractIdempotencyKey(record);

    if (deduplicationStore.hasProcessed(key)) {
        log.debug("Skipping duplicate: {}", key);
        return;
    }

    processOrder(record.value());
    deduplicationStore.markProcessed(key);
}
```

The hard part is making this atomic. If you check, process, and mark in three steps, a crash between any of them creates inconsistency.

## Designing Idempotency Keys

The key must be unique per logical operation, stable across retries, and ideally generated by the producer.

**Option 1: Client-generated UUID** — The safest approach. Generate a UUID at the API layer that flows through the system.

**Option 2: Composite business key** — Derive from business attributes: `customerId:orderId:timestamp`. Works when the combination is truly unique.

**Option 3: Kafka coordinates** — Use `topic-partition-offset` as a key. Simple but breaks if you replay from a different topic.

## Pattern 1: Database Constraint

The simplest pattern. Use a unique constraint to reject duplicates.

```sql
CREATE TABLE processed_messages (
    idempotency_key VARCHAR(255) PRIMARY KEY,
    processed_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
```

In your consumer:

```java
@Transactional(isolation = Isolation.READ_COMMITTED)
public void processOrder(ConsumerRecord<String, OrderEvent> record) {
    String key = extractIdempotencyKey(record);

    try {
        processedMessageRepo.save(new ProcessedMessage(key));
        orderService.createOrder(record.value());
    } catch (DataIntegrityViolationException e) {
        log.info("Duplicate detected: {}", key);
    }
}
```

The `@Transactional` wrapper ensures atomicity. If the order creation fails, the deduplication record also rolls back.

**Isolation note:** Without proper isolation, concurrent consumers processing the same duplicate can race between check and insert. Use `READ_COMMITTED` or higher. For high-concurrency scenarios, consider `SELECT FOR UPDATE` or application-level locking.

**Tradeoff:** The deduplication table becomes a write hotspot under high load. For high-throughput services, consider Redis.

## Pattern 2: Redis SETNX

For higher throughput, use Redis with `SETNX`:

```java
public boolean tryProcess(String idempotencyKey) {
    Boolean wasSet = redis.opsForValue()
        .setIfAbsent("dedup:" + idempotencyKey, "1", Duration.ofDays(7));
    return Boolean.TRUE.equals(wasSet);
}
```

The TTL handles cleanup automatically. Set it longer than your maximum expected replay window.

**Redis persistence:** Configure Redis with AOF (`appendonly yes`, `appendfsync everysec`) to survive restarts. With RDB-only snapshots, a Redis crash between snapshots loses recent deduplication state, causing duplicates. For critical workloads, use Redis Cluster with replication.

**Tradeoff:** Redis adds a network hop and infrastructure dependency. If Redis is unavailable, your consumer stops. Consider whether this availability tradeoff works for your use case.

## Pattern 3: Natural Idempotency

When your business logic supports upserts, you might not need a separate deduplication store:

```sql
INSERT INTO inventory (product_id, quantity, event_version)
VALUES (?, ?, ?)
ON CONFLICT (product_id)
DO UPDATE SET quantity = EXCLUDED.quantity
WHERE inventory.event_version < EXCLUDED.event_version;
```

The `WHERE` clause prevents out-of-order events from overwriting newer data.

This works for state replacement events. It doesn't work for delta operations ("add 10 to quantity") or side effects (sending emails).

## Choosing a Pattern

| Pattern | Throughput | Best For |
|---------|------------|----------|
| Database constraint | Low-Medium | CRUD services, existing DB |
| Redis SETNX | High | Stateless services |
| Natural upsert | Medium-High | State replacement events |

For most services that read from Kafka and write to a database, the database constraint pattern is sufficient and simpler than full exactly-once transactions.

## The Deduplication Window

Your window must be longer than your maximum expected delay. Consider:

- How long might your consumer be down?
- Will you ever replay from hours or days ago?
- What's your producer's retry behavior?

Seven days is common. Balance storage costs against your replay requirements.

## Testing Idempotency

Always verify your consumer handles duplicates:

```java
@Test
void shouldHandleDuplicateMessages() {
    String key = UUID.randomUUID().toString();
    ConsumerRecord<String, OrderEvent> record = createRecord(key);

    processor.process(record);
    processor.process(record);

    assertThat(orderRepository.findAll()).hasSize(1);
}
```

Test concurrent duplicates too. Race conditions hide until production.

## At-Least-Once + Idempotent = Effectively Exactly-Once

The combination of at-least-once delivery and idempotent consumers gives you effectively exactly-once processing:

```
At-least-once + Idempotent consumer = Effectively exactly-once
```

This is simpler than Kafka's transactional exactly-once, which requires transactional producers, `read_committed` consumers, and two-phase commit coordination.

For most services, idempotent consumers are sufficient. Build your consumers to handle duplicates, and you'll stop worrying about Kafka's delivery semantics.

[Book a demo](https://www.conduktor.io/contact/demo) to see how Conduktor Console helps you trace message flow and identify deduplication issues across your Kafka clusters.