# Dead Letter Topics: Handling Poison Pills

A poison pill is a message that crashes your consumer every time. Without proper handling, one malformed record blocks the entire partition. Your consumer enters an infinite retry loop while downstream systems starve.

I've seen teams lose hours debugging these scenarios. The consumer crashes, restarts, fetches the same message, crashes again. The logs fill with stack traces. Production alerts fire continuously.

> *We added DLT handling after a single bad message blocked our payment processing for 45 minutes. Now poison pills route to dead letter in milliseconds.*
>
> *SRE at a fintech company*

## What Makes a Poison Pill

Poison pills always fail, regardless of retries:

- **Deserialization failure:** JSON with unexpected schema, Avro missing schema ID
- **Validation failure:** Required field null, value out of range
- **Business logic exception:** Division by zero, invalid state transition
- **Size limits:** Payload too large for downstream system

These differ from transient errors (network timeouts, database connections) which heal on retry. The key distinction: **transient errors heal, poison pills never do.** Implementing [data quality interceptors](https://docs.conduktor.io/guide/conduktor-concepts/data-quality-policies) at the gateway level can catch many poison pills before they reach consumers.

## The DLT Pattern

```text
[Source Topic] → [Consumer] → [Database]
                     ↓
               (on failure)
                     ↓
              [Dead Letter Topic]
```

On permanent failure, publish the original record to a DLT with error metadata, commit the offset, move on.

## Java Implementation

```java
private void processWithRetry(ConsumerRecord<String, String> record) {
    Exception lastException = null;

    for (int attempt = 1; attempt <= maxRetries; attempt++) {
        try {
            processOrder(record.value());
            return;
        } catch (TransientException e) {
            lastException = e;
            backoff(attempt);
        } catch (PermanentException e) {
            sendToDeadLetter(record, e, 1);
            return;
        }
    }
    sendToDeadLetter(record, lastException, maxRetries);
}

private void sendToDeadLetter(ConsumerRecord<String, String> record, Exception e, int attempts) {
    ProducerRecord<String, byte[]> dltRecord = new ProducerRecord<>(dltTopic, record.key(), record.value().getBytes());
    dltRecord.headers()
        .add("dlt.original-topic", sourceTopic.getBytes())
        .add("dlt.original-offset", String.valueOf(record.offset()).getBytes())
        .add("dlt.exception-class", e.getClass().getName().getBytes())
        .add("dlt.retry-count", String.valueOf(attempts).getBytes());
    dltProducer.send(dltRecord, (metadata, ex) -> {
        if (ex != null) log.error("DLT write failed for offset {}: {}", record.offset(), ex.getMessage());
    });
}
```

**Critical:** After sending to DLT, commit the offset explicitly to avoid reprocessing the poison pill on restart. Use `consumer.commitSync()` for the specific partition/offset after DLT write succeeds.

**Key decisions:** Distinguish transient vs permanent failures. Preserve original metadata in headers. Handle DLT write failures.

## Handling Deserialization Errors

Deserialization errors occur before your consumer code runs. Use `ErrorHandlingDeserializer`:

```java
props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, ErrorHandlingDeserializer.class);
props.put(ErrorHandlingDeserializer.VALUE_DESERIALIZER_CLASS, JsonDeserializer.class);
```

Failed records return `null` with the exception in a header. Route them to DLT:

```java
Header errorHeader = record.headers().lastHeader(ErrorHandlingDeserializer.VALUE_DESERIALIZER_EXCEPTION_HEADER);
if (errorHeader != null) {
    sendToDeadLetterRaw(record, errorHeader);
    continue;
}
```

## Spring Kafka: @RetryableTopic

Spring Kafka 2.7+ provides declarative DLT handling:

```java
@RetryableTopic(
    attempts = "4",
    backoff = @Backoff(delay = 1000, multiplier = 2.0),
    include = {TransientException.class},
    exclude = {ValidationException.class}
)
@KafkaListener(topics = "orders")
public void listen(Order order) {
    orderService.process(order);
}

@DltHandler
public void handleDlt(Order order, @Header(KafkaHeaders.EXCEPTION_MESSAGE) String error) {
    alertService.notifyPoisonPill(order, error);
}
```

First attempt fails → `orders-retry-0` → second fails → `orders-retry-1` → exhausted → `orders-dlt` → `@DltHandler` processes.

## Retry vs Skip

| Error Type | Example | Strategy |
|------------|---------|----------|
| Permanent | Schema mismatch, validation | Skip to DLT |
| Transient | Database timeout, rate limit | Retry with backoff |
| Unknown | Unexpected exception | Retry once, then DLT |

## Monitoring DLT Health

A growing DLT indicates systematic problems.

| Metric | Warning | Critical |
|--------|---------|----------|
| DLT messages/hour | > 10 | > 100 |
| DLT % of source | > 0.1% | > 1% |

Query DLT headers to categorize failures by exception type. Investigate root causes rather than just replaying.

## Reprocessing Dead Letters

After fixing the root cause:

```java
public void replayDeadLetters(Predicate<ConsumerRecord<String, byte[]>> filter) {
    for (ConsumerRecord<String, byte[]> record : dltRecords) {
        if (filter.test(record)) {
            String originalTopic = new String(record.headers().lastHeader("dlt.original-topic").value());
            producer.send(new ProducerRecord<>(originalTopic, record.key(), record.value()));
        }
    }
}
```

Dead letter handling is defensive programming. The goal isn't to hide failures—it's to contain them so one bad message doesn't take down your entire pipeline.

[Book a demo](https://www.conduktor.io/contact/demo) to see how Conduktor Console surfaces DLT metrics and lets you inspect failed messages.
