Dead Letter Topics: Handling Poison Pills


Stéphane Derosiaux · February 23, 2025

A poison pill is a message that crashes your consumer every time. Without proper handling, one malformed record blocks the entire partition. Your consumer enters an infinite retry loop while downstream systems starve.

I've seen teams lose hours debugging these scenarios. The consumer crashes, restarts, fetches the same message, crashes again. The logs fill with stack traces. Production alerts fire continuously.

We added DLT handling after a single bad message blocked our payment processing for 45 minutes. Now poison pills route to dead letter in milliseconds.

SRE at a fintech company

What Makes a Poison Pill

Poison pills always fail, regardless of retries:

  • Deserialization failure: JSON with unexpected schema, Avro missing schema ID
  • Validation failure: Required field null, value out of range
  • Business logic exception: Division by zero, invalid state transition
  • Size limits: Payload too large for downstream system

These differ from transient errors (network timeouts, dropped database connections), which heal on retry. The key distinction: transient errors eventually succeed if you wait; poison pills never do, no matter how many times you retry. Implementing data quality interceptors at the gateway level can catch many poison pills before they reach consumers.
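
The Java examples below separate these two cases with a pair of application-level exceptions. TransientException and PermanentException are not Kafka classes; a minimal sketch of what they might look like:

// Errors that may heal on retry: timeouts, dropped connections, rate limits
class TransientException extends RuntimeException {
    TransientException(String message, Throwable cause) { super(message, cause); }
}

// Errors that will never heal: schema mismatch, failed validation, bad state
class PermanentException extends RuntimeException {
    PermanentException(String message, Throwable cause) { super(message, cause); }
}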

The DLT Pattern

[Source Topic] → [Consumer] → [Database]
                     ↓
               (on failure)
                     ↓
              [Dead Letter Topic]

On permanent failure, publish the original record to a DLT with error metadata, commit the offset, move on.
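
The examples below assume a producer dedicated to the DLT (dltProducer), keyed by String with raw byte values. A minimal configuration sketch; the bootstrap address is a placeholder:

Properties dltProps = new Properties();
dltProps.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
dltProps.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
dltProps.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, ByteArraySerializer.class);
// A record that is already failing must not also be lost on the way to the DLT
dltProps.put(ProducerConfig.ACKS_CONFIG, "all");
dltProps.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, true);
Producer<String, byte[]> dltProducer = new KafkaProducer<>(dltProps);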

Java Implementation

private void processWithRetry(ConsumerRecord<String, String> record) {
    Exception lastException = null;

    for (int attempt = 1; attempt <= maxRetries; attempt++) {
        try {
            processOrder(record.value());
            return;
        } catch (TransientException e) {
            // Might succeed later: back off, then try again
            lastException = e;
            backoff(attempt);
        } catch (PermanentException e) {
            // Will never succeed: route to the DLT immediately
            sendToDeadLetter(record, e, attempt);
            return;
        }
    }
    // Transient failures exhausted the retry budget
    sendToDeadLetter(record, lastException, maxRetries);
}

private void sendToDeadLetter(ConsumerRecord<String, String> record, Exception e, int attempts) {
    ProducerRecord<String, byte[]> dltRecord = new ProducerRecord<>(dltTopic, record.key(), record.value().getBytes());
    // Preserve provenance so the failure can be categorized and replayed later
    dltRecord.headers()
        .add("dlt.original-topic", sourceTopic.getBytes())
        .add("dlt.original-offset", String.valueOf(record.offset()).getBytes())
        .add("dlt.exception-class", e.getClass().getName().getBytes())
        .add("dlt.retry-count", String.valueOf(attempts).getBytes());
    dltProducer.send(dltRecord, (metadata, ex) -> {
        // A failed DLT write must surface loudly, or the record is lost
        if (ex != null) log.error("DLT write failed for offset {}: {}", record.offset(), ex.getMessage());
    });
}

Critical: After sending to DLT, commit the offset explicitly to avoid reprocessing the poison pill on restart. Use consumer.commitSync() for the specific partition/offset after DLT write succeeds.
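
A sketch of that commit, assuming a manually committing consumer and the record variable from above; committing offset + 1 marks the poison pill as consumed for just its partition:

// Advance only the failed record's partition so it is never fetched again after a restart
Map<TopicPartition, OffsetAndMetadata> toCommit = Collections.singletonMap(
    new TopicPartition(record.topic(), record.partition()),
    new OffsetAndMetadata(record.offset() + 1));
consumer.commitSync(toCommit);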

Key decisions: Distinguish transient vs permanent failures. Preserve original metadata in headers. Handle DLT write failures.

Handling Deserialization Errors

Deserialization errors occur before your consumer code runs, so a try/catch around your processing logic never sees them. Spring Kafka's ErrorHandlingDeserializer wraps the real deserializer and captures the failure instead of throwing:

props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, ErrorHandlingDeserializer.class);
props.put(ErrorHandlingDeserializer.VALUE_DESERIALIZER_CLASS, JsonDeserializer.class);

Failed records arrive with a null value and the exception captured in a header. Route them to the DLT:

Header errorHeader = record.headers().lastHeader(ErrorHandlingDeserializer.VALUE_DESERIALIZER_EXCEPTION_HEADER);
if (errorHeader != null) {
    sendToDeadLetterRaw(record, errorHeader);
    continue;
}
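
sendToDeadLetterRaw isn't shown above; a minimal sketch, assuming we forward the key, the source coordinates, and the captured exception header so the failure can still be diagnosed:

private void sendToDeadLetterRaw(ConsumerRecord<String, ?> record, Header errorHeader) {
    // The value could not be deserialized, so there is no usable payload to forward
    ProducerRecord<String, byte[]> dltRecord = new ProducerRecord<>(dltTopic, record.key(), null);
    dltRecord.headers()
        .add("dlt.original-topic", record.topic().getBytes())
        .add("dlt.original-offset", String.valueOf(record.offset()).getBytes())
        .add(errorHeader); // serialized deserialization exception from Spring Kafka
    dltProducer.send(dltRecord);
}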

Spring Kafka: @RetryableTopic

Spring Kafka 2.7+ provides declarative DLT handling:

@RetryableTopic(
    attempts = "4",
    backoff = @Backoff(delay = 1000, multiplier = 2.0),
    include = {TransientException.class},
    exclude = {ValidationException.class}
)
@KafkaListener(topics = "orders")
public void listen(Order order) {
    orderService.process(order);
}

@DltHandler
public void handleDlt(Order order, @Header(KafkaHeaders.EXCEPTION_MESSAGE) String error) {
    alertService.notifyPoisonPill(order, error);
}

First attempt fails → orders-retry-0 → second attempt fails → orders-retry-1 → retries exhausted → orders-dlt, where the @DltHandler method processes the record.

Retry vs Skip

Error Type   Example                        Strategy
Permanent    Schema mismatch, validation    Skip to DLT
Transient    Database timeout, rate limit   Retry with backoff
Unknown      Unexpected exception           Retry once, then DLT
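
One way to encode this table as code, reusing the hypothetical exception types from earlier; the Kafka exception classes named here are real, the mapping itself is an assumption for illustration:

enum FailureAction { SKIP_TO_DLT, RETRY_WITH_BACKOFF, RETRY_ONCE_THEN_DLT }

private FailureAction classify(Exception e) {
    if (e instanceof PermanentException || e instanceof SerializationException) {
        return FailureAction.SKIP_TO_DLT;          // will never succeed
    }
    if (e instanceof TransientException || e instanceof TimeoutException) {
        return FailureAction.RETRY_WITH_BACKOFF;   // likely to heal
    }
    return FailureAction.RETRY_ONCE_THEN_DLT;      // unknown: one retry, then isolate
}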

Monitoring DLT Health

A growing DLT indicates systematic problems.

Metric              Warning    Critical
DLT messages/hour   > 10       > 100
DLT % of source     > 0.1%     > 1%

Query DLT headers to categorize failures by exception type. Investigate root causes rather than just replaying.
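
A sketch of that categorization, assuming a consumer subscribed to the DLT and the dlt.exception-class header written earlier:

Map<String, Long> failuresByException = new HashMap<>();
for (ConsumerRecord<String, byte[]> record : dltConsumer.poll(Duration.ofSeconds(5))) {
    Header h = record.headers().lastHeader("dlt.exception-class");
    String exceptionClass = h != null ? new String(h.value()) : "unknown";
    failuresByException.merge(exceptionClass, 1L, Long::sum);
}
failuresByException.forEach((cls, count) -> log.info("{}: {} dead letters", cls, count));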

Reprocessing Dead Letters

After fixing the root cause:

public void replayDeadLetters(Predicate<ConsumerRecord<String, byte[]>> filter) {
    // dltRecords: records previously consumed from the dead letter topic
    for (ConsumerRecord<String, byte[]> record : dltRecords) {
        if (filter.test(record)) {
            // Route the record back to the topic it originally came from
            String originalTopic = new String(record.headers().lastHeader("dlt.original-topic").value());
            producer.send(new ProducerRecord<>(originalTopic, record.key(), record.value()));
        }
    }
}
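
For example, after deploying a fix for one specific failure class, replay only the records that match it; the exception class name below is hypothetical:

replayDeadLetters(record -> {
    Header h = record.headers().lastHeader("dlt.exception-class");
    return h != null && new String(h.value()).equals("com.example.orders.ValidationException");
});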

Dead letter handling is defensive programming. The goal isn't to hide failures—it's to contain them so one bad message doesn't take down your entire pipeline.

Book a demo to see how Conduktor Console surfaces DLT metrics and lets you inspect failed messages.