Kafka Retention: When Messages Disappear and Why
Master Kafka retention—retention.ms, segment.ms, and why messages persist longer than expected. Debug commands included.

"My retention is set to 10 minutes but I see messages from an hour ago." I hear this weekly. Kafka retention is simple in concept, confusing in practice.
The critical insight: Kafka deletes segments, not messages. The active segment (currently receiving writes) is never deleted, regardless of how old the messages inside it are.
"We set 1-hour retention but ran out of disk after a week. Turns out our low-throughput topics had segment.ms at 7 days. Segments weren't rolling, so nothing got deleted."
(SRE at a logistics company)
How Retention Actually Works
Kafka stores data in append-only log segments. Retention applies to closed segments only.
- Background thread runs every 5 minutes (log.retention.check.interval.ms)
- Identifies closed segments eligible for deletion
- Marks them for deletion; the files are removed after a 1-minute delay
Your actual retention can exceed the configured value by: segment roll time (up to 7 days by default) + check interval (5 min) + deletion delay (1 min).
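If you want to confirm the check interval and the broker-wide roll and segment defaults behind those numbers, kafka-configs.sh can describe broker configs; the broker id 0 and the grep filter below are just examples:
kafka-configs.sh --bootstrap-server localhost:9092 \
--entity-type brokers --entity-name 0 \
--describe --all | grep -E "(log.retention|log.roll|log.segment)"
# Look for log.retention.check.interval.ms, log.roll.ms/hours, and log.segment.bytes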
Why Messages Persist Longer Than Expected
Cause 1: Active segment never rolls
Low-throughput topics may never fill a segment (default 1 GB) or age past segment.ms (default 7 days).
# For 1-hour retention, roll segments every 30 minutes
kafka-configs.sh --bootstrap-server localhost:9092 \
--entity-type topics --entity-name my-topic \
--alter --add-config retention.ms=3600000,segment.ms=1800000
Cause 2: Segment contains old and new messages
Segment deletion uses the last message's timestamp. A segment with messages from 10:00 and 10:50, with 1-hour retention, is eligible at 11:50. The 10:00 message persists 1 hour 50 minutes.
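One way to see the timestamps a closed segment actually holds is to dump it with Kafka's DumpLogSegments tool; the segment path below is just an example:
kafka-run-class.sh kafka.tools.DumpLogSegments \
--files /var/kafka/data/my-topic-0/00000000000000000000.log \
--print-data-log | tail -5
# Each batch line includes a CreateTime (or LogAppendTime) timestamp;
# the largest one determines when the whole segment becomes eligible for deletion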
Cause 3: Future timestamps
Producer clock skew can stamp messages with timestamps in the future. Segments containing them won't be eligible for deletion until that future time has passed.
Fix: Use broker-controlled timestamps:
kafka-configs.sh --bootstrap-server localhost:9092 \
--entity-type topics --entity-name my-topic \
--alter --add-config message.timestamp.type=LogAppendTime
Time vs Size Retention
When both are configured, the most restrictive wins:
| Scenario | retention.ms | retention.bytes | Behavior |
|---|---|---|---|
| Time only | 7 days | -1 (unlimited) | Delete when > 7 days old |
| Size only | -1 | 10 GB | Delete when partition > 10 GB |
| Both | 7 days | 10 GB | Delete if > 7 days OR > 10 GB |
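For example, applying both limits from the "Both" row to a topic (values expressed in milliseconds and bytes):
kafka-configs.sh --bootstrap-server localhost:9092 \
--entity-type topics --entity-name my-topic \
--alter --add-config retention.ms=604800000,retention.bytes=10737418240
# 604800000 ms = 7 days; 10737418240 bytes = 10 GB (per partition)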
retention.bytes applies per partition. A topic with 10 partitions and 10 GB retention can use 100 GB total.
Check Your Settings
You can verify retention settings via CLI or through a topic management UI that shows all topic configurations at a glance.
kafka-configs.sh --bootstrap-server localhost:9092 \
--entity-type topics --entity-name my-topic \
--describe --all | grep -E "(retention|segment)"
Note: Topic properties use retention.ms, not log.retention.ms. The log. prefix is broker-only.
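For reference, the broker-wide equivalents use the log. prefix and are set in server.properties; the values shown here are the stock defaults:
# server.properties (broker-wide defaults, used when a topic has no override)
log.retention.hours=168 # 7 days
log.segment.bytes=1073741824 # 1 GB segments
log.retention.check.interval.ms=300000 # retention check every 5 minutes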
Common Mistakes
Setting retention shorter than segment roll:
# WRONG: Messages live at least 7 days despite 1-hour retention
retention.ms=3600000
segment.ms=604800000 # default: 7 days
Forgetting retention.bytes is per partition:
A topic with retention.bytes=10GB and 50 partitions can consume 500 GB.
Not accounting for replication:
Total storage = partition size × partitions × replication factor. A 100 GB topic with RF=3 uses 300 GB.
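To confirm the partition count and replication factor that go into that math:
kafka-topics.sh --bootstrap-server localhost:9092 \
--describe --topic my-topic
# The summary line reports PartitionCount and ReplicationFactor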
Verify Segments on Disk
ls -la /var/kafka/data/my-topic-0/
# Look at file timestamps to see when segments closed
Retention is a minimum guarantee, not a maximum. Expect messages to live for retention.ms + segment.ms + check interval in the worst case. A cluster that "should" use 500 GB based on throughput × retention might actually use 800 GB due to segment timing.
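A quick way to compare that estimate with reality is to measure the partition directories on disk; the data path here is an example:
du -sh /var/kafka/data/my-topic-*
# Sum across partitions and multiply by the replication factor for cluster-wide usage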
Book a demo to see how Conduktor Console shows storage usage, retention settings, and segment counts for every topic.