# Kafka as a Database: When to Use Compacted Topics for State

Compacted topics turn Kafka from a message transport into a state layer. You get key-value semantics, durable storage, and built-in replication.

But Kafka is not a database. I've watched teams learn this the hard way—building query patterns that work in development and collapse in production.

> *We built our entire user profile system on compacted topics. Worked great until we hit 10 million users and every "lookup" required scanning from offset zero.*
>
> *Tech Lead at a consumer app*

## How Compaction Works

Standard Kafka topics use `cleanup.policy=delete`: old segments are dropped once they pass time- or size-based retention. Compacted topics instead keep at least the latest value for each key indefinitely.

```bash
kafka-topics --bootstrap-server localhost:9092 \
  --create --topic user-profiles \
  --config cleanup.policy=compact \
  --config min.cleanable.dirty.ratio=0.1
```

You can also [create and configure topics visually](https://docs.conduktor.io/guide/manage-kafka/kafka-resources/topics) instead of managing CLI commands.

Produce multiple updates for the same key. Before compaction, every message is still in the log. After compaction, only the latest value for that key survives.
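To make the before-and-after concrete, here is a toy model of compaction in plain Java. It is a sketch, not broker code: `CompactionSketch` and `LogRecord` are hypothetical names, and it assumes the input list is already in offset order. Real compaction also leaves the active segment alone and keeps the original offsets, so the compacted log has gaps.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class CompactionSketch {
    /** One log record. Hypothetical type, for illustration only. */
    record LogRecord(long offset, String key, String value) {}

    /** Keeps only the highest-offset record per key, preserving offset order. */
    static List<LogRecord> compact(List<LogRecord> log) {
        // First pass: remember the last offset seen for every key.
        Map<String, Long> latestOffset = new HashMap<>();
        for (LogRecord r : log) {
            latestOffset.put(r.key(), r.offset());
        }
        // Second pass: keep a record only if it is the latest one for its key.
        List<LogRecord> compacted = new ArrayList<>();
        for (LogRecord r : log) {
            if (latestOffset.get(r.key()) == r.offset()) {
                compacted.add(r);
            }
        }
        return compacted;
    }
}
```

Feeding it three records, two of them for `user-1`, returns two records: the `user-2` record and the later `user-1` record, with their original offsets intact.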

## What You Get (and Don't Get)

**Compacted topics give you:**
- Latest value per key
- Durability across broker failures
- Ordering within partition
- Tombstone support (delete a key by sending a null value)
- Replay capability from offset 0

**Compacted topics don't give you:**
- Point queries (`SELECT * FROM topic WHERE key = X`)
- Indexes
- Transactions across keys
- Read-after-write guarantees

The fundamental limitation: a topic has no index by key, so every "query" is a scan from offset 0.
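A minimal sketch of what that scan looks like, using an in-memory list as a stand-in for the log. `ScanLookupSketch` and its `LogRecord` type are hypothetical, and a null value models a tombstone. The loop cannot stop at the first match, because a later record for the same key may overwrite or delete it.

```java
import java.util.List;
import java.util.Optional;

final class ScanLookupSketch {
    /** One log record; value == null models a tombstone. Hypothetical type. */
    record LogRecord(String key, String value) {}

    /** Finds the current value for a key by reading the whole log. */
    static Optional<String> lookup(List<LogRecord> log, String key) {
        String latest = null;
        for (LogRecord r : log) {          // no index: every record must be read
            if (r.key().equals(key)) {
                latest = r.value();        // a tombstone clears any earlier value
            }
        }
        return Optional.ofNullable(latest);
    }
}
```

The cost is linear in the size of the log for every single lookup, which is the behavior described in the quote at the top of this article.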

## The Pattern That Works: KTables

The architecture that makes compacted topics useful isn't reading them directly. It's materializing them into a local store.

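```java
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StoreQueryParameters;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.KTable;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.state.QueryableStoreTypes;
import org.apache.kafka.streams.state.ReadOnlyKeyValueStore;

// Materialize the compacted topic into a named local store (default serdes assumed)
StreamsBuilder builder = new StreamsBuilder();
KTable<String, UserProfile> users = builder.table(
    "user-profiles",
    Materialized.as("users-store")
);

// Fast local lookup; `streams` is the started KafkaStreams instance for this topology
ReadOnlyKeyValueStore<String, UserProfile> store =
    streams.store(StoreQueryParameters.fromNameAndType(
        "users-store", QueryableStoreTypes.keyValueStore()));

UserProfile user = store.get("user-123");  // Milliseconds, not minutes
```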

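The compacted topic is the source of truth. The local RocksDB store is a cache. If the local state is lost, Kafka Streams replays the topic to rebuild the store; on a clean restart with its state directory intact, it only has to catch up from where it left off.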

This is the "Kafka as database" pattern that actually works.
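The idea behind materialization can be sketched without Kafka at all. The class below is a simplified stand-in, not how Kafka Streams is implemented: `LocalStoreSketch` and `Change` are hypothetical names, and a `HashMap` plays the role of RocksDB. It shows the two operations that matter: applying a changelog record (upsert or tombstone) and rebuilding the whole store by replaying the log.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

final class LocalStoreSketch {
    /** One changelog record; value == null models a tombstone. Hypothetical type. */
    record Change(String key, String value) {}

    // Stand-in for the local RocksDB store.
    private final Map<String, String> store = new HashMap<>();

    /** Applies one changelog record: upsert, or delete on tombstone. */
    void apply(Change c) {
        if (c.value() == null) {
            store.remove(c.key());
        } else {
            store.put(c.key(), c.value());
        }
    }

    /** Rebuilds a store from scratch by replaying the changelog in order. */
    static LocalStoreSketch restore(List<Change> changelog) {
        LocalStoreSketch s = new LocalStoreSketch();
        changelog.forEach(s::apply);
        return s;
    }

    /** Constant-time lookup against the materialized view. */
    Optional<String> get(String key) {
        return Optional.ofNullable(store.get(key));
    }

    int size() {
        return store.size();
    }
}
```

The scan cost is paid once, at restore time. After that, every `get` is a local map lookup, and the store stays current by applying new records as they arrive.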

## Configuration for State Stores

```bash
kafka-topics --bootstrap-server localhost:9092 \
  --create --topic state-changelog \
  --config cleanup.policy=compact \
  --config min.cleanable.dirty.ratio=0.1 \
  --config segment.ms=300000 \
  --config max.compaction.lag.ms=86400000
```

| Parameter | Value | Effect |
|-----------|-------|--------|
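| `min.cleanable.dirty.ratio` | 0.1 | Log becomes eligible for compaction once 10% of it is uncompacted (dirty) |
| `segment.ms` | 300000 | Roll a new segment after 5 minutes so records become compactable sooner |
| `max.compaction.lag.ms` | 86400000 | A record stays ineligible for compaction for at most 24h |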

**Tradeoff:** A lower ratio means more frequent compaction, which costs more broker CPU and disk I/O. A short `segment.ms` also produces many small segment files.
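The arithmetic behind the ratio is easy to check by hand. The sketch below is a simplified model of the eligibility test, assuming the dirty ratio is dirty bytes divided by total (clean plus dirty) bytes. `DirtyRatioSketch` and its method names are hypothetical, and the broker's real decision also takes the compaction lag settings into account.

```java
import java.time.Duration;

final class DirtyRatioSketch {
    /** Fraction of the log not yet compacted. Simplified model, not broker code. */
    static double dirtyRatio(long cleanBytes, long dirtyBytes) {
        long total = cleanBytes + dirtyBytes;
        return total == 0 ? 0.0 : (double) dirtyBytes / total;
    }

    /** True once the dirty ratio exceeds min.cleanable.dirty.ratio. */
    static boolean eligibleForCompaction(long cleanBytes, long dirtyBytes, double minRatio) {
        return dirtyRatio(cleanBytes, dirtyBytes) > minRatio;
    }

    public static void main(String[] args) {
        // 150 MB dirty out of 1000 MB total: ratio 0.15
        System.out.println(eligibleForCompaction(850, 150, 0.1)); // true
        System.out.println(eligibleForCompaction(850, 150, 0.5)); // false (0.5 is the default)

        // The millisecond values from the table, in readable units
        System.out.println(Duration.ofMillis(300_000).toMinutes());  // 5
        System.out.println(Duration.ofMillis(86_400_000).toHours()); // 24
    }
}
```

With the default ratio of 0.5, the same log would sit uncompacted until half of it was dirty. That gap is what you trade for the extra cleaner work at 0.1.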

## Good Fit vs Poor Fit

**Good fit:**
- CDC changelog topics (row state keyed by primary key)
- Configuration distribution
- Kafka Streams state stores
- Entity snapshots for downstream consumers

**Poor fit:**
- Point queries at scale
- Complex queries (filtering, joining)
- High-cardinality random access
- Low-latency reads without materialization

## Common Errors

**Null keys rejected:**
```text
Compacted topic cannot accept message without key
```

Compaction requires keys. Every producer must set one.
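One way to catch this before the broker does is to make a keyless message impossible to construct on the producer side. The sketch below is one possible guard, not a Kafka API: `KeyGuardSketch` and `KeyedMessage` are hypothetical names. The value is deliberately allowed to be null, since that is how a tombstone is expressed.

```java
import java.util.Objects;

final class KeyGuardSketch {
    /** A message destined for a compacted topic. Hypothetical wrapper type. */
    record KeyedMessage(String key, String value) {
        KeyedMessage {
            // Fail fast in the producer instead of waiting for a broker rejection.
            Objects.requireNonNull(key, "compacted topics require a non-null key");
            // value may be null: that is a tombstone for this key.
        }
    }
}
```

Building the producer record from a `KeyedMessage` turns a missing key into a failure at the call site that created it.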

**Compaction not running:** Check `segment.ms`. Compaction only runs on closed segments, never the active one. With the default `segment.ms` of 7 days, low-throughput topics may keep a segment open for days.

## The Hybrid Pattern

For production systems needing both Kafka's durability and database queries:

```text
Producer → Compacted Topic → Kafka Streams → Local RocksDB
              (truth)         (materialize)    (fast lookups)
```

The topic is the log. Everything else is a derived view. If the downstream store fails, rebuild from the topic.

Compacted topics are powerful when used correctly. They're not a database replacement—they're a durable, replayable source of truth that feeds databases, caches, and local stores.

[Book a demo](https://www.conduktor.io/contact/demo) to see how Conduktor Console provides visual configuration management and compaction metrics.
