Kafka Streams vs the consumer API
Decide whether you need Kafka Streams or just a consumer.
Almost everyone starting with stream processing hits this fork: do I reach for Kafka Streams, or is a plain consumer enough? The framing that trips people up is treating them as rivals. They aren't. Kafka Streams is built directly on top of the consumer and producer clients: it's not a different way to talk to Kafka, it's a higher-level processing layer sitting on the same transport.
So the real question isn't "which transport is better." It's "how much of the processing machinery do I want to write myself?" This guide answers that honestly, with the same logic side by side so you can see exactly what the library buys you.
What you'll learn:
- Why Kafka Streams is built on the consumer, not instead of it
- When a plain consumer is the right, simpler choice
- When Kafka Streams earns its operational cost
- The same job written both ways, line for line
Kafka Streams sits on top of the consumer
Open up a running Kafka Streams app and look at what it's doing on the wire: it polls partitions with a consumer, it commits offsets, it produces results with a producer, and its instances form a consumer group that splits partitions between them. That's the consumer API: Streams didn't replace it, it wrapped it.
What the library adds is everything above the poll loop: a topology you describe declaratively, local state that survives restarts, event-time windowing, and atomic read-process-write. You get those by configuring a topology instead of hand-coding them against poll().
🚫 "We'll move off Kafka Streams to plain consumers to simplify, it's just a consumer anyway."
Dropping Streams for a raw consumer doesn't remove complexity from a stateful job: it relocates it into your code. The state store, the windowing, the exactly-once cycle, the rebalance handling: they don't disappear, you now own and maintain them by hand. Simplify by removing Streams only when the job is genuinely stateless.
This is why "Kafka Streams vs consumer" is the wrong axis. You are not choosing between two transports: both read and write Kafka the same way. You're choosing whether to write the processing layer yourself or let the library write it for you. Frame the decision as plain consumer plus your own code versus Kafka Streams, and the trade-off gets clear fast.
When the plain consumer is enough
Reach for a bare consumer (and producer, if you emit results) when your work is simple and stateless. Specifically:
- Per-record consume-then-act, no memory between records. Read a message, do one thing with it, move on. No counts, no running totals, nothing remembered across records.
- The "act" is not stream processing. Write the record to a database, call an HTTP service, push to a queue, send an email. The interesting work happens outside Kafka, and you just need records delivered to your code.
- No joins, aggregations, or windows. The moment you need to combine two streams or group by time, you're rebuilding the library.
- You want full control of the poll loop. Custom commit timing, manual partition assignment, your own threading model, careful backpressure: the consumer hands you the raw controls; Streams hides them on purpose.
A consumer that reads an orders topic and inserts each order into Postgres is a textbook plain-consumer job. Adding Kafka Streams there buys you nothing and costs you the Streams runtime and its rebalance behavior. A stateless topology creates no RocksDB store and no internal topics, but the moment anyone adds a stateful step, both appear and you operate them.
When Kafka Streams pays off
Reach for Kafka Streams when you need the things you'd otherwise reinvent (usually badly) on top of a consumer:
- Stateful operations. Counts, sums, rollups, and joins backed by a local state store that survives a crash. Hand-rolling fault-tolerant local state on a raw consumer is a project in itself.
- Joins across streams. Enriching an event stream against a reference table, or correlating two streams within a time window. (More on doing this without Streams below. Short version: you don't want to.)
- Event-time windowing. Grouping by when events happened, with late-data handling, instead of when they arrived.
- Exactly-once across read-process-write. One config setting (processing.guarantee=exactly_once_v2) makes the consume → transform → produce → commit cycle atomic. Replicating this by hand against the transactional producer API is error-prone.
- Scaling and fault tolerance you don't write. Partitions, tasks, and state migrate between instances on their own. With a raw consumer you'd write the rebalance and state-handoff logic yourself.
The pattern: if your processing is stateful, time-aware, or needs strong delivery guarantees, the consumer-plus-DIY route slowly converges on a worse version of Kafka Streams. Use the real one.
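As a rough illustration of how small the exactly-once switch is on the Streams side, here is a minimal configuration sketch. It uses plain string keys so it compiles without the Kafka jars. The application id and broker address are placeholders, not values from this guide, and a real app would also configure serdes.

```java
import java.util.Properties;

class ExactlyOnceConfig {
    /** Builds a minimal Kafka Streams configuration with exactly-once enabled. */
    static Properties streamsProps() {
        Properties props = new Properties();
        // Hypothetical application id; Streams also uses it as the consumer group id.
        props.put("application.id", "paid-orders-app");
        // Hypothetical broker address.
        props.put("bootstrap.servers", "localhost:9092");
        // The one setting that turns on exactly-once; the default is at_least_once.
        props.put("processing.guarantee", "exactly_once_v2");
        return props;
    }

    public static void main(String[] args) {
        streamsProps().forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```

These properties would be passed as the second argument to the KafkaStreams constructor. Everything else about the transactional cycle stays inside the library.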
The same job, both ways
Here's a stateless filter (pass through only paid orders) written as a raw consumer and as a Kafka Streams topology. Both run on Apache Kafka 3.9.0.
Plain consumer and producer:
    try (var consumer = new KafkaConsumer<String, String>(consumerProps);
         var producer = new KafkaProducer<String, String>(producerProps)) {
        consumer.subscribe(List.of("orders"));
        while (true) {
            var records = consumer.poll(Duration.ofMillis(100));
            for (var record : records) {
                if (record.value().contains("\"status\":\"PAID\"")) {
                    producer.send(new ProducerRecord<>("paid-orders", record.key(), record.value()));
                }
            }
            producer.flush();      // wait for the buffered sends to complete
            consumer.commitSync(); // only then mark the batch as processed
        }
    }

Note the producer.flush() before the commit. send() is asynchronous, so committing offsets first means a crash can silently drop PAID records. That ordering bug is exactly the kind of detail you own on the raw clients. Note also that flush() only waits for in-flight sends to complete: it does not throw if one of them failed, so a production loop would also check the Future returned by send() (or a callback) before committing.
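To make the ordering concrete, here is a toy simulation in plain Java, with no Kafka client involved. The class and method names are hypothetical. It models one poll batch whose sends are only buffered, then a crash between the last two steps of the loop.

```java
import java.util.ArrayList;
import java.util.List;

class CommitOrdering {
    /** What reached the output topic, and the offset the consumer resumes from. */
    static final class Outcome {
        final List<String> delivered;
        final int committedOffset;

        Outcome(List<String> delivered, int committedOffset) {
            this.delivered = delivered;
            this.committedOffset = committedOffset;
        }
    }

    /**
     * Simulates one batch followed by a crash between flush and commit
     * (or between commit and flush, depending on the chosen order).
     */
    static Outcome runUntilCrash(List<String> batch, boolean flushBeforeCommit) {
        List<String> buffered = new ArrayList<>(batch); // send() only enqueues
        List<String> delivered = new ArrayList<>();
        int committed = 0;
        if (flushBeforeCommit) {
            delivered.addAll(buffered); // flush completes, records are durable
            // crash here: the offsets were never committed
        } else {
            committed = batch.size();   // commit completes, offsets move forward
            // crash here: the buffered records die with the process
        }
        return new Outcome(delivered, committed);
    }

    public static void main(String[] args) {
        List<String> batch = List.of("order-1", "order-2");
        Outcome safe = runUntilCrash(batch, true);
        Outcome lossy = runUntilCrash(batch, false);
        // prints: flush first: delivered=[order-1, order-2] committed=0
        System.out.println("flush first: delivered=" + safe.delivered + " committed=" + safe.committedOffset);
        // prints: commit first: delivered=[] committed=2
        System.out.println("commit first: delivered=" + lossy.delivered + " committed=" + lossy.committedOffset);
    }
}
```

With flush first, the restarted consumer re-reads the batch from offset 0 and produces duplicates, which is at-least-once delivery. With commit first, it resumes past records that never reached the output topic, which is data loss.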
Kafka Streams:
    StreamsBuilder builder = new StreamsBuilder();
    builder.<String, String>stream("orders")
           .filter((key, order) -> order.contains("\"status\":\"PAID\""))
           .to("paid-orders");
    KafkaStreams streams = new KafkaStreams(builder.build(), props);
    streams.start();

For this job (one stateless filter) they're roughly even, and the consumer is arguably more transparent. That's the point: a one-line stateless transform is not a reason to adopt a framework.
Now imagine the next requirement lands: count paid orders per customer per hour, exactly once, and survive a restart without losing the counts. On the consumer you start writing a state store, a flush-and-commit protocol, windowing, transactional sends, and rebalance-aware state handoff. On Streams it's a groupByKey().windowedBy(...).count() and one processing.guarantee flag. That is where the library stops being optional.
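For a sense of where the hand-rolled version starts, here is a sketch of only the counting and windowing part in plain Java. It assumes one-hour tumbling windows aligned to the epoch, non-negative event timestamps in milliseconds, and a customer id available for each order. The class and its methods are hypothetical, and the state lives in a HashMap, so a restart loses every count.

```java
import java.util.HashMap;
import java.util.Map;

class HourlyPaidOrderCounter {
    static final long WINDOW_MS = 60 * 60 * 1000L; // one-hour tumbling windows

    // In-memory only: a real job would have to persist and restore this state.
    private final Map<String, Long> counts = new HashMap<>();

    /** Start of the epoch-aligned window that contains the event timestamp. */
    static long windowStart(long eventTimeMs) {
        return eventTimeMs - (eventTimeMs % WINDOW_MS);
    }

    /** Counts one paid order and returns the updated count for its window. */
    long record(String customerId, long eventTimeMs) {
        String key = customerId + "@" + windowStart(eventTimeMs);
        return counts.merge(key, 1L, Long::sum);
    }

    /** Current count for a customer in the window starting at windowStartMs. */
    long count(String customerId, long windowStartMs) {
        return counts.getOrDefault(customerId + "@" + windowStartMs, 0L);
    }

    public static void main(String[] args) {
        HourlyPaidOrderCounter counter = new HourlyPaidOrderCounter();
        counter.record("alice", 1_000L);     // window starting at 0
        counter.record("alice", 3_599_999L); // last millisecond of the same window
        counter.record("alice", 3_600_000L); // first millisecond of the next window
        System.out.println(counter.count("alice", 0L));         // prints 2
        System.out.println(counter.count("alice", 3_600_000L)); // prints 1
    }
}
```

This covers the easy fifth of the requirement. Durable state, late events, transactional output, and state handoff on rebalance are all still missing, and those are the parts the library supplies.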
Side by side
| Dimension | Plain consumer (+ producer) | Kafka Streams |
|---|---|---|
| What it is | Low-level client: poll, process, commit | Processing library built on the consumer/producer |
| Transport to Kafka | Consumer group, offsets | The same consumer group and offsets, underneath |
| State / aggregations | You build and persist it yourself | Built-in state stores with changelog backup |
| Joins & windows | Hand-rolled (hard, error-prone) | First-class joins and windowing |
| Exactly-once read-process-write | DIY with the transactional producer | One config flag |
| Scaling / failover | You write rebalance & state handoff | Tasks and state migrate automatically |
| Control of the poll loop | Full: commit timing, threading, assignment | Abstracted away on purpose |
| Operational surface | Just a consumer group | + RocksDB, internal topics, restore once stateful |
| Best for | Stateless per-record work; act outside Kafka | Stateful, time-aware, exactly-once processing |
A stateful Kafka Streams app adds a RocksDB state store, auto-created internal topics (suffixed -changelog and -repartition), and rebalance/restore behavior. Pay that price for state, joins, and exactly-once, not for a filter, which creates none of it.
Is Kafka Streams just a consumer?
Under the hood, mostly yes: a Kafka Streams app uses the consumer and producer clients and forms a consumer group like any other. What it adds sits above the poll loop: a declarative topology, fault-tolerant local state, windowing, and exactly-once. So it's the consumer plus a processing layer, not a separate transport.
When do I need Kafka Streams over a consumer?
When your processing is stateful (counts, aggregations, joins), time-aware (windowing on event time), or needs exactly-once across the read-process-write cycle. Those are the parts you'd otherwise build by hand on top of a consumer. For simple, stateless, per-record work, a plain consumer is the lighter choice.
Does Kafka Streams replace the consumer API?
No. It's built on the consumer and producer APIs and uses them internally; it doesn't deprecate or replace them. Plenty of production systems keep using plain consumers for stateless work and reach for Kafka Streams only where state, joins, or exactly-once are needed.
Can I do joins with a plain consumer?
Technically yes, but you'd have to maintain the lookup data in your own state store, keep it co-partitioned with the stream, handle restarts and rebalances, and deal with timing, which is exactly what Kafka Streams joins do for you. For anything beyond a trivial in-memory lookup, a plain-consumer join is a maintenance trap.
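The "trivial in-memory lookup" case looks roughly like the sketch below: keep the latest value per key from a reference topic in a map, and enrich each order against it. The class, method names, and record shapes are hypothetical. The sketch treats a null value as a delete, following the tombstone convention of compacted topics.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

class InMemoryLookupJoin {
    // Latest name per customer id, built by consuming a reference topic.
    private final Map<String, String> customers = new HashMap<>();

    /** Applies one record from the reference topic; a null value deletes the key. */
    void onReferenceRecord(String customerId, String name) {
        if (name == null) {
            customers.remove(customerId);
        } else {
            customers.put(customerId, name);
        }
    }

    /** Enriches one order; empty when the customer is not (yet) known. */
    Optional<String> enrich(String customerId, String order) {
        String name = customers.get(customerId);
        return name == null ? Optional.empty() : Optional.of(order + " for " + name);
    }

    public static void main(String[] args) {
        InMemoryLookupJoin join = new InMemoryLookupJoin();
        System.out.println(join.enrich("c1", "order-1").isPresent()); // prints false
        join.onReferenceRecord("c1", "Alice");
        System.out.println(join.enrich("c1", "order-2").get());       // prints order-2 for Alice
    }
}
```

The first lookup misses because the order arrived before its reference record, which is the timing problem in miniature. The map is also empty after every restart and knows nothing about which partitions the instance currently owns.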
Is a Kafka Streams app harder to operate than a consumer?
Yes, and that's the trade-off. A consumer is just a consumer group. A stateful Streams app adds a RocksDB state store, auto-created changelog and repartition topics, and rebalance/restore behavior you have to watch. A stateless one stays close to a plain consumer group, which is exactly why a plain consumer is enough there.
See it in practice with Conduktor
Both a plain consumer and a Kafka Streams app show up on the cluster the same way: as a consumer group with lag and committed offsets. Conduktor Console lets you watch that consumer group's lag, inspect committed offsets, and, for a stateful Streams app, see the extra changelog and repartition topics it creates. That side-by-side view is the quickest way to confirm what your processing is actually doing on Kafka, whichever approach you chose.
Next steps
- What is Kafka Streams?: the library model, in depth
- Build your first Kafka Streams app: a runnable WordCount in Java
- Kafka Streams state stores: the local state you'd otherwise hand-roll