# Kafka Streams vs the consumer API

*Decide whether you need Kafka Streams or just a consumer.*

Almost everyone starting with stream processing hits this fork: *do I reach for Kafka Streams, or is a plain [consumer](https://www.conduktor.io/kafka/kafka-consumers) enough?* The framing that trips people up is treating them as rivals. They aren't. Kafka Streams is built directly on top of the consumer and producer clients: it's not a different way to talk to Kafka, it's a higher-level processing layer sitting on the same transport.

So the real question isn't "which transport is better." It's "how much of the processing machinery do I want to write myself?" This guide answers that honestly, with the same logic side by side so you can see exactly what the library buys you.

**What you'll learn:**
- Why Kafka Streams is built *on* the consumer, not instead of it
- When a plain consumer is the right, simpler choice
- When Kafka Streams earns its operational cost
- The same job written both ways, line for line

## Kafka Streams sits on top of the consumer

Open up a running Kafka Streams app and look at what it's doing on the wire: it polls partitions with a [consumer](https://www.conduktor.io/kafka/kafka-consumers), it commits offsets, it produces results with a producer, and its instances form a [consumer group](https://www.conduktor.io/kafka/kafka-consumer-groups-and-consumer-offsets) that splits partitions between them. That's the consumer API: Streams didn't replace it, it wrapped it.

What the library adds is everything *above* the poll loop: a topology you describe declaratively, local state that survives restarts, event-time windowing, and atomic read-process-write. You get those by configuring a topology instead of hand-coding them against `poll()`.

> 🚫 *"We'll move off Kafka Streams to plain consumers to simplify, it's just a consumer anyway."*

Dropping Streams for a raw consumer doesn't remove complexity from a stateful job: it relocates it into your code. The state store, the windowing, the exactly-once cycle, the rebalance handling: they don't disappear, you now own and maintain them by hand. Simplify by removing Streams only when the job is genuinely stateless.

> **This is why "Kafka Streams vs consumer" is the wrong axis.** You are not choosing between two transports: both read and write Kafka the same way. You're choosing whether to write the processing layer yourself or let the library write it for you. Frame the decision as *plain consumer plus your own code* versus *Kafka Streams*, and the trade-off gets clear fast.

## When the plain consumer is enough

Reach for a bare [consumer](https://www.conduktor.io/kafka/kafka-consumers) (and producer, if you emit results) when your work is simple and stateless. Specifically:

- **Per-record consume-then-act, no memory between records.** Read a message, do one thing with it, move on. No counts, no running totals, nothing remembered across records.
- **The "act" is not stream processing.** Write the record to a database, call an HTTP service, push to a queue, send an email. The interesting work happens *outside* Kafka, and you just need records delivered to your code.
- **No joins, aggregations, or windows.** The moment you need to combine two streams or group by time, you're rebuilding the library.
- **You want full control of the poll loop.** Custom commit timing, manual partition assignment, your own threading model, careful backpressure: the consumer hands you the raw controls; Streams hides them on purpose.
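
To make the last bullet concrete, here is a minimal sketch of two controls that Streams deliberately hides. The first is picking a partition yourself with `assign()` instead of joining a rebalancing group. The second is committing an explicit offset after each record. The broker address, group id, partition 0, and the `handle` method are hypothetical placeholders, not part of any real setup.

```java
import java.time.Duration;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.StringDeserializer;

public class ManualControlConsumer {
    static Properties props() {
        Properties p = new Properties();
        p.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed broker address
        p.put(ConsumerConfig.GROUP_ID_CONFIG, "orders-manual");            // hypothetical; only used to store offsets
        p.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false");          // we decide when to commit
        p.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        p.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        return p;
    }

    public static void main(String[] args) {
        TopicPartition partition = new TopicPartition("orders", 0);
        try (var consumer = new KafkaConsumer<String, String>(props())) {
            // Manual assignment: no group rebalancing, this instance reads exactly this partition.
            consumer.assign(List.of(partition));
            while (true) {
                for (var record : consumer.poll(Duration.ofMillis(100))) {
                    handle(record.value());
                    // Commit this record's position; the committed offset is the next one to read.
                    consumer.commitSync(Map.of(partition, new OffsetAndMetadata(record.offset() + 1)));
                }
            }
        }
    }

    // Hypothetical side effect standing in for "do one thing with it".
    static void handle(String order) {
        System.out.println(order);
    }
}
```

Committing after every record is slow. It is shown only to illustrate that, on the raw client, commit timing and partition ownership are entirely yours.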

A consumer that reads an `orders` topic and inserts each order into Postgres is a textbook plain-consumer job. Adding Kafka Streams there buys you nothing and costs you the Streams runtime and its [rebalance](https://www.conduktor.io/kafka-streams/rebalancing) behavior. A stateless topology creates no RocksDB store and no internal topics, but the moment anyone adds a stateful step, both appear and you operate them.
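
Here is a minimal sketch of that orders-to-Postgres job. It assumes a hypothetical `orders` table with an `order_id` primary key and a JSONB `payload` column, records keyed by order ID, and a Postgres JDBC driver on the classpath. The caller supplies `consumerProps` (with `enable.auto.commit=false`) and the JDBC URL.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class OrdersToPostgres {
    // Hypothetical table: orders(order_id TEXT PRIMARY KEY, payload JSONB).
    // ON CONFLICT makes a redelivered record a no-op instead of a duplicate row.
    static final String INSERT_SQL =
        "INSERT INTO orders (order_id, payload) VALUES (?, ?::jsonb) ON CONFLICT (order_id) DO NOTHING";

    // consumerProps is assumed to set enable.auto.commit=false.
    static void run(Properties consumerProps, String jdbcUrl) throws Exception {
        try (var consumer = new KafkaConsumer<String, String>(consumerProps);
             Connection db = DriverManager.getConnection(jdbcUrl);
             PreparedStatement insert = db.prepareStatement(INSERT_SQL)) {
            consumer.subscribe(List.of("orders"));
            while (true) {
                var records = consumer.poll(Duration.ofMillis(100));
                for (var record : records) {
                    insert.setString(1, record.key());   // assumes the key is the order ID
                    insert.setString(2, record.value());
                    insert.addBatch();
                }
                if (!records.isEmpty()) {
                    insert.executeBatch();               // write the batch to Postgres first...
                }
                consumer.commitSync();                   // ...then commit, so a crash replays instead of dropping
            }
        }
    }
}
```

Committing after the database write gives at-least-once delivery. The idempotent insert is one way to make the replays after a crash harmless, with no Streams runtime involved.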

## When Kafka Streams pays off

Reach for [Kafka Streams](https://www.conduktor.io/kafka-streams) when you need the things you'd otherwise reinvent (usually badly) on top of a consumer:

- **Stateful operations.** Counts, sums, rollups, and joins backed by a local [state store](https://www.conduktor.io/kafka-streams/state-store) that survives a crash. Hand-rolling fault-tolerant local state on a raw consumer is a project in itself.
- **Joins across streams.** Enriching an event stream against a reference table, or correlating two streams within a time window. (More on doing this without Streams below. Short version: you don't want to.)
- **Event-time windowing.** Grouping by when events *happened*, with late-data handling, instead of when they arrived.
- **Exactly-once across read-process-write.** One config flag, `processing.guarantee=exactly_once_v2`, makes the consume → transform → produce → commit cycle atomic ([exactly-once](https://www.conduktor.io/kafka-streams/exactly-once)). Replicating this by hand against the transactional producer API is error-prone.
- **Scaling and fault tolerance you don't write.** Partitions, tasks, and state migrate between instances on their own. With a raw consumer you'd write the rebalance and state-handoff logic yourself.
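
To show what the exactly-once bullet saves you, here is a hedged sketch of a PAID-order filter made transactional by hand on the raw clients. Its error handling is modeled on the pattern in the `KafkaProducer` Javadoc. It assumes `consumerProps` sets `enable.auto.commit=false` and `isolation.level=read_committed`, and that `producerProps` sets a `transactional.id`. Those values and the topic names are placeholders.

```java
import java.time.Duration;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.KafkaException;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.errors.AuthorizationException;
import org.apache.kafka.common.errors.OutOfOrderSequenceException;
import org.apache.kafka.common.errors.ProducerFencedException;

public class DiyExactlyOnce {
    // Offsets to commit: for each partition, one past the last record processed.
    static Map<TopicPartition, OffsetAndMetadata> nextOffsets(Iterable<ConsumerRecord<String, String>> records) {
        Map<TopicPartition, OffsetAndMetadata> offsets = new HashMap<>();
        for (ConsumerRecord<String, String> r : records) {
            offsets.put(new TopicPartition(r.topic(), r.partition()), new OffsetAndMetadata(r.offset() + 1));
        }
        return offsets;
    }

    static void run(Properties consumerProps, Properties producerProps) {
        try (var consumer = new KafkaConsumer<String, String>(consumerProps);
             var producer = new KafkaProducer<String, String>(producerProps)) {
            producer.initTransactions();
            consumer.subscribe(List.of("orders"));
            while (true) {
                var records = consumer.poll(Duration.ofMillis(100));
                if (records.isEmpty()) continue;
                producer.beginTransaction();
                try {
                    for (ConsumerRecord<String, String> r : records) {
                        if (r.value() != null && r.value().contains("\"status\":\"PAID\"")) {
                            producer.send(new ProducerRecord<>("paid-orders", r.key(), r.value()));
                        }
                    }
                    // Input offsets commit in the same transaction as the output records.
                    producer.sendOffsetsToTransaction(nextOffsets(records), consumer.groupMetadata());
                    producer.commitTransaction();
                } catch (ProducerFencedException | OutOfOrderSequenceException | AuthorizationException e) {
                    throw e; // fatal: this producer cannot continue; try-with-resources closes it
                } catch (KafkaException e) {
                    producer.abortTransaction();
                    // Rewind each partition so the aborted batch is read and processed again.
                    for (TopicPartition tp : records.partitions()) {
                        consumer.seek(tp, records.records(tp).get(0).offset());
                    }
                }
            }
        }
    }
}
```

With `processing.guarantee=exactly_once_v2`, Kafka Streams runs this begin/send/send-offsets/commit cycle for you, together with the fencing and rewind handling.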

The pattern: if your processing is stateful, time-aware, or needs strong delivery guarantees, the consumer-plus-DIY route slowly converges on a worse version of Kafka Streams. Use the real one.

## The same job, both ways

Here's a stateless filter (pass through only paid orders) written as a raw consumer and as a Kafka Streams topology. Both run on Apache Kafka 3.9.0.

Plain consumer and producer:

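```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Future;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;

// consumerProps must set enable.auto.commit=false; the enclosing method must declare throws Exception.
try (var consumer = new KafkaConsumer<String, String>(consumerProps);
     var producer = new KafkaProducer<String, String>(producerProps)) {
    consumer.subscribe(List.of("orders"));
    while (true) {
        var records = consumer.poll(Duration.ofMillis(100));
        List<Future<RecordMetadata>> sends = new ArrayList<>();
        for (var record : records) {
            String order = record.value();
            if (order != null && order.contains("\"status\":\"PAID\"")) {  // skip tombstones
                sends.add(producer.send(new ProducerRecord<>("paid-orders", record.key(), order)));
            }
        }
        producer.flush();                  // wait for every send in this batch to finish
        for (var send : sends) send.get(); // flush() waits but doesn't rethrow send errors; get() does
        consumer.commitSync();             // commit only once the output is written
    }
}
```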

Note the `producer.flush()` before the commit. `send()` is asynchronous, so committing offsets first means a crash can silently drop PAID records. That ordering bug is exactly the kind of detail you own on the raw clients.

Kafka Streams:

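```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Produced;

StreamsBuilder builder = new StreamsBuilder();
builder.stream("orders", Consumed.with(Serdes.String(), Serdes.String()))
    .filter((key, order) -> order != null && order.contains("\"status\":\"PAID\""))  // skip tombstones
    .to("paid-orders", Produced.with(Serdes.String(), Serdes.String()));

// props must set at least application.id and bootstrap.servers.
KafkaStreams streams = new KafkaStreams(builder.build(), props);
Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
streams.start();
```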

For *this* job (one stateless filter) they're roughly even, and the consumer is arguably more transparent. That's the point: a one-line stateless transform is not a reason to adopt a framework.

Now imagine the next requirement lands: *count paid orders per customer per hour, exactly once, and survive a restart without losing the counts.* On the consumer you start writing a state store, a flush-and-commit protocol, windowing, transactional sends, and rebalance-aware state handoff. On Streams it's a `groupByKey().windowedBy(...).count()` and one `processing.guarantee` flag. **That** is where the library stops being optional.
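
Here is a minimal sketch of that requirement in Kafka Streams. It assumes `orders` records are keyed by customer ID and that "per hour" means tumbling one-hour windows on record timestamps. The application id, broker address, store name, and output topic are hypothetical. Naming the store with `Materialized.as` makes its changelog topic name predictable.

```java
import java.time.Duration;
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.common.utils.Bytes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.KeyValue;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.Topology;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Grouped;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.kstream.Produced;
import org.apache.kafka.streams.kstream.TimeWindows;
import org.apache.kafka.streams.state.WindowStore;

public class PaidOrdersPerCustomerPerHour {
    static Properties props() {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "paid-orders-hourly"); // hypothetical
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");   // assumed broker address
        // The one flag: consume -> update state -> produce -> commit becomes atomic.
        props.put(StreamsConfig.PROCESSING_GUARANTEE_CONFIG, StreamsConfig.EXACTLY_ONCE_V2);
        return props;
    }

    static Topology buildTopology() {
        StreamsBuilder builder = new StreamsBuilder();
        builder.stream("orders", Consumed.with(Serdes.String(), Serdes.String()))
            .filter((customerId, order) -> order != null && order.contains("\"status\":\"PAID\""))
            .groupByKey(Grouped.with(Serdes.String(), Serdes.String()))      // assumes key = customer ID
            .windowedBy(TimeWindows.ofSizeWithNoGrace(Duration.ofHours(1))) // tumbling 1-hour windows
            // Window store in RocksDB, backed up to <application.id>-paid-counts-changelog.
            .count(Materialized.<String, Long, WindowStore<Bytes, byte[]>>as("paid-counts"))
            .toStream()
            .map((window, count) -> KeyValue.pair(window.key() + "@" + window.window().startTime(), count))
            .to("paid-orders-per-customer-hourly", Produced.with(Serdes.String(), Serdes.Long()));
        return builder.build();
    }

    public static void main(String[] args) {
        KafkaStreams streams = new KafkaStreams(buildTopology(), props());
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
        streams.start();
    }
}
```

`ofSizeWithNoGrace` drops records that arrive after their window closes. `TimeWindows.ofSizeAndGrace` accepts them for a bounded extra period, which is the late-data handling mentioned earlier.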

## Side by side

| Dimension | Plain consumer (+ producer) | Kafka Streams |
|---|---|---|
| **What it is** | Low-level client: poll, process, commit | Processing library built *on* the consumer/producer |
| **Transport to Kafka** | Consumer group, offsets | The same consumer group and offsets, underneath |
| **State / aggregations** | You build and persist it yourself | Built-in [state stores](https://www.conduktor.io/kafka-streams/state-store) with changelog backup |
| **Joins & windows** | Hand-rolled (hard, error-prone) | First-class [joins](https://www.conduktor.io/kafka-streams/joins) and [windowing](https://www.conduktor.io/kafka-streams/windowing) |
| **Exactly-once read-process-write** | DIY with the transactional producer | One [config flag](https://www.conduktor.io/kafka-streams/exactly-once) |
| **Scaling / failover** | You write rebalance & state handoff | Tasks and state migrate automatically |
| **Control of the poll loop** | Full: commit timing, threading, assignment | Abstracted away on purpose |
| **Operational surface** | Just a consumer group | + RocksDB, internal topics, [restore](https://www.conduktor.io/kafka-streams/state-restore) once stateful |
| **Best for** | Stateless per-record work; act *outside* Kafka | Stateful, time-aware, exactly-once processing |

Read the last two rows together: Streams adds power *and* operational surface. The moment your topology is stateful you take on RocksDB memory, auto-created changelog and repartition topics (named `<application.id>-<store-name>-changelog` and `-repartition`), and rebalance/restore behavior. Pay that price for state, joins, and exactly-once, not for a filter, which creates none of it.
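
One rough way to see that extra surface for yourself is to list the topics that match those naming patterns. This sketch uses the Java `Admin` client. The broker address and application id are assumptions. The suffix filter is a heuristic based on the naming above, not an official API for discovering a Streams app's internal topics.

```java
import java.util.Collection;
import java.util.List;
import java.util.Properties;
import java.util.stream.Collectors;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;

public class StreamsInternalTopics {
    // Heuristic: keep topics named <application.id>-...-changelog or <application.id>-...-repartition.
    static List<String> internalTopics(Collection<String> topicNames, String applicationId) {
        return topicNames.stream()
            .filter(t -> t.startsWith(applicationId + "-"))
            .filter(t -> t.endsWith("-changelog") || t.endsWith("-repartition"))
            .sorted()
            .collect(Collectors.toList());
    }

    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed broker address
        try (Admin admin = Admin.create(props)) {
            var names = admin.listTopics().names().get();
            // "paid-orders-hourly" is a hypothetical application.id.
            internalTopics(names, "paid-orders-hourly").forEach(System.out::println);
        }
    }
}
```

For a stateless filter this prints nothing. Once a stateful step is added, the changelog (and any repartition) topics show up here.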

## FAQ

**Is Kafka Streams just a consumer?**

Under the hood, mostly yes: a Kafka Streams app uses the consumer and producer clients and forms a consumer group like any other. What it adds sits above the poll loop: a declarative topology, fault-tolerant local state, windowing, and exactly-once. So it's the consumer plus a processing layer, not a separate transport.

**When do I need Kafka Streams over a consumer?**

When your processing is stateful (counts, aggregations, joins), time-aware (windowing on event time), or needs exactly-once across the read-process-write cycle. Those are the parts you'd otherwise build by hand on top of a consumer. For simple, stateless, per-record work, a plain consumer is the lighter choice.

**Does Kafka Streams replace the consumer API?**

No. It's built on the consumer and producer APIs and uses them internally; it doesn't deprecate or replace them. Plenty of production systems keep using plain consumers for stateless work and reach for Kafka Streams only where state, joins, or exactly-once are needed.

**Can I do joins with a plain consumer?**

Technically yes, but you'd have to maintain the lookup data in your own state store, keep it co-partitioned with the stream, handle restarts and rebalances, and deal with timing, which is exactly what Kafka Streams joins do for you. For anything beyond a trivial in-memory lookup, a plain-consumer join is a maintenance trap.
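
Here is a minimal sketch of the Streams side of that answer: a KStream-KTable join that enriches each order with the customer's current record. It assumes hypothetical `orders` and `customers` topics, both keyed by customer ID and with the same partition count (the co-partitioning requirement mentioned above), and string-encoded JSON values. The output topic name is also a placeholder.

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.Topology;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KTable;
import org.apache.kafka.streams.kstream.Produced;

public class EnrichOrders {
    static Topology buildTopology() {
        StreamsBuilder builder = new StreamsBuilder();
        // Event stream: one record per order, keyed by customer ID (assumed).
        KStream<String, String> orders =
            builder.stream("orders", Consumed.with(Serdes.String(), Serdes.String()));
        // Reference table: latest customer record per customer ID, held in a local state store.
        KTable<String, String> customers =
            builder.table("customers", Consumed.with(Serdes.String(), Serdes.String()));

        orders
            // Inner join: each order is paired with the customer's current value at processing time.
            .join(customers, (order, customer) -> "{\"order\":" + order + ",\"customer\":" + customer + "}")
            .to("orders-enriched", Produced.with(Serdes.String(), Serdes.String()));
        return builder.build();
    }
}
```

Because `join` is an inner join, orders with no matching customer are dropped. `leftJoin` would pass them through with a `null` customer instead. Restoring the table after a restart and reassigning it on rebalance are handled by the library.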

**Is a Kafka Streams app harder to operate than a consumer?**

Yes, and that's the trade-off. A consumer is just a consumer group. A stateful Streams app adds a RocksDB state store, auto-created changelog and repartition topics, and rebalance/restore behavior you have to watch. A stateless one stays close to a plain consumer group, which is exactly why a plain consumer is enough there.

> **See it in practice with Conduktor**
> Both a plain consumer and a Kafka Streams app show up on the cluster the same way: as a consumer group with lag and committed offsets. [Conduktor Console](https://docs.conduktor.io/guide/manage-kafka/kafka-resources/topics) lets you watch that consumer group's lag, inspect committed offsets, and, for a stateful Streams app, see the extra changelog and repartition topics it creates. That side-by-side view is the quickest way to confirm what your processing is actually doing on Kafka, whichever approach you chose.

## Next steps

- [What is Kafka Streams?](https://www.conduktor.io/kafka-streams): the library model, in depth
- [Build your first Kafka Streams app](https://www.conduktor.io/kafka-streams/getting-started): a runnable WordCount in Java
- [Kafka Streams state stores](https://www.conduktor.io/kafka-streams/state-store): the local state you'd otherwise hand-roll
