Kafka Streams stateless operations

Kafka Streams stateless operations: map, mapValues, filter, flatMap, branch and merge, and the hidden repartition that map and selectKey trigger.

Learn the stateless DSL, and the one that costs you a network round-trip.

Stateless operations are the workhorses of a Kafka Streams topology: transform a record, drop it, split the stream, recombine it. They keep no state, need no store, and survive a restart with nothing to recover. Most of your topology is probably stateless.

But "stateless" doesn't mean "free." One distinction inside this group, whether an operator changes the record's key, decides whether Kafka Streams quietly inserts a repartition topic and a full network round-trip downstream. It's the cheapest performance win most people miss.

What you'll learn:

  • The full stateless DSL: map, filter, flatMap, branch, merge, and friends
  • Which operators touch the key, and why that's the line that matters
  • Why map/selectKey can trigger a repartition while mapValues/filter never do
  • When stateless is enough, and when you've crossed into stateful territory

The stateless toolkit

Every operator here takes a KStream and returns a KStream. No store, no changelog, no co-partitioning rules. They group into a few families.

Transform

// map: change key AND/OR value, returns a new KeyValue
KStream<String, Integer> lengths = lines.map((k, v) -> KeyValue.pair(v.getUserId(), v.length()));

// mapValues: change the value only, key untouched
KStream<String, Integer> sizes = lines.mapValues(v -> v.length());

Filter

// filter: keep records matching the predicate
KStream<String, Order> paid = orders.filter((k, order) -> order.isPaid());

// filterNot: the inverse, keep records that DON'T match
KStream<String, Order> unpaid = orders.filterNot((k, order) -> order.isPaid());

Expand

// flatMap: one record in, zero-or-many out, key and value free to change
KStream<String, String> words = lines.flatMap((k, line) ->
    Arrays.stream(line.split(" ")).map(w -> KeyValue.pair(w, w)).toList());

// flatMapValues: same, but key is preserved
KStream<String, String> tokens = lines.flatMapValues(line -> Arrays.asList(line.split(" ")));

Re-key

// selectKey: set a new key from the existing key/value, value untouched
KStream<String, Click> byUser = clicks.selectKey((k, click) -> click.getUserId());

Split and recombine

// split/branch: route records into named branches by predicate (modern API)
Map<String, KStream<String, Order>> branches = orders.split(Named.as("by-"))
    .branch((k, o) -> o.getAmount() > 1000, Branched.as("large"))
    .branch((k, o) -> o.getAmount() > 0,    Branched.as("small"))
    .defaultBranch(Branched.as("other"));
// branches.get(_4_), branches.get(_5_), ...

// merge: combine two streams of the same key/value type into one
KStream<String, Order> all = largeOrders.merge(smallOrders);

Observe (no transformation)

// peek: side-effect per record (logging, metrics), record passes through unchanged
orders.peek((k, o) -> log.debug("order {}", k));

// foreach: terminal side-effect, consumes the stream, returns nothing
orders.foreach((k, o) -> log.info("processed {}", k));

peek returns the stream so you can keep chaining; foreach is terminal. Neither is a place for a database call or a REST request, a blocking call here stalls the whole stream thread (risking a rebalance if it misses its poll deadline), and any external side effect it performs sits outside Kafka's exactly-once guarantee.

The line that matters: does it touch the key?

Here is the distinction that separates a cheap operator from one that can cost you an extra topic and a network hop.

Kafka Streams partitions data by key. A stateful operation downstream, an aggregation, a join, needs all records for a given key on the same partition. So if an operator changes the key, Kafka Streams can no longer trust that the partitioning still matches the key. It marks the stream for repartition.

Marking is not the same as doing it. The repartition, a write to an internal topic and a read back, is only triggered when a downstream operator actually depends on correct partitioning (a groupByKey, an aggregation, a join). If you re-key and then just .to() an output topic, no repartition topic is created; the mark goes unused.

OperatorChanges the key?Marks for repartition?
mapValues, flatMapValuesNoNo
filter, filterNotNoNo
peek, foreachNoNo
branch/split, mergeNoNo
map, flatMapYes (can)Yes
selectKeyYesYes
groupByYes (re-keys)Yes
The pattern that bites: someone writes map((k, v) -> KeyValue.pair(v.id(), transform(v))) to change the value, and changes the key to v.id() only because map made them return a full KeyValue. The key change marks the stream. Add a count() downstream and Kafka Streams silently materializes a repartition topic, extra partitions on the cluster, an extra produce-and-consume per record, and an extra place for things to lag.

The fix is almost always free: if you only need to change the value, use mapValues (or flatMapValues). The key is preserved, nothing is marked, no repartition can be triggered.

Prefer mapValues over map whenever the key stays the same. It's not a micro-optimization. A needless re-key followed by any stateful op adds a full internal topic and a network round-trip per record. Reach for map/selectKey only when you genuinely need a new key, for a join or an aggregation that groups by something other than the current key.

🚫 "Stateless operators are free, so it doesn't matter which one I use."

Reading whether a repartition will happen

You don't have to guess. topology.describe() shows where a repartition happens, and the build log mentions auto-created repartition topics. When you re-key and feed a stateful operator, the description splits into two sub-topologies joined by an internal topic whose name ends in -repartition (a KSTREAM-SINK writes to it, a KSTREAM-SOURCE reads it back). That name is positional unless you set one, which is also why reordering operators is risky, see evolving a topology.

If a repartition is genuinely required, name it explicitly with Repartitioned.as("...") (or Grouped.as("...") on the grouping step) so the topic name is stable across topology edits. An unnamed repartition topic gets a positional name that shifts the moment you insert an operator above it.

Stateless vs stateful: where the easy half ends

Everything on this page keeps no state. That's why these operators don't need a store, don't create a changelog, don't have co-partitioning requirements, and recover instantly, there's nothing to recover.

The moment you count, reduce, aggregate, join two streams, or window anything, you've crossed into stateful processing: a local state store, a changelog topic, restore-on-restart, and the memory and rebalance behaviour that come with it. That's a different operational world, covered in aggregations.

The boundary between the two is exactly the re-key discussed above. groupBy/groupByKey is where a stateless stream becomes the input to a stateful operation, and where a marked-for-repartition stream finally pays for the repartition. Keeping the stateless part of your topology key-stable means that boundary is the only place you pay, not three operators earlier.

What is the difference between map and mapValues in Kafka Streams?

map can change both the key and the value and returns a full KeyValue; mapValues changes only the value and preserves the key. Because map may change the key, it marks the stream for repartition, while mapValues never does, so prefer mapValues whenever the key stays the same.

Which Kafka Streams operations trigger a repartition?

The operators that can change the key, map, flatMap, selectKey, and groupBy, mark the stream for repartition. Key-preserving operators like mapValues, flatMapValues, filter, peek, branch, and merge never do.

How do I split a stream into multiple branches?

Use split() with branch() to route records into named branches by predicate, optionally ending with defaultBranch(). It returns a Map of branch name to KStream, and you read each branch back by its name; branch/split does not change the key, so it triggers no repartition.

Does selectKey cause a repartition topic to be created?

selectKey marks the stream for repartition because it changes the key, but the repartition topic is only actually created when a downstream operator depends on correct partitioning, a groupByKey, an aggregation, or a join. Re-key and then just .to() an output topic and no repartition topic appears.

What are the stateless operations in the Kafka Streams DSL?

The stateless operators include map/mapValues, filter/filterNot, flatMap/flatMapValues, selectKey, branch/split, merge, and the observe-only peek and foreach. They keep no state, need no store or changelog, have no co-partitioning rules, and recover instantly because there is nothing to restore.

See it in practice with Conduktor

When a re-key triggers a repartition, a real internal topic appears on your cluster. Conduktor Console lets you spot those -repartition topics in the catalog, confirm whether your topology created ones you didn't intend, and watch their lag, a quick way to catch an accidental map that should have been mapValues.

Next steps