# Kafka Streams stateless operations

*Learn the stateless DSL, and the one that costs you a network round-trip.*

Stateless operations are the workhorses of a Kafka Streams topology: transform a record, drop it, split the stream, recombine it. They keep no state, need no store, and survive a restart with nothing to recover. Most of your topology is probably stateless.

But "stateless" doesn't mean "free." One distinction inside this group, whether an operator changes the record's *key*, decides whether Kafka Streams quietly inserts a repartition topic and a full network round-trip downstream. It's the cheapest performance win most people miss.

**What you'll learn:**
- The full stateless DSL: map, filter, flatMap, branch, merge, and friends
- Which operators touch the key, and why that's the line that matters
- Why `map`/`selectKey` can trigger a repartition while `mapValues`/`filter` never do
- When stateless is enough, and when you've crossed into [stateful](https://www.conduktor.io/kafka-streams/aggregations) territory

## The stateless toolkit

Every operator here takes a `KStream` and returns a `KStream`. No store, no changelog, no co-partitioning rules. They group into a few families.

**Transform**

```java
// map: change key AND/OR value, returns a new KeyValue
KStream<String, Integer> lengths = lines.map((k, v) -> KeyValue.pair(v.getUserId(), v.length()));

// mapValues: change the value only, key untouched
KStream<String, Integer> sizes = lines.mapValues(v -> v.length());
```

**Filter**

```java
// filter: keep records matching the predicate
KStream<String, Order> paid = orders.filter((k, order) -> order.isPaid());

// filterNot: the inverse, keep records that DON'T match
KStream<String, Order> unpaid = orders.filterNot((k, order) -> order.isPaid());
```

**Expand**

```java
// flatMap: one record in, zero-or-many out, key and value free to change
KStream<String, String> words = lines.flatMap((k, line) ->
    Arrays.stream(line.split(" ")).map(w -> KeyValue.pair(w, w)).toList());

// flatMapValues: same, but key is preserved
KStream<String, String> tokens = lines.flatMapValues(line -> Arrays.asList(line.split(" ")));
```

**Re-key**

```java
// selectKey: set a new key from the existing key/value, value untouched
KStream<String, Click> byUser = clicks.selectKey((k, click) -> click.getUserId());
```

**Split and recombine**

```java
// split/branch: route records into named branches by predicate (modern API)
Map<String, KStream<String, Order>> branches = orders.split(Named.as("by-"))
    .branch((k, o) -> o.getAmount() > 1000, Branched.as("large"))
    .branch((k, o) -> o.getAmount() > 0,    Branched.as("small"))
    .defaultBranch(Branched.as("other"));
// branches.get("by-large"), branches.get("by-small"), ...

// merge: combine two streams of the same key/value type into one
KStream<String, Order> all = largeOrders.merge(smallOrders);
```

**Observe (no transformation)**

```java
// peek: side-effect per record (logging, metrics), record passes through unchanged
orders.peek((k, o) -> log.debug("order {}", k));

// foreach: terminal side-effect, consumes the stream, returns nothing
orders.foreach((k, o) -> log.info("processed {}", k));
```

`peek` returns the stream so you can keep chaining; `foreach` is terminal. Neither is a place for a database call or a REST request, a blocking call here stalls the whole stream thread (risking a [rebalance](https://www.conduktor.io/kafka-streams/rebalancing) if it misses its poll deadline), and any external side effect it performs sits outside Kafka's [exactly-once](https://www.conduktor.io/kafka-streams/exactly-once) guarantee.

## The line that matters: does it touch the key?

Here is the distinction that separates a cheap operator from one that can cost you an extra topic and a network hop.

Kafka Streams partitions data by key. A stateful operation downstream, an aggregation, a join, needs all records for a given key on the *same* partition. So if an operator **changes the key**, Kafka Streams can no longer trust that the partitioning still matches the key. It **marks the stream for repartition**.

Marking is not the same as doing it. The repartition, a write to an internal topic and a read back, is only **triggered** when a downstream operator actually depends on correct partitioning (a `groupByKey`, an aggregation, a join). If you re-key and then just `.to()` an output topic, no repartition topic is created; the mark goes unused.

| Operator | Changes the key? | Marks for repartition? |
|---|---|---|
| `mapValues`, `flatMapValues` | No | No |
| `filter`, `filterNot` | No | No |
| `peek`, `foreach` | No | No |
| `branch`/`split`, `merge` | No | No |
| `map`, `flatMap` | Yes (can) | Yes |
| `selectKey` | Yes | Yes |
| `groupBy` | Yes (re-keys) | Yes |

The pattern that bites: someone writes `map((k, v) -> KeyValue.pair(v.id(), transform(v)))` to change the value, and changes the key to `v.id()` only because `map` made them return a full `KeyValue`. The key change marks the stream. Add a `count()` downstream and Kafka Streams silently materializes a repartition topic, extra partitions on the cluster, an extra produce-and-consume per record, and an extra place for things to lag.

The fix is almost always free: if you only need to change the value, use `mapValues` (or `flatMapValues`). The key is preserved, nothing is marked, no repartition can be triggered.

> **Prefer `mapValues` over `map` whenever the key stays the same.** It's not a micro-optimization. A needless re-key followed by any stateful op adds a full internal topic and a network round-trip per record. Reach for `map`/`selectKey` only when you genuinely need a new key, for a join or an aggregation that groups by something other than the current key.

> 🚫 *"Stateless operators are free, so it doesn't matter which one I use."*

## Reading whether a repartition will happen

You don't have to guess. `topology.describe()` shows where a repartition happens, and the build log mentions auto-created repartition topics. When you re-key and feed a stateful operator, the description splits into two sub-topologies joined by an internal topic whose name ends in `-repartition` (a `KSTREAM-SINK` writes to it, a `KSTREAM-SOURCE` reads it back). That name is positional unless you set one, which is also why reordering operators is risky, see [evolving a topology](https://www.conduktor.io/kafka-streams/topology-evolution).

If a repartition is genuinely required, name it explicitly with `Repartitioned.as("...")` (or `Grouped.as("...")` on the grouping step) so the topic name is stable across topology edits. An unnamed repartition topic gets a positional name that shifts the moment you insert an operator above it.

## Stateless vs stateful: where the easy half ends

Everything on this page keeps no state. That's why these operators don't need a store, don't create a changelog, don't have co-partitioning requirements, and recover instantly, there's nothing to recover.

The moment you `count`, `reduce`, `aggregate`, join two streams, or window anything, you've crossed into stateful processing: a local [state store](https://www.conduktor.io/kafka-streams/state-store), a changelog topic, restore-on-restart, and the memory and rebalance behaviour that come with it. That's a different operational world, covered in [aggregations](https://www.conduktor.io/kafka-streams/aggregations).

The boundary between the two is exactly the re-key discussed above. `groupBy`/`groupByKey` is where a stateless stream becomes the input to a stateful operation, and where a marked-for-repartition stream finally pays for the repartition. Keeping the stateless part of your topology key-stable means that boundary is the *only* place you pay, not three operators earlier.

**What is the difference between map and mapValues in Kafka Streams?**

`map` can change both the key and the value and returns a full `KeyValue`; `mapValues` changes only the value and preserves the key. Because `map` may change the key, it marks the stream for repartition, while `mapValues` never does, so prefer `mapValues` whenever the key stays the same.

**Which Kafka Streams operations trigger a repartition?**

The operators that can change the key, `map`, `flatMap`, `selectKey`, and `groupBy`, mark the stream for repartition. Key-preserving operators like `mapValues`, `flatMapValues`, `filter`, `peek`, `branch`, and `merge` never do.

**How do I split a stream into multiple branches?**

Use `split()` with `branch()` to route records into named branches by predicate, optionally ending with `defaultBranch()`. It returns a `Map` of branch name to `KStream`, and you read each branch back by its name; `branch`/`split` does not change the key, so it triggers no repartition.

**Does selectKey cause a repartition topic to be created?**

`selectKey` marks the stream for repartition because it changes the key, but the repartition topic is only actually created when a downstream operator depends on correct partitioning, a `groupByKey`, an aggregation, or a join. Re-key and then just `.to()` an output topic and no repartition topic appears.

**What are the stateless operations in the Kafka Streams DSL?**

The stateless operators include `map`/`mapValues`, `filter`/`filterNot`, `flatMap`/`flatMapValues`, `selectKey`, `branch`/`split`, `merge`, and the observe-only `peek` and `foreach`. They keep no state, need no store or changelog, have no co-partitioning rules, and recover instantly because there is nothing to restore.

> **See it in practice with Conduktor**
> When a re-key triggers a repartition, a real internal topic appears on your cluster. [Conduktor Console](https://docs.conduktor.io/guide/manage-kafka/kafka-resources/topics) lets you spot those `-repartition` topics in the catalog, confirm whether your topology created ones you didn't intend, and watch their lag, a quick way to catch an accidental `map` that should have been `mapValues`.

## Next steps

- [Kafka Streams aggregations](https://www.conduktor.io/kafka-streams/aggregations), where stateless ends and stateful begins
- [KStream, KTable & GlobalKTable](https://www.conduktor.io/kafka-streams/kstream-ktable-globalktable), the streams and tables these operators act on
- [Build your first Kafka Streams app](https://www.conduktor.io/kafka-streams/getting-started), a runnable topology in Java
