The future of Kafka Streams

Where Kafka Streams is going: remote object-storage state stores, the new rebalance protocols, share groups, and its place in real-time AI systems.

Read where the library is heading, grounded in real work.

Kafka Streams is a mature library, but the ground under it is moving. The themes that dominate recent Current, Kafka Summit, and Flink Forward talks aren't incremental config tweaks. They target the two things that have always made Streams hard to operate: local state that's expensive to restore, and rebalances that stop the world. At the same time, the broader Kafka project is growing new primitives (share groups, queues) that change when you'd reach for a Streams topology at all.

This page is a vendor-neutral read on those directions: what's shipping, what's still at the proposal stage, and what each one means for choosing Kafka Streams today. It isn't a product pitch; most of this is upstream Apache Kafka and an active open ecosystem.

What you'll learn:

  • Why remote, object-storage-backed state stores are the hottest theme, and what they fix
  • How the new rebalance protocols cut the stop-the-world pain
  • When a share-group consumer fits better than a Streams topology
  • Where Streams sits in real-time ML and agentic-AI architectures

Remote state stores: getting state off local disk

This is the dominant theme, and it attacks the deepest operational pain in Kafka Streams. Today, state lives in RocksDB on each instance's local disk, backed by a changelog topic. That design is fast in steady state but brittle in motion: an instance that starts with no local data must replay its entire changelog before it can process, which can mean minutes, or, for a multi-hundred-million-key store, far longer.
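To make the restore cost concrete, here is a minimal, dependency-free model of a changelog-backed store. It is not Kafka Streams code: `ChangelogBackedStore` is a hypothetical class that mimics the idea that every write lands in a local map and is also appended to a changelog, and that a fresh instance rebuilds state by replaying that changelog from offset zero. The replay loop touches every record, which is why restore time grows with changelog length. Compaction (modeled here, as an assumption, as keeping only the latest record per key, with `null` as a tombstone) shortens the log but still has to be replayed in full.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of a changelog-backed key-value store (not the Kafka Streams API).
class ChangelogBackedStore {
    // One changelog record: a key and its new value (null = tombstone/delete).
    record Change(String key, String value) {}

    private final Map<String, String> local = new HashMap<>(); // stands in for RocksDB on local disk
    private final List<Change> changelog;                      // stands in for the changelog topic

    ChangelogBackedStore(List<Change> sharedChangelog) {
        this.changelog = sharedChangelog;
    }

    void put(String key, String value) {
        if (value == null) local.remove(key); else local.put(key, value);
        changelog.add(new Change(key, value)); // every write is also logged
    }

    String get(String key) { return local.get(key); }

    // A fresh instance with no local data must replay the whole log before serving.
    // Returns the number of records replayed, a rough proxy for restore time.
    int restoreFromOffsetZero() {
        local.clear();
        for (Change c : changelog) {
            if (c.value() == null) local.remove(c.key()); else local.put(c.key(), c.value());
        }
        return changelog.size();
    }

    // Simplified compaction: keep only the latest record per key (tombstones included).
    static List<Change> compact(List<Change> log) {
        Map<String, Change> latest = new LinkedHashMap<>();
        for (Change c : log) {
            latest.remove(c.key()); // re-insert so ordering follows the last write
            latest.put(c.key(), c);
        }
        return new ArrayList<>(latest.values());
    }
}
```

With 1,000 updates spread over 10 keys, a cold instance replays all 1,000 records; after compaction it still replays one record per key, so restore cost tracks the number of keys, which is exactly what hurts at hundreds of millions of keys.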

Remote (or disaggregated) state stores invert this. State lives in shared, durable storage, typically object storage like S3, instead of on the instance. The payoff:

  • Restores become near-instant. A new or rebalanced instance reads from the shared store rather than replaying a changelog from offset zero.
  • Instances become effectively stateless. They hold a cache, not the system of record, so they can be killed, scaled, and rescheduled cheaply, which matters on Kubernetes and spot capacity.
  • State size stops bounding the instance. Local disk no longer caps how much state a node can be responsible for.

This is real, not vaporware, because the Kafka Streams state store interface is pluggable: you can supply your own store implementation. Several open and commercial efforts build remote stores on that seam, and the conference circuit has been thick with talks on object-storage-backed and snapshot-capable stores. The trade-off is honest: you swap local-disk latency and changelog-replay restores for network round-trips to object storage and a caching layer to hide them. Whether that wins depends on your restore pain versus your latency budget.
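The caching half of that trade-off can be sketched in a few lines. The model below is a hypothetical read-through LRU cache in front of a `RemoteStore` interface that stands in for object storage; it is not any particular project's implementation. It counts remote round-trips, so you can see how a hot working set keeps most reads local while a cold read still pays a network trip.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for an object-storage-backed store; each get() is a network round-trip.
interface RemoteStore {
    String get(String key);
}

// In-memory fake of the remote store that counts round-trips instead of doing I/O.
class CountingRemoteStore implements RemoteStore {
    private final Map<String, String> data = new HashMap<>();
    int roundTrips = 0;

    void seed(String key, String value) { data.put(key, value); }

    @Override
    public String get(String key) {
        roundTrips++;
        return data.get(key);
    }
}

// Read-through LRU cache: serve hits locally, fetch misses from the remote store.
class ReadThroughCache {
    private final RemoteStore remote;
    private final LinkedHashMap<String, String> lru;

    ReadThroughCache(RemoteStore remote, int capacity) {
        this.remote = remote;
        // accessOrder=true keeps entries in least-recently-used-first order,
        // and removeEldestEntry evicts once the cache exceeds its capacity.
        this.lru = new LinkedHashMap<>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                return size() > capacity;
            }
        };
    }

    String get(String key) {
        String cached = lru.get(key);
        if (cached != null) return cached; // cache hit: no network
        String value = remote.get(key);    // cache miss: one round-trip
        if (value != null) lru.put(key, value);
        return value;
    }
}
```

Reading the same ten keys five times through a ten-entry cache costs ten round-trips, not fifty; the first read of a key outside that hot set still pays one. How large your hot set is relative to the cache is what decides whether the latency budget survives.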

The convergence is the tell that this is a genuine industry direction rather than one vendor's bet: stream processors are independently moving state to object storage. The same architectural pressure (make stateful stream processors elastic by getting state off the box) is reshaping the wider space, Kafka Streams included.

What this means for choosing Streams today. Remote state stores are emerging, not yet the default in Apache Kafka. If slow restores and stop-the-world rebalances are your main objection to Kafka Streams right now, evaluate the ecosystem stores built on the pluggable interface, but build on the standard local-state model unless you've measured a restore problem you can't tune away with standby replicas and persistent volumes. Don't adopt a remote store speculatively; adopt it against a measured pain.
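For reference, the standard-model knobs mentioned above are ordinary Streams configs. The sketch below builds them as plain `java.util.Properties` using the documented config names (`num.standby.replicas`, `state.dir`, `max.warmup.replicas`, `acceptable.recovery.lag`) so it runs without the Kafka jar; in a real app you would pass the same properties to `new KafkaStreams(topology, props)`. The application id, broker address, and mount path are placeholders, and the values are illustrative, not recommendations.

```java
import java.util.Properties;

class RestoreTuning {
    // Illustrative values only; tune against your own measured restore times.
    static Properties restoreFriendlyProps() {
        Properties props = new Properties();
        props.put("application.id", "orders-enricher");  // placeholder app id
        props.put("bootstrap.servers", "localhost:9092"); // placeholder broker
        // Keep one warm copy of each stateful task on another instance,
        // so failover can promote a standby instead of replaying the changelog.
        props.put("num.standby.replicas", "1");
        // Point the state directory at a persistent volume so a restarted pod keeps its local state.
        props.put("state.dir", "/var/lib/kafka-streams"); // placeholder mount path
        // Allow up to two warm-up replicas to catch up in the background before tasks move.
        props.put("max.warmup.replicas", "2");
        // An instance within this many offsets of the changelog end counts as caught up.
        props.put("acceptable.recovery.lag", "10000");
        return props;
    }
}
```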

New rebalance protocols: less stop-the-world

The other historic pain is the rebalance: when group membership changes, work pauses while assignments and state move. Stateful apps feel this hardest, because a reassigned task may have to restore its store before it can resume.

Two upstream protocol changes are reshaping this:

  • KIP-848, next-gen consumer rebalance protocol. Status: GA in Apache Kafka 4.0. What it does: moves assignment logic broker-side and makes rebalancing incremental, cutting the synchronization barrier that caused long, lock-step pauses across the whole group.
  • KIP-1071, Streams-specific rebalance protocol. Status: GA in Kafka 4.2, but opt-in and limited to a subset of features (sticky assignor only, offline migration, no static membership or topology updates). What it does: extends the broker-driven model to Kafka Streams' task assignment, so Streams gets the same reduced-pause behavior tuned for its tasks, standbys, and warm-up.

The direction is consistent: less client-side coordination, fewer global stop-the-world pauses, smoother scaling. Combined with remote state (no restore to wait on) and existing tools like warm-up replicas and standbys, the long-term trajectory is rebalances that are an operational non-event rather than a recurring incident.
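The difference between a lock-step and an incremental rebalance shows up in how many partitions change hands. Below is a toy model, not the KIP-848 or KIP-1071 assignor: `eager` throws away the old assignment and deals partitions round-robin, while `incremental` lets each member keep what it owns up to its fair share and moves only the surplus (plus partitions of members that left). Counting partitions whose owner changed is a rough proxy for how much work pauses.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model of partition reassignment when group membership changes; not the real Kafka assignors.
class RebalanceModel {

    // Eager style: discard the old assignment and deal partitions round-robin.
    // Every partition is revoked (and paused) even if it lands back on the same member.
    static Map<Integer, String> eager(List<String> members, int partitions) {
        Map<Integer, String> out = new TreeMap<>();
        for (int p = 0; p < partitions; p++) out.put(p, members.get(p % members.size()));
        return out;
    }

    // Incremental, sticky style: keep current owners up to a fair quota, move only the rest.
    static Map<Integer, String> incremental(Map<Integer, String> current, List<String> members) {
        int base = current.size() / members.size();
        int extra = current.size() % members.size();

        Map<String, List<Integer>> owned = new HashMap<>();
        for (String m : members) owned.put(m, new ArrayList<>());
        List<Integer> pool = new ArrayList<>();
        for (Map.Entry<Integer, String> e : current.entrySet()) {
            List<Integer> mine = owned.get(e.getValue());
            if (mine == null) pool.add(e.getKey()); // owner is no longer in the group
            else mine.add(e.getKey());
        }

        // Members that already own the most get the larger quotas, which minimizes moves.
        List<String> byLoad = new ArrayList<>(members);
        byLoad.sort(Comparator.comparingInt((String m) -> owned.get(m).size()).reversed());
        Map<String, Integer> quota = new HashMap<>();
        for (int i = 0; i < byLoad.size(); i++) quota.put(byLoad.get(i), base + (i < extra ? 1 : 0));

        // Release each member's surplus into the pool, then fill members below quota.
        for (String m : members) {
            List<Integer> mine = owned.get(m);
            while (mine.size() > quota.get(m)) pool.add(mine.remove(mine.size() - 1));
        }
        for (String m : members) {
            List<Integer> mine = owned.get(m);
            while (mine.size() < quota.get(m)) mine.add(pool.remove(pool.size() - 1));
        }

        Map<Integer, String> out = new TreeMap<>();
        owned.forEach((m, ps) -> ps.forEach(p -> out.put(p, m)));
        return out;
    }

    // Number of partitions whose owner changed between two assignments.
    static int moved(Map<Integer, String> before, Map<Integer, String> after) {
        int n = 0;
        for (Map.Entry<Integer, String> e : before.entrySet())
            if (!e.getValue().equals(after.get(e.getKey()))) n++;
        return n;
    }
}
```

With three members owning 12 partitions and a fourth joining, the eager path changes the owner of 9 partitions (and pauses all 12), while the sticky path moves only 3, one from each existing member.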

Version-gate any plan on this: KIP-848 is the one you can rely on today; KIP-1071 reached GA in 4.2 but is still opt-in and feature-limited, and the default switch isn't expected before Kafka 5.0. One concrete caveat: migrating an existing classic group to the streams protocol is not recommended in 4.2.0 because of a broker-side bug in the offline-migration path (KAFKA-20254); fresh streams groups are unaffected. Pilot it, but expect most apps to still run the classic protocol, and check the Streams rebalance protocol docs for the version you actually run.
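Opting in is a client-side config change once the brokers support the new protocol. As a sketch, assuming Kafka 4.x clients: plain consumers select the KIP-848 protocol with `group.protocol=consumer`, and Streams apps select the KIP-1071 protocol with `group.protocol=streams` (the default for both is `classic`). The snippet uses string keys in `java.util.Properties` so it compiles without the Kafka jar; the group names are placeholders, and you should verify the exact keys and the broker-side prerequisites against the docs for your version.

```java
import java.util.Properties;

class ProtocolOptIn {
    // Plain KafkaConsumer: next-gen rebalance protocol (KIP-848).
    static Properties consumerProps() {
        Properties p = new Properties();
        p.put("bootstrap.servers", "localhost:9092"); // placeholder broker
        p.put("group.id", "payments-workers");        // placeholder group
        p.put("group.protocol", "consumer");          // default is "classic"
        return p;
    }

    // Kafka Streams app: Streams rebalance protocol (KIP-1071), opt-in.
    // Uses a fresh application.id (a new group) rather than migrating a classic group on 4.2.0.
    static Properties streamsProps() {
        Properties p = new Properties();
        p.put("bootstrap.servers", "localhost:9092");  // placeholder broker
        p.put("application.id", "orders-enricher-v2"); // placeholder, new group
        p.put("group.protocol", "streams");            // default is "classic"
        return p;
    }
}
```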

Queues and share groups: a different tool

Not every workload that touches Kafka should be a Streams topology. KIP-932 introduces share groups, a consumer model that lets many consumers cooperatively process records from the same partitions without the strict one-consumer-per-partition limit, with per-record acknowledgement. In effect, it brings classic queue semantics to Kafka.

That changes the decision tree. Reach for a share-group consumer when your problem is competing-consumer work distribution (a pool of workers draining a task queue) where you want parallelism beyond the partition count and per-message ack/redelivery, and you don't need stateful stream processing. Reach for a Streams topology when your problem is stateful, ordered, per-key stream processing: aggregations, joins, windowing, materialized views. Share groups are about throughput and work-spreading on a queue; Streams is about transforming and enriching keyed streams with managed state. They solve different problems, and having both in Kafka means fewer cases where you bend Streams into a job it was never the right shape for.
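The queue semantics can be modeled without a broker. The class below is a hypothetical in-memory simulation of share-group delivery, not the `KafkaShareConsumer` API: records are acquired one at a time rather than by partition, each delivery is acknowledged as accepted, released for redelivery, or rejected, and a record that keeps getting released is archived once it hits a delivery-count limit. The three outcomes mirror the acknowledgement types described in KIP-932; the limit passed in the usage below is an assumed value.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// In-memory model of share-group style delivery (hypothetical, not the Kafka client API).
class ShareQueueModel {
    enum Ack { ACCEPT, RELEASE, REJECT }

    private final Deque<Long> available = new ArrayDeque<>();     // offsets ready to hand out
    private final Map<Long, Integer> deliveries = new HashMap<>(); // offset -> delivery count
    private final int deliveryLimit;
    int completed = 0;
    int archived = 0;

    ShareQueueModel(long records, int deliveryLimit) {
        for (long off = 0; off < records; off++) available.addLast(off);
        this.deliveryLimit = deliveryLimit;
    }

    // Any worker may acquire the next record; nobody owns a partition.
    Long acquire() {
        Long off = available.pollFirst();
        if (off != null) deliveries.merge(off, 1, Integer::sum);
        return off; // null when nothing is available
    }

    void acknowledge(long offset, Ack ack) {
        switch (ack) {
            case ACCEPT -> completed++;
            case REJECT -> archived++; // poison record: never redelivered
            case RELEASE -> {
                if (deliveries.get(offset) >= deliveryLimit) archived++; // give up after the limit
                else available.addLast(offset);                          // redeliver to any worker
            }
        }
    }

    boolean drained() { return available.isEmpty(); }
}
```

Driving it with three workers over ten records from a single "partition", where one record is always released and one is rejected, every worker gets records, eight complete, and two end up archived. A per-key aggregation has no equivalent of "any worker may take this record", which is the practical line between the two tools.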

Share groups went GA in Kafka 4.2, after early access in 4.0 and a preview in 4.1 that was explicitly not production-ready. They're still evolving fast (new ack types, adaptive batching, lag metrics all landed in 4.2), so check the release notes for the maturity level in your version before building on them.

Kafka Streams in real-time ML and agentic AI

Real-time AI architectures lean heavily on streaming, and most reference demos today are Flink-centric: Flink SQL feeding features, vector stores, and agent loops. That's a real pattern, and it's fair to say Flink currently owns the spotlight here.

Streams' role is narrower and concrete: stateful enrichment and real-time feature computation inside JVM applications. If your service is already a Java application that owns its data and reacts to events, computing features or enriching events for a model with a Streams topology keeps that logic next to your business code, with no separate cluster and no extra deployment surface. The honest framing:

  • Streams is a good fit for real-time features and enrichment embedded in a JVM service that's already part of your event-driven system.
  • It is not the tool for heavy SQL analytics across many heterogeneous sources, or for a dedicated streaming-ML platform that a separate team operates; that's Flink's territory.
  • The "agent" layer (orchestration, model calls) typically lives elsewhere; Streams contributes the stateful, low-latency data plane underneath it.
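To make "real-time feature computation" concrete: the typical Streams shape is a keyed, windowed aggregation, for example clicks per user per minute fed to a model. The sketch below computes that feature in plain Java so it runs standalone; in Kafka Streams the same logic would roughly be `groupByKey()`, then `windowedBy(TimeWindows.ofSizeWithNoGrace(Duration.ofMinutes(1)))`, then `count()`, with the counts held in a windowed state store. The class name, keys, and window size are illustrative assumptions.

```java
import java.util.HashMap;
import java.util.Map;

// Plain-Java model of a tumbling-window count per key (e.g. clicks per user per minute).
class WindowedFeature {
    private final long windowMs;
    // key -> (window start -> count); stands in for a windowed state store.
    private final Map<String, Map<Long, Long>> store = new HashMap<>();

    WindowedFeature(long windowMs) { this.windowMs = windowMs; }

    // Called once per event; returns the updated count for the event's window.
    long onEvent(String userId, long timestampMs) {
        long windowStart = timestampMs - (timestampMs % windowMs); // align to the tumbling window
        return store.computeIfAbsent(userId, k -> new HashMap<>())
                    .merge(windowStart, 1L, Long::sum);
    }

    // Feature lookup at enrichment time: count in the window containing timestampMs.
    long countAt(String userId, long timestampMs) {
        long windowStart = timestampMs - (timestampMs % windowMs);
        return store.getOrDefault(userId, Map.of()).getOrDefault(windowStart, 0L);
    }
}
```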

The most concrete fit in a multi-agent system is a context store: a Streams app materializes the conversation into a queryable store that agents read over interactive queries, keeping their memory in the same data plane with no external database. That pattern has its own page: Kafka Streams for AI agents.
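A minimal model of that context-store idea: fold each conversation event into per-conversation state keyed by conversation id, then let an agent read the latest context by key. This is plain Java with hypothetical names (`ConversationEvent`, `ContextStore`); in a real Streams app the fold would be an aggregation into a key-value store and the read would go through interactive queries (`KafkaStreams#store`).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical event: one message in a multi-agent conversation.
record ConversationEvent(String conversationId, String speaker, String text) {}

// Model of a materialized context store: conversationId -> most recent messages.
class ContextStore {
    private final int maxMessages; // bound how much context an agent reads
    private final Map<String, List<String>> byConversation = new HashMap<>();

    ContextStore(int maxMessages) { this.maxMessages = maxMessages; }

    // The "aggregate" step: apply one event to the stored state for its key.
    void apply(ConversationEvent e) {
        List<String> msgs = byConversation.computeIfAbsent(e.conversationId(), k -> new ArrayList<>());
        msgs.add(e.speaker() + ": " + e.text());
        if (msgs.size() > maxMessages) msgs.remove(0); // keep only the most recent messages
    }

    // The "interactive query" step: an agent reads current context by key (read-only copy).
    List<String> contextFor(String conversationId) {
        return List.copyOf(byConversation.getOrDefault(conversationId, List.of()));
    }
}
```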

No hype here: Streams isn't becoming an AI framework. It remains a library for stateful stream processing, and that's exactly the slice of an AI architecture where it earns its place.

State stores as a system of record

A quieter but striking direction: using Kafka Streams' state stores not as a derived cache but as the authoritative store for a domain. Because a store is durable (via its changelog), partitioned, and queryable through interactive queries, some teams build systems where the store is the source of truth: workflow engines, calculation engines, and orchestrators whose entire state lives in Streams rather than in an external database.

It's a legitimate pattern with real production deployments behind it, and it's a useful lens on what the library actually is: a way to keep durable, queryable, event-sourced state co-located with the code that mutates it. It also raises the operational bar: your state store is now load-bearing in a way a disposable cache never is, which makes the restore-time and rebalance themes above even more important. It's not the common case, but it shows where the model can go when you take its durability guarantees seriously.
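The pattern boils down to a processor that reads the current state for a key, decides, and writes the new state back, so the store itself holds the authoritative answer. Below is a hypothetical order-workflow sketch in plain Java (`OrderWorkflow` and its states are invented for illustration); the `changelog` list stands in for the changelog topic that makes the store durable and rebuildable.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical workflow whose authoritative state lives in a keyed store.
class OrderWorkflow {
    enum State { CREATED, PAID, SHIPPED, CANCELLED }

    // Allowed transitions; anything else is rejected.
    private static final Map<State, Set<State>> ALLOWED = Map.of(
        State.CREATED, Set.of(State.PAID, State.CANCELLED),
        State.PAID, Set.of(State.SHIPPED, State.CANCELLED),
        State.SHIPPED, Set.of(),
        State.CANCELLED, Set.of());

    final Map<String, State> store = new HashMap<>();   // the system of record
    final List<String[]> changelog = new ArrayList<>(); // every accepted write, for durability

    // Read current state, validate the transition, write the new state.
    // Returns false (and writes nothing) if the command is rejected.
    boolean handle(String orderId, State next) {
        State current = store.get(orderId);
        boolean ok = (current == null) ? next == State.CREATED : ALLOWED.get(current).contains(next);
        if (!ok) return false;
        store.put(orderId, next);
        changelog.add(new String[] {orderId, next.name()});
        return true;
    }

    // Rebuild the store from the changelog, as a restoring instance would.
    static Map<String, State> replay(List<String[]> log) {
        Map<String, State> rebuilt = new HashMap<>();
        for (String[] rec : log) rebuilt.put(rec[0], State.valueOf(rec[1]));
        return rebuilt;
    }
}
```

Because rejected commands never reach the changelog, a replay reproduces exactly the validated state, which is why restore time becomes a correctness-adjacent concern once the store is the source of truth.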

What is the future of Kafka Streams?

The dominant directions are remote, object-storage-backed state stores that make restores near-instant and instances effectively stateless, plus new rebalance protocols (KIP-848, KIP-1071) that cut stop-the-world pauses. Alongside that, new Kafka primitives like share groups change when you'd reach for a Streams topology at all.

What is a remote or disaggregated state store?

It moves state off each instance's local disk into shared durable storage, typically object storage like S3, so a new or rebalanced instance reads from the shared store instead of replaying a changelog from offset zero. The trade-off is honest: you swap local-disk latency and changelog-replay restores for network round-trips and a caching layer to hide them.

Are remote state stores production ready?

They are emerging, not the default in Apache Kafka. The state store interface is pluggable, so several open and commercial efforts build on that seam, but you should build on the standard local-state model unless you've measured a restore problem you can't tune away with standby replicas and persistent volumes.

What is the KIP-1071 Streams rebalance protocol and which version has it?

It extends the broker-driven assignment model to Kafka Streams' tasks for reduced-pause rebalancing. It reached GA in Kafka 4.2 but is opt-in and limited to a subset of features (sticky assignor only, offline migration, no static membership or topology updates), and the default switch isn't expected before Kafka 5.0. Pilot it, but expect most apps to still run the classic protocol.

When should I use a share group (KIP-932) instead of a Streams topology?

Reach for a share-group consumer when your problem is competing-consumer work distribution (a pool of workers draining a task queue with per-record acknowledgement), where you want parallelism beyond the partition count and don't need stateful processing. Use a Streams topology for stateful, ordered, per-key processing: aggregations, joins, windowing, and materialized views.

See it in practice with Conduktor

Whatever direction state stores take, a Kafka Streams app remains a consumer group writing to changelog and repartition topics. Conduktor Console gives you a stable vantage point across these shifts: watch consumer group lag, inspect the internal topics, and confirm partition assignment regardless of which rebalance protocol or store implementation runs underneath.

Next steps