# Kafka Streams vs Flink vs ksqlDB

*Pick the right stream processor by deployment model, not by feature checklist.*

Most "Kafka Streams vs Flink" comparisons hand you a feature grid and call it a decision. That grid is mostly noise: for the large majority of stream-processing problems, *either* tool does the job. The honest differentiator is the one most articles skip: the **deployment model** and the **team** that has to operate it. Get those two right and the feature list rarely matters.

We sell neither Kafka Streams nor Flink, so there's no thumb on the scale here. This is the senior-architect version: the questions that actually decide it, and the ones that only look like they do.

**What you'll learn:**
- Why deployment model and team profile decide this, not features
- A side-by-side of Kafka Streams, Flink, and ksqlDB
- Straight answers to "is Kafka Streams dying?" and "is ksqlDB dead?"
- A decision guide: when each one is the right call

## The question that actually decides it

Kafka Streams is a **library**. You add `org.apache.kafka:kafka-streams` to your app, write a topology, and your service *becomes* the stream processor. No cluster to stand up, no job to submit. It [scales like any consumer group](https://www.conduktor.io/kafka/kafka-consumer-groups-and-consumer-offsets), and its state lives in local stores backed by [changelog topics](https://www.conduktor.io/kafka-streams/state-store). The application team owns all of it: memory, restarts, rebalances.
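Because the processor is just your service, its configuration is ordinary application config. Here is a minimal sketch of that idea, written with plain string keys so it compiles without the `kafka-streams` dependency. The config names (`application.id`, `bootstrap.servers`, `state.dir`) are real Kafka Streams settings; the class name, the values, and the store name are placeholders invented for illustration.

```java
import java.util.Properties;

// Sketch only: plain string keys so this compiles without the kafka-streams dependency.
final class StreamsAppConfig {

    // Builds the kind of Properties object a Streams app hands to new KafkaStreams(topology, props).
    static Properties build(String applicationId, String bootstrapServers) {
        Properties props = new Properties();
        // application.id doubles as the consumer group id and as the prefix for internal topics.
        props.put("application.id", applicationId);
        props.put("bootstrap.servers", bootstrapServers);
        // Local state lives on the instance's own disk (placeholder path).
        props.put("state.dir", "/var/lib/kafka-streams");
        return props;
    }

    // Changelog topics follow the <application.id>-<store name>-changelog naming convention.
    static String changelogTopic(String applicationId, String storeName) {
        return applicationId + "-" + storeName + "-changelog";
    }

    public static void main(String[] args) {
        Properties props = build("orders-enricher", "localhost:9092");
        System.out.println(props.getProperty("application.id"));
        System.out.println(changelogTopic("orders-enricher", "order-totals"));
    }
}
```

The point of the sketch is ownership: the group id, the state directory, and the changelog topics all hang off a name your application chose, which is why the application team is the one that operates them.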

Flink is a **runtime**. You stand up a cluster (a JobManager and TaskManagers), submit a job to it, and that cluster schedules the work, checkpoints state to durable storage, and recovers on failure. It reads from and writes to almost anything: Kafka, a database, files, object storage. Its parallelism isn't tied to your Kafka partition count, so it can scale well past it. And it usually comes with an owner: a dedicated streaming or platform team that runs the cluster as shared infrastructure.

That single split, *code you ship* versus *infrastructure you operate*, predicts almost everything else. It tells you who gets paged at 3am, how you scale, how you recover, and which team's roadmap the workload lives on. Start there. The feature comparison is a tiebreaker, not the decision.

> **"For 80 to 90 percent of stream-processing use cases, either Kafka Streams or Flink will work."** This is a recurring line from Kafka maintainers, and it holds up in practice. Stateless transforms, windowed aggregations, enrichment joins, event-time handling: both engines do all of it. When both can do the job, you choose on operations and team, not capability.

## Side by side

Read this as a map of *where each one is at home*, not a scoreboard. The last row is the one that matters.

| Dimension | Kafka Streams | Apache Flink | ksqlDB |
|---|---|---|---|
| **Deployment model** | Library embedded in your app | Separate cluster (JobManager + TaskManagers) | A server you run (built on Kafka Streams) |
| **What you deploy** | A JAR, a normal JVM service | A job submitted to a cluster | SQL statements to the ksqlDB server |
| **Languages** | Java (JVM) | Java, Python, SQL (the dedicated Scala API was removed in Flink 2.0) | SQL only |
| **Sources / sinks** | Kafka only, in and out | Any: Kafka, DBs, files, object stores | Kafka only |
| **Scaling ceiling** | ≤ partition count of the input | Far beyond partition count (task parallelism) | ≤ partition count (it *is* Kafka Streams) |
| **State recovery** | Replay the [changelog topic](https://www.conduktor.io/kafka-streams/state-store) | Restore from a [checkpoint](https://flink.apache.org/) in durable storage | Changelog replay (same as Streams) |
| **Operational owner** | The application team | A dedicated streaming / platform team | Whoever runs the ksqlDB server |
| **Batch + stream** | Streaming only | Unified batch and stream | Streaming only |
| **When it wins** | In-app, stateful logic owned by app devs | Heterogeneous sources, huge scale, a platform team, SQL/Python | Fast SQL transforms when you've committed to the ksqlDB server |

A few rows deserve a caveat the grid can't carry.

**Scaling ceiling.** Kafka Streams parallelism tops out at the partition count of the busiest sub-topology. Add instances past that and the extras sit idle (or host standby replicas, but only if you set `num.standby.replicas`; the default is 0). Flink decouples parallelism from partitions, so it scales past that ceiling. In practice most workloads never approach the limit, so this only decides the genuinely large jobs. See [scaling Kafka Streams](https://www.conduktor.io/kafka-streams/scaling).
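The ceiling is easy to reason about with a toy model. The sketch below assumes a single sub-topology and one stream thread per instance (both simplifications of ours, not something the comparison depends on), and the class and method names are hypothetical.

```java
// Toy model of the Kafka Streams scaling ceiling for a single sub-topology.
// Assumptions: one stream thread per instance, one sub-topology, no standby replicas.
final class ParallelismCeiling {

    // One task per input partition, so the partition count is the hard ceiling on active instances.
    static int activeInstances(int inputPartitions, int instances) {
        return Math.min(inputPartitions, instances);
    }

    // Instances beyond the ceiling receive no active task.
    static int idleInstances(int inputPartitions, int instances) {
        return Math.max(0, instances - inputPartitions);
    }

    public static void main(String[] args) {
        int partitions = 12;
        for (int instances : new int[] {4, 12, 16}) {
            System.out.printf("instances=%d active=%d idle=%d%n",
                    instances,
                    activeInstances(partitions, instances),
                    idleInstances(partitions, instances));
        }
    }
}
```

With 12 input partitions, going from 12 to 16 instances adds four idle instances and no throughput. Raising the ceiling means repartitioning the input, which is a data decision rather than a deployment one.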

**State recovery.** Both engines are converging here (more below), but today the models differ in *who* holds the state and how a restart behaves. Streams replays a changelog into a local store; Flink restores from a checkpoint it wrote to S3/HDFS/GCS. The Streams model puts restore time on the app team's plate, and restore time is what turns a [rolling restart into a slow one](https://www.conduktor.io/kafka-streams/state-restore).
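To make the contrast concrete, here is a deliberately tiny model of the two recovery styles. It uses neither the Kafka Streams nor the Flink API; the class, the record type, and the in-memory maps are stand-ins we made up to show the shape of each approach.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy contrast of the two recovery models; not the real Kafka Streams or Flink APIs.
final class StateRecoverySketch {

    // One changelog entry. A null value is a tombstone, meaning "delete this key".
    static final class ChangelogRecord {
        final String key;
        final Long value;

        ChangelogRecord(String key, Long value) {
            this.key = key;
            this.value = value;
        }
    }

    // Changelog replay: rebuild the store by applying every record in order.
    // Restore time grows with the amount of changelog that has to be read back.
    static Map<String, Long> replayChangelog(List<ChangelogRecord> changelog) {
        Map<String, Long> store = new HashMap<>();
        for (ChangelogRecord record : changelog) {
            if (record.value == null) {
                store.remove(record.key);
            } else {
                store.put(record.key, record.value);
            }
        }
        return store;
    }

    // Checkpoint restore: load a point-in-time snapshot as-is from durable storage.
    static Map<String, Long> restoreCheckpoint(Map<String, Long> snapshot) {
        return new HashMap<>(snapshot);
    }

    public static void main(String[] args) {
        List<ChangelogRecord> changelog = List.of(
                new ChangelogRecord("a", 1L),
                new ChangelogRecord("b", 2L),
                new ChangelogRecord("a", 3L),
                new ChangelogRecord("b", null));
        Map<String, Long> replayed = replayChangelog(changelog);
        Map<String, Long> restored = restoreCheckpoint(Map.of("a", 3L));
        System.out.println(replayed.equals(restored));
    }
}
```

Both paths end in the same state. What differs is the work done on restart and who waits for it: replay reads a log record by record inside your service, while a checkpoint restore copies a snapshot the cluster wrote earlier.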

**Event time.** A common myth is that Flink "has watermarks and Kafka Streams doesn't," implying weaker event-time support. Both handle event time fully. Kafka Streams deliberately uses *stream time* plus continuous refinement instead of watermark-driven triggers. That is a different design, not a missing feature.
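A small model helps show what "stream time plus continuous refinement" means. The sketch below is a simplified tumbling-window counter of our own, not the Kafka Streams implementation: it assumes non-negative millisecond timestamps, ignores caching and suppression, and uses hypothetical names throughout.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of stream-time semantics; simplified, not the Kafka Streams implementation.
final class StreamTimeWindowCounter {
    private final long windowSizeMs;
    private final long graceMs;
    private final Map<Long, Long> countsByWindowStart = new HashMap<>();
    // Stream time: the highest record timestamp observed so far. It never moves backwards.
    private long streamTime = Long.MIN_VALUE;

    StreamTimeWindowCounter(long windowSizeMs, long graceMs) {
        this.windowSizeMs = windowSizeMs;
        this.graceMs = graceMs;
    }

    // Returns the updated count for the record's window, or -1 if the record arrived too late.
    long process(long timestampMs) {
        streamTime = Math.max(streamTime, timestampMs);
        long windowStart = timestampMs - (timestampMs % windowSizeMs);
        long windowEnd = windowStart + windowSizeMs;
        // A window is closed once stream time has reached its end plus the grace period.
        if (windowEnd + graceMs <= streamTime) {
            return -1;
        }
        // Continuous refinement: every accepted record yields an updated result right away.
        return countsByWindowStart.merge(windowStart, 1L, Long::sum);
    }

    public static void main(String[] args) {
        StreamTimeWindowCounter counter = new StreamTimeWindowCounter(10, 5);
        for (long ts : new long[] {1, 12, 8, 16, 9}) {
            System.out.println("ts=" + ts + " result=" + counter.process(ts));
        }
    }
}
```

In this run the out-of-order record at timestamp 8 still lands in its window and refines the earlier count, because stream time (12) has not yet passed the window end plus grace (15). The record at timestamp 9 arrives after stream time reaches 16 and is dropped. No watermark is involved: the data's own timestamps advance the clock.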

## What about ksqlDB?

ksqlDB is SQL layered on top of Kafka Streams. You write `SELECT … EMIT CHANGES`, and the server compiles it to a Streams topology and runs it. So everything true of Kafka Streams' runtime (Kafka-only I/O, changelog-backed state, partition-bound scaling) is true of ksqlDB too. It trades the Java DSL for SQL and a server you operate.

Be straight about its status: ksqlDB is under **lower active investment** than it once was, frequently described as effectively maintenance-mode, while Confluent steers new SQL stream-processing work toward Flink SQL. That's a real signal if you're starting fresh today. It is *not* a reason to rip out a working ksqlDB deployment: it still runs, and the Kafka Streams engine underneath it is very much alive. New SQL-first project: look hard at Flink SQL before committing to ksqlDB. Existing ksqlDB that works: leave it alone until you have a concrete reason to move.

## "Is Kafka Streams dying? Is ksqlDB dead?"

The loaded questions, answered plainly.

**Kafka Streams is not dying.** It's a core part of Apache Kafka and the default for in-app, stateful processing: fraud scoring, usage metering, materialized views, [event-driven microservices](https://www.conduktor.io/kafka-streams). Active KIPs are still landing (a [Streams-specific rebalance protocol](https://www.conduktor.io/kafka-streams/rebalancing), native dead-letter-queue and error-handling support). The confusion comes from conflating Kafka Streams with ksqlDB. They are not the same project, and Flink's rise does not retire the library; they serve different shapes of problem.

**ksqlDB is not dead, but it is quiet.** See the section above: investment is lower, and the SQL momentum has moved to Flink. Treat it as stable-but-not-growing.

**Flink and Kafka Streams coexist.** This is the part the "X killed Y" headlines miss. Plenty of organizations run both: Flink for cross-source pipelines and big shared jobs operated by a platform team, Kafka Streams for stateful logic that belongs *inside* a service the app team owns. They're complements far more often than competitors.

> 🚫 *"Flink is the modern one, so we should migrate our Kafka Streams apps to it."*

Migrating a working Kafka Streams microservice to a Flink cluster you don't yet operate trades a problem you've solved for one you haven't. Newer is not the axis. Deployment model and team are.

## They're converging anyway

The sharpest practical difference today, *how state is stored and recovered*, is the one the industry is actively erasing. Both ecosystems are moving state off local disk and onto **object storage**: remote, disaggregated state stores so that restores become near-instant and instances become effectively stateless. On the Kafka Streams side the state-store interface is pluggable, and the remote-state work is being built on top of it by the ecosystem (vendors like Responsive ship object-storage-backed stores) rather than inside Apache Kafka itself. On the Flink side, Flink 2.0 ships disaggregated state through the ForSt backend, which keeps state on object storage but is still marked experimental.

If that convergence lands the way it's trending, "changelog replay vs checkpoint restore" stops being a deciding factor, which pushes even more weight onto deployment model and team. We track the direction in [the future of Kafka Streams](https://www.conduktor.io/kafka-streams/future).

## A decision guide

Choose the tool that fits your *deployment and team*, then confirm it covers your features (it almost always will).

**Choose Kafka Streams when:**
- The logic belongs inside an application your team already owns and deploys.
- Your sources and sinks are Kafka, and you're a JVM shop.
- You want stream processing to scale and deploy like any other microservice, with no extra cluster.
- The state is manageable, or you're prepared to own [restore time](https://www.conduktor.io/kafka-streams/state-restore) and [RocksDB memory](https://www.conduktor.io/kafka-streams/state-store).

**Choose Flink when:**
- You're pulling from non-Kafka sources (databases, files, object storage) or from multiple systems in one job.
- You need to scale past your Kafka partition count, or run large shared jobs as infrastructure.
- You have (or want) a dedicated streaming/platform team to operate a cluster.
- Your authors live in SQL or Python, not just the JVM.
- You want unified batch and streaming in one engine.

**Choose ksqlDB when:**
- You want SQL-defined streaming transforms *and* you've already committed to running the ksqlDB server.
- Caveat: for a new SQL-first project, weigh Flink SQL first, given where the investment is going.

When two of these fit, default to the one whose operational model matches your team. A library workload forced onto a cluster (or a cross-source pipeline crammed into a library) fights you forever.

## FAQ

**What is the difference between Kafka Streams and Apache Flink?**

Kafka Streams is a library you embed in your application, so your service becomes the stream processor with no cluster to operate, and it reads and writes Kafka only. Flink is a runtime: you stand up a cluster, submit a job, and it reads from any source, scales past your partition count, and is usually run by a dedicated platform team.

**Is Kafka Streams dying?**

No. It is a core part of Apache Kafka and the default for in-app stateful processing, with active KIPs still landing such as a Streams-specific rebalance protocol and native error handling. The "dying" confusion usually comes from conflating Kafka Streams with ksqlDB, which are different projects.

**When should I choose Kafka Streams over Flink?**

Choose Kafka Streams when the logic belongs inside an application your team already owns, your sources and sinks are Kafka, you're a JVM shop, and you want stream processing to deploy and scale like any other microservice. Choose Flink for non-Kafka sources, scale beyond your partition count, SQL/Python authors, or a dedicated platform team.

**Does Kafka Streams scale as large as Flink?**

Kafka Streams parallelism tops out at the partition count of the busiest sub-topology; extra instances past that sit idle unless you configure standby replicas (`num.standby.replicas` defaults to 0). Flink decouples parallelism from partitions and scales further, but most workloads never approach the limit, so this only decides genuinely large jobs.

**Is ksqlDB dead, and how does it compare?**

ksqlDB is SQL layered on top of Kafka Streams, so it inherits Kafka-only I/O and partition-bound scaling. It is not dead but is under lower active investment and effectively maintenance-mode, with new SQL work steered toward Flink SQL. Leave a working deployment alone, but weigh Flink SQL first for a new SQL-first project.

> **See it in practice with Conduktor**
> Whichever engine you pick, it runs on Kafka, and both Kafka Streams and Flink workloads show up there as consumer groups, lag, and topics. [Conduktor Console](https://docs.conduktor.io/guide/manage-kafka/kafka-resources/topics) lets you watch consumer group lag, inspect partition assignment, and see the internal and changelog topics a job creates, so you can tell whether a Streams app or a Flink job is keeping up, independent of which framework owns the processing. One change on the horizon: with KIP-1071 (early access in Kafka 4.1), Streams apps move to a dedicated `streams` group type with its own `kafka-streams-groups.sh` tooling, instead of appearing as plain consumer groups.

## Next steps

- [What is Kafka Streams?](https://www.conduktor.io/kafka-streams): the library model, in depth
- [Kafka Streams state stores](https://www.conduktor.io/kafka-streams/state-store): the recovery model that drives the comparison
- [The future of Kafka Streams](https://www.conduktor.io/kafka-streams/future): remote state and where both engines are headed
