# Kafka Streams vs ksqlDB

*Understand the real difference: it's not what "vs" suggests.*

The framing "Kafka Streams vs ksqlDB" misleads before you read a single comparison, because at the engine level the two aren't rivals. ksqlDB is built **on top of** Kafka Streams: every ksqlDB query compiles down to a Kafka Streams topology and runs on the Streams runtime. So the honest question isn't "which engine is faster or more capable?" It's "do I want to write SQL against a server someone operates, or embed a Java library in an app my team owns?"

We sell neither, so there's no angle here. This is the senior-architect version: what the two things actually are, the honest status of ksqlDB (what the "is ksqlDB dead?" searches are really asking about), and when each is still the right call.

**What you'll learn:**
- Why ksqlDB *is* Kafka Streams, with SQL and a server bolted on
- The real axis: SQL-on-a-server vs library-in-your-app
- The honest status of ksqlDB, and where Confluent is steering SQL work
- When to pick ksqlDB, when Kafka Streams, when Flink SQL

## ksqlDB is built on Kafka Streams

Start with the fact that dissolves the "vs". You write a statement like this against a ksqlDB server:

```sql
CREATE TABLE orders_per_customer AS
  SELECT customer_id, COUNT(*) AS order_count
  FROM orders
  GROUP BY customer_id
  EMIT CHANGES;
```

ksqlDB parses that, plans it, and **compiles it into a Kafka Streams topology**. That topology has the same `groupBy(...).count()` shape you'd write by hand in the [Kafka Streams DSL](https://www.conduktor.io/kafka-streams), or `groupByKey().count()` if `orders` is already keyed by `customer_id`. ksqlDB then runs it on the Streams runtime embedded in the ksqlDB server. The result table is materialized in a [state store](https://www.conduktor.io/kafka-streams/state-store) backed by a compacted changelog topic named `<application.id>-<store-name>-changelog`, exactly as a hand-written Streams app would do it.
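To make that compiled shape concrete without pulling in the Kafka Streams dependency, here is a stdlib-only Java model of what the topology does per record. It is a sketch, not the Kafka Streams API. The class and method names are hypothetical, a `HashMap` stands in for the RocksDB-backed state store, and a `List` stands in for the changelog topic. It shows the two properties the rest of this page leans on: every store update is also written to the changelog, and replaying that changelog (compacted or not) rebuilds the same table.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Stdlib-only model of the compiled GROUP BY ... COUNT(*) topology.
// NOT the Kafka Streams API: a HashMap stands in for the state store and
// a List of key/value entries stands in for the changelog topic.
final class CountWithChangelogModel {
    private final Map<String, Long> store = new HashMap<>();
    private final List<Map.Entry<String, Long>> changelog = new ArrayList<>();

    // Kafka Streams' naming convention for a store's changelog topic.
    static String changelogTopic(String applicationId, String storeName) {
        return applicationId + "-" + storeName + "-changelog";
    }

    // Handle one order: bump the customer's count in the store and write
    // the new value to the changelog, as a logged state store does.
    long process(String customerId) {
        long updated = store.getOrDefault(customerId, 0L) + 1;
        store.put(customerId, updated);
        changelog.add(Map.entry(customerId, updated));
        return updated;
    }

    List<Map.Entry<String, Long>> changelog() {
        return List.copyOf(changelog);
    }

    Map<String, Long> snapshot() {
        return Map.copyOf(store);
    }

    // Log compaction eventually keeps only the latest value per key
    // (modeled here as immediate).
    static List<Map.Entry<String, Long>> compact(List<Map.Entry<String, Long>> log) {
        Map<String, Long> latest = new LinkedHashMap<>();
        for (Map.Entry<String, Long> record : log) {
            latest.remove(record.getKey()); // re-insert so order follows the latest offset
            latest.put(record.getKey(), record.getValue());
        }
        List<Map.Entry<String, Long>> compacted = new ArrayList<>();
        latest.forEach((k, v) -> compacted.add(Map.entry(k, v)));
        return compacted;
    }

    // Restore after a restart: replay the changelog; the last write per key wins.
    static Map<String, Long> restore(List<Map.Entry<String, Long>> log) {
        Map<String, Long> rebuilt = new HashMap<>();
        for (Map.Entry<String, Long> record : log) {
            rebuilt.put(record.getKey(), record.getValue());
        }
        return rebuilt;
    }
}
```

In a real application you would write `builder.stream("orders").groupBy(...).count(Materialized.as(...))` and let Kafka Streams manage the store and its changelog. The model only mirrors the data flow, including why restore time grows with the size of the table.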

That inheritance is the whole story. Everything true of the Kafka Streams runtime is true of ksqlDB underneath:

- **Kafka-only I/O.** The compiled queries read and write Kafka topics only: not databases, not files. The ksqlDB product can also manage Kafka Connect connectors (it can even run an embedded Connect worker inside the server) to bridge databases, but that data still transits through Kafka topics. Flink, by contrast, reads external systems directly.
- **Changelog-backed state.** The same local store plus [changelog](https://www.conduktor.io/kafka-streams/state-store) recovery model, and the same [restore time](https://www.conduktor.io/kafka-streams/state-restore) when a server restarts with a large table.
- **Partition-bound scaling.** A ksqlDB query [scales like a consumer group](https://www.conduktor.io/kafka/kafka-consumer-groups-and-consumer-offsets), capped at the partition count of its input, [exactly as Kafka Streams does](https://www.conduktor.io/kafka-streams/scaling).
- **The same rebalance behavior.** Add or remove a ksqlDB server and the underlying Streams [rebalance](https://www.conduktor.io/kafka-streams/rebalancing) reassigns work.
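One rough way to see the partition cap: Kafka Streams creates one task per input partition (per sub-topology), and each task runs on exactly one instance at a time. The model below uses a simplified round-robin assignment in plain Java. That assignment rule is an assumption for illustration only. The real assignor is sticky, state-aware, and places standby replicas, but the arithmetic of the cap is the same.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of partition-bound scaling: one task per input
// partition, each task owned by exactly one instance. Round-robin is an
// assumption; the real Kafka Streams assignor is sticky and state-aware.
final class PartitionBoundScalingModel {
    static List<List<Integer>> assign(int partitions, int instances) {
        if (partitions < 0 || instances <= 0) {
            throw new IllegalArgumentException("need partitions >= 0 and instances > 0");
        }
        List<List<Integer>> byInstance = new ArrayList<>();
        for (int i = 0; i < instances; i++) {
            byInstance.add(new ArrayList<>());
        }
        for (int p = 0; p < partitions; p++) {
            byInstance.get(p % instances).add(p); // the task for partition p
        }
        return byInstance;
    }

    // Instances with no active task process nothing until a rebalance hands them work.
    static long idleInstances(List<List<Integer>> assignment) {
        return assignment.stream().filter(List::isEmpty).count();
    }
}
```

Re-running `assign` with a different instance count stands in for a rebalance. With 6 partitions and 8 instances, two instances get no tasks. They sit idle, or host standby replicas if `num.standby.replicas` is configured. Servers past the partition count therefore add failover headroom, not throughput.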

> **ksqlDB is not a separate stream-processing engine.** It is a SQL interface and a server runtime sitting on the Kafka Streams library. When you tune a ksqlDB query's performance, you are tuning Kafka Streams: RocksDB memory, changelog topics, partition count, the lot. The skills transfer directly.
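Configuration is one concrete place this shows up. Assuming ksqlDB's documented convention, a ksqlDB server accepts Kafka Streams settings under a `ksql.streams.` prefix in its server properties, for example `ksql.streams.num.stream.threads`. The helper below is a hypothetical sketch of that pass-through, not ksqlDB's own code. It strips the prefix to show which plain Kafka Streams property each server setting becomes.

```java
import java.util.Properties;

// Hypothetical sketch (not ksqlDB source): map "ksql.streams."-prefixed
// server settings to the plain Kafka Streams property names they configure.
final class KsqlStreamsConfigModel {
    static final String PREFIX = "ksql.streams.";

    static Properties streamsProperties(Properties ksqlServerProps) {
        Properties streams = new Properties();
        for (String name : ksqlServerProps.stringPropertyNames()) {
            if (name.startsWith(PREFIX)) {
                // e.g. "ksql.streams.num.stream.threads" -> "num.stream.threads"
                streams.setProperty(name.substring(PREFIX.length()),
                        ksqlServerProps.getProperty(name));
            }
        }
        // Non-prefixed ksqlDB settings (such as ksql.service.id) are left out.
        return streams;
    }
}
```

So `ksql.streams.num.stream.threads=4` in a ksqlDB server's properties is the same `num.stream.threads` knob a hand-written Streams app sets in its own config.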

## The real axis: SQL on a server vs a library in your app

If the engine is the same, what actually differs? Two things, and they're the whole decision.

**ksqlDB is SQL plus a server you operate.** You define streams and tables in SQL, submit them to a ksqlDB server cluster, and that cluster runs the compiled topologies as long-lived persistent queries. You operate the ksqlDB servers as standalone infrastructure, separate from the apps that produce and consume the data. The people writing the logic can be analysts or data engineers who know SQL and never touch the JVM.

**Kafka Streams is a library you embed.** You add the `org.apache.kafka:kafka-streams` JAR to an application your team already builds and deploys, write a topology in Java, and your service *becomes* the processor. No extra cluster. The logic lives inside the app, owned by the app team, deployed on their pipeline.

That split decides who writes the logic, who operates it, and how much custom behavior you can reach for. SQL is faster to write and read for the shapes it covers: filters, joins, windowed aggregations. The Java library gives you arbitrary code. You can call into a library, branch on complex logic, use the [Processor API](https://www.conduktor.io/kafka-streams/processor-api) and punctuators, or implement a custom [deduplication](https://www.conduktor.io/kafka-streams/deduplication) processor or a [dead-letter-queue](https://www.conduktor.io/kafka-streams/dead-letter-queue) strategy that SQL can't express.
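As an example of what the Processor API buys you, here is the core of a time-bounded deduplication step. In a real Kafka Streams app, this logic would live in a `Processor` backed by a `KeyValueStore`, with a punctuator registered through `ProcessorContext#schedule` doing the eviction. This sketch swaps in a `HashMap` and an explicit `punctuate` method so it runs with the JDK alone. The class name, the window rule (drop repeats of an event ID within `windowMs` of its first sighting), and the eviction rule are all assumptions for illustration.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

// Sketch of a time-bounded dedup step. In a real app the map would be a
// KeyValueStore inside a Processor, and punctuate() would be a punctuator
// registered via ProcessorContext#schedule. Names and window rule are assumptions.
final class DedupModel {
    private final long windowMs;
    // eventId -> timestamp of first sighting (stands in for a KeyValueStore)
    private final Map<String, Long> firstSeen = new HashMap<>();

    DedupModel(long windowMs) {
        if (windowMs <= 0) {
            throw new IllegalArgumentException("windowMs must be positive");
        }
        this.windowMs = windowMs;
    }

    // True if the event should be forwarded downstream; false if it repeats
    // an ID first seen less than windowMs earlier.
    boolean shouldForward(String eventId, long timestampMs) {
        Long seenAt = firstSeen.get(eventId);
        if (seenAt != null && timestampMs - seenAt < windowMs) {
            return false; // duplicate inside the window: drop it
        }
        firstSeen.put(eventId, timestampMs); // new, or its old window has elapsed
        return true;
    }

    // Periodic cleanup: evict IDs whose window has fully elapsed; returns how many.
    int punctuate(long nowMs) {
        int evicted = 0;
        Iterator<Map.Entry<String, Long>> it = firstSeen.entrySet().iterator();
        while (it.hasNext()) {
            if (nowMs - it.next().getValue() >= windowMs) {
                it.remove();
                evicted++;
            }
        }
        return evicted;
    }

    int size() {
        return firstSeen.size();
    }
}
```

The eviction pass is what keeps the store from growing forever. That periodic, state-aware housekeeping is the kind of thing a SQL statement gives you no hook for.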

## Side by side

Read this as a map of where each is at home. The last two rows are the ones that usually decide it.

| Dimension | ksqlDB | Kafka Streams |
|---|---|---|
| **What it is** | SQL layer + server, built on Kafka Streams | A JVM library you embed in your app |
| **Engine underneath** | Kafka Streams (compiled topologies) | Kafka Streams (you write it directly) |
| **Language** | SQL only | Java (JVM; an official Scala DSL ships too) |
| **Deployment** | A ksqlDB server cluster you run | A JAR inside a normal service, no extra cluster |
| **Custom logic** | What SQL + UDFs express | Arbitrary code: Processor API, punctuators, any library |
| **Sources / sinks** | Kafka only for queries; can manage Connect connectors | Kafka only |
| **State & scaling** | Changelog-backed, ≤ partition count | Changelog-backed, ≤ partition count (identical) |
| **Operational owner** | Whoever runs the ksqlDB servers | The application team |
| **Momentum** | Low investment; SQL focus moved to Flink SQL | Core part of Apache Kafka, actively developed |

The first eight rows describe a genuine tradeoff: SQL ergonomics and an analyst-friendly server, against full control inside an app you already own. The last row is different in kind: it's about where each project is *going*, not what it does. That deserves its own section, because it's what the "is ksqlDB dead" queries are really after.

## The status reality of ksqlDB

Be factual about this, because it's a real and recurring question, and dodging it helps no one.

ksqlDB is under **substantially lower active investment** than it once was. It is frequently described as effectively in maintenance mode: it still works, it still ships, but the SQL-for-stream-processing momentum inside Confluent, the company that builds it, has moved to **Flink SQL**. New streaming-SQL features and the roadmap energy land there now. That's not snark; it's where the commits and the product positioning point. Two concrete signals: ksqlDB now versions in lockstep with Confluent Platform releases instead of its old standalone 0.x scheme, and Confluent Cloud does not offer fully managed ksqlDB on Enterprise clusters.

What that means in practice, without overstating it:

- **It is not abandoned and not removed.** ksqlDB still runs in production at many organizations. The Kafka Streams engine underneath it is very much alive and actively developed, so the runtime isn't going stale even where the SQL layer is quiet.
- **It is a real signal for *new* projects.** If you're choosing a streaming-SQL tool today with a multi-year horizon, building fresh on a low-investment layer is a risk you should weigh deliberately; look hard at Flink SQL first.
- **It is not a reason to rip out a working deployment.** A ksqlDB cluster that does its job doesn't suddenly stop working because the roadmap cooled. Migrate when you have a concrete reason (a feature you need, a consolidation onto Flink), not because of a vibe.

> 🚫 *"ksqlDB is maintenance-mode, so we should rewrite our queries in Kafka Streams to be safe."*

Rewriting working ksqlDB SQL into hand-maintained Java topologies trades a quiet-but-functional layer for a pile of code your team now owns forever, and you land on the *same engine* you were already running. If you're going to spend a migration, spend it moving toward Flink SQL, where the investment is, and only if you actually need cross-source pipelines or its roadmap. Otherwise, leave the working queries alone.

## When each is the right call

Pick by who writes and operates the logic, then sanity-check the capability (SQL covers more than people expect, but not everything).

**Choose ksqlDB when:**
- Your authors live in SQL (analysts, data engineers) and the work is filters, joins, and windowed aggregations.
- You've already committed to running a ksqlDB server cluster, or have one and it works.
- You want quick, declarative streaming pipelines without standing up app code or a separate engine.

**Choose Kafka Streams when:**
- The logic belongs inside an application your team owns and deploys, with no extra server to operate.
- You need custom code SQL can't express: arbitrary branching, library calls, the [Processor API](https://www.conduktor.io/kafka-streams/processor-api), bespoke error handling.
- You're a JVM shop and want stream processing to deploy and [scale like any microservice](https://www.conduktor.io/kafka-streams/scaling).

**Look at Flink SQL when:**
- You want SQL streaming *and* you're starting fresh today: it's where the streaming-SQL investment is going.
- You need non-Kafka sources, scale past your partition count, or a platform team operating a shared cluster. See [Kafka Streams vs Flink](https://www.conduktor.io/kafka-streams/vs-flink).

The trap is treating this as ksqlDB *versus* Kafka Streams when they share an engine. The genuine fork is SQL-on-a-server vs library-in-an-app and, for new SQL work, whether Flink SQL is a better bet than either.

## FAQ

**Is ksqlDB just Kafka Streams under the hood?**

Essentially, yes. ksqlDB compiles each SQL statement into a Kafka Streams topology and runs it on the Kafka Streams runtime embedded in the ksqlDB server. It adds a SQL language and a server you operate, but the processing engine, the changelog-backed state, and the partition-bound scaling are all Kafka Streams.

**Is ksqlDB deprecated or in maintenance mode?**

It is not formally deprecated and still ships and runs, but it is under much lower active investment, and is often described as effectively in maintenance mode. Confluent has moved its streaming-SQL focus to Flink SQL. Treat it as stable-but-quiet: fine to keep running, worth weighing carefully before betting a new project on it.

**What is the difference between ksqlDB, Kafka Streams, and Flink SQL?**

Kafka Streams is a JVM library you embed in your app. ksqlDB is SQL plus a server, built on top of Kafka Streams (Kafka-only, partition-bound). Flink SQL is SQL on Apache Flink, a separate cluster engine that reads many sources and scales past partition count, and is where new streaming-SQL investment is going.

**Can I use SQL with Kafka Streams directly?**

Not within the Kafka Streams library itself: its API is the Java DSL and Processor API, not SQL. To write SQL over the Kafka Streams engine you use ksqlDB, which compiles SQL to Streams topologies. For SQL on a different engine that reads beyond Kafka, use Flink SQL.

**Should I migrate existing ksqlDB queries to Kafka Streams?**

Usually not just because ksqlDB is quiet. You'd land on the same engine, having taken on Java code to maintain. Migrate only for a concrete reason: custom logic SQL can't express, or consolidating onto Flink SQL for cross-source pipelines. A working ksqlDB deployment doesn't need rescuing.

> **See it in practice with Conduktor**
> Whether you run ksqlDB queries or hand-written Kafka Streams apps, both surface on Kafka as consumer groups, changelog topics, and lag, because both *are* Kafka Streams underneath. [Conduktor Console](https://docs.conduktor.io/guide/manage-kafka/kafka-resources/topics) lets you watch consumer group lag, inspect the internal and changelog topics a query or app creates, and confirm partition assignment, so you can tell whether the processing is keeping up, independent of whether it was written in SQL or Java.

## Next steps

- [Kafka Streams vs Flink](https://www.conduktor.io/kafka-streams/vs-flink), including where Flink SQL fits for new SQL work
- [What is Kafka Streams?](https://www.conduktor.io/kafka-streams), the library that ksqlDB is built on
- [The future of Kafka Streams](https://www.conduktor.io/kafka-streams/future), where the engine under both is headed
