# What is Kafka Streams?

*Understand what Kafka Streams is, and what it isn't.*

Kafka Streams is a Java library for processing data that lives in Apache Kafka. You add it as a dependency, write a topology, and your own application becomes the stream processor. There is no separate cluster to deploy, no job to submit, no scheduler to babysit: the processing runs inside your service, next to your business logic.

That single design choice (*a library, not a platform*) explains almost everything about how Kafka Streams behaves in production, for better and for worse. This guide covers both.

**What you'll learn:**
- What Kafka Streams is and how it differs from the plain consumer API
- Why being a library (not a cluster) shapes how you run it
- What you can realistically build with it
- When Kafka Streams is the right tool, and when it isn't

## A library, not a cluster

Most stream processors (Flink, Spark Structured Streaming) are systems you stand up and submit jobs to. Kafka Streams inverts that: it is a single JAR (`org.apache.kafka:kafka-streams`) on your classpath. Your app reads from topics, transforms records, and writes to topics, all through a fluent API:

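```java
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;

StreamsBuilder builder = new StreamsBuilder();
// Read "orders" with String keys and values; keep only paid orders.
// The null check skips tombstones (records whose value is null).
builder.<String, String>stream("orders")
    .filter((key, order) -> order != null && order.contains("\"status\":\"PAID\""))
    .to("paid-orders");

// `props` holds the configuration shown below.
KafkaStreams streams = new KafkaStreams(builder.build(), props);
streams.start();
```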

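That is a complete, runnable stream processor, with one catch in `props`: since Kafka 3.0, `default.key.serde` and `default.value.serde` have no default value, so you must set them along with the required `application.id` and `bootstrap.servers`. A missing `application.id` or `bootstrap.servers` makes the `KafkaStreams` constructor throw a `ConfigException`; a missing default serde (where no explicit serde is given) fails the app with a `StreamsException` once its stream threads initialize their tasks: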

```java
import java.util.Properties;

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsConfig;

Properties props = new Properties();
props.put(StreamsConfig.APPLICATION_ID_CONFIG, "order-filter"); // also used as the consumer group id
props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());
```
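To check that the topology above really is self-contained, you can run it without a broker using `TopologyTestDriver` from the `kafka-streams-test-utils` artifact. This is a minimal sketch: the `OrderFilterDemo` class and its `run` helper are hypothetical names, the `dummy:9092` address is never contacted by the driver, and the filter includes a null check so tombstones don't throw.

```java
import java.util.List;
import java.util.Properties;

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.kafka.common.serialization.StringSerializer;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.TestInputTopic;
import org.apache.kafka.streams.TestOutputTopic;
import org.apache.kafka.streams.Topology;
import org.apache.kafka.streams.TopologyTestDriver;

public class OrderFilterDemo {

    // Same topology as above: keep only orders whose JSON contains "status":"PAID".
    static Topology buildTopology() {
        StreamsBuilder builder = new StreamsBuilder();
        builder.<String, String>stream("orders")
            .filter((key, order) -> order != null && order.contains("\"status\":\"PAID\""))
            .to("paid-orders");
        return builder.build();
    }

    // Hypothetical helper: pipes order values into "orders" and returns what lands in "paid-orders".
    static List<String> run(List<String> orders) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "order-filter");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "dummy:9092"); // never contacted by the driver
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        TopologyTestDriver driver = new TopologyTestDriver(buildTopology(), props);
        try {
            TestInputTopic<String, String> in =
                driver.createInputTopic("orders", new StringSerializer(), new StringSerializer());
            TestOutputTopic<String, String> out =
                driver.createOutputTopic("paid-orders", new StringDeserializer(), new StringDeserializer());
            for (String order : orders) {
                in.pipeInput("order-key", order);
            }
            return out.readValuesToList();
        } finally {
            driver.close(); // releases the driver's local state directory and resources
        }
    }
}
```

The driver processes each piped record synchronously, so `run` can read the output topic immediately; only the paid order comes back.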

A Kafka Streams app scales the same way any Kafka consumer scales: run more instances with the same `application.id`, and Kafka's [consumer group protocol](https://www.conduktor.io/kafka/kafka-consumer-groups-and-consumer-offsets) spreads the input partitions, and the tasks that process them, across those instances. No resource manager, no cluster, no YARN.

The trade-off is ownership. Because Kafka Streams runs inside your process, *you* own its memory, its state on local disk, its restarts, and its rebalances. A Flink deployment hands those concerns to the Flink cluster; a Kafka Streams developer carries them.
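Concretely, "owning it" shows up as settings and lifecycle hooks in your own code. The sketch below is illustrative rather than a recommended baseline: the `OwnedRuntime` class, its helper names, the state path, and the values (two threads, one standby) are hypothetical. The config constants and `KafkaStreams` methods are real Kafka Streams API; `StreamsUncaughtExceptionHandler` needs Kafka 2.8+.

```java
import java.util.Properties;

import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.errors.StreamsUncaughtExceptionHandler.StreamThreadExceptionResponse;

public class OwnedRuntime {

    // Hypothetical helper: operational settings a cluster engine would otherwise manage for you.
    static Properties operationalProps(Properties base) {
        Properties props = new Properties();
        props.putAll(base);
        // Local disk: where this instance keeps its state stores (hypothetical path).
        props.put(StreamsConfig.STATE_DIR_CONFIG, "/var/lib/order-filter/state");
        // Threads per instance; active parallelism is capped by the number of tasks
        // (roughly one per input partition).
        props.put(StreamsConfig.NUM_STREAM_THREADS_CONFIG, 2);
        // Keep a warm copy of each state store on another instance to shorten failover.
        props.put(StreamsConfig.NUM_STANDBY_REPLICAS_CONFIG, 1);
        return props;
    }

    // Restarts and shutdown are your process's job, not a cluster's.
    static void start(KafkaStreams streams) {
        // Replace a crashed stream thread instead of shutting the whole client down.
        streams.setUncaughtExceptionHandler(e -> StreamThreadExceptionResponse.REPLACE_THREAD);
        // Close on SIGTERM so offsets are committed and state checkpoints are written before exit.
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
        streams.start();
    }
}
```

`REPLACE_THREAD` suits transient failures. If the error is deterministic (a poison record, say), the replacement thread will hit it again, so that case needs a different strategy, such as a dead letter queue.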

> *"For 80 to 90 percent of stream-processing use cases, either Kafka Streams or Flink will work. The real question is the deployment model and who operates it."*
> — paraphrasing a recurring theme from Kafka maintainers on the deployment-model trade-off

## More than the consumer API

You could write all of this with a raw [Kafka consumer](https://www.conduktor.io/kafka/kafka-consumers) and producer. People do, and then slowly reinvent Kafka Streams badly. The library gives you, for free, things that are tedious and error-prone to hand-roll:

- **Stateful operations**: aggregations, counts, and joins backed by a local store that survives restarts (covered in [state stores](https://www.conduktor.io/kafka-streams/state-store)).
- **Event-time windowing**: group records by when they *happened*, not when they arrived ([windowing](https://www.conduktor.io/kafka-streams/windowing)).
- **Exactly-once processing**: the read-process-write cycle made atomic with a single config setting, `processing.guarantee` ([exactly-once](https://www.conduktor.io/kafka-streams/exactly-once)).
- **Automatic scaling and fault tolerance**: partitions, tasks, and state move between instances without you writing rebalance code.
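A minimal sketch of the first three items together: a per-key count over one-minute event-time windows, backed by a local state store, plus the one-setting exactly-once switch. The topic name (`page-views`), store name, window size, and the class and helper names are hypothetical. `TimeWindows.ofSizeWithNoGrace` needs Kafka Streams 3.0+, and `exactly_once_v2` needs brokers on 2.5 or newer. The helper runs the topology through `TopologyTestDriver` (from `kafka-streams-test-utils`) with the basic config, so no broker is needed.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Properties;

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.common.serialization.StringSerializer;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.TestInputTopic;
import org.apache.kafka.streams.Topology;
import org.apache.kafka.streams.TopologyTestDriver;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.kstream.TimeWindows;
import org.apache.kafka.streams.state.WindowStore;

public class WindowedViewCounts {

    static final String STORE = "views-per-minute";

    // Stateful + event time: count views per user in one-minute windows. Each record is
    // bucketed by its own timestamp (when the view happened), not by when it is processed.
    static Topology buildTopology() {
        StreamsBuilder builder = new StreamsBuilder();
        builder.<String, String>stream("page-views")
            .groupByKey()
            .windowedBy(TimeWindows.ofSizeWithNoGrace(Duration.ofMinutes(1)))
            .count(Materialized.as(STORE)); // local state store, backed up to a changelog topic
        return builder.build();
    }

    static Properties baseProps() {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "view-counter");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        return props;
    }

    // Production config: exactly-once is one extra setting on top of the basics.
    static Properties productionProps() {
        Properties props = baseProps();
        props.put(StreamsConfig.PROCESSING_GUARANTEE_CONFIG, StreamsConfig.EXACTLY_ONCE_V2);
        return props;
    }

    // Broker-free check: pipe views with explicit event timestamps, then read one window.
    static long countInWindow(String user, Instant windowStart, Instant... viewTimes) {
        TopologyTestDriver driver = new TopologyTestDriver(buildTopology(), baseProps());
        try {
            TestInputTopic<String, String> views =
                driver.createInputTopic("page-views", new StringSerializer(), new StringSerializer());
            for (Instant t : viewTimes) {
                views.pipeInput(user, "/home", t);
            }
            WindowStore<String, Long> store = driver.getWindowStore(STORE);
            Long count = store.fetch(user, windowStart.toEpochMilli()); // window keyed by its start time
            return count == null ? 0L : count;
        } finally {
            driver.close();
        }
    }
}
```

Windows are aligned to the epoch, so views stamped 12:00:10 and 12:00:50 both land in the window starting at 12:00, and one stamped 12:01:05 opens the next. With `ofSizeWithNoGrace`, a record that arrives after its window has closed is dropped rather than counted; `TimeWindows.ofSizeAndGrace` tolerates late data.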

The two core abstractions are the **stream** (`KStream`, an unbounded log of events) and the **table** (`KTable`, the latest value per key). Understanding the difference between them is the single most useful concept in the library. See [KStream vs KTable](https://www.conduktor.io/kafka-streams/kstream-ktable-globalktable).
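The same input can be read either way. In this sketch (hypothetical topic, store, and class names; `KStream#toTable` needs Kafka Streams 2.5+), every address update is forwarded to a history topic as a stream, and the same updates are folded into a table that keeps only the latest address per key. `TopologyTestDriver` again stands in for a broker.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.kafka.common.serialization.StringSerializer;
import org.apache.kafka.streams.KeyValue;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.TestInputTopic;
import org.apache.kafka.streams.TestOutputTopic;
import org.apache.kafka.streams.TopologyTestDriver;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.state.KeyValueStore;

public class StreamVsTable {

    static final String STORE = "latest-address";

    // What each abstraction "sees" after the same updates.
    static final class Views {
        final List<KeyValue<String, String>> history; // KStream: every event, in order
        final Map<String, String> latest;             // KTable: one value per key

        Views(List<KeyValue<String, String>> history, Map<String, String> latest) {
            this.history = history;
            this.latest = latest;
        }
    }

    static Views run(List<KeyValue<String, String>> updates) {
        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> events = builder.stream("address-updates");
        events.to("address-history");           // stream view: forward every event
        events.toTable(Materialized.as(STORE)); // table view: upsert, the latest value per key wins

        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "stream-vs-table");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "dummy:9092"); // never contacted
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        TopologyTestDriver driver = new TopologyTestDriver(builder.build(), props);
        try {
            TestInputTopic<String, String> in =
                driver.createInputTopic("address-updates", new StringSerializer(), new StringSerializer());
            TestOutputTopic<String, String> out =
                driver.createOutputTopic("address-history", new StringDeserializer(), new StringDeserializer());
            in.pipeKeyValueList(updates);

            KeyValueStore<String, String> store = driver.getKeyValueStore(STORE);
            Map<String, String> latest = new HashMap<>();
            for (KeyValue<String, String> update : updates) {
                latest.put(update.key, store.get(update.key)); // ask the table for each key we sent
            }
            return new Views(out.readKeyValuesToList(), latest);
        } finally {
            driver.close();
        }
    }
}
```

Three updates in means three records in `address-history`, but only two keys in the table: `alice`'s first address was overwritten. That is the stream/table distinction in one run.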

## What you can build

Kafka Streams fits a specific shape of problem: continuous, per-record processing of Kafka data, by the team that owns the application.

- **Enrichment**: join an event stream against reference data (a `KTable` of users, products, accounts).
- **Real-time aggregations**: counts, sums, and rollups per key and per time window (fraud scoring, usage metering, leaderboards).
- **Materialized views**: turn a changelog into a queryable table you can read directly from your service via [interactive queries](https://www.conduktor.io/kafka-streams/state-store).
- **Event-driven microservices**: services that react to events and emit new ones, with state and ordering handled for you.
- **AI agent memory**: materialize a multi-agent conversation into a queryable context store that LLM agents read in real time ([Kafka Streams for AI agents](https://www.conduktor.io/kafka-streams/ai-agents)).
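Enrichment, the first item above, is the classic stream-table join. A minimal sketch, assuming hypothetical `users` and `clicks` topics keyed by the same user id (a `KStream`-`KTable` join requires both sides to be co-partitioned); the class and helper names are illustrative, and `TopologyTestDriver` replaces a real broker.

```java
import java.util.List;
import java.util.Properties;

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.kafka.common.serialization.StringSerializer;
import org.apache.kafka.streams.KeyValue;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.TestInputTopic;
import org.apache.kafka.streams.TestOutputTopic;
import org.apache.kafka.streams.Topology;
import org.apache.kafka.streams.TopologyTestDriver;
import org.apache.kafka.streams.kstream.KTable;

public class ClickEnrichment {

    static Topology buildTopology() {
        StreamsBuilder builder = new StreamsBuilder();
        // Reference data: the latest display name per user id.
        KTable<String, String> users = builder.table("users");
        // Events keyed by the same user id; each click looks up the user's current name.
        builder.<String, String>stream("clicks")
            .join(users, (page, name) -> name + " viewed " + page) // inner join
            .to("enriched-clicks");
        return builder.build();
    }

    static List<String> run(List<KeyValue<String, String>> userRows, List<KeyValue<String, String>> clicks) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "click-enrichment");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "dummy:9092"); // never contacted
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        TopologyTestDriver driver = new TopologyTestDriver(buildTopology(), props);
        try {
            TestInputTopic<String, String> usersIn =
                driver.createInputTopic("users", new StringSerializer(), new StringSerializer());
            TestInputTopic<String, String> clicksIn =
                driver.createInputTopic("clicks", new StringSerializer(), new StringSerializer());
            TestOutputTopic<String, String> out =
                driver.createOutputTopic("enriched-clicks", new StringDeserializer(), new StringDeserializer());
            usersIn.pipeKeyValueList(userRows); // load the table before the clicks arrive
            clicksIn.pipeKeyValueList(clicks);
            return out.readValuesToList();
        } finally {
            driver.close();
        }
    }
}
```

Because this is an inner join, a click for a user the table has not seen yet is dropped; `leftJoin` would keep it and pass `null` as the user to the joiner.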

## When *not* to use Kafka Streams

Being honest about the boundaries saves you a painful migration later:

- **You're not a JVM shop.** Kafka Streams is JVM-only (Java, plus Scala and Kotlin on top of it). If your team lives in Python or Go, a plain consumer or Flink's Python API is a better fit.
- **You want someone else to operate the state.** Large local state (RocksDB) means slow restores and [rebalances](https://www.conduktor.io/kafka-streams/rebalancing) that can stall migrated tasks for minutes while their state rebuilds. If you don't want to own that, a managed cluster engine moves the burden off your team.
- **Your sources aren't Kafka.** Kafka Streams reads and writes Kafka, full stop. Pulling from a database, a queue, and an HTTP API into one job is Flink's territory.
- **It's a one-line stateless filter.** A single `filter` with no state is sometimes better served by a plain [consumer](https://www.conduktor.io/kafka/kafka-consumers) and producer loop of a few lines. Don't add a framework for it.
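For comparison, here is roughly what the plain-client version of the order filter looks like. It is a sketch with hypothetical names and a local broker address, and it shows what you take on yourself: auto-commit is disabled so input offsets are committed only after the forwarded records are flushed, which gives at-least-once delivery, with no state stores, windowing, or exactly-once.

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.kafka.common.serialization.StringSerializer;

public class PlainPaidOrderFilter {

    // Same predicate as the Kafka Streams version, null-safe for tombstones.
    static boolean isPaid(String order) {
        return order != null && order.contains("\"status\":\"PAID\"");
    }

    public static void main(String[] args) {
        Properties c = new Properties();
        c.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        c.put(ConsumerConfig.GROUP_ID_CONFIG, "order-filter-plain"); // hypothetical group id
        c.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, false);      // commit manually after forwarding
        c.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
        c.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);

        Properties p = new Properties();
        p.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        p.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
        p.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class);

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(c);
             KafkaProducer<String, String> producer = new KafkaProducer<>(p)) {
            consumer.subscribe(List.of("orders"));
            while (true) {
                for (ConsumerRecord<String, String> r : consumer.poll(Duration.ofMillis(500))) {
                    if (isPaid(r.value())) {
                        producer.send(new ProducerRecord<>("paid-orders", r.key(), r.value()));
                    }
                }
                producer.flush();      // wait until forwarded records are acknowledged...
                consumer.commitSync(); // ...before committing the input offsets (at-least-once)
            }
        }
    }
}
```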

> **Kafka Streams vs Flink vs ksqlDB.** This is the most common question newcomers ask, and most comparisons answer it dishonestly. We wrote a vendor-neutral one: [Kafka Streams vs Flink vs ksqlDB](https://www.conduktor.io/kafka-streams/vs-flink).

## What this guide covers

This is a full course built around the problems people actually hit, not just the happy path, drawn from years of questions on the Confluent forum and Stack Overflow and from conference talks.

**Foundations:** [architecture](https://www.conduktor.io/kafka-streams/architecture) · [KStream, KTable & GlobalKTable](https://www.conduktor.io/kafka-streams/kstream-ktable-globalktable) · [stateless operations](https://www.conduktor.io/kafka-streams/stateless-operations) · [your first app](https://www.conduktor.io/kafka-streams/getting-started) · [aggregations](https://www.conduktor.io/kafka-streams/aggregations) · [state stores](https://www.conduktor.io/kafka-streams/state-store) · [windowing](https://www.conduktor.io/kafka-streams/windowing) · [joins](https://www.conduktor.io/kafka-streams/joins) · [exactly-once](https://www.conduktor.io/kafka-streams/exactly-once)

**In production (where the bodies are buried):** [slow rebalances](https://www.conduktor.io/kafka-streams/rebalancing) · [state restore time](https://www.conduktor.io/kafka-streams/state-restore) · [RocksDB tuning](https://www.conduktor.io/kafka-streams/rocksdb-tuning) · [why you still see duplicates](https://www.conduktor.io/kafka-streams/exactly-once-duplicates) · [the suppress() trap](https://www.conduktor.io/kafka-streams/suppress-not-emitting) · [joins that drop data](https://www.conduktor.io/kafka-streams/join-troubleshooting) · [serde errors](https://www.conduktor.io/kafka-streams/serdes) · [evolving a topology](https://www.conduktor.io/kafka-streams/topology-evolution) · [dead letter queues](https://www.conduktor.io/kafka-streams/dead-letter-queue) · [deduplication](https://www.conduktor.io/kafka-streams/deduplication) · [scaling](https://www.conduktor.io/kafka-streams/scaling) · [testing](https://www.conduktor.io/kafka-streams/testing)

## Frequently asked questions

**What is Kafka Streams used for?**

Kafka Streams is a Java library for continuous, per-record processing of data in Apache Kafka: enrichment, real-time aggregations, materialized views, and event-driven microservices. The processing runs inside your own application, reading from topics and writing to topics, with no separate cluster to deploy.

**Is Kafka Streams a database?**

No. Kafka Streams is a stream-processing library, not a database. It can maintain local state (a `KTable` backed by a state store) and expose it for lookups, but the durable source of truth stays in Kafka topics, not in Kafka Streams.

**What is the difference between Kafka and Kafka Streams?**

Kafka is the distributed log that stores and moves the data; Kafka Streams is a client library that processes that data. They are complementary, not competitors: Kafka Streams reads from and writes to Kafka topics and runs as part of your application.

**When should I use Kafka Streams instead of Kafka Connect?**

Use Kafka Connect to move data between Kafka and external systems, and Kafka Streams to transform data already in Kafka. If your job is "get data in or out", that is Connect; if it is "filter, join, aggregate, or reshape records", that is Streams.

**Does Kafka Streams replace the Kafka consumer and producer APIs?**

It is built on top of them, not a replacement. Kafka Streams gives you stateful operations, event-time windowing, exactly-once, and automatic scaling for free, things that are tedious to hand-roll on the raw consumer and producer. For a one-line stateless filter, a plain consumer can still be simpler.

> **See it in practice with Conduktor**
> A Kafka Streams app is, under the hood, a consumer group plus a set of internal topics. [Conduktor Console](https://docs.conduktor.io/guide/manage-kafka/kafka-resources/topics) lets you watch its consumer group lag, inspect the changelog and repartition topics it creates, and confirm partition assignment: the signals you need when a Streams app misbehaves. One quick fingerprint: a Streams app's group reports its partition assignor as `stream`, which sets it apart from plain consumer groups at a glance.

## Next steps

- [Kafka Streams architecture](https://www.conduktor.io/kafka-streams/architecture): topologies, tasks, and threads
- [KStream vs KTable vs GlobalKTable](https://www.conduktor.io/kafka-streams/kstream-ktable-globalktable): the core mental model
- [Build your first Kafka Streams app](https://www.conduktor.io/kafka-streams/getting-started): a runnable WordCount in Java
