KStream vs KTable vs GlobalKTable

KStream vs KTable vs GlobalKTable: events vs changelog vs fully-replicated lookup, tombstone semantics, and which one to actually use in Kafka Streams.

By Stéphane Derosiaux · July 23, 2026

Get the core Kafka Streams mental model right.

Almost every confusing thing in Kafka Streams traces back to one question: is this data a stream of events, or a table of current state? The library gives you three abstractions for that, KStream, KTable, and GlobalKTable, and picking the wrong one is how you end up with joins that drop data, aggregations that double-count, and the occasional NullPointerException you can't explain.

This is the single most useful concept in the library. Get it right and the rest of the DSL stops feeling arbitrary.

What you'll learn:

KStream vs KTable: events versus a changelog, and why that distinction is everything
What a tombstone is, and the NullPointerException it sets you up for
When GlobalKTable earns its memory cost, and when it doesn't
A concrete rule for which abstraction to reach for

KStream: an append-only log of events

A KStream is an unbounded, append-only sequence of records. Every record is an independent fact that happened: a click, a payment, a sensor reading. Two records with the same key are two distinct events, not an update. If a customer places three orders, that's three records, and all three matter.

KStream<String, String> orders = builder.stream("orders");
// key=_1_ appears 3 times → 3 separate order events

This is the right model for anything event-shaped: transactions, page views, log lines, IoT readings. You filter them, transform them, route them, and aggregate them, but you never "overwrite" one. The log is the truth, and the order of records carries meaning.

KTable: a changelog, where the latest value wins

A KTable is the opposite framing. It models a table where each key has exactly one current value, and the underlying topic is read as a changelog: each new record for a key is an update (an upsert) that replaces the previous value. Three records for customer-42 don't accumulate; the third one wins, and the first two are history.

KTable<String, String> profiles = builder.table("user-profiles");
// key=_1_ appears 3 times → table holds only the latest value

This is the right model for entity and reference data: a user's current profile, an account balance, a product's current price, a device's last-known status. You're not interested in every change as an event. You want the current value per key, and a KTable maintains exactly that, backed by a local state store.
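To make the contrast concrete, here is a minimal plain-Java sketch that reads the same three records both ways. It is a model of the semantics, not the Kafka Streams API: `StreamVsTableModel`, `Rec`, `asStream`, and `asTable` are hypothetical names, and an in-memory list stands in for the topic.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java model of the two readings of one topic; no Kafka dependency.
class StreamVsTableModel {

    // A key/value pair standing in for a Kafka record (hypothetical type).
    record Rec(String key, String value) {}

    // Stream reading: every record is kept as its own event.
    static List<Rec> asStream(List<Rec> topic) {
        return new ArrayList<>(topic);
    }

    // Table reading: each record upserts its key, so the latest value wins.
    static Map<String, String> asTable(List<Rec> topic) {
        Map<String, String> table = new LinkedHashMap<>();
        for (Rec r : topic) {
            table.put(r.key(), r.value());
        }
        return table;
    }

    public static void main(String[] args) {
        List<Rec> topic = List.of(
            new Rec("customer-42", "v1"),
            new Rec("customer-42", "v2"),
            new Rec("customer-42", "v3"));
        System.out.println(asStream(topic).size()); // all three events survive
        System.out.println(asTable(topic));         // a single entry holding the latest value
    }
}
```

Same input, two answers: the stream reading keeps three events, the table reading keeps one row.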

The tombstone: a null value means delete

Here is the part the 101 courses gloss over, and it bites people in production. In a changelog, you need a way to say "this key no longer exists." That's a tombstone: a record with a non-null key and a null value. When a KTable sees one, it removes that key from the table.

Tombstones have semantics that will surprise you the first time:

🚫 "mapValues and filter run on every record in my KTable, so I'm safe to dereference the value."

They don't. On a KTable, filter and mapValues skip your function for tombstones and forward the null straight through: the delete has to propagate downstream, so the table can't just drop it. Your lambda never sees it. The trap springs when you call toStream():

profiles
    .mapValues(profile -> profile.toUpperCase()) // skipped for tombstones, null passes through
    .toStream()
    .foreach((key, value) -> process(value.length())); // NullPointerException on a deleted key

toStream() faithfully surfaces those tombstones as records with null values, and your downstream KStream code, which assumed mapValues had already run, dereferences null. The fix is to filter nulls explicitly after toStream() when you don't care about deletes:

profiles
    .toStream()
    .filter((key, value) -> value != null) // drop tombstones before touching the value
    .foreach((key, value) -> process(value.length()));

If you do care about deletes (propagating them to a downstream system, for instance), keep the tombstone and handle null deliberately. Either way, the rule is: on a stream derived from a table, assume nulls are real and decide what to do with them.
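If you want to see the mechanics without a broker, the behavior described above can be modeled in a few lines of plain Java. This is a sketch of the semantics only: `TombstoneModel`, `Update`, `tableMapValues`, and `dropTombstones` are hypothetical names, not Kafka Streams classes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Plain-Java model of tombstone handling; no Kafka dependency.
class TombstoneModel {

    // A changelog update; value == null marks a tombstone (delete).
    record Update(String key, String value) {}

    // Models table-side mapValues as described above: the mapper is skipped
    // for tombstones and the null is forwarded unchanged.
    static List<Update> tableMapValues(List<Update> updates, Function<String, String> mapper) {
        List<Update> out = new ArrayList<>();
        for (Update u : updates) {
            String mapped = (u.value() == null) ? null : mapper.apply(u.value());
            out.add(new Update(u.key(), mapped));
        }
        return out;
    }

    // Models the null filter applied after toStream(): tombstones are dropped.
    static List<Update> dropTombstones(List<Update> stream) {
        List<Update> out = new ArrayList<>();
        for (Update u : stream) {
            if (u.value() != null) {
                out.add(u);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Update> changelog = List.of(
            new Update("user-1", "alice"),
            new Update("user-1", null)); // tombstone: user-1 was deleted
        List<Update> mapped = tableMapValues(changelog, String::toUpperCase);
        // 'mapped' still contains the tombstone; value().length() on it would throw.
        for (Update u : dropTombstones(mapped)) {
            System.out.println(u.key() + " -> " + u.value().length());
        }
    }
}
```

The mapper never runs on the delete, yet the delete is still in the output, which is exactly why the null check belongs on the stream side.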

Stream-table duality

KStream and KTable aren't separate worlds. They're two views of the same data, and you convert between them freely. This is stream-table duality, and it's the idea the whole DSL rests on.

A table is a stream you've folded up. Replay a changelog from the beginning, keeping only the latest value per key, and you've reconstructed the table. That's literally how a KTable is restored after a crash, by replaying its changelog topic.
A stream is the changes a table emitted. Call toStream() on a KTable and you get the sequence of updates (including tombstones) that produced it.

Concretely: you groupBy + aggregate a KStream and the result is a KTable (a running view, see aggregations); you toStream() that KTable to publish the changes back out. You move across the boundary constantly, often without thinking about it.
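Both directions of the duality fit in a short plain-Java sketch. It assumes every update is emitted (no record caching that would coalesce them), and `DualityModel`, `Change`, `fold`, and `countChanges` are hypothetical names used for illustration only.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java model of stream-table duality; no Kafka dependency.
class DualityModel {

    // One changelog entry; value == null is a tombstone.
    record Change(String key, Long value) {}

    // Stream -> table: replay the changelog, keep the latest value per key,
    // and remove the key when a tombstone arrives.
    static Map<String, Long> fold(List<Change> changelog) {
        Map<String, Long> table = new LinkedHashMap<>();
        for (Change c : changelog) {
            if (c.value() == null) {
                table.remove(c.key());
            } else {
                table.put(c.key(), c.value());
            }
        }
        return table;
    }

    // Table -> stream: a running count per key emits one change per input event.
    static List<Change> countChanges(List<String> eventKeys) {
        Map<String, Long> counts = new LinkedHashMap<>();
        List<Change> changes = new ArrayList<>();
        for (String key : eventKeys) {
            long next = counts.merge(key, 1L, Long::sum);
            changes.add(new Change(key, next));
        }
        return changes;
    }

    public static void main(String[] args) {
        List<Change> changes = countChanges(List.of("a", "b", "a"));
        System.out.println(changes);       // the update sequence the running count emitted
        System.out.println(fold(changes)); // replaying it rebuilds the current table
    }
}
```

Folding the emitted changes gives back the same table the aggregation was holding, which is the restore-by-replay idea in miniature.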

GlobalKTable: replicated to every instance

Figure: a KTable shards keys across 3 instances, each holding a subset of partitions; a GlobalKTable copies all partitions to every instance, with no co-partitioning required.

A KTable is partitioned: each instance of your app holds only the keys for the partitions it owns. That's efficient, but it constrains joins. To join a KStream against a KTable, both sides must be co-partitioned (same partition count, same partitioning, so matching keys land on the same task). And don't expect Kafka Streams to catch a violation for you: the famous "Topics not co-partitioned" error only fires when the topology contains an internal repartition topic. Without one, a mismatched join starts fine, reaches RUNNING, and silently misses matches (verified on 3.9 and 4.3).
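Why a mismatch loses matches is easier to see with numbers. The sketch below is a plain-Java model, not Kafka code: Kafka's default partitioner hashes the serialized key with murmur2, while this model substitutes `String.hashCode()` purely to stay dependency-free. `CoPartitioningModel`, `partitionFor`, and `missedKeys` are hypothetical names.

```java
import java.util.List;

// Plain-Java model of why co-partitioning matters; no Kafka dependency.
class CoPartitioningModel {

    // Stand-in partitioner: hash the key, then take it modulo the partition count.
    // Assumption: String.hashCode() replaces Kafka's murmur2 hash for this sketch.
    static int partitionFor(String key, int partitionCount) {
        return Math.floorMod(key.hashCode(), partitionCount);
    }

    // Counts keys whose stream-side and table-side records land on different
    // partition numbers, and so would never meet in the same task.
    static int missedKeys(List<String> keys, int streamPartitions, int tablePartitions) {
        int missed = 0;
        for (String key : keys) {
            if (partitionFor(key, streamPartitions) != partitionFor(key, tablePartitions)) {
                missed++;
            }
        }
        return missed;
    }

    public static void main(String[] args) {
        List<String> keys = List.of("k0", "k1", "k2", "k3", "k4", "k5", "k6", "k7", "k8", "k9");
        System.out.println("same counts (4 vs 4): " + missedKeys(keys, 4, 4));
        System.out.println("mismatched (4 vs 6):  " + missedKeys(keys, 4, 6));
    }
}
```

With equal partition counts and the same hash, every key lines up. Change one side's count and a share of the keys end up in tasks that never see the matching table row.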

A GlobalKTable removes that constraint by making a different trade. It reads all partitions of its source topic into every instance of your app. Every instance holds a full, complete copy of the table.

GlobalKTable<String, String> countries = builder.globalTable("country-reference");

That full replication buys you two things:

No co-partitioning required. Because every instance has every key, a KStream-GlobalKTable join can look up any key locally. The stream side doesn't need to match the table's partition count or be re-keyed first. This is the usual reason people reach for it.
Bootstrapped before processing, not time-synchronized. A GlobalKTable is loaded fully on startup. The join is a pure point-in-time lookup against current state; it does not try to align record timestamps the way a KStream-KTable join does (best-effort ordering by default; an exact temporal lookup only with versioned state stores, KIP-914). That's a feature for stable reference data and a footgun if you expected event-time correctness.

The cost is the part people underestimate. Every instance stores the entire table. A small lookup table (a few thousand currency codes, country names, feature flags) is free. A large one is not:

A 30 GB GlobalKTable is 30 GB on every instance. Run ten instances and you're holding ten full copies, memory and disk on each, plus the startup time to load all partitions before the app can process anything. Engineers regularly ask whether they can put a multi-gigabyte table in a GlobalKTable; the honest answer is that you can, but you're paying for it N times over and slowing every restart. For large lookup data, a partitioned KTable join (with proper co-partitioning) or an external store is usually the better call.
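The arithmetic is simple enough to put in a few lines. This back-of-the-envelope sketch assumes partitions are spread evenly and ignores standby replicas and store overhead; `ReplicationCostModel` and its methods are hypothetical names.

```java
// Back-of-the-envelope footprint comparison; not a sizing tool.
class ReplicationCostModel {

    // GlobalKTable: every instance holds the full table, so the total is N copies.
    static long globalTableTotalGb(long tableGb, int instances) {
        return tableGb * instances;
    }

    // Partitioned KTable: the table is split across instances.
    // Assumption: evenly spread partitions, no standby replicas.
    static double kTablePerInstanceGb(long tableGb, int instances) {
        return (double) tableGb / instances;
    }

    public static void main(String[] args) {
        long tableGb = 30;
        int instances = 10;
        System.out.println(globalTableTotalGb(tableGb, instances));  // 300 GB across the fleet
        System.out.println(kTablePerInstanceGb(tableGb, instances)); // 3.0 GB per instance
    }
}
```

Under those assumptions the same 30 GB costs 300 GB fleet-wide as a GlobalKTable, against roughly 3 GB per instance as a partitioned KTable.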

One more limitation: you can't re-key a GlobalKTable. If you need to join on something other than its key, you supply a KeyValueMapper at join time that derives the join key from the stream's record. The table itself stays keyed as-is.
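Here is a rough plain-Java model of that join-time key derivation. A `Map` stands in for the replicated table and a `BiFunction` plays the role of the KeyValueMapper. `GlobalJoinModel`, `Order`, and `join` are hypothetical names, and the sketch models an inner join, where a stream record with no matching table entry produces no output.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.BiFunction;

// Plain-Java model of a stream joined against a fully replicated lookup table.
class GlobalJoinModel {

    // Stream-side value; the stream is keyed by orderId, not by country (hypothetical type).
    record Order(String orderId, String countryCode, double amount) {}

    // keyMapper derives the lookup key from the stream record (the KeyValueMapper role);
    // joiner combines the stream value with the matched table value.
    static List<String> join(List<Order> stream,
                             Map<String, String> globalTable,
                             BiFunction<String, Order, String> keyMapper,
                             BiFunction<Order, String, String> joiner) {
        List<String> out = new ArrayList<>();
        for (Order order : stream) {
            String lookupKey = keyMapper.apply(order.orderId(), order);
            String match = (lookupKey == null) ? null : globalTable.get(lookupKey);
            if (match != null) {
                out.add(joiner.apply(order, match)); // inner join: no match, no output
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> countries = Map.of("FR", "France", "DE", "Germany");
        List<Order> orders = List.of(
            new Order("o1", "FR", 10.0),
            new Order("o2", "XX", 5.0)); // unknown country code: no match in the table
        List<String> enriched = join(orders, countries,
            (orderId, order) -> order.countryCode(),
            (order, country) -> order.orderId() + ":" + country);
        System.out.println(enriched);
    }
}
```

The stream stays keyed by order ID throughout; only the lookup uses the country code, so nothing is re-keyed or repartitioned.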

Which one do I actually use?

This is the question that sends people to the forums. A blunt rule covers most cases:

Your data is…	Use	Why
Events, each record is an independent fact	KStream	Order matters, nothing is overwritten
Entity / reference data, partitioned, large	KTable	Latest value per key, scales with partitions
Reference / lookup data, small, joined by any key	GlobalKTable	Full copy everywhere, no co-partitioning, fast local lookup

Sharper heuristics:

"Did it happen" → KStream. "What is it now" → KTable. Payments happen; an account balance is. If you'd ever want to replay each occurrence, it's a stream.
Reach for GlobalKTable only when the table is small and you want to skip co-partitioning. Currency rates, country codes, feature flags, and small product catalogs are classic fits. The moment the table is large or grows unbounded, step back to a KTable.
When unsure between KTable and GlobalKTable, default to KTable. It scales with your partitions instead of replicating in full. You upgrade to GlobalKTable deliberately, to dodge a co-partitioning problem, not as a default.

The "which do I use for X" confusion almost always dissolves once you classify the data as event-vs-entity first, and only then pick the abstraction.
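If it helps to see the rule as code, here is one way to write it down. It is a sketch of the heuristics above, not anything the library provides: `AbstractionChooser` and its parameters are hypothetical, and what counts as "small" is left to the caller.

```java
// The decision rule above as a function; the inputs are judgment calls, not measurements.
class AbstractionChooser {

    enum Abstraction { KSTREAM, KTABLE, GLOBAL_KTABLE }

    static Abstraction choose(boolean isEvent, boolean tableIsSmall, boolean wantToSkipCoPartitioning) {
        // Classify event vs entity first.
        if (isEvent) {
            return Abstraction.KSTREAM;
        }
        // GlobalKTable only when the table is small AND co-partitioning is the problem.
        if (tableIsSmall && wantToSkipCoPartitioning) {
            return Abstraction.GLOBAL_KTABLE;
        }
        // Default for entity data: the partitioned table.
        return Abstraction.KTABLE;
    }

    public static void main(String[] args) {
        System.out.println(choose(true, false, false));  // payments
        System.out.println(choose(false, false, false)); // large account table
        System.out.println(choose(false, true, true));   // small country-code lookup
    }
}
```

Note the order of the checks: event-versus-entity comes first, and KTable is the fallthrough rather than GlobalKTable.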

Frequently Asked Questions

What is the difference between a KStream and a KTable?

A KStream is an append-only log of events where each record is an independent fact, so two records with the same key are two separate events. A KTable is a changelog where each new record for a key is an upsert, so only the latest value per key is kept.

When should I use a KTable vs a GlobalKTable?

Use a KTable for partitioned entity data that scales with your partitions; use a GlobalKTable only when the lookup table is small and you want to skip co-partitioning, because it replicates the entire table to every instance. When unsure, default to KTable and upgrade to GlobalKTable deliberately.

What is stream-table duality?

Stream-table duality is the idea that a stream and a table are two views of the same data. A table is a stream folded up to the latest value per key, and a stream is the sequence of changes a table emits, which is why a KTable is restored by replaying its changelog and toStream() turns it back into updates.

Why does a GlobalKTable use so much memory?

Because a GlobalKTable reads all partitions of its source topic into every instance of your app, so a 30 GB table is 30 GB on each instance, plus the startup time to load it before processing begins. For large lookup data, a partitioned KTable join or an external store is usually the better call.

What is a tombstone in a KTable?

A tombstone is a record with a non-null key and a null value; it tells a KTable to delete that key. On a KTable, filter and mapValues skip your function for tombstones and forward the null through, so code that dereferences the value after toStream() can hit a NullPointerException.

See it in practice with Conduktor
A KTable is just a compacted Kafka topic read as a changelog, and a GlobalKTable reads all partitions of one. Conduktor Console lets you inspect those source topics, browse records to confirm a key's latest value, spot the tombstones (null-value records) that drive deletes, and check partition counts before you build a join that depends on co-partitioning.

Next steps

Kafka Streams joins: co-partitioning, and the GlobalKTable join that skips it
Aggregations: turn a KStream into a KTable with count, reduce, and aggregate
State stores: the local store and changelog that back every KTable