# Kafka Streams exactly-once

*Understand what exactly-once actually guarantees (and what it doesn't).*

Exactly-once is the feature people cite as the reason to use Kafka Streams, and also the one they most often misread. In Kafka Streams it is genuinely one config flag, and it genuinely works, for a precisely scoped problem. The trouble starts when "exactly-once" is heard as "duplicates can never happen anywhere", because that is not what it means, and believing it leads to duplicates in production with the flag switched on.

This page is the honest version: what the guarantee covers, what it deliberately does not, and why a lot of experienced teams choose at-least-once instead.

**What you'll learn:**
- What `exactly_once_v2` turns on, and what it builds upon
- The exact scope of the guarantee: the consume-process-produce cycle inside Kafka
- Why external side effects (databases, REST calls) are not covered
- When the performance cost makes at-least-once the better call

## One flag

Exactly-once in Kafka Streams is a single setting:

```properties
processing.guarantee=exactly_once_v2
```

That's it. The mode behind `exactly_once_v2` has been the recommended choice since Kafka 2.6, where it shipped under the name `exactly_once_beta` (KIP-732 renamed it in 3.0). It replaces the original `exactly_once` (v1), which needed one producer per task (in practice, roughly one per input partition) to achieve the same result. EOS v1 was deprecated in Kafka 3.0 and removed in Kafka 4.0, so on any supported cluster v2 is the one you want: same guarantee, dramatically lower overhead.

> **Version note.** On Kafka 3.0+ clients, use `exactly_once_v2`. On 2.6–2.8 clients the same mode was named `exactly_once_beta`; setting `exactly_once_v2` there throws a `ConfigException`. The mode needs brokers on 2.5+. The older `exactly_once` (v1) was deprecated in 3.0 and removed in 4.0. On 2.6–3.x it still runs but is significantly more resource-hungry, and on 4.0+ `exactly_once_v2` is the only accepted value. There is no reason to choose v1.
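
The version note above can be turned into a small helper. The following is a minimal sketch, not an official API: the class name `EosConfig`, the `application.id` value, and the broker address are hypothetical, and it only handles the client versions the note covers (2.6 and later). It uses plain string keys so it runs without the Kafka client on the classpath.

```java
import java.util.Properties;

final class EosConfig {

    /** Picks the processing.guarantee value for a given Kafka Streams client version. */
    static String guaranteeFor(int major, int minor) {
        if (major >= 3) {
            return "exactly_once_v2";   // name used from 3.0 onward
        }
        if (major == 2 && minor >= 6) {
            return "exactly_once_beta"; // same mode under its pre-3.0 name
        }
        // Older clients are outside the scope of this sketch.
        throw new IllegalArgumentException(
                "the v2 mode needs a 2.6+ client, got " + major + "." + minor);
    }

    /** Builds a minimal Streams configuration with exactly-once enabled. */
    static Properties streamsProps(int major, int minor) {
        Properties props = new Properties();
        props.put("application.id", "orders-enricher");    // hypothetical application name
        props.put("bootstrap.servers", "localhost:9092");  // hypothetical broker address
        props.put("processing.guarantee", guaranteeFor(major, minor));
        return props;
    }

    public static void main(String[] args) {
        System.out.println(streamsProps(3, 7).getProperty("processing.guarantee"));
        System.out.println(streamsProps(2, 8).getProperty("processing.guarantee"));
    }
}
```

In a real application the resulting `Properties` would be passed to the `KafkaStreams` constructor together with the topology.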

Behind that flag, Kafka Streams wires together three mechanisms you could assemble by hand but really shouldn't:

- **The [idempotent producer](https://www.conduktor.io/kafka/idempotent-kafka-producer)**, so retries inside the cycle don't write a record twice to a Kafka partition.
- **Kafka transactions**, so the records produced *and* the consumer offsets committed for a batch are written atomically, all-or-nothing.
- **`read_committed` isolation** on the consume side, so the application only reads records from committed transactions, never from ones that were aborted and rolled back.
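
To make the three mechanisms concrete, here is roughly what you would have to configure on plain clients to assemble them by hand. This is a sketch under assumptions: the class name `ManualEosWiring` and the transactional id are hypothetical, only the settings relevant to the guarantee are shown, and the transactional call sequence appears as comments because the block deliberately avoids a dependency on the Kafka client library.

```java
import java.util.Properties;

final class ManualEosWiring {

    /** Producer side: idempotence plus a transactional id enables transactions. */
    static Properties producerProps(String transactionalId) {
        Properties p = new Properties();
        p.put("enable.idempotence", "true");        // retries do not write a record twice
        p.put("transactional.id", transactionalId); // turns the producer into a transactional one
        return p;
    }

    /** Consumer side: only read committed data, and never auto-commit offsets. */
    static Properties consumerProps() {
        Properties c = new Properties();
        c.put("isolation.level", "read_committed"); // skip records from aborted transactions
        c.put("enable.auto.commit", "false");       // offsets are committed through the transaction
        return c;
    }

    // With the real clients, each cycle would then look roughly like:
    //   producer.initTransactions();                 // once, at startup
    //   producer.beginTransaction();
    //   producer.send(outputRecord);                 // zero or more output records
    //   producer.sendOffsetsToTransaction(offsets, consumer.groupMetadata());
    //   producer.commitTransaction();                // or abortTransaction() on failure

    public static void main(String[] args) {
        System.out.println(producerProps("orders-enricher-tx-0")); // hypothetical id
        System.out.println(consumerProps());
    }
}
```

Kafka Streams sets all of this up, and handles failure and rebalance cases around it, when `processing.guarantee` is set to an exactly-once mode.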

## What "exactly-once" actually means here

The guarantee has a precise boundary, and naming it correctly is the whole point of this page.

Kafka Streams makes the **consume → process → produce cycle, inside Kafka, atomic.** For each batch, three things happen as one unit:

1. it reads input records,
2. it produces output records to Kafka topics (including internal changelog and repartition topics),
3. it commits the input offsets.

Either all three commit together, or none do. If the application crashes mid-batch, the transaction aborts: the output records are marked aborted (they stay in the log physically; `read_committed` readers skip them client-side), the offsets are not advanced, and on restart the batch is reprocessed cleanly. The net effect (the records that land in the output topics and the state that ends up in your stores) is **as if each input record were processed exactly once.**

That last phrase is the accurate one: **effect-once**, not "the bytes physically appear once and are never retried under the hood." Internally, records *are* re-read and reprocessed after a failure. What exactly-once removes is the *observable duplicate*: the aborted attempt is invisible to a correct reader, so downstream sees each result a single time.
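
The difference between "physically once" and "effect-once" is easier to see in a toy model. The sketch below is an in-memory simulation, not Kafka code: `TxnLogSketch`, its fields, and the `crashAfter` parameter are all invented for illustration. It models an output log whose entries belong to a transaction, a crash that aborts the first attempt, and two kinds of reader.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

final class TxnLogSketch {
    enum Status { OPEN, COMMITTED, ABORTED }

    static final class Txn { Status status = Status.OPEN; }

    static final class Entry {
        final String value;
        final Txn txn;
        Entry(String value, Txn txn) { this.value = value; this.txn = txn; }
    }

    final List<Entry> log = new ArrayList<>(); // output topic: aborted entries stay in it
    int committedOffset = 0;                   // input offset, advanced only on commit

    /** Processes all uncommitted input in one transaction; crashAfter < 0 means no crash. */
    void runBatch(List<String> input, int crashAfter) {
        Txn txn = new Txn();
        int produced = 0;
        for (int i = committedOffset; i < input.size(); i++) {
            if (produced == crashAfter) {
                txn.status = Status.ABORTED;   // crash: abort, offsets stay where they were
                return;
            }
            log.add(new Entry(input.get(i).toUpperCase(Locale.ROOT), txn));
            produced++;
        }
        txn.status = Status.COMMITTED;         // outputs and offsets become visible together
        committedOffset = input.size();
    }

    /** read_committed readers skip entries whose transaction did not commit. */
    List<String> read(boolean readCommitted) {
        List<String> out = new ArrayList<>();
        for (Entry e : log) {
            if (!readCommitted || e.txn.status == Status.COMMITTED) out.add(e.value);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("a", "b", "c");
        TxnLogSketch app = new TxnLogSketch();
        app.runBatch(input, 2);   // first attempt crashes after two outputs
        app.runBatch(input, -1);  // restart reprocesses the whole batch
        System.out.println(app.read(true));   // prints [A, B, C]
        System.out.println(app.read(false));  // prints [A, B, A, B, C]
    }
}
```

The log physically holds five entries, but a `read_committed` reader observes three. That gap is exactly what "effect-once" describes.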

> 🚫 *"Exactly-once means a record can never be duplicated anywhere in my pipeline."*

The guarantee is scoped to Kafka-to-Kafka processing within the Streams application. It says nothing about duplicates produced *outside* that boundary, and assuming otherwise is the single most common way people get burned by it.

## What it does not cover: external side effects

The transaction is a *Kafka* transaction. It spans Kafka reads, Kafka writes, and Kafka offset commits. It cannot span anything that is not Kafka.

So if your processing logic does any of the following, exactly-once does **not** make it exactly-once:

- **Writing to a database.** An `INSERT` you issue inside a `process()` call is not part of the Kafka transaction. If the batch aborts and reprocesses, that insert runs again. You get a duplicate row unless the write is idempotent (an upsert keyed by something deterministic).
- **Calling a REST API.** An HTTP `POST` to a payment provider, an email send, a webhook: none of these roll back when the Kafka transaction aborts. The side effect already happened.
- **Anything non-deterministic in your logic.** Exactly-once assumes reprocessing the same input yields the same output. If your code reads the wall clock, calls a random generator, or depends on external mutable state, reprocessing after an abort can produce a *different* result, and now you have two different outputs for one input.

This is why "I turned on `exactly_once_v2` and I still see duplicates" is one of the most common Kafka Streams support threads. The flag is usually working exactly as designed; the duplicates are coming from a non-idempotent sink, a downstream consumer reading with `read_uncommitted` isolation (the plain consumer default), or non-deterministic processing, none of which the Kafka transaction can govern. The full debugging checklist is in [why you still see duplicates](https://www.conduktor.io/kafka-streams/exactly-once-duplicates).
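
The database case can be reproduced without a database. In this sketch, which is illustrative only, a `List` stands in for a table written with plain inserts and a `Map` stands in for a table written with an upsert keyed by order id; `SinkReplaySketch` and the order ids are hypothetical. The same batch is processed twice, as it would be after an aborted transaction.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class SinkReplaySketch {
    final List<String> appendOnlyTable = new ArrayList<>();  // behaves like a plain INSERT
    final Map<String, String> upsertTable = new HashMap<>(); // behaves like an upsert by key

    /** The side effect issued from processing logic, outside the Kafka transaction. */
    void process(String orderId, String payload) {
        appendOnlyTable.add(orderId + ":" + payload); // runs again on every replay
        upsertTable.put(orderId, payload);            // a replay overwrites the same row
    }

    void processBatch(String[][] batch) {
        for (String[] record : batch) {
            process(record[0], record[1]);
        }
    }

    public static void main(String[] args) {
        String[][] batch = { {"order-1", "paid"}, {"order-2", "shipped"} };
        SinkReplaySketch sink = new SinkReplaySketch();
        sink.processBatch(batch); // first attempt: side effects happen, then the transaction aborts
        sink.processBatch(batch); // reprocessing after restart
        System.out.println(sink.appendOnlyTable.size()); // prints 4
        System.out.println(sink.upsertTable.size());     // prints 2
    }
}
```

The Kafka transaction rolled back nothing in either table; only the keyed write ends up correct, and it does so regardless of the processing guarantee.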

## Exactly-once is not physical de-duplication

A related misconception: exactly-once will *not* collapse two genuinely different input records that happen to mean the same thing. If the same logical event is produced **twice by an upstream producer** (two separate records, two offsets), Kafka Streams sees two distinct inputs and faithfully processes both. The transaction guarantees each is handled once; it does not know they are "the same".

De-duplicating across producers, across restarts, or by a business key is a *different* problem, and it needs a different tool: a stateful dedup operator backed by a state store, an idempotent sink keyed by a business id, or both. See [deduplication](https://www.conduktor.io/kafka-streams/deduplication) for the pattern. Exactly-once de-duplicates the *cycle*; it does not de-duplicate your *data*.
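
As a rough illustration of the stateful dedup idea, the sketch below drops records whose business id has already been seen. It is a simplification under assumptions: `DedupSketch` and `Event` are hypothetical names, and a `HashSet` stands in for the state store a real Streams operator would use (which would normally also expire old ids after some retention window).

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class DedupSketch {

    static final class Event {
        final String eventId;  // business key chosen by the upstream producer
        final String payload;
        Event(String eventId, String payload) {
            this.eventId = eventId;
            this.payload = payload;
        }
    }

    // Stand-in for a state store of ids already forwarded downstream.
    private final Set<String> seenEventIds = new HashSet<>();

    /** Forwards an event only the first time its id is seen. */
    List<Event> dedup(List<Event> input) {
        List<Event> out = new ArrayList<>();
        for (Event e : input) {
            if (seenEventIds.add(e.eventId)) { // add() returns false for a repeated id
                out.add(e);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Event> input = new ArrayList<>();
        input.add(new Event("evt-1", "order created"));
        input.add(new Event("evt-1", "order created")); // produced twice upstream: two offsets
        input.add(new Event("evt-2", "order paid"));
        System.out.println(new DedupSketch().dedup(input).size()); // prints 2
    }
}
```

Exactly-once would have processed all three inputs, each once. Collapsing the two `evt-1` records is a decision about the data, so it has to live in the topology or the sink.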

## The cost, and why at-least-once is a fair choice

Exactly-once is not free, and the trade-off is real enough that plenty of seasoned teams deliberately do not use it.

| | At-least-once (default) | Exactly-once (`exactly_once_v2`) |
|---|---|---|
| Duplicates in output | Possible after a failure | None observable (within Kafka) |
| Consume isolation | `read_uncommitted` | `read_committed` |
| Commit cadence | Offsets committed periodically | Transaction per commit interval |
| Latency | Lower | Higher: readers wait for commit |
| Overhead | Minimal | Transaction coordination on every cycle |

Two costs dominate. First, `read_committed` adds **end-to-end latency**: a downstream consumer cannot see records from a transaction until that transaction commits, so output appears in bursts aligned to the commit interval rather than continuously. Second, every cycle carries **transaction coordination** overhead (begin, produce, commit offsets, commit transaction), which lowers throughput compared to non-transactional at-least-once. One default softens the first cost: enabling exactly-once silently drops the default `commit.interval.ms` from 30000 ms to 100 ms, which keeps the bursts sub-second out of the box. It also means an exactly-once app commits 300 times as often as an at-least-once one unless you tune the interval.
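
The commit-cadence arithmetic, and the knob for changing it, fit in a few lines. This is a sketch with hypothetical names (`CommitIntervalSketch`, `eosWithInterval`); the 500 ms value is only an example of a tuned interval, not a recommendation.

```java
import java.util.Properties;

final class CommitIntervalSketch {
    static final long AT_LEAST_ONCE_DEFAULT_MS = 30_000L; // default commit.interval.ms
    static final long EXACTLY_ONCE_DEFAULT_MS = 100L;     // default once exactly-once is enabled

    static long commitsPerMinute(long intervalMs) {
        return 60_000L / intervalMs;
    }

    /** Exactly-once with an explicit commit interval instead of the 100 ms default. */
    static Properties eosWithInterval(long intervalMs) {
        Properties p = new Properties();
        p.put("processing.guarantee", "exactly_once_v2");
        p.put("commit.interval.ms", Long.toString(intervalMs));
        return p;
    }

    public static void main(String[] args) {
        System.out.println(commitsPerMinute(AT_LEAST_ONCE_DEFAULT_MS)); // prints 2
        System.out.println(commitsPerMinute(EXACTLY_ONCE_DEFAULT_MS));  // prints 600
        System.out.println(AT_LEAST_ONCE_DEFAULT_MS / EXACTLY_ONCE_DEFAULT_MS); // prints 300
        System.out.println(eosWithInterval(500L).getProperty("commit.interval.ms")); // prints 500
    }
}
```

Raising the interval means fewer transactions per minute, but `read_committed` consumers downstream wait correspondingly longer before each burst of output becomes visible.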

The pragmatic alternative is **at-least-once plus an idempotent sink.** Run the default `at_least_once`, accept that a failure may reprocess a batch, and make the *final* write absorb the duplicate: an upsert keyed by a deterministic id, an `INSERT ... ON CONFLICT DO NOTHING`, or a downstream dedup keyed by a business id. This pushes the correctness guarantee to the edge of the system where it actually matters, avoids the per-cycle transaction tax, and is often simpler to reason about than convincing yourself every link in the chain honors the transaction.
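
One way to get a deterministic id, when the data carries no business key, is to derive it from the input record's coordinates, since a redelivered record has the same topic, partition, and offset. The sketch below assumes that approach; `DeterministicIdSketch` and `idFor` are hypothetical, the `Map` with `putIfAbsent` stands in for a table written with `ON CONFLICT DO NOTHING`, and in a real processor the coordinates would come from the record's metadata.

```java
import java.util.HashMap;
import java.util.Map;

final class DeterministicIdSketch {
    // Stand-in for the sink table, keyed by the derived id.
    private final Map<String, String> sink = new HashMap<>();

    /** Same input record, same id, no matter how many times it is reprocessed. */
    static String idFor(String topic, int partition, long offset) {
        return topic + "-" + partition + "-" + offset;
    }

    /** Returns true if the row was written, false if a duplicate was absorbed. */
    boolean write(String topic, int partition, long offset, String payload) {
        return sink.putIfAbsent(idFor(topic, partition, offset), payload) == null;
    }

    int rowCount() {
        return sink.size();
    }

    public static void main(String[] args) {
        DeterministicIdSketch s = new DeterministicIdSketch();
        System.out.println(s.write("orders", 0, 42L, "paid")); // prints true
        System.out.println(s.write("orders", 0, 42L, "paid")); // prints false (redelivery)
        System.out.println(s.rowCount());                      // prints 1
    }
}
```

An id built from a random UUID or the wall clock would differ on each attempt and defeat the purpose. If one input record fans out into several writes, the id would also need a per-output component.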

The broader framing (at-most-once, at-least-once, exactly-once and how offset commits decide which one you get) is in [delivery semantics for Kafka consumers](https://www.conduktor.io/kafka/delivery-semantics-for-kafka-consumers). Exactly-once is the strongest option, not the default one; choose it when the consume-process-produce cycle inside Kafka is genuinely where your duplicates come from, and reach for an idempotent sink when they come from anywhere else.

## Frequently asked questions

**Is Kafka Streams exactly-once real?**

Yes, but it is scoped: Kafka Streams makes the consume-process-produce cycle inside Kafka atomic, so the records that land in output topics and the state in your stores are as if each input were processed exactly once. It is effect-once, not "bytes never retried": failed attempts are reprocessed internally, but their aborted output is invisible to a `read_committed` reader.

**How do I enable exactly-once in Kafka Streams?**

Set the single config `processing.guarantee=exactly_once_v2`. Behind it, Kafka Streams wires together the idempotent producer, Kafka transactions that commit output records and consumer offsets atomically, and `read_committed` isolation on the consume side.

**What is the difference between exactly_once and exactly_once_v2?**

`exactly_once_v2` gives the same guarantee with dramatically lower overhead, because the original `exactly_once` (v1) needed one producer per task (in practice, roughly one per input partition). The v2 mode has been recommended since Kafka 2.6 (named `exactly_once_beta` until 3.0 renamed it); v1 was deprecated in 3.0 and removed in 4.0, so there is no reason to choose it.

**Does Kafka Streams exactly-once cover writes to an external database?**

No. The transaction is a Kafka transaction spanning Kafka reads, writes, and offset commits only: it cannot enroll a database insert, a REST call, or any non-Kafka side effect. If a batch aborts and reprocesses, those side effects run again unless the write is idempotent.

**Should I use exactly-once or at-least-once with an idempotent sink?**

Exactly-once adds end-to-end latency (`read_committed` readers wait for the commit) and per-cycle transaction coordination overhead. Many teams instead run at-least-once and make the final write absorb duplicates with an upsert keyed by a deterministic id, pushing correctness to the edge where it matters. Choose exactly-once when the Kafka-to-Kafka cycle is genuinely where duplicates come from.

> **See it in practice with Conduktor**
> Exactly-once runs on Kafka transactions and `read_committed` consumers, and both leave signals on the cluster. [Conduktor Console](https://docs.conduktor.io/guide/manage-kafka/kafka-resources/topics) lets you inspect the topics a Streams app reads and writes, watch consumer group lag, and confirm processing is keeping up, so when output arrives in bursts or a transaction stalls, you can see whether it's the commit cadence or a genuine backlog.

## Next steps

- [Why you still see duplicates](https://www.conduktor.io/kafka-streams/exactly-once-duplicates): the debugging checklist when exactly-once is on but duplicates persist
- [Deduplication patterns](https://www.conduktor.io/kafka-streams/deduplication): de-duplicating data, which exactly-once does not do
- [Idempotent Kafka producer](https://www.conduktor.io/kafka/idempotent-kafka-producer): the producer-level guarantee exactly-once builds on