Kafka 4.3: OAuth assertions, share groups, tiered bootstrap

Stéphane Derosiaux May 23, 2026 10 min read

Apache Kafka 4.3.0 was just released: 25 KIPs, mostly polish across the broker, KRaft, and Kafka Streams. Nothing extraordinary, but necessary. In this article, I want to cover three of those KIPs. You'll care about them if you run Kafka with OAuth, if you've looked at SQS because you needed queues, or if you run tiered storage.

  • KIP-1258: OAuth client_credentials with client assertion. No more long-lived client_secret strings in your producer configs.
  • KIP-1240: Share group configuration. Queue semantics on Kafka keep getting better, which helps anyone running AI agents on Kafka.
  • KIP-1023: Follower fetch from the last tiered offset. Some hidden waste in tiered storage finally addressed.

KIP-1258: secret-less OAuth with client assertions

Kafka has supported SASL/OAUTHBEARER since KIP-255 (Kafka 2.0), and the client_credentials flow since KIP-768 (Kafka 3.1). The flow is rather simple: the client sends its client_id and client_secret to the OAuth token endpoint (either in an HTTP Basic header, base64-encoded, or in the POST body), gets back an access token (often a JWT, but to be treated as opaque), and uses it to authenticate to the broker.

Simple, and it works. But it's also a security liability.

client_secret is a long-lived shared secret. Talk to your security team: that's quite bad. Worse, it's probably stored in your producer's properties file or config. If (when) it leaks, it gives an attacker your identity until you rotate it (assuming you even notice the leak).

Rotating that secret means updating every client that uses it. Maybe one application, maybe several. And who knows where else these credentials ended up? That's before we even mention the Kafka tooling where we happily paste application service-account secrets. Secret managers like Vault or a cloud KMS help, but the issue remains "a string that proves I am me". We all know it, and we do it anyway: "yeah, we'll fix it later".

KIP-1258 finally fixes that by replacing the shared secret with a signed JWT assertion. Don't confuse it with KIP-1139, added in Kafka 4.1: that's the jwt-bearer grant, where you exchange a JWT your identity provider already issued for an access token, instead of signing one yourself.

The token request, before and after

The change happens in the token request:

OAuth client_credentials with a shared secret versus a locally signed JWT assertion: the secret travels and can leak, while the private key never leaves the client and only a short-lived signed JWT goes over the wire

Before, with client_secret:

Kafka client → POST /token
  Authorization: Basic <base64(client_id:client_secret)>
  grant_type=client_credentials
  scope=kafka.read kafka.write

After, with client assertion:

Kafka client → POST /token
  grant_type=client_credentials
  client_assertion_type=urn:ietf:params:oauth:client-assertion-type:jwt-bearer
  client_assertion=<signed JWT>
  scope=kafka.read kafka.write
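To make the difference concrete, here is a minimal Python sketch (standard library only) that builds both token requests. The client id, secret, and scope values are placeholders, and the signed JWT is passed in as an opaque string; producing it is a separate step.

```python
import base64
from urllib.parse import urlencode

# Client assertion type defined by RFC 7523 for JWT-based client authentication.
JWT_BEARER = "urn:ietf:params:oauth:client-assertion-type:jwt-bearer"
FORM = {"Content-Type": "application/x-www-form-urlencoded"}


def secret_request(client_id, client_secret, scope):
    """Before: the long-lived secret travels in an HTTP Basic header."""
    creds = base64.b64encode(f"{client_id}:{client_secret}".encode()).decode()
    headers = {**FORM, "Authorization": f"Basic {creds}"}
    body = urlencode({"grant_type": "client_credentials", "scope": scope})
    return headers, body


def assertion_request(signed_jwt, scope):
    """After: no Authorization header; a short-lived signed JWT goes in the body."""
    body = urlencode({
        "grant_type": "client_credentials",
        "client_assertion_type": JWT_BEARER,
        "client_assertion": signed_jwt,
        "scope": scope,
    })
    return dict(FORM), body


if __name__ == "__main__":
    print(secret_request("kafka-producer-service", "s3cr3t", "kafka.read kafka.write"))
    print(assertion_request("<header>.<payload>.<signature>", "kafka.read kafka.write"))
```

Nothing reusable sits in the second request: replaying it only works until the assertion's exp.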

The JWT is signed with an asymmetric algorithm, RS256 or ES256. The OAuth server validates the signature against the public key registered for the client, checks the claims, and returns the access token. The rest of the SASL/OAUTHBEARER flow is unchanged.
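A rough sketch of what goes into that signed JWT, following RFC 7523's rules for client authentication (iss and sub set to the client id, aud set to the token endpoint, plus exp and a unique jti). It is not Kafka's implementation, and which claims the Kafka client fills in by default is not covered here. Python's standard library has no RSA or ECDSA signing, so the `sign` callable is an assumption: in practice you might plug in the `cryptography` package's `private_key.sign(data, padding.PKCS1v15(), hashes.SHA256())` for RS256.

```python
import base64
import json
import time
import uuid
from typing import Callable, Optional


def b64url(data):
    """Base64url without padding, as compact JWS (RFC 7515) requires."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def segment(obj):
    """JSON-encode compactly, then base64url-encode: one JWS segment."""
    return b64url(json.dumps(obj, separators=(",", ":")).encode("utf-8"))


def build_assertion(client_id, token_endpoint, sign: Callable[[bytes], bytes],
                    alg="RS256", lifetime_s=300, now: Optional[int] = None):
    """Return header.payload.signature for a client-authentication JWT (RFC 7523)."""
    now = int(time.time()) if now is None else now
    header = {"alg": alg, "typ": "JWT"}
    claims = {
        "iss": client_id,          # the client signs on its own behalf...
        "sub": client_id,          # ...and is also the subject
        "aud": token_endpoint,     # the IdP token endpoint this JWT is meant for
        "iat": now,
        "exp": now + lifetime_s,   # short-lived: five minutes by default
        "jti": str(uuid.uuid4()),  # unique id so the IdP can reject replays
    }
    signing_input = f"{segment(header)}.{segment(claims)}"
    # `sign` is assumed to hold the private key; only the signature leaves it.
    signature = sign(signing_input.encode("ascii"))
    return f"{signing_input}.{b64url(signature)}"
```

The private key never appears in the output: the IdP only ever sees the signature, which it checks with the matching public key.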

The client configuration

KIP-1258 does not introduce any new configuration: it reuses the existing sasl.oauthbearer.assertion.* properties, and the authentication method is detected automatically based on which properties you set:

security.protocol=SASL_SSL
sasl.mechanism=OAUTHBEARER
sasl.login.callback.handler.class=org.apache.kafka.common.security.oauthbearer.OAuthBearerLoginCallbackHandler
sasl.oauthbearer.token.endpoint.url=https://idp.example.com/token

# Assertion settings (replaces clientId + clientSecret)
sasl.oauthbearer.assertion.algorithm=RS256
sasl.oauthbearer.assertion.private.key.file=/etc/kafka/secrets/client-key.pem
sasl.oauthbearer.assertion.claim.iss=kafka-producer-service
sasl.oauthbearer.assertion.claim.jti.include=true

sasl.jaas.config=org.apache.kafka.common.security.oauthbearer.OAuthBearerLoginModule required;

Verify the claims your IdP needs before you deploy. The property names above check out against the 4.3 client (I ran them), but your IdP may require claims (sub, aud, azp) that the defaults don't set.

If your identity provider issues pre-built assertions (Workload Identity Federation patterns), point the client at the file instead:

sasl.oauthbearer.assertion.file=/var/run/secrets/idp/assertion.jwt
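Either way, it helps to look at what an assertion actually carries before rollout. A minimal sketch that decodes a compact JWT's payload (no signature verification) and lists missing claims; the default required set (iss, sub, aud, exp) comes from RFC 7523 section 3, so swap in whatever your IdP documents. The helper names are hypothetical.

```python
import base64
import json
from pathlib import Path

# RFC 7523 section 3: a JWT used for client authentication MUST carry these.
RFC7523_REQUIRED = ("iss", "sub", "aud", "exp")


def decode_segment(segment):
    """Decode one base64url JWT segment into a dict. Does NOT verify anything."""
    padded = segment + "=" * (-len(segment) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))


def missing_claims(assertion, required=RFC7523_REQUIRED):
    """List required claims absent from a compact JWT's payload."""
    parts = assertion.strip().split(".")
    if len(parts) != 3:
        raise ValueError("expected header.payload.signature")
    payload = decode_segment(parts[1])
    return [claim for claim in required if claim not in payload]


def check_assertion_file(path, required=RFC7523_REQUIRED):
    """Same check for a pre-built assertion on disk (the assertion.file target)."""
    return missing_claims(Path(path).read_text(), required)
```

For example, `check_assertion_file("/var/run/secrets/idp/assertion.jwt", ("sub", "aud", "azp"))` would report which of those three claims your IdP would find missing.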

Private keys and assertion files are cached in memory and reloaded automatically when the file changes on disk, so key rotation no longer requires a client restart. Access tokens already issued stay valid until their own exp, so the staleness window during rotation is bounded by your token TTL (which you control).
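The reload-on-change behavior can be pictured as a small cache keyed on the file's modification time and size. This is a hypothetical sketch of the idea, not the Kafka client's code; the `ReloadingFile` class and its `reloads` counter are illustrative.

```python
import os


class ReloadingFile:
    """Cache a file's bytes and re-read them only when the file changes on disk.

    Hypothetical sketch of reload-on-change (e.g. for a private key or an
    assertion file); not the Kafka client's implementation.
    """

    def __init__(self, path):
        self.path = path
        self._stamp = None   # (mtime_ns, size) seen at the last read
        self._data = b""
        self.reloads = 0

    def read(self):
        st = os.stat(self.path)  # follows symlinks, so a swapped target is noticed
        stamp = (st.st_mtime_ns, st.st_size)
        if stamp != self._stamp:  # first call, or the file was rewritten/replaced
            with open(self.path, "rb") as f:
                self._data = f.read()
            self._stamp = stamp
            self.reloads += 1
        return self._data
```

A cheap `os.stat` per read keeps rotation restart-free; the only cost of a stale read is one more token minted with the old key, which then expires on its own schedule.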

What client assertions unlock

Three things:

  • Short-lived assertions become possible: an exp five minutes out is plenty, since the assertion is only used to fetch an access token, which has its own TTL on top.
  • Sidecar patterns are easy now. Move the private key to a sidecar (SPIFFE, Vault, Kubernetes workload identity): the sidecar mints assertions, the Kafka client never sees the key. This is the pattern Istio and other service meshes use, and now it works for Kafka clients too.
  • "We do not store OAuth client secrets in producer configs" becomes literally true. Your SOC 2 and PCI auditors will appreciate it.

For multi-tenant Kafka behind a proxy like Conduktor Gateway, this matters even more. The proxy can normalize token verification across tenants, log per-identity activity, and present a single OIDC-aware front door even when downstream brokers are configured differently. Conduktor Gateway already validates OIDC client_credentials tokens via SASL/OAUTHBEARER. KIP-1258 is transparent here: the assertion is exchanged with your IdP, not the Gateway, so an assertion-based client already authenticates through it unchanged. Minting per-tenant assertion keys for the Gateway's own upstream auth is the next design step. We're scoping it now, so I'd love to hear your thoughts.

KIP-1240: share groups are more configurable

Share groups shipped as early access in Kafka 4.0, and the team keeps improving them through the 4.x line. 4.3 adds another round: KIP-1240 introduces broker and group configurations to tune share group behavior, including acquisition lock duration, max in-flight records, and delivery attempt limits.

How a share group works

A traditional Kafka consumer group assigns each partition to exactly one consumer. Ordering is preserved per partition, and throughput is bounded by partition count, so you over-provision partitions up front because you can't predict tomorrow's load.

A share group (Kafka's take on a queue) inverts the model: every consumer in the group can read from all partitions, and you acknowledge individual records, not partition ranges. Each record gets an acquisition lock for a configurable duration (default 30 seconds), and the consumer must acknowledge, release, or reject it before the lock expires.
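A toy model of the per-record lifecycle helps: acquiring a record increments its delivery count and starts a lock; acknowledging finishes it; releasing it or letting the lock expire makes it available again; rejecting archives it. This is a simplified, hypothetical sketch loosely following KIP-932's record states (available, acquired, acknowledged, archived), not broker code. `lock_duration_s` stands in for the acquisition lock (30 s, the default mentioned above) and `max_deliveries` for the delivery attempt limit; its default of 5 is an assumption for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class State(Enum):
    AVAILABLE = "available"
    ACQUIRED = "acquired"
    ACKNOWLEDGED = "acknowledged"
    ARCHIVED = "archived"  # rejected, or out of delivery attempts


@dataclass
class Record:
    offset: int
    state: State = State.AVAILABLE
    delivery_count: int = 0
    owner: Optional[str] = None
    lock_deadline: float = 0.0


class SharePartitionModel:
    """Toy per-record state for one partition of a share group (not broker code)."""

    def __init__(self, offsets, lock_duration_s=30.0, max_deliveries=5):
        self.records = {o: Record(o) for o in offsets}
        self.lock_duration_s = lock_duration_s
        self.max_deliveries = max_deliveries

    def _return_to_pool(self, r):
        # Released or timed-out records go back to the pool unless out of attempts.
        r.owner = None
        r.state = State.ARCHIVED if r.delivery_count >= self.max_deliveries else State.AVAILABLE

    def _expire_locks(self, now):
        for r in self.records.values():
            if r.state is State.ACQUIRED and now >= r.lock_deadline:
                self._return_to_pool(r)

    def acquire(self, consumer, now, max_records=10):
        """Lock up to max_records available records, from any offset, for this consumer."""
        self._expire_locks(now)
        batch = []
        for offset in sorted(self.records):
            if len(batch) == max_records:
                break
            r = self.records[offset]
            if r.state is State.AVAILABLE:
                r.state, r.owner = State.ACQUIRED, consumer
                r.delivery_count += 1
                r.lock_deadline = now + self.lock_duration_s
                batch.append(offset)
        return batch

    def _locked_by(self, consumer, offset, now):
        self._expire_locks(now)
        r = self.records[offset]
        if r.state is not State.ACQUIRED or r.owner != consumer:
            raise ValueError(f"offset {offset} is not locked by {consumer}")
        return r

    def acknowledge(self, consumer, offset, now):
        r = self._locked_by(consumer, offset, now)
        r.state, r.owner = State.ACKNOWLEDGED, None

    def release(self, consumer, offset, now):
        self._return_to_pool(self._locked_by(consumer, offset, now))

    def reject(self, consumer, offset, now):
        r = self._locked_by(consumer, offset, now)
        r.state, r.owner = State.ARCHIVED, None
```

Note what the model never tracks: a committed offset. Progress is per record, which is exactly why one slow or failing record doesn't hold back the rest of the partition.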

Consumer group versus share group: a consumer group binds each partition to one consumer so extra consumers sit idle, while a share group lets the broker hand individual records to any consumer, decoupling parallelism from partition count

Share groups lose ordering, by design. In exchange you get per-record acknowledgement, per-record retry, and consumer scaling decoupled from partition count (bounded by group.share.max.size, default 200). That's what a queue means in SQS, RabbitMQ, or Pulsar terms. It's not complete yet (no built-in dead-letter queue or retry topics), but Kafka is getting there.

Under the hood, there are new ShareFetch and ShareAcknowledge RPCs on the data path (check their wire format in Kafka Options Explorer), plus a new share coordinator (analogous to the group coordinator) that persists per-record state in a new internal topic.

Why should we care? AI Agents!

Kafka has long been treated as the wrong tool for job-queue patterns. People reach for Flink, but that's stream processing, a different job. The real need is moving requests and data between services and surviving the usual distributed-systems failure modes, and that's what Kafka was built for.

Multi-agent AI is, underneath, a distributed systems problem. Agents are microservices with a brain: they inherit the same coupling, backpressure, retries, and debugging under non-determinism that microservices teams hit around 2010. The answer is the one we landed on back then: put an event broker in the middle.

Agent workflows involve tool dispatch, multi-step reasoning, parallel inference across providers, and long-running task pools. Underneath, that looks exactly like a job queue:

  • Each task is independent. Ordering doesn't matter, so you want aggressive fan-out across workers.
  • Tasks have variable duration. A vision tool call takes 200ms, a deep research subtask takes minutes.
  • Failures are per-task. One agent task throws for some LLM reason, and you just want to retry that task, not reset the partition offset.
  • You want elasticity. Scale workers from 5 to 500 in seconds based on demand. Consumer groups make you re-partition; share groups don't.
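The elasticity point is easy to put in numbers. A back-of-the-envelope sketch for a hypothetical 12-partition topic, ignoring per-consumer throughput: in a consumer group only as many members as there are partitions do any work, while a share group admits members up to group.share.max.size (default 200). The function names are illustrative, and the sketch assumes members beyond the cap are simply turned away.

```python
def working_members_consumer_group(consumers, partitions):
    """Classic consumer group: one member per partition; the rest sit idle."""
    return min(consumers, partitions)


def working_members_share_group(consumers, max_size=200):
    """Share group: any member can get records from any partition, up to the
    group size cap (group.share.max.size); extra members are assumed refused."""
    return min(consumers, max_size)


if __name__ == "__main__":
    partitions = 12  # hypothetical topic
    for workers in (5, 50, 500):
        print(workers,
              working_members_consumer_group(workers, partitions),
              working_members_share_group(workers))
```

At 500 workers the consumer group still does the work of 12, while the share group hits the default cap of 200: reaching 500 means raising group.share.max.size, not re-partitioning the topic.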

If you already run Kafka, you can host your agent task queue on the same system, with the same durability, replay, and audit trail, instead of bolting a separate queue system next to it. Your CIO and CISO will appreciate the consolidation.

Agent orchestrator-worker loop on Kafka: the orchestrator publishes tasks to a topic, a share group of workers pulls and processes them, results land on a responses topic, and the orchestrator consumes them and keeps reasoning, with no temporal coupling

Tradeoff: share groups break the partition-as-unit-of-parallelism mental model that's driven Kafka cluster design for a decade. Capacity planning, rebalancing behavior, observability: all need new thinking.

The Conduktor Gateway doesn't proxy share groups yet: the demand isn't there. The market is getting closer, though, so reach out if it's on your roadmap.

KIP-1023: stop re-replicating data already in object storage

The least glamorous KIP in the release, but the one ops teams will actually like. It fixes a replication pain for anyone running Kafka tiered storage.

What a new follower re-replicates

Tiered storage splits each partition's log into two zones:

  • Hot data: a local tier on broker disk
  • Cold data: a remote tier in object storage (S3, GCS, Azure Blob)

Each partition is replicated (say RF=3): one leader plus two follower replicas.

The issue: routine operations (replacing a broker, reassigning partitions, adding brokers) spin up brand-new followers. Each one fetches data from the partition leader until it catches up and joins the ISR (in-sync replicas).

Until KIP-1023, a new follower re-replicated every byte the leader still held locally, even though most of that data was already sitting in S3. Replication took longer than it needed to, delaying the new replica's entry into the ISR.

What KIP-1023 changes

KIP-1023 introduces a new bootstrap point: the earliest pending upload offset, the offset of the next record not yet uploaded to the remote tier. When enabled, an empty follower starts fetching from there. Everything before that, the follower reads from object storage if and when it needs to serve those offsets.
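In offset terms, the change is just a different starting point for the follower's fetch. A minimal sketch with hypothetical offsets (illustrative numbers, not measurements): without the feature, an empty follower starts at the leader's local log start offset; with it, at the earliest pending upload offset. `LeaderOffsets` and `records_to_replicate` are made-up names.

```python
from dataclasses import dataclass


@dataclass
class LeaderOffsets:
    local_log_start: int          # first offset still on the leader's local disk
    earliest_pending_upload: int  # first offset not yet copied to the remote tier
    log_end: int                  # offset the next produced record will get

    def __post_init__(self):
        # Local retention only deletes segments that were already uploaded.
        if not self.local_log_start <= self.earliest_pending_upload <= self.log_end:
            raise ValueError("expected local_log_start <= earliest_pending_upload <= log_end")


def records_to_replicate(leader, tiered_bootstrap):
    """Records an empty follower pulls from the leader before it can catch up."""
    start = leader.earliest_pending_upload if tiered_bootstrap else leader.local_log_start
    return leader.log_end - start


if __name__ == "__main__":
    p = LeaderOffsets(local_log_start=1_000_000, earliest_pending_upload=1_900_000,
                      log_end=2_000_000)
    before, after = records_to_replicate(p, False), records_to_replicate(p, True)
    print(before, after, f"{after / before:.0%}")
```

With these made-up offsets the follower pulls 100,000 records instead of 1,000,000; the real ratio depends on how much local retention you keep beyond what has already been uploaded.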

Adding a follower to a tiered topic, before and after KIP-1023: the old bootstrap point makes the follower re-replicate the whole local log from the leader, while the new one starts at the earliest pending upload offset so only 10 to 15 percent crosses the network and the rest is read from object storage on demand

A new follower now has almost nothing to replicate and can join the ISR much faster, which improves stability. I guess we're approaching the "stateless Kafka broker" phase, where local data is mostly a cache for fast fan-out.

follower.fetch.last.tiered.offset.enable=true

A thin follower makes a poor leader

A follower bootstrapped this way has very little local data. If it gets elected leader, it has to fetch historic segments from the remote tier to serve consumer fetch requests, which is slower than serving from local disk and comes with object storage egress costs attached. Basically, a leader without much local data is a leader that pays per request.

KIP-1303 is the follow-up: it deprioritizes tiered-bootstrapped followers in leader election. The same logic should apply to fetch-from-closest-replica (KIP-392, Kafka 2.4): these followers are bad candidates for serving consumer fetch requests until they've built up local segments.
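To illustrate the closest-replica point, here is a hypothetical selection rule: only consider in-sync replicas that hold the requested offset on local disk, prefer the client's rack, and break ties by the amount of local history. It is neither Kafka's `ReplicaSelector` plug-in interface nor what KIP-1303 implements; the `Replica` fields and `pick_read_replica` are assumptions for the sketch.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Replica:
    broker_id: int
    rack: str
    in_sync: bool
    local_log_start: int  # first offset this replica holds on local disk


def pick_read_replica(replicas: Sequence[Replica], client_rack: str,
                      fetch_offset: int) -> Optional[Replica]:
    """Hypothetical rule: serve from an in-sync replica that has the offset locally,
    preferring the client's rack, then the replica with the most local history."""
    local = [r for r in replicas if r.in_sync and r.local_log_start <= fetch_offset]
    if not local:
        return None  # caller falls back to the leader
    same_rack = [r for r in local if r.rack == client_rack]
    return min(same_rack or local, key=lambda r: r.local_log_start)
```

A thin, tiered-bootstrapped follower in the client's own rack is only picked once the consumer is reading offsets it already holds locally; for older offsets the rule routes around it.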

Rollback: Toggling the config off does not retroactively re-replicate data. Followers that bootstrapped from the tiered offset stay bootstrapped from there.

What's next?

The release notes cover more than these three: group coordinator append-buffer limits (KIP-1196), cordoned log directories (KIP-1066), KRaft fetch limits (KIP-1219), member epoch validation improvements (KIP-1251), and a stack of Kafka Streams DSL additions. Skim them if you operate Kafka or write streams applications. There's no headline feature this release, and that's kind of the point.


Planning an upgrade and want to talk through OAuth rollout, or be a design partner for share group multi-tenancy at the proxy layer? Book a call.