What a Kafka Proxy Can Do: From Routing to Enforcement

Stéphane Derosiaux · June 30, 2026 · 7 min read

Where does control over your Kafka live?

Think of the rules you actually set: PII must be masked before a consumer reads it; no team can create a topic with replication factor 1 in production; you need a record of who consumed which topic last quarter; and teams sharing one physical cluster must stay strictly isolated. Where are those rules set and enforced?

The typical answer: in various places, or "wait, which rules?" Some are broker configs, some live in a client library (config in .properties, or pulled from Vault), and some are just in a wiki nobody reads. That's exactly the gap a Kafka proxy fills. It's not about routing traffic; that's what a reverse proxy like NGINX is for. A Kafka proxy is mainly a place to set up and enforce all these data controls.

Reverse proxy vs Kafka proxy

Is a Kafka proxy a reverse proxy? It sits between clients and servers like NGINX or a service mesh, but the two do very different jobs:

Left: NGINX, one entry point fanning out to many services (Auth, API, Web), there to route. Right: many apps funnelling through a Kafka proxy into the Kafka cluster behind it, the proxy there to add the security and controls Kafka lacks.

When people hear "proxy" they picture either:

  • NGINX, HAProxy, an L4 load balancer: something that moves bytes from a socket on one side to a socket on the other and doesn't care what's in them.
  • a Kafka REST Proxy, which translates HTTP into Kafka for clients that don't speak the protocol.

A reverse proxy is great at hiding what's behind it: route, mutate headers, terminate TLS, forward the payload, add some quotas, done. That's especially true of an HTTP reverse proxy, where the request model is simple (GET, POST, PUT, etc.). A regular TCP proxy knows nothing about the underlying protocol: it can't tell whether the bytes are a Produce request, a Fetch, a consumer group join, or anything else.

A Kafka proxy is none of that.

A Kafka proxy sits between clients and brokers too, but it does parse the traffic. Kafka clients speak a binary, stateful, request/response protocol over TCP, where every request is typed (Produce, Fetch, Metadata, FindCoordinator, CreateTopics, JoinGroup, and many more). A Kafka proxy decodes each one, reads what it is, and can decide what to do with it before it ever reaches a broker.

A reverse proxy is one front door fanning out to many services. A Kafka proxy is the reverse: many apps funnelling through it into the cluster behind it, often across network or provider boundaries (on-prem to cloud, or hops between VPCs), there to add the security and controls the clusters lack on their own.

The Kafka wire protocol, quickly. Kafka clients don't talk HTTP/REST. They open long-lived TCP connections to brokers and exchange typed, versioned binary frames. Reading those frames is L7 work (interpreting the Kafka protocol), not L4 (just moving bytes).
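To make "typed, versioned binary frames" concrete, here is a minimal Python sketch of reading the start of a Kafka request frame: a 4-byte size prefix, then the API key, API version, correlation ID, and client ID. It assumes request header version 1 or later (where `client_id` is present) and lists only a handful of API keys. The `build_frame` helper and the `billing-app` client ID are hypothetical, there only to produce sample bytes; this is an illustration, not how any particular proxy is implemented.

```python
import struct
from dataclasses import dataclass
from typing import Optional

# A few API keys from the Kafka protocol; the real list is much longer.
API_NAMES = {
    0: "Produce",
    1: "Fetch",
    3: "Metadata",
    10: "FindCoordinator",
    11: "JoinGroup",
    19: "CreateTopics",
}

# Big-endian: size (int32), api_key (int16), api_version (int16),
# correlation_id (int32), client_id length (int16).
FIXED = struct.Struct(">ihhih")


@dataclass
class RequestHeader:
    size: int
    api_key: int
    api_version: int
    correlation_id: int
    client_id: Optional[str]

    @property
    def api_name(self) -> str:
        return API_NAMES.get(self.api_key, f"Unknown({self.api_key})")


def parse_request_header(frame: bytes) -> RequestHeader:
    """Decode the size prefix and the leading header fields of one request frame."""
    size, api_key, api_version, correlation_id, client_len = FIXED.unpack_from(frame, 0)
    client_id = None  # a length of -1 means a null client_id
    if client_len >= 0:
        start = FIXED.size
        client_id = frame[start:start + client_len].decode("utf-8")
    return RequestHeader(size, api_key, api_version, correlation_id, client_id)


def build_frame(api_key: int, api_version: int, correlation_id: int, client_id: str) -> bytes:
    """Hypothetical helper: build a header-only frame so the parser has input."""
    cid = client_id.encode("utf-8")
    payload = struct.pack(">hhih", api_key, api_version, correlation_id, len(cid)) + cid
    return struct.pack(">i", len(payload)) + payload


header = parse_request_header(build_frame(3, 9, 42, "billing-app"))
print(header.api_name, header.api_version, header.client_id)  # Metadata 9 billing-app
```

An L4 load balancer never looks at these fields. A proxy that does can branch on `api_key` before the request reaches a broker.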

Why a "dumb" Kafka proxy gets bypassed

You can't just drop a TCP load balancer in front of Kafka and call it a proxy. Try it and clients will either fail or route around it within one round trip.

Why? Because of how Kafka clients discover brokers. A client connects to a bootstrap address, then immediately asks for Metadata. The cluster answers with the address of every broker (its advertised.listeners), and from that point on the client talks directly to those addresses for everything. If your proxy forwarded that Metadata response untouched, the client now has the brokers' real addresses and connects straight to them. Your proxy is out of the path. Bypassed.

So a Kafka proxy has to do something a plain TCP load balancer never would: it rewrites the broker addresses in the Metadata (and FindCoordinator, and DescribeCluster) responses to point back at itself.

The metadata response path through a Kafka proxy: the brokers return their real addresses (10.0.0.5:9092, 10.0.0.6:9092) to the proxy, and the proxy rewrites them to its own addresses (proxy:9093, proxy:9094) before handing them to the client, so every later connection comes back through the proxy.
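The rewrite can be sketched in a few lines of Python, assuming the Metadata response has already been decoded into a list of brokers. The `Broker` type, the `rewrite_brokers` function, and the node IDs 1 and 2 are hypothetical; the addresses are the ones from the figure. Giving each broker its own proxy port is one way for the proxy to know which broker a later connection is meant for.

```python
from dataclasses import dataclass, replace
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Broker:
    node_id: int
    host: str
    port: int


def rewrite_brokers(
    brokers: List[Broker],
    proxy_endpoints: Dict[int, Tuple[str, int]],
) -> List[Broker]:
    """Swap each broker's real address for the proxy endpoint that fronts it."""
    rewritten = []
    for broker in brokers:
        if broker.node_id not in proxy_endpoints:
            # Fail closed: never hand a real broker address to a client.
            raise ValueError(f"no proxy endpoint for broker {broker.node_id}")
        host, port = proxy_endpoints[broker.node_id]
        rewritten.append(replace(broker, host=host, port=port))
    return rewritten


# What the cluster returned (node IDs are assumed for illustration).
real_brokers = [Broker(1, "10.0.0.5", 9092), Broker(2, "10.0.0.6", 9092)]
# One proxy port per broker.
proxy_endpoints = {1: ("proxy", 9093), 2: ("proxy", 9094)}

for broker in rewrite_brokers(real_brokers, proxy_endpoints):
    print(broker.node_id, f"{broker.host}:{broker.port}")
```

The same substitution would apply to the coordinator address in FindCoordinator responses and the broker list in DescribeCluster responses.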

Where can control even live?

There are four places control could live. A Kafka proxy is one of them, and the most flexible and complete:

Four candidate locations for Kafka control side by side: the broker (in Kafka, topic-level ACLs and quotas, opaque bytes, partial), the client library (inside each app, only the clients you control, partial), a sidecar (one per app to deploy and keep in sync, opt-in, partial), and the proxy (in front of Kafka, platform-owned, hides the cluster, parses every request, the only complete chokepoint, highlighted).

  • The broker. ACLs and broker quotas live here. It's quite coarse: an ACL allows or denies an operation on a resource for a principal, and that's it. No concept of a field, a record, a user, or context. The broker also never reads the payload, so "mask this PII field for that reader" or "audit who read which record" aren't things you can express here.
  • The client library. You can put an interceptor or a serializer in the client: encryption, some validation. It works, until you remember you don't control every client: every language, framework, and tool that speaks Kafka, down to the raw kafka-console-consumer your data scientist runs.
  • A sidecar. It has to attach to each service that connects to Kafka. You deploy it once per app, touching every app's deployment to add it, then run and patch N copies that drift out of sync. And it only protects the apps that actually run it: you can't force every team, contractor, legacy app, or batch job to add one.
  • The proxy. It's installed once in front of the cluster and owned by the platform team, who control the bootstrap servers. One thing to run, one place the rules live. Every client has to pass through it, by construction and network setup. And because clients connect to it as if it were Kafka, it hides the real infrastructure behind it: brokers, addresses, even which cluster you're pointed at can change without touching a single application.

The broker stores your data. The proxy is where you can govern access to it. Different jobs, different layers.

What that position lets you do

Now that the proxy decodes every request and response, you can add control of all kinds:

  • Data security. Encrypt a record's value before it reaches the brokers, and decrypt or mask it per reader, with keys in your own KMS. The brokers only ever see ciphertext. Deploy a new app tomorrow and it's covered automatically, with no code changes.
  • Governance and policy. Reject a CreateTopics request that asks for replication factor 1, a wrong naming prefix, too many partitions, or missing required metadata. Anything that connects and mutates state is now governed at the wire, unlike GitOps, which is restrictive, often limited to topic creation, and does nothing the moment someone bypasses it.
  • Data contracts. Validate that a record conforms to its schema and to business rules a schema can't express, and reject or dead-letter it if not. A schema registry is advisory and client-cooperative. The proxy can make it mandatory.
  • Auditability. Every request already passes through, so the request log is the audit trail for all Kafka activity.
  • Multi-cluster routing. Pointing clients at a different physical cluster (a migration, a failover) is a proxy-side change, not a fleet-wide client redeploy.
  • Tenant isolation. A "virtual cluster" is the proxy transparently prefixing every Kafka resource name on the way in and stripping it on the way back. Each team sees a clean namespace, its own world. No new infrastructure, and as many virtual clusters or sandboxes as you want on one physical cluster.
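Two of the controls above, the CreateTopics policy and the virtual-cluster prefixing, can be sketched in Python, assuming the proxy has already decoded the request. Every name here (`TopicRequest`, `TopicPolicy`, `violations`, `to_physical`, `to_logical`) and every threshold is hypothetical, chosen only to illustrate the idea; a real policy would also have to handle requests that ask for the broker's default partition count or replication factor.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class TopicRequest:
    name: str
    num_partitions: int
    replication_factor: int


@dataclass(frozen=True)
class TopicPolicy:
    # Hypothetical production rules.
    min_replication_factor: int = 3
    max_partitions: int = 50
    required_prefix: str = "prod."


def violations(topic: TopicRequest, policy: TopicPolicy) -> List[str]:
    """Return every rule the request breaks; an empty list means 'forward it'."""
    problems = []
    if topic.replication_factor < policy.min_replication_factor:
        problems.append(f"replication factor {topic.replication_factor} is below {policy.min_replication_factor}")
    if topic.num_partitions > policy.max_partitions:
        problems.append(f"{topic.num_partitions} partitions exceeds {policy.max_partitions}")
    if not topic.name.startswith(policy.required_prefix):
        problems.append(f"name must start with '{policy.required_prefix}'")
    return problems


def to_physical(tenant_prefix: str, logical_name: str) -> str:
    """On the way in: map the tenant's topic name to the name on the real cluster."""
    return tenant_prefix + logical_name


def to_logical(tenant_prefix: str, physical_name: str) -> Optional[str]:
    """On the way back: strip the prefix, or hide topics owned by other tenants."""
    if physical_name.startswith(tenant_prefix):
        return physical_name[len(tenant_prefix):]
    return None


rejected = violations(TopicRequest("orders", 200, 1), TopicPolicy())
accepted = violations(TopicRequest("prod.orders", 12, 3), TopicPolicy())
print(len(rejected), len(accepted))  # 3 0

print(to_physical("team-a.", "orders"))        # team-a.orders
print(to_logical("team-a.", "team-a.orders"))  # orders
print(to_logical("team-a.", "team-b.orders"))  # None
```

Both checks are plain functions over decoded fields, which is why they can be applied to every client without changing any of them.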

The layer above the broker

A Kafka proxy doesn't replace your brokers. The brokers keep doing what they're good at: storing and replicating the log. The proxy holds the cross-cutting concerns that have nowhere else to live: the data security, the governance, the audit, the isolation.

It's where those controls gravitate once an organization runs Kafka at any real scale.

So: not a Kafka NGINX. A control point.

If you want the detailed walkthrough, see "What is a Kafka proxy?" Conduktor Gateway is the implementation of everything above. If you'd rather see it against your own setup, talk to us.