Most Kafka Guardrails Don't Protect Your Data

Stéphane Derosiaux June 27, 2026 12 min read

[Image: wireframe line-art on a dark teal background. A horizontal stream of data-packet cubes flows left to right through one central gate lit by a single lime glow that catches the cubes passing through it, while a dim rectangular rule-panel floats off to the side, bypassed and unlit.]

"Platforms should be flexible enough to provide teams with the means to get to a solution as quickly as possible, while at the same time being rigid enough to prevent common mistakes." — A platform engineer

It's the whole platform engineering job in one sentence. And on Kafka, the rules that matter most are the ones protecting the data: who can read sensitive data, what shape a record takes, which service sees what.

Self-service already moves teams fast

Letting teams move fast? Mostly solved. A developer needs a topic or a connector. They declare it in a repo, and a pipeline applies it. Topics, schemas, ACLs, and quotas are defined as code and reviewed in a PR like everything else, whether the tooling behind it is Terraform, scripts, or a self-service button.

The fast path: a topic declared as code, reviewed, and applied by a pipeline.

This is good. It's the part of self-service almost everyone does. It takes minutes to provision a topic from a repo and tear it down. Moving fast works. Preventing mistakes is harder.

A guardrail only fires if a request goes through it

Where do the rules live?

a naming convention on a Confluence page
a kafka-topics.sh wrapper that regex-checks the name in CI
a Terraform module that's "the only supported way" to create a topic
a pull request someone from the platform team has to approve
...

The problem: they only fire if the request goes through them.

🚫 "We turned off auto-create and route every topic through Terraform. It's enforced."

Terraform enforces what goes through Terraform.
The CI check enforces what goes through CI.
Same for the PR gate.

The moment a developer (or a job, or a quick hotfix) talks to the cluster some other way, the rule is not enforced. And teams route around platform tooling more often than anyone likes to admit.

A rule you can route around is not a guardrail.

The broker answers any client directly

The broker doesn't care about these external rules.

Depending on their ACLs, anyone with the bootstrap server address and valid credentials can speak Kafka's wire protocol directly and call createTopics, produce, fetch, or whatever else they're allowed to. The broker does not care that your naming convention is on a wiki, that your Terraform module sets min.insync.replicas, or that your CI script would have rejected the name.

Did you know Kafka has a few native controls of its own? ACLs define who can read or write a topic, and a CreateTopicPolicy plugin vets a topic's name, partitions, and creation configs, if you can deploy it on your brokers (self-managed only):

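import java.util.Map;
import org.apache.kafka.common.errors.PolicyViolationException;
import org.apache.kafka.server.policy.CreateTopicPolicy;

public class TopicPolicy implements CreateTopicPolicy {
  @Override
  public void validate(RequestMetadata r) {
    // Reject names that are not lowercase team.domain.<anything>.
    if (!r.topic().matches("[a-z]+\\.[a-z]+\\..+"))
      throw new PolicyViolationException("name must be team.domain.*: " + r.topic());
    // replicationFactor() is null when the request gives explicit replica assignments.
    if (r.replicationFactor() != null && r.replicationFactor() < 3)
      throw new PolicyViolationException("replication.factor must be >= 3");
  }
  @Override public void configure(Map<String, ?> c) {}
  @Override public void close() {}
}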

It's very basic, but that's what vanilla Kafka offers as enforcement. Nothing on the data itself: the broker only sees bytes.

Common mistakes we see:

Topic explosion and naming chaos. Always documented, rarely followed, especially when an application spins up its own AdminClient at startup and names the topic whatever it wants.
Over-partitioning. "Let's just default to 30 partitions." That's partition waste at scale, and it adds real risk to every cluster, cloud or on-prem: replica count, open file handles, and the controller load the whole cluster carries.
Poison-pill records. "Please validate against the schema." A producer that skips the registry writes garbage straight into the topic, and every consumer downstream has to handle it.
PII read by the wrong service. You can express "only this team reads these topics" as a Read ACL. But an ACL is all-or-nothing. It can't hand the payments service the raw PAN (the card number) and the analytics service the same field masked to the last four digits, off one topic. So you end up duplicating the data just to mask it.
The noisy neighbor. Kafka has quotas, but they're enforced per broker, not across the cluster. A 10 MB/s quota is really 10 MB/s on every broker the client touches, so the effective limit drifts with how partitions are spread. And a quota only applies to the user or client-id you set it on, so a renamed client-id starts fresh. Coarse, and easy to slip past.
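The per-broker arithmetic behind the noisy-neighbor item can be sketched as follows. This is an illustration only: `QuotaMath` and its method are hypothetical names, not part of Kafka, and the sketch assumes the client can saturate its quota on every broker it talks to.

```java
// Hypothetical helper, not part of Kafka: models how a per-broker quota adds up.
final class QuotaMath {
    /**
     * Upper bound on one client's cluster-wide throughput, in MB/s,
     * when each broker enforces the same quota independently.
     */
    static long clusterCeilingMbPerSec(long quotaPerBrokerMbPerSec, int brokersTouched) {
        if (quotaPerBrokerMbPerSec < 0 || brokersTouched < 0) {
            throw new IllegalArgumentException("inputs must be non-negative");
        }
        return quotaPerBrokerMbPerSec * brokersTouched;
    }

    public static void main(String[] args) {
        // Same 10 MB/s quota, different partition spread.
        System.out.println(clusterCeilingMbPerSec(10, 1)); // client touches one broker
        System.out.println(clusterCeilingMbPerSec(10, 6)); // client touches six brokers
    }
}
```

With a 10 MB/s quota, a client whose partitions sit on one broker is capped at 10 MB/s, while the same client spread over six brokers can reach 60 MB/s without ever violating the quota as the brokers see it.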

One platform team we worked with found they were running 70% partition waste even with "good automation" in place. Conduktor Insights and MCP help surface exactly that.

For every rule, there's a version that lives on the wire and can validate or reject, and a version that lives upstream and can only advise.

A rule that sits beside the request can be routed around by any client. A rule that sits on the request can't.

The schema registry is not enforcement

A schema registry feels like enforcement. It isn't.

A registry stores schemas and checks compatibility. It's genuinely useful, but it doesn't stop a producer from writing anything to a topic, because nothing forces the producer through it. It sits beside the write path, not on it.

The broker stores record values as opaque bytes and never deserializes them, so the only place a schema actually gets checked is the client's own serializer. The registry only validates the clients that choose to use it, in the languages that have a decent serializer for it.
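To make "opaque bytes" concrete, here is a minimal sketch of the cheapest check something on the write path could run on a record value. It assumes the Confluent Schema Registry wire format (a zero magic byte, then a 4-byte big-endian schema ID, then the payload). `RegistryFraming` and `schemaId` are hypothetical names, and a real validator would go on to fetch the schema and decode the payload against it.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.OptionalInt;

// Hypothetical helper: inspects the framing of a record value, nothing more.
final class RegistryFraming {
    private static final byte MAGIC_BYTE = 0x0;
    private static final int HEADER_LENGTH = 5; // 1 magic byte + 4-byte schema ID

    /** Returns the schema ID if the value carries the assumed framing, else empty. */
    static OptionalInt schemaId(byte[] value) {
        if (value == null || value.length < HEADER_LENGTH || value[0] != MAGIC_BYTE) {
            return OptionalInt.empty();
        }
        // ByteBuffer reads big-endian by default, matching the assumed format.
        return OptionalInt.of(ByteBuffer.wrap(value, 1, 4).getInt());
    }

    public static void main(String[] args) {
        // A value framed the way a registry-aware serializer would frame it.
        byte[] framed = ByteBuffer.allocate(7)
                .put(MAGIC_BYTE).putInt(42).put((byte) 1).put((byte) 2).array();
        // A value written by a producer that skipped the registry entirely.
        byte[] raw = "{\"oops\": true}".getBytes(StandardCharsets.UTF_8);

        System.out.println(schemaId(framed).isPresent()); // framing found
        System.out.println(schemaId(raw).isPresent());    // no framing
    }
}
```

The framing is necessary, not sufficient: a raw payload that happens to start with a zero byte would pass this check, which is why real validation has to decode the payload too. Either way, the broker performs neither step.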

"We can have all the schema in the world, but it won't enforce what's on the headers." — A devops engineer, after a bad header crashed production over a weekend

So "we have a schema registry" has nothing to do with enforcing the data format in your topics.

What can reject a request?

If you want to know whether a guardrail is in the right place, ask yourself: When someone ignores this rule, what rejects the request?

Your answer cannot just be a person, a pipeline, or a page. Those are purely advisory: they can't enforce anything on the wire.
Your answer must be something every connection physically passes through. Only then can you enforce.

A guardrail every request has to pass through, whether the client cooperates or not, is part of the system. That's where rules belong, not off to the side.

Enforcement must be on the wire

Where can a rule live such that it can't be bypassed? On the request path itself.

For Kafka, that means either in the brokers themselves or in something between the clients and the brokers that speaks the protocol: a Kafka proxy.

Conduktor Gateway is a Kafka-protocol proxy: clients connect to it unchanged. It decodes each request, applies policy, and forwards, rejects, or rewrites the request or response. The trick: no client knows it's there, so the rules apply whether the client cooperates or not.
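The decode, apply policy, then forward-or-reject flow can be sketched in a few lines. Everything below is a hypothetical model for illustration: `InlineGate`, `ProduceRequest`, and `Policy` are invented names, not Conduktor Gateway's API, and the sketch leaves out protocol decoding and rewriting entirely.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical model of an inline policy check; not a real proxy.
final class InlineGate {
    /** Hypothetical, already-decoded view of a produce request. */
    static final class ProduceRequest {
        final String principal;
        final String topic;
        final byte[] value;

        ProduceRequest(String principal, String topic, byte[] value) {
            this.principal = principal;
            this.topic = topic;
            this.value = value;
        }
    }

    /** A policy returns a rejection reason, or empty to let the request through. */
    interface Policy {
        Optional<String> check(ProduceRequest request);
    }

    private final List<Policy> policies;

    InlineGate(List<Policy> policies) {
        this.policies = List.copyOf(policies);
    }

    /** First rejection wins; empty means the request would be forwarded to the broker. */
    Optional<String> evaluate(ProduceRequest request) {
        for (Policy policy : policies) {
            Optional<String> rejection = policy.check(request);
            if (rejection.isPresent()) {
                return rejection;
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        Policy nonEmptyValue = r -> r.value.length == 0
                ? Optional.of("empty record value") : Optional.empty();
        Policy teamPrefix = r -> r.topic.startsWith(r.principal + ".")
                ? Optional.empty() : Optional.of("topic outside the caller's prefix");
        InlineGate gate = new InlineGate(List.of(nonEmptyValue, teamPrefix));

        // Passes both policies, so it would be forwarded.
        System.out.println(gate.evaluate(
                new ProduceRequest("payments", "payments.orders", new byte[] {1})));
        // Fails the prefix policy, so it would be rejected with a reason.
        System.out.println(gate.evaluate(
                new ProduceRequest("payments", "analytics.clicks", new byte[] {1})));
    }
}
```

The point of the shape is that `evaluate` runs for every request, because the gate is the only route to the broker. The client never gets to choose whether the policies run.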

What you want to enforce	On the side (hint)	On the request (enforced)
Topic name, partitions, RF, retention	wiki, CI regex, Terraform	`CreateTopicPolicy` on the broker, or the proxy
A record matches its schema	"please use the registry"	proxy validates on produce
Who reads which fields	a doc; a broker ACL is all-or-nothing	proxy: per-reader masking (raw vs last-4)
Sensitive fields encrypted	every client remembering to	proxy: field-level encryption on produce
Noisy-neighbor limits	a runbook	broker quotas keyed right; proxy: per-tenant
Isolation between teams	naming prefixes everyone respects	proxy: virtual clusters
Who did what	reconstructed from logs after	audit at the protocol, inline
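The "raw vs last-4" row can be illustrated with a small sketch of per-reader masking. The reader names, the `FieldMasking` class, and the choice of `*` as the mask character are all assumptions made for the example. It says nothing about how a given proxy is actually configured.

```java
import java.util.Set;

// Hypothetical illustration of one stored field producing two different views.
final class FieldMasking {
    // Hypothetical allow-list of readers that may see the raw value.
    private static final Set<String> RAW_READERS = Set.of("payments-service");

    /**
     * Replaces every character except the last four with '*'.
     * Values of four characters or fewer are returned unchanged.
     */
    static String maskToLastFour(String value) {
        int visible = Math.min(4, value.length());
        int hidden = value.length() - visible;
        return "*".repeat(hidden) + value.substring(hidden);
    }

    /** Returns what a given reader would receive for the same stored field. */
    static String viewFor(String reader, String pan) {
        return RAW_READERS.contains(reader) ? pan : maskToLastFour(pan);
    }

    public static void main(String[] args) {
        String pan = "4111111111111111"; // a commonly used test card number
        System.out.println(viewFor("payments-service", pan));  // raw value
        System.out.println(viewFor("analytics-service", pan)); // masked to the last four
    }
}
```

One topic, one stored value, two views: the masking decision is made per reader at read time, so nothing has to be duplicated into a second, pre-masked topic.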

"I can't set a data masking policy across all of my Kafka clusters." — Platform lead at a logistics company

Putting the rules on the request path is the dependable way to enforce data protection across an organization.

Same story outside Kafka

None of this is really a Kafka story. The same split shows up everywhere:

a Kubernetes admission controller, not "please don't deploy privileged pods"
a service mesh enforcing mTLS, not "please encrypt traffic between services"
an API gateway enforcing rate limits, not "please don't hammer the endpoint"

A control is only ever as strong as the narrowest point every action has to pass through. If it can be avoided, it is advice, and advice is no proof that the rule is followed.

What about AI agents?

Point an agent at your streams. Will it read your wiki, or ask the platform team for the rules? No. In this age of agents, expect more clients, more autonomous calls, fewer reviews. "Governed AI" is only as governed as what you can enforce on the wire. It's why we run agents in a sandbox: to stay in control of what they can do.

We're building Conduktor Agents to give them tighter boundaries than we ever set for humans, bounding not just what they can see, but what they can do.

A service written by humans mostly honors its scope on its own. An agent honors nothing it isn't forced to. Only enforcement on the wire scopes both the same way.

The platform engineer's job is to own the boundary around what they hand out, to humans and now to agents.

Want a guardrail that actually rejects?

We'll show you Conduktor Gateway enforcing policy inline on your own Kafka: a bad record refused on produce, a PII field masked per reader, a topic blocked at creation. No client changes, no broker rewrite.
