Kafka platform teams deal with many environments and clusters: dev, QA, staging, UAT, preprod, prod, and that's before counting business-unit segregation, use-case specifics, cross-region setups, and so on. Money is always a topic: they often run way too many clusters because, for a long time, a new cluster was the only way they had to isolate anything.
"How to provide safe multi-tenancy in the cloud was kind of insurmountable. We just didn't have the time, effort or knowledge at the time." — Lead engineer on a Kafka platform team at a large health insurer
Consolidation vs the security team. FIGHT!
Now, they want to consolidate: the clusters, their management, the cost. But they have a strong opponent: their security team.
The main objection to consolidation is that every environment should live in its own VPC with no network path between them, and whatever logical isolation a proxy offers is weaker than strict network segregation. That's a fair challenge. If a dev workload physically cannot route a packet to a QA cluster, what does a virtual cluster add that the network doesn't already give you for free?
This post is written for the security and compliance teams asking that question. The short answer:
- The production boundary must be strict and physical. No discussion.
- The boundaries between non-prod environments are usually about people. This is where virtual clusters give you independent enforcement layers, at the price of one physical cluster instead of many.
Keep the production wall exactly where it is
One thing is not up for debate: production and non-production must be segregated, hard, at the network layer. No exceptions. The moment a non-prod environment can reach production data or production infrastructure, you have a disaster waiting to happen.
We've seen pre-prod apps send Kafka records straight into the production cluster. They got lucky: the records carried a schema ID from the pre-prod registry, so prod consumers couldn't deserialize them. But the bytes still landed in the prod topics, advancing offsets and waiting to trip a consumer up. A schema mismatch is luck, not a control, which is exactly why the wall has to be real.
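The schema ID only "protected" prod because of how Confluent Schema Registry frames records: a magic byte 0, then the schema ID as a 4-byte big-endian integer, then the payload. The minimal sketch below assumes that wire format and uses hypothetical IDs to show the consumer-side check that failed. Note that it runs after the bytes are already in the prod topic, and that registries assign IDs independently: had the prod registry used 1042 for some other schema, the lookup would have succeeded and the consumer would have decoded the payload with the wrong schema.

```python
import struct

MAGIC_BYTE = 0  # Confluent wire format: byte 0 is always 0


def schema_id(record_value: bytes) -> int:
    """Extract the schema ID from a Confluent-framed record value."""
    if len(record_value) < 5 or record_value[0] != MAGIC_BYTE:
        raise ValueError("not a Confluent-framed record")
    # Bytes 1-4: schema ID as a big-endian unsigned 32-bit integer.
    (sid,) = struct.unpack(">I", record_value[1:5])
    return sid


def schema_is_known(record_value: bytes, known_ids: set) -> bool:
    """A consumer can only look up schemas its own registry has registered."""
    return schema_id(record_value) in known_ids


# Hypothetical values: ID 1042 exists only in the pre-prod registry.
preprod_record = bytes([MAGIC_BYTE]) + struct.pack(">I", 1042) + b"payload"
prod_registry_ids = {7, 12, 15}

print(schema_is_known(preprod_record, prod_registry_ids))  # False, but the record is already written
```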
When a security team insists on a separate VPC with no connectivity for the prod boundary, they are right.
Scope. We'll talk about consolidating non-production environments (dev, QA, UAT, staging, pre-prod) onto shared infrastructure. Production keeps its own network and its own cluster(s).
Why separate environments?
A separate VPC per non-prod environment is the simplest way to be sure the environments are segregated. But what actually requires dev and QA to be unable to route to each other? Rarely the network itself. It comes down to two reasons:
- Teams moving fast without trampling each other.
- Compliance: who may touch which data.
Teams working in parallel
Two teams want to work on the same topics at the same time without trampling each other.
Take a pipeline A → B → C, where each arrow is a Kafka topic. On a shared dev cluster, team A's test events land in the same topics and consumer groups as team B's. They mix. The moment team B redeploys a half-finished app, it breaks for everyone downstream. So teams queue for the environment, ask for a new, more stable one, or hand-roll their own test topic naming, such as teamA-orders / teamB-orders.
Alternative: give each team (or developer, or feature branch) its own virtual cluster or sandbox and forget about these issues. Same clean topic names, no collision possible. Team A stands up the whole A → B → C in their own sandbox and iterates without touching anyone else's. The only thing that changes in those apps is the bootstrap address and the credentials. You just repoint the apps: preview environments, but for Kafka.
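To make "the only thing that changes" concrete, here is a minimal sketch of per-sandbox client settings. It assumes SASL/PLAIN over TLS to the Gateway, and the hostnames, accounts and topic names are hypothetical. The keys are standard librdkafka properties, so a dict like this could be handed to a client such as confluent_kafka.Producer; the topic names live in one shared place and never vary.

```python
# Shared by every sandbox: the pipeline's topic names never change.
PIPELINE_TOPICS = {"a_to_b": "orders", "b_to_c": "orders-enriched"}  # hypothetical names


def sandbox_config(bootstrap: str, username: str, password: str) -> dict:
    """Client settings for one sandbox; only endpoint and credentials vary."""
    return {
        "bootstrap.servers": bootstrap,
        "security.protocol": "SASL_SSL",  # assumption: TLS + SASL/PLAIN to the Gateway
        "sasl.mechanism": "PLAIN",
        "sasl.username": username,
        "sasl.password": password,
    }


# Hypothetical hostnames and accounts for two teams' sandboxes.
team_a = sandbox_config("team-a.gw.internal:9092", "team-a-app", "secret-a")
team_b = sandbox_config("team-b.gw.internal:9092", "team-b-app", "secret-b")

differing = sorted(k for k in team_a if team_a[k] != team_b[k])
print(differing)  # ['bootstrap.servers', 'sasl.password', 'sasl.username']
```

Both teams' apps read PIPELINE_TOPICS unchanged; repointing a pipeline to another sandbox is only a matter of calling sandbox_config with different arguments.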
Compliance: who and what data
In regulated sectors, finance especially, there's a rule that those who build a thing shall not be the ones who sign off that it works. It's a constraint on who, not on where the packets go. You don't need a separate VPC to stop a developer from touching the QA dataset; you need an identity that isn't authorized for QA.
Staging and QA often get hydrated with copies of production data so tests run against something realistic. When the environments sit in separate VPCs, that copy has to cross the boundary through some intermediary. There's a clean way to handle it without a VPC: encrypt or mask the data at the proxy before it lands.
What a virtual cluster actually gives you
A virtual cluster is a logical namespace inside Conduktor Gateway, the Kafka protocol proxy that sits between your applications and your Kafka clusters. Each environment becomes its own virtual cluster with its own naming, rules, security, and isolation.
Invisible namespacing
Your apps reference orders in dev, orders in QA, orders in staging: the environment never appears in the name. Pipelines are identical across environments, with no per-environment config to wire in, so spinning up a new environment means nothing to change in the apps. The vCluster does the physical prefixing transparently behind the scenes; only the platform admins ever see it. The common alternative, prepending the environment name to every topic, consumer group and transactional ID, is a steady source of mistakes and oversights.
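Conceptually, the translation is a reversible rename applied to every resource name that crosses the proxy. The sketch below assumes a hypothetical "<vcluster>." prefix; the physical naming scheme Gateway actually uses may differ, and a real proxy rewrites names inside Kafka protocol requests and responses rather than calling a helper like this.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Namespace:
    """Hypothetical vCluster namespace; the real physical prefix scheme may differ."""

    vcluster: str

    @property
    def prefix(self) -> str:
        return f"{self.vcluster}."

    def to_physical(self, name: str) -> str:
        # Same rule for topics, consumer groups and transactional IDs.
        return self.prefix + name

    def owns(self, physical: str) -> bool:
        return physical.startswith(self.prefix)

    def to_virtual(self, physical: str) -> str:
        if not self.owns(physical):
            raise KeyError(f"{physical!r} is outside {self.vcluster}")
        return physical[len(self.prefix):]


dev, qa = Namespace("vc-dev"), Namespace("vc-qa")
print(dev.to_physical("orders"), qa.to_physical("orders"))  # vc-dev.orders vc-qa.orders
```

The same virtual name always lands on a different physical resource per vCluster, which is why app configs can stay identical across environments.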
Cross-tenant isolation
A client connected to one virtual cluster cannot see, list, describe or address anything in another. The proxy is the single way in: the brokers sit on a private network the clients can't route to, and they accept connections from the Gateway alone, so nobody can bypass it to hit a broker directly. It's like a virtual machine, where the hypervisor makes sure a VM cannot reach the host.
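Namespacing is also what keeps other tenants invisible: every name a client sends is resolved inside its own vCluster first. A minimal sketch, reusing a hypothetical "<vcluster>." prefix and a made-up list of physical topics: a listing is filtered to the caller's prefix, and even spelling out another tenant's physical name from QA just resolves to a nonexistent topic inside QA.

```python
# Hypothetical physical topics on the shared cluster (only the Gateway sees these).
PHYSICAL_TOPICS = {"vc-dev.orders", "vc-dev.payments", "vc-qa.orders", "vc-staging.orders"}


def list_topics(vcluster: str, physical: set) -> list:
    """What a client connected to `vcluster` gets back from a topic listing."""
    prefix = f"{vcluster}."
    return sorted(t[len(prefix):] for t in physical if t.startswith(prefix))


def topic_exists(vcluster: str, topic: str, physical: set) -> bool:
    """Any name the client sends is resolved inside its own namespace first."""
    return f"{vcluster}.{topic}" in physical


print(list_topics("vc-qa", PHYSICAL_TOPICS))                      # ['orders']
print(topic_exists("vc-qa", "payments", PHYSICAL_TOPICS))         # False
print(topic_exists("vc-qa", "vc-dev.payments", PHYSICAL_TOPICS))  # False
```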
Here is how you declare a QA virtual cluster with its own ACLs, in one API call:
```yaml
apiVersion: gateway/v2
kind: VirtualCluster
metadata:
  name: vc-qa
spec:
  aclEnabled: true
  aclMode: REST_API
  acls:
    - resourcePattern: { resourceType: TOPIC, name: orders, patternType: LITERAL }
      principal: User:qa-app
      host: "*"
      operation: READ
      permissionType: ALLOW
    - resourcePattern: { resourceType: TOPIC, name: orders, patternType: LITERAL }
      principal: User:qa-app
      host: "*"
      operation: WRITE
      permissionType: ALLOW
```

Copy this as vc-dev, with the dev service accounts as principals, and there you go: a new cluster to connect to. One physical cluster, two virtual clusters, no collisions, isolated by namespace and identity.
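The two ACLs above amount to a default-deny check. Here is a simplified model of how such rules are evaluated, following Kafka's default semantics (an explicit DENY wins, and no matching rule means no access). It only handles LITERAL topic patterns and ignores hosts, PREFIXED patterns, the ALL operation and implied DESCRIBE, so treat it as an illustration rather than Gateway's authorizer; User:dev-app is a hypothetical principal.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Acl:
    principal: str
    topic: str       # LITERAL pattern only in this sketch
    operation: str   # e.g. "READ", "WRITE"
    permission: str  # "ALLOW" or "DENY"


def is_authorized(acls: list, principal: str, topic: str, operation: str) -> bool:
    matching = [
        a for a in acls
        if a.principal == principal and a.topic == topic and a.operation == operation
    ]
    if any(a.permission == "DENY" for a in matching):
        return False  # an explicit DENY always wins
    return any(a.permission == "ALLOW" for a in matching)  # nothing matched -> denied


# Mirrors the vc-qa ACLs declared above; each vCluster evaluates only its own list.
VC_QA_ACLS = [
    Acl("User:qa-app", "orders", "READ", "ALLOW"),
    Acl("User:qa-app", "orders", "WRITE", "ALLOW"),
]

print(is_authorized(VC_QA_ACLS, "User:qa-app", "orders", "READ"))    # True
print(is_authorized(VC_QA_ACLS, "User:qa-app", "payments", "READ"))  # False
print(is_authorized(VC_QA_ACLS, "User:dev-app", "orders", "WRITE"))  # False
```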
Isolation is a stack, not a flag
How is this actually isolated? A request crossing the Gateway goes through several controls:
- Namespace: QA can only see and access topics in its declared virtual cluster, vc-qa, and nothing under another environment's prefix is addressable.
- Identity: the authenticated service account decides which virtual cluster you're in.
- Authorization: per-virtual-cluster ACLs give you least privilege inside the environment, so qa-app reading and writing orders doesn't imply it can touch anything else, anywhere else.
- Policies (technical rules and data quality, for instance), applied with what we call interceptors, where the most specific scope wins: service account > group > virtual cluster > global.
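The "most specific scope wins" rule can be sketched as: among the definitions of the same interceptor that apply to a request, pick the one with the narrowest scope. This is a simplified model with hypothetical names and partition limits, not Gateway's resolution code; in particular it assumes definitions are matched by interceptor name.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class InterceptorDef:
    name: str
    max_partitions: int              # stand-in for the interceptor's config
    vcluster: Optional[str] = None   # all three None -> global scope
    group: Optional[str] = None
    username: Optional[str] = None


def specificity(d: InterceptorDef) -> int:
    # service account > group > virtual cluster > global
    if d.username:
        return 3
    if d.group:
        return 2
    if d.vcluster:
        return 1
    return 0


def applies(d: InterceptorDef, vcluster: str, groups: set, username: str) -> bool:
    return (
        (d.vcluster is None or d.vcluster == vcluster)
        and (d.group is None or d.group in groups)
        and (d.username is None or d.username == username)
    )


def resolve(defs, name, vcluster, groups, username):
    candidates = [d for d in defs if d.name == name and applies(d, vcluster, groups, username)]
    return max(candidates, key=specificity, default=None)


DEFS = [
    InterceptorDef("topic-policy", 12),                                            # global
    InterceptorDef("topic-policy", 6, vcluster="vc-qa"),                           # vCluster
    InterceptorDef("topic-policy", 24, vcluster="vc-qa", username="qa-loadtest"),  # service account
]

print(resolve(DEFS, "topic-policy", "vc-qa", set(), "qa-app").max_partitions)       # 6
print(resolve(DEFS, "topic-policy", "vc-qa", set(), "qa-loadtest").max_partitions)  # 24
print(resolve(DEFS, "topic-policy", "vc-dev", set(), "dev-app").max_partitions)     # 12
```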
That last layer is the most flexible one, as it's where you can stack guardrails. For instance, a topic-creation policy scoped to QA:
```yaml
apiVersion: gateway/v2
kind: Interceptor
metadata:
  name: qa-topic-policy
  scope:
    vCluster: vc-qa
spec:
  pluginClass: io.conduktor.gateway.interceptor.safeguard.CreateTopicPolicyPlugin
  priority: 100
  config:
    topic: ".*"
    numPartition: { min: 1, max: 6, action: BLOCK }
```

All these controls, defined by the platform team, are declarative and reviewable (you can manage them with GitOps). Even if you change the underlying infrastructure and configuration, you can reapply the same policies to Conduktor Gateway and keep the exact same security posture.
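What the policy above decides for a topic-creation request can be sketched in a few lines. This is a simplified model, not the plugin's code: it assumes the topic pattern must match the whole topic name and only implements the BLOCK action.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PartitionRule:
    topic_pattern: str   # mirrors config.topic
    min_partitions: int  # mirrors numPartition.min
    max_partitions: int  # mirrors numPartition.max


def check_create_topic(rule: PartitionRule, topic: str, partitions: int) -> Optional[str]:
    """Return None if the request may proceed, or the reason it is blocked."""
    if re.fullmatch(rule.topic_pattern, topic) is None:
        return None  # the rule does not cover this topic
    if rule.min_partitions <= partitions <= rule.max_partitions:
        return None
    return (
        f"{topic}: {partitions} partitions is outside "
        f"[{rule.min_partitions}, {rule.max_partitions}]"
    )


QA_RULE = PartitionRule(topic_pattern=".*", min_partitions=1, max_partitions=6)

print(check_create_topic(QA_RULE, "orders", 3))   # None
print(check_create_topic(QA_RULE, "orders", 12))  # orders: 12 partitions is outside [1, 6]
```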
How do you reach a Virtual Cluster?
The virtual cluster is chosen by identity, not by the URL.
You can give each environment its own hostname: Gateway supports host-based (SNI) routing and an advertised host per deployment, so clients connect to dev.gw.internal or qa.gw.internal and everything feels like separate clusters:
```yaml
environment:
  GATEWAY_ROUTING_MECHANISM: host # route by SNI hostname
  GATEWAY_ADVERTISED_HOST: qa.gw.internal
```

Which virtual cluster a client actually lands in is resolved from its authenticated identity. That is the role of the GatewayServiceAccount:
```yaml
apiVersion: gateway/v2
kind: GatewayServiceAccount
metadata:
  vCluster: vc-qa
  name: qa-app
spec:
  type: EXTERNAL
  externalNames:
    - 00u9vme99nxudvxZA0h7 # the IdP identity (Okta, Entra) mapped to this account
```

Credentials are per environment, never shared across them. Issue one identity per environment and map it through your IdP, and the boundary holds. Reuse one across environments, and it doesn't. Your security team will appreciate this part.
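The "one identity per environment" rule is easy to check mechanically. A minimal sketch, with a hypothetical dev account and IdP subject alongside the qa-app mapping above: resolve an IdP subject to its vCluster, and flag any subject mapped into more than one vCluster, since that is exactly the reuse that breaks the boundary. It models the mapping only, not how Gateway stores or evaluates it.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceAccount:
    vcluster: str
    name: str
    external_names: tuple  # IdP subjects mapped to this account


def shared_identities(accounts: list) -> dict:
    """IdP subjects mapped into more than one vCluster (should always be empty)."""
    seen = defaultdict(set)
    for acct in accounts:
        for subject in acct.external_names:
            seen[subject].add(acct.vcluster)
    return {s: vcs for s, vcs in seen.items() if len(vcs) > 1}


def vcluster_for(accounts: list, subject: str) -> str:
    """Which vCluster an authenticated IdP subject lands in."""
    matches = {a.vcluster for a in accounts if subject in a.external_names}
    if len(matches) != 1:
        raise LookupError(f"{subject!r} maps to {len(matches)} vClusters")
    return matches.pop()


ACCOUNTS = [
    ServiceAccount("vc-qa", "qa-app", ("00u9vme99nxudvxZA0h7",)),
    ServiceAccount("vc-dev", "dev-app", ("00u-dev-subject",)),  # hypothetical subject
]

print(vcluster_for(ACCOUNTS, "00u9vme99nxudvxZA0h7"))  # vc-qa
print(shared_identities(ACCOUNTS))                     # {}
```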
Now that you have a virtual cluster, you can encrypt the regulated fields on the way in, in one API call:
```yaml
apiVersion: gateway/v2
kind: Interceptor
metadata:
  name: qa-protect-pii
  scope:
    vCluster: vc-qa
spec:
  pluginClass: io.conduktor.gateway.interceptor.EncryptPlugin
  priority: 100
  config:
    topic: ".*"
    kmsConfig:
      vault: { uri: http://vault:8200, token: ${VAULT_TOKEN} }
    recordValue:
      fields:
        - { fieldName: email, keySecretId: vault-kms://vault:8200/transit/keys/nonprod, algorithm: AES256_GCM }
        - { fieldName: ssn, keySecretId: vault-kms://vault:8200/transit/keys/nonprod, algorithm: AES256_GCM }
```

In non-prod you usually stop there: with no matching decryption interceptor deployed, those fields stay ciphertext and the regulated values are never readable in the lower environment, so the copy stays out of audit scope. If a test genuinely needs realistic-looking values, masking is the better tool, since it redacts in place and can't be reversed. Gateway has masking interceptors for that, plus quite a few others: data quality, chaos testing, observability, traffic control, and much more.
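For the masking route, the idea is a one-way transform applied to selected fields before the record is stored. The sketch below is generic, not Gateway's masking interceptor or its configuration: the field names follow the example above, the record is made up, and keeping the last four characters of the SSN is an assumption about what a test might need. The redacted characters are simply discarded, so there is nothing to decrypt later.

```python
def mask(value: str, keep_last: int = 0, mask_char: str = "*") -> str:
    """Irreversibly replace all but the last `keep_last` characters."""
    keep_last = max(keep_last, 0)
    hidden = max(len(value) - keep_last, 0)
    return mask_char * hidden + value[hidden:]


def mask_fields(record: dict, rules: dict) -> dict:
    """Return a copy of `record` with each field in `rules` masked.

    `rules` maps a field name to how many trailing characters to keep.
    """
    masked = dict(record)
    for field, keep_last in rules.items():
        if isinstance(masked.get(field), str):
            masked[field] = mask(masked[field], keep_last)
    return masked


record = {"id": 42, "email": "jane@example.com", "ssn": "123-45-6789"}
print(mask_fields(record, {"email": 0, "ssn": 4}))
# {'id': 42, 'email': '****************', 'ssn': '*******6789'}
```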
Show me the money!
Hard network segregation means a separate network, which means separate infrastructure, which means many clusters to provision, patch, monitor, scale and license instead of one. The cost isn't really the brokers; it's everything around them: adoption, integration, education, coordination, the whole TCO.
Consolidating non-prod onto one physical Kafka cluster behind one Gateway collapses all of that to a single cluster, and with it all the surrounding process. It's not free of trade-offs:
- the environments now share brokers
- you lean on per-tenant quotas and rate-limits to avoid noisy-neighbor issues
- the Gateway now sits in the path of all traffic
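Per-tenant quotas are what keep one environment's load test from starving the others. A common way to model such a limit is a token bucket per vCluster; the sketch below uses hypothetical byte rates and an explicit clock so it is deterministic. It is a generic illustration, not Gateway's quota implementation, and it rejects excess traffic outright, whereas Kafka's own broker quotas throttle clients by delaying responses.

```python
class TokenBucket:
    """Per-tenant byte budget: refills at `rate` bytes/s up to `capacity`."""

    def __init__(self, rate: float, capacity: float, now: float = 0.0) -> None:
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.last = now

    def try_consume(self, amount: float, now: float) -> bool:
        # Refill for the time elapsed since the last call, capped at capacity.
        elapsed = max(now - self.last, 0.0)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if amount <= self.tokens:
            self.tokens -= amount
            return True
        return False


# One independent bucket per vCluster (hypothetical limits).
buckets = {
    "vc-dev": TokenBucket(rate=1_000, capacity=2_000),
    "vc-qa": TokenBucket(rate=1_000, capacity=2_000),
}

print(buckets["vc-dev"].try_consume(2_000, now=0.0))  # True: dev uses its whole burst
print(buckets["vc-dev"].try_consume(500, now=0.0))    # False: dev is out of budget
print(buckets["vc-qa"].try_consume(1_500, now=0.0))   # True: QA is unaffected
print(buckets["vc-dev"].try_consume(500, now=1.0))    # True: 1,000 bytes refilled
```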
But the isolation the security team cares about (namespace, identity, ACLs and policy) stays intact, and arguably gets stronger, since every control is now declared and audited. In return, you get faster, cheaper iteration.
This isn't about trading isolation for cost. It's about replacing a network boundary that happens to be expensive with multiple layers of enforcement that are cheaper to run and easier to audit, because every one of them is declared in YAML you can read, share, diff and review.
| Boundary | Enforce with | Why |
|---|---|---|
| Production ↔ non-production | Separate network, separate cluster | Mixing drags non-prod into prod's audit scope |
| Dev ↔ QA ↔ staging | Virtual clusters + per-vCluster ACLs | The split is about people and access, not packets |
| Prod data landing in non-prod | Field encryption or masking at the proxy | Keeps regulated values out of the lower environment |
Conclusion: one Prod and one Gateway to rule Non-Prod
The production boundary must stay physical. The boundaries between non-prod environments can be logical, enforced by namespace, identity, ACLs and policy; where production data lands in them, you encrypt or mask it on the way in. One Gateway, one cluster, as many isolated environments as you need, and an isolation story that's easy to understand.
If you want to map your own non-prod topology onto virtual clusters, book a demo and we'll walk you through where the boundary must stay at the networking layer and where it can move to the logical one.
