# Good Kafka Is Boring Kafka

What is the highest compliment a regulated bank pays its Kafka platform? **Silence**. Nobody talks about it.

A field lesson from CDC Informatique on making Kafka boring, on purpose.

## Kafka to maintain France's financial stability

CDC Informatique runs the IT for Caisse des Dépôts in France, which puts it under the ACPR, the authority that oversees the French banking and insurance sectors to keep the financial system stable. You can't get more serious than this.

**What does their Kafka platform look like? How did it even start?**

Like everywhere else: Kafka shows up somewhere small (one ingestion pipeline, one team that wanted decoupling), people love it, and they share it. It's async, durable, easy to plug into, simple to use. So it spreads.

One day it breaks. Not the technical side, the people-and-governance side: nobody decided who owns what, how topics are named, who's allowed to read whose data, what a sane partition count is. Kafka scaled. The thinking around it never caught up.

> This post is a written form of [the webinar we recorded with Julien from CDC Informatique](https://www.conduktor.io/events/webinars/cdc-parcours-kafka-avec-conduktor). If you prefer to watch it in French, the recording is there.

The context: CDC Informatique is HDS and ISO 27001 certified, with health data, disability data, and pension data flowing through its systems. Everything is fully on-premise, with dedicated VLANs per cluster and no crossing of environments. Tolerated downtime is a few hours a year, an SLA around 99.95% (roughly 4.4 hours of downtime per year). Their main takeaway after years of running Kafka in that environment:

**The technology is never the problem.**

## Kafka spreads faster than anyone governs it

Kafka arrived at CDC Informatique in 2019 the way it arrives almost everywhere: as a feature of something else. It came bundled in a Cloudera Big Data distribution, there to do real-time ingestion into the data lake.

Then the magic happened: give developers a system they can publish to and consume from without coordinating calls or negotiating formats, and they'll use it for *everything*. Adoption followed a hockey-stick curve. It sounds like a success story, until you realize the platform underneath was never designed for that scale.

Resource creation was centralized and scattered across teams at the same time (yes, both at once, that's how you know it's organic). No naming standard. No ACL convention. No ownership model. No common patterns.

> 🚫 *"Let's add governance later, once adoption proves Kafka out."*

This is the mistake almost everyone makes. Governance feels like extra process, something you add later once the platform is already working. In reality, success is exactly what makes governance impossible to ignore: as Kafka spreads across teams and systems, every unclear owner, every risky schema change, every ungoverned stream starts to matter. CDC Informatique hit that wall in 2021.

> *"We ran into real trouble with Kafka in 2021."* — Julien Maillard, Senior Architect, CDC Informatique

It wasn't a Kafka failure. It was what happens when you treat data infrastructure as just-another-technology. It was time for a change.

They killed the Cloudera-bundled approach, stood up dedicated Kafka clusters with a team that owned them, and gave the whole thing a name and a service contract. Kafka was now a *product*, not just infra.

> *"Nothing in life is easy, but we got the sponsorship, and we had the courage to do it."* — Julien

- The technical migration is the "easy" half.
- The hard half is getting production, engineering, and leadership to agree to pause and re-lay the foundations of something people already depend on.

## Governance first, technology second

Ask Julien for the one piece of advice he'd give a less mature organization, after nine years working around Kafka:

> *"The biggest advice I can give is to think about governance from the moment you stand up the platform. And to design it so you can delegate it to the project teams, inside a frame you can industrialize."* — Julien

It's not about the tech. Governance is not a thing you add later, when it's already too late. Naming, ACLs, ownership, the patterns you want to use: decide them early, or it gets expensive to fix later. Everyone learns that eventually, usually by getting burned.

Governance has to be **delegable** and **industrializable**. Not a RACI matrix in a wiki or a spreadsheet (we've all seen the Excel, let's never speak of it again). It has to be a frame the project teams operate *inside*, automatically, without filing a ticket.

**Governance you can't delegate becomes the bottleneck you were trying to avoid.**

CDC Informatique classifies its Kafka usage on the [DICP scale (availability, integrity, confidentiality, traceability)](https://www.tenacy.io/en/articles/it-risk-assessment-dicp-analysis), with integrity treated as absolute and traceability non-negotiable.

## Self-service, governed by default

CDC Informatique built an internal GitOps tool, "resources as code". Every project team gets a Git repo with one file describing the desired state of its Kafka world: topics, ACLs, schemas, connectors, Kafka Streams apps, service-account bindings. They declare what they want. The tooling reconciles it, applies controls (partition caps, for one), and enforces logical isolation per application and per environment through naming rules and ACLs. 99% of requests are standard; the other 1% get a conversation with the platform team, who can lift a control or refuse a config that makes no sense.
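To make "applies controls" concrete, here is a minimal sketch of the kind of check such tooling could run before reconciling a team's declared topics. It is not CDC Informatique's tool: the `<app>.<env>.<name>` naming scheme, the environment names, and the cap of 12 partitions are all assumptions made for illustration.

```python
import re
from dataclasses import dataclass

# Hypothetical guardrails: the post does not give the real cap or naming scheme.
MAX_PARTITIONS = 12
TOPIC_NAME = re.compile(r"^(?P<app>[a-z0-9-]+)\.(?P<env>dev|int|prod)\.[a-z0-9.-]+$")


@dataclass(frozen=True)
class TopicRequest:
    name: str
    partitions: int


def validate(app: str, env: str, requests: list[TopicRequest]) -> list[str]:
    """Return human-readable violations; an empty list means the change can be applied."""
    violations = []
    for req in requests:
        match = TOPIC_NAME.match(req.name)
        if match is None:
            violations.append(f"{req.name}: does not match <app>.<env>.<name>")
        elif match["app"] != app or match["env"] != env:
            # Logical isolation: a team may only declare topics in its own namespace.
            violations.append(f"{req.name}: outside the {app}.{env}. namespace")
        if req.partitions > MAX_PARTITIONS:
            violations.append(
                f"{req.name}: {req.partitions} partitions exceeds the cap of {MAX_PARTITIONS}"
            )
    return violations


requests = [
    TopicRequest("payroll.prod.payslip-issued", 6),  # passes every check
    TopicRequest("billing.prod.invoices", 300),      # wrong namespace and over the cap
    TopicRequest("PayrollEvents", 3),                # breaks the naming rule
]
for problem in validate("payroll", "prod", requests):
    print(problem)
```

Returning a list of violations instead of raising on the first one lets a team fix everything in a single commit. The "1%" cases would be the ones where the platform team overrides a violation by hand.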

Any of their teams can stand up a PostgreSQL CDC connector feeding an [outbox pattern](https://www.conduktor.io/blog/transactional-outbox-pattern-database-kafka) (declaration, access requests and all) in about half an hour. No central team in the critical path.

Conduktor provides the same shape under its [self-service framework](https://www.conduktor.io/blog/governed-kafka-self-service). You declare an application and what it owns. The platform derives the ACLs and the ownership boundaries:

```yaml
apiVersion: self-serve/v1
kind: Application
metadata:
  name: "payroll-events"
spec:
  title: "Payroll Events"
  owner: "payroll-team"
---
apiVersion: self-serve/v1
kind: ApplicationInstance
metadata:
  application: "payroll-events"
  name: "payroll-events-prod"
spec:
  cluster: "prod-bank"
  serviceAccount: "payroll-events-prod"
  resources:
    - type: TOPIC
      patternType: PREFIXED
      name: "payroll."          # owns everything under payroll.*
    - type: TOPIC
      patternType: PREFIXED
      ownershipMode: LIMITED
      name: "reference.party."  # read-only on the shared reference data
```

`payroll-team` owns `payroll.*` and gets *limited* access to someone else's `reference.party.*` topics. The most important organizational move CDC Informatique made: they pushed the responsibility for access [**left, onto the project teams**](https://www.conduktor.io/blog/no-kafka-data-platform-without-ownership).

The platform team can't possibly know whether the payroll application is *business-authorized* to consume a third-party reference dataset. That's a functional decision. It lives with the people who understand the domain. The central team only owns the framework and the guardrails. The project teams own the decisions inside it.
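The prefix-based ownership in the YAML above can be read as a simple lookup: given a topic name, which declared prefix covers it, and with what mode? The sketch below illustrates that idea in plain Python. It is not Conduktor's implementation, and the longest-prefix-wins rule is an assumption added to make overlapping prefixes unambiguous.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Grant:
    prefix: str
    mode: str  # "ALL" for full ownership, "LIMITED" for restricted access


# Mirrors the two resource entries declared for payroll-events-prod.
GRANTS = [Grant("payroll.", "ALL"), Grant("reference.party.", "LIMITED")]


def access_mode(topic: str, grants: list[Grant]) -> Optional[str]:
    """Return the mode of the longest matching prefix, or None if nothing covers the topic."""
    matching = [g for g in grants if topic.startswith(g.prefix)]
    if not matching:
        return None  # outside every declared boundary: no access at all
    return max(matching, key=lambda g: len(g.prefix)).mode


for topic in ("payroll.payslip-issued", "reference.party.addresses", "billing.invoices"):
    print(topic, "->", access_mode(topic, GRANTS))
```

The default is the important part: a topic that no prefix covers resolves to `None`, so access has to be declared by the owning team, never assumed.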

## Most people who touch Kafka aren't Kafka people

No one starts out a Kafka engineer. You build Spring apps. You write Python pipelines. You run QA. You do data analysis.

Before CDC Informatique had a decent UI, all of these people had to go find a developer. The existing interface wasn't acceptable, and the raw CLI is not where you send someone whose job isn't Kafka.

> *"The UI in the distribution didn't work for us, not on security, not on performance."* — Julien

With a few hundred internal users, going through a developer every time doesn't scale. The fix was a tool for the people who *aren't* experts:

- search and filter across topics
- produce a test message without writing code
- diff two Avro schema versions
- replay events

All behind SSO, with an audit trail, mirroring the exact logical isolation the platform already enforced.
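For a sense of what a schema diff surfaces, here is a rough field-level comparison of two Avro record schemas, written against the parsed JSON of the `.avsc` files. It only illustrates the idea and is not how Console does it; the `Payslip` schemas are invented, and it neither recurses into nested records nor checks compatibility rules.

```python
def diff_fields(old: dict, new: dict) -> dict:
    """Compare the top-level fields of two Avro record schemas (parsed .avsc JSON)."""
    old_fields = {f["name"]: f["type"] for f in old["fields"]}
    new_fields = {f["name"]: f["type"] for f in new["fields"]}
    return {
        "added": sorted(new_fields.keys() - old_fields.keys()),
        "removed": sorted(old_fields.keys() - new_fields.keys()),
        "changed": sorted(
            name
            for name in old_fields.keys() & new_fields.keys()
            if old_fields[name] != new_fields[name]
        ),
    }


# Hypothetical schema versions, for illustration only.
v1 = {"type": "record", "name": "Payslip", "fields": [
    {"name": "employee_id", "type": "string"},
    {"name": "amount", "type": "int"},
    {"name": "period", "type": "string"},
]}
v2 = {"type": "record", "name": "Payslip", "fields": [
    {"name": "employee_id", "type": "string"},
    {"name": "amount", "type": "long"},
    {"name": "currency", "type": ["null", "string"], "default": None},
]}

print(diff_fields(v1, v2))
```

Even this crude version answers the question a QA engineer actually has: which fields appeared, which disappeared, and which changed type between two versions.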

> *"Console is the product our users really fell for. Take it away now and the reaction would be... loud."* — Julien

Adoption is measured by how badly people would miss the thing.

Another benefit: first-level troubleshooting moved to the project teams. They inspect their own messages, check their own schemas, replay their own events. The central team only gets pulled in on the genuinely hard cases.

## Your cluster is at the mercy of its clients

A Kafka cluster is, by default, at the mercy of its clients. A client can:

- ship uncompressed data
- set `acks=0` to "go faster" and silently drop messages
- batch in a way that hammers the brokers
- push [multi-megabyte records](https://www.conduktor.io/blog/beyond-limits-produce-large-records-without-undermining-apache-kafka) that replication then duplicates three or four times
- [over-partition](https://www.conduktor.io/blog/stop-over-partitioning-kafka) a topic into the hundreds
- produce [schemaless data](https://www.conduktor.io/blog/the-hidden-pitfalls-of-kafka-s-schemaless-data)

You find out from a monitoring dashboard, after it already hurt. Documentation doesn't enforce anything.

CDC Informatique put [Conduktor Gateway](https://www.conduktor.io/gateway) in the path as its Kafka-native proxy. The adoption cost is almost nothing: clients change their `bootstrap.servers` to point at the Gateway, and that's the entire migration. Which is exactly why you can roll it into integration and pre-prod first, and start getting value before it's anywhere near production traffic.
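As a sketch of how small that change is, compare a client configuration before and after. The hostnames and ports are hypothetical, and the example assumes the remaining client settings carry over unchanged, as the description above implies.

```python
# Hypothetical hostnames and ports; only bootstrap.servers differs.
direct = {
    "bootstrap.servers": "kafka-int-1.example.internal:9092",
    "client.id": "payroll-events",
    "acks": "all",
    "compression.type": "zstd",
}
via_gateway = {**direct, "bootstrap.servers": "gateway-int.example.internal:6969"}

# List every key whose value changed between the two configurations.
changed_keys = sorted(k for k in direct if direct[k] != via_gateway[k])
print(changed_keys)
```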

> *"The Gateway proxy won me over immediately."* — Julien

Now you have a control point where there wasn't one. You can actually *see* what clients send, and decide what's allowed. Here's a producer policy that requires compression, demands `acks=all`, and forces idempotence:

```yaml
apiVersion: gateway/v2
kind: Interceptor
metadata:
  name: producer-efficiency-policy
spec:
  pluginClass: io.conduktor.gateway.interceptor.safeguard.ProducerPolicyPlugin
  priority: 100
  config:
    compressionType:
      allowed: ["zstd", "lz4", "snappy"]
      action: BLOCK
    acks:
      required: "all"
      action: BLOCK
    enableIdempotence:
      required: true
      action: BLOCK
```

In their lower environments, they use this to *surface* the bad configurations (a producer with no compression inflating network and storage, for instance) and require a fix before anything reaches prod.
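The three rules of the policy above are easy to express as a plain check, which also makes a handy pre-flight test in CI before a producer ever connects. This is an illustrative sketch, not how the Gateway evaluates requests. It assumes settings must be explicit rather than inherited from client defaults, since those defaults differ between client versions.

```python
ALLOWED_COMPRESSION = {"zstd", "lz4", "snappy"}


def violations(producer_config: dict) -> list[str]:
    """Check a producer's configuration against the three rules of the policy."""
    found = []
    # Kafka producers send uncompressed data when compression.type is not set.
    if producer_config.get("compression.type", "none") not in ALLOWED_COMPRESSION:
        found.append("compression.type must be one of zstd, lz4, snappy")
    # acks=-1 is the numeric spelling of acks=all.
    if str(producer_config.get("acks")) not in {"all", "-1"}:
        found.append("acks must be all")
    if str(producer_config.get("enable.idempotence")).lower() != "true":
        found.append("enable.idempotence must be true")
    return found


good = {"compression.type": "zstd", "acks": "all", "enable.idempotence": True}
fast_and_lossy = {"acks": 0}  # the "go faster" configuration

print(violations(good))
print(violations(fast_and_lossy))
```

In a lower environment the list of violations would be reported back to the team. In production the same list would be a reason to reject the request.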

**Best practice that isn't enforced is just a PDF nobody reads.**

CDC Informatique also runs chaos testing through the Gateway. They wire up an interceptor that duplicates messages on the consumer side, `DuplicateMessagesPlugin`, to test whether downstream processing is genuinely idempotent. Does a replayed payment event get charged twice? That's hard to simulate any other way. [Discover all the failure modes you can simulate](https://www.conduktor.io/blog/chaos-engineering-for-kafka-testing-recovery-before-you-need-it): broken brokers, latency, leader elections, data corruption, and more.
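What does "genuinely idempotent" look like on the consuming side? Below is a toy version of the property that duplicate injection is meant to verify. The event shape, the `event_id` field, and the in-memory set are assumptions for illustration; a real consumer would keep its processed IDs in durable storage, updated atomically with the business write.

```python
class PaymentProcessor:
    """Toy downstream consumer that deduplicates on a hypothetical event_id field."""

    def __init__(self) -> None:
        self.seen_ids: set[str] = set()
        self.charged: dict[str, int] = {}

    def handle(self, event: dict) -> None:
        if event["event_id"] in self.seen_ids:
            return  # duplicate delivery: already processed, charge nothing
        self.seen_ids.add(event["event_id"])
        account = event["account"]
        self.charged[account] = self.charged.get(account, 0) + event["amount_cents"]


def deliver_with_duplicates(events: list[dict]) -> list[dict]:
    """Simulate the fault: every event is delivered twice, back to back."""
    return [copy for event in events for copy in (event, event)]


events = [
    {"event_id": "evt-1", "account": "A", "amount_cents": 1500},
    {"event_id": "evt-2", "account": "A", "amount_cents": 500},
]
processor = PaymentProcessor()
for event in deliver_with_duplicates(events):
    processor.handle(event)

print(processor.charged)
```

Remove the `seen_ids` check and account `A` is charged twice for each event, which is exactly the bug a duplicate-injecting interceptor is there to expose before production does.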

## The work quadrupled. The team didn't.

In three years CDC Informatique went from roughly 40 applications in production to 160, with another ~50 coming in the next couple of quarters. Over the same period the Kafka platform team went from one ops + one expert + one architect, to two ops + two experts + one product owner.

The workload quadrupled. The team grew by two people, and it's been flat for about eighteen months.

The tooling absorbs the growth. Every new application onboards itself through the frame, troubleshoots itself through the Console, and gets its bad configs caught by the Gateway.

> *"A good sign, for us, is that we don't hear much about Kafka."* — Julien

I love that as a definition of platform maturity: *good Kafka is boring Kafka*.

## What's next

The modern data spine is an API gateway for the synchronous world and Kafka for the asynchronous one, and the new use cases (agents, [AI, MCP](https://www.conduktor.io/blog/conduktor-mcp-and-skills-for-ai-agents)) graft onto that spine. Which means the same governance questions apply to an AI agent reading or writing a topic as to any other client. Who is it? What's it allowed to see? Is it audited? Eight years ago every conference was about Kafka. Today every conference is about AI. The backbone didn't change. The things plugged into it did.

[Book a demo](https://www.conduktor.io/contact/demo) to see what the governed-self-service version of this looks like on [Conduktor](https://www.conduktor.io/console): the Console on the human-access path, the [Gateway](https://www.conduktor.io/gateway) on the data path, and the guardrails that let you hand the keys to your project teams without losing control.

The hard part was never the log. It's deciding who gets to write to it (...and finding the courage to enforce that).
