Your Platform Team Can’t Fix Kafka Costs Alone

Nicole Bouchard, May 19, 2026
The 60-second version
  • Kafka spend grows year over year because most organizations are stuck at a maturity level where nobody outside the platform team can act on it.
  • "How do we cut the bill?" is the wrong starting question. The right one is who should own the cost, and what would it take for them to act.
  • Cost responsibility matures in three stages. Stage 1 puts everything on the platform team. Stage 2 overcorrects by shifting responsibility to project teams without the means to act. Stage 3 is where the two partner effectively.
  • The leap straight to a FinOps model doesn't work. The supporting structures have to land in order.
  • Moving up a stage is change management only the engineering leader can authorize.

An engineering leader recently described hitting a wall on Kafka cost. The bill had grown faster than usage for several quarters, the gains from cleanup work kept eroding, and finance was asking sharper questions.

Why is this growing this fast? Can we make it more predictable? Is the spend actually producing value?

The leader did what most leaders do in that spot. They went to the platform team for answers.

Three stages of cost responsibility maturity

That scenario is recognizable because most organizations sit somewhere on the same three-stage progression. What changes across the stages isn't who pays the bill, but whether the people positioned to act have the data and expertise to do it effectively.

Stage 1: All responsibility sits with the platform team. They hold the budget, the tooling, the vendor relationship, and the bill. Project teams use the platform freely, costs grow with adoption, and the platform team has the accountability without the levers to act on resources other teams created.

Stage 2: Responsibility shifts to project teams without the means to act. The bill gets allocated across teams as a flat share rather than in proportion to usage. Teams might see coarse usage data, but they lack the depth of data and the expertise to tell value from waste. Responsibility moved, capability didn't follow, and nothing actually changes.

Stage 3: The platform team and project teams partner effectively. Project teams pay in proportion to actual usage with the visibility and tooling to act at the source. The platform team brings expertise about which patterns matter and where guardrails should sit. Accountability and capability balance across the two, which is what produces change.

| | Stage 1 | Stage 2 | Stage 3 |
| --- | --- | --- | --- |
| Who pays the bill | Platform team carries it alone | Allocated as a flat share across teams | Teams pay in proportion to actual usage |
| Who can act on cost | Platform team alone, but without the levers on resources they don't own | Teams now pay, but coarse data and no expertise to tell value from waste | Teams act on their share, with platform-team expertise on what matters and guardrails on what doesn't |
| What enables the stage | Default starting point | A model for dividing the bill, plus coarse usage views | Proportional attribution, self-service tooling, guardrails, and a working platform-project partnership |
| Where it breaks | Accountability without the levers | Responsibility shifted, capability didn't follow | Requires the prior structures to be working |
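
The difference between the Stage 2 flat share and the Stage 3 proportional share is easiest to see with numbers. The sketch below is illustrative only: the team names, the usage units, and the monthly bill are made-up assumptions, and a real attribution model would likely weigh more than one usage dimension.

```python
from typing import Dict


def flat_allocation(bill: float, usage_by_team: Dict[str, float]) -> Dict[str, float]:
    """Stage 2 style: every team pays the same share, regardless of usage."""
    share = bill / len(usage_by_team)
    return {team: share for team in usage_by_team}


def proportional_allocation(bill: float, usage_by_team: Dict[str, float]) -> Dict[str, float]:
    """Stage 3 style: each team pays in proportion to its measured usage."""
    total = sum(usage_by_team.values())
    return {team: bill * usage / total for team, usage in usage_by_team.items()}


# Hypothetical usage units (for example, GB written per month); not real data.
usage = {"payments": 700.0, "search": 200.0, "analytics": 100.0}
bill = 30_000.0  # hypothetical monthly bill

flat = flat_allocation(bill, usage)
prop = proportional_allocation(bill, usage)

for team in usage:
    print(f"{team}: flat={flat[team]:.0f} proportional={prop[team]:.0f}")
```

Under the flat split, the lightest user pays the same as the heaviest, so neither has a cost signal tied to its own behavior. Under the proportional split, a team that cleans up its own waste sees its share fall.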

Why the platform team can't fix this alone

The scenario at the start is the textbook Stage 1 pattern. The platform team can identify the waste, drawing on the same cost patterns that show up across most estates. What they can't do is unilaterally retire dead topics or right-size partitions on resources other teams created. They have visibility without authority, and the project teams with the authority don't have visibility into what their decisions cost.
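
As a rough sketch of what "visibility without authority" looks like in practice, the platform team can usually produce a list of cleanup candidates, but the list is only useful once it is grouped by the team that can act on it. The record shape below (`TopicStats`, its field names, and the 30-day window) is a hypothetical assumption for illustration, not the output of any particular Kafka tool.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class TopicStats:
    """Hypothetical per-topic record; field names are illustrative."""
    name: str
    owner_team: str
    partitions: int
    bytes_in_30d: int            # bytes produced in the last 30 days
    active_consumer_groups: int  # consumer groups that read in the last 30 days


def cleanup_candidates(topics: List[TopicStats]) -> Dict[str, List[str]]:
    """Group topics with no writes and no readers by the team that owns them."""
    by_team: Dict[str, List[str]] = defaultdict(list)
    for t in topics:
        if t.bytes_in_30d == 0 and t.active_consumer_groups == 0:
            by_team[t.owner_team].append(t.name)
    return dict(by_team)


# Made-up inventory for illustration only.
inventory = [
    TopicStats("orders.v1", "payments", 12, 0, 0),
    TopicStats("orders.v2", "payments", 12, 5_000_000_000, 3),
    TopicStats("clicks.tmp", "search", 48, 0, 0),
]
report = cleanup_candidates(inventory)
print(report)
```

The function only flags candidates. Whether a quiet topic is dead or merely seasonal is context the owning team has and the platform team does not, which is the gap the paragraph above describes.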

A common pattern makes the gap concrete. A project team complains the central platform is too expensive and threatens to move to a cheaper alternative they don't really want to operate. The frustration is misdirected. The team could likely make their bill feel reasonable by cleaning up their own waste, but cost and action ownership are split, so neither side can resolve it alone.

The platform team has visibility but no authority. The project teams have authority but no visibility. Only the engineering leader can rewire that.

Why "we need FinOps" doesn't fix it either

The natural reaction at Stage 1 is to declare "we need FinOps" or "we need chargeback," essentially a leap straight to Stage 3. It almost never works on the first try, because Stage 3 depends on supporting structures that don't exist at Stage 1: defensible cost attribution, per-team visibility, self-service tooling, and guardrails on new waste. Without them, project teams receive a bill they can't understand or act on, push back on the methodology, and the program loses credibility before producing results.
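
Of those supporting structures, a guardrail on new waste is the simplest to sketch. The example below assumes a hypothetical self-service topic request with an owner, a partition count, and a retention period. The limits and field names are invented for illustration, and a real platform team would choose its own.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical limits; these are assumptions, not recommendations.
MAX_PARTITIONS = 24
MAX_RETENTION_HOURS = 7 * 24


@dataclass
class TopicRequest:
    """Hypothetical self-service request for a new topic."""
    name: str
    owner_team: str
    partitions: int
    retention_hours: int


def check_topic_request(req: TopicRequest) -> List[str]:
    """Return guardrail violations; an empty list means the request passes."""
    problems: List[str] = []
    if not req.owner_team.strip():
        problems.append("owner_team is required so cost can be attributed")
    if req.partitions > MAX_PARTITIONS:
        problems.append(f"partitions {req.partitions} exceeds limit {MAX_PARTITIONS}")
    if req.retention_hours > MAX_RETENTION_HOURS:
        problems.append(
            f"retention {req.retention_hours}h exceeds limit {MAX_RETENTION_HOURS}h"
        )
    return problems


ok = check_topic_request(TopicRequest("orders.v3", "payments", 12, 72))
bad = check_topic_request(TopicRequest("scratch", "", 96, 24 * 30))
print(ok)
print(bad)
```

Requiring an owner at creation time is what keeps attribution defensible later: a topic with no owner is a cost nobody can be asked to act on.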

The path is sequential. Stage 1 to Stage 2 gets cost off the platform team's books and starts the cultural shift. Stage 2 to Stage 3 is where behavior actually starts to change.

The change management the engineering leader has to drive

Stage progression is fundamentally an organizational change. The actual lift is rewiring who is accountable for what, which is why this has to come from the engineering leader. In RACI-like terms:

  • Paying the bill, in the sense of settling the vendor invoice, sits with the platform team across all three stages. What changes is how that cost is carried internally.
  • Managing the budget belongs to the platform team at Stage 1, shifts to a flat allocation across teams at Stage 2, and lands proportionally on each project at Stage 3.
  • Identifying waste and acting on it is where most organizations break down. At Stage 3 it works as a partnership, with platform-team expertise on what matters and project-team context and action on their own workloads. Neither side can do this alone, which is why Stage 1 and Stage 2 both fail to produce change.

Reassigning that third responsibility can't be delegated. It cuts across team boundaries and requires explicit prioritization to displace whatever else the project teams are doing.

Where to start

The work begins with an honest read of where the organization is today and what the next stage looks like. Four questions to work through with the platform team and a few project leads:

  1. What stage are we actually at? Showback on paper that isn't surfaced is still Stage 1. Visibility without action capability is still Stage 2.
  2. What's the next stage that's reachable in the next two quarters? Stage 2 if cost isn't yet shared. Stage 3 if cost is shared but not proportional, or if attribution exists but project teams can't act on it.
  3. Who owns each piece today, and what changes at the next stage? Vague aspirations don't produce change.
  4. What supporting structure does the next stage need? Attribution, self-service tooling, guardrails, a governance cadence. Scope what's missing as work.
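
The rules of thumb in the first two questions can be written down as a small decision function. This is one possible encoding, not a formal assessment: the four yes/no inputs and their names are assumptions chosen to mirror the wording above.

```python
def current_stage(
    cost_shared: bool,        # is the bill allocated to project teams at all?
    surfaced_to_teams: bool,  # do teams actually see their allocation?
    proportional: bool,       # is the allocation proportional to usage?
    teams_can_act: bool,      # do teams have the data and tooling to act?
) -> int:
    """Return 1, 2, or 3 using the rules of thumb from the questions above."""
    # Showback on paper that isn't surfaced is still Stage 1.
    if not (cost_shared and surfaced_to_teams):
        return 1
    # Shared but flat, or visible without the capability to act, is still Stage 2.
    if not (proportional and teams_can_act):
        return 2
    return 3


def next_stage(stage: int) -> int:
    """The next reachable stage, capped at Stage 3."""
    return min(stage + 1, 3)


# Example: proportional attribution exists, but teams cannot act on it.
stage = current_stage(
    cost_shared=True, surfaced_to_teams=True, proportional=True, teams_can_act=False
)
print(stage, next_stage(stage))
```

The value of writing it this way is that each input forces a concrete answer, which is the point of the third question: vague aspirations don't produce change.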

What we hear from engineering leaders

In a recent conversation with platform owners running Kafka at scale, more than half could see total spend but not what was driving it. A smaller group had full visibility down to team and resource. Most organizations are operating somewhere in the Stage 1 to Stage 2 transition, often without a clear plan for the next step.

For a longer view of the cost patterns the diagnosis usually surfaces, our Kafka cost optimization page collects them in one place.

Find out which stage your org is at, and what the next one looks like.

Our free Kafka cost analysis maps your current ownership model and the cost patterns underneath it. You walk away with a clear read on where the gaps are, the supporting structures the next stage requires, and a prioritized view of where to invest first.
