Kafka Team Structure: Platform Team vs Embedded Model

Platform teams centralize expertise but bottleneck delivery. Embedded engineers move fast but create chaos. Here's how to choose.

Stéphane Derosiaux · October 3, 2024

Your Kafka architecture is only as good as the team running it. I've seen technically excellent clusters fail because nobody knew who owned them, and simple setups thrive because one team had clear accountability.

Two models dominate: centralized platform teams and embedded engineers. Both have tradeoffs. The right choice depends on your scale and where bottlenecks actually live.

"We went full platform team and became a ticket queue. Topic creation took 5 days. Product teams started spinning up shadow clusters. Adding self-service with guardrails fixed everything: delivery velocity tripled."

Platform Lead at a fintech

Platform Team Model

A dedicated Kafka team owns everything: cluster operations, topic provisioning, schema management, security, monitoring, and on-call. Product teams submit requests; the platform team executes.

Where it works: Kafka serves 10+ teams. Compliance requires centralized audit trails. Junior engineers on product teams don't need to understand broker internals.

Where it breaks: The platform team becomes a bottleneck. Every topic creation, ACL change, and schema update becomes a ticket. Product teams wait days for simple changes.

Symptom | Root Cause
Multi-day wait for new topics | Platform team at capacity
Shadow clusters appearing | Teams bypassing the bottleneck
Platform team burnout | Support overhead exceeds engineering

Embedded Model

Each product team has 1-2 engineers with Kafka expertise. Teams operate independently, making their own architectural decisions. No centralized authority.

Where it works: Startups and early adoption. 2-3 teams using Kafka for distinct use cases. Maximum velocity and experimentation.

Where it breaks: One team creates 1000-partition topics because nobody stopped them. Another skips encryption. When your embedded expert leaves, the team loses years of context.

Symptom | Root Cause
Wildly inconsistent configurations | No standardization
Security policy violations | No central enforcement
Duplicate topics with similar data | Teams unaware of each other

The Hybrid: Platform + Self-Service

Most mature organizations land here. A platform team sets guardrails and builds self-service. Product teams operate within those boundaries autonomously.

Platform team owns: Cluster provisioning, security policies, disaster recovery, tooling, cost management.

Product teams own: Topic creation (within naming conventions), schema design (within compatibility rules), monitoring their applications.

The platform team's job shifts from executing requests to building capabilities that let product teams self-serve safely.

Implementation: A Gateway layer intercepts all Kafka traffic. Product teams create topics freely, but the Gateway enforces: replication factor must be 3, partition count under 50 without approval, topic names follow conventions, sensitive fields get encrypted automatically. Self-service portals let teams request resources within guardrails.

No ticket queue. Product teams move fast. Platform team focuses on capabilities.
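
The guardrails listed above can be sketched as a small policy check that runs before a topic is created. This is a minimal illustration, not how any particular gateway implements it: the `TopicRequest` type, the `approved` flag, and the `<domain>.<team>.<event>` naming pattern are assumptions made for the example. Only the replication factor of 3 and the 50-partition threshold come from the text. Automatic field encryption is out of scope here.

```python
import re
from dataclasses import dataclass

# From the article: replication factor must be 3, and partition counts
# must stay under 50 unless someone has approved more.
REQUIRED_REPLICATION_FACTOR = 3
PARTITION_APPROVAL_THRESHOLD = 50

# Assumption: a hypothetical naming convention of three dot-separated,
# lowercase segments, e.g. "payments.checkout.order-created".
TOPIC_NAME_PATTERN = re.compile(r"[a-z][a-z0-9-]*(\.[a-z][a-z0-9-]*){2}")


@dataclass(frozen=True)
class TopicRequest:
    """Hypothetical shape of a self-service topic request."""
    name: str
    partitions: int
    replication_factor: int
    approved: bool = False  # set by the platform team for exceptions


def validate_topic_request(req: TopicRequest) -> list[str]:
    """Return a list of guardrail violations; an empty list means allowed."""
    violations = []
    if req.replication_factor != REQUIRED_REPLICATION_FACTOR:
        violations.append(
            f"replication factor must be {REQUIRED_REPLICATION_FACTOR}, "
            f"got {req.replication_factor}"
        )
    if req.partitions >= PARTITION_APPROVAL_THRESHOLD and not req.approved:
        violations.append(
            f"{req.partitions} partitions requires approval "
            f"(limit without approval: under {PARTITION_APPROVAL_THRESHOLD})"
        )
    if not TOPIC_NAME_PATTERN.fullmatch(req.name):
        violations.append(f"topic name '{req.name}' does not follow the convention")
    return violations


if __name__ == "__main__":
    ok = TopicRequest("payments.checkout.order-created", partitions=12, replication_factor=3)
    bad = TopicRequest("Orders", partitions=1000, replication_factor=1)
    print(validate_topic_request(ok))
    for violation in validate_topic_request(bad):
        print("rejected:", violation)
```

Returning every violation at once, rather than failing on the first, lets a product team fix a request in one pass instead of resubmitting repeatedly.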

When to Transition

Signals to centralize (from embedded):

  • 5+ teams using Kafka independently
  • Repeated operational incidents across teams
  • Cross-team data sharing becomes common

Signals to add self-service (from platform):

  • Topic creation backlog exceeds 1 week
  • Platform team spends >50% time on requests
  • Product teams creating shadow infrastructure
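
The two signal lists above can be turned into a periodic self-check. The sketch below is one possible encoding: the `KafkaOrgSnapshot` fields and function names are hypothetical, and how each value is measured is left to you. The thresholds (5+ teams, a backlog over 1 week, more than 50% of time on requests) are taken from the lists.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KafkaOrgSnapshot:
    """Hypothetical inputs describing how Kafka is run today."""
    teams_using_kafka: int
    repeated_incidents_across_teams: bool
    cross_team_sharing_is_common: bool
    topic_backlog_days: float
    platform_time_on_requests: float  # fraction between 0.0 and 1.0
    shadow_infrastructure_seen: bool


def signals_to_centralize(s: KafkaOrgSnapshot) -> list[str]:
    """Signals that an embedded setup should move toward a platform team."""
    signals = []
    if s.teams_using_kafka >= 5:
        signals.append("5+ teams using Kafka independently")
    if s.repeated_incidents_across_teams:
        signals.append("repeated operational incidents across teams")
    if s.cross_team_sharing_is_common:
        signals.append("cross-team data sharing is common")
    return signals


def signals_to_add_self_service(s: KafkaOrgSnapshot) -> list[str]:
    """Signals that a platform team should add self-service."""
    signals = []
    if s.topic_backlog_days > 7:
        signals.append("topic creation backlog exceeds 1 week")
    if s.platform_time_on_requests > 0.5:
        signals.append("platform team spends >50% of its time on requests")
    if s.shadow_infrastructure_seen:
        signals.append("product teams creating shadow infrastructure")
    return signals


if __name__ == "__main__":
    snapshot = KafkaOrgSnapshot(
        teams_using_kafka=12,
        repeated_incidents_across_teams=False,
        cross_team_sharing_is_common=True,
        topic_backlog_days=10,
        platform_time_on_requests=0.6,
        shadow_infrastructure_seen=True,
    )
    print("centralize:", signals_to_centralize(snapshot))
    print("add self-service:", signals_to_add_self_service(snapshot))
```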

Scaling Guidelines

Clusters | Model | Platform Team Size
1-3 | Embedded | 0 (product teams own)
4-10 | Hybrid | 2-4 engineers
10-50 | Platform + Self-Service | 5-8 engineers
50+ | Platform + CoE | 8-15 engineers

The organizations that struggle most never deliberately chose a model; they just grew into whatever pattern emerged from early decisions.
