Kafka Platform: Build vs Buy Decision Framework

A framework for evaluating self-managed Kafka against managed services such as Confluent Cloud and Amazon MSK, covering TCO analysis, team skills assessment, and decision criteria.

Stéphane Derosiaux · January 26, 2024

The build vs buy decision for Kafka isn't about technology. It's about where you want to invest engineering attention.

I've helped organizations make this decision dozens of times. The ones who get it right focus on organizational capacity, not feature comparisons. The ones who get it wrong underestimate operational cost by 3-5x.

"We thought self-managed Kafka would cost us $15K/month in infrastructure. Two years later, we realized the real cost was $80K/month when you count the three engineers who spent half their time on Kafka operations."

VP Engineering at a SaaS company

What You're Actually Buying

When you "buy" managed Kafka, you're purchasing operational capacity. When you "build," you're committing engineering time to infrastructure instead of product.

Self-managed means owning: cluster provisioning, broker tuning, security (SSL, SASL, ACLs), monitoring, alerting, incident response, upgrades, patching, disaster recovery. You'll also need chargeback and cost visibility to track consumption across teams.

Managed services abstract away: infrastructure provisioning, broker lifecycle, basic monitoring, upgrades.

They don't solve: application-level governance, cross-team access control, schema evolution strategy, cost optimization for your workloads.

TCO: What Most Organizations Miss

The license cost of Apache Kafka is zero. The operational cost is not.

| Self-Managed Cost | Monthly Range |
| Infrastructure (3-broker prod) | $3,000–$15,000 |
| Engineering time (1 FTE) | $15,000–$25,000 |
| Monitoring stack | $500–$2,000 |
| DR infrastructure | 80–100% of primary |

Conservative estimate: $20,000–$50,000/month for production-grade with proper DR.
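
The line items above can be summed directly. The sketch below is a minimal illustration, assuming that "80–100% of primary" applies to the infrastructure line only and that every item lands at the same end of its range; the function name and the `ftes` parameter are hypothetical.

```python
# Monthly cost ranges in USD, taken from the table above as (low, high).
INFRASTRUCTURE = (3_000, 15_000)      # 3-broker production cluster
ENGINEERING_1_FTE = (15_000, 25_000)  # one full-time engineer
MONITORING = (500, 2_000)
DR_PERCENT_OF_PRIMARY = (80, 100)     # assumption: DR is sized against infrastructure only


def self_managed_monthly_range(ftes=1):
    """Return (low, high) monthly cost in USD for a given whole number of FTEs."""
    totals = []
    for i in (0, 1):  # 0 = low end of each range, 1 = high end
        infra = INFRASTRUCTURE[i]
        dr = infra * DR_PERCENT_OF_PRIMARY[i] // 100
        people = ENGINEERING_1_FTE[i] * ftes
        totals.append(infra + dr + people + MONITORING[i])
    return tuple(totals)


low, high = self_managed_monthly_range()
print(f"1 FTE: ${low:,} to ${high:,} per month")  # 1 FTE: $20,900 to $57,000 per month
```

Under these assumptions the upper bound lands somewhat above the rounded estimate, because it treats every line item as hitting its maximum at once. Changing `ftes` shows how quickly engineering time, not infrastructure, dominates the total.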
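
| Managed Service Cost | Monthly Range |
| Service fee | $3,000–$30,000 |
| Data transfer | $500–$5,000 |
| Add-on features | $1,000–$10,000 |

Data transfer breakdown: Intra-AZ is free. Cross-AZ costs $0.01/GB. Cross-region costs $0.02/GB. Internet egress runs $0.09/GB. At these rates, 10 TB/month of cross-AZ traffic costs only about $100, but replication and consumer fan-out multiply the bytes that cross zones, and the bill scales linearly: 1 PB/month of cross-AZ traffic adds about $10,000/month in transfer fees alone, which can exceed the service fee.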
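
To sanity-check a transfer estimate, the quoted per-GB rates can be applied to your own traffic volumes. This is a rough sketch, assuming decimal units (1 TB = 1,000 GB) and flat rates with no tiering; the path names and the function are hypothetical, and real cloud bills may meter traffic differently.

```python
# Per-GB transfer rates in USD, as quoted in the paragraph above.
RATE_PER_GB = {
    "intra_az": 0.00,
    "cross_az": 0.01,
    "cross_region": 0.02,
    "internet_egress": 0.09,
}


def monthly_transfer_cost(gb_by_path):
    """Sum transfer charges for a dict of {path: GB moved per month}."""
    total = sum(RATE_PER_GB[path] * gb for path, gb in gb_by_path.items())
    return round(total, 2)


# Decimal units: 1 TB = 1,000 GB, 1 PB = 1,000,000 GB.
print(monthly_transfer_cost({"cross_az": 10_000}))     # 10 TB cross-AZ -> 100.0
print(monthly_transfer_cost({"cross_az": 1_000_000}))  # 1 PB cross-AZ -> 10000.0
```

The GB figure to plug in is the volume that actually crosses a zone boundary, which is usually a multiple of the produced volume once replication and multiple consumer groups are counted.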

Managed services appear cheaper at small scale. Self-managed becomes economical at high, stable throughput, but only if you already have the team.

Team Requirements: The Honest Assessment

To run Kafka in production responsibly, you need 2 FTEs minimum for a single cluster: one Kafka admin, 0.5 platform engineer, 0.5 on-call coverage.

For enterprise-grade multi-cluster deployments: 4 FTEs.

Honest check: If your team can't explain the difference between session.timeout.ms and max.poll.interval.ms without looking it up, you're not ready to self-manage.
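
For reference, the two settings guard against different failures. `session.timeout.ms` bounds how long the group coordinator waits without heartbeats before presuming the consumer dead; `max.poll.interval.ms` bounds the gap between `poll()` calls before a live but stalled consumer leaves the group. The sketch below is a simplified model of that distinction, not Kafka client code: the property names are real consumer settings, the values are the defaults in recent Apache Kafka releases (3.0 and later), and `eviction_reason` is a hypothetical helper.

```python
# Real Kafka consumer property names; values are defaults in Apache Kafka 3.0+.
# Check the documentation for the version you actually run.
CONSUMER_CONFIG = {
    "session.timeout.ms": 45_000,     # no heartbeat for this long: consumer presumed dead
    "heartbeat.interval.ms": 3_000,   # how often the background thread sends heartbeats
    "max.poll.interval.ms": 300_000,  # max gap between poll() calls before leaving the group
}


def eviction_reason(ms_since_heartbeat, ms_since_poll, config=CONSUMER_CONFIG):
    """Hypothetical helper: which timeout, if any, removes the consumer from its group."""
    if ms_since_heartbeat > config["session.timeout.ms"]:
        return "session.timeout.ms: process or network failure, heartbeats stopped"
    if ms_since_poll > config["max.poll.interval.ms"]:
        return "max.poll.interval.ms: process alive but stuck processing a batch"
    return None


# A crashed consumer: no heartbeats for 60 seconds.
print(eviction_reason(ms_since_heartbeat=60_000, ms_since_poll=60_000))
# A live process stuck in a slow handler: heartbeats continue, no poll() for 6 minutes.
print(eviction_reason(ms_since_heartbeat=1_000, ms_since_poll=360_000))
```

Both cases end in a rebalance, but the fixes differ: the first is an availability problem, the second usually means batches are too large or per-record processing is too slow.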

Decision Criteria

| Criterion | Favor Managed | Favor Self-Managed |
| Kafka expertise | No dedicated expertise | Multiple Kafka engineers |
| Time to production | This quarter | 12+ month runway |
| On-call capacity | Can't staff 24x7 | Strong on-call culture |
| Throughput | Variable | Stable, predictable, high |
| Compliance | Standard controls | Requires specific controls |
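
One way to make the criteria above explicit is a simple tally. This is an illustrative sketch only: the criterion keys and the `tally` function are hypothetical, and equal weighting is an assumption that most organizations will want to adjust (a hard compliance requirement, for example, may outweigh everything else).

```python
# Criteria from the table above. True means the "Favor Self-Managed" column
# describes your organization; False means the "Favor Managed" column does.
CRITERIA = [
    "kafka_expertise",
    "time_to_production",
    "on_call_capacity",
    "throughput",
    "compliance",
]


def tally(answers):
    """Count how many criteria point each way. Equal weighting is an assumption."""
    missing = [c for c in CRITERIA if c not in answers]
    if missing:
        raise ValueError(f"unanswered criteria: {missing}")
    self_managed = sum(1 for c in CRITERIA if answers[c])
    return {"self_managed": self_managed, "managed": len(CRITERIA) - self_managed}


example = {
    "kafka_expertise": False,     # no dedicated expertise
    "time_to_production": False,  # needs to ship this quarter
    "on_call_capacity": False,    # can't staff 24x7
    "throughput": True,           # stable, predictable, high
    "compliance": True,           # requires specific controls
}
print(tally(example))  # {'self_managed': 2, 'managed': 3}
```

A split result like this one is common, and it tends to point toward a mixed approach rather than a pure build or pure buy.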

The Hybrid Path

Pure build or pure buy is rare. Most organizations combine:

Managed Kafka + Self-Managed Tooling: Confluent Cloud or MSK for brokers, your own governance layer on top.

Multi-Tier: Managed for dev and non-critical. Self-managed for production where you need maximum control.

Red Flags

Self-managed is failing when: Upgrades are perpetually deferred. Incidents escalate to the same 1-2 people. Developers avoid Kafka because "it's too hard."

Managed isn't delivering when: The monthly bill keeps surprising you. Support tickets take days to resolve. You can't implement required security controls.

The Real Question

The goal isn't to pick the cheapest option. It's to pick the option that lets your organization move fastest on what matters most.

Where should your engineering attention go? Infrastructure or product?
