Kafka Data Products: Beyond Topics

Kafka data products need contracts, ownership, SLAs, and discovery. Topics with documentation aren't data products—they're shared state.

Stéphane Derosiaux · February 3, 2026

A README doesn't make a topic a product.

Most "data products" are Kafka topics with Confluence documentation explaining what they contain. This is better than nothing, but it's not a product. Real products have owners who are accountable, SLAs that define reliability guarantees, data contracts that prevent breaking changes, and discovery mechanisms so teams know they exist.

Real data products answer: Who owns this? What's the guaranteed uptime? What happens if schema changes break my consumer? Can I discover this without asking in Slack? If these questions don't have answers, you don't have a data product—you have a topic with aspirational documentation.

The shift to data products changes organizational dynamics. Instead of "the platform team owns all Kafka infrastructure and everyone files tickets," individual teams own specific data products and are accountable for their reliability, quality, and evolution. Platform teams provide infrastructure and guardrails; product teams provide data.

What Makes a Topic a Product

Ownership means a team is accountable for data quality, availability, and evolution. Not "someone knows about this topic," but "this team's quarterly goals include maintaining this data product, and they're on-call when it breaks."

Ownership manifests in operational reality: when consumers report issues, they know who to contact. When schema changes are needed, there's a process for requesting them. When downtime happens, the owning team is responsible for resolution and postmortems.

SLAs define reliability guarantees: uptime percentage, maximum lag for real-time topics, data freshness guarantees. These aren't aspirational—they're measured and reported.

Example SLA: "orders-created topic maintains 99.9% availability (measured by producer success rate) with p99 produce latency under 100ms. Messages are available to consumers within 5 seconds of production."

If the topic violates SLA (availability drops to 98%, latency spikes to 500ms), the owning team investigates and remediates. SLAs create accountability.

Data contracts enforce backward compatibility through schema validation. Producers can't deploy schema changes that break existing consumers. This isn't documentation ("we try not to break consumers")—it's enforcement through Schema Registry compatibility modes.

When a team wants to evolve a data product's schema, they follow a defined process: register the new schema, verify compatibility, coordinate with consuming teams if breaking changes are unavoidable, and migrate consumers before removing old fields.
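The "verify compatibility" step can be made concrete. Below is a minimal sketch of what a BACKWARD compatibility check means for a flat record schema, mirroring the rule Schema Registry enforces for Avro (real validation also covers type promotion, unions, and nesting, which this toy version omits):

```python
# Simplified BACKWARD-compatibility check over flat field definitions.
# BACKWARD: a consumer on the *new* schema can read data written with
# the *old* schema. That holds when every field the new schema adds
# carries a default, and shared fields keep their type.
def is_backward_compatible(old_fields: dict, new_fields: dict) -> bool:
    for name, spec in new_fields.items():
        if name not in old_fields:
            if "default" not in spec:
                return False  # new required field: old data can't satisfy it
        elif spec["type"] != old_fields[name]["type"]:
            return False  # type change breaks deserialization
    return True  # fields deleted in the new schema are simply ignored


v1 = {"orderId": {"type": "string"}, "amount": {"type": "double"}}

# Adding a field WITH a default is safe; without one, old data breaks.
v2_ok = {**v1, "currency": {"type": "string", "default": "USD"}}
v2_bad = {**v1, "currency": {"type": "string"}}

print(is_backward_compatible(v1, v2_ok))   # True
print(is_backward_compatible(v1, v2_bad))  # False
```

This is why "add fields with defaults, never repurpose types" is the standard evolution guidance: it keeps every change within the compatibility mode's guarantees.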

Discovery means teams can find data products without institutional knowledge. A searchable catalog shows: what topics exist, what data they contain, who owns them, what schemas they use, who's consuming them.

Without discovery, teams duplicate effort: "We need order data. Let me create orders-v2 because I didn't know orders-processed already exists." With discovery, teams search for "orders" and find existing topics before creating duplicates.

Quality metrics measure data product health through data quality policies: schema conformance (what percentage of messages match registered schema?), completeness (are required fields always populated?), timeliness (how fresh is the data?), consumer satisfaction (are consumers experiencing issues?).

These metrics are reported to product owners, who are accountable for maintaining quality standards.

Product Thinking for Streaming Data

Product thinking means treating consumers as customers whose success depends on your data product.

Consumer-driven SLAs reflect consumer requirements. If downstream fraud detection needs real-time data (sub-second latency), the orders-created topic commits to p99 latency under 500ms. If analytics needs historical data (batch processing), the SLA emphasizes retention and completeness over latency.

Different consumers have different requirements. Product ownership means understanding those requirements and committing to meet them.

Versioning and deprecation communicate changes transparently. When schema evolution requires breaking changes, versioning provides migration paths. Schema v1 continues for legacy consumers while v2 becomes available for new consumers. Deprecation timelines give consumers months to migrate, not days.

Clear communication: "Schema v1 will be deprecated in Q3 2026. All consumers must migrate to v2 by August 1. Migration guide available here."

Usage analytics show how data products are consumed: which teams consume which topics, what their lag patterns look like, whether they're actively reading or subscriptions are stale. This informs product decisions: if 5 teams consume a topic, schema changes require coordination. If zero teams consume it, the topic might be deprecated.

Consumer feedback loops capture quality issues. When consumers encounter problems (malformed messages, unexpected nulls, late data), they file issues with the product team. The product team investigates, fixes root causes, and reports resolution.

This mimics software product support: customers report bugs, product teams fix them, quality improves over time.

Ownership Patterns and Application Catalog

Ownership at scale requires automation. Manually tracking "team X owns topics Y and Z" doesn't scale to hundreds of topics.

Pattern-based ownership uses naming conventions to assign ownership automatically: topics matching orders.* are owned by the orders team, topics matching inventory.* by the inventory team. When a new orders.shipment-confirmed topic is created, ownership applies automatically.

Application catalog defines which applications own which topics and what permissions they need. An application is a logical grouping: "orders-service" owns orders.* topics, produces to them, and consumes from inventory.* topics.

From application definitions, the platform auto-generates:

  • ACLs (orders-service can write to orders.*, read from inventory.*)
  • Ownership metadata (orders team owns orders.*)
  • Service account mappings (orders-service-prod service account represents this application)

Application-level ownership scales better than topic-level because it groups related resources under single ownership.
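The auto-generation step can be sketched as a pure transformation from a declarative application definition to ACL entries (the application shape below is an illustration of the idea, not Conduktor's actual format):

```python
# Hypothetical declarative application catalog entry.
APPLICATIONS = {
    "orders-service": {
        "service_account": "orders-service-prod",
        "owns": ["orders.*"],        # produces to its own topics
        "consumes": ["inventory.*"],  # reads another team's product
    },
}

def generate_acls(apps: dict) -> list[dict]:
    """Derive WRITE/READ ACL entries from application definitions."""
    acls = []
    for spec in apps.values():
        principal = spec["service_account"]
        for pattern in spec["owns"]:
            acls.append({"principal": principal, "pattern": pattern,
                         "operation": "WRITE"})
        for pattern in spec["consumes"]:
            acls.append({"principal": principal, "pattern": pattern,
                         "operation": "READ"})
    return acls

for acl in generate_acls(APPLICATIONS):
    print(acl)
```

Because ACLs, ownership metadata, and service-account mappings all derive from one definition, they can't drift apart the way hand-maintained ACL lists do.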

SLA Monitoring and Reporting

SLAs are meaningless without measurement. Data product owners need dashboards showing: are we meeting SLA commitments?

Availability SLA measures producer success rate. If SLA commits to 99.9% availability, dashboards show actual availability over rolling 30 days. Falling below 99.9% triggers investigation.

Latency SLA measures p99 produce latency. If SLA commits to p99 under 100ms, dashboards track actual p99. Exceeding 100ms for sustained periods indicates SLA breach.

Freshness SLA measures time from event occurrence to message availability. For real-time topics, freshness might be seconds. For batch topics, hours. Dashboards show actual freshness vs. SLA.

Consumer lag SLA commits to maximum lag for real-time consumers. If SLA states "consumers should lag by less than 1000 messages," monitoring tracks consumer lag and alerts when exceeded.

SLA reports go to product owners monthly, showing compliance percentage. SLA breaches trigger root cause analysis and remediation.
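The availability and latency checks above boil down to simple arithmetic over collected samples. A minimal sketch, with thresholds matching the example SLA earlier in the post (the sample data is invented for illustration):

```python
# Availability from producer success/failure counts; p99 latency via
# nearest-rank percentile over observed produce latencies (ms).
def availability(successes: int, failures: int) -> float:
    total = successes + failures
    return 100.0 * successes / total if total else 100.0

def p99(latencies_ms: list[float]) -> float:
    ordered = sorted(latencies_ms)
    idx = max(0, int(len(ordered) * 0.99) - 1)  # nearest-rank p99
    return ordered[idx]

SLA = {"availability_pct": 99.9, "p99_latency_ms": 100.0}

avail = availability(successes=999_500, failures=500)  # 99.95%
lat = p99([12.0] * 985 + [250.0] * 15)  # 1.5% of produces in a slow tail

print("availability", "OK" if avail >= SLA["availability_pct"] else "BREACH")
print("p99 latency", "OK" if lat <= SLA["p99_latency_ms"] else "BREACH")
```

Note how a 1.5% slow tail breaches a p99 target even while availability stays green: the two SLAs catch different failure modes, which is why both are tracked.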

Measuring Data Product Health

Health metrics extend beyond infrastructure (broker CPU, disk usage) to data quality and consumer satisfaction.

Schema conformance measures percentage of messages matching registered schema. Target: 100%. If 5% of messages fail schema validation, something is producing malformed data.

Completeness measures whether required fields are populated. If schema defines userId as required, what percentage of messages actually include it? Gaps indicate data quality issues.
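Conformance and completeness are both batch-level percentages. A sketch of the computation, using the userId example from the text (the stand-in validation here only checks required keys; real conformance checking validates against the registered schema):

```python
# Required fields, echoing the "userId is required" example.
REQUIRED_FIELDS = {"userId", "orderId"}

def quality_metrics(messages: list[dict]) -> dict:
    """Conformance: required keys present. Completeness: and non-null."""
    total = len(messages)
    conformant = sum(1 for m in messages if REQUIRED_FIELDS <= m.keys())
    complete = sum(
        1 for m in messages
        if all(m.get(f) is not None for f in REQUIRED_FIELDS)
    )
    return {
        "conformance_pct": 100.0 * conformant / total,
        "completeness_pct": 100.0 * complete / total,
    }

batch = [
    {"userId": "u1", "orderId": "o1"},
    {"userId": None, "orderId": "o2"},  # present but null: incomplete
    {"orderId": "o3"},                  # missing userId: non-conformant
    {"userId": "u4", "orderId": "o4"},
]
print(quality_metrics(batch))  # {'conformance_pct': 75.0, 'completeness_pct': 50.0}
```

The distinction matters operationally: conformance failures point at a broken producer, while completeness gaps often point at upstream business logic emitting nulls.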

Consumer satisfaction surveys consuming teams quarterly: does this data product meet your needs? Are SLAs appropriate? What improvements would help?

Low satisfaction indicates quality or documentation issues. Product teams use feedback to improve.

Topic sprawl measures how many similar topics exist. If three teams created orders-v1, orders-v2, and order-events to serve the same purpose, discovery failed. Consolidating duplicates improves quality and reduces maintenance burden.

Discovery and Collaboration

Data products become useful when teams can find and understand them.

Topic catalog provides search and browsing: search for "customer" and find all topics containing customer data. Filter by team (show me topics owned by analytics team), by schema type (Avro vs. JSON), or by consumer count (most-used topics).

Each catalog entry shows: topic name, description, owning team, schema version, consumer count, SLA commitments, sample messages.

Usage metrics show consumer activity: which teams consume this topic, what's their lag, when did they start consuming. This reveals usage patterns and helps coordinate changes.

Approval workflows for cross-team access formalize data sharing through partner zones. When team B wants to consume team A's data product, they request access. Team A reviews (is this appropriate use? Should we mask PII fields?), approves, and ACLs generate automatically.

The workflow creates audit trails showing who accessed data, when, and why. Compliance teams can report on data sharing without manual investigation.

Product Lifecycle Management

Data products have lifecycles: development, production, deprecation, retirement.

Development phase: Topic exists but isn't production-ready. Schema is unstable, SLAs aren't committed, consumers are limited to the owning team. This allows iteration without breaking downstream teams.

Production phase: Topic is stable, SLAs are committed, schema changes follow compatibility rules. Other teams can depend on this data product with confidence that breaking changes won't surprise them.

Deprecation phase: Topic is marked for retirement. New consumers are discouraged, existing consumers are notified to migrate to alternatives, deprecation timeline is communicated (typically 6+ months).

Retirement phase: Topic is deleted after all consumers have migrated. Ownership team confirms zero active consumers before deletion.

Lifecycle management prevents topics from living indefinitely without purpose. Unused topics accumulate, waste storage, and create maintenance burden.

The Path Forward

Kafka data products shift streaming data from infrastructure (topics that happen to exist) to products (intentionally designed, owned, and SLA-backed interfaces that teams depend on).

Conduktor enables data products through application catalogs that define ownership, topic discovery that prevents duplication, SLA monitoring that measures reliability, and approval workflows that formalize data sharing. Organizations report 3,500+ hours saved annually through reduced duplication and clear ownership.

If your "data products" are topics with READMEs but no ownership, SLAs, or discovery, the problem isn't Kafka—it's treating data as infrastructure instead of product.


Related: What is a Data Mesh? → · Kafka Data Sharing → · Federated Ownership →