Schema Registry Isn't Optional

The "we'll add it later" approach costs weeks of debugging and painful migrations. Schema Registry is day-one infrastructure.

Stéphane Derosiaux · July 13, 2024

"We'll add Schema Registry later, once things stabilize."

It never happens. Six months in, you're debugging a production outage at 2 AM because someone changed a field type. Twelve months in, you're planning a multi-quarter migration to add schemas to 200 topics.

I've watched this pattern repeat at dozens of companies. Schema Registry isn't a nice-to-have. It's the difference between a platform you can trust and a ticking time bomb.

"A producer team changed their serializer and kept producing to the same topic. Every consumer broke. We spent half a day figuring out what happened."

Engineer at ING Bank

The Poison Pill Problem

A poison pill is a message that crashes your consumer every time it tries to process it. The consumer fails, restarts, fetches the same message, fails again. The partition is blocked.

ERROR Failed to deserialize record at offset 847291
Caused by: org.apache.avro.AvroTypeException: Found string, expecting double
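
To make the loop concrete, here is a minimal in-memory simulation, not Kafka client code. The partition is a list of strings, a hypothetical `deserialize` method stands in for the Avro deserializer by parsing each value as a `double`, and the consumer only commits after a record is processed successfully. All names and values are illustrative assumptions.

```java
import java.util.List;

// Toy model of one partition and a commit-after-success consumer; no Kafka involved.
public class PoisonPillDemo {

    // Hypothetical stand-in for the value deserializer: every value must be a double.
    static double deserialize(String raw) {
        return Double.parseDouble(raw); // throws NumberFormatException for "12.50 EUR"
    }

    // Runs up to maxPolls poll/process cycles and returns the committed offset.
    static int run(List<String> partition, int maxPolls) {
        int committed = 0; // offset of the next record to fetch
        for (int poll = 0; poll < maxPolls && committed < partition.size(); poll++) {
            String raw = partition.get(committed); // each poll resumes at the committed offset
            try {
                deserialize(raw);
                committed++; // success: commit and move past the record
            } catch (NumberFormatException e) {
                // Failure: nothing is committed, so the next poll fetches the same record.
                System.out.println("Failed to deserialize record at offset " + committed);
            }
        }
        return committed;
    }

    public static void main(String[] args) {
        List<String> partition = List.of("10.0", "12.5", "12.50 EUR", "7.25");
        System.out.println("Committed offset after 10 polls: " + run(partition, 10));
    }
}
```

In this sketch the committed offset stays at 2 however many polls you allow, so the valid record at offset 3 is never reached.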

At 10x Banking, a Kafka consumer got stuck on a bad record. The consumer's committed offset for that partition never advanced. Users couldn't open bank accounts until the retention period expired and the record aged out.

Both teams had Avro schemas defined somewhere. But without Schema Registry enforcing compatibility, producers could publish anything.

The Excuses Don't Hold Up

"We're moving fast" — Schema Registry adds one HTTP call per unique schema, then caches. What actually slows you down: debugging serialization errors across 15 services because someone renamed a field.
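
The "one HTTP call per unique schema, then caches" behavior can be modeled as a memoized lookup. This is a minimal sketch, not the Confluent client: `SchemaIdCache`, `idFor`, and `fakeRegistry` are hypothetical names, and the fake registry function stands in for the real HTTP request.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical client-side cache: the first lookup of a schema goes to the
// registry (here a fake function), every later lookup is served from memory.
public class SchemaIdCache {
    private final Map<String, Integer> idsBySchema = new HashMap<>();
    private final Function<String, Integer> registry;
    private int remoteCalls = 0;

    public SchemaIdCache(Function<String, Integer> registry) {
        this.registry = registry;
    }

    public int idFor(String schemaJson) {
        Integer cached = idsBySchema.get(schemaJson);
        if (cached != null) {
            return cached; // cache hit: no network round trip
        }
        remoteCalls++; // cache miss: one "HTTP" call to the registry
        int id = registry.apply(schemaJson);
        idsBySchema.put(schemaJson, id);
        return id;
    }

    public int remoteCalls() {
        return remoteCalls;
    }

    // Fake registry that assigns sequential IDs to schemas it has not seen before.
    public static Function<String, Integer> fakeRegistry() {
        Map<String, Integer> registered = new HashMap<>();
        return schema -> {
            Integer id = registered.get(schema);
            if (id == null) {
                id = registered.size() + 1;
                registered.put(schema, id);
            }
            return id;
        };
    }

    public static void main(String[] args) {
        SchemaIdCache cache = new SchemaIdCache(fakeRegistry());
        String schemaA = "{\"type\":\"string\"}";
        String schemaB = "{\"type\":\"long\"}";
        for (int i = 0; i < 1000; i++) cache.idFor(schemaA); // 1,000 records, one schema
        for (int i = 0; i < 500; i++) cache.idFor(schemaB);  // 500 records, a second schema
        System.out.println("Remote calls: " + cache.remoteCalls()); // prints "Remote calls: 2"
    }
}
```

Fifteen hundred records cost two lookups; the per-record cost after that is a hash map read.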

"Our data is too dynamic" — If your data is truly dynamic, analytics can't query it and ML models can't train on it. "Dynamic" usually means "we haven't agreed on a contract yet."

"We'll add it when we scale" — Migration cost scales too. Adding schemas to 3 topics is a day. Adding schemas to 200 topics with existing consumers is a quarter of coordination.

What Actually Happens

Silent breaking changes: A producer adds a required field. Downstream consumers keep working until their next deploy, then crash on a three-week-old change nobody remembered.

Schema drift: Multiple producers write to one topic. Over time, each drifts:

  • Producer A uses timestamp as epoch milliseconds
  • Producer B uses ISO-8601 string
  • Producer C adds a new field and deprecates timestamp

Consumers must handle all three formats.
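
Here is a sketch of the defensive code each consumer ends up carrying once the three producers drift. Messages are assumed to be already decoded into maps, and `event_time_ms` is a hypothetical name for Producer C's new field; the drift description above does not name it.

```java
import java.time.Instant;
import java.util.Map;
import java.util.Optional;

// Consumer-side workaround for three drifted timestamp formats.
// "event_time_ms" is a hypothetical name for Producer C's replacement field.
public class DriftedTimestamp {

    static Optional<Instant> eventTime(Map<String, ?> record) {
        Object newField = record.get("event_time_ms");
        if (newField instanceof Number) { // Producer C: new field, checked first
            return Optional.of(Instant.ofEpochMilli(((Number) newField).longValue()));
        }
        Object ts = record.get("timestamp");
        if (ts instanceof Number) { // Producer A: epoch milliseconds
            return Optional.of(Instant.ofEpochMilli(((Number) ts).longValue()));
        }
        if (ts instanceof String) { // Producer B: ISO-8601 string (throws if malformed)
            return Optional.of(Instant.parse((String) ts));
        }
        return Optional.empty(); // none of the known shapes
    }

    public static void main(String[] args) {
        System.out.println(eventTime(Map.of("timestamp", 1720828800000L)));
        System.out.println(eventTime(Map.of("timestamp", "2024-07-13T00:00:00Z")));
        System.out.println(eventTime(Map.of("event_time_ms", 1720828800000L)));
    }
}
```

All three calls resolve to the same instant, but every consumer has to know all three shapes, and a malformed ISO string still throws, which is another route to a poison pill.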

Compliance nightmare: Auditors ask what personal data flows through your topics. Without schemas, you sample messages manually and hope you don't miss anything. Centralized schema management provides the audit trail regulators expect.

Schema Evolution Works

Compatibility | What you can do
--- | ---
BACKWARD | Delete fields; add optional fields (with defaults)
FORWARD | Add fields; delete optional fields (with defaults)
FULL | Add or delete optional fields (with defaults)

// Version 1
{"type": "record", "name": "Order", "fields": [
  {"name": "orderId", "type": "string"},
  {"name": "amount", "type": "double"}
]}

// Version 2 (backward and forward compatible)
{"type": "record", "name": "Order", "fields": [
  {"name": "orderId", "type": "string"},
  {"name": "amount", "type": "double"},
  {"name": "currency", "type": "string", "default": "USD"}
]}

Consumers still on Version 1 can read Version 2 messages; they ignore the new field. Consumers on Version 2 can read older Version 1 messages; currency falls back to "USD". No coordination required.
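
The rules in the table can be modeled in a few lines. Schema Registry performs the real check on the server; this is only a deliberately simplified sketch for flat records that compares field names, primitive type names, and whether a default exists. `CompatSketch`, `Field`, and `canRead` are hypothetical names.

```java
import java.util.List;

// Simplified model of Avro schema resolution for flat records. Real checks also
// handle type promotion, unions, aliases and nested types; this one does not.
public class CompatSketch {

    public static final class Field {
        final String name;
        final String type;
        final boolean hasDefault;

        public Field(String name, String type, boolean hasDefault) {
            this.name = name;
            this.type = type;
            this.hasDefault = hasDefault;
        }
    }

    // Can a consumer using `reader` decode data written with `writer`?
    public static boolean canRead(List<Field> reader, List<Field> writer) {
        for (Field r : reader) {
            Field w = writer.stream().filter(f -> f.name.equals(r.name)).findFirst().orElse(null);
            if (w == null && !r.hasDefault) return false;          // missing, nothing to fall back on
            if (w != null && !w.type.equals(r.type)) return false; // same name, different type
        }
        return true; // fields only the writer has are skipped by the reader
    }

    // BACKWARD: the new schema can read data written with the old one.
    public static boolean backward(List<Field> newer, List<Field> older) { return canRead(newer, older); }

    // FORWARD: the old schema can read data written with the new one.
    public static boolean forward(List<Field> newer, List<Field> older) { return canRead(older, newer); }

    public static boolean full(List<Field> newer, List<Field> older) {
        return backward(newer, older) && forward(newer, older);
    }

    public static void main(String[] args) {
        List<Field> v1 = List.of(new Field("orderId", "string", false),
                                 new Field("amount", "double", false));
        List<Field> v2 = List.of(new Field("orderId", "string", false),
                                 new Field("amount", "double", false),
                                 new Field("currency", "string", true));
        System.out.println("V2 vs V1: backward=" + backward(v2, v1)
                + ", forward=" + forward(v2, v1) + ", full=" + full(v2, v1));
    }
}
```

For the `Order` example all three checks come out true. Making `currency` required (no default) flips BACKWARD to false while FORWARD stays true; deleting `amount` does the opposite, which matches the table.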

Setup Takes 15 Minutes

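schema-registry:
  image: confluentinc/cp-schema-registry:7.6.0
  depends_on: [broker]
  ports: ["8081:8081"]
  environment:
    SCHEMA_REGISTRY_HOST_NAME: schema-registry
    SCHEMA_REGISTRY_KAFKASTORE_BOOTSTRAP_SERVERS: 'broker:29092'
    SCHEMA_REGISTRY_LISTENERS: http://0.0.0.0:8081

Then point the producer at the registry:

Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092"); // assumes the broker is exposed on 9092
props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
props.put("value.serializer", "io.confluent.kafka.serializers.KafkaAvroSerializer");
props.put("schema.registry.url", "http://localhost:8081");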

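Your producer now registers or looks up its schema before sending. With auto.register.schemas at its default (true), a schema that violates the subject's compatibility rule is rejected by the registry at produce time, not discovered at consume time.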

The Migration Tax

Teams who skip Schema Registry early pay later:

  1. Audit all topics—what formats are actually in production?
  2. Reverse-engineer schemas from sampled messages
  3. Handle in-flight data without schema IDs
  4. Coordinate producer upgrades before consumers can enforce
  5. Fix schemas that evolved incompatibly
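
Steps 2 and 5 usually start with something like the sketch below: take sampled messages (assumed here to be already decoded into maps, for example from JSON), record which value types each field appears with, and flag fields that are missing from some samples. `SampleAudit` and its methods are hypothetical; real tooling would also need nested structures and a much larger sample.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical audit helper for reverse-engineering a schema from samples.
public class SampleAudit {

    // field name -> simple class names observed for it (e.g. "Long", "String")
    public static Map<String, Set<String>> observedTypes(List<Map<String, Object>> samples) {
        Map<String, Set<String>> types = new TreeMap<>();
        for (Map<String, Object> record : samples) {
            for (Map.Entry<String, Object> e : record.entrySet()) {
                Object v = e.getValue();
                String type = (v == null) ? "null" : v.getClass().getSimpleName();
                types.computeIfAbsent(e.getKey(), k -> new TreeSet<>()).add(type);
            }
        }
        return types;
    }

    // Fields seen with more than one type cannot map to a single-typed schema field.
    public static Set<String> conflictingFields(Map<String, Set<String>> types) {
        Set<String> out = new TreeSet<>();
        types.forEach((field, seen) -> {
            if (seen.size() > 1) out.add(field);
        });
        return out;
    }

    // Fields absent from at least one sample would need to be optional with a default.
    public static Set<String> optionalFields(List<Map<String, Object>> samples) {
        Set<String> all = new TreeSet<>();
        samples.forEach(s -> all.addAll(s.keySet()));
        Set<String> out = new TreeSet<>();
        for (String key : all) {
            if (samples.stream().anyMatch(s -> !s.containsKey(key))) out.add(key);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> samples = List.of(
                Map.of("orderId", "a-1", "timestamp", 1720828800000L),
                Map.of("orderId", "a-2", "timestamp", "2024-07-13T00:00:00Z"),
                Map.of("orderId", "a-3", "amount", 12.5));
        Map<String, Set<String>> types = observedTypes(samples);
        System.out.println(types);
        System.out.println("conflicting: " + conflictingFields(types));
        System.out.println("optional:    " + optionalFields(samples));
    }
}
```

On these three samples, `timestamp` is flagged as conflicting (seen as both Long and String), and `timestamp` and `amount` as optional: exactly the kind of finding that turns step 5 into a negotiation between teams.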

Teams at ING Bank and MyHeritage documented migrations spanning months. One team imposed deployment freezes during migration to prevent schema ID misalignment.

Compare that to starting right: schemas are defined as part of service development, new topics get schemas by default, and evolution is controlled from day one.

Start with Schema Registry. Treat compatibility violations as build failures. Your future self, debugging at 2 AM, will thank you.
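
One way to make a compatibility violation fail the build is to ask the registry before deploying. Schema Registry's REST API has an endpoint for this, `POST /compatibility/subjects/{subject}/versions/latest`, which answers with `{"is_compatible": true}` or `false`. The sketch below uses Java's built-in `java.net.http` client; the registry URL, the `orders-value` subject (the default subject name for a topic's values is `<topic>-value`), and the schema file argument are assumptions, the subject is assumed to already have a registered version, and the JSON handling is deliberately crude.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Files;
import java.nio.file.Path;

// Minimal CI gate: exits non-zero if the candidate schema is incompatible with the
// latest registered version of the subject. Assumes the subject already exists.
public class CompatibilityGate {

    // Wraps the schema text in the {"schema": "..."} envelope the registry expects.
    // Escaping covers quotes, backslashes and common whitespace only.
    public static String requestBody(String schemaJson) {
        String escaped = schemaJson
                .replace("\\", "\\\\")
                .replace("\"", "\\\"")
                .replace("\n", "\\n")
                .replace("\r", "\\r")
                .replace("\t", "\\t");
        return "{\"schema\": \"" + escaped + "\"}";
    }

    public static HttpRequest buildRequest(String registryUrl, String subject, String schemaJson) {
        return HttpRequest.newBuilder()
                .uri(URI.create(registryUrl + "/compatibility/subjects/" + subject + "/versions/latest"))
                .header("Content-Type", "application/vnd.schemaregistry.v1+json")
                .POST(HttpRequest.BodyPublishers.ofString(requestBody(schemaJson)))
                .build();
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical usage: java CompatibilityGate src/main/avro/order.avsc
        String schema = Files.readString(Path.of(args[0]));
        HttpRequest request = buildRequest("http://localhost:8081", "orders-value", schema);
        HttpResponse<String> response =
                HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString());
        // Crude check; a real build step would parse the JSON response properly.
        boolean compatible = response.statusCode() == 200
                && response.body().replace(" ", "").contains("\"is_compatible\":true");
        if (!compatible) {
            System.err.println("Schema check failed: " + response.body());
            System.exit(1); // non-zero exit fails the CI job
        }
        System.out.println("Schema is compatible.");
    }
}
```

Run as a build step, an incompatible schema stops the pipeline before any producer ships it.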

Book a demo to see how Conduktor Console provides schema management with compatibility checking and version comparison.