Debezium vs Airbyte: CDC Approaches

Stéphane Derosiaux, July 2, 2026

Debezium is an open-source, log-based Change Data Capture (CDC) platform, most commonly deployed as Kafka Connect source connectors. It captures every row-level change from database transaction logs and streams the changes as events to Kafka topics in real time. Airbyte is an open-source data integration platform that supports polling-based capture and, for selected database sources, log-based CDC, targeting batch or micro-batch data movement primarily to data warehouses and lakes. The core difference: Debezium is a real-time streaming CDC engine designed around Kafka; Airbyte is a broader ELT orchestration platform where CDC is a capture mechanism for incremental database syncs.

TL;DR

Dimension	Debezium	Airbyte
Primary use case	Real-time streaming CDC to Kafka	ELT data integration (batch, micro-batch, CDC)
CDC mechanism	Log-based (transaction log tailing)	Log-based CDC + polling (full refresh, incremental)
Delivery latency	Near real-time (sub-second to seconds)	Micro-batch (minutes) or batch (hours)
Output target	Kafka topics (via Kafka Connect)	Data warehouses, lakes, databases, SaaS tools
Kafka dependency	Required for the Kafka Connect deployment; Debezium Server and the embedded engine can run without Kafka	None (standalone platform)
Deployment	Kafka Connect workers	Airbyte server + workers (Docker / K8s)
Connector count	~30 database sources	Hundreds of sources and destinations
License	Apache 2.0	Elastic License 2.0 (ELv2, source-available)
Managed offering	No official managed service (Red Hat ships a supported build of Debezium)	Airbyte Cloud
State management	Kafka Connect offsets + database-specific (e.g., MySQL binlog position, PostgreSQL LSN)	Airbyte internal state store

What is Debezium?

Debezium is an open-source CDC platform built by Red Hat, designed to tail database transaction logs and emit row-level change events. The most common deployment is as a Kafka Connect source connector: each connector monitors one database instance and translates insert, update, and delete operations into structured events on Kafka topics. Debezium also supports standalone deployment via Debezium Server (which can route to non-Kafka sinks like Kinesis, Pub/Sub, or HTTP) and the embedded engine (library mode for custom applications). Supported databases include PostgreSQL (logical replication / pgoutput), MySQL (binlog), MongoDB (change streams), SQL Server (SQL Server Agent CDC), Oracle (LogMiner), and others.

Because Debezium reads the transaction log rather than polling tables, it captures every change — including deletes — with low latency and minimal database load. Change events are routed to per-table Kafka topics (e.g., dbserver1.public.orders), where downstream consumers (Flink, Kafka Streams, Sink connectors) pick them up.
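As a rough illustration of what a downstream consumer sees, the sketch below parses a change event shaped like Debezium's default envelope (`before`, `after`, `source`, `op`, `ts_ms`) and rebuilds the per-table topic name. The sample payload and the `describe_change` helper are illustrative assumptions; real events carry more metadata and may be wrapped in a schema depending on the converter in use.

```python
import json

# Operation codes used in the Debezium event envelope.
OPS = {"c": "insert", "u": "update", "d": "delete", "r": "snapshot read"}


def describe_change(raw: str) -> dict:
    """Summarize one change event (hypothetical helper, not a Debezium API)."""
    event = json.loads(raw)
    source = event["source"]
    # For deletes, "after" is null, so fall back to the "before" image.
    row = event["after"] if event["after"] is not None else event["before"]
    return {
        "topic": f'{source["name"]}.{source["schema"]}.{source["table"]}',
        "operation": OPS[event["op"]],
        "row": row,
    }


# Hand-written sample event for a deleted row in public.orders.
sample = json.dumps({
    "before": {"id": 42, "status": "pending"},
    "after": None,
    "source": {"name": "dbserver1", "schema": "public", "table": "orders"},
    "op": "d",
    "ts_ms": 1700000000000,
})

print(describe_change(sample))
```

The delete arrives as an explicit event with the last known row image, which is what lets consumers remove the row downstream.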

See Implementing CDC with Debezium and What is Change Data Capture (CDC)? for deeper coverage.

What is Airbyte?

Airbyte is an open-source data integration platform that orchestrates ELT (Extract, Load, Transform) pipelines from hundreds of sources to dozens of destinations. Sources include databases, SaaS APIs (Stripe, Salesforce, GitHub), files (S3, GCS), and event streams. Destinations include data warehouses (Snowflake, BigQuery, Redshift, Databricks), databases, and files.

Airbyte supports two primary sync modes per stream:

Full refresh: extract all records every sync cycle
Incremental: extract records modified since the last sync

For supported database sources (PostgreSQL, MySQL, SQL Server, MongoDB, Oracle), incremental sync can be powered by CDC (log-based capture using an embedded Debezium instance) rather than cursor-based polling. CDC is a capture mechanism, not a separate sync mode — it enables higher-fidelity incremental syncs including deletes.

In CDC mode, Airbyte manages the embedded Debezium lifecycle (snapshot, streaming, offset management) as part of the platform, so you never operate Debezium yourself. Not all sources use Debezium; SaaS and API connectors use polling or webhook-based patterns.

Architecture compared

CDC mechanism

Debezium (log-based, always): Debezium exclusively uses database transaction logs. For PostgreSQL, it uses logical replication (pgoutput or decoderbufs plugin). For MySQL, it reads the binary log (binlog). For MongoDB, it uses change streams. This means:

Changes are captured as they are committed — sub-second latency
Deletes are captured (they appear as events in the log)
No polling queries hit the source database
The database must be configured to enable logical replication / binlog (requires database-level permissions)
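To make the configuration side concrete, here is a minimal sketch that builds the JSON body you would send to the Kafka Connect REST API (`POST /connectors`) to register a Debezium PostgreSQL connector. The property names follow Debezium 2.x conventions (`topic.prefix` replaced `database.server.name` from 1.x); the host, credentials, slot name, and the `postgres_connector_request` helper are placeholders assumed for illustration.

```python
import json


def postgres_connector_request(name: str, host: str, dbname: str, tables: list[str]) -> str:
    """Build a Kafka Connect registration payload (hypothetical helper)."""
    config = {
        "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
        "plugin.name": "pgoutput",            # logical decoding plugin
        "database.hostname": host,
        "database.port": "5432",
        "database.user": "debezium",          # placeholder credentials
        "database.password": "change-me",     # placeholder credentials
        "database.dbname": dbname,
        "topic.prefix": "dbserver1",          # prefix for per-table topics
        "slot.name": "debezium_orders",       # replication slot (assumed name)
        "table.include.list": ",".join(tables),
    }
    return json.dumps({"name": name, "config": config}, indent=2)


print(postgres_connector_request("orders-cdc", "db.internal", "shop", ["public.orders"]))
```

A production configuration would typically also set snapshot and heartbeat options and pull credentials from a secrets provider rather than embedding them.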

Airbyte (log-based CDC + polling): Airbyte connectors can use multiple strategies depending on the source:

CDC mode (available for select connectors): embeds Debezium to read transaction logs; capture is log-based, but changes are delivered on the sync schedule rather than streamed continuously
Incremental cursor: queries WHERE updated_at > last_sync_cursor periodically — typically minutes to hours between syncs; deletes are NOT captured
Full refresh: reads entire table on each sync
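The reason cursor-based incremental syncs miss deletes can be shown with a small simulation. This is a sketch using an in-memory SQLite table, not Airbyte's actual implementation; the `orders` table, the integer `updated_at` column, and the `incremental_sync` function are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT, updated_at INTEGER)")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", [(1, "new", 100), (2, "new", 100)])


def incremental_sync(conn, cursor_value):
    """Return rows changed since the cursor, plus the advanced cursor."""
    rows = conn.execute(
        "SELECT id, status, updated_at FROM orders WHERE updated_at > ? ORDER BY updated_at, id",
        (cursor_value,),
    ).fetchall()
    new_cursor = max((r[2] for r in rows), default=cursor_value)
    return rows, new_cursor


# First sync picks up both rows and advances the cursor to 100.
first_batch, cursor_value = incremental_sync(conn, 0)

# Between syncs: one row is updated, the other is deleted.
conn.execute("UPDATE orders SET status = 'paid', updated_at = 200 WHERE id = 1")
conn.execute("DELETE FROM orders WHERE id = 2")

# Second sync sees the update, but the deleted row simply no longer matches.
second_batch, cursor_value = incremental_sync(conn, cursor_value)
print(second_batch)
```

The second batch contains only the updated row 1. Nothing tells the destination that row 2 is gone, whereas a log-based capture would have emitted a delete event for it.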

For most Airbyte use cases, syncs run on a schedule (hourly, daily) rather than continuously. This is intentional: Airbyte is optimized for data warehouse loading, where freshness measured in minutes to hours is acceptable.

Kafka dependency

Debezium: In the standard deployment, Kafka is not optional. Debezium connectors run inside a Kafka Connect cluster, which requires a running Kafka cluster for offset storage and event delivery, plus a schema registry if you use Avro. The output is Kafka topics, and downstream consumers must read from Kafka. Debezium Server and the embedded engine avoid Kafka Connect, but they are less common. This makes Debezium the right choice when your architecture already centers on Kafka.

Airbyte: Has no Kafka dependency. Sources connect directly to destinations. Airbyte can read from Kafka (Kafka source connector exists), but Kafka is not required for its operation. If your destination is a data warehouse and you don't have Kafka infrastructure, Airbyte is simpler to adopt.

Output targets

Debezium's output is Kafka topics. To load data into a database or data warehouse, you need a Kafka Connect sink connector (JDBC sink, Snowflake connector, BigQuery connector, etc.) or a stream processor (Flink, Kafka Streams) to transform and route the events. This is a multi-step pipeline: DB → Debezium → Kafka → Sink Connector → Destination.
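The sink step in that pipeline boils down to turning change events back into table state. The sketch below applies Debezium-style events to an in-memory dictionary standing in for the destination table; `apply_change` and the sample events are hypothetical, and a real sink connector would issue upserts and deletes against the warehouse instead.

```python
def apply_change(table: dict, event: dict, key: str = "id") -> None:
    """Apply one change event to a destination keyed by primary key."""
    op = event["op"]
    if op == "d":
        # Deletes carry the old row in "before"; remove it if present.
        table.pop(event["before"][key], None)
    elif op in ("c", "u", "r"):
        # Inserts, updates, and snapshot reads are all upserts of "after".
        row = event["after"]
        table[row[key]] = row
    else:
        raise ValueError(f"unsupported operation: {op!r}")


destination = {}
events = [
    {"op": "c", "before": None, "after": {"id": 1, "status": "new"}},
    {"op": "c", "before": None, "after": {"id": 2, "status": "new"}},
    {"op": "u", "before": {"id": 1, "status": "new"}, "after": {"id": 1, "status": "paid"}},
    {"op": "d", "before": {"id": 2, "status": "new"}, "after": None},
]
for event in events:
    apply_change(destination, event)

print(destination)
```

Because each step is an upsert or a keyed delete, replaying an event that was already applied leaves the destination unchanged, which matters when delivery is at-least-once.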

Airbyte's output is a direct connection from source to destination. DB → Airbyte → Data Warehouse. Fewer moving parts for the warehouse-loading use case.

Connector ecosystem

Debezium: ~30 database sources, focused on relational and NoSQL databases that expose transaction logs. No SaaS connectors.

Airbyte: hundreds of sources and destinations, covering databases, SaaS APIs, files, and messaging systems (see the Airbyte connector catalog for current counts). Much broader coverage for data warehouse loading use cases.

State and ordering

Debezium tracks state as Kafka Connect offsets — database-specific positions (MySQL binlog file + position, PostgreSQL LSN, MongoDB resume token). Reconnecting after a gap resumes from the last committed offset.

Airbyte tracks its own sync state (cursor values, CDC offsets) in its internal metadata store. If Airbyte is restarted, it resumes from the last recorded state.
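Both approaches follow the same basic idea: persist a position in the change log and resume after it. The following is a simplified simulation, with `OffsetStore` standing in for Kafka Connect offsets or Airbyte's state store and integer positions standing in for an LSN or binlog position; all names are assumptions for illustration.

```python
class OffsetStore:
    """Stand-in for a durable offset or state store."""

    def __init__(self) -> None:
        self.position = 0


def run_connector(log, store, crash_after=None):
    """Deliver changes after the stored position, committing as we go."""
    delivered = []
    for position, change in log:
        if position <= store.position:
            continue  # already processed in an earlier run
        if crash_after is not None and len(delivered) == crash_after:
            break  # simulate the process stopping mid-stream
        delivered.append(change)
        store.position = position  # commit the offset for this change
    return delivered


log = [(101, "insert A"), (102, "update A"), (103, "delete A")]
store = OffsetStore()

first_run = run_connector(log, store, crash_after=2)
second_run = run_connector(log, store)  # resumes after position 102

print(first_run, second_run)
```

This sketch commits after every event. Real connectors commit offsets periodically, so a restart can redeliver events emitted since the last commit, and consumers should tolerate duplicates.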

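Both preserve event ordering within a partition or stream in their CDC modes. Debezium keys each event by the row's primary key by default, so all changes to a given row land in the same partition of the table's topic and stay in order; Kafka does not guarantee ordering across partitions. Airbyte's ordering guarantees depend on the destination's ingestion behavior.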
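Per-key ordering in Kafka follows from key-based partitioning: the same key always maps to the same partition. The sketch below illustrates this with `zlib.crc32` as a stand-in hash (Kafka's default partitioner uses murmur2 on the key bytes); the `partition_for` and `route` functions and the sample keys are assumptions for the example.

```python
import zlib
from collections import defaultdict


def partition_for(key: str, num_partitions: int) -> int:
    # Stand-in hash; Kafka's default partitioner uses murmur2 instead.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def route(events, num_partitions=3):
    """Append each (key, change) pair to the partition chosen by its key."""
    partitions = defaultdict(list)
    for key, change in events:
        partitions[partition_for(key, num_partitions)].append((key, change))
    return partitions


events = [
    ("order-1", "insert"),
    ("order-2", "insert"),
    ("order-1", "update"),
    ("order-1", "delete"),
]
partitions = route(events)

# All changes for order-1 sit in one partition, in their original order.
order_1 = [c for k, c in partitions[partition_for("order-1", 3)] if k == "order-1"]
print(order_1)
```

Changes to different keys may land in different partitions, so a consumer can observe them in a different relative order than the database committed them.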

Operational trade-offs

Debezium advantages:

True real-time streaming — sub-second latency for change events
Captures every change including deletes, schema changes (DDL events in some connectors)
Deep integration with Kafka ecosystem: Kafka Connect SMTs, Schema Registry, downstream Flink/Kafka Streams processing
No polling load on the source database
Apache 2.0 license, fully open source

Debezium disadvantages:

Requires Kafka infrastructure (Kafka Connect cluster, Kafka brokers)
Database configuration required (logical replication slots, binlog enabled, LogMiner access)
Connector configuration is complex: snapshot mode, replication slot management, schema history topics, heartbeat configuration
No built-in destination loading or destination-aware routing beyond Kafka Connect SMTs; getting data into a warehouse requires an additional sink connector or stream processor

Airbyte advantages:

Hundreds of connectors covering databases, SaaS APIs, files (see Airbyte catalog for current count)
Simple UI for pipeline configuration — no Kafka expertise required
Direct source-to-destination without intermediate message bus
dbt integration for in-warehouse transformations
Lower barrier to entry for teams without Kafka infrastructure

Airbyte disadvantages:

Batch/micro-batch oriented — minutes of latency minimum, often hours
CDC mode (when available) is Debezium-embedded but managed for batch delivery, not true streaming
Non-CDC modes miss deletes
Elastic License 2.0 (ELv2) restricts offering Airbyte as a managed service

When to choose Debezium

You need real-time streaming of database changes (sub-second latency)
Your architecture already uses Kafka — you want changes flowing into Kafka topics for downstream processors
You need to capture deletes and schema changes reliably
You are building CDC for microservices or CDC for real-time data warehousing
You need the outbox pattern for reliable event publishing from transactional databases
Your team has Kafka operations expertise

When to choose Airbyte

You need to sync data to a data warehouse (Snowflake, BigQuery, Redshift) on a scheduled basis
You need SaaS source connectors (Salesforce, Stripe, GitHub, etc.) alongside database sources
You don't have Kafka infrastructure and don't want to build it
Minutes of latency is acceptable for your analytics use case
Your team wants a UI-driven pipeline configuration with minimal code

Can Debezium and Airbyte coexist?

Yes — they occupy different layers of a data architecture:

Use Debezium for real-time operational use cases: streaming CDC into Kafka, event-driven microservices, real-time analytics pipelines
Use Airbyte for batch ELT to data warehouses: historical loads, SaaS API ingestion, daily/hourly refreshes for BI

A common pattern: Debezium feeds Kafka (operational streaming tier) while Airbyte feeds the data warehouse (analytical batch tier). Both read the same source database but serve different consumers with different latency requirements.

Does Airbyte use Debezium?

For supported database sources (such as PostgreSQL and MySQL) configured with CDC-based incremental sync, Airbyte embeds Debezium within its connector runtime to read the transaction log. Airbyte manages the Debezium lifecycle; you configure Airbyte, not Debezium directly. SaaS connectors and non-database sources use polling or webhook patterns — not Debezium.

Can Debezium load data directly into Snowflake or BigQuery?

Not directly. Debezium outputs to Kafka topics. Loading into Snowflake or BigQuery requires a Kafka Connect sink connector for that destination (e.g., Snowflake Kafka Connector, BigQuery Kafka Connector). This multi-hop pipeline adds latency and operational components but enables real-time streaming into the warehouse, which batch tools like Airbyte cannot match.

Is Debezium reliable? It seems complex.

Debezium is production-proven at scale (for example, at Shopify). The complexity is real: replication slots must be managed to prevent WAL bloat in PostgreSQL, schema evolution requires careful handling, and the initial snapshot of large tables must be planned. Tools like Conduktor can help manage Kafka Connect connectors, including the lifecycle of Debezium connectors.
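One way to watch for WAL bloat is to track how far a replication slot lags behind the current WAL position. The sketch below converts PostgreSQL LSN strings to byte positions and computes the gap. It assumes the two LSNs were read from `pg_current_wal_lsn()` and the `restart_lsn` column of `pg_replication_slots`; the helper names and the sample values are illustrative, and in SQL the built-in `pg_wal_lsn_diff` function does the same arithmetic.

```python
def lsn_to_int(lsn: str) -> int:
    """Convert an LSN such as '16/B374D848' to an absolute byte position."""
    high, low = lsn.split("/")
    # An LSN is a 64-bit value printed as two 32-bit hexadecimal halves.
    return (int(high, 16) << 32) | int(low, 16)


def slot_lag_bytes(current_wal_lsn: str, restart_lsn: str) -> int:
    """Approximate WAL retained on behalf of a replication slot, in bytes."""
    return lsn_to_int(current_wal_lsn) - lsn_to_int(restart_lsn)


# Sample values (assumed): the slot is 16 MiB behind the current WAL position.
lag = slot_lag_bytes("0/3000000", "0/2000000")
print(f"{lag / (1024 * 1024):.0f} MiB retained")
```

Alerting when this number keeps growing catches the common failure mode where a stopped or stalled connector leaves its slot holding WAL indefinitely.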

What is the license difference between Debezium and Airbyte?

Debezium is Apache 2.0: fully permissive, including for managed service use. Airbyte uses the Elastic License 2.0 (ELv2) for most components, a source-available license that prohibits third parties from offering Airbyte as a managed service. For internal use, both licenses are effectively permissive.
