Kafka vs Pulsar: Architecture Compared

Stéphane Derosiaux, May 23, 2026

Apache Kafka stores streams in a partitioned, replicated log on broker-local disks; brokers own both compute and storage. Apache Pulsar separates compute (brokers) from storage (Apache BookKeeper) and stores each topic as a sequence of ledgers spread across storage nodes, rather than keeping a whole partition replica on one broker's disk. The core architectural difference: Kafka binds partitions to specific brokers; Pulsar brokers are stateless and storage is disaggregated across a BookKeeper cluster.

TL;DR

Dimension | Apache Kafka | Apache Pulsar
Storage model | Partitioned log on broker-local disk | Segmented log on Apache BookKeeper
Compute/storage coupling | Coupled (brokers own storage) | Disaggregated (stateless brokers + BookKeeper)
Partition rebalancing | Requires data movement | Instant (brokers are stateless)
Multi-tenancy | Manual (naming conventions, quotas) | Native (tenants, namespaces, topics)
Subscription models | Consumer groups (offset-based) | Exclusive, shared, failover, key-shared
Message queuing | No built-in queue semantics | Yes (shared subscription)
Tiered storage | KIP-405 (Kafka 3.6+) or vendor plugins | Native
Geo-replication | MirrorMaker 2 or Confluent Replicator | Built-in (async replication)
License | Apache 2.0 | Apache 2.0
Ecosystem | Very large | Smaller, growing
Operational complexity | Moderate (KRaft) | High (brokers + BookKeeper + ZooKeeper*)
*Pulsar historically depends on ZooKeeper for metadata. Newer versions support pluggable metadata backends (ZooKeeper, etcd, and others depending on deployment). Verify ZooKeeper requirements against your target Pulsar version.

What is Apache Kafka?

Apache Kafka is a distributed event streaming platform built on a partitioned append-only log. Each partition is stored as a sequence of segment files on the broker's local disk. Brokers both store the data and serve reads and writes for the partitions they lead. KRaft (Raft-based consensus) has been production-ready for metadata since Kafka 3.3 and replaces ZooKeeper; Kafka 4.0 removed ZooKeeper mode entirely. See Understanding KRaft Mode in Kafka.

What is Apache Pulsar?

Apache Pulsar is a cloud-native distributed messaging and streaming platform developed at Yahoo and open-sourced in 2016. It separates brokers (compute, protocol handling) from storage (Apache BookKeeper). Pulsar brokers are stateless: they own no persistent data. All data lives in BookKeeper, a distributed write-ahead log service that stores data as ledger entries across an ensemble of bookies (BookKeeper nodes).

Topics in Pulsar are composed of segments (BookKeeper ledgers). When a ledger fills, a new one is created. Brokers can be assigned any topic instantly because no data movement is required — the new broker simply points to the existing BookKeeper ledgers.

Architecture compared

Storage disaggregation

Kafka: Each partition has a leader broker and follower replicas, and each replica's data lives on its broker's local disk. Moving a partition replica to a different broker requires copying that replica's full log across the network, which can mean hundreds of gigabytes for a busy topic. Scaling Kafka storage typically means adding brokers and triggering partition reassignment, which is an expensive operation.

Pulsar: Brokers are stateless. Adding a broker instantly gives it capacity to serve topics without any data movement. Scaling storage means adding BookKeeper bookies; scaling compute means adding Pulsar brokers. These dimensions scale independently.
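To make the difference concrete, here is a minimal toy model of what a reassignment costs in each design. It is a sketch under simplifying assumptions, not either project's actual reassignment code: the class names, the 200 GiB size, and the 50 MiB/s throttle are all hypothetical, and it ignores replication factor and tiered storage.

```python
from dataclasses import dataclass

GIB = 1024 ** 3
MIB = 1024 ** 2


@dataclass
class KafkaReplica:
    """Toy model: the replica's bytes live on the broker that hosts it."""
    name: str
    size_bytes: int
    broker: str


@dataclass
class PulsarTopic:
    """Toy model: the broker only owns the topic; the bytes live on bookies."""
    name: str
    size_bytes: int
    owner_broker: str


def move_kafka_replica(replica: KafkaReplica, target: str) -> int:
    """Move a replica to another broker; returns bytes copied over the network."""
    if replica.broker == target:
        return 0
    replica.broker = target
    return replica.size_bytes  # the new replica has to fetch the full log


def move_pulsar_ownership(topic: PulsarTopic, target: str) -> int:
    """Hand topic ownership to another broker; returns bytes copied."""
    topic.owner_broker = target
    return 0  # ledgers stay on the bookies; only ownership metadata changes


def copy_seconds(size_bytes: int, throttle_bytes_per_sec: int) -> float:
    """Lower bound on copy time at a given replication throttle."""
    return size_bytes / throttle_bytes_per_sec


replica = KafkaReplica("orders-0", 200 * GIB, "broker-1")
topic = PulsarTopic("persistent://acme/sales/orders", 200 * GIB, "broker-1")

kafka_bytes = move_kafka_replica(replica, "broker-4")
pulsar_bytes = move_pulsar_ownership(topic, "broker-4")

print(kafka_bytes // GIB, pulsar_bytes)            # 200 0
print(copy_seconds(kafka_bytes, 50 * MIB) / 60)    # roughly 68 minutes
```

The Pulsar side is not free in practice (the new owner still has to load topic metadata and recover the current ledger), but the cost does not grow with the amount of stored data.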

The tradeoff: Pulsar's disaggregated architecture adds another layer of operational complexity. You run and monitor three tiers (brokers, bookies, ZooKeeper/metadata store) instead of Kafka's single broker tier.

Segmented vs partitioned log

Kafka's partitioned log: A partition is a sequence of segment files. Segments are immutable once rolled. Oldest segments are deleted (or tiered) when retention limits are hit. Data is stored on broker-local SSD/HDD.
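The segment lifecycle described above (append, roll, delete oldest) can be sketched with a small in-memory model. This is an illustration only: real Kafka rolls and retains segments by bytes and time, while this hypothetical `PartitionLog` class counts records and segments to keep the example short.

```python
import bisect
from dataclasses import dataclass, field


@dataclass
class Segment:
    base_offset: int                      # offset of the first record in the segment
    records: list = field(default_factory=list)


class PartitionLog:
    """Toy append-only log split into segments (sizes in records, not bytes)."""

    def __init__(self, max_records_per_segment: int):
        self.max_records = max_records_per_segment
        self.segments = [Segment(base_offset=0)]
        self.next_offset = 0

    def append(self, record) -> int:
        active = self.segments[-1]
        if len(active.records) >= self.max_records:
            # Roll: the full segment is now immutable, a new active one starts.
            active = Segment(base_offset=self.next_offset)
            self.segments.append(active)
        active.records.append(record)
        offset = self.next_offset
        self.next_offset += 1
        return offset

    def read(self, offset: int):
        # Locate the segment with the largest base_offset <= offset.
        bases = [s.base_offset for s in self.segments]
        idx = bisect.bisect_right(bases, offset) - 1
        if idx < 0:
            raise KeyError(f"offset {offset} was removed by retention")
        segment = self.segments[idx]
        position = offset - segment.base_offset
        if position >= len(segment.records):
            raise KeyError(f"offset {offset} has not been written yet")
        return segment.records[position]

    def enforce_retention(self, max_segments: int) -> int:
        """Drop the oldest rolled segments; the active segment is never dropped."""
        removed = 0
        while len(self.segments) > max(max_segments, 1):
            self.segments.pop(0)
            removed += 1
        return removed


log = PartitionLog(max_records_per_segment=3)
for i in range(10):
    log.append(f"r{i}")

print([s.base_offset for s in log.segments])   # [0, 3, 6, 9]
print(log.read(7))                             # r7
print(log.enforce_retention(max_segments=2))   # 2
print([s.base_offset for s in log.segments])   # [6, 9]
```

Retention works on whole segments, which is why the earliest readable offset jumps from 0 to 6 rather than advancing one record at a time.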

Pulsar's segmented log (BookKeeper): A topic is a sequence of BookKeeper ledgers. Each ledger is replicated across an ensemble of bookies, with a configurable write quorum and ack quorum. The broker closes the current ledger and opens a new one when rollover thresholds are reached or when topic ownership moves to another broker; if a bookie fails mid-write, the writer swaps in a replacement bookie and continues. The broker tracks the ledger sequence in metadata. This model lets Pulsar tolerate bookie failures without a partition leadership handoff.
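The ensemble, write quorum, and ack quorum settings are easier to reason about with a small example. The sketch below assumes round-robin striping of entries across the ensemble, which is how BookKeeper distributes entries by default; the function names and bookie names are hypothetical and this is not BookKeeper's client code.

```python
def write_set(entry_id: int, ensemble: list, write_quorum: int) -> list:
    """Bookies that store one entry when entries are striped round-robin."""
    if not 1 <= write_quorum <= len(ensemble):
        raise ValueError("write quorum must be between 1 and the ensemble size")
    start = entry_id % len(ensemble)
    return [ensemble[(start + i) % len(ensemble)] for i in range(write_quorum)]


def is_acknowledged(acks: set, targets: list, ack_quorum: int) -> bool:
    """An entry is confirmed once ack_quorum of its target bookies respond."""
    return len(acks & set(targets)) >= ack_quorum


# Assumed settings: ensemble of 3, write quorum 2, ack quorum 2.
ensemble = ["bookie-1", "bookie-2", "bookie-3"]

for entry_id in range(3):
    print(entry_id, write_set(entry_id, ensemble, write_quorum=2))
# 0 ['bookie-1', 'bookie-2']
# 1 ['bookie-2', 'bookie-3']
# 2 ['bookie-3', 'bookie-1']

targets = write_set(0, ensemble, write_quorum=2)
print(is_acknowledged({"bookie-1"}, targets, ack_quorum=2))              # False
print(is_acknowledged({"bookie-1", "bookie-2"}, targets, ack_quorum=2))  # True
```

With an ensemble larger than the write quorum, each bookie holds only a fraction of the ledger, so write load spreads across bookies instead of concentrating on one replica set.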

Subscription models

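Kafka's model: a consumer group assigns partitions to consumers. Within a group, each partition is consumed by at most one consumer, although one consumer may own several partitions; consumers beyond the partition count sit idle. Ordering is guaranteed per partition.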
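A short sketch shows why the partition count caps consumer parallelism. The `assign_partitions` function below is a hypothetical round-robin assignment written for illustration; it is not one of Kafka's built-in assignors, which also account for topic subscriptions and stickiness.

```python
def assign_partitions(partitions, consumers) -> dict:
    """Spread partitions over group members so each partition has one owner."""
    members = sorted(consumers)
    assignment = {member: [] for member in members}
    if not members:
        return assignment
    for i, partition in enumerate(sorted(partitions)):
        assignment[members[i % len(members)]].append(partition)
    return assignment


two_consumers = assign_partitions(range(4), ["c1", "c2"])
six_consumers = assign_partitions(range(4), ["c1", "c2", "c3", "c4", "c5", "c6"])
idle = [c for c, owned in six_consumers.items() if not owned]

print(two_consumers)   # {'c1': [0, 2], 'c2': [1, 3]}
print(idle)            # ['c5', 'c6']
```

With four partitions, adding a fifth or sixth consumer to the group adds no throughput; scaling further means adding partitions.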

Pulsar provides four subscription types:

  • Exclusive: single consumer per subscription (like a dedicated consumer)
  • Shared: multiple consumers share messages round-robin (like a queue / competing consumers)
  • Failover: primary + standby consumers; standby takes over if primary disconnects
  • Key-shared: messages with the same key always go to the same consumer (ordering per key, across consumers)

Pulsar's shared and key-shared subscriptions give it native message-queue semantics that Kafka lacks without application-level workarounds.
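The dispatch rules behind these subscription types can be sketched as three small functions (an exclusive subscription is the single-consumer case). This is a simplified simulation, not Pulsar's dispatcher: the function names are hypothetical, real shared dispatch also depends on each consumer's available permits, and real key-shared routing uses hash ranges rather than the plain CRC32 modulo assumed here.

```python
import zlib
from itertools import cycle


def dispatch_shared(messages, consumers) -> dict:
    """Shared: messages are handed out round-robin, so per-key order is lost."""
    out = {c: [] for c in consumers}
    turn = cycle(consumers)
    for message in messages:
        out[next(turn)].append(message)
    return out


def dispatch_failover(messages, consumers, connected) -> dict:
    """Failover: the first connected consumer in priority order gets everything."""
    out = {c: [] for c in consumers}
    active = next(c for c in consumers if c in connected)
    out[active].extend(messages)
    return out


def dispatch_key_shared(messages, consumers) -> dict:
    """Key-shared: a stable hash of the key selects the consumer."""
    out = {c: [] for c in consumers}
    for key, value in messages:
        index = zlib.crc32(key.encode("utf-8")) % len(consumers)
        out[consumers[index]].append((key, value))
    return out


messages = [("a", 1), ("a", 2), ("b", 3), ("a", 4), ("b", 5)]
consumers = ["c1", "c2"]

shared = dispatch_shared(messages, consumers)
failover = dispatch_failover(messages, consumers, connected={"c2"})
key_shared = dispatch_key_shared(messages, consumers)

print(shared)    # {'c1': [('a', 1), ('b', 3), ('b', 5)], 'c2': [('a', 2), ('a', 4)]}
print(failover)  # {'c1': [], 'c2': [('a', 1), ('a', 2), ('b', 3), ('a', 4), ('b', 5)]}
print(key_shared)
```

In the shared result, key "a" is split across both consumers, which is the queue-style behavior. In the key-shared result, every message for a given key lands on one consumer, so per-key ordering survives while work is still spread by key.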

Multi-tenancy

Kafka multi-tenancy is manual: teams use topic naming conventions (team-a.orders, team-b.events), quotas per client ID, and ACLs per topic. See Multi-Tenancy in Kafka Environments.

Pulsar has native multi-tenancy built into the data model: persistent://tenant/namespace/topic. Tenants and namespaces are first-class objects with authentication, authorization, and resource quota policies. This makes Pulsar well-suited for SaaS platforms where strong isolation between tenants is required without naming-convention discipline.
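The structural difference shows up in how a tenant is recovered from a topic name. The sketch below parses the Pulsar form given above and, for contrast, applies a prefix convention to a Kafka topic name. Both helper functions are hypothetical illustrations, and the dot-separated Kafka convention is only one possible team choice.

```python
from typing import NamedTuple


class TopicName(NamedTuple):
    domain: str
    tenant: str
    namespace: str
    topic: str


def parse_pulsar_topic(name: str) -> TopicName:
    """Split a fully qualified Pulsar topic name into its structural parts."""
    domain, separator, rest = name.partition("://")
    if not separator or domain not in ("persistent", "non-persistent"):
        raise ValueError(f"unsupported or missing domain in {name!r}")
    parts = rest.split("/", 2)
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected tenant/namespace/topic in {name!r}")
    return TopicName(domain, *parts)


def kafka_tenant_by_convention(topic: str) -> str:
    """Tenant is whatever precedes the first dot; nothing enforces this."""
    return topic.split(".", 1)[0]


parsed = parse_pulsar_topic("persistent://acme/sales/orders")
print(parsed.tenant, parsed.namespace, parsed.topic)   # acme sales orders
print(kafka_tenant_by_convention("team-a.orders"))     # team-a
print(kafka_tenant_by_convention("orders"))            # orders
```

The last call shows the weakness of conventions: a topic created without the prefix is silently attributed to a tenant that does not exist, whereas the Pulsar name cannot be formed without a tenant and namespace.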

Geo-replication

Kafka uses MirrorMaker 2 or Confluent Replicator for cross-cluster replication. MirrorMaker 2 adds operational overhead: it is built on the Kafka Connect framework and typically runs as a separate cluster that has to be deployed and monitored. See Kafka MirrorMaker 2.

Pulsar includes async geo-replication as a built-in feature: namespaces are configured to replicate across geographically distributed Pulsar clusters. No external replication process is required.

Operational trade-offs

Kafka advantages:

  • Simpler operational model: a single process type (broker / controller) — no separate storage tier
  • Mature ecosystem: Kafka Connect, ksqlDB, Schema Registry, MirrorMaker 2, Conduktor, and thousands of integrations tested in production
  • KRaft mode eliminates ZooKeeper — Kafka's dependency stack is simpler today
  • Dominant adoption: most job postings, most cloud managed offerings (MSK, Confluent, Aiven, Upstash), most training material

Kafka disadvantages:

  • Partition rebalancing requires data movement — slow and I/O intensive
  • Multi-tenancy requires convention and tooling rather than native isolation
  • Tiered storage is newer than Pulsar's (KIP-405 arrived as early access in Kafka 3.6 and has been maturing since)

Pulsar advantages:

  • Independent scaling of compute and storage
  • Instant broker scaling without data movement
  • Native multi-tenancy with tenant/namespace isolation
  • Built-in geo-replication and tiered storage
  • Flexible subscription models covering both streaming and queuing use cases

Pulsar disadvantages:

  • Significantly higher operational complexity: brokers + bookies + metadata store (ZooKeeper or equivalent)
  • Smaller ecosystem: fewer connectors, less tooling, smaller community
  • BookKeeper expertise is scarce and debugging bookie issues is non-trivial
  • Higher base resource requirements for a functioning cluster (minimum 3 bookies + 3 brokers + 3 ZooKeeper nodes for production)

When to choose Kafka

  • Your team has existing Kafka expertise, tooling, and managed-service contracts
  • You use managed Kafka (MSK, Confluent Cloud, Aiven): the major cloud providers offer managed Kafka but no first-party managed Pulsar
  • You need the broadest connector and integration ecosystem
  • Your multi-tenancy requirements are manageable via quotas and ACLs (see Kafka Quotas and Rate Limiting)
  • You want a single-tier broker architecture with lower baseline operational complexity

When to choose Pulsar

  • You need native multi-tenancy as a first-class platform feature (SaaS product, shared infrastructure platform)
  • Independent scaling of compute and storage is architecturally important (bursty compute, low-cost object-storage tiering from day one)
  • You need both streaming (ordered log) and queuing (competing consumers) from a single system
  • Built-in geo-replication without external replication infrastructure is a hard requirement
  • Your team can invest in BookKeeper expertise

Migration considerations

  • No wire compatibility: Pulsar does not implement the Kafka protocol natively. The Kafka-on-Pulsar (KoP) protocol handler, a plugin originally developed by StreamNative, lets Kafka clients talk to Pulsar brokers, but its coverage is incomplete: not all Kafka APIs are supported, and performance characteristics differ.
  • Data migration: There is no live migration tooling between Kafka and Pulsar equivalent to MirrorMaker. Dual-write patterns (write to both, drain Kafka readers, cut over) are the typical approach.
  • Connector ecosystem gap: Kafka Connect has thousands of connectors. Pulsar IO supports fewer; verify your specific sources and sinks before committing.
  • Consumer offset semantics: Pulsar uses cursor-based positions (MessageId) rather than integer offsets, which affects consumer migration tooling.
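The last point is worth a short illustration. The sketch below models a Pulsar position as a (ledger ID, entry ID) pair and shows why "next position = current + 1" stops working at a ledger boundary. It is a simplified model under stated assumptions: a real MessageId also carries a partition index and, for batched messages, a batch index, and client code should treat it as opaque. The `PulsarPosition` class, the `next_position` helper, and the ledger numbers are hypothetical.

```python
from typing import NamedTuple, Optional


class PulsarPosition(NamedTuple):
    ledger_id: int
    entry_id: int


def next_kafka_offset(offset: int) -> int:
    """Kafka offsets are consecutive integers within a partition."""
    return offset + 1


def next_position(pos: PulsarPosition, ledgers: list) -> Optional[PulsarPosition]:
    """Next position, given (ledger_id, entry_count) pairs in topic order."""
    ids = [ledger_id for ledger_id, _ in ledgers]
    index = ids.index(pos.ledger_id)
    if pos.entry_id + 1 < ledgers[index][1]:
        return PulsarPosition(pos.ledger_id, pos.entry_id + 1)
    for ledger_id, entry_count in ledgers[index + 1:]:
        if entry_count > 0:
            return PulsarPosition(ledger_id, 0)   # first entry of the next ledger
    return None                                   # end of the topic so far


# Assumed topic layout: ledger 7 holds 1000 entries, ledger 12 holds 3.
ledgers = [(7, 1000), (12, 3)]

print(next_kafka_offset(41))                              # 42
print(next_position(PulsarPosition(7, 5), ledgers))       # PulsarPosition(ledger_id=7, entry_id=6)
print(next_position(PulsarPosition(7, 999), ledgers))     # PulsarPosition(ledger_id=12, entry_id=0)
print(PulsarPosition(7, 999) < PulsarPosition(12, 0))     # True
```

Positions still have a total order, so "resume from here" works in both systems, but tooling that stores, subtracts, or increments integer offsets (lag calculators, offset-reset scripts) needs to be rewritten rather than ported.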

See also: Redpanda vs Kafka for a Kafka-compatible alternative with a simpler operational model.

Is Pulsar better than Kafka?

Neither is universally better. Pulsar's disaggregated storage excels for independent compute/storage scaling and native multi-tenancy. Kafka excels for ecosystem maturity, managed-service availability, and simpler operations. The right choice depends on your specific scaling requirements and team expertise.

Can I use Kafka clients with Pulsar?

Yes, via the Kafka-on-Pulsar (KoP) protocol handler. KoP translates Kafka protocol calls to Pulsar's native API. Coverage is incomplete — some Kafka APIs (certain transaction endpoints, admin APIs) may not work. KoP performance and compatibility require evaluation against your specific client version.

Does Pulsar require ZooKeeper?

Pulsar historically depends on ZooKeeper for metadata and coordination. Newer Pulsar versions support pluggable metadata backends (ZooKeeper, etcd, and others), but production maturity of ZooKeeper-free deployments varies by version. Verify ZooKeeper requirements against your target Pulsar release before assuming it can be eliminated.

Why don't managed cloud services offer Pulsar?

AWS and Google Cloud offer first-party managed Kafka (Amazon MSK, Google Cloud Managed Service for Apache Kafka), Azure offers Kafka on HDInsight, and Confluent Cloud runs on all three. None of them offers a first-party managed Pulsar service; StreamNative Cloud is the main managed Pulsar offering. The operational complexity of the multi-tier architecture likely makes managed Pulsar more expensive to run than managed Kafka, which reduces the incentive for cloud providers to invest.
