# Kafka vs Pulsar: Architecture Compared

**Apache Kafka** stores streams in a partitioned, replicated log on broker-local disks; brokers own both compute and storage. **Apache Pulsar** separates compute (brokers) from storage (Apache BookKeeper) and spreads each topic's log across BookKeeper ledgers (segments) rather than pinning a whole partition to one broker's disk. The core architectural difference: Kafka binds partitions to specific brokers; Pulsar brokers are stateless and storage is disaggregated across a BookKeeper cluster.

## TL;DR

| Dimension | Apache Kafka | Apache Pulsar |
|---|---|---|
| Storage model | Partitioned log on broker-local disk | Segmented log on Apache BookKeeper |
| Compute/storage coupling | Coupled (brokers own storage) | Disaggregated (stateless brokers + BookKeeper) |
| Partition rebalancing | Requires data movement | Instant (brokers are stateless) |
| Multi-tenancy | Manual (naming conventions, quotas) | Native (tenants, namespaces, topics) |
| Subscription models | Consumer groups (offset-based) | Exclusive, shared, failover, key-shared |
| Message queuing | No built-in queue semantics | Yes (shared subscription) |
| Tiered storage | Via vendor plugins or KIP-405 (Kafka 3.6+) | Native |
| Geo-replication | MirrorMaker 2 or Confluent | Built-in (async replication) |
| License | Apache 2.0 | Apache 2.0 |
| Ecosystem | Very large | Smaller, growing |
| Operational complexity | Moderate (KRaft) | High (brokers + BookKeeper + ZooKeeper*) |

*Pulsar historically depends on ZooKeeper for metadata. Newer versions support pluggable metadata backends (ZooKeeper, etcd, and others depending on deployment). Verify ZooKeeper requirements against your target Pulsar version.

## What is Apache Kafka?

[Apache Kafka](https://www.conduktor.io/glossary/apache-kafka) is a distributed event streaming platform built on a partitioned, append-only log. Each [partition](https://www.conduktor.io/glossary/kafka-partitions-explained) is stored as a sequence of segment files on the broker's local disk. [Brokers](https://www.conduktor.io/glossary/kafka-brokers-explained) both hold the data and serve reads and writes for the partitions they lead. KRaft (Raft-based consensus) has been production-ready for metadata since Kafka 3.3 and removes the ZooKeeper dependency; Kafka 4.0 drops ZooKeeper mode entirely. See [Understanding KRaft Mode in Kafka](https://www.conduktor.io/glossary/understanding-kraft-mode-in-kafka).

## What is Apache Pulsar?

Apache Pulsar is a cloud-native distributed messaging and streaming platform developed at Yahoo and open-sourced in 2016. It separates brokers (compute, protocol handling) from storage (Apache BookKeeper). Pulsar brokers are **stateless**: they own no persistent data. All data lives in BookKeeper, a distributed write-ahead log service that stores data as ledger entries across an ensemble of bookies (BookKeeper nodes).

Topics in Pulsar are composed of **segments** (BookKeeper ledgers). When a ledger fills, a new one is created. Brokers can be assigned any topic instantly because no data movement is required — the new broker simply points to the existing BookKeeper ledgers.

## Architecture compared

### Storage disaggregation

**Kafka**: Each partition has a leader broker, and every replica (leader and followers) keeps the partition data on its own local disk. Moving a partition to a different broker requires copying that replica's full data across the network, which can run to many gigabytes. Scaling Kafka storage typically means adding brokers and triggering partition reassignment, an expensive operation.

**Pulsar**: Brokers are stateless. Adding a broker instantly gives it capacity to serve topics without any data movement. Scaling storage means adding BookKeeper bookies; scaling compute means adding Pulsar brokers. These dimensions scale independently.

The tradeoff: Pulsar's disaggregated architecture adds another layer of operational complexity. You run and monitor three tiers (brokers, bookies, ZooKeeper/metadata store) instead of Kafka's single broker tier.
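The difference in reassignment cost can be illustrated with a toy model. This is a minimal sketch, not either system's actual code: the `KafkaPartition` and `PulsarTopic` classes, the broker names, and the 50 GiB size are all hypothetical, and the model ignores replication and metadata overhead.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class KafkaPartition:
    """Coupled model: the replica's bytes live on the owning broker's disk."""
    name: str
    size_bytes: int
    broker: str

    def reassign(self, new_broker: str) -> int:
        # The new broker must copy the full replica before it can serve it.
        bytes_copied = self.size_bytes if new_broker != self.broker else 0
        self.broker = new_broker
        return bytes_copied


@dataclass
class PulsarTopic:
    """Disaggregated model: the broker only holds pointers to ledgers."""
    name: str
    ledger_ids: List[int]
    size_bytes: int
    broker: str

    def reassign(self, new_broker: str) -> int:
        # Ownership changes in metadata; the ledgers stay on the bookies.
        self.broker = new_broker
        return 0


fifty_gib = 50 * 1024**3
partition = KafkaPartition("orders-0", size_bytes=fifty_gib, broker="broker-1")
topic = PulsarTopic("orders", ledger_ids=[101, 102, 103], size_bytes=fifty_gib, broker="broker-1")

kafka_moved = partition.reassign("broker-2")
pulsar_moved = topic.reassign("broker-2")
print(f"Kafka bytes copied:  {kafka_moved}")
print(f"Pulsar bytes copied: {pulsar_moved}")
```

In this model the Kafka move costs the full replica size while the Pulsar move costs nothing on the data path, which is the property the comparison above relies on.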

### Segmented vs partitioned log

**Kafka's partitioned log**: A partition is a sequence of segment files. Segments are immutable once rolled. Oldest segments are deleted (or tiered) when retention limits are hit. Data is stored on broker-local SSD/HDD.
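Size-based retention on a segmented partition can be sketched as follows. This is a simplified model under stated assumptions: the `Segment` class and `apply_size_retention` function are hypothetical names, time-based retention is left out, and the rule shown (drop an old segment only if the log still holds at least the configured size afterwards) is an approximation of how `retention.bytes` behaves.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    base_offset: int
    size_bytes: int
    active: bool = False  # the segment currently being appended to


def apply_size_retention(segments: List[Segment], retention_bytes: int) -> List[Segment]:
    """Drop the oldest rolled segments while the log stays at or above retention_bytes."""
    ordered = sorted(segments, key=lambda s: s.base_offset)
    excess = sum(s.size_bytes for s in ordered) - retention_bytes
    dropped = 0
    for segment in ordered:
        # Never drop the active segment, and never drop below the retention size.
        if segment.active or segment.size_bytes > excess:
            break
        excess -= segment.size_bytes
        dropped += 1
    return ordered[dropped:]


log = [
    Segment(base_offset=0, size_bytes=100),
    Segment(base_offset=1000, size_bytes=100),
    Segment(base_offset=2000, size_bytes=100),
    Segment(base_offset=3000, size_bytes=40, active=True),
]
remaining = apply_size_retention(log, retention_bytes=150)
print([s.base_offset for s in remaining])
```

Retention works at segment granularity, so the log can hold more than the configured size: here 240 bytes remain against a 150-byte limit because removing another 100-byte segment would drop the log below the limit.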

**Pulsar's segmented log (BookKeeper)**: A topic is a sequence of BookKeeper ledgers. Each ledger is replicated across an ensemble of bookies (configurable write quorum and ack quorum). A ledger is sealed when the broker rolls it over (on size, time, or entry-count thresholds) or when topic ownership moves to another broker; if a bookie fails mid-write, the BookKeeper client typically swaps in a replacement bookie and keeps writing. The broker tracks the ledger sequence in metadata. This model makes Pulsar naturally resilient to bookie failures without partition leadership handoff.
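The relationship between ensemble size, write quorum, and ack quorum can be sketched with round-robin striping, where each entry goes to a rotating subset of the ensemble. This is an illustrative model rather than BookKeeper client code: the function names, bookie names, and the 5/3/2 sizing are assumptions chosen for the example.

```python
from typing import List


def write_set(entry_id: int, ensemble: List[str], write_quorum: int) -> List[str]:
    """Bookies that receive a given entry under round-robin striping."""
    if not 0 < write_quorum <= len(ensemble):
        raise ValueError("write_quorum must be between 1 and the ensemble size")
    size = len(ensemble)
    return [ensemble[(entry_id + i) % size] for i in range(write_quorum)]


def is_acknowledged(acks_received: int, ack_quorum: int) -> bool:
    """A write is confirmed once ack_quorum bookies have responded."""
    return acks_received >= ack_quorum


# Hypothetical sizing: ensemble of 5, write quorum 3, ack quorum 2.
ensemble = ["bookie-1", "bookie-2", "bookie-3", "bookie-4", "bookie-5"]
for entry_id in range(4):
    print(entry_id, write_set(entry_id, ensemble, write_quorum=3))

print(is_acknowledged(acks_received=2, ack_quorum=2))
```

Because the ensemble is larger than the write quorum, consecutive entries land on different bookies, which spreads write and read load across the storage tier.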

### Subscription models

Kafka's model: within a consumer group, each partition is assigned to exactly one consumer, and a consumer may hold several partitions. Ordering is guaranteed per partition, and consumers beyond the partition count sit idle.
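A range-style assignment illustrates the consumer group model. The sketch below is an assumption-laden simplification: `assign_partitions` is a hypothetical helper, not a Kafka client API, and real assignors also handle multiple topics, rebalances, and stickiness.

```python
from typing import Dict, List


def assign_partitions(partitions: List[int], consumers: List[str]) -> Dict[str, List[int]]:
    """Give each consumer a contiguous block; every partition gets exactly one owner."""
    if not consumers:
        raise ValueError("a consumer group needs at least one member")
    members = sorted(consumers)
    ordered = sorted(partitions)
    base, extra = divmod(len(ordered), len(members))
    assignment: Dict[str, List[int]] = {}
    start = 0
    for index, member in enumerate(members):
        # The first `extra` members take one additional partition.
        count = base + (1 if index < extra else 0)
        assignment[member] = ordered[start:start + count]
        start += count
    return assignment


six_over_four = assign_partitions(list(range(6)), ["c1", "c2", "c3", "c4"])
two_over_three = assign_partitions([0, 1], ["c1", "c2", "c3"])
print(six_over_four)
print(two_over_three)
```

In the second call the third consumer receives an empty list: parallelism within a group is capped by the partition count.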

Pulsar provides four subscription types:

- **Exclusive**: single consumer per subscription (like a dedicated consumer)
- **Shared**: multiple consumers share messages round-robin (like a queue / competing consumers)
- **Failover**: primary + standby consumers; standby takes over if primary disconnects
- **Key-shared**: messages with the same key always go to the same consumer (ordering per key, across consumers)

Pulsar's shared and key-shared subscriptions give it native message-queue semantics that Kafka lacks without application-level workarounds.
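The dispatch behavior of shared and key-shared subscriptions can be contrasted in a few lines. This is a toy dispatcher, not the Pulsar broker's implementation: the function names are hypothetical, and the CRC32-modulo mapping is a stand-in for whatever key hashing the broker actually uses.

```python
import zlib
from itertools import cycle
from typing import Dict, List, Tuple

Message = Tuple[str, str]  # (key, payload)


def dispatch_shared(messages: List[Message], consumers: List[str]) -> Dict[str, List[str]]:
    """Shared: hand messages to consumers round-robin, ignoring keys."""
    received: Dict[str, List[str]] = {c: [] for c in consumers}
    targets = cycle(consumers)
    for _, payload in messages:
        received[next(targets)].append(payload)
    return received


def dispatch_key_shared(messages: List[Message], consumers: List[str]) -> Dict[str, List[str]]:
    """Key-shared: a key always maps to the same consumer (stand-in hash)."""
    received: Dict[str, List[str]] = {c: [] for c in consumers}
    for key, payload in messages:
        slot = zlib.crc32(key.encode("utf-8")) % len(consumers)
        received[consumers[slot]].append(payload)
    return received


messages = [("user-a", "a1"), ("user-a", "a2"), ("user-b", "b1"), ("user-b", "b2")]
shared = dispatch_shared(messages, ["c1", "c2"])
key_shared = dispatch_key_shared(messages, ["c1", "c2"])
print("shared:    ", shared)
print("key-shared:", key_shared)
```

Under the shared dispatch, `a1` and `a2` end up on different consumers, so per-key ordering is lost; under key-shared, both messages for a key reach the same consumer in order.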

### Multi-tenancy

Kafka multi-tenancy is manual: teams use topic naming conventions (`team-a.orders`, `team-b.events`), quotas per client ID, and ACLs per topic. See [Multi-Tenancy in Kafka Environments](https://www.conduktor.io/glossary/multi-tenancy-in-kafka-environments).

Pulsar has **native multi-tenancy built into the data model**: `persistent://tenant/namespace/topic`. Tenants and namespaces are first-class objects with authentication, authorization, and resource quota policies. This makes Pulsar well-suited for SaaS platforms where strong isolation between tenants is required without naming-convention discipline.
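Because tenant and namespace are encoded in the topic name itself, a name can be split into its components mechanically. The parser below is a minimal sketch: `TopicName` and `parse_topic` are hypothetical helpers rather than part of any Pulsar client, and the example tenant and namespace are made up. It accepts the `persistent://tenant/namespace/topic` form shown above plus the `non-persistent` domain.

```python
from typing import NamedTuple


class TopicName(NamedTuple):
    domain: str      # "persistent" or "non-persistent"
    tenant: str
    namespace: str
    topic: str


def parse_topic(name: str) -> TopicName:
    """Split a fully qualified topic name into its four components."""
    domain, separator, rest = name.partition("://")
    if not separator or domain not in ("persistent", "non-persistent"):
        raise ValueError(f"unsupported topic name: {name!r}")
    parts = rest.split("/", 2)
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected tenant/namespace/topic in {name!r}")
    return TopicName(domain, *parts)


parsed = parse_topic("persistent://acme/billing/invoices")
print(parsed.tenant, parsed.namespace, parsed.topic)
```

Policies such as quotas or authorization can then be looked up by `(tenant, namespace)` rather than inferred from a naming convention.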

### Geo-replication

Kafka uses MirrorMaker 2 or Confluent Replicator for cross-cluster replication. MirrorMaker 2 adds its own operational footprint: it is built on Kafka Connect and typically runs as a separate cluster that must be deployed and monitored. See [Kafka MirrorMaker 2](https://www.conduktor.io/glossary/kafka-mirrormaker-2-for-cross-cluster-replication).

Pulsar includes async geo-replication as a built-in feature: namespaces are configured to replicate across geographically distributed Pulsar clusters. No external replication process is required.

## Operational trade-offs

**Kafka advantages:**
- Simpler operational model: a single process type (broker / controller) — no separate storage tier
- Mature ecosystem: Kafka Connect, ksqlDB, Schema Registry, MirrorMaker 2, Conduktor, and thousands of integrations tested in production
- KRaft mode eliminates ZooKeeper — Kafka's dependency stack is simpler today
- Dominant adoption: most job postings, most cloud managed offerings (MSK, Confluent, Aiven, Upstash), most training material

**Kafka disadvantages:**
- Partition rebalancing requires data movement — slow and I/O intensive
- Multi-tenancy requires convention and tooling rather than native isolation
- Tiered storage is newer than Pulsar's built-in equivalent (KIP-405 arrived in Kafka 3.6 and is still maturing)

**Pulsar advantages:**
- Independent scaling of compute and storage
- Instant broker scaling without data movement
- Native multi-tenancy with tenant/namespace isolation
- Built-in geo-replication and tiered storage
- Flexible subscription models covering both streaming and queuing use cases

**Pulsar disadvantages:**
- Significantly higher operational complexity: brokers + bookies + metadata store (ZooKeeper or equivalent)
- Smaller ecosystem: fewer connectors, less tooling, smaller community
- BookKeeper expertise is scarce and debugging bookie issues is non-trivial
- Higher base resource requirements for a functioning cluster (minimum 3 bookies + 3 brokers + 3 ZooKeeper nodes for production)

## When to choose Kafka

- Your team has existing Kafka expertise, tooling, and managed-service contracts
- You use managed Kafka (MSK, Confluent Cloud, Aiven): managed Kafka is available on every major cloud, while managed Pulsar options are far fewer
- You need the broadest connector and integration ecosystem
- Your multi-tenancy requirements are manageable via quotas and ACLs (see [Kafka Quotas and Rate Limiting](https://www.conduktor.io/glossary/quotas-and-rate-limiting-in-kafka))
- You want a single-tier broker architecture with lower baseline operational complexity

## When to choose Pulsar

- You need **native multi-tenancy** as a first-class platform feature (SaaS product, shared infrastructure platform)
- Independent scaling of compute and storage is architecturally important (bursty compute, low-cost object-storage tiering from day one)
- You need both streaming (ordered log) and queuing (competing consumers) from a single system
- Built-in geo-replication without external replication infrastructure is a hard requirement
- Your team can invest in BookKeeper expertise

## Migration considerations

- **No wire compatibility**: Pulsar does not implement the Kafka protocol natively. Pulsar provides a Kafka-on-Pulsar (KoP) protocol handler that maps Kafka clients to Pulsar, but KoP coverage is incomplete — not all Kafka APIs are supported, and performance characteristics differ.
- **Data migration**: There is no live migration tooling between Kafka and Pulsar equivalent to MirrorMaker. Dual-write patterns (write to both, drain Kafka readers, cut over) are the typical approach.
- **Connector ecosystem gap**: Kafka Connect has thousands of connectors. Pulsar IO supports fewer; verify your specific sources and sinks before committing.
- **Consumer offset semantics**: Pulsar uses cursor-based positions (MessageId) rather than integer offsets, which affects consumer migration tooling.
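The offset mismatch in the last point can be made concrete. The sketch below assumes a position made of a ledger ID and an entry ID (real Pulsar message IDs carry further fields such as partition and batch index); `MessagePosition`, `resume_position`, and the mapping table recorded during a dual-write are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True, order=True)
class MessagePosition:
    """Simplified Pulsar-style position, ordered by ledger and then entry."""
    ledger_id: int
    entry_id: int


def resume_position(committed_offset: int, mapping: Dict[int, MessagePosition]) -> MessagePosition:
    """Translate a Kafka committed offset (next record to read) into a position."""
    if committed_offset not in mapping:
        raise KeyError(f"no recorded position for offset {committed_offset}")
    return mapping[committed_offset]


# Hypothetical table captured while writing the same records to both systems.
offset_to_position = {
    41: MessagePosition(ledger_id=7, entry_id=118),
    42: MessagePosition(ledger_id=7, entry_id=119),
    43: MessagePosition(ledger_id=8, entry_id=0),  # a new ledger started here
}

start = resume_position(42, offset_to_position)
print(start)
print(MessagePosition(7, 119) < MessagePosition(8, 0))
```

Kafka offsets advance by one per record, while positions jump whenever a new ledger starts, so a committed offset cannot be converted arithmetically and some recorded mapping (or a timestamp-based seek) is needed at cutover.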

See also: [Redpanda vs Kafka](https://www.conduktor.io/glossary/redpanda-vs-kafka) for a Kafka-compatible alternative with a simpler operational model.

## FAQ

**Is Pulsar better than Kafka?**

Neither is universally better. Pulsar's disaggregated storage excels for independent compute/storage scaling and native multi-tenancy. Kafka excels for ecosystem maturity, managed-service availability, and simpler operations. The right choice depends on your specific scaling requirements and team expertise.

**Can I use Kafka clients with Pulsar?**

Yes, via the Kafka-on-Pulsar (KoP) protocol handler. KoP translates Kafka protocol calls to Pulsar's native API. Coverage is incomplete — some Kafka APIs (certain transaction endpoints, admin APIs) may not work. KoP performance and compatibility require evaluation against your specific client version.

**Does Pulsar require ZooKeeper?**

Pulsar historically depends on ZooKeeper for metadata and coordination. Newer Pulsar versions support pluggable metadata backends (ZooKeeper, etcd, and others), but production maturity of ZooKeeper-free deployments varies by version. Verify ZooKeeper requirements against your target Pulsar release before assuming it can be eliminated.

**Why don't managed cloud services offer Pulsar?**

Managed Kafka is available on AWS, GCP, and Azure, either first-party (for example Amazon MSK and Azure HDInsight) or through Confluent Cloud. None of the three offers a comparable first-party managed Pulsar service; StreamNative offers managed Pulsar through StreamNative Cloud. A likely factor is that the multi-tier architecture makes managed Pulsar more expensive to run than managed Kafka, which reduces the incentive for cloud providers to invest.
