# Transactional Outbox: Database-Kafka Consistency

I've debugged this bug more times than I can count: database commits, Kafka write fails, systems are now inconsistent. The order exists in the DB but downstream services never know.

This is the dual-write problem. The transactional outbox pattern is the standard solution.

> *We lost 6 hours of orders because Kafka was down and the database kept committing. With outbox, the database acts as a buffer. We deployed during a Kafka outage and didn't lose a single event.*
>
> *Staff Engineer at an e-commerce platform*

## The Dual-Write Problem

```java
// BROKEN: This code has a consistency bug
public void createOrder(Order order) {
    orderRepository.save(order);  // Database commits
    kafkaTemplate.send("orders", order.getId(), event);  // Kafka fails
    // Order exists but no event published
}
```

Three failure scenarios break this:
1. Kafka write fails after database commit
2. Application crashes between writes
3. Kafka succeeds but database rolls back

Wrapping in `@Transactional` doesn't help—Kafka isn't part of that transaction.

## How the Outbox Pattern Works

Instead of writing to Kafka directly, write the event to an outbox table in the same database transaction. A separate process relays to Kafka.

```
┌─────────────────────────────────────────┐
│       Single Database Transaction        │
│  ┌─────────────┐    ┌─────────────────┐ │
│  │ orders      │    │ outbox          │ │
│  │ INSERT      │    │ INSERT event    │ │
│  └─────────────┘    └─────────────────┘ │
└─────────────────────────────────────────┘
                        │
                        ▼ Relay (CDC or Polling)
                      Kafka
```

Both writes succeed or both fail. The relay handles Kafka delivery asynchronously.

## Setting Up the Outbox Table

```sql
CREATE TABLE outbox (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    aggregate_type VARCHAR(255) NOT NULL,
    aggregate_id VARCHAR(255) NOT NULL,
    event_type VARCHAR(255) NOT NULL,
    payload JSONB NOT NULL,
    created_at TIMESTAMP NOT NULL DEFAULT NOW()
);
```

| Column | Purpose |
|--------|---------|
| `aggregate_type` | Routes to Kafka topic |
| `aggregate_id` | Becomes the message key |
| `payload` | Event data as JSON |

Write both in the same transaction:

```java
@Transactional
public void createOrder(Order order) {
    orderRepository.save(order);
    outboxRepository.save(new OutboxEntry("Order", order.getId(),
        "OrderCreated", toJson(order)));
}
```

## Polling vs CDC

**Polling:** Background process queries outbox, publishes to Kafka, deletes rows. Simple, works with any database. Adds 100ms+ latency.

**CDC (Debezium):** Tails the database transaction log. Near-realtime (milliseconds). More infrastructure.

Start with polling for prototypes. Use CDC for production with latency requirements under 1 second.

## Debezium Outbox Setup

Enable logical replication in PostgreSQL:

```properties
wal_level = logical
max_replication_slots = 4
```

Deploy the Debezium connector with the EventRouter transform. [Kafka Connect monitoring](https://docs.conduktor.io/guide/manage-kafka/kafka-resources/kafka-connect) helps track connector status and detect relay failures early.

```json
{
  "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
  "table.include.list": "public.outbox",
  "transforms": "outbox",
  "transforms.outbox.type": "io.debezium.transforms.outbox.EventRouter",
  "transforms.outbox.route.by.field": "aggregate_type",
  "transforms.outbox.table.field.event.key": "aggregate_id"
}
```

The EventRouter routes messages to topics based on `aggregate_type` and uses `aggregate_id` as the key.

## Handling Duplicates

The outbox pattern provides at-least-once delivery. The relay might crash after publishing but before deleting the row. On restart, it publishes again.

**Consumers must be idempotent:**

```java
public void handleOrderEvent(OrderCreatedEvent event) {
    jdbcTemplate.update("""
        INSERT INTO order_projections (order_id, amount, status)
        VALUES (?, ?, ?)
        ON CONFLICT (order_id) DO NOTHING
        """, event.getOrderId(), event.getAmount(), "CREATED");
}
```

Or track processed event IDs and skip duplicates.

## Outbox Table Cleanup

The table grows unbounded. Options:

**Delete after publish (polling):** Delete rows after successful Kafka send.

**Partition by time (high volume):**

```sql
CREATE TABLE outbox (...) PARTITION BY RANGE (created_at);
-- Drop entire partitions: instant, no row-by-row delete
DROP TABLE outbox_2026_01;
```

## Common Failures

**"Replication slot already exists"** — Another Debezium instance holds the slot. Check `pg_replication_slots` and drop if needed.

**Events out of order** — Different `aggregate_id` values go to different partitions. Ensure all events for an aggregate use the same key.

**High latency (polling)** — Reduce poll interval, but watch database load.

## Alternatives

**Listen to yourself:** Write to Kafka first, update database by consuming your own event. Tradeoff: API call fails if Kafka is down.

**Kafka transactions:** Only works for Kafka-to-Kafka. Doesn't solve database + Kafka consistency.

The outbox pattern trades latency for consistency. Events arrive milliseconds to seconds after the commit, but they always arrive.

[Book a demo](https://www.conduktor.io/contact/demo) to see how Conduktor Console gives you visibility into topic schemas, consumer lag, and message contents across all your Kafka clusters.
