# GDPR and Kafka: Right to Erasure

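GDPR Article 17 gives individuals the right to have their personal data erased "without undue delay". Article 12(3) gives you one month to act on a request, which most teams treat as a 30-day deadline. Kafka's append-only log has no operation for deleting an individual record, so conventional row-level deletion does not apply.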

I've helped multiple companies navigate this conflict. The solutions exist, but each has tradeoffs you need to understand before implementation.

> *We thought GDPR would force us off Kafka. Crypto shredding let us keep our event-driven architecture while satisfying regulators.*
>
> *Compliance Engineer at a European bank*

## Why Kafka Makes Erasure Difficult

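You cannot surgically remove an individual record from a Kafka topic. The same data exists in active segments, replicas, consumer state stores, and downstream databases. Physically deleting one record would mean rewriting segments, coordinating the change across replicas, and invalidating offsets. Kafka doesn't support this: the closest API, `deleteRecords`, only truncates a partition from its start up to a given offset.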

## Solution 1: Short Retention

Configure retention below 30 days:

```properties
# 28 days
retention.ms=2419200000
```

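Once records age past the retention limit, Kafka purges them automatically, so a deletion request is satisfied by waiting out the retention window. Retention is enforced per log segment rather than per record: a segment is deleted only when its newest record is older than `retention.ms`, so older records in the same segment can outlive the limit. Keep `segment.ms` small enough that retention plus segment age still fits your deadline.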

**When this works:** Event streaming consumed quickly, real-time dashboards, staging topics.

**When this fails:** Audit requirements mandate longer retention, event sourcing needs full history.
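
To check whether a retention setting actually fits the deadline, it helps to estimate the worst-case lifetime of a record. The sketch below is a rough model, not Kafka code: it assumes segments roll on time (`segment.ms`), that record timestamps track wall-clock time, and that the broker checks retention at a fixed interval. `RetentionBudget` and its method names are hypothetical.

```java
import java.time.Duration;

/** Rough worst-case lifetime of a record under time-based retention (illustrative model). */
final class RetentionBudget {

    /**
     * A segment becomes deletable only when its newest record is older than retention.ms,
     * so the oldest record in it can live roughly one segment roll longer, plus the
     * interval between retention checks.
     */
    static Duration worstCaseLifetime(Duration retention, Duration segmentRoll, Duration checkInterval) {
        return retention.plus(segmentRoll).plus(checkInterval);
    }

    /** True when the worst-case lifetime stays within the erasure deadline. */
    static boolean fitsDeadline(Duration retention, Duration segmentRoll,
                                Duration checkInterval, Duration deadline) {
        return worstCaseLifetime(retention, segmentRoll, checkInterval).compareTo(deadline) <= 0;
    }

    public static void main(String[] args) {
        Duration deadline = Duration.ofDays(30);
        Duration check = Duration.ofMinutes(5);
        // 28-day retention with a 7-day segment roll overshoots a 30-day deadline.
        System.out.println(fitsDeadline(Duration.ofDays(28), Duration.ofDays(7), check, deadline)); // false
        // Rolling segments daily keeps the worst case under the deadline.
        System.out.println(fitsDeadline(Duration.ofDays(28), Duration.ofDays(1), check, deadline)); // true
    }
}
```

In this model, 28-day retention combined with a 7-day segment roll (Kafka's default `segment.ms`) gives a worst case of about 35 days, so either the retention or the segment roll has to shrink.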

## Solution 2: Tombstones on Compacted Topics

Compacted topics eventually retain only the latest record per key. A tombstone (a record with a null value) signals deletion.

```properties
cleanup.policy=compact
# 7 days: tombstones must stay visible long enough for every consumer to read them
delete.retention.ms=604800000
```

**Why 7 days?** Consumers offline for longer than `delete.retention.ms` won't see the tombstone and won't delete their local state. The default of 24 hours is too short for maintenance windows or holiday outages.

Publish a tombstone:

```java
producer.send(new ProducerRecord<>("users", "user123", null));
```

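Once the log cleaner has compacted the segments holding the older records, the earlier values for that key are gone. Compaction runs asynchronously and never touches the active segment, so this is not immediate; `max.compaction.lag.ms` puts an upper bound on how long a record can stay uncompacted. Consumers must handle nulls: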

```java
if (user == null) {
    userCache.remove(key);  // Tombstone: delete from local state
}
```
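
For a fuller picture of the consumer side, here is a minimal sketch of local state that mirrors a compacted topic. It involves no Kafka client: `UserCache` and `apply` are hypothetical names, and in a real consumer the key and value would come from each `ConsumerRecord`, whose `value()` is null for a tombstone.

```java
import java.util.HashMap;
import java.util.Map;

/** Local state that mirrors a compacted topic (illustrative; no Kafka client involved). */
final class UserCache {
    private final Map<String, String> usersById = new HashMap<>();

    /** Apply one record: a null value is a tombstone and removes the entry. */
    void apply(String key, String value) {
        if (value == null) {
            usersById.remove(key);
        } else {
            usersById.put(key, value);
        }
    }

    boolean contains(String key) {
        return usersById.containsKey(key);
    }

    public static void main(String[] args) {
        UserCache cache = new UserCache();
        cache.apply("user123", "{\"email\":\"alice@example.com\"}");
        cache.apply("user123", null);  // tombstone
        System.out.println(cache.contains("user123"));  // false
    }
}
```

Routing every record through one `apply` method makes it harder to forget the null case in one of several code paths.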

**Critical requirement:** Topics must be keyed by user ID. Random keys make tombstones useless.
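
The keying requirement is easy to see with a toy model of compaction. This is a deliberate simplification and an assumption on my part: it keeps the latest value per key and drops tombstoned keys immediately, whereas Kafka keeps the tombstone itself around for `delete.retention.ms`. `CompactionModel` and `Rec` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy model of log compaction, used only to show why the record key matters. */
final class CompactionModel {

    /** A key/value pair; a null value marks a tombstone. */
    static final class Rec {
        final String key;
        final String value;

        Rec(String key, String value) {
            this.key = key;
            this.value = value;
        }
    }

    /** Keep the latest value per key, then drop keys whose latest value is a tombstone. */
    static List<String> survivingValues(List<Rec> log) {
        Map<String, String> latest = new LinkedHashMap<>();
        for (Rec r : log) {
            latest.put(r.key, r.value);
        }
        List<String> survivors = new ArrayList<>();
        for (String value : latest.values()) {
            if (value != null) {
                survivors.add(value);
            }
        }
        return survivors;
    }

    public static void main(String[] args) {
        // Keyed by user ID: the tombstone replaces the earlier record.
        System.out.println(survivingValues(Arrays.asList(
                new Rec("user123", "alice@example.com"),
                new Rec("user123", null))));
        // Random event key: the tombstone for "user123" matches nothing, so the PII survives.
        System.out.println(survivingValues(Arrays.asList(
                new Rec("evt-9f2c", "alice@example.com"),
                new Rec("user123", null))));
    }
}
```

The first log compacts to nothing; the second still contains the email address, because the tombstone's key never matched the record that held it.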

## Solution 3: Crypto Shredding

Encrypt data per user. On deletion, destroy the key. The ciphertext remains but becomes meaningless. See [Conduktor's encryption guide](https://docs.conduktor.io/guide/tutorials/configure-encryption) for implementing field-level encryption without code changes.

```text
Master KEK (KMS) → User DEKs → Encrypted Records
                      ↓
              Delete on GDPR request
```

Encrypt on produce:

```java
SecretKey dek = keyStore.getOrCreateDek(userId);
byte[] ciphertext = encrypt(dek, plaintext);
```

Delete on erasure:

```java
keyStore.deleteDek(userId);  // Data becomes unreadable
auditLog.record("GDPR_ERASURE", userId, Instant.now());
```

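GDPR does not prescribe a deletion technique. The argument for crypto shredding is that ciphertext whose key has been destroyed can no longer be read or linked to a person, so the data is effectively erased without modifying Kafka's immutable log. This only holds if every copy of the key is destroyed, including backups and caches.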
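
To make the mechanics concrete, here is a self-contained sketch using only the JDK's `javax.crypto`. It is not how any particular product implements crypto shredding: the in-memory map stands in for a real key store, `CryptoShredder` is a hypothetical name, and a production setup would wrap each DEK with the KMS-held master key before persisting it.

```java
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Arrays;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

/** Per-user AES-GCM encryption with an in-memory key store (illustrative only). */
final class CryptoShredder {
    private static final int IV_BYTES = 12;
    private static final int TAG_BITS = 128;

    private final Map<String, SecretKey> deks = new ConcurrentHashMap<>();
    private final SecureRandom random = new SecureRandom();

    /** Encrypts with the user's DEK, creating it on first use. Output is IV followed by ciphertext. */
    byte[] encrypt(String userId, byte[] plaintext) throws GeneralSecurityException {
        SecretKey dek = deks.computeIfAbsent(userId, id -> newDek());
        byte[] iv = new byte[IV_BYTES];
        random.nextBytes(iv);
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, dek, new GCMParameterSpec(TAG_BITS, iv));
        byte[] ciphertext = cipher.doFinal(plaintext);
        byte[] out = Arrays.copyOf(iv, iv.length + ciphertext.length);
        System.arraycopy(ciphertext, 0, out, iv.length, ciphertext.length);
        return out;
    }

    /** Decrypts a record produced by encrypt; fails once the user's key has been erased. */
    byte[] decrypt(String userId, byte[] record) throws GeneralSecurityException {
        SecretKey dek = deks.get(userId);
        if (dek == null) {
            throw new IllegalStateException("key erased for " + userId);
        }
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.DECRYPT_MODE, dek, new GCMParameterSpec(TAG_BITS, record, 0, IV_BYTES));
        return cipher.doFinal(record, IV_BYTES, record.length - IV_BYTES);
    }

    /** The erasure step: once the key is gone, existing records can no longer be decrypted. */
    void erase(String userId) {
        deks.remove(userId);
    }

    private static SecretKey newDek() {
        try {
            KeyGenerator generator = KeyGenerator.getInstance("AES");
            generator.init(256);
            return generator.generateKey();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The byte array returned by `encrypt` is what you would send as the Kafka record value. After `erase("user123")`, every record already in the log for that user is still there but can no longer be decrypted.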

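**Cost consideration:** Cloud KMS typically charges per key and per request. At scale, keep only the master key in KMS and either store wrapped DEKs in your own key store or derive each DEK from the master key plus a random per-user salt. Don't derive from the user ID alone: anyone holding the master key could recompute the DEK, so nothing would actually be destroyed. With a salt, deleting the salt is what shreds the data.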
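
If you go the derived-key route, the derivation needs a per-user secret that can be deleted, otherwise the key can always be recomputed. Here is a minimal sketch assuming a random per-user salt and a single HMAC-SHA256 step as the derivation (a simplification; a vetted KDF such as HKDF would be the usual choice). `DerivedKeyStore` is hypothetical, and holding the master key in process memory is for illustration only.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import javax.crypto.Mac;
import javax.crypto.SecretKey;
import javax.crypto.spec.SecretKeySpec;

/** Derives per-user keys from one master key plus a deletable per-user salt (illustrative). */
final class DerivedKeyStore {
    private final byte[] masterKey;
    private final Map<String, byte[]> saltByUser = new ConcurrentHashMap<>();
    private final SecureRandom random = new SecureRandom();

    DerivedKeyStore(byte[] masterKey) {
        this.masterKey = masterKey.clone();
    }

    /** Produce path: creates the user's salt on first use. */
    SecretKey getOrCreateDek(String userId) throws GeneralSecurityException {
        byte[] salt = saltByUser.computeIfAbsent(userId, id -> {
            byte[] fresh = new byte[32];
            random.nextBytes(fresh);
            return fresh;
        });
        return derive(salt, userId);
    }

    /** Consume path: empty once the user's salt has been erased. */
    Optional<SecretKey> findDek(String userId) throws GeneralSecurityException {
        byte[] salt = saltByUser.get(userId);
        if (salt == null) {
            return Optional.empty();
        }
        return Optional.of(derive(salt, userId));
    }

    /** Erasure: without the salt, the old key cannot be re-derived even with the master key. */
    void erase(String userId) {
        saltByUser.remove(userId);
    }

    // DEK = HMAC-SHA256(masterKey, salt || userId); the 32-byte output is used as an AES-256 key.
    private SecretKey derive(byte[] salt, String userId) throws GeneralSecurityException {
        Mac mac = Mac.getInstance("HmacSHA256");
        mac.init(new SecretKeySpec(masterKey, "HmacSHA256"));
        mac.update(salt);
        mac.update(userId.getBytes(StandardCharsets.UTF_8));
        return new SecretKeySpec(mac.doFinal(), "AES");
    }
}
```

The salt table replaces the per-user KMS keys: it is cheap to store, and deleting one row has the same effect as deleting one key.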

## Solution 4: Separate PII from Analytics

```text
events-with-pii (28-day retention) → transform → events-anonymized (indefinite)
```

A stream processor strips or hashes PII:

```java
piiEvents
    .mapValues(event -> new AnonymizedEvent(
        hash(event.getUserId()),
        event.getCountry(),
        event.getTimestamp()
    ))
    .to("events-anonymized");
```

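GDPR erasure then only affects the PII topic, provided the anonymized topic really contains no personal data. A plain hash of a user ID is pseudonymization rather than anonymization: anyone with the list of user IDs can recompute it. Prefer dropping the identifier, or use a keyed hash whose secret you control. Analytics continue uninterrupted.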
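
As one possible shape for the `hash` helper used in the topology, here is a keyed hash built on the JDK's HMAC-SHA256. This is an assumption about what the helper could look like, not the original implementation, and `Pseudonymizer` is a hypothetical name. The output is still a pseudonym for as long as the secret exists.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.util.Base64;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

/** Replaces a user ID with a keyed hash so it cannot be recomputed without the secret. */
final class Pseudonymizer {
    private final SecretKeySpec key;

    Pseudonymizer(byte[] secret) {
        this.key = new SecretKeySpec(secret, "HmacSHA256");
    }

    /** Same user ID and secret always give the same pseudonym, so joins and counts still work. */
    String pseudonym(String userId) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(key);
            byte[] digest = mac.doFinal(userId.getBytes(StandardCharsets.UTF_8));
            return Base64.getUrlEncoder().withoutPadding().encodeToString(digest);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

In the stream processor this would be called as `pseudonymizer.pseudonym(event.getUserId())` in place of `hash(...)`.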

## Downstream Systems

Kafka is rarely the only place personal data lives. Deletion must cascade to:

- ksqlDB materialized views
- Databases via Kafka Connect
- Search indices
- Caches

**Pattern:** Produce to a `user-deletions` topic. All downstream systems consume and purge:

```java
producer.send(new ProducerRecord<>("user-deletions", userId, new DeletionEvent(userId)));
```
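
On the consuming side, each deletion event has to reach every store, and you need to know which ones failed so they can be retried before the deadline. A minimal sketch of that fan-out, with no Kafka client involved; `ErasureFanOut`, `PurgeTarget`, and `InMemoryTarget` are hypothetical names standing in for your real caches, indices, and databases:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Applies one deletion event to every downstream store and reports stragglers (illustrative). */
final class ErasureFanOut {

    /** Anything that holds personal data and can purge it for one user. */
    interface PurgeTarget {
        String name();
        boolean purge(String userId);  // true when the data is confirmed gone
    }

    /** Stand-in for a cache, search index, or database table. */
    static final class InMemoryTarget implements PurgeTarget {
        private final String name;
        private final Map<String, String> rows = new HashMap<>();

        InMemoryTarget(String name) {
            this.name = name;
        }

        void put(String userId, String data) {
            rows.put(userId, data);
        }

        boolean holds(String userId) {
            return rows.containsKey(userId);
        }

        @Override
        public String name() {
            return name;
        }

        @Override
        public boolean purge(String userId) {
            rows.remove(userId);
            return !rows.containsKey(userId);
        }
    }

    /** Purges the user from every target; returns the names of targets that still need a retry. */
    static List<String> purgeEverywhere(String userId, List<? extends PurgeTarget> targets) {
        List<String> pending = new ArrayList<>();
        for (PurgeTarget target : targets) {
            boolean done;
            try {
                done = target.purge(userId);
            } catch (RuntimeException e) {
                done = false;  // treat a failure as "not yet erased" and keep going
            }
            if (!done) {
                pending.add(target.name());
            }
        }
        return pending;
    }
}
```

An empty result is what you would record in the audit log as a completed erasure; a non-empty one is the retry list.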

## Choosing a Strategy

| Scenario | Strategy |
|----------|----------|
| All topics ≤ 28 days retention | Short retention |
| Topics keyed by user ID | Tombstones |
| Long retention required | Crypto shredding |
| PII mixed with analytics | Separate topics |

Most organizations use a combination: short retention for transient data, crypto shredding for sensitive data requiring long retention.
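
If you are auditing many topics, the table can be encoded as a small helper that returns every strategy that applies. This is one reading of the table, not a rule: the 28-day cut-off comes from the retention example, each row is treated as an independent check, and `StrategyChooser` is a hypothetical name.

```java
import java.time.Duration;
import java.util.EnumSet;
import java.util.Set;

/** Maps a topic's characteristics to the erasure strategies that apply (illustrative). */
final class StrategyChooser {
    enum Strategy { SHORT_RETENTION, TOMBSTONES, CRYPTO_SHREDDING, SEPARATE_TOPICS }

    /** Cut-off taken from the 28-day retention example; adjust to your own deadline. */
    static final Duration SHORT_RETENTION_LIMIT = Duration.ofDays(28);

    static Set<Strategy> recommend(Duration retention, boolean keyedByUserId, boolean piiMixedWithAnalytics) {
        Set<Strategy> result = EnumSet.noneOf(Strategy.class);
        if (retention.compareTo(SHORT_RETENTION_LIMIT) <= 0) {
            result.add(Strategy.SHORT_RETENTION);   // data ages out on its own
        } else {
            result.add(Strategy.CRYPTO_SHREDDING);  // long-lived data needs key destruction
        }
        if (keyedByUserId) {
            result.add(Strategy.TOMBSTONES);
        }
        if (piiMixedWithAnalytics) {
            result.add(Strategy.SEPARATE_TOPICS);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(recommend(Duration.ofDays(7), false, false));
        System.out.println(recommend(Duration.ofDays(365), true, true));
    }
}
```

Returning a set rather than a single value reflects the point above: most topics end up with more than one strategy.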

[Book a demo](https://www.conduktor.io/contact/demo) to see how Conduktor Gateway, a Kafka proxy, provides field-level encryption and crypto shredding without application changes.
