# Consumer auto offset reset behavior

*Learn how to configure consumer offset reset behavior*

When a consumer starts without committed offsets, or when committed offsets are invalid, Kafka needs to know where to start reading. The `auto.offset.reset` configuration controls this behavior and is critical for understanding data processing guarantees.

**What you'll learn:**
- The three auto offset reset options and when each is triggered
- How to choose the right setting for your use case
- Best practices for production and development environments
- How to handle offset reset scenarios programmatically

## Auto offset reset options

`auto.offset.reset` accepts three values. The setting applies only when the consumer group has no valid committed offset for a partition, either because none was ever committed or because the committed offset points at data that has since been deleted. A consumer with a valid committed offset always resumes from it, regardless of this setting.

### earliest
```properties
auto.offset.reset=earliest
```
- Consumer will start reading from the beginning of the partition
- Reads all available messages from the earliest available offset
- Useful for reprocessing all historical data
- **Use case**: Data migration, audit requirements, complete reprocessing

### latest (default)
```properties
auto.offset.reset=latest
```
- Consumer will start reading from the end of the partition
- Only processes new messages produced after the consumer starts
- **Use case**: Real-time processing where historical data is not needed

### none
```properties
auto.offset.reset=none
```
- Consumer throws `NoOffsetForPartitionException` if no committed offset is found, or `OffsetOutOfRangeException` if the committed offset is out of range
- Forces explicit offset management
- **Use case**: Strict control over consumer behavior, prevents accidental data loss or reprocessing

### Decision guide

![Decision flowchart for choosing auto.offset.reset: need historical data leads to earliest, strict offset control leads to none, otherwise latest, each with its risk](https://www.conduktor.io/assets/kafka/diagrams/consumer-auto-offsets-reset-behavior.svg)

```mermaid
flowchart TD
    Start["New consumer<br/>or invalid offset"] --> Q1{"Need historical<br/>data?"}

    Q1 -->|Yes| Earliest["earliest<br/>Read from beginning"]
    Q1 -->|No| Q2{"Strict offset<br/>control needed?"}

    Q2 -->|Yes| None["none<br/>Throw exception"]
    Q2 -->|No| Latest["latest<br/>Read new messages only"]

    Earliest --> Risk1["⚠️ Risk: May reprocess<br/>large amounts of data"]
    Latest --> Risk2["⚠️ Risk: May skip<br/>messages when a reset occurs"]
    None --> Risk3["⚠️ Requires: Explicit<br/>error handling in code"]
```

## When auto offset reset is triggered

The `auto.offset.reset` behavior is triggered in these scenarios:

| Scenario | Description | Example |
|----------|-------------|---------|
| **New consumer group** | First time a consumer group subscribes to a topic | Deploying a new application |
| **Invalid offset** | Committed offset no longer exists (data deleted due to retention) | Consumer offline longer than retention period |
| **Offset out of range** | Committed offset is beyond the current log boundaries | Log truncation or corruption |
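
The trigger conditions above can be condensed into a small model. The sketch below is not Kafka client code: `OffsetResetModel`, `Policy`, and `startingOffset` are hypothetical names, and the logic is a simplification for a single partition that assumes the valid range runs from the log start offset to the log end offset. In the real client, the `none` case surfaces as `NoOffsetForPartitionException` or `OffsetOutOfRangeException` rather than the `IllegalStateException` used here.

```java
import java.util.OptionalLong;

/** Simplified, illustrative model of how a starting position is chosen for one partition. */
final class OffsetResetModel {

    enum Policy { EARLIEST, LATEST, NONE }

    /**
     * @param committed      committed offset for the group, empty if none exists
     * @param logStartOffset first offset still retained in the partition
     * @param logEndOffset   offset the next produced record will receive
     * @param policy         the configured auto.offset.reset value
     */
    static long startingOffset(OptionalLong committed, long logStartOffset,
                               long logEndOffset, Policy policy) {
        // A committed offset inside the retained range is still valid: no reset happens.
        if (committed.isPresent()
                && committed.getAsLong() >= logStartOffset
                && committed.getAsLong() <= logEndOffset) {
            return committed.getAsLong();
        }
        // No committed offset, or it is out of range: the policy decides.
        switch (policy) {
            case EARLIEST:
                return logStartOffset;
            case LATEST:
                return logEndOffset;
            default:
                throw new IllegalStateException("No valid offset and auto.offset.reset=none");
        }
    }
}
```

For a log spanning offsets 100 to 500, a missing committed offset resolves to 100 under `earliest` and 500 under `latest`, while a committed offset of 250 is used as-is under any policy.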

## Common scenarios

### Scenario 1: New consumer group
```java
// First time this consumer group runs
Properties props = new Properties();
props.put("group.id", "new-consumer-group");
props.put("auto.offset.reset", "earliest"); // Will read from beginning
```

### Scenario 2: Data retention cleanup
```java
// Consumer was offline longer than the retention period, so its committed
// offset now points at deleted data and auto.offset.reset applies
Properties props = new Properties();
props.put("group.id", "existing-group");
props.put("auto.offset.reset", "latest"); // Jumps to the end; records in between are skipped
```

## Best practices

### For production systems
```properties
# Be explicit about offset reset behavior
auto.offset.reset=latest

# Enable offset commits
enable.auto.commit=true
auto.commit.interval.ms=5000
```

### For development/testing
```properties
# Often want to reprocess data
auto.offset.reset=earliest

# May want manual control
enable.auto.commit=false
```

### For critical data processing
```properties
# Prevent accidental data loss or reprocessing
auto.offset.reset=none

# Handle exceptions explicitly in code
```

## Error handling example

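```java
import java.time.Duration;
import java.util.Arrays;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.InvalidOffsetException;
import org.apache.kafka.clients.consumer.KafkaConsumer;

Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092"); // placeholder broker address
props.put("group.id", "my-group");                // placeholder group name
props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
props.put("auto.offset.reset", "none");

try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
    consumer.subscribe(Arrays.asList("my-topic"));
    while (true) {
        try {
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(1000));
            // Process records
        } catch (InvalidOffsetException e) {
            // Parent of NoOffsetForPartitionException (no committed offset) and
            // OffsetOutOfRangeException (committed offset outside the log).
            // Reposition only the affected partitions, then keep polling.
            consumer.seekToBeginning(e.partitions());
            // or consumer.seekToEnd(e.partitions());
        }
    }
}
```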

## Offset management strategies

### Automatic offset management
- Use `enable.auto.commit=true`
- Set appropriate `auto.commit.interval.ms`
- Choose suitable `auto.offset.reset` policy

### Manual offset management
- Use `enable.auto.commit=false`
- Call `commitSync()` or `commitAsync()` after processing
- Handle offset reset scenarios explicitly

### External offset storage
- Store offsets in external systems (database, file system)
- Use `seek()` methods to position consumer
- Implement custom offset management logic
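
As a rough illustration of external offset storage, the sketch below keeps offsets in an in-memory map that stands in for a database table. `ExternalOffsetStore`, `recordProcessed`, and `positionFor` are hypothetical names, not part of the Kafka client. It assumes the common convention of storing the offset of the next record to read (last processed offset plus one), which is the value `seek()` expects.

```java
import java.util.Map;
import java.util.OptionalLong;
import java.util.concurrent.ConcurrentHashMap;

/** Illustrative in-memory stand-in for an external offset table. */
final class ExternalOffsetStore {

    // Key is "topic-partition"; value is the offset of the NEXT record to read.
    private final Map<String, Long> nextOffsets = new ConcurrentHashMap<>();

    /** Call after a record has been fully processed. */
    void recordProcessed(String topic, int partition, long processedOffset) {
        nextOffsets.put(key(topic, partition), processedOffset + 1);
    }

    /** Position to seek to on assignment, or empty if nothing was stored yet. */
    OptionalLong positionFor(String topic, int partition) {
        Long next = nextOffsets.get(key(topic, partition));
        return next == null ? OptionalLong.empty() : OptionalLong.of(next);
    }

    private static String key(String topic, int partition) {
        return topic + "-" + partition;
    }
}
```

On partition assignment, a consumer could pass the value from `positionFor` to `consumer.seek(partition, offset)` and fall back to `seekToBeginning` or `seekToEnd` when the result is empty. A durable version would typically write the offset in the same transaction as the processing result, so the two cannot drift apart.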

> **Data loss vs duplication**
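> - `auto.offset.reset=latest` can cause data loss: when a reset is triggered, messages produced before the consumer's position is set are never read
> - `auto.offset.reset=earliest` can cause duplicate processing: a new or recreated consumer group, or one whose committed offsets have expired, rereads everything still retained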
> - `auto.offset.reset=none` requires explicit error handling but provides the most control

## Configuration recommendations

| Use case | auto.offset.reset | enable.auto.commit | Notes |
|----------|------------------|-------------------|-------|
| High-throughput | `latest` | `true` | Accept potential data loss for speed |
| Critical data | `none` | `false` | Manual control, handle exceptions |
| Replay scenarios | `earliest` | `false` | Process all historical data |
| Development | `earliest` | `true` | Easy testing with full data |
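
One way to keep these recommendations consistent across services is to encode the table as presets. This is a minimal sketch using only `java.util.Properties`; the `OffsetPreset` enum and its `applyTo` method are hypothetical names, and the values simply mirror the rows above.

```java
import java.util.Properties;

/** Presets mirroring the recommendation table; names are illustrative. */
enum OffsetPreset {
    HIGH_THROUGHPUT("latest", true),
    CRITICAL_DATA("none", false),
    REPLAY("earliest", false),
    DEVELOPMENT("earliest", true);

    private final String autoOffsetReset;
    private final boolean enableAutoCommit;

    OffsetPreset(String autoOffsetReset, boolean enableAutoCommit) {
        this.autoOffsetReset = autoOffsetReset;
        this.enableAutoCommit = enableAutoCommit;
    }

    /** Returns a copy of base with the preset's two settings applied; base is not modified. */
    Properties applyTo(Properties base) {
        Properties merged = new Properties();
        merged.putAll(base);
        merged.setProperty("auto.offset.reset", autoOffsetReset);
        merged.setProperty("enable.auto.commit", Boolean.toString(enableAutoCommit));
        return merged;
    }
}
```

A service would start from its own base properties (brokers, group ID, deserializers) and call, for example, `OffsetPreset.CRITICAL_DATA.applyTo(base)` before constructing the consumer.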

> **See it in practice with Conduktor**
> [Conduktor Console](https://docs.conduktor.io/guide/monitor-brokers-apps) lets you monitor consumer group offsets and lag in real time. Identify when offset resets occur and track consumer position across partitions to validate your offset management strategy.

## Next steps

- [Read from the closest replica](https://www.conduktor.io/kafka/consumer-read-from-closest-replica) to cut cross-datacenter latency
- [Understand delivery semantics](https://www.conduktor.io/kafka/delivery-semantics-for-kafka-consumers) for reliable processing
- [Configure consumer settings](https://www.conduktor.io/kafka/kafka-consumer-important-settings-poll-and-internal-threads-behavior) for optimal performance
