# Kafka consumers

*Learn how Kafka consumers read data from topics*

Consumers are applications that read data from Kafka topics. Understanding how consumers work, including deserialization and the pull model, is essential for building reliable data processing applications.

**What you'll learn:**
- How consumers read messages from Kafka topics
- How message deserialization works
- The consumer pull model and its benefits
- Best practices for message format compatibility

## Kafka consumers

Once a topic has been created in Kafka and data has been placed in the topic, we can start to build applications that make use of this data stream. Applications that pull event data from one or more Kafka topics are known as Kafka consumers.

To read from Apache Kafka, applications integrate a Kafka client library. Mature client libraries exist for almost [all programming languages](https://www.conduktor.io/kafka/kafka-sdk-list) that are popular today, including Python, Java, and Go.

Consumers can read from one or more partitions at a time in Apache Kafka, and data is read in order **within each partition** as shown below.

![Kafka consumers in this diagram are reading messages from various Apache Kafka Brokers and Topics.](https://www.conduktor.io/assets/kafka/Kafka-Consumers-1.png)

## How consumers read data

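Within a partition, a consumer always reads data from a lower offset to a higher offset. It cannot read data backwards; this follows from how Apache Kafka and its clients are implemented. To reprocess earlier records, a consumer has to reposition itself at an earlier offset and read forward again from there.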

If the consumer consumes data from more than one partition, the message order is not guaranteed across multiple partitions because they are consumed simultaneously, but the message read order is still guaranteed within each individual partition.
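
The ordering guarantee can be illustrated with a small simulation. This is a minimal sketch that uses plain Python lists in place of a real topic; the `partitions` data and the `interleaved_read` helper are hypothetical and only model one possible interleaving, not how a Kafka client actually schedules fetches.

```python
from itertools import zip_longest

# Hypothetical topic with two partitions; list index stands in for the offset.
partitions = {
    0: ["a0", "a1", "a2"],
    1: ["b0", "b1"],
}


def interleaved_read(partitions):
    """Yield (partition, offset, value) round-robin across partitions.

    Within each partition the offsets are always visited from low to high,
    but records from different partitions are mixed together.
    """
    logs = [
        [(p, offset, value) for offset, value in enumerate(values)]
        for p, values in sorted(partitions.items())
    ]
    for batch in zip_longest(*logs):
        for record in batch:
            if record is not None:
                yield record


records = list(interleaved_read(partitions))

# Collect the offsets seen per partition, in arrival order.
offsets_by_partition = {}
for p, offset, _ in records:
    offsets_by_partition.setdefault(p, []).append(offset)

for p, offsets in offsets_by_partition.items():
    # Each partition's offsets arrive strictly in increasing order.
    assert offsets == sorted(offsets)
```

The combined `records` list mixes values from both partitions, so any logic that depends on ordering should only rely on the order within a single partition.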

By default, a Kafka consumer only consumes data that was produced after it first connected to Kafka. To read historical data, you have to request it explicitly, for example as an input to the command, as we will see in the practice section.
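
The difference between starting at the end of the log and starting at the beginning can be sketched with a toy model. `PartitionLog` and `ToyConsumer` below are hypothetical classes written for this illustration, not part of any Kafka client; the `start` argument is loosely modeled on the consumer setting `auto.offset.reset`, whose values `latest` and `earliest` play a similar role in real clients.

```python
class PartitionLog:
    """Toy append-only log standing in for one Kafka partition."""

    def __init__(self):
        self.records = []

    def append(self, value):
        self.records.append(value)

    def end_offset(self):
        # Offset that the next appended record will receive.
        return len(self.records)


class ToyConsumer:
    """Hypothetical consumer that tracks its own read position."""

    def __init__(self, log, start="latest"):
        self.log = log
        # "latest": skip everything already in the log.
        # "earliest": start from the first record.
        self.position = log.end_offset() if start == "latest" else 0

    def poll(self):
        records = self.log.records[self.position:]
        self.position = self.log.end_offset()
        return records


log = PartitionLog()
log.append("old-1")
log.append("old-2")

latest_consumer = ToyConsumer(log, start="latest")
earliest_consumer = ToyConsumer(log, start="earliest")

log.append("new-1")  # produced after both consumers connected

latest_seen = latest_consumer.poll()
earliest_seen = earliest_consumer.poll()
```

In this sketch only `earliest_consumer` sees the two records that existed before it connected; `latest_consumer` sees just the record produced afterwards.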

### Consumer pull model

Kafka consumers implement a "pull model". This means that consumers have to request data from Kafka brokers in order to get it, instead of having the brokers continuously push data to them. This design lets each consumer control the speed at which it consumes a topic.

![Sequence diagram of the Kafka consumer pull model: the consumer calls poll() to request messages, the broker returns a batch, the consumer processes it, then polls again for the next batch.](https://www.conduktor.io/assets/kafka/diagrams/kafka-consumers.svg)

```mermaid
sequenceDiagram
    participant Consumer
    participant Broker

    Consumer->>Broker: poll() - request messages
    Broker-->>Consumer: Return batch of messages
    Note over Consumer: Process messages
    Consumer->>Broker: poll() - request more
    Broker-->>Consumer: Return next batch
```

**Benefits of the pull model:**
- Consumers control their own consumption rate
- Slow consumers don't affect broker performance
- Consumers can batch process messages efficiently
- Natural backpressure handling
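
The poll loop described above can be sketched without a running cluster. `ToyBroker` and `consume_all` are hypothetical names for this illustration, and `max_records` is an assumed parameter, loosely comparable to the `max.poll.records` setting of the Java consumer. Unlike this sketch, a real consumer usually keeps polling after an empty batch instead of stopping.

```python
class ToyBroker:
    """Hypothetical broker that only hands out data when asked."""

    def __init__(self, records):
        self._log = list(records)

    def fetch(self, offset, max_records):
        # Return at most max_records starting at the requested offset.
        return self._log[offset:offset + max_records]


def consume_all(broker, max_records):
    """Pull batches until the broker has nothing left to return."""
    position = 0          # next offset this consumer wants to read
    batch_sizes = []
    processed = []
    while True:
        batch = broker.fetch(position, max_records)  # consumer asks, broker answers
        if not batch:
            break  # toy stop condition; a real consumer would poll again
        # Stand-in for real work. The next fetch only happens once this is
        # done, which is where the backpressure comes from.
        processed.extend(value.upper() for value in batch)
        position += len(batch)
        batch_sizes.append(len(batch))
    return batch_sizes, processed


broker = ToyBroker(["a", "b", "c", "d", "e"])
batch_sizes, processed = consume_all(broker, max_records=2)
```

Because the consumer decides when to call `fetch` and how many records to ask for, a slow consumer simply polls less often; the broker does no extra work on its behalf.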

## Kafka message deserializers

> Consumed data has to be deserialized in the same format it was serialized in.

As we have seen before, the data sent by the Kafka producers is [serialized](https://www.conduktor.io/kafka/kafka-producers). This means that the data received by the Kafka consumers has to be correctly deserialized in order to be useful within your application.

The deserializer must match the serializer that the producer used. For example:

- if the producer serialized a `String` using `StringSerializer`, the consumer has to deserialize it using `StringDeserializer`
- if the producer serialized an `Integer` using `IntegerSerializer`, the consumer has to deserialize it using `IntegerDeserializer`

![Kafka Consumers must use the same format for deserialization that was used by the producer when serializing the message. This diagram shows the deserialization process.](https://www.conduktor.io/assets/kafka/Kafka-Consumers-2.png)
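
To make the matching rule concrete, here is a minimal sketch in plain Python. The four helper functions are hypothetical stand-ins, written on the assumption that a string is encoded as UTF-8 bytes and an integer as a 4-byte big-endian value; they are not the Kafka classes themselves.

```python
import struct


def serialize_string(value: str) -> bytes:
    return value.encode("utf-8")


def deserialize_string(data: bytes) -> str:
    return data.decode("utf-8")


def serialize_int(value: int) -> bytes:
    return struct.pack(">i", value)  # 4-byte big-endian signed integer


def deserialize_int(data: bytes) -> int:
    if len(data) != 4:
        raise ValueError(f"expected 4 bytes, got {len(data)}")
    return struct.unpack(">i", data)[0]


text_bytes = serialize_string("hello")
int_bytes = serialize_int(42)

# Matching pairs round-trip cleanly.
matched_text = deserialize_string(text_bytes)
matched_int = deserialize_int(int_bytes)

# Mismatched pair, loud failure: five UTF-8 bytes are not a 4-byte integer.
try:
    deserialize_int(text_bytes)
    mismatch_error = None
except ValueError as exc:
    mismatch_error = str(exc)

# Mismatched pair, silent failure: these integer bytes happen to be valid UTF-8,
# so the consumer gets a plausible-looking but wrong value.
silently_wrong = deserialize_string(serialize_int(0x41424344))
```

The last case is the more dangerous one: nothing raises, yet the consumer ends up with a string where the producer sent a number.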

### Serialization compatibility

The serialization and deserialization format of a topic should not change during the topic's lifecycle. If you intend to switch a topic's data format (for example from JSON to Avro), it is considered best practice to create a new topic and migrate your applications to that new topic.

> **Poison pills**
> Messages sent to a Kafka topic that do not respect the agreed-upon serialization format are called **poison pills**. [They are not fun to deal with.](https://www.slideshare.net/ConfluentInc/streaming-apps-and-poison-pills-handle-the-unexpected-with-kafka-streams-loic-divad-xebia-france-kafka-summit-sf-2019)

A message that cannot be deserialized correctly may crash the consumer or feed inconsistent data to downstream processing applications. This can be tough to debug, so it is best to plan for it when you first write your code.

### Handling deserialization errors

| Strategy | When to use |
|----------|-------------|
| Fail fast | Development, testing |
| Log and skip | Non-critical data, metrics |
| Dead letter queue | Production, data recovery needed |
| Schema validation | Prevent bad data at producer |
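
Three of the strategies in the table can be compared in a short sketch. It assumes JSON-encoded messages; `process_batch`, the strategy names, and the in-memory `dead_letters` list are hypothetical. In a real deployment the dead letter queue would typically be another Kafka topic, and the skip branch would write a log entry.

```python
import json


def deserialize_json(data: bytes) -> dict:
    return json.loads(data.decode("utf-8"))


def process_batch(raw_messages, strategy="dead_letter"):
    """Deserialize a batch, handling bad messages according to the strategy."""
    good, dead_letters = [], []
    for offset, raw in enumerate(raw_messages):
        try:
            good.append(deserialize_json(raw))
        except (UnicodeDecodeError, json.JSONDecodeError) as exc:
            if strategy == "fail_fast":
                raise  # stop immediately so the problem is noticed
            if strategy == "dead_letter":
                # Keep the raw bytes so the message can be inspected or replayed.
                dead_letters.append(
                    {"offset": offset, "raw": raw, "error": type(exc).__name__}
                )
            # "log_and_skip": a real application would log here and move on.
    return good, dead_letters


# The second message is a poison pill: it is not valid UTF-8, let alone JSON.
raw_messages = [b'{"id": 1}', b"\xff\xfe not json", b'{"id": 2}']
good, dead_letters = process_batch(raw_messages)
```

With the dead letter strategy the two valid messages are still processed, and the poison pill is set aside together with its offset and error type for later investigation.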

> **See it in practice with Conduktor**
> [Conduktor Console](https://docs.conduktor.io/guide/manage-kafka/kafka-resources/topics) lets you consume and browse messages from topics directly in the UI. View message keys, values, headers, and timestamps with automatic deserialization support for common formats.

## Next steps

- [Scale with consumer groups and offsets](https://www.conduktor.io/kafka/kafka-consumer-groups-and-consumer-offsets) to understand parallel consumption and progress tracking
- [Explore delivery semantics](https://www.conduktor.io/kafka/delivery-semantics-for-kafka-consumers) for exactly-once processing
- [Write a Java consumer](https://www.conduktor.io/kafka/complete-kafka-consumer-with-java) with hands-on code examples
