Consumer poll and heartbeat settings

How the Kafka consumer poll loop and poll timeout work, plus poll interval, heartbeat and session.timeout.ms settings that control throughput and stability.

Learn how to tune Kafka consumer settings for optimal performance

Kafka consumers combine a poll loop, which fetches data, with a heartbeat mechanism, which maintains group membership. Understanding the settings behind both is essential for building high-performance, reliable consumer applications.

What you'll learn:

How the consumer poll loop works internally
The relationship between heartbeat and poll threads
Key configuration settings and their impact
How to tune for throughput vs latency

Consumer poll behavior

Kafka consumers poll Kafka brokers to receive batches of records. Once a consumer is subscribed to topics, the client handles coordination, partition rebalances, heartbeats, and data fetching behind the scenes. This leaves the developer with a clean API: poll() simply returns the available records from the assigned partitions.

Internal code optimization
If the consumer has successfully fetched data from Kafka, it starts sending the next fetch requests ahead of time. While your code processes the current batch, the next .poll() call therefore has less to wait for.
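The prefetching idea can be illustrated with a toy model. This is a minimal sketch, not the Kafka client's actual fetcher: `PrefetchingPoller` and the `fetch_batch` callable are hypothetical names, and a single background thread stands in for an in-flight fetch request.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, List, Optional


class PrefetchingPoller:
    """Toy model of prefetching: fetch batch N+1 while the caller processes batch N.

    `fetch_batch` is a hypothetical callable standing in for one fetch request;
    it returns the next list of records, or an empty list if none are available.
    """

    def __init__(self, fetch_batch: Callable[[], List[str]]) -> None:
        self._fetch_batch = fetch_batch
        self._executor = ThreadPoolExecutor(max_workers=1)
        self._pending: Optional[Future] = None

    def poll(self) -> List[str]:
        if self._pending is None:
            # Nothing in flight: fetch synchronously.
            records = self._fetch_batch()
        else:
            # Use the prefetched result; often it is already complete.
            records = self._pending.result()
        # Prefetch the next batch only after a successful, non-empty fetch.
        self._pending = self._executor.submit(self._fetch_batch) if records else None
        return records

    def close(self) -> None:
        self._executor.shutdown(wait=True)
```

For example, with `batches = iter([["a", "b"], ["c"]])`, `PrefetchingPoller(lambda: next(batches, []))` returns `["a", "b"]` on the first `poll()`, while the fetch for `["c"]` is already running in the background.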

Figure: Kafka consumer poll behavior

Polling lets consumers control:

Where in the log they consume from
How fast they consume
Whether to replay events
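To make the pull model concrete, here is a minimal in-memory sketch of a single partition. `ToyPullConsumer` is hypothetical; its `poll()` and `seek()` only loosely echo the real `KafkaConsumer` methods, which take a timeout and a `TopicPartition` respectively, and in the real client the batch size comes from `max.poll.records` rather than an argument.

```python
from typing import List


class ToyPullConsumer:
    """Hypothetical in-memory consumer for one partition, illustrating the pull model.

    The consumer owns its position, decides how much to take per poll,
    and can rewind to replay records.
    """

    def __init__(self, log: List[str]) -> None:
        self.log = log
        self.position = 0  # offset of the next record to read

    def seek(self, offset: int) -> None:
        # Choose where in the log to read from; seeking backwards replays records.
        if not 0 <= offset <= len(self.log):
            raise ValueError(f"offset {offset} out of range")
        self.position = offset

    def poll(self, max_records: int) -> List[str]:
        # Consume at your own pace: at most max_records per call.
        batch = self.log[self.position:self.position + max_records]
        self.position += len(batch)
        return batch
```

Reading `["e0", "e1", "e2", "e3"]` two records at a time, then calling `seek(1)`, replays everything from `e1` onward.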

Internal poll thread and heartbeat thread

Consumers maintain their membership in a consumer group, and their ownership of partitions, by sending heartbeats to the Kafka broker designated as the group coordinator.

Figure: Kafka consumer internal threads. The poll thread, which runs your code, exchanges poll() calls and records with a broker, while a background heartbeat thread exchanges heartbeats and rebalance signals with the group coordinator.

Kafka consumer heartbeat thread

Heartbeats let the group coordinator determine consumer liveness:

As long as the consumer sends heartbeats at regular intervals, the coordinator considers it alive
If no heartbeat arrives for longer than session.timeout.ms, the session times out and the coordinator triggers a rebalance

Key configurations:

heartbeat.interval.ms (default: 3 seconds)
session.timeout.ms (default: 45 seconds since Kafka 3.0; 10 seconds in earlier versions)

This mechanism detects consumer application downtime or network failures.
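The liveness rule can be expressed as a small check. This is a simplified, coordinator-side sketch under assumptions: `session_expiry_time` is a hypothetical helper, timestamps are in milliseconds, and the session counts as expired once more than `session.timeout.ms` passes without a heartbeat. The real group coordinator's bookkeeping is more involved.

```python
from typing import Iterable, Optional


def session_expiry_time(
    heartbeats_ms: Iterable[int],
    session_timeout_ms: int,
    now_ms: int,
    joined_at_ms: int = 0,
) -> Optional[int]:
    """Return when the session would expire, or None if it is still alive at now_ms.

    Simplified model: the session expires once more than session_timeout_ms passes
    without a heartbeat, counting from joining the group or from the last heartbeat.
    """
    last_seen = joined_at_ms
    for t in sorted(heartbeats_ms):
        if t > now_ms:
            break  # heartbeats after now_ms have not happened yet
        if t - last_seen > session_timeout_ms:
            return last_seen + session_timeout_ms  # expired before this heartbeat arrived
        last_seen = t
    if now_ms - last_seen > session_timeout_ms:
        return last_seen + session_timeout_ms
    return None
```

With the defaults (a heartbeat every 3 s, a 45 s session), a consumer whose heartbeats stop at 30 s is still alive at 60 s and expires at 75 s.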

Kafka consumer poll thread

Consumers poll brokers periodically using the .poll() method.

Key configuration:

max.poll.interval.ms (default: 5 minutes)

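This is the maximum time allowed between calls to poll(). If it is exceeded, the consumer is considered failed: it leaves the group and its partitions are reassigned in a rebalance. Because heartbeats are sent from a background thread, a consumer stuck in slow processing can keep its session alive, and max.poll.interval.ms is what catches that case.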
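A consumer with healthy heartbeats can still be removed for polling too slowly. The sketch below is a hypothetical offline check (`poll_interval_violations` is not a Kafka API) against `poll()` timestamps you record yourself; a real consumer detects the violation while it is waiting, not after the next `poll()`.

```python
from typing import Iterable, List, Tuple

DEFAULT_MAX_POLL_INTERVAL_MS = 300_000  # 5 minutes, the default noted above


def poll_interval_violations(
    poll_times_ms: Iterable[int],
    max_poll_interval_ms: int = DEFAULT_MAX_POLL_INTERVAL_MS,
) -> List[Tuple[int, int]]:
    """Return consecutive poll() timestamps that are more than max_poll_interval_ms apart.

    Each pair marks a point where a real consumer would have left the group,
    even if its heartbeat thread was still running.
    """
    times = sorted(poll_times_ms)
    return [(a, b) for a, b in zip(times, times[1:]) if b - a > max_poll_interval_ms]
```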

Important consumer settings

Poll behavior settings

Setting	Default	Description
`max.poll.records`	500	Maximum number of records returned by a single `poll()`
`fetch.min.bytes`	1 byte	Minimum amount of data the broker returns for a fetch request
`fetch.max.wait.ms`	500 ms	Maximum time the broker waits when less than `fetch.min.bytes` is available

max.poll.records

Lower values can improve latency but may reduce throughput
Higher values can improve throughput but increase processing time per batch, which must stay within max.poll.interval.ms
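The Kafka consumer configuration reference notes that `max.poll.records` does not change how much data is fetched: the consumer buffers the records from each fetch and hands them out incrementally across `poll()` calls. The hypothetical `drain_in_polls` helper below models only that slicing step.

```python
from collections import deque
from typing import Deque, List


def drain_in_polls(fetched: List[str], max_poll_records: int = 500) -> List[List[str]]:
    """Split one fetch response's records into successive poll() results.

    max.poll.records caps what each poll() returns from the buffer;
    the fetched data itself is unchanged.
    """
    if max_poll_records < 1:
        raise ValueError("max_poll_records must be >= 1")
    buffer: Deque[str] = deque(fetched)
    polls: List[List[str]] = []
    while buffer:
        polls.append([buffer.popleft() for _ in range(min(max_poll_records, len(buffer)))])
    return polls
```

With the default of 500, a buffer of 1,200 records is returned over three polls of 500, 500, and 200 records.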

fetch.min.bytes

Higher values can improve throughput by reducing the number of fetch requests
May increase latency, because the broker holds the request until enough data is available or fetch.max.wait.ms elapses
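The interaction between `fetch.min.bytes` and `fetch.max.wait.ms` can be modeled as "answer when enough data is available or when the wait runs out, whichever comes first". This is a simplified sketch: `fetch_response_time_ms` is hypothetical, and it ignores other limits such as `fetch.max.bytes` and `max.partition.fetch.bytes`.

```python
from typing import List, Tuple


def fetch_response_time_ms(
    arrivals: List[Tuple[int, int]],
    fetch_min_bytes: int = 1,
    fetch_max_wait_ms: int = 500,
) -> int:
    """Return how many ms after the request a simplified broker answers a fetch.

    arrivals: (time_ms, num_bytes) pairs for data that becomes available,
    sorted by time. The broker answers once at least fetch_min_bytes are
    available, or at fetch_max_wait_ms, whichever comes first.
    """
    available = 0
    for time_ms, num_bytes in arrivals:
        if time_ms >= fetch_max_wait_ms:
            break  # the wait expired before this data arrived
        available += num_bytes
        if available >= fetch_min_bytes:
            return time_ms
    return fetch_max_wait_ms
```

With `fetch.min.bytes=1` the simulated broker answers as soon as the first bytes arrive; with `fetch.min.bytes=1048576` it waits until 1 MB has accumulated or the wait expires.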

Session and heartbeat settings

Setting	Default (Kafka 3.0+)	Description
`session.timeout.ms`	45 seconds	Timeout for detecting consumer failures
`heartbeat.interval.ms`	3 seconds	Expected time between heartbeats
`max.poll.interval.ms`	5 minutes	Maximum delay between `poll()` calls

Heartbeat interval rule
Set heartbeat.interval.ms lower than session.timeout.ms, and typically no higher than 1/3 of it. This ensures the consumer sends several heartbeats within each session timeout, so a single delayed or lost heartbeat does not expire the session.
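The rule can be checked mechanically. A minimal sketch follows; `check_heartbeat_settings` is a hypothetical helper that applies the two conditions above: the heartbeat interval strictly below the session timeout, and ideally at most a third of it.

```python
from typing import List


def check_heartbeat_settings(heartbeat_interval_ms: int, session_timeout_ms: int) -> List[str]:
    """Return problems with a heartbeat/session pair, or an empty list if none."""
    problems: List[str] = []
    if heartbeat_interval_ms >= session_timeout_ms:
        problems.append("heartbeat.interval.ms must be lower than session.timeout.ms")
    elif heartbeat_interval_ms * 3 > session_timeout_ms:
        problems.append("heartbeat.interval.ms is above 1/3 of session.timeout.ms")
    return problems
```

The Kafka 3.0+ defaults (3 s heartbeat, 45 s session) pass both checks, and so do the low-latency and long-processing profiles below.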

Performance tuning guidelines

For high throughput

# Maximize batch sizes and reduce overhead
max.poll.records=1000
# 1 MB
fetch.min.bytes=1048576
fetch.max.wait.ms=1000

# 5 minutes (the default); raise it if a 1000-record batch can take longer to process
max.poll.interval.ms=300000

For low latency

# Return data quickly, small batches
max.poll.records=100
fetch.min.bytes=1
fetch.max.wait.ms=100

# Shorter session timeout to detect failed consumers sooner
# 10 seconds
session.timeout.ms=10000
# 3 seconds
heartbeat.interval.ms=3000

For long processing times

# Accommodate slow processing without triggering a rebalance
# 10 minutes
max.poll.interval.ms=600000
# 1 minute
session.timeout.ms=60000
# 20 seconds (1/3 of session.timeout.ms)
heartbeat.interval.ms=20000

# Smaller batches to stay within poll interval
max.poll.records=100
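To judge whether a profile fits your workload, divide the poll interval by the batch size to get a per-record time budget. The sketch below assumes processing time is roughly uniform per record; `max_ms_per_record` and `PROFILES` are hypothetical names, and the low-latency profile inherits the 5-minute `max.poll.interval.ms` default because it does not set it.

```python
from typing import Dict, Tuple

# (max.poll.interval.ms, max.poll.records) for the three profiles above.
PROFILES: Dict[str, Tuple[int, int]] = {
    "high throughput": (300_000, 1_000),
    "low latency": (300_000, 100),  # max.poll.interval.ms not set: 5-minute default
    "long processing": (600_000, 100),
}


def max_ms_per_record(max_poll_interval_ms: int, max_poll_records: int, safety_factor: float = 1.0) -> float:
    """Per-record time budget so a full batch finishes before the next poll() is due.

    Assumes roughly uniform processing time per record. A safety_factor below 1.0
    leaves headroom; for example, 0.5 budgets only half of the poll interval.
    """
    if max_poll_records < 1:
        raise ValueError("max_poll_records must be >= 1")
    return max_poll_interval_ms * safety_factor / max_poll_records


if __name__ == "__main__":
    for name, (interval_ms, records) in PROFILES.items():
        print(f"{name}: up to {max_ms_per_record(interval_ms, records):.0f} ms per record")
```

Under these assumptions the high-throughput profile allows about 300 ms per record, the low-latency profile 3,000 ms, and the long-processing profile 6,000 ms; a `safety_factor` of 0.5 halves each budget.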

Decision tree

Figure: tuning decision tree. Choose a configuration based on your main priority, then monitor consumer lag to confirm it works:

Throughput priority: high-throughput config (max.poll.records=1000, fetch.min.bytes=1MB)
Latency priority: low-latency config (max.poll.records=100, fetch.min.bytes=1)
Long processing time: long-processing config (max.poll.interval.ms=10min, max.poll.records=100)

Best practices

Tune max.poll.records based on your processing time per record
Set max.poll.interval.ms higher than the worst-case time to process one full poll() batch
Monitor consumer lag to ensure settings are appropriate
Test rebalance behavior under your expected load conditions
Consider batch processing patterns when setting poll configurations
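Lag is the gap between the newest offset in each partition and the offset the group has committed. A minimal sketch, assuming you have already collected both maps (for example via the Java client's `KafkaConsumer.endOffsets()` and `committed()`); `consumer_lag` and the `orders` topic are hypothetical.

```python
from typing import Dict, Tuple

TopicPartition = Tuple[str, int]  # (topic, partition number)


def consumer_lag(
    end_offsets: Dict[TopicPartition, int],
    committed: Dict[TopicPartition, int],
) -> Dict[TopicPartition, int]:
    """Lag per partition: log-end offset minus the group's committed offset.

    Partitions with no committed offset are skipped, because their starting
    point depends on auto.offset.reset rather than on a commit.
    """
    return {tp: end_offsets[tp] - committed[tp] for tp in end_offsets if tp in committed}
```

A steadily growing total lag after a configuration change suggests the consumer can no longer keep up with the new settings.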

Avoid blocking in poll loop
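Never perform long-running operations in the thread that calls poll(). Doing so can exceed max.poll.interval.ms, triggering unnecessary rebalances and degrading performance. Hand heavy processing to separate worker threads; because KafkaConsumer is not thread-safe, keep poll(), commits, and other consumer calls on the polling thread.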
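The offloading pattern can be sketched with the standard library. This is a simulation under stated assumptions: `poll_loop_with_worker` and `process_batch` are hypothetical, and `time.sleep` stands in for a `poll()` call on paused partitions. With a real `KafkaConsumer`, you would `pause()` the assigned partitions while the worker runs, keep calling `poll()` so the loop stays within `max.poll.interval.ms`, then `resume()` and commit from the polling thread.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple


def poll_loop_with_worker(
    batches: List[List[int]],
    process_batch: Callable[[List[int]], int],
    poll_timeout_s: float = 0.01,
) -> Tuple[List[int], List[float]]:
    """Process each batch on a worker thread while the polling thread keeps 'polling'.

    Returns (results, gaps): the worker results and the seconds between
    consecutive simulated poll() calls.
    """
    results: List[int] = []
    poll_times: List[float] = []
    with ThreadPoolExecutor(max_workers=1) as worker:
        for batch in batches:
            poll_times.append(time.monotonic())  # the poll() that returned `batch`
            future = worker.submit(process_batch, batch)
            while not future.done():
                # Simulated poll() on paused partitions: no records, member stays alive.
                time.sleep(poll_timeout_s)
                poll_times.append(time.monotonic())
            results.append(future.result())
    gaps = [later - earlier for earlier, later in zip(poll_times, poll_times[1:])]
    return results, gaps
```

The returned gaps stay close to `poll_timeout_s` even when each batch takes much longer to process, which is the property that keeps the consumer in the group.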

Common anti-patterns

Anti-pattern	Problem	Solution
Blocking in poll loop	Exceeds `max.poll.interval.ms` and triggers rebalances	Offload processing to worker threads
`max.poll.interval.ms` too low	Constant rebalances	Increase it, or reduce `max.poll.records`
Ignoring heartbeat settings	Slow failure detection	Tune for your SLAs
Same config for all consumers	Suboptimal performance	Tune per use case

See it in practice with Conduktor
Conduktor Console displays real-time consumer lag and rebalance events. Monitor how your configuration changes affect consumer performance and identify optimal settings for your workload.

Next steps

Configure auto offset reset for new consumers
Understand delivery semantics for reliable processing
Implement incremental rebalancing to reduce disruption