Strimzi on Kubernetes: From Zero to Production Kafka
Deploy production-grade Kafka on Kubernetes with Strimzi. Separate node pools, KRaft mode, and the configuration choices that matter.

Running Kafka on Kubernetes used to be a bad idea. Stateful workloads and container orchestration didn't mix well. Strimzi changed that.
I've deployed Strimzi clusters across AWS, GCP, and on-prem environments. The operator handles rolling upgrades, certificate rotation, and rack awareness automatically. You declare what you want, and Strimzi makes it happen.
"We moved to Strimzi because we wanted Kafka to be as declarative as the rest of our infrastructure. GitOps for Kafka wasn't possible before."
(Platform Engineer at a Fortune 500 retailer)
Install the Operator
kubectl create namespace kafka
helm repo add strimzi https://strimzi.io/charts/
helm install strimzi strimzi/strimzi-kafka-operator \
  --namespace kafka --set replicas=2
Setting replicas=2 runs two operator pods, which gives you operator high availability.
The Two Resources That Matter
Strimzi 0.46+ runs Kafka in KRaft mode. No ZooKeeper. Two resources define your cluster:
| Resource | Purpose |
|---|---|
| Kafka | Cluster-wide config: listeners, security, entity operator |
| KafkaNodePool | Node groups: replicas, storage, roles, resources |
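For orientation, here is a minimal sketch of what the Kafka resource might look like alongside the node pools. The cluster name prod-cluster and the kafka namespace come from elsewhere in this article. The internal listener, the replication settings, and the choice of simple authorization are illustrative assumptions, not a recommended baseline. The apiVersion mirrors the node pool examples, so adjust it if your Strimzi release serves a different version.

```yaml
# Sketch only: cluster-wide settings live here, node-level settings live in KafkaNodePool.
apiVersion: kafka.strimzi.io/v1
kind: Kafka
metadata:
  name: prod-cluster          # must match the strimzi.io/cluster label on each node pool
  namespace: kafka
spec:
  kafka:
    listeners:
      - name: tls             # assumed internal listener for in-cluster clients
        port: 9093
        type: internal
        tls: true
    authorization:
      type: simple            # assumed; needed for KafkaUser ACLs to be enforced
    config:
      default.replication.factor: 3   # illustrative values for a 3-broker pool
      min.insync.replicas: 2
  entityOperator:
    topicOperator: {}         # enables KafkaTopic management
    userOperator: {}          # enables KafkaUser management
```

Replicas, storage, and resource requests are deliberately absent from this resource because they belong to the node pools.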
Production Configuration
apiVersion: kafka.strimzi.io/v1
kind: KafkaNodePool
metadata:
  name: controllers
  labels:
    strimzi.io/cluster: prod-cluster
spec:
  replicas: 3
  roles: [controller]
  storage:
    type: jbod
    volumes:
      - id: 0
        type: persistent-claim
        size: 10Gi
        class: fast-ssd
  resources:
    requests: { memory: 2Gi, cpu: "1" }
---
apiVersion: kafka.strimzi.io/v1
kind: KafkaNodePool
metadata:
  name: brokers
  labels:
    strimzi.io/cluster: prod-cluster
spec:
  replicas: 3
  roles: [broker]
  storage:
    type: jbod
    volumes:
      - id: 0
        type: persistent-claim
        size: 500Gi
        class: fast-ssd
  resources:
    requests: { memory: 8Gi, cpu: "2" }
  jvmOptions:
    "-Xms": 4096m
    "-Xmx": 4096m
JVM heap rule: Set heap to 25-50% of container memory. Kafka relies on OS page cache.
Storage: The Critical Decision
Kafka requires low-latency block storage. NFS and EFS are not recommended due to high latency and potential consistency issues under load.
| Cloud | Storage Class |
|---|---|
| AWS | gp3, io2 |
| GCP | pd-ssd |
| Azure | managed-premium |
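As a sketch, a broker storage stanza that uses one of these classes could look like the following. It assumes a StorageClass named gp3 already exists in your cluster; class names are cluster-specific, so substitute your own. The size is carried over from the broker pool example.

```yaml
# Fragment of a KafkaNodePool spec (sketch, not a complete resource).
storage:
  type: jbod
  volumes:
    - id: 0
      type: persistent-claim
      size: 500Gi
      class: gp3            # assumed StorageClass name; check `kubectl get storageclass`
      deleteClaim: false    # keep the PersistentVolumeClaim when the cluster resource is deleted
```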
Set deleteClaim: false on every persistent volume. You don't want kubectl delete kafka to wipe your data.
External Access
listeners:
  - name: external
    port: 9094
    type: loadbalancer
    tls: true
    authentication:
      type: scram-sha-512
Cost note: A 3-broker cluster creates 4 load balancers (1 bootstrap plus 1 per broker). That's roughly $60/month on AWS. Use NodePort for cost-sensitive environments.
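For the cost-sensitive case, a NodePort variant of the same listener might look like this. It is a sketch that only swaps the listener type. It assumes clients can reach the Kubernetes node addresses directly, which is often not true outside a private network.

```yaml
# Fragment of spec.kafka in the Kafka resource (sketch).
listeners:
  - name: external
    port: 9094
    type: nodeport          # exposes node ports instead of provisioning cloud load balancers
    tls: true
    authentication:
      type: scram-sha-512
```

The trade-off is that clients connect to node IPs and ports, so firewall rules and node churn become your problem instead of the cloud provider's.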
User and Topic Management
The Entity Operator manages topics and users as Kubernetes resources:
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaUser
metadata:
  name: app-producer
  labels:
    strimzi.io/cluster: prod-cluster
spec:
  authentication:
    type: scram-sha-512
  authorization:
    type: simple
    acls:
      - resource: { type: topic, name: events, patternType: literal }
        operations: [Write, Describe]
Get the password:
kubectl get secret app-producer -n kafka -o jsonpath='{.data.password}' | base64 -d
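Topics follow the same pattern. The sketch below declares the events topic that the ACL above refers to. The partition count, replication factor, and retention are illustrative assumptions, not values from a real workload.

```yaml
# Sketch of a KafkaTopic managed by the Entity Operator.
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaTopic
metadata:
  name: events
  namespace: kafka
  labels:
    strimzi.io/cluster: prod-cluster   # must match the Kafka resource name
spec:
  partitions: 12            # assumed; size to your consumer parallelism
  replicas: 3               # one replica per broker in the 3-broker pool
  config:
    min.insync.replicas: 2
    retention.ms: 604800000 # 7 days in milliseconds
```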
Monitoring
Key metrics to alert on:
| Metric | Critical Threshold |
|---|---|
| kafka_server_replicamanager_underreplicatedpartitions | > 0 for 5 minutes |
| kafka_controller_kafkacontroller_offlinepartitionscount | > 0 |
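If you run the Prometheus Operator, these thresholds could be expressed as a PrometheusRule along the following lines. This is a sketch: it assumes the Prometheus Operator CRDs are installed and that your JMX exporter configuration produces exactly these metric names. The rule name, alert names, and severity labels are hypothetical.

```yaml
# Sketch of alert rules for the two metrics in the table above.
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: kafka-alerts        # hypothetical name
  namespace: kafka
spec:
  groups:
    - name: kafka
      rules:
        - alert: KafkaUnderReplicatedPartitions
          expr: sum(kafka_server_replicamanager_underreplicatedpartitions) > 0
          for: 5m           # fire only if the condition holds for 5 minutes
          labels:
            severity: critical
        - alert: KafkaOfflinePartitions
          expr: sum(kafka_controller_kafkacontroller_offlinepartitionscount) > 0
          labels:
            severity: critical   # no `for` clause: fire immediately
```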
Upgrades
Kafka version upgrades are one YAML change:
spec:
  kafka:
    version: 3.9.0 # Changed from 3.8.0
The operator handles rolling upgrades: brokers first, then follower controllers, then the active controller last. Upgrade the Strimzi operator first, then Kafka.
Pod Disruption Budget
Kubernetes cluster upgrades drain nodes, which can evict multiple brokers simultaneously. Set a Pod Disruption Budget in the Kafka resource under spec.kafka.template (the KafkaNodePool template has no podDisruptionBudget field):
spec:
  kafka:
    template:
      podDisruptionBudget:
        maxUnavailable: 1
This ensures node drains wait for one broker to restart before evicting the next.
Common Issues
Pods stuck in Pending: Storage class doesn't exist or can't provision. Check kubectl get pvc -n kafka.
Connection refused from external clients: Verify LoadBalancer provisioning with kubectl get svc -n kafka.
Topics not created: Check Entity Operator logs and verify strimzi.io/cluster label matches.
Strimzi makes Kafka deployment repeatable and version-controlled. The operational complexity of rolling upgrades, certificate rotation, and config changes is managed by software that's better at it than manual runbooks.