Last mod: 2026.03.05

Kafka on Kubernetes

Installing Apache Kafka 4.0 on Kubernetes with KRaft, Strimzi, Prometheus, Grafana and OTel.

Prerequisites

A Kubernetes cluster configured according to the instructions described here is required. In this example the hosts are 192.168.3.22 (msi), 192.168.3.23 (hp), and 192.168.3.24 (x510). Other hosts can be used, but the scripts must then be adjusted accordingly.
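
A quick sanity check that all three nodes are registered and Ready:

kubectl get nodes -o wide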

Download

Download the kafka-k8s.tar.gz package and unpack it:

wget https://dziak.tech/content/DevOps/kafka_on_kubernetes/downloads/kafka-k8s.tar.gz
tar -xzvf kafka-k8s.tar.gz

Cluster topology

Role            Name   IP             CPU   RAM     Disk
Worker node     msi    192.168.3.22   i7    32 GB   SSD 250 GB
Worker node     hp     192.168.3.23   i5    32 GB   SSD 250 GB
Control plane   x510   192.168.3.24   i5    8 GB    SSD 120 GB + HDD 900 GB (NFS)


Shared NFS directory: /mnt/nfs-k8s exported by x510.


Node preparation (each step below names its target nodes)

1. NFS Configuration on the control-plane (x510)

# On x510 – NFS server installation
sudo apt install -y nfs-kernel-server

# Add export
echo "/mnt/nfs-k8s  192.168.3.0/24(rw,sync,no_subtree_check,no_root_squash)" \
  | sudo tee -a /etc/exports

sudo mkdir -p /mnt/nfs-k8s
sudo chmod 777 /mnt/nfs-k8s
sudo exportfs -ra
sudo systemctl enable --now nfs-kernel-server
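
To confirm the export is live, list it on the server (showmount ships with the NFS utilities on Debian/Ubuntu):

sudo exportfs -v
showmount -e localhost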

2. NFS client on worker nodes (hp, msi)

# On hp and msi
sudo apt install -y nfs-common
sudo mkdir -p /mnt/nfs-k8s

# Mount test (optional)
sudo mount -t nfs 192.168.3.24:/mnt/nfs-k8s /mnt/nfs-k8s
df -h /mnt/nfs-k8s
sudo umount /mnt/nfs-k8s
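
If the test mount fails, check from the worker that the server is reachable and actually exporting the directory:

showmount -e 192.168.3.24
rpcinfo -p 192.168.3.24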

3. Node labels (on the control-plane, with kubectl)

The kubelet normally sets kubernetes.io/hostname automatically; this step only makes sure the labels are present.

kubectl label node x510 kubernetes.io/hostname=x510
kubectl label node hp   kubernetes.io/hostname=hp
kubectl label node msi  kubernetes.io/hostname=msi
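
Verify the labels:

kubectl get nodes -L kubernetes.io/hostname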

4. Control-plane taint (optional; kubeadm sets it by default)

Strimzi and the other workloads are scheduled on hp and msi. The Kafka controller may run on x510; if the control-plane carries a NoSchedule taint, either remove it temporarily for the installation or constrain the controller to the worker nodes.

# Check taints
kubectl describe node x510 | grep Taint

# Remove the taint if needed (to allow the Kafka controller on x510):
kubectl taint nodes x510 node-role.kubernetes.io/control-plane:NoSchedule-
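
Once the installation is complete, the taint can be restored; a NoSchedule taint only affects new scheduling, so the already-running controller pod stays where it is:

kubectl taint nodes x510 node-role.kubernetes.io/control-plane:NoSchedule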

Component installation

Step 1 – Namespaces
kubectl apply -f 00-namespaces/namespaces.yaml
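
For reference, a minimal sketch of what 00-namespaces/namespaces.yaml presumably contains: the three namespaces used throughout this guide.

apiVersion: v1
kind: Namespace
metadata:
  name: kafka
---
apiVersion: v1
kind: Namespace
metadata:
  name: monitoring
---
apiVersion: v1
kind: Namespace
metadata:
  name: otel
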
Step 2 – NFS StorageClass and Provisioner
kubectl apply -f 01-nfs/nfs-provisioner.yaml

# Verification
kubectl get pods -n kube-system | grep nfs
kubectl get storageclass
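
A quick end-to-end check of dynamic provisioning: create a throwaway PVC and watch it bind. The StorageClass name nfs-client is an assumption; substitute whatever kubectl get storageclass reported.

cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nfs-test
  namespace: default
spec:
  accessModes: ["ReadWriteMany"]
  storageClassName: nfs-client   # assumption; use the class listed above
  resources:
    requests:
      storage: 100Mi
EOF

kubectl get pvc nfs-test          # should reach Bound within seconds
kubectl delete pvc nfs-test
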
Step 3 – Strimzi Operator

Download and install the Strimzi Cluster Operator (0.46.x or newer is required for Kafka 4.x; 0.41.x only supports Kafka 3.x):

STRIMZI_VER="0.46.0"
kubectl create -f \
  "https://github.com/strimzi/strimzi-kafka-operator/releases/download/${STRIMZI_VER}/strimzi-cluster-operator-${STRIMZI_VER}.yaml" \
  -n kafka

# or, without pinning a version (installs the latest release):
kubectl apply -f "https://strimzi.io/install/latest?namespace=kafka" -n kafka

# Wait for readiness
kubectl rollout status deployment/strimzi-cluster-operator -n kafka --timeout=180s
kubectl get pods -n kafka
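
The operator also registers the Strimzi CRDs; confirm they exist before creating any Kafka resources:

kubectl get crd | grep strimzi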
Step 4 – Kafka 4.x with KRaft (ZooKeeper-less)
# ConfigMap with JMX Metrics
kubectl apply -f 03-kafka/kafka-metrics-configmap.yaml

# Kafka Cluster (KafkaNodePool + Kafka CR)
kubectl apply -f 03-kafka/kafka-cluster.yaml

# Monitor status – this may take 3–5 minutes
kubectl get kafka -n kafka -w
kubectl get kafkanodepool -n kafka
kubectl get pods -n kafka -w

# Ready when:
kubectl wait kafka/kafka-cluster --for=condition=Ready --timeout=600s -n kafka
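
For reference, a minimal sketch of what 03-kafka/kafka-cluster.yaml plausibly contains. Pool names and replica counts follow the topology above (one controller, two brokers); storage sizes and listener details are assumptions, so treat the manifest from the tarball as authoritative.

apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaNodePool
metadata:
  name: controller
  namespace: kafka
  labels:
    strimzi.io/cluster: kafka-cluster
spec:
  replicas: 1
  roles: [controller]
  storage:
    type: persistent-claim
    size: 10Gi
    deleteClaim: false
---
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaNodePool
metadata:
  name: broker
  namespace: kafka
  labels:
    strimzi.io/cluster: kafka-cluster
spec:
  replicas: 2
  roles: [broker]
  storage:
    type: persistent-claim
    size: 50Gi
    deleteClaim: false
---
apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  name: kafka-cluster
  namespace: kafka
  annotations:
    strimzi.io/kraft: enabled        # ZooKeeper-less mode
    strimzi.io/node-pools: enabled   # nodes come from the pools above
spec:
  kafka:
    version: 4.0.0
    listeners:
      - name: plain
        port: 9092
        type: internal
        tls: false
      - name: external
        port: 9094
        type: nodeport
        tls: false
    metricsConfig:                   # wires in the JMX exporter ConfigMap
      type: jmxPrometheusExporter
      valueFrom:
        configMapKeyRef:
          name: kafka-metrics
          key: kafka-metrics-config.yml
  entityOperator:
    topicOperator: {}
    userOperator: {}
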
Step 5 – Kafka Topics
kubectl apply -f 03-kafka/kafka-topics.yaml
kubectl get kafkatopic -n kafka
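
The bundled kafka-topics.yaml is not shown here, but a KafkaTopic CR for the test-topic used later follows this shape (partition count and retention are assumptions; replicas: 2 matches the two brokers):

apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaTopic
metadata:
  name: test-topic
  namespace: kafka
  labels:
    strimzi.io/cluster: kafka-cluster   # ties the topic to the cluster CR
spec:
  partitions: 3
  replicas: 2
  config:
    retention.ms: "604800000"   # 7 days
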
Step 6 – Prometheus
kubectl apply -f 04-prometheus/prometheus.yaml

# Verification
kubectl rollout status deployment/prometheus -n monitoring
kubectl get svc prometheus -n monitoring
# Access: http://192.168.3.22:30090

Optional (requires the Prometheus Operator / kube-prometheus-stack, since PodMonitor is one of its CRDs):

kubectl apply -f 04-prometheus/strimzi-podmonitor.yaml
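
If that file is missing, a sketch of a PodMonitor that scrapes the Strimzi metrics port, following the selectors used in the Strimzi examples:

apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: kafka-resources-metrics
  namespace: monitoring
spec:
  selector:
    matchExpressions:
      - key: strimzi.io/kind
        operator: In
        values: [Kafka, KafkaConnect, KafkaMirrorMaker2]
  namespaceSelector:
    matchNames: [kafka]
  podMetricsEndpoints:
    - path: /metrics
      port: tcp-prometheus
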
Step 7 – Grafana
kubectl apply -f 05-grafana/grafana.yaml
kubectl rollout status deployment/grafana -n monitoring
# Access: http://192.168.3.22:30300
# Login: admin / kafka-admin-2024

Import Kafka dashboards for Strimzi:

In Grafana → Dashboards → Import → paste the ID or URL:

  • Kafka Overview: https://raw.githubusercontent.com/strimzi/strimzi-kafka-operator/main/examples/metrics/grafana-dashboards/strimzi-kafka.json
  • KRaft quorum: https://raw.githubusercontent.com/strimzi/strimzi-kafka-operator/main/examples/metrics/grafana-dashboards/strimzi-kraft.json
  • Kafka Operator: https://raw.githubusercontent.com/strimzi/strimzi-kafka-operator/main/examples/metrics/grafana-dashboards/strimzi-operators.json

Or via kubectl:

# Download Strimzi dashboards
curl -sL \
  https://raw.githubusercontent.com/strimzi/strimzi-kafka-operator/main/examples/metrics/grafana-dashboards/strimzi-kafka.json \
  -o /tmp/strimzi-kafka.json

kubectl create configmap grafana-kafka-dashboard \
  --from-file=strimzi-kafka.json=/tmp/strimzi-kafka.json \
  -n monitoring
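
How Grafana picks up this ConfigMap depends on how grafana.yaml is built. If it runs the common dashboard-provisioning sidecar, the ConfigMap must also carry the label the sidecar watches for (grafana_dashboard is the sidecar's default label name and an assumption here):

kubectl label configmap grafana-kafka-dashboard grafana_dashboard=1 -n monitoring
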
Step 8 – OpenTelemetry collector
kubectl apply -f 06-otel/otel-collector.yaml
kubectl rollout status deployment/otel-collector -n otel

# Verification
kubectl get pods -n otel
kubectl logs -n otel -l app=otel-collector --tail=20
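
For reference, the core of the collector pipeline as it plausibly appears inside 06-otel/otel-collector.yaml; the topic name follows the Kafka exporter's default for traces, and the protocol version is an assumption:

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318
exporters:
  kafka:
    brokers: ["kafka-cluster-kafka-bootstrap.kafka:9092"]
    topic: otlp_spans         # exporter default for traces
    protocol_version: 2.0.0   # assumption
service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [kafka]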

Post-installation verification

Status of all components

kubectl get pods -n kafka
kubectl get pods -n monitoring
kubectl get pods -n otel
kubectl get pvc -A

Kafka test

# Producer (send a test message)
kubectl run kafka-producer -it --rm \
  --image=quay.io/strimzi/kafka:latest-kafka-4.0.0 \
  --restart=Never \
  -n kafka \
  -- bin/kafka-console-producer.sh \
     --bootstrap-server kafka-cluster-kafka-bootstrap:9092 \
     --topic test-topic

# Consumer (receive message)
kubectl run kafka-consumer -it --rm \
  --image=quay.io/strimzi/kafka:latest-kafka-4.0.0 \
  --restart=Never \
  -n kafka \
  -- bin/kafka-console-consumer.sh \
     --bootstrap-server kafka-cluster-kafka-bootstrap:9092 \
     --topic test-topic \
     --from-beginning

# Information about the KRaft Cluster
kubectl run kafka-admin -it --rm \
  --image=quay.io/strimzi/kafka:latest-kafka-4.0.0 \
  --restart=Never \
  -n kafka \
  -- bin/kafka-metadata-quorum.sh \
     --bootstrap-server kafka-cluster-kafka-bootstrap:9092 \
     describe --status
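
The same throwaway-pod pattern works for any Kafka CLI tool; for example, inspecting the test topic's partition leaders and ISR:

kubectl run kafka-topics -it --rm \
  --image=quay.io/strimzi/kafka:latest-kafka-4.0.0 \
  --restart=Never \
  -n kafka \
  -- bin/kafka-topics.sh \
     --bootstrap-server kafka-cluster-kafka-bootstrap:9092 \
     --describe --topic test-topic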

OTel test – sending a sample trace

# From any node in the 192.168.3.x network
curl -X POST http://192.168.3.22:30318/v1/traces \
  -H "Content-Type: application/json" \
  -d '{
    "resourceSpans": [{
      "resource": {"attributes": [{"key":"service.name","value":{"stringValue":"test-service"}}]},
      "scopeSpans": [{
        "spans": [{
          "traceId":"0102030405060708090a0b0c0d0e0f10",
          "spanId":"0102030405060708",
          "name":"test-span",
          "startTimeUnixNano":"1700000000000000000",
          "endTimeUnixNano":"1700000001000000000"
        }]
      }]
    }]
  }'
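
To confirm the trace actually reached Kafka, consume one message from the topic the collector exports to (otlp_spans is the exporter's default topic name and an assumption about this setup; the payload is binary protobuf, so expect unprintable characters):

kubectl run otlp-consumer -it --rm \
  --image=quay.io/strimzi/kafka:latest-kafka-4.0.0 \
  --restart=Never \
  -n kafka \
  -- bin/kafka-console-consumer.sh \
     --bootstrap-server kafka-cluster-kafka-bootstrap:9092 \
     --topic otlp_spans \
     --from-beginning --max-messages 1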

Service addresses

Service          Address                                       Type
Prometheus       http://192.168.3.22:30090                     NodePort
Grafana          http://192.168.3.22:30300                     NodePort (admin / kafka-admin-2024)
OTel gRPC        192.168.3.22:30317                            NodePort
OTel HTTP        192.168.3.22:30318                            NodePort
Kafka external   192.168.3.22:32100                            NodePort
Kafka internal   kafka-cluster-kafka-bootstrap.kafka:9092      ClusterIP


Data architecture / workflow


Applications/Services
       │
       ▼ OTLP (gRPC :4317 / HTTP :4318)
┌──────────────────┐
│ OTel Collector   │  (2 replicas – hp and msi)
│ namespace: otel  │
└──────┬───────────┘
       │ Kafka Producer
       ▼
┌────────────────────┐     ┌─────────────────────┐
│  Kafka 4.x KRaft   │◄────│  Strimzi Operator   │
│  namespace: kafka  │     │   (manages CR)      │
│  - controller(x510)│     └─────────────────────┘
│  - broker(hp)      │
│  - broker(msi)     │
└──────┬─────────────┘
       │ JMX /metrics :9404
       ▼
┌──────────────────┐     ┌──────────────────────┐
│   Prometheus     │────►│     Grafana          │
│   namespace:     │     │   Dashboards Kafka   │
│   monitoring     │     │   + OTel metrics     │
└──────────────────┘     └──────────────────────┘

Troubleshooting

Kafka does not start

kubectl describe kafka kafka-cluster -n kafka
kubectl describe kafkanodepool broker -n kafka
kubectl logs -n kafka -l strimzi.io/kind=Kafka --tail=50
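
The Cluster Operator log usually pinpoints why a Kafka CR is stuck reconciling:

kubectl logs deployment/strimzi-cluster-operator -n kafka --tail=50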

PVC in Pending state

kubectl get pvc -A
kubectl describe pvc <name> -n kafka
# Check if the NFS provisioner is working:
kubectl get pods -n kube-system | grep nfs
kubectl logs -n kube-system -l app=nfs-client-provisioner

No metrics in Prometheus

# Check if port 9404 is accessible on the Kafka pods
kubectl get svc -n kafka
kubectl exec -n monitoring -it deploy/prometheus -- \
  wget -qO- http://kafka-cluster-kafka-brokers.kafka.svc:9404/metrics | head -20

Full system reset

kubectl delete kafka kafka-cluster -n kafka
kubectl delete kafkanodepool controller broker -n kafka
kubectl delete pvc --all -n kafka
kubectl delete pvc --all -n monitoring