I am running Apache Kafka on Kubernetes using minikube, and I also have a Pod with a Confluent consumer written in Python. I am monitoring Kafka broker metrics with a JMX exporter and consumer metrics with the kminion consumer exporter. These exporters run as 2 separate pods. Lastly, I have Prometheus scraping both of these exporters and reading the metrics.
I am producing 2 messages per second to a certain topic. My consumer consumes a message and then runs a task. The task needs 0.4 seconds to complete, so I am also consuming at a rate of 2 messages per second.
My hypothesis is that the queue lag metric should be either 0 or 2 at all times, since I am producing and consuming at the same rate. I sample the queue lag every second, and this is what I get over a period of 5 seconds:
t = 0: Queue is 0.
t = 1: Queue is 3.
t = 2: Queue is 5.
t = 3: Queue is 7.
t = 4: Queue is 9.
t = 5: Queue is 0.
And it repeats the same cycle, so the avg_over_time of the queue lag is about 5. Why is this happening? I know the consumer cannot consume 9 messages at once, since it runs a task that takes 0.3 seconds to complete and therefore my maximum consume rate is 2 messages per second.
I have also tried using a different exporter for consumer metrics but I still get the same results.
When is your consumer committing the offsets?
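If you are not manually committing offsets after processing each message, then by default the consumer auto-commits them every 5 seconds: https://kafka.apache.org/documentation/#consumerconfigs_auto.commit.interval.ms
That would explain the queue values you see. Exporters such as kminion compute lag from the group's last committed offset, not from what the consumer has actually processed, so the reported lag keeps growing between commits and drops back to 0 each time the auto-commit fires.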
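
To make the 5-second cycle concrete, here is a minimal sketch in plain Python. It is a toy model, not Kafka code: it assumes lag is reported as the log end offset minus the last committed offset, and that the consumer keeps up with the producer perfectly. The names `reported_lag`, `produce_rate` and `commit_interval` are made up for this illustration.

```python
def reported_lag(t, produce_rate=2, commit_interval=5):
    """Lag an exporter would report at second t in this toy model.

    commit_interval=None models committing after every message.
    """
    # Messages written to the topic so far.
    log_end_offset = produce_rate * t
    # Assumption: the consumer keeps up, so its position equals log_end_offset.
    if commit_interval is None:
        committed = log_end_offset
    else:
        # Offsets are only committed at multiples of commit_interval.
        last_commit_time = (t // commit_interval) * commit_interval
        committed = produce_rate * last_commit_time
    return log_end_offset - committed


auto = [reported_lag(t) for t in range(11)]
manual = [reported_lag(t, commit_interval=None) for t in range(11)]

print("auto-commit every 5 s:", auto)
print("commit per message:   ", manual)
```

With a 5-second commit interval the model climbs 0, 2, 4, 6, 8 and then falls back to 0, which is the same sawtooth shape as in the question (the exact values there differ slightly, presumably because scrape and commit times are not aligned). With a commit after every message it stays at 0. In confluent-kafka for Python, the per-message variant would roughly mean setting `enable.auto.commit` to `False` and calling `consumer.commit(message=msg, asynchronous=False)` once the task has finished, at the cost of one synchronous commit per message.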