
Monitoring Kafka consumer lag


Kafka consumer lag is the offset difference between the last message produced and the last message consumed (committed) by a consumer group. If the rate of production is higher than the rate of consumption, the consumer group will exhibit lag. An efficient streaming application requires consumer lag to be minimal; a high consumer lag indicates performance problems with the consumers. Managed Kafka services like Confluent Kafka and Amazon MSK come with consumer lag monitoring built in, using Control Center and CloudWatch respectively. In this article we will look at a few more options for monitoring Kafka consumer lag.
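
For example, if the log end offset (the last produced message) on a partition is 1,050 and the consumer group's committed offset is 1,000, the lag for that partition is 1,050 - 1,000 = 50 messages.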

Prerequisites

  • A working Kafka cluster with some producers and consumers
  • A Linux machine with access to your Kafka cluster

Start Container

For the purpose of this demo we will use a Docker container called kafka-tools, which has kafkacat and the Kafka binaries installed.

  • Clone kafka-examples repo

    git clone https://github.com/entechlog/kafka-examples.git
    
  • cd into the kafka-examples/kafka-tools directory and start the container

    cd kafka-examples/kafka-tools
    docker-compose up -d --build
    

Validate Container

  • Validate the containers by running

    docker ps
    
  • Open a shell inside the container

    docker exec -it kafka-tools /bin/bash
    
  • Validate kafkacat and the connection to the Kafka cluster. This command lists all brokers in the cluster

    kafkacat -L -b 192.168.0.107:39092 | grep broker
    

    Make sure to change the broker address 192.168.0.107:39092 to your own Kafka broker in the next few commands that reference a Kafka broker.

  • Validate the Kafka binaries and the connection to the Kafka cluster. This command lists all brokers in the cluster

    cd /kafka/bin/
    ./kafka-broker-api-versions.sh --bootstrap-server 192.168.0.107:39092 | awk '/id/{print $1}'
    

Create Kafka Producer and Consumer

This is an optional step. If you don’t have a Kafka cluster, a Kafka producer, and a Kafka consumer, you can create them by running the commands below.

cd kafka-examples/kafka-postgres-superset
docker-compose up --remove-orphans -d --build

Monitor Kafka Consumer Lag

In the next few steps we will review several options for getting the Kafka consumer lag, starting with simple CLI tools and moving on to monitoring with Prometheus and Grafana.

Kafka Consumer Groups Script

Kafka ships with a script called kafka-consumer-groups.sh, which can be used to list all consumer groups, describe a consumer group, delete consumer group info, or reset consumer group offsets. Using this script is the simplest way to get the consumer lag of a consumer group.

  • Get the list of consumer groups and identify the consumer group for which you want to check the consumer lag

    ./kafka-consumer-groups.sh --bootstrap-server 192.168.0.107:39092 --list
    

    This script is located in the /kafka/bin directory of the kafka-tools container

  • Get the consumer offsets and lag using the consumer group name identified from the list command above

    ./kafka-consumer-groups.sh --bootstrap-server 192.168.0.107:39092 --group <consumer-group-name> --describe
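    # Illustrative output (hypothetical group, topic, and offsets); the LAG
    # column is LOG-END-OFFSET minus CURRENT-OFFSET for each partition:
    #
    # GROUP          TOPIC       PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG  CONSUMER-ID  HOST  CLIENT-ID
    # demo-consumer  demo-topic  0          1000            1050            50   ...          ...   ...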
    

This gives the lag information for a specific consumer group. However, it is still a manual and inefficient way to get the lag when you have more than a few consumer groups.

Simple Lag Checker Script

To reduce the manual effort of checking each consumer group, we will create a wrapper script simple-lag-checker.sh on top of kafka-consumer-groups.sh. This wrapper script will loop over all available consumer groups and capture the lag information in an HTML report.
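
To illustrate the idea, here is a minimal sketch of such a loop (a simplified illustration only; the actual simple-lag-checker.sh in the repo adds argument parsing and HTML/CSV reporting):

    #!/bin/bash
    # Minimal illustration: loop over every consumer group and describe it
    # with the stock kafka-consumer-groups.sh script.
    KAFKA_HOME=/kafka
    BOOTSTRAP_SERVERS=192.168.0.107:39092

    for group in $("$KAFKA_HOME"/bin/kafka-consumer-groups.sh \
                     --bootstrap-server "$BOOTSTRAP_SERVERS" --list); do
      echo "=== $group ==="
      "$KAFKA_HOME"/bin/kafka-consumer-groups.sh \
        --bootstrap-server "$BOOTSTRAP_SERVERS" --group "$group" --describe
    done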

  • Clone the repo kafka-utilities

    git clone https://github.com/entechlog/kafka-utilities.git
    
  • cd to the directory which contains the script

    cd kafka-utilities/simple-lag-checker
    
  • Here is the list of input variables accepted by simple-lag-checker.sh

    Input              Description
    kafka_home         Path in which Kafka is installed
    bootstrap_servers  Kafka host and port number in host:port format
    command_config     Full path to the command config file
    consumer_groups    Filter to fetch all consumer groups matching a pattern
  • Execute the script by running the command

    ./simple-lag-checker.sh \
    --kafka_home /kafka --bootstrap_servers 192.168.0.107:39092 \
    --command_config /home/ubuntu/kafka-utilities/simple-lag-checker/input/config/client.properties
    
  • Successful execution will create reports in HTML and CSV format inside the reports directory. We will copy the report from the Docker container to the local machine for review

    docker cp 59c182a8c325:/home/ubuntu/kafka-utilities/simple-lag-checker/reports/simple-lag-checker_2022_05_0105_08_23.html C:\Users\entechlog\Downloads
    
  • Here is how a report generated by this process looks

This is a slight improvement over the previous approach and helps to quickly get the Kafka consumer lag for all consumer groups. The disadvantage is that it is still a manual process.

Burrow

Burrow is a monitoring companion for Apache Kafka that provides consumer lag checking as a service without the need for specifying thresholds. It monitors committed offsets for all consumers and calculates the status of those consumers on demand. An HTTP endpoint is provided to request status on demand, as well as provide other Kafka cluster information. There are also configurable notifiers that can send status out via email or HTTP calls to another service.

  • cd to the kafka-consumer-lag directory. This will be our working directory for bringing up the Docker containers required for this demo

    cd kafka-examples\kafka-consumer-lag
    
  • Update kafka-consumer-lag\burrow\config\burrow.toml to add Kafka cluster information. See the wiki page for all supported configuration options
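
    As an illustration only, a minimal burrow.toml typically looks like the sketch below (hostnames, ports, the Kafka version, and the cluster name "local" are placeholders; verify the exact option names against the Burrow wiki):

    [zookeeper]
    servers=[ "zookeeper:2181" ]

    [client-profile.profile]
    client-id="burrow-demo"
    kafka-version="2.0.0"

    [cluster.local]
    class-name="kafka"
    client-profile="profile"
    servers=[ "192.168.0.107:39092" ]

    [consumer.local]
    class-name="kafka"
    cluster="local"
    client-profile="profile"
    servers=[ "192.168.0.107:39092" ]

    [httpserver.default]
    address=":8000"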

  • Start Burrow containers

    docker-compose -f docker-compose-burrow.yml up -d --build
    
  • Navigate to Burrow Dashboard to see the consumer details
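
Burrow's HTTP endpoint can also be queried directly. The commands below are illustrative; they assume Burrow's HTTP server listens on port 8000 and that the cluster is named local in burrow.toml, so adjust them to your configuration:

    # List clusters known to Burrow
    curl -s http://localhost:8000/v3/kafka
    # List consumer groups in the cluster named "local"
    curl -s http://localhost:8000/v3/kafka/local/consumer
    # Get the lag/status evaluation for a specific consumer group
    curl -s http://localhost:8000/v3/kafka/local/consumer/<consumer-group-name>/lag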

CMAK

CMAK (previously known as Kafka Manager) is a tool for managing Apache Kafka clusters. CMAK helps to manage Kafka clusters and inspect cluster state such as topics, consumers, offsets, brokers, replica distribution, and partition distribution. Though the primary function of CMAK is to manage Kafka clusters, it can also be used to visualize Kafka consumer lag and other important Kafka metrics.

  • Update kafka-consumer-lag\ymck\config\consumer.properties to add Kafka broker information

  • Update ZK_HOSTS in “docker-compose-ymck.yml” to add your ZooKeeper information. You can also comment out the “extra_hosts” section if you don’t need hostname mapping

  • Start CMAK containers

    docker-compose -f docker-compose-ymck.yml up -d --build
    
  • Navigate to CMAK Dashboard to add the Kafka cluster details and enable “Poll consumer information”

  • After adding the Kafka cluster details, navigate to the Consumers tab to view the Kafka consumer lag information

Kafka Exporter

Kafka Exporter is a Prometheus exporter that exposes broker, topic, and consumer group metrics. See here for the details of the metrics exposed by Kafka Exporter.

  • Update Kafka broker information in “docker-compose-kafka-exporter.yml”

  • Start the Kafka exporter, Prometheus and Grafana containers

    docker-compose -f docker-compose-kafka-exporter.yml up -d --build
    
  • Navigate to Grafana Dashboard and select Kafka Exporter Overview dashboard
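
You can also check the raw metrics directly from the exporter. The command below is illustrative; it assumes the exporter is reachable on its default port 9308 and that your kafka_exporter version exposes the kafka_consumergroup_lag metric, so verify both against your setup:

    # Show per-partition consumer group lag as exposed by kafka_exporter
    curl -s http://localhost:9308/metrics | grep kafka_consumergroup_lag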

Kafka Lag Exporter

Kafka Lag Exporter lets you monitor Kafka consumer group latency. It can run easily on Docker and on Kubernetes clusters. We will see how to run it in both Docker and Kubernetes, along with Prometheus and Grafana.

Docker

  • Update kafka-consumer-lag\kafka-lag-exporter\application.conf to add your Kafka cluster details. Here is an example configuration

    kafka-lag-exporter {
      reporters.prometheus.port = 8000
      poll-interval = 30 seconds
      clusters = [
        {
          name = "kafka-demo-cluster"
          bootstrap-brokers = "192.168.0.107:39092"
          consumer-properties = {
            security.protocol = "PLAINTEXT"
          }
          admin-client-properties = {
            security.protocol = "PLAINTEXT"
          }
        }
      ]
    }
    

    See here for all supported configuration options for kafka-lag-exporter

  • Start Kafka lag exporter, Prometheus and Grafana containers

    docker-compose -f docker-compose-kafka-lag-exporter.yml up -d --build
    
  • Navigate to Grafana Dashboard and select Kafka Lag Exporter dashboard. This should give a neat view of consumer lag by time and offset
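
You can also confirm the exporter itself is serving metrics. The command below is illustrative; port 8000 comes from the application.conf above, and the exact metric names (for example kafka_consumergroup_group_lag) may vary with the Kafka Lag Exporter version:

    # Verify Kafka Lag Exporter is exposing consumer group lag metrics
    curl -s http://localhost:8000/metrics | grep kafka_consumergroup_group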

Kubernetes

The best part of Kafka Lag Exporter is that it can be deployed in Kubernetes. Here is a quick rundown of how to install Kafka Lag Exporter in Kubernetes

  • Add the chart repository for kafka-lag-exporter

    helm repo add kafka-lag-exporter https://seglo.github.io/kafka-lag-exporter/repo/
    
  • Update information of available charts locally from chart repositories

    helm repo update
    
  • List and verify the chart repository

    helm repo list
    
  • Create a values.yml file based on your use case. See here for a sample file with a detailed explanation of each configuration option. You can also change the service type here
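
    As an illustration only, a minimal lag-exporter-values.yml might look like the sketch below (the cluster name, broker address, and service type are placeholders; verify the key names against the chart's default values.yaml):

    clusters:
      - name: "kafka-demo-cluster"
        bootstrapBrokers: "192.168.0.107:39092"
    service:
      type: NodePort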

  • Do a dry-run install to verify the generated YAML

    helm install kafka-lag-exporter \
    kafka-lag-exporter/kafka-lag-exporter \
    --namespace kafka --dry-run --debug \
    -f kafka-lag-exporter/lag-exporter-values.yml
    
  • Do the actual install by running the command below

    helm install kafka-lag-exporter \
    kafka-lag-exporter/kafka-lag-exporter \
    --namespace kafka \
    -f kafka-lag-exporter/lag-exporter-values.yml
    
  • If you used ClusterIP, you can access the metrics endpoint by following the steps below

    kubectl port-forward service/kafka-lag-exporter-service 8000:8000 --namespace kafka
    # Navigate to below endpoint in your browser
    http://127.0.0.1:8000
    
  • If you used NodePort, you can access the metrics endpoint by following the steps below

    export NODE_PORT=$(kubectl get --namespace kafka -o jsonpath="{.spec.ports[0].nodePort}" services kafka-lag-exporter-service)
    export NODE_IP=$(kubectl get nodes --namespace kafka -o jsonpath="{.items[0].status.addresses[0].address}")
    echo http://$NODE_IP:$NODE_PORT
    # Navigate to the endpoint from above echo 
    # If you use Kubernetes in Docker then use localhost:NODEPORT to access the endpoint like `http://localhost:30072/`
    
  • If the endpoint is operational, you can add it as a Prometheus scrape target (an illustrative scrape configuration is shown after the command below). If you don’t have Prometheus running for the demo, create it quickly by running the below docker-compose from kafka-examples

    docker-compose -f docker-compose-kafka-lag-exporter-kube.yml up -d --build
    
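If you already have Prometheus running, the scrape configuration for the exporter endpoint typically looks like the sketch below (the job name and target address are placeholders for the values from the steps above):

    scrape_configs:
      - job_name: 'kafka-lag-exporter'
        static_configs:
          - targets: ['<NODE_IP>:<NODE_PORT>']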

Once deployed, either in Kubernetes or as a standalone application, Kafka Lag Exporter provides a way to continuously monitor the consumer lag and send alerts when certain limits are reached.

UI for Apache Kafka

UI for Apache Kafka is a free, open-source web UI to monitor and manage Apache Kafka clusters. Its lightweight dashboard makes it easy to track key metrics of your Kafka clusters - Brokers, Topics, Partitions, Production, and Consumption.

  • Update docker-compose-kafka-ui.yml to add your Kafka cluster details. Here is an example configuration

        environment:
          - KAFKA_CLUSTERS_0_NAME=kube-cluster
          - KAFKA_CLUSTERS_0_BOOTSTRAPSERVERS=192.168.0.101:30020
          - KAFKA_CLUSTERS_0_READONLY=true
    

    You can add more than one cluster by adding more environment variables like KAFKA_CLUSTERS_1, KAFKA_CLUSTERS_2, etc.

  • Start UI for Apache Kafka

    docker-compose -f docker-compose-kafka-ui.yml up -d --build
    
  • Navigate to UI for Apache Kafka and click Consumers to view the Kafka consumer details

Getting visibility into Kafka consumer lag helps you decide on the next operational task for fixing the lag. Tuning the consumer properties, increasing the number of Kafka partitions, and adding additional consumers to your consumer group are a few of the potential actions you could take.
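
For example, adding partitions to a topic so that more consumers in the group can share the work can be done with the stock Kafka CLI; the topic name and partition count below are placeholders:

    ./kafka-topics.sh --bootstrap-server 192.168.0.107:39092 \
      --alter --topic <topic-name> --partitions 6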

Clean Demo Resources

  • Run the commands below to delete the Docker containers and related volumes.

    ## Kafka tools
    cd kafka-examples\kafka-tools
    docker-compose down -v --remove-orphans --rmi all
    ## Burrow
    cd kafka-examples\kafka-consumer-lag
    docker-compose -f docker-compose-burrow.yml down -v --remove-orphans --rmi all
    ## YMCK
    docker-compose -f docker-compose-ymck.yml down -v --remove-orphans --rmi all
    ## Kafka Exporter
    docker-compose -f docker-compose-kafka-exporter.yml down -v --remove-orphans --rmi all
    ## Kafka Lag Exporter
    docker-compose -f docker-compose-kafka-lag-exporter.yml down -v --remove-orphans --rmi all
    docker-compose -f docker-compose-kafka-lag-exporter-kube.yml down -v --remove-orphans --rmi all
    ## UI for Apache Kafka
    docker-compose -f docker-compose-kafka-ui.yml down -v --remove-orphans --rmi all
    
  • You can list and delete the helm deployments after the demo by running the commands below

    helm list
    
    helm delete kafka-lag-exporter --namespace kafka
    

Hope this was helpful. Did I miss something? Let me know in the comments or in the forum section.

