1 Kafka monitoring approaches

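For a Java process such as Kafka, the metrics are first exposed through a JMX agent or a third-party exporter (kafka_exporter, KMinion, etc.). Prometheus then scrapes them, and a Grafana dashboard template displays them.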

2 kafka_exporter

2.1 Install and configure kafka_exporter

2.1.1 Download kafka_exporter

Download page:
https://github.com/danielqsj/kafka_exporter/releases/
# accelerated download through the ghproxy mirror
wget https://mirror.ghproxy.com/https://github.com/danielqsj/kafka_exporter/releases/download/v1.7.0/kafka_exporter-1.7.0.linux-amd64.tar.gz

2.1.2 Install Kafka

tar -xf kafka_2.13-3.7.0.tgz -C /app/module/
mv /app/module/kafka_2.13-3.7.0/ /app/module/kafka

2.1.3 Edit the configuration

cd /app/module/kafka/config/
vim server.properties
listeners=PLAINTEXT://192.168.137.131:9092
advertised.listeners=PLAINTEXT://192.168.137.131:9092

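2.1.4 Start Kafka

cd /app/module/kafka/bin/
# in Kafka 3.7, config/server.properties runs in ZooKeeper mode, so start ZooKeeper first if it is not already running
sh zookeeper-server-start.sh -daemon ../config/zookeeper.properties
sh kafka-server-start.sh -daemon ../config/server.properties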

2.1.5 Extract kafka_exporter

tar -xf kafka_exporter-1.7.0.linux-amd64.tar.gz -C /app/module/
ln -s /app/module/kafka_exporter-1.7.0.linux-amd64/ /app/module/kafka_exporter

2.1.6 Create the kafka_exporter systemd unit file

vim /usr/lib/systemd/system/kafka_exporter.service
[Unit]
Description=kafka_exporter
Documentation=https://prometheus.io/
After=network.target
[Service]
ExecStart=/app/module/kafka_exporter/kafka_exporter \
  --web.listen-address=:9308 \
  --kafka.server=192.168.137.131:9092
ExecReload=/bin/kill -HUP $MAINPID
TimeoutStopSec=20s
Restart=always
[Install]
WantedBy=multi-user.target

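# Usage: kafka_exporter --kafka.server=kafka:9092 [--kafka.server=another-server ...]
# --kafka.server can be repeated, so one kafka_exporter can be given several Kafka broker addresses.
# Avoid localhost:9092 for --kafka.server: partition data sometimes cannot be fetched that way. Use the advertised listener address instead.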

2.1.7 Start kafka_exporter

systemctl daemon-reload
systemctl start kafka_exporter.service
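To confirm the exporter is running and can reach the broker, you can fetch its /metrics page and read back a few kafka_* samples. The Python sketch below is a hypothetical helper, not part of kafka_exporter. It does a simplified parse of the Prometheus text format, skipping comment lines and ignoring optional timestamps, and it assumes the exporter listens on 192.168.137.131:9308 as configured above.

```python
import sys
import urllib.request


def fetch_metrics(url, timeout=5.0):
    """Download the text exposition from an exporter's /metrics endpoint."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def parse_samples(text):
    """Return (series, value) pairs; a simplified parser that skips '#' lines."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        if "}" in line:
            # series with labels: the value follows the closing brace
            end = line.rindex("}") + 1
            series, rest = line[:end], line[end:].split()
        else:
            series, *rest = line.split()
        samples.append((series, float(rest[0])))  # rest[1], if present, is a timestamp
    return samples


def values_for(samples, metric):
    """All values whose metric name (the part before '{') equals `metric`."""
    return [value for series, value in samples if series.split("{", 1)[0] == metric]


if __name__ == "__main__" and len(sys.argv) > 1:
    # usage: python3 check_exporter.py http://192.168.137.131:9308/metrics
    samples = parse_samples(fetch_metrics(sys.argv[1]))
    print("kafka_brokers =", values_for(samples, "kafka_brokers"))
    print("kafka_topic_partitions series:", len(values_for(samples, "kafka_topic_partitions")))
```

If kafka_brokers is missing or the request fails, check `systemctl status kafka_exporter` and the --kafka.server address before moving on to Prometheus.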

2.2 Configure Prometheus

1. Edit the Prometheus configuration file (prometheus.yml) and add kafka_exporter as a scrape target under scrape_configs:
  - job_name: "kafka_exporter"
    metrics_path: "/metrics"
    static_configs:
    - targets: ["192.168.137.131:9308"]

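2. Reload the Prometheus configuration. The /-/reload endpoint only works if Prometheus was started with --web.enable-lifecycle; otherwise restart Prometheus or send it SIGHUP.
curl -X POST http://192.168.137.131:9090/-/reload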
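After the reload, you can check that the new job is actually being scraped through the Prometheus HTTP API. GET /api/v1/targets returns the active targets, each carrying its labels, scrapeUrl, health and lastError. The sketch below is a hypothetical helper that filters those targets by job name. It assumes Prometheus at http://192.168.137.131:9090 and the job name "kafka_exporter" from the config above.

```python
import json
import sys
import urllib.request


def fetch_json(url, timeout=5.0):
    """GET a Prometheus API URL and decode the JSON body."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def targets_for_job(payload, job):
    """(scrapeUrl, health, lastError) for every active target of one job."""
    result = []
    for target in payload["data"]["activeTargets"]:
        if target["labels"].get("job") == job:
            result.append((target["scrapeUrl"], target["health"], target.get("lastError", "")))
    return result


if __name__ == "__main__" and len(sys.argv) > 1:
    # usage: python3 check_targets.py http://192.168.137.131:9090
    data = fetch_json(sys.argv[1].rstrip("/") + "/api/v1/targets")
    for url, health, err in targets_for_job(data, "kafka_exporter"):
        print(url, health, err or "-")
```

A target reported as "up" with an empty lastError means Prometheus is scraping the exporter successfully.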

2.3 Common Kafka metrics

2.3.1 Broker metrics

Metric name | Type | Meaning
--- | --- | ---
kafka_brokers | gauge | Number of brokers in the Kafka cluster

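2.3.2 Topic metrics

Metric name | Type | Meaning
--- | --- | ---
kafka_topic_partitions | gauge | Number of partitions of the topic
kafka_topic_partition_current_offset | gauge | Current (latest) offset of the topic/partition
kafka_topic_partition_oldest_offset | gauge | Oldest offset of the topic/partition
kafka_topic_partition_in_sync_replica | gauge | Number of in-sync replicas of the topic/partition
kafka_topic_partition_leader | gauge | Leader broker ID of the topic/partition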
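The two offset metrics can be combined. current_offset minus oldest_offset approximates how many messages a partition still retains. In PromQL this is simply kafka_topic_partition_current_offset - kafka_topic_partition_oldest_offset, because both series carry the same topic and partition labels. The Python sketch below shows the same per-partition arithmetic on made-up offsets for a hypothetical topic "test". For compacted topics the result is only an upper bound, because compaction leaves gaps in the offsets.

```python
def retained_messages(current_offsets, oldest_offsets):
    """Approximate messages kept per (topic, partition): current minus oldest offset.

    Only partitions present in both inputs are reported. For compacted topics the
    result is an upper bound, because compaction leaves gaps in the offsets.
    """
    return {
        key: current_offsets[key] - oldest_offsets[key]
        for key in current_offsets
        if key in oldest_offsets
    }


if __name__ == "__main__":
    # illustrative numbers only (hypothetical topic "test")
    current = {("test", 0): 120, ("test", 1): 80}
    oldest = {("test", 0): 20, ("test", 1): 80}
    for (topic, partition), count in sorted(retained_messages(current, oldest).items()):
        print(f"{topic}/{partition}: {count}")
```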

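2.3.3 Consumer group metrics

Metric name | Type | Meaning
--- | --- | ---
kafka_consumergroup_current_offset | gauge | Current (committed) offset of the consumer group on the topic/partition
kafka_consumergroup_lag | gauge | Current approximate lag of the consumer group on the topic/partition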
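Conceptually, a consumer group's lag on a partition is the partition's latest offset (kafka_topic_partition_current_offset) minus the offset the group has committed (kafka_consumergroup_current_offset). Summing that over partitions gives the per-topic total that the alert rules below read from kafka_consumergroup_lag_sum. The Python sketch below illustrates the arithmetic on made-up offsets for a hypothetical topic "orders". Clamping at zero is an assumption added to absorb offsets scraped at slightly different moments. It is not taken from kafka_exporter's code.

```python
from collections import defaultdict


def partition_lag(latest_offsets, committed_offsets):
    """Lag per (topic, partition) = latest offset - committed offset, never below 0."""
    return {
        key: max(latest_offsets[key] - committed, 0)
        for key, committed in committed_offsets.items()
        if key in latest_offsets  # skip partitions with no known latest offset
    }


def topic_lag_sum(lags):
    """Sum partition lags per topic, like kafka_consumergroup_lag_sum for one group."""
    totals = defaultdict(int)
    for (topic, _partition), lag in lags.items():
        totals[topic] += lag
    return dict(totals)


if __name__ == "__main__":
    latest = {("orders", 0): 1500, ("orders", 1): 900}  # hypothetical topic
    committed = {("orders", 0): 1000, ("orders", 1): 900}
    lags = partition_lag(latest, committed)
    print(lags)                 # per-partition lag
    print(topic_lag_sum(lags))  # per-topic total
```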

2.4 Kafka alert rules file

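2.4.1 Alert rules file

The rules below group by, and template on, a name label (the environment or cluster name). kafka_exporter does not emit this label itself, so it is assumed to be attached to the target in the scrape config, for example as a label under static_configs. Without it, {{ $labels.name }} renders empty.

vim /app/module/prometheus/rules/kafka_rules.yml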
groups:
- name: kafka alert rules
  rules:
  # kafka_broker_info is normally 1 for every broker the exporter can see, so a vanished
  # broker drops out instead of changing value; the kafka_brokers rule below covers that case
  - alert: kafka brokers abnormal
    expr: kafka_broker_info != 1
    for: 2m
    labels:
      severity: critical
    annotations:
      description: "{{ $labels.name }} abnormal broker: {{ $labels.address }}"
  - alert: kafka overall message backlog
    expr: sum(kafka_consumergroup_lag_sum{job="kafka_exporter"}) by (name, consumergroup, topic) > 5000
    for: 2m
    labels:
      severity: critical
    annotations:
      description: "[env] {{ $labels.name }}\n[consumer group] {{ $labels.consumergroup }}\n[topic] {{ $labels.topic }} [backlog]: {{ $value | printf \"%.2f\" }}"
  - alert: kafka partition message backlog
    # hour() returns the UTC hour; +8 converts to UTC+8, so this only fires from 07:00 to 21:59 local time
    expr: (sum(kafka_consumergroup_lag{job="kafka_exporter"}) by (name, consumergroup, topic, partition) > 1500) and on() (hour() + 8) % 24 >= 7 <= 21
    for: 3m
    labels:
      severity: critical
    annotations:
      description: "[env] {{ $labels.name }}\n[consumer group] {{ $labels.consumergroup }}\n[topic] {{ $labels.topic }} [partition] {{ $labels.partition }} [backlog]: {{ $value | printf \"%.2f\" }}"
  - alert: kafka too many partitions
    expr: sum by (name) (kafka_topic_partitions{job="kafka_exporter", topic!~"__.*"}) > 1500
    for: 2m
    labels:
      severity: critical
    annotations:
      description: "{{ $labels.name }} current partition count: {{ $value | printf \"%.2f\" }}"
  - alert: kafka_brokers lost
    # the threshold assumes a 3-broker cluster; the single-broker lab above will always trigger it
    expr: kafka_brokers{job="kafka_exporter"} < 3
    for: 2m
    labels:
      severity: critical
    annotations:
      description: "{{ $labels.name }} current broker count: {{ $value | printf \"%.2f\" }}"
  - alert: kafka_TopicsReplicas
    expr: sum(kafka_topic_partition_in_sync_replica{job="kafka_exporter"}) by (name, topic) < 1
    for: 2m
    labels:
      severity: critical
    annotations:
      description: "{{ $labels.name }} topic {{ $labels.topic }} in-sync replicas: {{ $value | printf \"%.2f\" }}"
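The partition-backlog rule is muted outside business hours by the filter (hour()+8)%24 >= 7 <= 21. PromQL time functions work in UTC, so +8 shifts to UTC+8. The chained comparison is two filters applied in turn, keeping the value only when it lies between 7 and 21 inclusive. The Python sketch below mirrors that logic so the window edges can be checked. The function name and parameters are illustrative only. Because hour() is a whole hour, the window actually closes at 21:59, not 21:00.

```python
from datetime import datetime, timezone


def alert_window_open(utc_hour, offset=8, start=7, end=21):
    """Python mirror of the PromQL filter (hour()+8)%24 >= 7 <= 21.

    hour() yields the UTC hour as a whole number, so end=21 keeps 21:00-21:59.
    """
    local_hour = (utc_hour + offset) % 24
    return start <= local_hour <= end


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    print(now.isoformat(), "window open:", alert_window_open(now.hour))
```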

2.4.2 Check the rule syntax

/app/module/prometheus/promtool check rules /app/module/prometheus/rules/kafka_rules.yml

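2.4.3 Reload Prometheus

Prometheus only loads the rules file if prometheus.yml lists it under rule_files (for example - "rules/kafka_rules.yml", resolved relative to the directory containing prometheus.yml). Then reload:
curl -X POST http://192.168.137.131:9090/-/reload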
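Besides the web UI, the loaded rules can be listed through the Prometheus HTTP API. GET /api/v1/rules returns every rule group, and each alerting rule reports its name, state (inactive, pending or firing) and health. The sketch below is a hypothetical helper that prints those fields. It assumes Prometheus at http://192.168.137.131:9090.

```python
import json
import sys
import urllib.request


def fetch_rules(base_url, timeout=5.0):
    """GET <base_url>/api/v1/rules and return the decoded JSON."""
    with urllib.request.urlopen(base_url.rstrip("/") + "/api/v1/rules", timeout=timeout) as resp:
        return json.load(resp)


def alerting_rules(payload, group_name=None):
    """(group, alert, state, health) for each alerting rule, optionally one group only."""
    rows = []
    for group in payload["data"]["groups"]:
        if group_name is not None and group["name"] != group_name:
            continue
        for rule in group["rules"]:
            if rule.get("type") == "alerting":
                rows.append((group["name"], rule["name"], rule.get("state"), rule.get("health")))
    return rows


if __name__ == "__main__" and len(sys.argv) > 1:
    # usage: python3 check_rules.py http://192.168.137.131:9090
    for row in alerting_rules(fetch_rules(sys.argv[1])):
        print(*row)
```

If the group is missing from the output, the rule_files entry or the reload did not take effect.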

2.4.4 Verify the alert rules

[Screenshot: the Kafka alert rules in the Prometheus web UI]

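2.5 Import Kafka dashboards

2.5.1 Import dashboard ID 7589

[Screenshot: Grafana dashboard 7589]

This dashboard builds its variables from kafka_consumergroup_current_offset. On a lab Kafka with little data, that metric may not exist yet, and when it is missing the job dropdown stays empty.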

2.5.2 Import dashboard ID 21078

[Screenshot: Grafana dashboard 21078]

3 jmx_exporter

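3.1 Download the Java agent and the configuration file

mkdir -p /app/module/jmx_exporter && cd /app/module/jmx_exporter
# Java agent jar from Maven Central (same version as the path used in KAFKA_OPTS in 3.2)
wget https://repo1.maven.org/maven2/io/prometheus/jmx/jmx_prometheus_javaagent/0.20.0/jmx_prometheus_javaagent-0.20.0.jar
# example Kafka rules file
wget https://mirror.ghproxy.com/https://github.com/prometheus/jmx_exporter/blob/main/example_configs/kafka-2_0_0.yml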

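3.2 Add the agent to Kafka

export KAFKA_OPTS="-javaagent:/app/module/jmx_exporter/jmx_prometheus_javaagent-0.20.0.jar=9991:/app/module/jmx_exporter/kafka-2_0_0.yml"
The part after "=" is <port>:<config file>, so the agent serves metrics on port 9991 using the rules in kafka-2_0_0.yml. The line can go in either of two places:
At the very top of bin/kafka-run-class.sh, before anything else runs.
At the very top of bin/kafka-server-start.sh, before anything else runs.
kafka-server-start.sh is the safer choice. kafka-run-class.sh is also used by the CLI tools (kafka-topics.sh and others), which would then load the agent too and fail to bind port 9991 while the broker holds it.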

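3.3 Start Kafka

cd /app/module/kafka/bin/
# if the broker started in 2.1.4 is still running, stop it first so it restarts with the agent
sh kafka-server-stop.sh
sh kafka-server-start.sh -daemon ../config/server.properties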
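Once the broker is back up, the agent serves its metrics on port 9991. A quick way to see which metric groups the kafka-2_0_0.yml rules produce is to count samples by name prefix. The Python sketch below is a hypothetical helper for that. It assumes the agent is reachable at http://192.168.137.131:9991/metrics. The exact metric names depend on the rules file and the agent version.

```python
import sys
import urllib.request
from collections import Counter


def family_counts(text, depth=2):
    """Count samples per name prefix made of the first `depth` '_'-separated parts."""
    counts = Counter()
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name = line.split("{", 1)[0].split()[0]
        counts["_".join(name.split("_")[:depth])] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # usage: python3 jmx_families.py http://192.168.137.131:9991/metrics
    with urllib.request.urlopen(sys.argv[1], timeout=5) as resp:
        text = resp.read().decode("utf-8")
    for prefix, n in family_counts(text).most_common(15):
        print(f"{n:6d}  {prefix}")
```

Seeing both kafka_* and jvm_* prefixes suggests the rules file was loaded and the JVM collectors are active.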

3.4 Check the metrics

[Screenshot: metrics exposed by the JMX agent on port 9991]

3.5 Configure Prometheus

Add a scrape job to the Prometheus configuration file:
  - job_name: "jmx_exporter"
    metrics_path: "/metrics"
    static_configs:
    - targets: ["192.168.137.131:9991"]

Reload Prometheus:
curl -X POST http://192.168.137.131:9090/-/reload

3.6 Import a JVM dashboard

Import a JVM dashboard for Kafka. The dashboard ID is 1827.

[Screenshot: Grafana JVM dashboard 1827 showing Kafka]