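Table of Contents
- Dubbo Prometheus + Grafana: Monitoring a Spring Boot Project
- Approach 1: Invasive, by modifying the Spring Boot code
- Prepare the images
- Set up Prometheus
- Set up Grafana
- The Spring Boot application
- Configure Grafana
- Approach 2: Non-invasive, Prometheus monitoring through a Java agent
- Grafana and Prometheus are set up the same way as in Approach 1
- Spring Boot
- Other key problems and solutions
- Problem 1: How do you add multiple Spring Boot applications to monitoring?
- Error
- Verify that the change is correct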
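Dubbo Prometheus + Grafana: Monitoring a Spring Boot Project
Approach 1: Invasive, by modifying the Spring Boot code
Prepare the images
# Pull the Grafana image
docker pull grafana/grafana:9.1.8
# Pull the Prometheus image
docker pull prom/prometheus:v2.39.1
Set up Prometheus
# Create the host directory that will be mounted into the container
mkdir -p /docker/prometheus/server
# Enter the directory
cd /docker/prometheus/server
# Create the files
touch rules.yml
touch prometheus.yml
# Edit the file
vim prometheus.yml
# with the following content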
scrape_configs:
  # Prometheus scraping itself
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']
  # The locally started Spring Boot project
  - job_name: 'springboot-prometheus'
    metrics_path: '/actuator/prometheus'
    static_configs:
      - targets: ['192.168.0.1:8082']
# Start Prometheus
docker run --name prometheus -p 9090:9090 --restart=always \
-v /docker/prometheus/server/prometheus.yml:/etc/prometheus/prometheus.yml \
-v /docker/prometheus/server/rules.yml:/etc/prometheus/rules.yml \
-itd prom/prometheus:v2.39.1 \
--config.file=/etc/prometheus/prometheus.yml \
--web.enable-lifecycle
------------------------------------------------------------------------------------
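Note: the flags passed at startup are
--web.enable-lifecycle: enables reloading the configuration file over HTTP, without a restart
--config.file: the configuration file to load at startup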
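Set up Grafana
# Start it once first (it is started again later; this run is only for copying out the config file)
docker run --name=grafana -d -p 3000:3000 grafana/grafana:9.1.8
# Create the host directories that will be mounted into the container
mkdir -p /docker/prometheus/grafana
mkdir -p /docker/prometheus/grafana/data
# Copy out grafana.ini so it can be edited, e.g. to configure SMTP for alert emails (needed for alerting)
docker cp grafana:/etc/grafana/grafana.ini /docker/prometheus/grafana/
docker rm -f grafana
# Open up permissions recursively so Grafana is not blocked when writing its files
chmod -R 777 /docker/prometheus/*
# Start it for real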
docker run -p 3000:3000 --name grafana --restart=always \
-v /docker/prometheus/grafana/grafana.ini:/etc/grafana/grafana.ini \
-v /docker/prometheus/grafana/data:/var/lib/grafana \
-e "GF_SECURITY_ADMIN_PASSWORD=admin" \
-itd grafana/grafana:9.1.8
------------------------------------------------------------------------------------
Note: -e "GF_SECURITY_ADMIN_PASSWORD=XXXXX"
sets the password for the Grafana login page. Without it, the default account and password are admin/admin.
The Spring Boot application
- Add the dependencies
<!-- Make sure the versions match -->
<!-- Managed dependency versions for 2.1.5.RELEASE: -->
<!-- https://docs.spring.io/spring-boot/docs/2.1.5.RELEASE/reference/html/appendix-dependency-versions.html -->
<!-- Managed dependency versions for 2.3.4.RELEASE: -->
<!-- https://docs.spring.io/spring-boot/docs/2.3.4.RELEASE/reference/html/appendix-dependency-versions.html -->
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-actuator</artifactId>
</dependency>
<dependency>
    <groupId>io.micrometer</groupId>
    <artifactId>micrometer-registry-prometheus</artifactId>
    <!-- My Spring Boot version is 2.1.5.RELEASE, so I use 1.1.4; look up the matching version via the links above -->
    <version>1.1.4</version>
</dependency>
- Update the configuration
management:
  endpoints:
    web:
      exposure:
        include: "*"
  metrics:
    tags:
      application: ${spring.application.name}
spring:
  application:
    name: application_name
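- Verify: request the application's /actuator/prometheus endpoint (port 8082 in this setup); the response should look like this
# HELP tomcat_global_received_bytes_total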
# TYPE tomcat_global_received_bytes_total counter
tomcat_global_received_bytes_total{application="application_name",name="http-nio-8082",} 0.0
# HELP tomcat_global_sent_bytes_total
# TYPE tomcat_global_sent_bytes_total counter
tomcat_global_sent_bytes_total{application="application_name",name="http-nio-8082",} 1452186.0
# HELP http_server_requests_seconds
# TYPE http_server_requests_seconds summary
http_server_requests_seconds_count{application="application_name",exception="None",method="GET",outcome="SUCCESS",status="200",uri="/api/kms/client/v2/initKeys",} 6492.0
http_server_requests_seconds_sum{application="application_name",exception="None",method="GET",outcome="SUCCESS",status="200",uri="/api/kms/client/v2/initKeys",} 95.939401895
http_server_requests_seconds_count{application="application_name",exception="None",method="GET",outcome="SUCCESS",status="200",uri="/actuator/prometheus",} 91.0
http_server_requests_seconds_sum{application="application_name",exception="None",method="GET",outcome="SUCCESS",status="200",uri="/actuator/prometheus",} 3.033350001
# HELP http_server_requests_seconds_max
.......
.......
.......
(If data comes back, the Spring Boot side is ready.)
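As a rough way to check that output programmatically, the sketch below parses a few sample lines in the Prometheus text format and derives the mean request duration as `_sum / _count`. It is a minimal illustration, not a full parser: the helper names `parse_samples` and `mean_latency` are made up for this example, and lines carrying timestamps or unusual escaping are simply skipped.

```python
import re

# Matches: metric_name{label="value",...} 12.5   (the label block is optional)
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)$')
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_samples(text):
    """Return a list of (name, labels, value) tuples, skipping comment lines."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = SAMPLE_RE.match(line)
        if match is None:
            continue  # not a sample line, e.g. an elision such as "......."
        name, label_block, value = match.groups()
        labels = dict(LABEL_RE.findall(label_block or ""))
        samples.append((name, labels, float(value)))
    return samples


def mean_latency(samples, uri):
    """Mean request duration in seconds for one URI: _sum divided by _count."""
    total = count = 0.0
    for name, labels, value in samples:
        if labels.get("uri") != uri:
            continue
        if name == "http_server_requests_seconds_sum":
            total += value
        elif name == "http_server_requests_seconds_count":
            count += value
    return total / count if count else None


# Two lines copied from the sample output above.
SAMPLE = '''
# TYPE http_server_requests_seconds summary
http_server_requests_seconds_count{application="application_name",exception="None",method="GET",outcome="SUCCESS",status="200",uri="/api/kms/client/v2/initKeys",} 6492.0
http_server_requests_seconds_sum{application="application_name",exception="None",method="GET",outcome="SUCCESS",status="200",uri="/api/kms/client/v2/initKeys",} 95.939401895
'''

samples = parse_samples(SAMPLE)
print(len(samples), mean_latency(samples, "/api/kms/client/v2/initKeys"))
```

For the sample above this works out to roughly 15 ms per request (95.94 s spread over 6492 requests).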
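Configure Grafana
- Open http://localhost:3000 and log in with the password configured above (admin/admin by default)
- Configure the data source
- Import a template
The default template is rather plain, so import a JVM template to monitor the Spring Boot application
- Enter the template ID to load the JVM monitoring dashboard template directly
The full list of official templates is available on the Grafana website.
- View the dashboard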
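The data source step can also be scripted instead of clicked through. The sketch below builds a request for Grafana's HTTP API endpoint `POST /api/datasources` with basic auth. Treat the details as assumptions: the helper name `build_datasource_request` is made up, the data source name "Prometheus" is arbitrary, and `http://192.168.0.1:9090` is only a placeholder for an address at which the Grafana container can reach Prometheus (inside the container, `localhost` is the container itself).

```python
import base64
import json
import urllib.request


def build_datasource_request(grafana_url, prometheus_url, user="admin", password="admin"):
    """Build (but do not send) a POST /api/datasources request."""
    payload = {
        "name": "Prometheus",       # arbitrary display name (assumption)
        "type": "prometheus",
        "url": prometheus_url,
        "access": "proxy",          # Grafana's backend queries Prometheus
        "isDefault": True,
    }
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        grafana_url.rstrip("/") + "/api/datasources",
        data=json.dumps(payload).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Basic " + token,
        },
        method="POST",
    )


# Placeholder addresses; adjust to your environment.
request = build_datasource_request("http://localhost:3000", "http://192.168.0.1:9090")
print(request.get_method(), request.full_url)

# To actually create the data source, send the request:
# with urllib.request.urlopen(request) as response:
#     print(response.status, response.read().decode())
```

Building the request separately from sending it keeps the example safe to run on a machine where Grafana is not up yet.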
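Approach 2: Non-invasive, Prometheus monitoring through a Java agent
Grafana and Prometheus are set up the same way as in Approach 1
Spring Boot
The application itself needs no changes; follow the steps below to hook it up to monitoring.
It also does not need spring-boot-starter-actuator or micrometer-registry-prometheus.
- Download jmx_exporter
Get it from the official GitHub Releases page.
- Create a directory and put the downloaded file in it
# Create the directory
mkdir -p /data/server/jmx
# Open up the directory permissions
chmod 777 /data/server/jmx
# Create the config file the agent needs at startup
touch /data/server/jmx/simple-config.yml
# Copy the downloaded file into this directory
.......
# Or wget the file's GitHub download URL directly in this directory
- simple-config.yml (this config is key: it defines which metrics are exposed; exactly what each rule does still needs more study)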
---
lowercaseOutputLabelNames: true
lowercaseOutputName: true
rules:
- pattern: 'Catalina<type=GlobalRequestProcessor, name=\"(\w+-\w+)-(\d+)\"><>(\w+):'
  name: tomcat_$3_total
  labels:
    port: "$2"
    protocol: "$1"
  help: Tomcat global $3
  type: COUNTER
- pattern: 'Catalina<j2eeType=Servlet, WebModule=//([-a-zA-Z0-9+&@#/%?=~_|!:.,;]*[-a-zA-Z0-9+&@#/%=~_|]), name=([-a-zA-Z0-9+/$%~_-|!.]*), J2EEApplication=none, J2EEServer=none><>(requestCount|maxTime|processingTime|errorCount):'
  name: tomcat_servlet_$3_total
  labels:
    module: "$1"
    servlet: "$2"
  help: Tomcat servlet $3 total
  type: COUNTER
- pattern: 'Catalina<type=ThreadPool, name="(\w+-\w+)-(\d+)"><>(currentThreadCount|currentThreadsBusy|keepAliveCount|pollerThreadCount|connectionCount):'
  name: tomcat_threadpool_$3
  labels:
    port: "$2"
    protocol: "$1"
  help: Tomcat threadpool $3
  type: GAUGE
- pattern: 'Catalina<type=Manager, host=([-a-zA-Z0-9+&@#/%?=~_|!:.,;]*[-a-zA-Z0-9+&@#/%=~_|]), context=([-a-zA-Z0-9+/$%~_-|!.]*)><>(processingTime|sessionCounter|rejectedSessions|expiredSessions):'
  name: tomcat_session_$3_total
  labels:
    context: "$2"
    host: "$1"
  help: Tomcat session $3 total
  type: COUNTER
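To make the rules less opaque: each `pattern` is a regular expression matched against a string describing one MBean attribute, and `$1`, `$2`, `$3` in `name` and `labels` are replaced by the capture groups. The sketch below emulates that substitution in Python for the ThreadPool rule. It is an illustration only, not the exporter's actual code: `apply_rule` is a made-up helper, and the bean line is a hypothetical example for a connector named "http-nio-8082".

```python
import re


def apply_rule(pattern, name_template, label_templates, bean_line, lowercase=True):
    """Emulate one rule: match the pattern, then expand $N in name and labels."""
    match = re.search(pattern, bean_line)
    if match is None:
        return None  # this rule does not apply to the attribute

    def expand(template):
        # Replace $1, $2, ... with the corresponding capture group.
        return re.sub(r"\$(\d+)", lambda m: match.group(int(m.group(1))), template)

    name = expand(name_template)
    labels = {key: expand(value) for key, value in label_templates.items()}
    if lowercase:
        # Mirrors lowercaseOutputName / lowercaseOutputLabelNames: true
        name = name.lower()
        labels = {key.lower(): value for key, value in labels.items()}
    return name, labels


THREADPOOL_PATTERN = (
    r'Catalina<type=ThreadPool, name="(\w+-\w+)-(\d+)"><>'
    r'(currentThreadCount|currentThreadsBusy|keepAliveCount|pollerThreadCount|connectionCount):'
)

# Hypothetical attribute description for a Tomcat connector on port 8082.
bean_line = 'Catalina<type=ThreadPool, name="http-nio-8082"><>currentThreadCount: 10'

result = apply_rule(
    THREADPOOL_PATTERN,
    "tomcat_threadpool_$3",
    {"port": "$2", "protocol": "$1"},
    bean_line,
)
print(result)
```

Here `$1` captures the protocol part of the connector name, `$2` the port, and `$3` the attribute, so the attribute ends up as a gauge named after it with `port` and `protocol` labels.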
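- Set the javaagent when starting the Java application
# This is the key step: the agent is attached at startup and monitors the JVM application without any code changes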
nohup java -javaagent:/data/server/jmx/jmx_prometheus_javaagent-0.17.1.jar=3010:/data/server/jmx/simple-config.yml -jar /data/server/jmx/spring_boot_application.jar > spring_boot_application.log &
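The agent argument in that command has the shape `-javaagent:<agent jar>=<port>:<config file>`, where the port is the one Prometheus will scrape. As a small sketch of how a deployment script might assemble it, the helpers below (`javaagent_option` and `java_command`, both names invented for this example) build the same command line from its parts.

```python
import shlex


def javaagent_option(agent_jar, port, config_file):
    """Return the -javaagent option: <jar>=<port>:<config file>."""
    return f"-javaagent:{agent_jar}={port}:{config_file}"


def java_command(agent_jar, port, config_file, app_jar):
    """Return the full java command as an argument list."""
    return ["java", javaagent_option(agent_jar, port, config_file), "-jar", app_jar]


# Paths and port taken from the command above.
command = java_command(
    "/data/server/jmx/jmx_prometheus_javaagent-0.17.1.jar",
    3010,
    "/data/server/jmx/simple-config.yml",
    "/data/server/jmx/spring_boot_application.jar",
)
print(shlex.join(command))
```

Keeping the port as a parameter makes it easier to give each application on the same host its own metrics port.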
- Configure Prometheus
prometheus.yml
scrape_configs:
  - job_name: "springboot_jvm"
    # How often to scrape
    scrape_interval: 5s
    # Timeout for each scrape
    scrape_timeout: 5s
    # Path to scrape
    metrics_path: '/metrics'
    # Address of the Spring Boot service (host and agent port)
    static_configs:
      - targets: ['10.100.64.154:3010']
- Reload
curl -X POST http://localhost:9090/-/reload
- Grafana dashboard template: as far as I know, template ID 3066 can be used to view the metrics collected this way
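Other key problems and solutions
Problem 1: How do you add multiple Spring Boot applications to monitoring?
- Analysis
The main task is to modify the Prometheus configuration file and then have Prometheus load it so that the change takes effect.
- Solution
1. Modify prometheus.yml dynamically with a shell script or a program. A shell script fits well inside Jenkins; alternatively, a CI/CD platform can edit prometheus.yml on the target machine directly with Python.
2. Loading the change takes a single curl command:
# Run this after changing the config; Prometheus does not need to be restarted
curl -X POST http://IP:9090/-/reload
# After a successful reload, the Prometheus log prints something like the following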
... msg="Loading configuration file" filename=prometheus.yml ...
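Step 1 could look roughly like the sketch below, which appends one more scrape job to the text of prometheus.yml. It works on plain text and rests on two assumptions that must hold for the result to be valid YAML: `scrape_configs` is the last top-level section of the file, and its list items are indented by two spaces. The function names, the job name `order-service`, and the target `192.168.0.2:8083` are all hypothetical.

```python
def render_scrape_job(job_name, targets, metrics_path="/actuator/prometheus"):
    """Render one scrape job as YAML text, indented to sit under scrape_configs."""
    target_list = ", ".join(f"'{target}'" for target in targets)
    return (
        f"  - job_name: '{job_name}'\n"
        f"    metrics_path: '{metrics_path}'\n"
        f"    static_configs:\n"
        f"      - targets: [{target_list}]\n"
    )


def add_scrape_job(config_text, job_name, targets, metrics_path="/actuator/prometheus"):
    """Append a job unless one with the same name is already present."""
    if f"job_name: '{job_name}'" in config_text or f'job_name: "{job_name}"' in config_text:
        return config_text  # already registered, leave the file alone
    if not config_text.endswith("\n"):
        config_text += "\n"
    return config_text + render_scrape_job(job_name, targets, metrics_path)


# Assumed existing config: scrape_configs is the last section, items indented by two spaces.
existing = (
    "scrape_configs:\n"
    "  - job_name: 'prometheus'\n"
    "    static_configs:\n"
    "      - targets: ['localhost:9090']\n"
)

updated = add_scrape_job(existing, "order-service", ["192.168.0.2:8083"])
print(updated)
```

A CI/CD job would read the file, call `add_scrape_job`, write the result back, and then trigger the reload. Skipping names that already exist keeps repeated deployments from duplicating jobs.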
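Error
# If you run the following
curl -X POST http://192.168.163.172:9090/-/reload
# and it returns the message below (the call failed)
Lifecycle API is not enabled
####### Fix #######
# 1. Locate the prometheus.service unit file
systemctl status prometheus
# The output shows where the unit file is; append the flag below to the ExecStart command in that file
--web.enable-lifecycle
# Then restart Prometheus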
systemctl daemon-reload
systemctl restart prometheus
systemctl enable prometheus
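When the reload is triggered from a script rather than by hand, it helps to turn that error into an actionable message. The sketch below is one way to do it with the standard library; `reload_prometheus` and `explain_reload_failure` are names invented for this example, and the check relies only on the error text quoted above.

```python
import urllib.error
import urllib.request


def explain_reload_failure(body):
    """Map the response body of a failed reload to a hint for the operator."""
    if "Lifecycle API is not enabled" in body:
        return "Start Prometheus with --web.enable-lifecycle, then retry the reload."
    return "Reload failed: " + body.strip()


def reload_prometheus(base_url):
    """POST to /-/reload and return a human-readable result."""
    request = urllib.request.Request(base_url.rstrip("/") + "/-/reload", method="POST")
    try:
        with urllib.request.urlopen(request, timeout=5) as response:
            return f"Reloaded (HTTP {response.status})"
    except urllib.error.HTTPError as error:
        # The server answered, but refused or failed the reload.
        return explain_reload_failure(error.read().decode(errors="replace"))
    except urllib.error.URLError as error:
        return f"Could not reach Prometheus: {error.reason}"


print(explain_reload_failure("Lifecycle API is not enabled"))

# Against a running server:
# print(reload_prometheus("http://192.168.163.172:9090"))
```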
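Verify that the change is correct
# Run the command below
prometheus/bin/promtool check config /data/prometheus/cfg/prometheus.yml
# It validates the config and prints the result; once it passes, run curl -X POST http://IP:9090/-/reload to load the change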
Checking /data/prometheus/cfg/prometheus.yml
SUCCESS: 1 rule files found
SUCCESS: /data/prometheus/cfg/prometheus.yml is valid prometheus config file syntax
Checking /data/prometheus/rules/node_rules.yml
SUCCESS: 4 rules found
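The check and the reload can be chained so that a broken config never gets loaded. The sketch below gates the reload on the exit code of the check command (promtool exits with a non-zero status when the config is invalid). `check_then_reload` is a made-up helper, and the demo uses the Python interpreter as a stand-in for promtool so that it runs anywhere.

```python
import subprocess
import sys


def check_then_reload(check_command, reload):
    """Run the check command; call reload() only if it exits with status 0."""
    result = subprocess.run(check_command, capture_output=True, text=True)
    if result.returncode != 0:
        return False, result.stdout + result.stderr
    reload()
    return True, result.stdout


# Stand-in for promtool so the example is runnable without Prometheus installed.
fake_check = [sys.executable, "-c", "print('SUCCESS')"]
reloaded = []
ok, output = check_then_reload(fake_check, lambda: reloaded.append(True))
print(ok, output.strip(), reloaded)

# With the real tool, using the paths from the command above:
# check_then_reload(
#     ["prometheus/bin/promtool", "check", "config", "/data/prometheus/cfg/prometheus.yml"],
#     lambda: subprocess.run(["curl", "-X", "POST", "http://IP:9090/-/reload"], check=True),
# )
```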