Building a Flink Monitoring Dashboard with Prometheus and Grafana
1. Download the installation packages
Component | Version | Download URL |
Prometheus | 2.36.1 | |
Pushgateway | 1.4.3 | |
node_exporter | 1.3.1 | |
Grafana | 8.5.6 | |
2. Install and run
1. Install Pushgateway
Extract the Pushgateway archive to a directory of your choice; the pushgateway binary can then be run directly.
Start Pushgateway:
nohup ./pushgateway --web.listen-address=":9092" > ./pushgateway-start.log 2>&1 &
To run Pushgateway as a system service managed by systemctl, create or edit /usr/lib/systemd/system/pushgateway.service with the following content:
[Unit]
Description=pushgateway
After=local-fs.target network-online.target network.target
Wants=local-fs.target network-online.target network.target
[Service]
ExecStart=/opt/modules/pushgateway/pushgateway --web.listen-address=:9092
Restart=on-failure
[Install]
WantedBy=multi-user.target
Enable the service at boot, start it, and check its status:
systemctl enable pushgateway
systemctl start pushgateway
systemctl status pushgateway
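One way to confirm that the service accepts data is to push a sample by hand. The sketch below assumes the host cdh1 and port 9092 used in this guide; the function names are hypothetical helpers, and the job name smoke_test and metric name demo_metric in the usage note are made up for illustration.

```shell
# Assumed from this guide: Pushgateway listens on cdh1:9092.
PGW_HOST="${PGW_HOST:-cdh1}"
PGW_PORT="${PGW_PORT:-9092}"

# Build the push URL for a job: http://<host>:<port>/metrics/job/<job>
pgw_job_url() {
    printf 'http://%s:%s/metrics/job/%s\n' "$PGW_HOST" "$PGW_PORT" "$1"
}

# Push a single sample in the Prometheus text format.
# $1 = job name, $2 = metric name, $3 = value
pgw_push() {
    printf '%s %s\n' "$2" "$3" | curl -sf --data-binary @- "$(pgw_job_url "$1")"
}

# Remove everything pushed under a job (useful after a smoke test).
pgw_delete() {
    curl -sf -X DELETE "$(pgw_job_url "$1")"
}
```

For example, after running pgw_push smoke_test demo_metric 1, the sample should appear on the Pushgateway web page; pgw_delete smoke_test removes it again.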
2. Install node_exporter
Extract the node_exporter archive to a directory of your choice; the node_exporter binary can then be run directly.
Start node_exporter:
nohup ./node_exporter --web.listen-address=":9100" > ./node_exporter-start.log 2>&1 &
To run node_exporter as a system service managed by systemctl, create or edit /usr/lib/systemd/system/node_exporter.service with the following content:
[Unit]
Description=node_exporter
After=local-fs.target network-online.target network.target
Wants=local-fs.target network-online.target network.target
[Service]
ExecStart=/opt/modules/node_exporter/node_exporter --web.listen-address=:9100
Restart=on-failure
[Install]
WantedBy=multi-user.target
Enable the service at boot, start it, and check its status:
systemctl enable node_exporter
systemctl start node_exporter
systemctl status node_exporter
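To check that node_exporter is serving host metrics, its /metrics endpoint can be read directly. This is a minimal sketch assuming the cdh1:9100 address used above; metric_value and node_load1 are hypothetical helper names, and the parser only handles metrics without labels.

```shell
# Print the value of an unlabelled metric from Prometheus text-format
# input on stdin. Comment lines start with "#", so they never match.
# $1 = metric name, e.g. node_load1
metric_value() {
    awk -v name="$1" '$1 == name { print $2; exit }'
}

# Fetch /metrics from node_exporter and print the 1-minute load average.
# $1 = host (default cdh1), $2 = port (default 9100)
node_load1() {
    curl -sf "http://${1:-cdh1}:${2:-9100}/metrics" | metric_value node_load1
}
```

Calling node_load1 with no arguments should print a single number if the exporter is reachable.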
3. Install Prometheus
Extract Prometheus to the installation directory and edit the prometheus.yml configuration file there:
# my global config
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).
# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
          # - alertmanager:9093
# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: "prometheus"
    static_configs:
      # Prometheus defaults to port 9090; this setup listens on 9093.
      - targets: ["cdh1:9093"]
        labels:
          instances: prometheus
  - job_name: "pushgateway"
    # Overrides the global scrape_interval (15s above; Prometheus's built-in default is 1m).
    scrape_interval: 5s
    static_configs:
      # Pushgateway defaults to port 9091; this setup listens on 9092.
      - targets: ["cdh1:9092"]
        labels:
          instances: pushgateway
  - job_name: "node_exporter"
    scrape_interval: 5s
    static_configs:
      - targets: ["cdh1:9100"]
        labels:
          instances: node_exporter
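Because YAML is sensitive to indentation, it can help to validate the file before starting the server. The Prometheus archive includes a promtool binary whose check config subcommand does this. The wrapper below is only a sketch: check_prometheus_config is a hypothetical name, and the default path /opt/modules/prometheus is an assumption based on the install paths used by the systemd units in this guide.

```shell
# Validate prometheus.yml with promtool before (re)starting Prometheus.
# $1 = Prometheus install directory (assumed default: /opt/modules/prometheus)
check_prometheus_config() {
    dir="${1:-/opt/modules/prometheus}"
    if [ ! -x "$dir/promtool" ]; then
        echo "promtool not found in $dir" >&2
        return 2
    fi
    # Exits non-zero and prints the offending line if the config is invalid.
    "$dir/promtool" check config "$dir/prometheus.yml"
}
```

Running check_prometheus_config from the shell should report success for a well-formed file and a parse error otherwise.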
Start the Prometheus service:
nohup ./prometheus --config.file=./prometheus.yml --web.listen-address=":9093" > ./prometheus-start.log 2>&1 &
Once it is running, the status of pushgateway, node_exporter, and prometheus itself is shown in the web UI at ip:port/targets (port 9093 in this setup).
[Image: http://yanko.test.upcdn.net/images/prometheus-targets.jpg]
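The same information is available from the Prometheus HTTP API at /api/v1/targets, which is convenient on a host without a browser. The sketch below counts healthy targets with plain grep rather than a JSON parser, so treat it as a rough check; the function names are hypothetical and the cdh1:9093 default comes from the configuration above.

```shell
# Count targets reported as healthy in a /api/v1/targets JSON response
# read from stdin. Matches "health":"up" with optional whitespace.
count_up_targets() {
    grep -oE '"health"[[:space:]]*:[[:space:]]*"up"' | wc -l | tr -d ' '
}

# Query the Prometheus HTTP API and print the number of healthy targets.
# $1 = host (default cdh1), $2 = port (default 9093)
prometheus_up_targets() {
    curl -sf "http://${1:-cdh1}:${2:-9093}/api/v1/targets" | count_up_targets
}
```

With the three jobs configured in this guide, prometheus_up_targets would be expected to print 3 once every target has been scraped successfully.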
To run Prometheus as a system service managed by systemctl, create or edit /usr/lib/systemd/system/prometheus.service with the following content:
[Unit]
Description=prometheus
After=local-fs.target network-online.target network.target
Wants=local-fs.target network-online.target network.target
[Service]
ExecStart=/opt/modules/prometheus/prometheus --config.file=/opt/modules/prometheus/prometheus.yml --web.listen-address=:9093
Restart=on-failure
[Install]
WantedBy=multi-user.target
Enable the service at boot, start it, and check its status:
systemctl enable prometheus
systemctl start prometheus
systemctl status prometheus
4. Install Grafana
wget https://dl.grafana.com/enterprise/release/grafana-enterprise-9.0.0-1.x86_64.rpm
sudo yum install grafana-enterprise-9.0.0-1.x86_64.rpm
Start the Grafana service:
sudo systemctl start grafana-server
Once Grafana is running, open the web UI at ip:3000. The default username and password are admin/admin.
[Image: http://yanko.test.upcdn.net/images/grafana-login.jpg]
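3. Configure the Grafana data source
In Settings, choose Add Data source and select Prometheus as the type. Set its URL to the Prometheus address (http://cdh1:9093 in this setup), then save and test the new data source.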
[Image: http://yanko.test.upcdn.net/images/grafana-prometheus.jpg]
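The data source can also be created without the UI through Grafana's HTTP API (POST /api/datasources). The following is a sketch under several assumptions: Grafana is reachable at http://cdh1:3000 (the guide only says ip:3000), the admin/admin login has not been changed yet, and the data source name "Prometheus" is arbitrary. Both function names are hypothetical.

```shell
# JSON body for Grafana's POST /api/datasources.
# $1 = Prometheus URL (http://cdh1:9093 in this guide's setup)
datasource_payload() {
    printf '{"name":"Prometheus","type":"prometheus","url":"%s","access":"proxy","isDefault":true}\n' "$1"
}

# Create the data source using basic auth with the default login.
# $1 = Grafana base URL (assumed default: http://cdh1:3000)
# $2 = Prometheus URL   (default: http://cdh1:9093)
add_prometheus_datasource() {
    datasource_payload "${2:-http://cdh1:9093}" |
        curl -sf -u admin:admin -H 'Content-Type: application/json' \
             -X POST --data-binary @- "${1:-http://cdh1:3000}/api/datasources"
}
```

If the admin password has already been changed, the -u argument needs to be adjusted accordingly.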
4. Configure Flink to report metrics
Edit the flink-conf.yaml configuration file and add the following settings:
metrics.reporter.promgateway.class: org.apache.flink.metrics.prometheus.PrometheusPushGatewayReporter
# Pushgateway host and port
metrics.reporter.promgateway.host: cdh1
metrics.reporter.promgateway.port: 9092
metrics.reporter.promgateway.jobName: FlinkJob
# Whether to append a random suffix to the job name
metrics.reporter.promgateway.randomJobNameSuffix: true
# Whether to delete the job's metrics from Pushgateway on shutdown
metrics.reporter.promgateway.deleteOnShutdown: false
metrics.reporter.promgateway.groupingKey: k1=v1;k2=v2
# How often metrics are reported
metrics.reporter.promgateway.interval: 10 SECONDS
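Pushgateway keeps pushed metrics until they are deleted. With deleteOnShutdown set to false and a random job-name suffix enabled, every finished Flink process therefore leaves a group behind. Pushgateway addresses a group by a URL of the form /metrics/job/<job>/<label>/<value>..., so the groupingKey above maps onto path segments. The helpers below are a hypothetical sketch of that mapping, assuming label values with no characters that need URL escaping; the job name must be copied from the Pushgateway page, since its suffix is random.

```shell
# Turn a Flink groupingKey string ("k1=v1;k2=v2") into the path segment
# Pushgateway uses for that group ("/k1/v1/k2/v2"). Empty input gives "".
grouping_path() {
    printf '%s' "$1" | tr ';=' '//' | sed 's|^|/|'
}

# Full group URL.
# $1 = host, $2 = port, $3 = job name as shown by Pushgateway, $4 = groupingKey
group_url() {
    printf 'http://%s:%s/metrics/job/%s%s\n' "$1" "$2" "$3" "$(grouping_path "$4")"
}

# Delete one group, e.g. the leftovers of a finished Flink job.
delete_group() {
    curl -sf -X DELETE "$(group_url "$@")"
}
```

A call would look like delete_group cdh1 9092 <job name from the Pushgateway page> 'k1=v1;k2=v2'.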
With the configuration in place, run one of Flink's bundled examples:
bin/flink run -d -t yarn-per-job ./examples/streaming/TopSpeedWindowing.jar
Open the Pushgateway page again; the reported metrics should now be listed:
[Image: http://yanko.test.upcdn.net/images/pushgateway-metrics.jpg]
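The same check can be done from the command line by reading Pushgateway's /metrics endpoint and looking for series whose names start with flink_, the prefix the Flink Prometheus reporters use. This is a minimal sketch with hypothetical function names, assuming the cdh1:9092 address from this guide.

```shell
# Count sample lines whose metric name starts with "flink_" in
# Prometheus text-format input on stdin ("# HELP"/"# TYPE" lines are skipped).
count_flink_samples() {
    grep -c '^flink_' || true
}

# List the distinct Flink metric names currently held by Pushgateway.
# $1 = host (default cdh1), $2 = port (default 9092)
flink_metric_names() {
    curl -sf "http://${1:-cdh1}:${2:-9092}/metrics" |
        sed -n 's/^\(flink_[A-Za-z0-9_]*\).*/\1/p' | sort -u
}
```

An empty result from flink_metric_names a short while after the job starts suggests the reporter settings or the Pushgateway address should be rechecked.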
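5. Log in to Grafana and configure the Flink dashboard
Go to the Grafana dashboards page and download a Flink dashboard template. The official download page is https://grafana.com/grafana/dashboards/
Once it is downloaded, open the Grafana web UI and use Import to load the Flink metrics dashboard template.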
[Image: http://yanko.test.upcdn.net/images/grafana-flink.jpg]
Then open the imported dashboard to see the Flink metrics.
[Image: http://yanko.test.upcdn.net/images/grafana-flink-metrics.jpg]