I. Prometheus + Grafana

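For installing Prometheus and Grafana, see the earlier document "Building a Prometheus+Grafana monitoring platform on CentOS 7.9" (centos7.9搭建prometheus+grafana监控平台).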

[Image: Setting up Prometheus + Grafana + Alertmanager on Linux 7 to monitor client MySQL, Docker and other services, with email alerting]

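Prometheus Server: the main Prometheus server, port 9090

Node Exporter: collects hardware and operating system information from the host, port 9100

cAdvisor: collects information about the containers running on the host, port 8080

Grafana: displays the Prometheus monitoring dashboards, port 3000

Alertmanager: receives the alerts sent by Prometheus and forwards them to the configured recipients, port 9093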
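As a quick sanity check of the components listed above, the following sketch maps each one to its port and probes an HTTP endpoint. The function names component_port, component_path and probe_stack are made up for this example. The probe paths (/-/healthy for Prometheus and Alertmanager, /api/health for Grafana, /healthz for cAdvisor, /metrics for Node Exporter) are assumptions that should be checked against the installed versions.

```shell
# Map a component name to the port listed above (hypothetical helper).
component_port() {
    case "$1" in
        prometheus)    echo 9090 ;;
        node_exporter) echo 9100 ;;
        cadvisor)      echo 8080 ;;
        grafana)       echo 3000 ;;
        alertmanager)  echo 9093 ;;
        *)             echo "unknown component: $1" >&2; return 1 ;;
    esac
}

# Map a component name to the HTTP path used as a liveness probe (assumed paths).
component_path() {
    case "$1" in
        prometheus|alertmanager) echo "/-/healthy" ;;
        grafana)                 echo "/api/health" ;;
        cadvisor)                echo "/healthz" ;;
        *)                       echo "/metrics" ;;
    esac
}

# Probe every component on one host and print one line per component.
probe_stack() {
    host="$1"
    for c in prometheus node_exporter cadvisor grafana alertmanager; do
        url="http://$host:$(component_port "$c")$(component_path "$c")"
        if curl -fsS -o /dev/null --max-time 3 "$url"; then
            echo "$c OK ($url)"
        else
            echo "$c FAILED ($url)"
        fi
    done
}
```

For example, probe_stack 192.168.142.132 would check all five components on the monitoring server; in this setup the exporters may live on other hosts, so some lines are expected to fail there.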

II. Configure MySQL service monitoring and alerting

1. Configure mysqld_exporter on the monitored host

Download the package: mysqld_exporter-0.14.0.linux-amd64.tar.gz

tar zxf mysqld_exporter-0.14.0.linux-amd64.tar.gz -C /usr/local/
ln -sv /usr/local/mysqld_exporter-0.14.0.linux-amd64/ /usr/local/mysqld_exporter


vim /usr/local/mysqld_exporter/.my.cnf
[client]
user=mysql_monitor
password=Password1!

############################## Log in to MySQL and grant privileges ##############################
mysql -u root -p
mysql> GRANT PROCESS, REPLICATION CLIENT, SELECT ON *.* TO 'mysql_monitor'@'localhost' IDENTIFIED BY 'Password1!';
mysql> flush privileges;

------------------------------ Configure mysqld_exporter to start at boot ---------------------------
touch /usr/lib/systemd/system/mysqld_exporter.service
chown prometheus:prometheus /usr/lib/systemd/system/mysqld_exporter.service
chown -R prometheus:prometheus /usr/local/mysqld_exporter

vim /usr/lib/systemd/system/mysqld_exporter.service
[Unit]
Description=mysql_exporter
After=network.target
[Service]
Type=simple
User=prometheus
ExecStart=/usr/local/mysqld_exporter/mysqld_exporter --config.my-cnf=/usr/local/mysqld_exporter/.my.cnf
Restart=on-failure
[Install]
WantedBy=multi-user.target


systemctl daemon-reload
systemctl enable mysqld_exporter.service
systemctl restart mysqld_exporter.service

netstat -tanp | grep 9104 # check that the service is listening on port 9104
tcp6 0 0 :::9104 :::* LISTEN 3999/mysqld_exporte
tcp6 0 0 192.168.142.134:9104 192.168.142.132:37396 ESTABLISHED 3999/mysqld_exporte
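A listening port only shows that the exporter process is running, not that it can log in to MySQL. mysqld_exporter exposes a mysql_up gauge that is 1 when the exporter could query the server. The sketch below reads that value from the metrics page; the function names are hypothetical and the default target localhost:9104 is an assumption.

```shell
# Print the value of the mysql_up gauge from Prometheus text-format metrics on stdin.
mysql_up_value() {
    awk '$1 == "mysql_up" { print $2 }'
}

# Fetch the exporter's metrics and report whether it can reach MySQL.
check_mysqld_exporter() {
    target="${1:-localhost:9104}"
    value="$(curl -fsS "http://$target/metrics" | mysql_up_value)"
    if [ "$value" = "1" ]; then
        echo "mysqld_exporter on $target can query MySQL"
    else
        echo "mysqld_exporter on $target cannot query MySQL (mysql_up=${value:-missing})" >&2
        return 1
    fi
}
```

If check_mysqld_exporter 192.168.142.134:9104 fails, the account and password in .my.cnf and the grants are the first things to check.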

2. Add a scrape job to Prometheus on the monitoring server

vim /opt/prometheus/prometheus.yml
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: "prometheus"

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    static_configs:
      - targets: ["localhost:9090"]

  - job_name: 'Linux'
    static_configs:
      - targets: ['192.168.142.132:9100','192.168.142.134:9100']
        labels:
          group: 'client-node-exporter'
  - job_name: 'mysql'    # monitor the mysqld service on the client
    static_configs:
      - targets: ['192.168.142.134:9104']
        labels:
          group: 'client-node-exporter'



curl -X POST http://localhost:9090/-/reload # hot-reload the Prometheus configuration
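A broken prometheus.yml is rejected on reload, so it can help to validate the file first. The sketch below is one way to do that, assuming promtool (shipped in the Prometheus tarball) is on PATH; safe_reload is a made-up name. Note that the /-/reload endpoint only responds when Prometheus was started with --web.enable-lifecycle.

```shell
# Validate the configuration first, then ask Prometheus to reload it.
# Assumes promtool is on PATH and Prometheus runs with --web.enable-lifecycle.
safe_reload() {
    config="${1:-/opt/prometheus/prometheus.yml}"
    base_url="${2:-http://localhost:9090}"
    if promtool check config "$config"; then
        curl -fsS -X POST "$base_url/-/reload" && echo "reload requested"
    else
        echo "config check failed, not reloading" >&2
        return 1
    fi
}
```

Called without arguments, safe_reload uses the paths from this article.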

3. Import the MySQL dashboard in Grafana

Import dashboard ID: 7362

III. Monitor Windows hosts

1. Install windows_exporter on the monitored host

Windows Server 2016 image: https://www.microsoft.com/zh-cn/evalcenter/download-windows-server-2016

Download windows_exporter: https://github.com/prometheus-community/windows_exporter/releases

Download the installer windows_exporter-0.20.0-amd64.exe and run it directly.

View the collected metrics at http://ip:9182/metrics

2. Add a scrape job to Prometheus on the monitoring server

vim /opt/prometheus/prometheus.yml
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: "prometheus"

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    static_configs:
      - targets: ["localhost:9090"]

  - job_name: 'Linux'
    static_configs:
      - targets: ['192.168.142.132:9100','192.168.142.134:9100']
        labels:
          group: 'client-node-exporter'
  - job_name: 'mysql'    # monitor the mysqld service on the client
    static_configs:
      - targets: ['192.168.142.134:9104']
        labels:
          group: 'client-node-exporter'
  - job_name: 'windows'    # monitor the Windows hosts
    static_configs:
      - targets: ['192.168.142.100:9182','192.168.142.101:9182','192.168.142.102:9182']
        labels:
          group: 'client-node-exporter'


curl -X POST http://localhost:9090/-/reload # hot-reload the Prometheus configuration
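After a reload it is worth confirming that the new targets are actually being scraped. The sketch below counts targets that the Prometheus HTTP API reports as down. The function names are hypothetical, and matching on the text "health":"down" is a crude shortcut that assumes the compact JSON Prometheus normally returns; a JSON parser such as jq would be more robust.

```shell
# Count targets reported as "down" in the JSON returned by /api/v1/targets (read from stdin).
count_down_targets() {
    grep -o '"health":"down"' | awk 'END { print NR }'
}

# Query a Prometheus server and print how many active targets are down.
down_targets() {
    base_url="${1:-http://localhost:9090}"
    curl -fsS "$base_url/api/v1/targets?state=active" | count_down_targets
}
```

A result of 0 from down_targets means every configured target, including the three Windows hosts, answered its last scrape.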

3. Import the Windows dashboard in Grafana

Import dashboard ID: 13868

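IV. Monitor the Docker service

1. Install the Docker monitoring component cAdvisor on the monitored host

# --net=host already exposes cAdvisor on host port 8080, so no -p port mapping is needed
docker run -v /:/rootfs:ro -v /var/run:/var/run/:rw -v /sys:/sys:ro -v /var/lib/docker:/var/lib/docker:ro --detach=true --name=cadvisor --net=host google/cadvisor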
netstat -tanp | grep cadvisor

tcp6 0 0 :::8080 :::* LISTEN 4669/cadvisor
tcp6 0 0 192.168.142.134:8080 192.168.142.1:57865 ESTABLISHED 4669/cadvisor
tcp6 0 0 192.168.142.134:8080 192.168.142.1:58532 ESTABLISHED 4669/cadvisor
tcp6 0 0 192.168.142.134:8080 192.168.142.1:57870 ESTABLISHED 4669/cadvisor
tcp6 0 0 192.168.142.134:8080 192.168.142.132:49092 ESTABLISHED 4669/cadvisor

2. Add a scrape job to Prometheus on the monitoring server

vim /opt/prometheus/prometheus.yml
--------------- add the following under the existing jobs in scrape_configs ---------------
  - job_name: 'docker'
    static_configs:
      - targets: ['192.168.142.134:8080']
        labels:
          group: 'docker-exporter'


curl -X POST http://localhost:9090/-/reload # hot-reload the Prometheus configuration

3. Import the Docker dashboards in Grafana

Import dashboard ID: 193

Dashboard ID: 10619

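V. Install and configure the Alertmanager alerting component

For enabling and configuring the mailbox and WeCom (Enterprise WeChat), see "Custom web monitoring, email alerts and WeCom alerts in Zabbix 5.0" (zabbix5.0自定义web监控和邮箱告警,企业微信告警).

1. Install and configure Alertmanager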

tar zxf alertmanager-0.23.0.linux-amd64.tar.gz  
mv alertmanager-0.23.0.linux-amd64 /opt/prometheus/alertmanager

--------------------------- Configure Alertmanager to start at boot ----------------------------
vim /usr/lib/systemd/system/alertmanager.service
[Unit]
Description=alertmanager
Documentation=https://github.com/prometheus/alertmanager
After=network.target
[Service]
Type=simple
User=prometheus
ExecStart=/opt/prometheus/alertmanager/alertmanager --config.file=/opt/prometheus/alertmanager/alertmanager.yml --storage.path=/opt/prometheus/alertmanager/data
Restart=on-failure
[Install]
WantedBy=multi-user.target


vim /opt/prometheus/prometheus.yml
Find the Alertmanager-related settings and change them as follows:
alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - 192.168.142.132:9093

rule_files:
  # - "first_rules.yml"
  - "rules.yml"    # defines the alerting rules


vim /opt/prometheus/rules.yml
groups:
- name: hostStatsAlert
  rules:
  - alert: NodeDown
    expr: up == 0
    for: 1m
    labels:
      severity: "Critical"
    annotations:
      summary: "Instance {{$labels.instance}} down"
      description: "{{$labels.instance}} of job {{$labels.job}} has been down for more than 1 minute."
      value: "{{ $value }}"    # read by the notification templates

  - alert: NodeCPUUsage
    expr: sum(avg without (cpu)(irate(node_cpu_seconds_total{mode!='idle'}[5m]))) by (instance) > 0.85
    for: 1m
    labels:
      severity: "Warning"
    annotations:
      summary: "Instance {{ $labels.instance }} CPU usage high"
      description: "{{ $labels.instance }} CPU usage above 85% (current value: {{ $value }})"
      value: "{{ $value }}"

  - alert: NodeMemoryUsage
    expr: (node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes)/node_memory_MemTotal_bytes > 0.85
    for: 1m
    labels:
      severity: "Warning"
    annotations:
      summary: "Instance {{ $labels.instance }} MEM usage high"
      description: "{{ $labels.instance }} MEM usage above 85% (current value: {{ $value }})"
      value: "{{ $value }}"

  - alert: filesystemUsageAlert
    expr: 100 - ((node_filesystem_avail_bytes{mountpoint="/",fstype=~"ext4|xfs"} * 100) / node_filesystem_size_bytes {mountpoint="/",fstype=~"ext4|xfs"}) > 85
    for: 1m
    labels:
      severity: "Warning"
    annotations:
      summary: "Instance {{ $labels.instance }} root DISK usage high"
      description: "{{ $labels.instance }} root DISK usage above 85% (current value: {{ $value }})"
      value: "{{ $value }}"
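Before relying on the rules above, it can be useful to syntax-check the file and run the expressions by hand against the Prometheus HTTP API. The following is a rough sketch, assuming promtool is on PATH and Prometheus listens on localhost:9090; promql, check_rules and the PROM_URL variable are names invented for this example.

```shell
# Run an instant PromQL query against the HTTP API and print the raw JSON.
promql() {
    curl -fsS -G "${PROM_URL:-http://localhost:9090}/api/v1/query" \
        --data-urlencode "query=$1"
}

# Syntax-check the rule file, then evaluate two of the alert expressions by hand.
check_rules() {
    promtool check rules /opt/prometheus/rules.yml || return 1
    promql 'up == 0'
    promql '(node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes)/node_memory_MemTotal_bytes > 0.85'
}
```

An empty "result" array in the JSON means no series currently satisfies that expression, so the corresponding alert would not be pending.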


vim /opt/prometheus/alertmanager/alertmanager.yml
global:
  resolve_timeout: 5m
  smtp_smarthost: 'smtp.163.com:465'
  smtp_from: '******@163.com'
  smtp_auth_username: '******@163.com'
  smtp_auth_password: 'VTKQYELFHUNAPLYC'    # the authorization code issued by the mail provider
  smtp_require_tls: false

templates:
  - '/opt/prometheus/alertmanager/template/*.tmpl'

route:
  group_by: ['alertname']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 10m
  receiver: 'mail'
receivers:
- name: 'mail'
  email_configs:
  - to: '*********@163.com'    # your own mailbox
    html: '{{ template "test.html" . }}'    # render the mail body with testmail.tmpl
  wechat_configs:    # WeCom (Enterprise WeChat) alert settings
  - send_resolved: true
    to_party: '2'    # ID of the receiving department
    agent_id: '1000002'    # WeCom -> custom application -> AgentId
    corp_id: '*************'    # company information (My Company -> CorpId, at the bottom of the page)
    api_secret: '***************'    # WeCom -> custom application -> Secret
    message: '{{ template "test_wechat.html" . }}'    # message template to use

inhibit_rules:
  # severity values must match the labels set in rules.yml exactly (they are case-sensitive)
  - source_match:
      severity: 'Critical'
    target_match:
      severity: 'Warning'
    equal: ['alertname', 'dev', 'instance']


vim /opt/prometheus/alertmanager/template/testmail.tmpl # body of the alert email
{{ define "test.html" }}
<table border="1">
<tr>
<td>Alert</td>
<td>Instance</td>
<td>Alert value</td>
<td>Start time</td>
</tr>
{{ range $i, $alert := .Alerts }}
<tr>
<td>{{ index $alert.Labels "alertname" }}</td>
<td>{{ index $alert.Labels "instance" }}</td>
<td>{{ index $alert.Annotations "value" }}</td>
<td>{{ $alert.StartsAt }}</td>
</tr>
{{ end }}
</table>
{{ end }}

vim /opt/prometheus/alertmanager/template/testwechat.tmpl # the template name must match the one used in wechat_configs
{{ define "test_wechat.html" }}
{{ range $i, $alert := .Alerts.Firing }}
[Alert]: {{ index $alert.Labels "alertname" }}
[Instance]: {{ index $alert.Labels "instance" }}
[Alert value]: {{ index $alert.Annotations "value" }}
[Start time]: {{ $alert.StartsAt }}
{{ end }}
{{ end }}

2. Start the Alertmanager service

chown -R prometheus:prometheus /usr/lib/systemd/system/alertmanager.service
chown -R prometheus:prometheus /opt/prometheus/*
curl -X POST http://localhost:9090/-/reload
systemctl daemon-reload
systemctl enable alertmanager
systemctl start alertmanager
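Once Alertmanager is running, the notification path can be exercised without waiting for a real rule to fire by posting a hand-made alert to its v2 API. This is a minimal sketch, assuming Alertmanager listens on localhost:9093; the function names, the alert name TestAlert and the instance test-host are invented test values.

```shell
# Build a minimal JSON payload for Alertmanager's v2 alerts API.
# The default alert name and instance are made-up test values.
test_alert_payload() {
    alertname="${1:-TestAlert}"
    instance="${2:-test-host}"
    printf '[{"labels":{"alertname":"%s","instance":"%s","severity":"Warning"},"annotations":{"value":"manual test"}}]\n' \
        "$alertname" "$instance"
}

# Post the test alert to Alertmanager.
send_test_alert() {
    am_url="${1:-http://localhost:9093}"
    test_alert_payload | curl -fsS -X POST -H 'Content-Type: application/json' \
        --data-binary @- "$am_url/api/v2/alerts"
}
```

With the route settings used here, the notification should go out after the 30s group_wait. Because the payload carries no end time, Alertmanager treats the alert as resolved once resolve_timeout (5m) passes without it being sent again.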

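VI. Email alert test

On the monitored Linux host, run fallocate -l 20G /etc/swap to push root filesystem usage above the 85% threshold defined in rules.yml and trigger the alert.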
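To see how close the host is to the threshold before and after the test, the same arithmetic as the filesystemUsageAlert expression (100 - avail * 100 / size) can be reproduced locally from df output. This is only an illustrative sketch; usage_pct and root_usage_pct are hypothetical helper names, and df's block counts may differ slightly from what node_exporter reports.

```shell
# Percentage of space used, computed like the filesystemUsageAlert expression:
# 100 - avail * 100 / size
usage_pct() {
    LC_ALL=C awk -v avail="$1" -v size="$2" \
        'BEGIN { printf "%.1f\n", 100 - (avail * 100) / size }'
}

# Current usage of the root filesystem, from df's total and available 1K blocks.
root_usage_pct() {
    set -- $(df -P -k / | awk 'NR == 2 { print $2, $4 }')
    usage_pct "$2" "$1"
}
```

For instance, usage_pct 10 100 prints 90.0, which is above the threshold. After the test, deleting the file with rm -f /etc/swap frees the space again so the alert can resolve.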

VII. Display Alertmanager alerts in Grafana

1. Install the plugin

grafana-cli plugins install camptocamp-prometheus-alertmanager-datasource
systemctl restart grafana-server.service
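Whether the plugin was actually installed can be checked from the command line before switching to the web UI. A small sketch, assuming grafana-cli is on PATH; has_plugin and check_alertmanager_plugin are names made up for this example.

```shell
# Succeeds if the given plugin id appears in a plugin listing read from stdin.
has_plugin() {
    grep -qF -- "$1"
}

# List installed plugins and look for the Alertmanager data source.
check_alertmanager_plugin() {
    grafana-cli plugins ls | has_plugin camptocamp-prometheus-alertmanager-datasource
}
```

check_alertmanager_plugin returns a non-zero status if the plugin is missing, in which case the install command above should be run again.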

2. Configure the Alertmanager data source in the Grafana web UI

[Screenshots of the Grafana configuration pages]

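VIII. Problems encountered

1. No data after the Prometheus server is restarted

The Prometheus server had been shut down for a long time. After it was started again, it reported "Error on ingesting samples that are too old or are too far into the future".

Solution:

First make sure the host clock matches the real time. Then back up the Prometheus data directory, create a new data directory, and restart the Prometheus and Grafana services.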
mv /opt/prometheus/data/ /opt/prometheus/data_bak
mkdir -pv /opt/prometheus/data && chown -R prometheus:prometheus /opt/prometheus/data
systemctl daemon-reload
systemctl restart prometheus
systemctl restart grafana-server
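Since this error is tied to timestamps, it may help to measure the clock difference between the Prometheus server and a monitored host before rebuilding the data directory. The sketch below is illustrative only: clock_skew and check_skew are hypothetical names, SSH access to the monitored host is assumed, and the 60-second tolerance is an arbitrary choice.

```shell
# Absolute difference in seconds between two Unix timestamps.
clock_skew() {
    if [ "$1" -ge "$2" ]; then
        echo $(( $1 - $2 ))
    else
        echo $(( $2 - $1 ))
    fi
}

# Compare the local clock with a monitored host's clock over SSH.
# The 60-second tolerance is an illustrative assumption.
check_skew() {
    remote="$(ssh "$1" date +%s)" || return 1
    skew="$(clock_skew "$(date +%s)" "$remote")"
    if [ "$skew" -gt 60 ]; then
        echo "clock skew with $1 is ${skew}s, fix time sync before restarting Prometheus" >&2
        return 1
    fi
    echo "clock skew with $1 is ${skew}s"
}
```

For example, check_skew 192.168.142.134 compares the server with the MySQL and Docker host used in this article.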