Deploying Prometheus + Grafana + Alertmanager with Docker, with email alerts


Test environment

Server              IP              Service         Port   Purpose
prometheus_server   192.168.10.42   prometheus      9090   Prometheus server
                                    node-exporter   9100   host metrics
                                    cAdvisor        8080   container metrics
                                    alertmanager    9093   alerting
                                    grafana         3000   web dashboards
prometheus_node     192.168.10.45   node-exporter   9100   host metrics
                                    cAdvisor        8080   container metrics



cat /etc/redhat-release && uname -a

CentOS Linux release 7.8.2003 (Core)

Linux centos7-1 3.10.0-1127.el7.x86_64


systemctl stop firewalld && systemctl disable firewalld

setenforce 0

sed -i 's/^SELINUX=.*/SELINUX=disabled/' /etc/selinux/config


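Software installation (the docker-ce package requires the Docker CE yum repository to be configured first)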

yum install -y   yum-utils  docker-ce-18.06.3.ce    chrony

systemctl daemon-reload

systemctl  enable chronyd docker --now  && hwclock  -w

docker --version

Docker version 18.06.3-ce, build 6d37f41


docker pull   prom/prometheus  

docker pull   prom/node-exporter

docker pull   prom/alertmanager

docker pull   google/cadvisor:v0.33.0

docker pull   grafana/grafana


Configure the Prometheus server

mkdir -p /usr/local/docker/prometheus && touch /usr/local/docker/prometheus/prometheus.yml

cat /usr/local/docker/prometheus/prometheus.yml

global:
  scrape_interval: 15s        # how often targets are scraped
  evaluation_interval: 15s    # how often alerting rules are evaluated

alerting:
  alertmanagers:
    - static_configs:
        - targets: ["192.168.10.42:9093"]

rule_files:
  - "/etc/prometheus/*_rules.yml"    # rule file path inside the container

scrape_configs:
  - job_name: "promeserver"    # host metrics from node-exporter on both machines
    static_configs:
      - targets: ["192.168.10.42:9100", "192.168.10.45:9100"]

  - job_name: "cadvisor"       # container metrics from cAdvisor on both machines
    static_configs:
      - targets: ["192.168.10.42:8080", "192.168.10.45:8080"]


docker run -d --name promserver \
  --restart=always --net=host \
  -v /usr/local/docker/prometheus/:/etc/prometheus \
  -v /etc/localtime:/etc/localtime \
  prom/prometheus \
  --config.file=/etc/prometheus/prometheus.yml \
  --web.enable-lifecycle && docker logs -f promserver 2>&1 | grep 9090

Check the promserver log for the listener line (-p 9090:9090 is omitted because published ports are ignored with --net=host):

caller=web.go:570 level=info component=web msg="Start listening for connections" address=0.0.0.0:9090

Check the syntax of the Prometheus configuration file:

docker exec -it --user=root promserver /bin/sh -c 'promtool check config /etc/prometheus/prometheus.yml'

Checking /etc/prometheus/prometheus.yml

SUCCESS: 0 rule files found

"0 rule files found" means the configuration is valid but no file matching rule_files exists yet.


http://serverip:9090/
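Besides opening the web UI, the server can be probed from the command line. The following is a minimal sketch, assuming the prometheus_server address from the environment table and Prometheus's /-/ready management endpoint; the function name prom_ready is made up for illustration and is not part of Prometheus.

```shell
# prom_ready is a hypothetical helper name, not part of Prometheus.
# It calls the /-/ready management endpoint and returns non-zero
# if the server is unreachable or not yet ready to serve traffic.
prom_ready() {
    host="${1:-192.168.10.42}"   # assumption: prometheus_server address from this article
    port="${2:-9090}"
    curl -fsS "http://${host}:${port}/-/ready"
}
```

A possible call is: prom_ready 192.168.10.42 9090 && echo "Prometheus is ready"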


   

Configure node-exporter on the server and the client node

Run on both prometheus_server and prometheus_node:

docker run -d --name=promnode \
  --restart=always --net=host \
  -v /etc/localtime:/etc/localtime \
  prom/node-exporter && docker logs -f promnode 2>&1 | grep 9100

caller=node_exporter.go:199 level=info msg="Listening on" address=:9100


http://serverip:9100/
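Since node-exporter runs on two machines, a small loop can confirm that each one serves metrics. This is only a sketch: check_exporter and check_all_exporters are hypothetical names, and it assumes that a healthy exporter exposes the node_load1 metric on /metrics.

```shell
# check_exporter is a hypothetical helper: it fetches /metrics from one
# node-exporter instance and succeeds only if the node_load1 metric is present.
check_exporter() {
    target="$1"                       # host:port, e.g. 192.168.10.45:9100
    if curl -fsS "http://${target}/metrics" | grep -q '^node_load1 '; then
        echo "${target} OK"
    else
        echo "${target} FAILED"
        return 1
    fi
}

# check_all_exporters runs the check against every target passed as an
# argument and returns non-zero if any of them failed.
check_all_exporters() {
    rc=0
    for t in "$@"; do
        check_exporter "$t" || rc=1
    done
    return "$rc"
}
```

With the addresses used in this article, the call would be: check_all_exporters 192.168.10.42:9100 192.168.10.45:9100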



Configure cAdvisor, the container analysis tool

Run on both prometheus_server and prometheus_node, then check the cAdvisor container log:

docker run -d --name cadvisor \
  --restart=always --net=host \
  -v /etc/localtime:/etc/localtime \
  -v /:/rootfs:ro -v /var/run/:/var/run/:rw \
  -v /sys/:/sys/:ro -v /var/lib/docker/:/var/lib/docker/:ro \
  -v /dev/disk/:/dev/disk/:ro \
  google/cadvisor:v0.33.0 && docker logs -f cadvisor


http://serverip:8080



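Configure email alerts

cat /usr/local/docker/alertmanager/alertmanager.yml

global:
  resolve_timeout: 5m                       # an alert is treated as resolved if it is not refreshed within this time
  smtp_from: 'xxx@'                         # sender address
  smtp_smarthost: 'smtp.:465'               # SMTP server address; the port number is required
  smtp_auth_username: 'xxx@'                # SMTP account name
  smtp_auth_password: 'MXVTTTZNKCWKFEER'    # mailbox authorization code, not the login password
  smtp_require_tls: false

route:
  group_by: ['alertname']
  group_wait: 5s
  group_interval: 5s
  repeat_interval: 5s                       # very short, suitable for testing only
  receiver: 'email'

receivers:
  - name: 'email'
    email_configs:
      - to: 'xxxx@'                         # recipient address
        send_resolved: true                 # also notify when the alert is resolved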


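touch /usr/local/docker/prometheus/node_rules.yml

cat /usr/local/docker/prometheus/node_rules.yml

groups:
  - name: node-up
    rules:
      - alert: node-up
        expr: up{job="promeserver"} == 0    # must match job_name in prometheus.yml
        for: 5s
        labels:
          severity: "1"
          team: node
        annotations:
          summary: "{{ $labels.instance }} has been down for more than 5s!"

Prometheus only loads rule files at startup or on reload, so the promserver container has to be restarted or reloaded after this file is created.

Official reference for alerting rules: https://prometheus.io/docs/prometheus/latest/configuration/alerting_rules/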
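A rule file can be validated and then loaded without restarting the container. The sketch below assumes the container name promserver and the server address used in this article, and relies on the server having been started with --web.enable-lifecycle; apply_rules is a hypothetical helper name.

```shell
# apply_rules is a hypothetical helper. It validates a rule file with
# promtool inside the running container and, only if that succeeds,
# asks Prometheus to reload its configuration through the lifecycle
# endpoint (enabled by the --web.enable-lifecycle flag).
apply_rules() {
    container="${1:-promserver}"                    # container name used in this article
    rules="${2:-/etc/prometheus/node_rules.yml}"    # path as seen inside the container
    url="${3:-http://192.168.10.42:9090}"           # assumption: prometheus_server address
    docker exec "$container" promtool check rules "$rules" || return 1
    curl -fsS -X POST "${url}/-/reload"
}
```

Running apply_rules with no arguments uses the defaults above. Checking first avoids asking a running server to reload a rule file that does not parse.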


docker run -d --name alert \
  --restart=always --net=host \
  -v /usr/local/docker/alertmanager/alertmanager.yml:/etc/alertmanager/alertmanager.yml \
  -v /etc/localtime:/etc/localtime \
  prom/alertmanager \
  --config.file=/etc/alertmanager/alertmanager.yml && docker logs -f alert 2>&1 | grep 9093

Check the syntax of the Alertmanager configuration file:

docker exec -it --user=root alert /bin/sh -c 'amtool check-config /etc/alertmanager/alertmanager.yml'

Checking '/etc/alertmanager/alertmanager.yml'  SUCCESS

Found:

- global config

- route

- 1 inhibit rules

- 1 receivers

- 0 templates


http://serverip:9093/#/alerts

http://serverip:9090/alerts

http://serverip:9090/rules


Test the email alert

docker stop promnode    # stop the node-exporter container to trigger the alert
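The same condition that the alert rule evaluates can be queried directly through the Prometheus HTTP API, which helps when no email arrives. This is a sketch under the assumption that the job name and server address match this article; down_targets is a hypothetical helper name.

```shell
# down_targets is a hypothetical helper that runs an instant query
# against the Prometheus HTTP API and prints the raw JSON response.
# A non-empty "result" array lists the targets that are currently down.
down_targets() {
    url="${1:-http://192.168.10.42:9090}"   # assumption: prometheus_server address
    job="${2:-promeserver}"                 # job_name from prometheus.yml
    curl -fsS -G "${url}/api/v1/query" \
        --data-urlencode "query=up{job=\"${job}\"} == 0"
}
```

Because a scrape has to fail before up drops to 0, the result may stay empty for up to one scrape interval after the container is stopped.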


http://serverip:9090/targets

http://serverip:9090/alerts

Log in to the 163 mailbox and check whether the alert email has arrived.

All of the operations above are performed on prometheus_server.


Configure Grafana, the visualization tool

Run on prometheus_server, then check the Grafana log:

docker run -d --name grafana \
  --restart=always --net=host \
  -v /etc/localtime:/etc/localtime \
  grafana/grafana && docker logs -f grafana 2>&1 | grep 3000

logger=http.server address=[::]:3000 protocol=http subUrl= socket=


netstat -tuplna | grep LISTEN

tcp6       0      0 :::9090       :::*    LISTEN     57258/prometheus

tcp6       0      0 :::9100       :::*    LISTEN     102907/node_exporte

tcp6       0      0 :::8080       :::*    LISTEN     57855/cadvisor

tcp6       0      0 :::9093       :::*    LISTEN     60201/alertmanager

tcp6       0      0 :::9094       :::*    LISTEN      128823/alertmanager

tcp6       0      0 :::3000       :::*    LISTEN     52855/grafana-serve


http://serverip:3000/login


Username: admin, password: admin
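Prometheus can be added as a data source in the Grafana web UI after logging in. As an alternative, the sketch below registers it through Grafana's HTTP API. It assumes the default admin:admin credentials and the addresses used in this article; add_prometheus_datasource is a hypothetical helper name, and the data source name "Prometheus" is an arbitrary choice.

```shell
# add_prometheus_datasource is a hypothetical helper that registers
# Prometheus as a data source through Grafana's HTTP API.
add_prometheus_datasource() {
    grafana="${1:-http://192.168.10.42:3000}"   # assumption: Grafana on prometheus_server
    prom="${2:-http://192.168.10.42:9090}"      # assumption: Prometheus on prometheus_server
    auth="${3:-admin:admin}"                    # default credentials; change them after first login
    curl -fsS -u "$auth" \
        -H 'Content-Type: application/json' \
        -X POST "${grafana}/api/datasources" \
        -d "{\"name\":\"Prometheus\",\"type\":\"prometheus\",\"url\":\"${prom}\",\"access\":\"proxy\",\"isDefault\":true}"
}
```

With "access" set to "proxy", the Grafana backend queries Prometheus itself, so the browser does not need direct access to port 9090.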



Other dashboard templates can be downloaded from the Grafana website and imported.

Dashboard template downloads: https://grafana.com/grafana/dashboards/