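Prometheus+Grafana+Alertmanager Monitoring Deployment

1. Environment Preparation

Prometheus Deployment

node_exporter Deployment

Editing prometheus.yml

Grafana Installation and Deployment

Deploying Alertmanager Alerting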

1. Environment Preparation

One Alibaba Cloud ECS instance: 4 vCPUs, 8 GB RAM, 100 GB system disk, 300 GB data disk

Components: Prometheus + Grafana + Alertmanager

Open the required ports in the firewall:

systemctl start firewalld

firewall-cmd --zone=public --add-port=9090/tcp --permanent # Prometheus port

firewall-cmd --zone=public --add-port=3000/tcp --permanent # Grafana dashboard port

firewall-cmd --zone=public --add-port=9100/tcp --permanent # node_exporter port

firewall-cmd --zone=public --add-port=9093/tcp --permanent # Alertmanager port

firewall-cmd --zone=public --add-port=9094/tcp --permanent # Alertmanager cluster port

firewall-cmd --reload # reload the firewall so the rules take effect
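The five port commands differ only in the port number, so they can also be written as a loop. This is a minimal sketch, assuming bash and firewalld; the function name and the FIREWALL_CMD override variable are our own conventions, not part of firewalld.

```shell
#!/usr/bin/env bash
# Sketch: open every monitoring port in firewalld's public zone, then reload.
# FIREWALL_CMD is our own convention: set it to "echo" to preview the arguments.
open_monitoring_ports() {
  local fw=${FIREWALL_CMD:-firewall-cmd}
  local port
  # 9090 Prometheus, 3000 Grafana, 9100 node_exporter,
  # 9093 Alertmanager web/API, 9094 Alertmanager cluster communication
  for port in 9090 3000 9100 9093 9094; do
    $fw --zone=public --add-port="${port}/tcp" --permanent || return 1
  done
  # --permanent only changes the saved configuration; --reload applies it
  $fw --reload
}
```

Run it as root with open_monitoring_ports; FIREWALL_CMD=echo open_monitoring_ports only prints the arguments that would be passed.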


Prometheus Deployment

Download the second-newest stable Prometheus release: https://prometheus.io/download/

Prometheus base environment setup

1. Install the base packages:

yum -y install golang nodejs yarn # with Go installed via yum, there is no need to set environment variables in /etc/profile

If Go is compiled and installed from source instead, set the environment variables in /etc/profile, for example:

export GOROOT=/usr/lib/golang

export GOPATH=/root/Work/programmer/go/gopath/

export PATH=$PATH:$GOROOT/bin:$GOPATH/bin

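Appending these exports from a setup script that may run more than once would duplicate them in /etc/profile. A minimal sketch of an idempotent append, assuming bash; append_once is a hypothetical helper name, not part of any tool.

```shell
#!/usr/bin/env bash
# append_once FILE LINE: append LINE to FILE only if that exact line
# is not already present, so re-running setup does not duplicate it.
append_once() {
  local file=$1 line=$2
  touch "$file" || return 1
  # -x whole-line match, -F fixed string (so $PATH etc. are not regex)
  grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Hypothetical usage with the example paths above (run as root):
#   append_once /etc/profile 'export GOROOT=/usr/lib/golang'
#   append_once /etc/profile 'export GOPATH=/root/Work/programmer/go/gopath/'
#   append_once /etc/profile 'export PATH=$PATH:$GOROOT/bin:$GOPATH/bin'
#   source /etc/profile
```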

2. Extract the Prometheus tarball (version 2.26 is used here):

tar -zxvf prometheus-2.26.0.linux-amd64.tar.gz

mv prometheus-2.26.0.linux-amd64 /usr/local/


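3. There are two ways to run Prometheus:

1) With nohup, from the Prometheus directory: nohup ./prometheus >> out.log &

The drawback is that to stop or restart a Prometheus started with nohup, you have to look up its PID with ps -aux | grep prometheus and kill the process by hand with kill -9.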
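If you stay with nohup, recording the PID at start time avoids the ps | grep lookup, and stopping with SIGTERM instead of kill -9 gives Prometheus a chance to shut down cleanly. A minimal sketch, assuming bash; start_bg and stop_bg are hypothetical helper names.

```shell
#!/usr/bin/env bash
# start_bg PIDFILE LOGFILE CMD [ARGS...]
#   Run CMD in the background with nohup, append its output to LOGFILE,
#   and record its PID in PIDFILE.
start_bg() {
  local pidfile=$1 logfile=$2
  shift 2
  nohup "$@" >> "$logfile" 2>&1 &
  echo "$!" > "$pidfile"
}

# stop_bg PIDFILE
#   Send SIGTERM, wait up to 10 seconds for the process to exit, then
#   remove PIDFILE. Returns 1 if the process is still alive after waiting.
stop_bg() {
  local pidfile=$1 pid i=0
  pid=$(cat "$pidfile") || return 1
  kill -TERM "$pid" 2>/dev/null
  while kill -0 "$pid" 2>/dev/null; do
    i=$((i + 1))
    [ "$i" -gt 10 ] && return 1
    sleep 1
  done
  rm -f "$pidfile"
}
```

Usage: cd /usr/local/prometheus-2.26.0.linux-amd64 && start_bg prometheus.pid out.log ./prometheus, and later stop_bg prometheus.pid from the same directory.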

2) Write a systemd unit file:

vim /usr/lib/systemd/system/prometheus.service

[Unit]
Description=Prometheus
Documentation=https://prometheus.io/
After=network.target

[Service]
Type=simple
User=prometheus
ExecStart=/usr/local/prometheus-2.26.0.linux-amd64/prometheus --config.file=/usr/local/prometheus-2.26.0.linux-amd64/prometheus.yml --storage.tsdb.path=/var/lib/prometheus --storage.tsdb.retention.time=180d --web.enable-admin-api
# ExecStart points to the Prometheus binary, --config.file to its configuration file, --storage.tsdb.path sets the data directory explicitly, and --storage.tsdb.retention.time=180d keeps 180 days of data
Restart=on-failure

[Install]
WantedBy=multi-user.target


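After writing the unit file, create the prometheus user that the service runs as, give it the data directory, reload systemd, and start the service:

useradd --system --no-create-home --shell /sbin/nologin prometheus

mkdir -p /var/lib/prometheus

chown prometheus: /var/lib/prometheus

systemctl daemon-reload

systemctl start prometheus

systemctl enable prometheus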
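Because the versioned install path appears three times in the unit, it can help to render the unit from a few variables instead. A minimal sketch, assuming bash; the function and the PROM_HOME, PROM_DATA and PROM_RETENTION variables are our own names, with defaults matching this guide. It uses --storage.tsdb.retention.time, the current name of the retention flag (the older --storage.tsdb.retention is deprecated in Prometheus 2.x).

```shell
#!/usr/bin/env bash
# render_prometheus_unit: print prometheus.service to stdout.
# Defaults match the paths used in this guide; override via environment.
render_prometheus_unit() {
  local home=${PROM_HOME:-/usr/local/prometheus-2.26.0.linux-amd64}
  local data=${PROM_DATA:-/var/lib/prometheus}
  local retention=${PROM_RETENTION:-180d}
  cat <<EOF
[Unit]
Description=Prometheus
Documentation=https://prometheus.io/
After=network.target

[Service]
Type=simple
User=prometheus
ExecStart=${home}/prometheus --config.file=${home}/prometheus.yml --storage.tsdb.path=${data} --storage.tsdb.retention.time=${retention} --web.enable-admin-api
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF
}

# Hypothetical usage (as root):
#   render_prometheus_unit > /usr/lib/systemd/system/prometheus.service
#   systemctl daemon-reload && systemctl restart prometheus
```

After an upgrade, only PROM_HOME changes; the rendered unit keeps the binary and config paths in sync.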

node_exporter Deployment

Run the node-side exporter for Prometheus.

Download node_exporter-1.1.2.linux-amd64.tar.gz, the exporter for Linux host metrics.

Download page: https://github.com/prometheus/node_exporter

tar -zxvf node_exporter-1.1.2.linux-amd64.tar.gz

mv node_exporter-1.1.2.linux-amd64 /usr/local/

cd /usr/local/node_exporter-1.1.2.linux-amd64

nohup ./node_exporter >> out.log &

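node_exporter serves its metrics in the Prometheus text format at http://localhost:9100/metrics, so reading one value from that page is a quick check that it is running. A minimal sketch, assuming bash and awk; metric_value is a hypothetical helper, and it only handles metrics without labels.

```shell
#!/usr/bin/env bash
# metric_value NAME: read Prometheus text-format output from stdin and
# print the value of the first sample named exactly NAME (no labels).
# "# HELP" / "# TYPE" lines never match because their first field is "#".
# Returns 1 if the metric is not found.
metric_value() {
  awk -v name="$1" '
    $1 == name { print $2; found = 1; exit }
    END { if (!found) exit 1 }
  '
}

# Usage on the server, assuming curl is installed:
#   curl -s http://localhost:9100/metrics | metric_value node_load1
```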

Editing prometheus.yml

Edit prometheus.yml and add node_exporter as a scrape target:

vim /usr/local/prometheus-2.26.0.linux-amd64/prometheus.yml

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']
  - job_name: Linux_export
    static_configs:
      - targets: ['localhost:9100']


Then restart Prometheus:

systemctl restart prometheus

Open localhost:9090 to check Prometheus's status; under Status > Targets, both jobs should be listed as UP.
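A YAML indentation mistake in prometheus.yml keeps Prometheus from starting. promtool, which ships in the same tarball, can validate the file before the restart. A minimal sketch, assuming bash; the function name and the PROMTOOL and SYSTEMCTL override variables are our own conventions for dry runs.

```shell
#!/usr/bin/env bash
# check_and_restart_prometheus [CONFIG]: validate the config with
# "promtool check config" and restart Prometheus only if it passes.
check_and_restart_prometheus() {
  local cfg=${1:-/usr/local/prometheus-2.26.0.linux-amd64/prometheus.yml}
  local promtool=${PROMTOOL:-/usr/local/prometheus-2.26.0.linux-amd64/promtool}
  local systemctl=${SYSTEMCTL:-systemctl}
  if ! $promtool check config "$cfg"; then
    echo "config check failed, not restarting" >&2
    return 1
  fi
  $systemctl restart prometheus
}
```

Run it as root with no arguments to check the default config path and restart on success.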

Grafana Installation and Deployment

Download the Grafana package:

https://grafana.com/grafana/dow

yum localinstall grafana-7.3.6-1.x86_64.rpm

systemctl start grafana-server

systemctl enable grafana-server


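After Grafana starts, open localhost:3000 (port 3000 by default) to reach the Grafana dashboard.

The default Grafana login is admin / admin, and you must change the password on first login.

The Grafana website offers many dashboard templates. For monitoring Linux hosts we use the template with ID 8919; add Prometheus (http://localhost:9090) as a data source first so the imported dashboard has data to show.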
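Instead of adding the Prometheus data source by hand in the UI, Grafana can create it at startup from a provisioning file. A minimal sketch, assuming bash and the RPM's default provisioning directory /etc/grafana/provisioning/datasources; the function name and the file name prometheus.yaml are our own choices.

```shell
#!/usr/bin/env bash
# write_grafana_datasource [DIR] [URL]: write a provisioning file that
# registers Prometheus as Grafana's default data source.
write_grafana_datasource() {
  local dir=${1:-/etc/grafana/provisioning/datasources}
  local url=${2:-http://localhost:9090}
  mkdir -p "$dir" || return 1
  cat > "$dir/prometheus.yaml" <<EOF
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: ${url}
    isDefault: true
EOF
}

# Usage (as root); Grafana reads provisioning files at startup:
#   write_grafana_datasource && systemctl restart grafana-server
```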

Deploying Alertmanager Alerting

Download the Alertmanager component:

https://prometheus.io/docs/alerting/latest/alertmanager/

Version 0.20.0 is used here.

tar -zxvf alertmanager-0.20.0.linux-amd64.tar.gz

mv alertmanager-0.20.0.linux-amd64 /usr/local/alertmanager

vim /usr/local/alertmanager/alertmanager.yml

global:
  resolve_timeout: 5m
  smtp_smarthost: 'smtp.163.com:465'
  smtp_from: 'xxxxxxxxx@163.com'
  smtp_auth_username: 'xxxxxxxx@163.com'
  smtp_auth_password: 'xxxxxxxxxxxxxxx'
  smtp_require_tls: false
templates:
  - '/alertmanager/template/*.tmpl'
route:
  group_by: ['alertname']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 10m
  receiver: 'mail'
receivers:
  - name: 'mail'
    email_configs:
      - to: 'xxxxxxxxx@xxx.com'
inhibit_rules:
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname', 'dev', 'instance']


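Write the systemd unit file:

vim /usr/lib/systemd/system/alertmanager.service

[Unit]
Description=alertmanager
Documentation=https://github.com/prometheus/alertmanager
After=network.target

[Service]
Type=simple
User=prometheus
ExecStart=/usr/local/alertmanager/alertmanager --config.file=/usr/local/alertmanager/alertmanager.yml --storage.path=/var/lib/alertmanager
# --storage.path defaults to data/ relative to the working directory (/ for system services), which the prometheus user cannot write to, so point it at a dedicated directory
Restart=on-failure

[Install]
WantedBy=multi-user.target

Create the data directory, then reload systemd and start Alertmanager:

mkdir -p /var/lib/alertmanager

chown prometheus: /var/lib/alertmanager

systemctl daemon-reload

systemctl start alertmanager

systemctl enable alertmanager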
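Once Alertmanager is running, the email route can be tested without stopping any exporter by posting an alert to Alertmanager's v2 API (/api/v2/alerts). A minimal sketch, assuming bash and curl; the function names, the ALERTMANAGER_URL variable and the alert labels are made up for illustration, and the printf-built JSON assumes values without double quotes or backslashes.

```shell
#!/usr/bin/env bash
# test_alert_json NAME INSTANCE: print a one-alert JSON array for the
# Alertmanager v2 API. Values are inserted verbatim (no JSON escaping).
test_alert_json() {
  printf '[{"labels":{"alertname":"%s","instance":"%s","severity":"critical"},"annotations":{"summary":"test alert"}}]\n' "$1" "$2"
}

# send_test_alert [NAME] [INSTANCE]: POST the alert to Alertmanager.
# ALERTMANAGER_URL is our own override; it defaults to the local instance.
send_test_alert() {
  local am=${ALERTMANAGER_URL:-http://localhost:9093}
  test_alert_json "${1:-TestAlert}" "${2:-localhost:9100}" |
    curl -sS -X POST -H 'Content-Type: application/json' \
      --data-binary @- "${am}/api/v2/alerts"
}
```

After send_test_alert, the email should arrive after roughly group_wait (30s in the configuration above).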

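Edit prometheus.yml again to point Prometheus at Alertmanager and load the alerting rules:

vim /usr/local/prometheus-2.26.0.linux-amd64/prometheus.yml

alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - localhost:9093 # local Alertmanager port

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  - "/usr/local/prometheus-2.26.0.linux-amd64/rule/*.yml" # rules defined under the Prometheus directory are used for alerting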


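Next, create a rule directory under the Prometheus directory and put the .yml files that define the alerting rules in it.

The following example alerts on the status of each monitored component: it fires a critical alert when a target's up metric stays at 0 (the scrape failed) for 10 seconds.

mkdir -p /usr/local/prometheus-2.26.0.linux-amd64/rule/

cd /usr/local/prometheus-2.26.0.linux-amd64/rule/

vim nodestatus_rule.yml

groups:
  - name: alert-rules.yml
    rules:
      - alert: InstanceStatus
        expr: up == 0
        for: 10s
        labels:
          severity: "critical"
        annotations:
          summary: "Host {{ $labels.instance }} is not responding"
          description: "{{ $labels.instance }} is not responding (current value: {{ $value }})"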

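The rule file can also be written by a script and checked with promtool check rules before Prometheus is restarted. A minimal sketch, assuming bash; install_node_rule and the PROMTOOL override are hypothetical names, and the defaults follow this guide's install directory.

```shell
#!/usr/bin/env bash
# install_node_rule [DIR]: write nodestatus_rule.yml into DIR and
# validate it with "promtool check rules".
install_node_rule() {
  local dir=${1:-/usr/local/prometheus-2.26.0.linux-amd64/rule}
  local promtool=${PROMTOOL:-/usr/local/prometheus-2.26.0.linux-amd64/promtool}
  mkdir -p "$dir" || return 1
  # Quoted 'EOF' keeps {{ $labels.instance }} and {{ $value }} literal.
  cat > "$dir/nodestatus_rule.yml" <<'EOF'
groups:
  - name: alert-rules.yml
    rules:
      - alert: InstanceStatus
        expr: up == 0
        for: 10s
        labels:
          severity: "critical"
        annotations:
          summary: "Host {{ $labels.instance }} is not responding"
          description: "{{ $labels.instance }} is not responding (current value: {{ $value }})"
EOF
  $promtool check rules "$dir/nodestatus_rule.yml"
}
```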

Then restart Prometheus, and alerting should work:

systemctl restart prometheus

To verify that alerting works, stop node_exporter by hand and check whether an alert email arrives:

ps -aux | grep node

kill -9 PID


Thanks for reading; your likes are my greatest support.

Future posts will cover monitoring nginx, Tomcat, MySQL, and Oracle.

Ansible tutorials will follow as well.
