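Installing the server
Demo environment: http://116.85.46.86/ (username: demo, password: demo.2021)
A one-step deployment command is provided. If the operating system is CentOS, run the following to install everything:
# Must be run as root; the machine needs Internet access
# The install script does three things:
# 1. Installs Prometheus as the storage backend. Nightingale can work with several storage backends; a single-node Prometheus is used here for a quick start
# 2. Installs MySQL (MariaDB); the default root password is 1234
# 3. Installs n9e-server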
curl -s http://116.85.64.82/install_n9e_server.sh|bash
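# Once the process is running it should listen on two ports: one HTTP port and one RPC port
# The command below shows whether the ports are listening; if both are, startup succeeded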
ss -tlnp|grep n9e-server
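The port check above can be wrapped in a small helper so it can be reused in scripts. This is a minimal sketch, assuming `ss -tlnp` reports the owning process in its usual `users:(("name",pid=...,fd=...))` form. `count_listeners` is a hypothetical name and not part of Nightingale.

```shell
# count_listeners NAME: read `ss -tlnp` output on stdin and print how many
# listening sockets belong to the process called NAME.
count_listeners() {
    # grep -c exits non-zero when the count is 0, so mask that status.
    grep -c "((\"$1\"," || true
}

# Typical use on the server (2 is expected: one HTTP port and one RPC port):
#   ss -tlnp | count_listeners n9e-server
```

A result lower than 2 suggests that one of the two listeners failed to start, and the logs are the next place to look.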
The full install script is shown below. If the machine's operating system is not CentOS, adapt it to your own needs.
#!/bin/bash
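# 1. Install Prometheus as the storage backend. Nightingale can work with several storage backends; a single-node Prometheus is used here for a quick start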
mkdir -p /opt/prometheus
wget https://s3-gz01.didistatic.com/n9e-pub/prome/prometheus-2.28.0.linux-amd64.tar.gz -O prometheus-2.28.0.linux-amd64.tar.gz
tar xf prometheus-2.28.0.linux-amd64.tar.gz
cp -far prometheus-2.28.0.linux-amd64/* /opt/prometheus/
# service
cat <<EOF >/etc/systemd/system/prometheus.service
[Unit]
Description="prometheus"
Documentation=https://prometheus.io/
After=network.target
[Service]
Type=simple
ExecStart=/opt/prometheus/prometheus --config.file=/opt/prometheus/prometheus.yml --storage.tsdb.path=/opt/prometheus/data --web.enable-lifecycle --enable-feature=remote-write-receiver --query.lookback-delta=2m
Restart=on-failure
RestartSec=5s
SuccessExitStatus=0
LimitNOFILE=65536
StandardOutput=syslog
StandardError=syslog
SyslogIdentifier=prometheus
[Install]
WantedBy=multi-user.target
EOF
systemctl daemon-reload
systemctl enable prometheus
systemctl restart prometheus
systemctl status prometheus
# 2. Install MySQL (MariaDB); the default root password is 1234
yum -y install mariadb*
# If /home is a large SSD partition, you can set datadir to /home/mysql
# mkdir -p /home/mysql
# chown mysql:mysql /home/mysql
# sed -i '/^datadir/s/^.*$/datadir=\/home\/mysql/g' /etc/my.cnf
# Start the MySQL process
systemctl start mariadb.service
# Enable MySQL at boot
systemctl enable mariadb.service
# Set the MySQL root password
mysql -e "SET PASSWORD FOR 'root'@'localhost' = PASSWORD('1234');"
# Install the dependencies of notify.py
pip install bottle
pip install requests
# 3. Install n9e-server
mkdir -p /opt/n9e
cd /opt/n9e
wget 116.85.64.82/n9e-server-5.0.0-rc6.tar.gz
tar zxvf n9e-server-5.0.0-rc6.tar.gz
mysql -uroot -p1234 < /opt/n9e/server/sql/n9e.sql
cp /opt/n9e/server/etc/service/n9e-server.service /etc/systemd/system/
systemctl daemon-reload
systemctl enable n9e-server
systemctl restart n9e-server
systemctl status n9e-server
If everything started normally, the service is reachable at http://<ip>:8000.
Installing the collector
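Before opening the web page, it can help to confirm that all three services set up by the script are running. The sketch below assumes the unit names used in the script (`prometheus`, `mariadb`, `n9e-server`). `check_units` is a hypothetical helper name.

```shell
# check_units: print the systemd state of each service the install script
# sets up, and return non-zero if any of them is not active.
check_units() {
    rc=0
    for unit in prometheus mariadb n9e-server; do
        # `systemctl is-active` prints the state and exits 0 only when active.
        state=$(systemctl is-active "$unit" 2>/dev/null) || rc=1
        printf '%s: %s\n' "$unit" "${state:-unknown}"
    done
    return "$rc"
}

# Typical use:
#   check_units || echo "at least one service is not running"
```

For any unit that is not reported as active, `systemctl status <unit>` shows why.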
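- Download the release tarball
https://github.com/n9e/n9e-agentd/releases/download/v5.0.0-rc8/n9e-agentd-5.0.0-rc8.linux.amd64.tar.gz
- Unpack it (replace x.x.x with the version you downloaded)
tar xzvf n9e-agentd-x.x.x.linux.amd64.tar.gz
- Create the directory
mkdir -p /opt/n9e
- Move the unpacked directory into place
mv ./linux-amd64 /opt/n9e/agentd
- Copy the default configuration to create the working config
cp /opt/n9e/agentd/etc/agentd.yaml.default /opt/n9e/agentd/etc/agentd.yaml
- Edit the configuration
vim /opt/n9e/agentd/etc/agentd.yaml
- Start the collector
/opt/n9e/agentd/bin/n9e-agentd -c /opt/n9e/agentd/etc/agentd.yaml
- Register the start command as a systemd service
cp -a misc/systemd/n9e-agentd.service /usr/lib/systemd/system/
- Enable start at boot
systemctl enable n9e-agentd
- Restart to test
systemctl restart n9e-agentd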
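The download link above is pinned to v5.0.0-rc8, while the unpack command uses an x.x.x placeholder. A small helper can keep the two in sync. This is only a sketch: `agentd_url` is a hypothetical name, and it assumes other releases follow the same URL pattern as the rc8 link, which may not hold for every version.

```shell
# agentd_url VERSION: print the release tarball URL for an n9e-agentd version,
# following the pattern of the v5.0.0-rc8 download link.
agentd_url() {
    printf 'https://github.com/n9e/n9e-agentd/releases/download/v%s/n9e-agentd-%s.linux.amd64.tar.gz\n' "$1" "$1"
}

# Example (uncomment to download and unpack rc8):
#   version=5.0.0-rc8
#   wget "$(agentd_url "$version")"
#   tar xzvf "n9e-agentd-$version.linux.amd64.tar.gz"
```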
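Once the steps above are complete, the newly installed collector should be visible in the web UI.
After installation, runtime logs are written to /opt/n9e/server/logs/. Edit /opt/n9e/server/etc/server.yml to adjust the log level; otherwise the DEBUG log file grows too large.
Prefix matching
Import the default alert rules
Create a port-listening check
Edit the notification settings of the alert rule and set the recipients or recipient teams (each user must have an email address filled in under personal information)
When an alert rule fires, the script /opt/n9e/server/etc/script/notify.py is executed to send the email notification (the script can be customized to support other notification channels)
Alert trigger preview
Import monitoring dashboards
Custom metrics can be selected to view their data
Alert rule trigger modes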
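Server install path
/opt/n9e/server
Start command: systemctl start n9e-server
Log output directory: /opt/n9e/server/logs
Collector install path
/opt/n9e/agentd
Start command: systemctl start n9e-agentd
journalctl -u n9e-agentd -f
The journalctl command above can be used to follow the execution status of the collection plugins.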
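Configuring a DingTalk group robot
Notifications here also go through the same script (/opt/n9e/server/etc/script/notify.py). If a notification fails, check the logs under /opt/n9e/server/logs
DingTalk group robot notification preview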
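Alerting on HTTP status codes
Use Prometheus together with blackbox_exporter to probe service status remotely.
- Download the prober
wget https://github.com/prometheus/blackbox_exporter/releases/download/v0.19.0/blackbox_exporter-0.19.0.linux-amd64.tar.gz
- Configure the prober
Prepare the configuration file for blackbox_exporter; we will call it blackbox.yml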
cat <<"EOF" > blackbox.yml
modules:
  http_2xx:
    prober: http
  http_post_2xx:
    prober: http
    http:
      method: POST
  tcp_connect:
    prober: tcp
  pop3s_banner:
    prober: tcp
    tcp:
      query_response:
      - expect: "^+OK"
      tls: true
      tls_config:
        insecure_skip_verify: false
  ssh_banner:
    prober: tcp
    tcp:
      query_response:
      - expect: "^SSH-2.0-"
  irc_banner:
    prober: tcp
    tcp:
      query_response:
      - send: "NICK prober"
      - send: "USER prober prober prober :prober"
      - expect: "PING :([^ ]+)"
        send: "PONG ${1}"
      - expect: "^:[^ ]+ 001"
  icmp:
    prober: icmp
EOF
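- Start the prober
Here blackbox_exporter is managed by systemd. You can use whatever process manager you prefer instead, such as supervisor or god, or simply push it into the background with nohup. Pay attention to the ExecStart line below: the example assumes the binary and the configuration file both live in /opt/n9e/blackbox_exporter, so adjust it to match your environment.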
cat <<"EOF" > /etc/systemd/system/blackbox_exporter.service
[Unit]
Description=blackbox_exporter Exporter
Wants=network-online.target
After=network-online.target
[Service]
ExecStart=/opt/n9e/blackbox_exporter/blackbox_exporter --config.file=/opt/n9e/blackbox_exporter/blackbox.yml
StandardOutput=syslog
StandardError=syslog
SyslogIdentifier=blackbox_exporter
[Install]
WantedBy=default.target
EOF
systemctl daemon-reload
systemctl restart blackbox_exporter
- Check the prober
Use the blackbox prober to probe baidu.com and see whether it returns metrics. If it responds normally, the prober is working.
curl 'http://localhost:9115/probe?target=baidu.com&module=http_2xx'
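The response is plain-text metrics, and the line to look for is `probe_success`, which blackbox_exporter sets to 1 when the probe passed and 0 when it failed. A minimal sketch of checking it in a script (`probe_ok` is a hypothetical helper name):

```shell
# probe_ok: read blackbox_exporter /probe output on stdin and succeed only
# if it contains the sample line "probe_success 1".
probe_ok() {
    grep -q '^probe_success 1$'
}

# Typical use:
#   curl -s 'http://localhost:9115/probe?target=baidu.com&module=http_2xx' | probe_ok \
#       && echo "probe succeeded" || echo "probe failed"
```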
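- Configure the scraper
The scraper is Prometheus itself. It works as follows: Prometheus periodically calls the blackbox_exporter endpoint, which makes blackbox_exporter probe the target address; Prometheus then parses the probe result and writes the resulting time series into its time series database.
Note that the configuration below relies on Prometheus's relabeling feature; look up the details if you want to understand how it works. The job is added under scrape_configs in prometheus.yml. This one probes HTTP endpoints with the http_2xx module, using the GET method: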
  - job_name: 'blackbox_http'
    metrics_path: /probe
    # Parameter passed to the exporter: which probe module to use
    params:
      module: [http_2xx]
    static_configs:
      - targets:
        - http://prometheus.io    # Target to probe with http.
        - https://www.baidu.com   # Target to probe with https.
        - http://localhost:3000   # Target to probe with http on port 3000.
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: 127.0.0.1:9115  # The blackbox exporter's real hostname:port.
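To make the relabeling concrete: for each entry under targets, the rules copy the original address into the `target` URL parameter and the `instance` label, then point the actual scrape at the exporter. The sketch below prints the request that results for each target, assuming the exporter address 127.0.0.1:9115 from the config above. `probe_url` is a hypothetical helper, and it leaves out the URL encoding that Prometheus applies to query parameters.

```shell
# probe_url EXPORTER MODULE TARGET: print the URL that is effectively scraped
# for one target after relabeling (query values left unencoded for clarity).
probe_url() {
    printf 'http://%s/probe?module=%s&target=%s\n' "$1" "$2" "$3"
}

# One line per target listed in static_configs.
for target in http://prometheus.io https://www.baidu.com http://localhost:3000; do
    probe_url 127.0.0.1:9115 http_2xx "$target"
done
```

Passing any of the printed URLs to curl on the exporter host should return the same metrics Prometheus collects for that target.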
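Once this is configured, open port 9090 (the Prometheus web UI) to see the status of the services we set up monitoring for.
- To monitor additional services later, just edit /opt/prometheus/prometheus.yml.
- Restart Prometheus: systemctl restart prometheus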
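Because the systemd unit earlier starts Prometheus with --web.enable-lifecycle, a configuration change can also be applied without a full restart. The sketch below assumes Prometheus listens on its default port 9090 and that promtool sits alongside the binary in /opt/prometheus, since it ships in the same tarball. `apply_prometheus_config` and the `PROMTOOL` variable are hypothetical names.

```shell
# Path to promtool; override PROMTOOL if it lives somewhere else.
PROMTOOL=${PROMTOOL:-/opt/prometheus/promtool}

# apply_prometheus_config: validate prometheus.yml with promtool, then ask the
# running server to reload it over HTTP instead of restarting the service.
apply_prometheus_config() {
    "$PROMTOOL" check config /opt/prometheus/prometheus.yml || return 1
    curl -fsS -X POST http://localhost:9090/-/reload
}

# Typical use after editing prometheus.yml:
#   apply_prometheus_config && echo "configuration reloaded"
```

Validating first means a typo in the file is reported before the reload request is sent. If the check fails, the running server is left untouched.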