Deploying Alertmanager
The project (Java 11) calls Alertmanager (POST to port 9093, path /api/v1/alerts), so Alertmanager has to be deployed first.
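- Download the Linux x86_64 build, alertmanager-0.25.0.linux-amd64.tar.gz, from the Prometheus download page, https://prometheus.io/download/. Put it under /root and extract it with tar xvf. Enter the extracted directory and back up the yml file with cp, in case the yml needs to be modified later. Prepare a data directory, for example /data/alertmanager/.
- Start Alertmanager from the extracted directory, /root/alertmanager-0.25.0.linux-amd64: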
nohup ./alertmanager --log.level=info --log.format=json --web.listen-address=":9093" --config.file="/root/alertmanager-0.25.0.linux-amd64/alertmanager.yml" --storage.path="/data/alertmanager/" --data.retention=120h &
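The reference guide this setup followed also redirects the log to a specific file, as in the command below. That variant was not used here.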
nohup ./alertmanager --log.level=info --log.format=json --web.listen-address=":9093" --config.file="/root/alertmanager-0.25.0.linux-amd64/alertmanager.yml" --storage.path="/data/alertmanager/" --data.retention=120h &>>/opt/alertmanager-0.19/logs/alertmanager.log &
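Run jobs to confirm that the process started, and check nohup.out for the startup log. To stop Alertmanager, kill the process by its PID.
- Try sending an alert through the API with Postman.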
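startsAt: defaults to the time at which Alertmanager receives the alert.
endsAt: the time at which the alert counts as resolved. If omitted, it defaults to the receive time plus a configurable timeout (resolve_timeout in alertmanager.yml).
Take particular care that startsAt is earlier than the current time and endsAt is later than it. Otherwise the alert is not accepted as a valid active alert and the page does not show it. Once endsAt has passed, the alert disappears from the page and from the list.
The timestamps can also be left out entirely. A GET then returns the alert with the values Alertmanager filled in, which also reveals the time it used as the current time.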
POST http://127.0.0.1:9093/api/v1/alerts
[{
  "labels": {
    "alertname": "SMS service",
    "instance": "192.168.1.1",
    "job": "none",
    "severity": "warning",
    "team": "SMSService"
  },
  "annotations": {
    "summary": "All SMS accounts are out of credit, no available provider to switch to, SMS cannot be sent"
  },
  "startsAt": "2023-02-13T16:54:52.37603417+08:00",
  "endsAt": "2023-02-13T17:40:52.37603417+08:00"
}]
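The alert then appears in the Alertmanager web UI. It can also be fetched with GET http://127.0.0.1:9093/api/v1/alerts.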
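The timestamp rule above (startsAt before now, endsAt after now) is easy to get wrong when the payload is typed by hand. Below is a minimal sketch that computes both values relative to the current time and only sends the request when asked to. It assumes a date that accepts -d @epoch (GNU coreutils or BusyBox) and that curl is installed. ALERTMANAGER_URL, SEND and rfc3339_offset are illustrative names, not part of the original setup. The URL is the v1 path used in these notes. Later Alertmanager releases removed the v1 API in favor of /api/v2/alerts, so the URL may need adjusting there.

```shell
#!/usr/bin/env bash
# Sketch: build a test alert whose time window brackets the current time.
# ALERTMANAGER_URL and SEND are illustrative names (assumptions).
ALERTMANAGER_URL="${ALERTMANAGER_URL:-http://127.0.0.1:9093/api/v1/alerts}"

# Print an RFC 3339 UTC timestamp that is $1 seconds away from now.
rfc3339_offset() {
  now=$(date +%s)
  date -u -d "@$((now + $1))" +%Y-%m-%dT%H:%M:%SZ
}

starts_at=$(rfc3339_offset -60)   # one minute in the past
ends_at=$(rfc3339_offset 1800)    # thirty minutes in the future

payload=$(cat <<EOF
[{
  "labels": {"alertname": "SMSService", "instance": "192.168.1.1", "severity": "warning", "team": "SMSService"},
  "annotations": {"summary": "test alert sent from the shell"},
  "startsAt": "${starts_at}",
  "endsAt": "${ends_at}"
}]
EOF
)

echo "$payload"

# Only send when explicitly requested, so the script is safe to run without a server.
if [ "${SEND:-0}" = "1" ]; then
  curl -sS -X POST -H 'Content-Type: application/json' -d "$payload" "$ALERTMANAGER_URL"
fi
```

Running the script as is only prints the payload. Running it with SEND=1 in the environment posts it to Alertmanager.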
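Deploying Prometheus
- Download from the official site, https://prometheus.io/download/. The latest version at the time was prometheus-2.42.0.linux-amd64.tar.gz. Put it under /root and extract it with tar xvf. Enter the extracted directory and back up the yml file with cp, in case the yml needs to be modified later.
- Edit the yml file. Add a new job. Because the new job does not expose its metrics on the default path, set metrics_path for it as well. Also configure the Alertmanager target.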
# my global config
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets: ['localhost:9093']

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: "prometheus"
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    static_configs:
      - targets: ["localhost:9090"]

  - job_name: "prometheus-client"
    # metrics_path defaults to '/metrics'
    metrics_path: '/metrics/prometheus'
    # scheme defaults to 'http'.
    static_configs:
      - targets: ["localhost:8181"]
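- Start Prometheus from the extracted directory, /root/prometheus-2.42.0.linux-amd64. It creates its data directory, data/, under the working directory automatically.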
nohup ./prometheus --config.file="/root/prometheus-2.42.0.linux-amd64/prometheus.yml" &
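After startup, open the web UI on port 9090. Under Status > Targets both jobs are listed, and a state of UP means the target is being scraped normally.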
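The same check can be done from the command line, since Prometheus exposes target state at /api/v1/targets. Below is a rough sketch that counts the health values in such a response using only grep, sed, sort, uniq and awk. summarize_targets is a hypothetical helper name, and the sample response is a heavily trimmed stand-in for the real one, included so the helper can be tried without a running server.

```shell
#!/usr/bin/env bash
# Sketch: count target health states in a /api/v1/targets response read from stdin.
# summarize_targets is a hypothetical helper name (assumption).
summarize_targets() {
  grep -o '"health"[[:space:]]*:[[:space:]]*"[a-z]*"' \
    | sed 's/.*"\([a-z]*\)"$/\1/' \
    | sort | uniq -c | awk '{print $2 "=" $1}'
}

# Trimmed sample response: one healthy target and one failing target.
sample='{"status":"success","data":{"activeTargets":[
 {"labels":{"job":"prometheus"},"health":"up"},
 {"labels":{"job":"prometheus-client"},"health":"down"}]}}'

summary=$(printf '%s\n' "$sample" | summarize_targets)
echo "$summary"

# Against a live server (assumes curl is installed):
#   curl -s http://localhost:9090/api/v1/targets | summarize_targets
```

For the two jobs configured above, a healthy setup should report up=2 and nothing else. Text matching like this is fragile compared with a real JSON parser, so treat it as a quick check only.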
Deploying Grafana
Reference: https://cloud.tencent.com/developer/article/1807679
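- Download from the official site, https://grafana.com/grafana/download?pg=get&plcmt=selfmanaged-box1-cta1 (on Ubuntu the deb package is the better choice). Put it under /root.
- Install it. The default install path is /usr/share/grafana/.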
sudo dpkg -i grafana<edition>_<version>_amd64.deb
- Start the service.
sudo systemctl daemon-reload
sudo systemctl start grafana-server
sudo systemctl status grafana-server
# To start Grafana automatically at boot, enable the service
sudo systemctl enable grafana-server.service
Output after starting:
root@fh:~# sudo systemctl status grafana-server
● grafana-server.service - Grafana instance
Loaded: loaded (/usr/lib/systemd/system/grafana-server.service; disabled; vendor preset: enabled)
Active: active (running) since Mon 2023-02-20 09:18:54 CST; 18s ago
Docs: http://docs.grafana.org
Main PID: 71468 (grafana-server)
Tasks: 7 (limit: 8244)
CGroup: /system.slice/grafana-server.service
└─71468 /usr/sbin/grafana-server --config=/etc/grafana/grafana.ini --pidfile=/run/grafana/grafana-server.pid --packaging=deb cfg:default.paths.logs=/var/log/grafana cfg:default.paths.data=/var/lib/grafana cfg:default.paths.plugins=/var/lib/grafana/plugins cfg
2月 20 09:18:55 fh grafana-server[71468]: logger=server t=2023-02-20T09:18:55.879931227+08:00 level=info msg="Writing PID file" path=/run/grafana/grafana-server.pid pid=71468
2月 20 09:18:55 fh grafana-server[71468]: logger=provisioning.alerting t=2023-02-20T09:18:55.880529498+08:00 level=info msg="starting to provision alerting"
2月 20 09:18:55 fh grafana-server[71468]: logger=provisioning.alerting t=2023-02-20T09:18:55.880567595+08:00 level=info msg="finished to provision alerting"
2月 20 09:18:55 fh grafana-server[71468]: logger=grafanaStorageLogger t=2023-02-20T09:18:55.881289219+08:00 level=info msg="storage starting"
2月 20 09:18:55 fh grafana-server[71468]: logger=report t=2023-02-20T09:18:55.887325384+08:00 level=warn msg="Scheduling and sending of reports disabled, SMTP is not configured and enabled. Configure SMTP to enable."
2月 20 09:18:55 fh grafana-server[71468]: logger=http.server t=2023-02-20T09:18:55.888978444+08:00 level=info msg="HTTP Server Listen" address=[::]:3000 protocol=http subUrl= socket=
2月 20 09:18:55 fh grafana-server[71468]: logger=ngalert.state.manager t=2023-02-20T09:18:55.889264255+08:00 level=info msg="Warming state cache for startup"
2月 20 09:18:55 fh grafana-server[71468]: logger=ngalert.state.manager t=2023-02-20T09:18:55.898712507+08:00 level=info msg="State cache has been initialized" states=0 duration=9.446141ms
2月 20 09:18:55 fh grafana-server[71468]: logger=ticker t=2023-02-20T09:18:55.89930827+08:00 level=info msg=starting first_tick=2023-02-20T09:19:00+08:00
2月 20 09:18:55 fh grafana-server[71468]: logger=ngalert.multiorg.alertmanager t=2023-02-20T09:18:55.899460503+08:00 level=info msg="starting MultiOrg Alertmanager"
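- After startup, open the web UI on port 3000. First configure a data source from the left-hand menu: choose Prometheus, and only the URL needs to be set.
- Grafana has a rich collection of ready-made dashboards, available at https://grafana.com/grafana/dashboards. For example, search for jvm and pick JVM overview - Prometheus. A dashboard can be imported by ID (which did not work here) or by JSON (which did). Remember to save!
- To chart specific Prometheus metrics, choose New dashboard. After clicking Apply, further panels can be added with Add panel.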
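The data source step from the list above can also be scripted against Grafana's HTTP API (POST /api/datasources) instead of done in the UI. Below is a minimal sketch under several assumptions: Grafana listens on localhost:3000, still uses the default admin:admin credentials, and Prometheus is reachable at localhost:9090. GRAFANA_URL, GRAFANA_AUTH, PROMETHEUS_URL and SEND are illustrative names. The script only prints the request body unless SEND=1 is set.

```shell
#!/usr/bin/env bash
# Sketch: register Prometheus as a Grafana data source through the HTTP API.
# All variable names and default values below are assumptions for illustration.
GRAFANA_URL="${GRAFANA_URL:-http://localhost:3000}"
GRAFANA_AUTH="${GRAFANA_AUTH:-admin:admin}"   # assumed: default credentials, unchanged
PROMETHEUS_URL="${PROMETHEUS_URL:-http://localhost:9090}"

datasource_body=$(cat <<EOF
{"name": "Prometheus", "type": "prometheus", "access": "proxy", "url": "${PROMETHEUS_URL}", "isDefault": true}
EOF
)

echo "$datasource_body"

# Only call Grafana when explicitly requested.
if [ "${SEND:-0}" = "1" ]; then
  curl -sS -u "$GRAFANA_AUTH" -X POST -H 'Content-Type: application/json' \
    -d "$datasource_body" "$GRAFANA_URL/api/datasources"
fi
```

If the admin password has been changed after the first login, pass the new credentials through GRAFANA_AUTH rather than editing the script.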