Prometheus - configuration file
global, rule_files, remote_read, remote_write
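1. global (global configuration)
global:
  # How often to scrape targets for metrics; default 1m
  scrape_interval: 15s
  # How long until a scrape request times out; default 10s
  scrape_timeout: 10s
  # How often Prometheus evaluates rules (recording rules and alerting rules); default 1m.
  # In other words, the interval at which rules are executed.
  evaluation_interval: 30s
  # Labels attached to any time series or alerts when communicating with external
  # systems (federation, remote storage, Alertmanager); used to tell Prometheus servers apart
  external_labels:
    prometheus: test
  # File to which PromQL queries are logged. Reloading the configuration reopens the file.
  query_log_file: /tmp/query.log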
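As an illustration of external_labels, here is a minimal sketch that assumes two Prometheus servers run as a high-availability pair scraping the same targets. The label names cluster and replica and their values are hypothetical; Prometheus does not require any particular names. Both servers would share this configuration except for the replica value, so systems that receive their data (federation, remote storage, Alertmanager) can tell which server it came from.

```yaml
# Excerpt from the first server's prometheus.yml (hypothetical values)
global:
  scrape_interval: 15s
  evaluation_interval: 30s
  external_labels:
    cluster: demo   # hypothetical: identifies the environment
    replica: A      # hypothetical: set to B on the second server
```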
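2. rule_files (rule configuration)
Prometheus supports two kinds of rules:
- Recording rules: precompute frequently used and expensive expressions and save the result as a new set of time series. Queries then read the precomputed series instead of re-evaluating the original expression, which uses fewer resources and returns faster.
- Alerting rules: define custom alert conditions.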
# Load the specified rule files (glob patterns are allowed)
rule_files:
  - "first.rules"
  - "my/*.rules"
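To include rules in Prometheus, create a file containing the rule statements and have Prometheus load it through the rule_files field of the Prometheus configuration.
Rule files can be reloaded at runtime by sending SIGHUP to the Prometheus process. The changes are only applied if all rule files are well-formed.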
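Checking rule syntax
To quickly check whether a rule file is syntactically correct without starting a Prometheus server, use promtool, Prometheus's command-line tool. promtool ships in the official release tarball next to the prometheus binary (as the listing below shows); with older Go toolchains it could also be installed from source with: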
go get github.com/prometheus/prometheus/cmd/promtool
Example usage:
[root@fabric-cli prometheus-2.2.1.linux-amd64]# ls -l
total 108104
drwxrwxr-x 2 1000 1000 38 Mar 14 22:14 console_libraries
drwxrwxr-x 2 1000 1000 173 Mar 14 22:14 consoles
drwxr-xr-x 5 root root 85 May 12 00:05 data
-rw-rw-r-- 1 1000 1000 11357 Mar 14 22:14 LICENSE
-rw-rw-r-- 1 1000 1000 2769 Mar 14 22:14 NOTICE
-rwxr-xr-x 1 1000 1000 66176282 Mar 14 22:17 prometheus
-rw-r--r-- 1 root root 167 May 4 10:47 prometheus.rules.yml
-rw-rw-r-- 1 1000 1000 879 May 4 10:49 prometheus.yml
-rwxr-xr-x 1 1000 1000 44492910 Mar 14 22:18 promtool
[root@fabric-cli prometheus-2.2.1.linux-amd64]# ./promtool check rules prometheus.rules.yml
Checking prometheus.rules.yml
SUCCESS: 1 rules found
Rule file syntax:
groups:
  [ - <rule_group> ]
Syntax of <rule_group>:
# The name of the group. Must be unique within a file.
name: <string>
# How often rules in the group are evaluated
[ interval: <duration> | default = global.evaluation_interval ]
rules:
  [ - <rule> ... ]
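For instance, a group can override the global evaluation interval. In the sketch below the group name is hypothetical and the 1m interval is only illustrative: the group's rules are evaluated every minute even though global.evaluation_interval is 30s in the global example above.

```yaml
groups:
  - name: slow_rules      # hypothetical group name, unique within the file
    interval: 1m          # overrides global.evaluation_interval for this group only
    rules:
      - record: instance:node_cpu_load:5m
        expr: node_load5
```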
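Syntax of a recording <rule>:
# The name of the time series to output to. Must be a valid metric name.
record: <string>
# The PromQL expression to evaluate. Every evaluation cycle this is
# evaluated at the current time, and the result recorded as a new set of
# time series with the metric name as given by 'record'.
expr: <string>
# Labels to add or overwrite before storing the result.
labels:
  [ <labelname>: <labelvalue> ]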
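The optional labels field attaches extra labels to every series the rule produces; if the expression's result already carries a label with the same name, the configured value overwrites it. A minimal sketch, where the group name and the label team: web are hypothetical:

```yaml
groups:
  - name: example_with_labels   # hypothetical group name
    rules:
      - record: job:http_inprogress_requests:sum
        expr: sum(http_inprogress_requests) by (job)
        labels:
          team: web             # hypothetical label added to each output series
```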
Examples. A rule file with a single recording rule:
groups:
  - name: example
    rules:
      - record: job:http_inprogress_requests:sum
        expr: sum(http_inprogress_requests) by (job)
A separate rule file with several recording rules for instance-level metrics:
groups:
  - name: instance
    rules:
      - record: test:go_memstats_alloc_bytes:sum
        expr: sum(go_memstats_alloc_bytes) by (instance)
      - record: instance:node_cpu_load:1m
        expr: node_load1
      - record: instance:node_cpu_load:5m
        expr: node_load5
      - record: instance:node_cpu_load:15m
        expr: node_load15
      - record: instance:node_cpu_util:1m
        expr: (1 - (sum(increase(node_cpu_seconds_total{mode="idle"}[1m])) by (instance) / sum(increase(node_cpu_seconds_total[1m])) by (instance))) * 100
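The last rule computes CPU utilization as the share of non-idle CPU time over the past minute. Written as a formula, with a purely hypothetical worked example: suppose that on a 2-core machine, over one minute, node_cpu_seconds_total summed over all CPUs and modes grows by 120 s (2 cores × 60 s), of which 90 s are spent in mode="idle".

```latex
\text{util} = \left(1 - \frac{\Delta_{\text{idle}}}{\Delta_{\text{total}}}\right) \times 100
            = \left(1 - \frac{90}{120}\right) \times 100 = 25\%
```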
The syntax of an alerting rule is as follows:
# The name of the alert. Must be a valid metric name.
alert: <string>
# The PromQL expression to evaluate. Every evaluation cycle this is
# evaluated at the current time, and all resultant time series become
# pending/firing alerts.
expr: <string>
# Alerts are considered firing once they have been returned for this long.
# Alerts which have not yet fired for long enough are considered pending.
[ for: <duration> | default = 0s ]
# Labels to add or overwrite for each alert.
labels:
  [ <labelname>: <tmpl_string> ]
# Annotations to add to each alert.
annotations:
  [ <labelname>: <tmpl_string> ]
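Alert labels and annotations accept templates: {{ $labels.instance }} expands to the alert's instance label and {{ $value }} to the value of the expression. The sketch below alerts on the instance:node_cpu_util:1m series recorded in the example above; the group name, alert name, 90% threshold, 5m duration and severity value are assumptions chosen for illustration.

```yaml
groups:
  - name: node_alerts                       # hypothetical group name
    rules:
      - alert: HighCpuUtilization             # hypothetical alert name
        expr: instance:node_cpu_util:1m > 90  # reads the recorded CPU utilization series
        for: 5m                               # pending until the condition has held for 5m
        labels:
          severity: warning                   # hypothetical severity value
        annotations:
          summary: "High CPU utilization on {{ $labels.instance }}"
          description: "CPU utilization is {{ $value }}% (threshold 90%)."
```

While the expression returns a result for less than 5m the alert is pending; once it has been returned for 5m it fires.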
Alerting rule example:
groups:
  - name: example
    rules:
      - alert: HighRequestLatency
        expr: job:request_latency_seconds:mean5m{job="myjob"} > 0.5
        for: 10m
        labels:
          severity: page
        annotations:
          summary: High request latency
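3. remote_read, remote_write (remote read/write configuration)
remote_write:
  # URL to write samples to
  - url: http://remote1/push
    # Name of this remote write config. If specified, it must be unique among all remote
    # write configs. The name is used in metrics and logging in place of a generated value,
    # to help users distinguish between remote write configs.
    name: drop_expensive
    # Relabeling applied to series before they are sent to this endpoint
    # (here: drop every metric whose name starts with "expensive")
    write_relabel_configs:
      - source_labels: [__name__]
        regex: expensive.*
        action: drop
  # A second URL to write samples to
  - url: http://remote2/push
    name: rw_tls
    # TLS connection settings
    tls_config:
      cert_file: valid_cert_file
      key_file: valid_key_file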
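write_relabel_configs can also work the other way round: instead of dropping expensive series, keep only the ones you want to ship. A sketch assuming you only want node-exporter metrics (names starting with node_) in remote storage; the endpoint URL and name are hypothetical. Relabeling regexes are fully anchored, so node_.* only matches names that begin with node_.

```yaml
remote_write:
  - url: http://remote3/push   # hypothetical endpoint
    name: node_only            # hypothetical name
    write_relabel_configs:
      # keep only series whose metric name starts with node_; everything else is not sent
      - source_labels: [__name__]
        regex: node_.*
        action: keep
```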
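remote_read:
  # URL to read data from
  - url: http://remote1/read
    # Whether queries over recent time ranges (which local storage should fully cover)
    # are also sent to this remote endpoint. Prometheus always reads recent data from
    # local storage anyway; when true, it merges local and remote results.
    # Default is false, i.e. recent data is served from local storage only.
    read_recent: true
    name: default
  # A second URL to read data from
  - url: http://remote3/read
    # Serve recent data from local storage only
    read_recent: false
    name: read_special
    # Optional list of equality matchers that must be present in a query's selector
    # for this remote read endpoint to be queried.
    required_matchers:
      job: special
    tls_config:
      cert_file: valid_cert_file
      key_file: valid_key_file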
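When required_matchers lists several labels, all of them must appear as equality matchers in a query's selector before that query is sent to the endpoint. In this hypothetical sketch a second label env: prod (an assumption, not part of the original example) is added, so a selector such as up{job="special", env="prod"} would also be sent to remote3, whereas up{job="special"} alone would not.

```yaml
remote_read:
  - url: http://remote3/read
    name: read_special
    read_recent: false
    # Both equality matchers must be present in a query's selector
    # for this endpoint to be queried.
    required_matchers:
      job: special
      env: prod        # hypothetical second required label
```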