20201028 Prometheus query statement study notes
https://prometheus.fuckcloudnative.io/di-san-zhang-prometheus/di-4-jie-cha-xun/examples
Simple examples
Simple time series selection
Return all time series samples of the metric prometheus_http_requests_total:
prometheus_http_requests_total
# Result
prometheus_http_requests_total{code="200",handler="/api/v1/label/:name/values",instance="localhost:9090",job="prometheus"} 1
prometheus_http_requests_total{code="200",handler="/api/v1/query",instance="localhost:9090",job="prometheus"} 15
prometheus_http_requests_total{code="200",handler="/graph",instance="localhost:9090",job="prometheus"} 1
prometheus_http_requests_total{code="200",handler="/metrics",instance="localhost:9090",job="prometheus"} 7
prometheus_http_requests_total{code="200",handler="/static/*filepath",instance="localhost:9090",job="prometheus"} 2
Return all time series samples of the metric prometheus_http_requests_total whose labels include job="prometheus" and handler="/metrics":
prometheus_http_requests_total{job="prometheus",handler="/metrics"}
# Result
prometheus_http_requests_total{code="200",handler="/metrics",instance="localhost:9090",job="prometheus"} 13
Return the samples from the last 2 minutes of the metric prometheus_http_requests_total whose labels include job="prometheus" and handler="/metrics":
prometheus_http_requests_total{job="prometheus",handler="/metrics"}[2m]
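# Result: the target is scraped every 15s, so a 2-minute range holds 8 samples, as expected
# Note: this is a range vector, so it cannot be plotted on the Graph page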
prometheus_http_requests_total{code="200",handler="/metrics",instance="localhost:9090",job="prometheus"} 26 @1603868399.722
27 @1603868414.722
28 @1603868429.722
29 @1603868444.722
30 @1603868459.722
31 @1603868474.722
32 @1603868489.722
33 @1603868504.722
Using a regular expression: match all time series whose handler label value starts with /api
prometheus_http_requests_total{handler=~"/api.*"}
# Result
prometheus_http_requests_total{code="200",handler="/api/v1/label/:name/values",instance="localhost:9090",job="prometheus"} 1
prometheus_http_requests_total{code="200",handler="/api/v1/query",instance="localhost:9090",job="prometheus"} 22
prometheus_http_requests_total{code="400",handler="/api/v1/query_range",instance="localhost:9090",job="prometheus"} 1
Using a regular expression: return all time series of prometheus_http_requests_total whose HTTP status code is not 200
prometheus_http_requests_total{code!~"200"}
# Result
prometheus_http_requests_total{code="400",handler="/api/v1/query_range",instance="localhost:9090",job="prometheus"} 1
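Using functions, operators, etc.
Return the per-second average growth rate of HTTP requests to handler="/metrics" over the past 5 minutes, for the metric prometheus_http_requests_total. This is roughly (last value - first value) / 300s; rate() also handles counter resets and extrapolates to the edges of the window: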
rate(prometheus_http_requests_total{handler="/metrics"}[5m])
# Result; the unit is requests per second
{code="200",handler="/metrics",instance="localhost:9090",job="prometheus"} 0.06666666666666667
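The rate() arithmetic above can be sketched in Python. This is a simplified model and not Prometheus's implementation. It assumes samples arrive as (timestamp, value) pairs and treats any drop in value as a counter reset. Unlike rate(), it does not extrapolate to the window edges. The name simple_rate is hypothetical. Fed the eight /metrics samples from the 2-minute range query earlier (one request per 15s scrape), it gives about 0.0667 requests per second, in line with the result above.

```python
def simple_rate(samples):
    """Simplified rate(): per-second increase from the first to the last sample.

    samples: list of (unix_timestamp, counter_value) pairs, oldest first.
    A value lower than its predecessor is treated as a counter reset, so the
    new value counts as the increase since the reset. Unlike Prometheus, no
    extrapolation to the edges of the [5m] window is done.
    """
    if len(samples) < 2:
        return None  # at least two samples are needed for a rate
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        increase += cur - prev if cur >= prev else cur
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed


# The eight /metrics samples from the 2-minute range query: 26..33, 15s apart.
metrics_samples = [(1603868399.722 + 15 * i, 26 + i) for i in range(8)]
print(simple_rate(metrics_samples))  # roughly 0.0667 requests per second
```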
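Return the sum of the per-second average growth rates of HTTP requests over the past 5 minutes for the metric prometheus_http_requests_total, grouped by job. In plain terms: group the time series of prometheus_http_requests_total by job name, compute the average growth rate of each series over the past 5 minutes, and sum them within each group to get one rate per group: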
sum(rate(prometheus_http_requests_total[5m])) by (job)
# Result: there is only one group, because every series has job="prometheus" (only one Prometheus instance is deployed)
{job="prometheus"} 0.09473684210526315
Return the memory used on the node, in MiB, rounded down:
floor((node_memory_MemTotal_bytes - node_memory_MemFree_bytes) /1024 /1024)
# Result
{instance="172.17.0.3:9100",job="node-exporter-1"} 1637
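Getting to know PromQL
Expression language data types
In the Prometheus expression language, an expression or subexpression evaluates to one of four types:
- Instant vector: a set of time series, each containing a single sample, all sharing the same timestamp
- Range vector: a set of time series, each containing a range of samples over time
- Scalar: a simple floating-point value
- String: a simple string value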
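Time series selectors
Instant vector selectors (curly braces)
Filter by labels; regular expressions are supported. For example, return all time series of prometheus_http_requests_total whose HTTP status code is not 200: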
prometheus_http_requests_total{code!~"200"}
Label matching operators:
- = equal
- != not equal
- =~ regex match
- !~ regex does not match
Every vector selector must specify a metric name or at least one label matcher that does not match the empty string.
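Regex matchers in PromQL are anchored, so =~ must match the whole label value. A label that is absent is treated as having the empty value. The sketch below models the four operators with Python's re.fullmatch. Prometheus actually uses RE2 syntax, which differs from Python's re in some corners. label_matches is a hypothetical helper name. The last call shows why a selector like {job=~".*"} on its own is rejected: .* matches the empty string.

```python
import re


def label_matches(value, op, pattern):
    """Evaluate one PromQL-style label matcher against a label value.

    A missing label should be passed as "" (PromQL treats it as empty).
    Regex matchers are anchored, so re.fullmatch is used instead of re.search.
    """
    if op == "=":
        return value == pattern
    if op == "!=":
        return value != pattern
    if op == "=~":
        return re.fullmatch(pattern, value) is not None
    if op == "!~":
        return re.fullmatch(pattern, value) is None
    raise ValueError(f"unknown matcher operator: {op}")


print(label_matches("/api/v1/query", "=~", "/api.*"))  # True
print(label_matches("/api/v1/query", "=~", "/api"))    # False: whole value must match
print(label_matches("400", "!~", "200"))               # True
print(label_matches("", "=~", ".*"))                   # True: .* matches the empty string
```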
Range vector selectors (square brackets)
Select the samples in a time range. For example, return the samples of prometheus_http_requests_total from the past 5 minutes:
prometheus_http_requests_total[5m]
Time units:
- s seconds
- m minutes
- h hours
- d days
- w weeks
- y years
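Offset modifier (offset)
Shift the query back in time relative to the evaluation time to get older samples. For example, return the samples of prometheus_http_requests_total from 5 minutes ago: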
prometheus_http_requests_total offset 5m
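Offset also works on range vectors. For example, return the 5 minutes of samples of prometheus_http_requests_total that end 5 minutes ago. This is the window from 10 minutes ago to 5 minutes ago: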
prometheus_http_requests_total[5m] offset 5m
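To make the offset arithmetic less confusing, here is a small sketch that computes the time window a range selector reads. It assumes timestamps in Unix seconds. selection_window is a hypothetical helper, and it ignores whether the window boundaries are inclusive.

```python
def selection_window(eval_time, range_seconds, offset_seconds=0):
    """Return (start, end) of the window a range selector reads, in Unix seconds.

    eval_time: the evaluation timestamp.
    range_seconds: the value inside [...], e.g. 300 for [5m].
    offset_seconds: the offset modifier, e.g. 300 for "offset 5m" (0 if absent).
    Boundary inclusiveness is ignored in this sketch.
    """
    end = eval_time - offset_seconds
    return (end - range_seconds, end)


now = 1_000_000
print(selection_window(now, 300))       # (999700, 1000000): the last 5 minutes
print(selection_window(now, 300, 300))  # (999400, 999700): 10 to 5 minutes ago
```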
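Operators
Binary operators
Arithmetic binary operators
- addition +
- subtraction -
- multiplication *
- division /
- modulo %
- power ^
Comparison operators
- == equal
- != not equal
- > greater than
- < less than
- >= greater than or equal
- <= less than or equal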
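The difference between comparisons with and without the bool modifier: without bool, a comparison filters the vector. Series for which it is false are dropped, and the rest keep their original value. With bool, every series is kept, its value becomes 1 (true) or 0 (false), and the metric name is dropped:
# Expression 1
prometheus_http_requests_total{handler="/metrics"} offset 5s > 100
# Result 1
prometheus_http_requests_total{code="200",handler="/metrics",instance="localhost:9090",job="prometheus"} 305
# Expression 2
prometheus_http_requests_total{handler="/metrics"} offset 5s > bool 100
# Result 2
{code="200",handler="/metrics",instance="localhost:9090",job="prometheus"} 1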
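The difference between the two results can be modelled in a few lines. This is an illustrative sketch, not Prometheus code. It assumes an instant vector is a dict from a frozen label set to a float. compare_gt and series are hypothetical helper names. The /graph series with value 1 is made up to show a series that fails the comparison.

```python
def series(**labels):
    """Build a hashable label set for one time series."""
    return frozenset(labels.items())


def compare_gt(vector, threshold, use_bool=False):
    """Mimic `vector > threshold` (filter) and `vector > bool threshold`.

    vector: dict mapping a label set (see series) to its sample value.
    Without bool, failing series are dropped and survivors keep their value.
    With bool, every series is kept with value 1.0 or 0.0, and __name__ is dropped.
    """
    result = {}
    for labels, value in vector.items():
        if use_bool:
            without_name = frozenset(kv for kv in labels if kv[0] != "__name__")
            result[without_name] = 1.0 if value > threshold else 0.0
        elif value > threshold:
            result[labels] = value
    return result


vec = {
    series(__name__="prometheus_http_requests_total", handler="/metrics"): 305.0,
    series(__name__="prometheus_http_requests_total", handler="/graph"): 1.0,  # made up
}
print(compare_gt(vec, 100))                 # only the /metrics series, value 305.0
print(compare_gt(vec, 100, use_bool=True))  # both series, values 1.0 and 0.0, no __name__
```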
Set (logical) operators
- and (intersection)
- or (union)
- unless (complement / difference)
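The three set operators match elements by their label sets. Below is a minimal sketch. It assumes vectors are dicts keyed by frozen label sets, with the metric name left out for simplicity. The helper names and sample values are hypothetical.

```python
def series(**labels):
    """Build a hashable label set (metric name left out for simplicity)."""
    return frozenset(labels.items())


def vec_and(left, right):
    """left and right: elements of left whose label set also appears in right."""
    return {k: v for k, v in left.items() if k in right}


def vec_or(left, right):
    """left or right: all of left, plus elements of right with no match in left."""
    out = dict(left)
    out.update({k: v for k, v in right.items() if k not in left})
    return out


def vec_unless(left, right):
    """left unless right: elements of left with no matching label set in right."""
    return {k: v for k, v in left.items() if k not in right}


a = {series(handler="/metrics"): 305.0, series(handler="/graph"): 1.0}
b = {series(handler="/metrics"): 1.0, series(handler="/api/v1/query"): 22.0}
print(vec_and(a, b))     # /metrics only, keeping the left value 305.0
print(vec_or(a, b))      # /metrics (305.0), /graph and /api/v1/query
print(vec_unless(a, b))  # /graph only
```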
Vector matching
https://prometheus.fuckcloudnative.io/di-san-zhang-prometheus/di-4-jie-cha-xun/operators
How to handle samples whose labels differ between the two sides, etc.
One-to-one
One-to-many
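Aggregation operators
- sum: sum
- min: minimum
- max: maximum
- avg: average
- stddev: population standard deviation
- stdvar: population variance
- count: count the number of elements
- count_values: count the number of elements that have the same value
- bottomk: the k elements with the smallest sample values
- topk: the k elements with the largest sample values
- quantile: φ-quantile (0 ≤ φ ≤ 1) across dimensions
These operators can aggregate over all label dimensions, or preserve distinct dimensions with a without or by clause.
without removes the listed labels from the result and keeps all others. by does the opposite: the result keeps only the listed labels and drops the rest. With without and by you can aggregate the data along whichever dimensions the question calls for.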
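The by / without behaviour can be sketched as a group-and-sum over label dictionaries. This is an illustration built on assumptions. sum_by is a hypothetical helper, and the input is a list of (labels, value) pairs. The rate values below are invented so the sums come out exact.

```python
from collections import defaultdict


def sum_by(vector, by=None, without=None):
    """Sketch of `sum by (...)` and `sum without (...)`.

    vector: list of (labels_dict, value) pairs.
    Pass exactly one of `by` (labels to keep) or `without` (labels to drop).
    `without` also drops the metric name (__name__), as PromQL does.
    """
    if (by is None) == (without is None):
        raise ValueError("pass exactly one of by= or without=")
    groups = defaultdict(float)
    for labels, value in vector:
        if by is not None:
            kept = {k: v for k, v in labels.items() if k in by}
        else:
            kept = {k: v for k, v in labels.items()
                    if k not in without and k != "__name__"}
        groups[tuple(sorted(kept.items()))] += value
    return dict(groups)


# Hypothetical per-series rates, chosen so the sums are exact.
rates = [
    ({"job": "prometheus", "handler": "/metrics"}, 0.5),
    ({"job": "prometheus", "handler": "/api/v1/query"}, 0.25),
    ({"job": "node-exporter-1", "handler": "/metrics"}, 0.125),
]
print(sum_by(rates, by=["job"]))           # prometheus -> 0.75, node-exporter-1 -> 0.125
print(sum_by(rates, without=["handler"]))  # same groups as by (job) here
```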
Binary operator precedence (from highest to lowest)
1. ^
2. * / %
3. + -
4. == != > < >= <=
5. and unless
6. or
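PromQL built-in functions
- absent: returns no data if the vector passed to it has any elements, and a single series with value 1 if it has none
- ceil: round up
- count: count the number of elements (strictly an aggregation operator, listed above)
- floor: round down
- histogram_quantile: compute a quantile from histogram buckets
- increase: the increase over the given range
- irate: instantaneous per-second rate, calculated from the last two samples in the range; rate instead averages over the whole range, from the first to the last sample
- label_replace: to make charts on the client side more readable, the label_replace function can add extra labels to time series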
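# Signature
label_replace(v instant-vector, dst_label string, replacement string, src_label string, regex string)
# Example
label_replace(up, "host", "$1", "instance", "(.*):.*")
# For each series of up, the regex "(.*):.*" is matched against the value of the instance label. On a match, a new label host is set to the first capture group $1 (the part before the colon); all original labels are kept
# Output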
up{host="localhost",instance="localhost:9090",job="prometheus"} 1
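Here is a Python sketch of the same label_replace call, assuming a series is a plain dict of labels. label_replace here is a hypothetical re-implementation. It anchors the regex with re.fullmatch as PromQL does, but it only understands $1-style group references (not ${1} or named groups). Python's re is also not identical to the RE2 syntax Prometheus uses.

```python
import re


def label_replace(labels, dst_label, replacement, src_label, regex):
    """Sketch of PromQL label_replace() applied to one series' labels.

    The regex must match the whole value of src_label. On a match, dst_label is
    set to replacement with $1, $2, ... filled from the capture groups; on no
    match, the labels are returned unchanged. Only $N references are handled.
    """
    match = re.fullmatch(regex, labels.get(src_label, ""))
    if match is None:
        return dict(labels)
    # Rewrite Prometheus-style "$1" into Python's "\g<1>" template syntax.
    template = re.sub(r"\$(\d+)", r"\\g<\1>", replacement)
    out = dict(labels)
    out[dst_label] = match.expand(template)
    return out


up = {"__name__": "up", "instance": "localhost:9090", "job": "prometheus"}
print(label_replace(up, "host", "$1", "instance", "(.*):.*"))
# {'__name__': 'up', 'instance': 'localhost:9090', 'job': 'prometheus', 'host': 'localhost'}
```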
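- predict_linear: predict the value of time series v t seconds from now, using linear regression
- rate: per-second average growth rate
- sort: sort by sample value (ascending)
- sum: sum (also an aggregation operator)
- <aggregation>_over_time(): aggregate each series of a range vector over time
  avg_over_time(range-vector): the average value of each series in the range vector.
  min_over_time(range-vector): the minimum value of each series in the range vector.
  max_over_time(range-vector): the maximum value of each series in the range vector.
  sum_over_time(range-vector): the sum of all values of each series in the range vector.
  count_over_time(range-vector): the number of samples of each series in the range vector.
  quantile_over_time(scalar, range-vector): the φ-quantile (0 ≤ φ ≤ 1) of the sample values of each series in the range vector.
  stddev_over_time(range-vector): the population standard deviation of each series in the range vector.
  stdvar_over_time(range-vector): the population variance of each series in the range vector.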
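The *_over_time functions apply ordinary statistics to the samples of each series. The sketch below computes them for one series with Python's statistics module. over_time and quantile are hypothetical names. quantile assumes the linear interpolation rule rank = φ·(n − 1), and φ outside [0, 1] is not handled. The input reuses the eight counter values from the 2-minute range query earlier, treated as plain numbers.

```python
import math
import statistics


def quantile(phi, values):
    """phi-quantile by linear interpolation, assuming rank = phi * (n - 1)."""
    ordered = sorted(values)
    rank = phi * (len(ordered) - 1)
    lower = math.floor(rank)
    upper = min(lower + 1, len(ordered) - 1)
    weight = rank - lower
    return ordered[lower] * (1 - weight) + ordered[upper] * weight


def over_time(values):
    """The *_over_time aggregations for the sample values of a single series."""
    return {
        "avg": statistics.fmean(values),
        "min": min(values),
        "max": max(values),
        "sum": sum(values),
        "count": len(values),
        "quantile_0.5": quantile(0.5, values),
        "stddev": statistics.pstdev(values),    # population standard deviation
        "stdvar": statistics.pvariance(values),  # population variance
    }


samples = [26, 27, 28, 29, 30, 31, 32, 33]  # values from the 2-minute range query
for name, result in over_time(samples).items():
    print(name, result)
```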
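Using PromQL via the HTTP API
The current stable Prometheus HTTP API is available under /api/v1
API response format
The Prometheus API returns JSON. A successful call returns a 2xx status code. A failed call returns one of the following:
- 400 Bad Request: when parameters are missing or incorrect
- 422 Unprocessable Entity: when an expression cannot be executed
- 503 Service Unavailable: when a query times out or is aborted
JSON format: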
{
"status": "success" | "error",
"data": <data>,
// Only set if status is "error". The data field may still hold
// additional data.
"errorType": "<string>",
"error": "<string>"
}
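Failed calls still carry this JSON envelope, so a client can check status before touching data. Below is a minimal sketch using only the standard library. unwrap and PrometheusAPIError are hypothetical names. The error body in the example is made up, although bad_data is one of the errorType values Prometheus uses.

```python
import json


class PrometheusAPIError(Exception):
    """Raised when the response envelope has status "error"."""


def unwrap(body):
    """Return the "data" field of a Prometheus API response body (JSON text).

    Raises PrometheusAPIError with errorType and error when status is not "success".
    """
    payload = json.loads(body)
    if payload.get("status") != "success":
        raise PrometheusAPIError(
            f'{payload.get("errorType", "unknown")}: {payload.get("error", "")}'
        )
    return payload["data"]


print(unwrap('{"status": "success", "data": {"alerts": []}}'))  # {'alerts': []}
try:
    # Hypothetical error body for a malformed expression.
    unwrap('{"status": "error", "errorType": "bad_data", "error": "parse error"}')
except PrometheusAPIError as exc:
    print("query failed:", exc)  # query failed: bad_data: parse error
```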
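Instant queries
GET /api/v1/query
Parameters:
- query=<string>: the PromQL expression
- time=<rfc3339 | unix_timestamp>: the evaluation timestamp. Optional; defaults to the current server time
- timeout=<duration>: evaluation timeout. Optional; defaults to, and is capped by, the value of the -query.timeout flag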
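The same parameters can be assembled in Python with urllib.parse.urlencode, which percent-encodes the braces and quotes in the expression. instant_query_url is a hypothetical helper. base is assumed to be the server address used in the curl examples.

```python
from urllib.parse import urlencode


def instant_query_url(base, query, time=None, timeout=None):
    """Build the GET URL for /api/v1/query.

    base: server address, e.g. "http://127.0.0.1:9090".
    time: optional evaluation time (Unix seconds or an RFC 3339 string).
    timeout: optional duration string such as "10s".
    """
    params = {"query": query}
    if time is not None:
        params["time"] = time
    if timeout is not None:
        params["timeout"] = timeout
    return f"{base}/api/v1/query?{urlencode(params)}"


print(instant_query_url("http://127.0.0.1:9090", "up"))
# http://127.0.0.1:9090/api/v1/query?query=up
print(instant_query_url("http://127.0.0.1:9090", 'up{job="prometheus"}', timeout="10s"))
# http://127.0.0.1:9090/api/v1/query?query=up%7Bjob%3D%22prometheus%22%7D&timeout=10s
```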
Evaluate the expression up at the current time:
[root@david ~]$ curl -s 'http://127.0.0.1:9090/api/v1/query?query=up' | jq
{
"status": "success",
"data": {
"resultType": "vector",
"result": [
{
"metric": {
"__name__": "up",
"group": "production",
"instance": "172.17.0.5:8080",
"job": "example-random"
},
"value": [
1603954696.79,
"1"
]
},
{
"metric": {
"__name__": "up",
"group": "production",
"instance": "172.17.0.5:8081",
"job": "example-random"
},
"value": [
1603954696.79,
"1"
]
},
{
"metric": {
"__name__": "up",
"group": "test",
"instance": "172.17.0.5:8082",
"job": "example-random"
},
"value": [
1603954696.79,
"1"
]
},
{
"metric": {
"__name__": "up",
"instance": "172.17.0.3:9100",
"job": "node-exporter-1"
},
"value": [
1603954696.79,
"1"
]
},
{
"metric": {
"__name__": "up",
"instance": "localhost:9090",
"job": "prometheus"
},
"value": [
1603954696.79,
"1"
]
}
]
}
}
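Each element of result pairs a label set with a [timestamp, "value"] array, and the sample value is a string. The sketch below turns such a vector response into a plain dict. It assumes the instance label is the key you want. vector_to_dict is a hypothetical name, and the body is a trimmed copy of the response above.

```python
import json


def vector_to_dict(body, key_label="instance"):
    """Map key_label's value to the sample value in a "vector" query response.

    body: raw JSON text from /api/v1/query. Sample values arrive as strings
    such as "1", so they are converted to float.
    """
    data = json.loads(body)["data"]
    if data["resultType"] != "vector":
        raise ValueError(f"expected a vector result, got {data['resultType']}")
    return {
        item["metric"].get(key_label, ""): float(item["value"][1])
        for item in data["result"]
    }


# Trimmed copy of the `up` response above.
body = """{"status": "success", "data": {"resultType": "vector", "result": [
  {"metric": {"__name__": "up", "instance": "172.17.0.3:9100", "job": "node-exporter-1"},
   "value": [1603954696.79, "1"]},
  {"metric": {"__name__": "up", "instance": "localhost:9090", "job": "prometheus"},
   "value": [1603954696.79, "1"]}]}}"""
print(vector_to_dict(body))  # {'172.17.0.3:9100': 1.0, 'localhost:9090': 1.0}
```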
Response data types
{
"resultType": "matrix" | "vector" | "scalar" | "string",
"result": <value>
}
- vector: instant vector
- matrix: range vector
- scalar: scalar
- string: string
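Range queries
GET /api/v1/query_range
Request parameters:
- query=<string>: the PromQL expression
- start=<rfc3339 | unix_timestamp>: start timestamp
- end=<rfc3339 | unix_timestamp>: end timestamp
- step=<duration | float>: query resolution step width
- timeout=<duration>: evaluation timeout. Optional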
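The number of points in a range query follows from start, end and step: the expression is evaluated at start, start + step, and so on, up to end. The sketch below assumes Unix-second timestamps and a step in seconds. range_query_url and expected_points are hypothetical helpers. Using the first and last timestamps from the demo output below (1603958295 and 1603958415, step 30), it predicts 5 points per series, which matches that response. A series with missing samples may return fewer.

```python
from urllib.parse import urlencode


def range_query_url(base, query, start, end, step):
    """Build the GET URL for /api/v1/query_range (Unix-second start/end, step in seconds)."""
    params = {"query": query, "start": start, "end": end, "step": step}
    return f"{base}/api/v1/query_range?{urlencode(params)}"


def expected_points(start, end, step):
    """Count evaluation timestamps start, start + step, ... not exceeding end."""
    return int((end - start) // step) + 1


start, end = 1603958295, 1603958415  # first and last timestamps in the demo output
print(range_query_url("http://127.0.0.1:9090", "prometheus_http_requests_total", start, end, 30))
print(expected_points(start, end, 30))  # 5
```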
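Demo: query the Prometheus HTTP request counters over the last 2 minutes with a 30s step:
# Note: Prometheus works in UTC; Unix timestamps from date +%s are already UTC-based, so they can be passed directly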
[root@david ~]$ endtime=$(date +%s --utc);starttime=$(date +%s -d "-2 min" --utc)
[root@david ~]$ curl --location --request GET "http://127.0.0.1:9090/api/v1/query_range?query=prometheus_http_requests_total&start=${starttime}&end=${endtime}&step=30"
{"status":"success","data":{"resultType":"matrix","result":[{"metric":{"__name__":"prometheus_http_requests_total","code":"200","handler":"/api/v1/label/:name/values","instance":"localhost:9090","job":"prometheus"},"values":[[1603958295,"1"],[1603958325,"1"],[1603958355,"1"],[1603958385,"1"],[1603958415,"1"]]},{"metric":{"__name__":"prometheus_http_requests_total","code":"200","handler":"/api/v1/query","instance":"localhost:9090","job":"prometheus"},"values":[[1603958295,"34"],[1603958325,"34"],[1603958355,"34"],[1603958385,"34"],[1603958415,"34"]]},{"metric":{"__name__":"prometheus_http_requests_total","code":"200","handler":"/api/v1/query_range","instance":"localhost:9090","job":"prometheus"},"values":[[1603958295,"3"],[1603958325,"4"],[1603958355,"4"],[1603958385,"4"],[1603958415,"4"]]},{"metric":{"__name__":"prometheus_http_requests_total","code":"200","handler":"/metrics","instance":"localhost:9090","job":"prometheus"},"values":[[1603958295,"1273"],[1603958325,"1275"],[1603958355,"1277"],[1603958385,"1279"],[1603958415,"1281"]]},{"metric":{"__name__":"prometheus_http_requests_total","code":"200","handler":"/static/*filepath","instance":"localhost:9090","job":"prometheus"},"values":[[1603958295,"2"],[1603958325,"2"],[1603958355,"2"],[1603958385,"2"],[1603958415,"2"]]},{"metric":{"__name__":"prometheus_http_requests_total","code":"400","handler":"/api/v1/query","instance":"localhost:9090","job":"prometheus"},"values":[[1603958295,"9"],[1603958325,"9"],[1603958355,"9"],[1603958385,"9"],[1603958415,"9"]]},{"metric":{"__name__":"prometheus_http_requests_total","code":"400","handler":"/api/v1/query_range","instance":"localhost:9090","job":"prometheus"},"values":[[1603958295,"7"],[1603958325,"7"],[1603958355,"7"],[1603958385,"7"],[1603958415,"7"]]},{"metric":{"__name__":"prometheus_http_requests_total","code":"404","handler":"/static/*filepath","instance":"localhost:9090","job":"prometheus"},"values":[[1603958295,"2"],[1603958325,"2"],[1603958355,"2"],[1603958385,"2"],[1603958415,"2"]]}]}}
Querying metadata
Find time series by label selectors
GET /api/v1/series
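Request parameters:
- match[]=<series_selector>: a repeated series selector argument that selects the series to return. At least one match[] argument must be provided.
- start=<rfc3339 | unix_timestamp>: start timestamp.
- end=<rfc3339 | unix_timestamp>: end timestamp.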
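Because match[] is repeated, the query string must carry one match[]=... pair per selector. The sketch below uses urlencode on a list of pairs and assumes Unix-second start/end values. series_url is a hypothetical helper, and the second selector is only an example. Percent-encoding the brackets also keeps curl from treating them as a URL glob; the alternative is curl's -g/--globoff option.

```python
from urllib.parse import urlencode


def series_url(base, selectors, start=None, end=None):
    """Build the GET URL for /api/v1/series with one match[] pair per selector."""
    params = [("match[]", selector) for selector in selectors]
    if start is not None:
        params.append(("start", start))
    if end is not None:
        params.append(("end", end))
    return f"{base}/api/v1/series?{urlencode(params)}"


print(series_url("http://127.0.0.1:9090", ["up", 'prometheus_http_requests_total{code="400"}']))
```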
Querying label values
GET /api/v1/label/<label_name>/values
[root@david ~]$ curl -s 'http://127.0.0.1:9090/api/v1/label/handler/values' | jq
{
"status": "success",
"data": [
"/",
"/-/reload",
"/alerts",
"/api/v1/label/:name/values",
"/api/v1/query",
"/api/v1/query_range",
"/api/v1/series",
"/graph",
"/metrics",
"/static/*filepath",
"/targets"
]
}
Getting all label names
[root@david ~]$ curl -s 'http://127.0.0.1:9090/api/v1/labels' | jq
{
"status": "success",
"data": [
"__name__",
"address",
"branch",
"broadcast",
"call",
"cause",
"code",
"collector",
"config",
"cpu",
"device",
"dialer_name",
"domainname",
"duplex",
"endpoint",
"event",
"fstype",
"goversion",
"group",
"handler",
"instance",
"interval",
"ip",
"job",
"le",
"listener_name",
"machine",
"mode",
"mountpoint",
"name",
"nodename",
"operstate",
"path",
"quantile",
"queue",
"reason",
"release",
"revision",
"role",
"rule_group",
"scrape_job",
"service",
"slice",
"sysname",
"type",
"version"
]
}
Querying targets
[root@david ~]$ curl -s 'http://127.0.0.1:9090/api/v1/targets' | jq
{
"status": "success",
"data": {
"activeTargets": [
{
"discoveredLabels": {
"__address__": "172.17.0.5:8082",
"__metrics_path__": "/metrics",
"__scheme__": "http",
"group": "test",
"job": "example-random"
},
"labels": {
"group": "test",
"instance": "172.17.0.5:8082",
"job": "example-random"
},
"scrapePool": "example-random",
"scrapeUrl": "http://172.17.0.5:8082/metrics",
"globalUrl": "http://172.17.0.5:8082/metrics",
"lastError": "",
"lastScrape": "2020-10-29T08:16:43.5058997Z",
"lastScrapeDuration": 0.0018196,
"health": "up"
},
{
"discoveredLabels": {
"__address__": "172.17.0.5:8080",
"__metrics_path__": "/metrics",
"__scheme__": "http",
"group": "production",
"job": "example-random"
},
"labels": {
"group": "production",
"instance": "172.17.0.5:8080",
"job": "example-random"
},
"scrapePool": "example-random",
"scrapeUrl": "http://172.17.0.5:8080/metrics",
"globalUrl": "http://172.17.0.5:8080/metrics",
"lastError": "",
"lastScrape": "2020-10-29T08:16:40.5601044Z",
"lastScrapeDuration": 0.0102266,
"health": "up"
},
{
"discoveredLabels": {
"__address__": "172.17.0.5:8081",
"__metrics_path__": "/metrics",
"__scheme__": "http",
"group": "production",
"job": "example-random"
},
"labels": {
"group": "production",
"instance": "172.17.0.5:8081",
"job": "example-random"
},
"scrapePool": "example-random",
"scrapeUrl": "http://172.17.0.5:8081/metrics",
"globalUrl": "http://172.17.0.5:8081/metrics",
"lastError": "",
"lastScrape": "2020-10-29T08:16:44.3559787Z",
"lastScrapeDuration": 0.0029326,
"health": "up"
},
{
"discoveredLabels": {
"__address__": "172.17.0.3:9100",
"__metrics_path__": "/metrics",
"__scheme__": "http",
"job": "node-exporter-1"
},
"labels": {
"instance": "172.17.0.3:9100",
"job": "node-exporter-1"
},
"scrapePool": "node-exporter-1",
"scrapeUrl": "http://172.17.0.3:9100/metrics",
"globalUrl": "http://172.17.0.3:9100/metrics",
"lastError": "",
"lastScrape": "2020-10-29T08:16:32.9358176Z",
"lastScrapeDuration": 0.0147601,
"health": "up"
},
{
"discoveredLabels": {
"__address__": "localhost:9090",
"__metrics_path__": "/metrics",
"__scheme__": "http",
"job": "prometheus"
},
"labels": {
"instance": "localhost:9090",
"job": "prometheus"
},
"scrapePool": "prometheus",
"scrapeUrl": "http://localhost:9090/metrics",
"globalUrl": "http://40c6141831f7:9090/metrics",
"lastError": "",
"lastScrape": "2020-10-29T08:16:44.7169385Z",
"lastScrapeDuration": 0.003988,
"health": "up"
}
],
"droppedTargets": []
}
}
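The targets response is mostly useful for spotting unhealthy scrapes. The sketch below flattens activeTargets into tuples and assumes the body is the raw JSON text. target_health is a hypothetical name, and the sample is a single target trimmed from the output above.

```python
import json


def target_health(body):
    """Return (scrapePool, instance, health, lastError) for every active target."""
    data = json.loads(body)["data"]
    return [
        (t["scrapePool"], t["labels"].get("instance", ""), t["health"], t["lastError"])
        for t in data["activeTargets"]
    ]


# One target trimmed from the response above.
body = """{"status": "success", "data": {"activeTargets": [
  {"labels": {"instance": "localhost:9090", "job": "prometheus"},
   "scrapePool": "prometheus", "lastError": "", "health": "up"}],
  "droppedTargets": []}}"""
for pool, instance, health, error in target_health(body):
    if health != "up":
        print(f"{pool} {instance} is {health}: {error}")
    else:
        print(f"{pool} {instance} is up")
```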
Querying Prometheus server runtime information
[root@david ~]$ curl -s 'http://127.0.0.1:9090/api/v1/status/runtimeinfo' | jq
{
"status": "success",
"data": {
"startTime": "2020-10-29T02:39:47.5187166Z",
"CWD": "/prometheus",
"reloadConfigSuccess": true,
"lastConfigTime": "2020-10-29T02:39:47Z",
"corruptionCount": 0,
"goroutineCount": 41,
"GOMAXPROCS": 2,
"GOGC": "",
"GODEBUG": "",
"storageRetention": "15d"
}
}
Returning TSDB-related information
[root@david ~]$ curl -s 'http://127.0.0.1:9090/api/v1/status/tsdb' | jq
{
"status": "success",
"data": {
"headStats": {
"numSeries": 1640,
"chunkCount": 9886,
"minTime": 1603951200000,
"maxTime": 1603959549668
},
"seriesCountByMetricName": [
{
"name": "prometheus_http_request_duration_seconds_bucket",
"value": 90
},
{
"name": "prometheus_http_response_size_bytes_bucket",
"value": 81
......
Rules
[root@david ~]$ curl -s 'http://127.0.0.1:9090/api/v1/rules' | jq
{
"status": "success",
"data": {
"groups": [
{
"name": "example",
"file": "/etc/prometheus/prometheus.rules.yml",
"rules": [
{
"name": "job_service:rpc_durations_seconds_count:avg_rate5m",
"query": "avg by(job, service) (rate(rpc_durations_seconds_count[5m]))",
"health": "ok",
"evaluationTime": 0.0005387,
"lastEvaluation": "2020-10-29T08:22:09.7348335Z",
"type": "recording"
}
],
"interval": 15,
"evaluationTime": 0.0005477,
"lastEvaluation": "2020-10-29T08:22:09.7348282Z"
}
]
}
}
Alerts
[root@david ~]$ curl -s 'http://127.0.0.1:9090/api/v1/alerts' | jq
{
"status": "success",
"data": {
"alerts": []
}
}
This article was published with OpenWrite, a multi-platform blog publishing tool.