20201028 Prometheus query language (PromQL) study notes

https://prometheus.fuckcloudnative.io/di-san-zhang-prometheus/di-4-jie-cha-xun/examples

Simple examples

Selecting time series

Return all time series samples of the metric prometheus_http_requests_total:

prometheus_http_requests_total

#Result
prometheus_http_requests_total{code="200",handler="/api/v1/label/:name/values",instance="localhost:9090",job="prometheus"}	1
prometheus_http_requests_total{code="200",handler="/api/v1/query",instance="localhost:9090",job="prometheus"}	15
prometheus_http_requests_total{code="200",handler="/graph",instance="localhost:9090",job="prometheus"}	1
prometheus_http_requests_total{code="200",handler="/metrics",instance="localhost:9090",job="prometheus"}	7
prometheus_http_requests_total{code="200",handler="/static/*filepath",instance="localhost:9090",job="prometheus"}	2

Return all series of prometheus_http_requests_total whose labels include job="prometheus" and handler="/metrics":

prometheus_http_requests_total{job="prometheus",handler="/metrics"}

#Result
prometheus_http_requests_total{code="200",handler="/metrics",instance="localhost:9090",job="prometheus"}	13

Return the samples from the last 2 minutes for prometheus_http_requests_total with job="prometheus" and handler="/metrics":

prometheus_http_requests_total{job="prometheus",handler="/metrics"}[2m]

#Result: scraped every 15s, so 2 minutes gives 8 samples, as expected
#Note: this is a range vector, so it cannot be plotted on the Graph page
prometheus_http_requests_total{code="200",handler="/metrics",instance="localhost:9090",job="prometheus"}	26 @1603868399.722
27 @1603868414.722
28 @1603868429.722
29 @1603868444.722
30 @1603868459.722
31 @1603868474.722
32 @1603868489.722
33 @1603868504.722

Using a regular expression: match all series whose handler label starts with /api

prometheus_http_requests_total{handler=~"/api.*"}

#Result
prometheus_http_requests_total{code="200",handler="/api/v1/label/:name/values",instance="localhost:9090",job="prometheus"}	1
prometheus_http_requests_total{code="200",handler="/api/v1/query",instance="localhost:9090",job="prometheus"}	22
prometheus_http_requests_total{code="400",handler="/api/v1/query_range",instance="localhost:9090",job="prometheus"}	1

Using a regular expression: return all series of prometheus_http_requests_total whose HTTP status code is not 200

prometheus_http_requests_total{code!~"200"}

#Result
prometheus_http_requests_total{code="400",handler="/api/v1/query_range",instance="localhost:9090",job="prometheus"}	1

Using functions, operators, etc.

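Return the average per-second increase rate of HTTP requests to /metrics over the past 5 minutes (roughly (last value - first value) / 300s; rate() additionally corrects for counter resets and extrapolates to the window boundaries):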

rate(prometheus_http_requests_total{handler="/metrics"}[5m])

#Result, in requests per second (0.0667/s is one request every 15 s, matching the 15 s scrape interval)
{code="200",handler="/metrics",instance="localhost:9090",job="prometheus"}	0.06666666666666667
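As a rough check of the "(last value - first value) / window" description, the [2m] samples shown earlier grow from 26 to 33 over 105 s, i.e. about 0.0667 requests/s. The Python sketch below reproduces that simplified arithmetic offline, including the counter-reset correction. It deliberately leaves out the extrapolation to the window edges that the real rate() performs, so treat it as an approximation rather than Prometheus' exact algorithm.

```python
# (timestamp, value) pairs copied from the [2m] output above.
samples = [
    (1603868399.722, 26), (1603868414.722, 27),
    (1603868429.722, 28), (1603868444.722, 29),
    (1603868459.722, 30), (1603868474.722, 31),
    (1603868489.722, 32), (1603868504.722, 33),
]


def simple_rate(points):
    """Per-second increase between the first and last sample of a counter.

    A value lower than its predecessor is treated as a counter reset, so the
    new value counts as increase from zero. Unlike PromQL rate(), nothing is
    extrapolated to the edges of the selected window.
    """
    if len(points) < 2:
        raise ValueError("need at least two samples")
    increase = 0.0
    for (_, prev), (_, cur) in zip(points, points[1:]):
        increase += cur - prev if cur >= prev else cur
    return increase / (points[-1][0] - points[0][0])


print(simple_rate(samples))  # roughly 0.0667 requests per second
```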

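Return the sum of the per-second average increase rates of prometheus_http_requests_total over the past 5 minutes, grouped by job. (Put simply: compute rate() for each series over the past 5 minutes, group the series by job, and add up the rates within each group.)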

sum(rate(prometheus_http_requests_total[5m])) by (job)

#Result: only one group, because every series of this metric has job="prometheus" (only a single Prometheus server is deployed)
{job="prometheus"}	0.09473684210526315
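A minimal sketch of what sum(...) by (job) does with per-series rates. The input series, including the second job name prometheus-2, are hypothetical; the grouping rule (keep only the by labels, add up values per group) is the part being illustrated.

```python
from collections import defaultdict

# Hypothetical per-series rate() results as (labels, value) pairs.
series_rates = [
    ({"job": "prometheus", "handler": "/metrics", "code": "200"}, 0.0667),
    ({"job": "prometheus", "handler": "/api/v1/query", "code": "200"}, 0.02),
    ({"job": "prometheus-2", "handler": "/metrics", "code": "200"}, 0.0667),
]


def sum_by(samples, *by_labels):
    """Emulate `sum(...) by (by_labels)`: keep only the listed labels, add up values per group."""
    groups = defaultdict(float)
    for labels, value in samples:
        # Series lacking a grouping label fall into the group where it is empty.
        key = tuple((name, labels.get(name, "")) for name in by_labels)
        groups[key] += value
    return dict(groups)


for key, total in sum_by(series_rates, "job").items():
    print(dict(key), round(total, 4))
```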

Return the memory used on the node in MiB, rounded down (MemTotal minus MemFree, so buffers and page cache count as used):

floor((node_memory_MemTotal_bytes - node_memory_MemFree_bytes) /1024 /1024)

#Result
{instance="172.17.0.3:9100",job="node-exporter-1"}	1637
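The expression is plain arithmetic on two gauges; a tiny Python equivalent, with hypothetical byte counts, makes the unit conversion (bytes to MiB, then floor) explicit:

```python
import math


def used_mib(mem_total_bytes, mem_free_bytes):
    """Mirror floor((node_memory_MemTotal_bytes - node_memory_MemFree_bytes) / 1024 / 1024)."""
    return math.floor((mem_total_bytes - mem_free_bytes) / 1024 / 1024)


# Hypothetical values: 4 GiB total, 1 GiB free.
print(used_mib(4 * 1024**3, 1 * 1024**3))  # 3072
```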

PromQL basics

Expression language data types

In the Prometheus expression language, an expression or sub-expression evaluates to one of four types:

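  • Instant vector: a set of time series, each containing a single sample, all sharing the same timestamp
  • Range vector: a set of time series, each containing a range of samples over time
  • Scalar: a simple numeric floating-point value
  • String: a simple string value

Time series selectors
Instant vector selectors (curly braces)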

Filter by labels, with regular expression support. For example, return all series of prometheus_http_requests_total whose HTTP status code is not 200:

prometheus_http_requests_total{code!~"200"}

Label matching operators:

  • = equal
  • != not equal
  • =~ regex match
  • !~ regex does not match

Every PromQL selector must contain a metric name or at least one label matcher that does not match the empty string.
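To make the four matcher operators concrete, here is a small Python sketch of how a single matcher is evaluated. It assumes Python's re is close enough to the Go RE2 syntax Prometheus uses for these simple patterns; the key point is that regex matchers are anchored at both ends, hence re.fullmatch. The last line shows why {handler=~".*"} alone is not a valid selector: that matcher also matches the empty string.

```python
import re


def matches(value, op, pattern):
    """Evaluate one PromQL label matcher against a label value.

    A missing label behaves like the empty string. Regex matchers are fully
    anchored, so re.fullmatch is used rather than re.search.
    """
    if op == "=":
        return value == pattern
    if op == "!=":
        return value != pattern
    if op == "=~":
        return re.fullmatch(pattern, value) is not None
    if op == "!~":
        return re.fullmatch(pattern, value) is None
    raise ValueError(f"unknown matcher {op!r}")


handlers = ["/api/v1/query", "/metrics", "/graph", "/api/v1/query_range"]
print([h for h in handlers if matches(h, "=~", "/api.*")])  # ['/api/v1/query', '/api/v1/query_range']
print(matches("", "=~", ".*"))  # True: this matcher also matches the empty string
```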

Range vector selectors (square brackets)

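Select the samples within a time range. For example, return all samples of prometheus_http_requests_total from the last 5 minutes (wrap it in rate() to get the per-second increase rate):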

prometheus_http_requests_total[5m]

Time units:

  • s seconds
  • m minutes
  • h hours
  • d days
  • w weeks
  • y years
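Newer Prometheus releases also accept ms and allow units to be combined, longest first (for example 1h30m), with d, w and y treated as fixed 24 h, 7 d and 365 d. A minimal Python sketch of that parsing, for illustration only (parse_duration is a hypothetical helper, not part of any Prometheus client):

```python
import re

# Seconds per unit. Prometheus treats d, w and y as fixed lengths (24h, 7d, 365d).
UNIT_SECONDS = {"y": 31_536_000, "w": 604_800, "d": 86_400,
                "h": 3_600, "m": 60, "s": 1, "ms": 0.001}
UNIT_ORDER = ["y", "w", "d", "h", "m", "s", "ms"]


def parse_duration(text):
    """Convert a PromQL-style duration such as '5m' or '1h30m' into seconds."""
    # 'ms' is listed first so that '5ms' is not read as '5m' plus a stray 's'.
    parts = re.findall(r"(\d+)(ms|y|w|d|h|m|s)", text)
    # Reject leftovers such as '5x' or '5m3'.
    if not parts or "".join(n + u for n, u in parts) != text:
        raise ValueError(f"invalid duration: {text!r}")
    # Combined units must go from longest to shortest, each used once.
    order = [UNIT_ORDER.index(u) for _, u in parts]
    if order != sorted(set(order)):
        raise ValueError(f"units out of order: {text!r}")
    return sum(int(n) * UNIT_SECONDS[u] for n, u in parts)


print(parse_duration("5m"), parse_duration("1h30m"))  # 300 5400
```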

Time offset (offset)

Shift the evaluation time into the past relative to now. For example, return the value of prometheus_http_requests_total as of 5 minutes ago:

prometheus_http_requests_total offset 5m

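offset also works on range vectors. For example, return the samples of prometheus_http_requests_total in the 5-minute window that ended 5 minutes ago, i.e. from 10 minutes ago until 5 minutes ago (a bit of a brain-twister):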

prometheus_http_requests_total[5m] offset 5m
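The offset arithmetic is easy to get backwards, so here is a sketch that computes which time span a selector reads at a given evaluation time. The timestamp is hypothetical, and only the start and end values are modeled (whether a sample exactly at the window start is included has differed between Prometheus versions).

```python
def selected_window(eval_time, range_seconds, offset_seconds=0):
    """Time span read by `metric[range] offset offset_seconds` at eval_time (Unix seconds)."""
    end = eval_time - offset_seconds  # offset moves the whole window into the past
    return end - range_seconds, end


now = 1_603_868_504  # hypothetical evaluation time
print(selected_window(now, 300))       # [5m]:           (1603868204, 1603868504)
print(selected_window(now, 300, 300))  # [5m] offset 5m: (1603867904, 1603868204)
```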

Operators

Binary operators
Arithmetic binary operators
  • addition +
  • subtraction -
  • multiplication *
  • division /
  • modulo %
  • power (exponentiation) ^
Comparison binary operators
  • == equal
  • != not equal
  • > greater than
  • < less than
  • >= greater than or equal
  • <= less than or equal

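Comparison with and without the bool modifier: by default a comparison acts as a filter, dropping series for which it is false while the rest keep their original value; with bool, every series is kept, its value becomes 1 (true) or 0 (false), and the metric name is dropped: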

#Expression 1
prometheus_http_requests_total{handler="/metrics"} offset 5s > 100
#Result 1
prometheus_http_requests_total{code="200",handler="/metrics",instance="localhost:9090",job="prometheus"}	305

#Expression 2
prometheus_http_requests_total{handler="/metrics"} offset 5s > bool 100
#Result 2
{code="200",handler="/metrics",instance="localhost:9090",job="prometheus"}	1
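The filter-versus-bool behavior for a vector compared with a scalar can be sketched in Python as follows. This is a simplified model (vector-to-vector comparisons and vector matching are not covered), using the /metrics series from the example above.

```python
import operator

# PromQL comparison operators mapped to Python functions.
OPS = {"==": operator.eq, "!=": operator.ne, ">": operator.gt,
       "<": operator.lt, ">=": operator.ge, "<=": operator.le}


def compare(vector, op, scalar, use_bool=False):
    """Emulate `vector OP scalar` (filter) and `vector OP bool scalar`.

    vector is a list of (labels, value) pairs. Without bool, series for which
    the comparison is false are dropped and the rest keep labels and value.
    With bool, every series is kept, its value becomes 1.0 or 0.0 and the
    metric name (__name__) is removed.
    """
    out = []
    for labels, value in vector:
        ok = OPS[op](value, scalar)
        if use_bool:
            kept = {k: v for k, v in labels.items() if k != "__name__"}
            out.append((kept, 1.0 if ok else 0.0))
        elif ok:
            out.append((labels, value))
    return out


sample_vector = [({"__name__": "prometheus_http_requests_total", "handler": "/metrics"}, 305.0)]
print(compare(sample_vector, ">", 100))        # the series is kept, name and value 305.0 intact
print(compare(sample_vector, ">", 100, True))  # [({'handler': '/metrics'}, 1.0)]
```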

Logical/set binary operators
  • and (intersection)
  • or (union)
  • unless (complement: elements of the left side with no match on the right)
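A sketch of the three set operators, assuming the default matching on all labels except the metric name (on(...) and ignoring(...) are not modeled). The vectors a and b are hypothetical:

```python
def signature(labels):
    """Labels used for matching by default: everything except the metric name."""
    return frozenset((k, v) for k, v in labels.items() if k != "__name__")


def vec_and(left, right):
    """left and right: left elements that have a matching label set on the right."""
    keys = {signature(lbl) for lbl, _ in right}
    return [(lbl, v) for lbl, v in left if signature(lbl) in keys]


def vec_or(left, right):
    """left or right: all of left, plus right elements with no match on the left."""
    keys = {signature(lbl) for lbl, _ in left}
    return left + [(lbl, v) for lbl, v in right if signature(lbl) not in keys]


def vec_unless(left, right):
    """left unless right: left elements with no matching label set on the right."""
    keys = {signature(lbl) for lbl, _ in right}
    return [(lbl, v) for lbl, v in left if signature(lbl) not in keys]


# Hypothetical vectors that differ only in their instance label.
a = [({"__name__": "up", "instance": "a"}, 1.0), ({"__name__": "up", "instance": "b"}, 0.0)]
b = [({"__name__": "x", "instance": "b"}, 5.0), ({"__name__": "x", "instance": "c"}, 7.0)]
print([lbl["instance"] for lbl, _ in vec_and(a, b)])     # ['b']
print([lbl["instance"] for lbl, _ in vec_or(a, b)])      # ['a', 'b', 'c']
print([lbl["instance"] for lbl, _ in vec_unless(a, b)])  # ['a']
```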
Vector matching

https://prometheus.fuckcloudnative.io/di-san-zhang-prometheus/di-4-jie-cha-xun/operators

Covers how a binary operation matches samples when the label sets on the two sides differ, and so on:

One-to-one
One-to-many / many-to-one

Aggregation operators
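  • sum: sum
  • min: minimum
  • max: maximum
  • avg: average
  • stddev: population standard deviation
  • stdvar: population standard variance
  • count: number of elements
  • count_values: number of elements with the same value
  • bottomk: the k elements with the smallest sample values
  • topk: the k elements with the largest sample values
  • quantile: φ-quantile (0 ≤ φ ≤ 1) over dimensions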

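These operators can aggregate over all label dimensions, or preserve distinct dimensions by adding a without or by clause.

without removes the listed labels from the result and keeps all the others. by does the opposite: only the listed labels are kept in the result vector and all others are dropped. Together, without and by let you aggregate data along whichever sample dimensions you need.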
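The by/without grouping rule can be sketched as a small Python function; the sample series are hypothetical, and func can be any Python aggregate such as sum or max:

```python
from collections import defaultdict


def aggregate(samples, func, by=None, without=None):
    """Emulate `func(...) by (...)` and `func(...) without (...)` over (labels, value) pairs."""
    groups = defaultdict(list)
    for labels, value in samples:
        if without is not None:
            # without: drop the listed labels (and the metric name), keep the rest.
            drop = set(without) | {"__name__"}
            key = {k: v for k, v in labels.items() if k not in drop}
        else:
            # by: keep only the listed labels; no clause at all behaves like by ().
            keep = set(by or ())
            key = {k: v for k, v in labels.items() if k in keep}
        groups[frozenset(key.items())].append(value)
    return {key: func(values) for key, values in groups.items()}


# Hypothetical series of one metric across two jobs and two instances.
example_samples = [
    ({"__name__": "m", "job": "p", "instance": "a"}, 1.0),
    ({"__name__": "m", "job": "p", "instance": "b"}, 2.0),
    ({"__name__": "m", "job": "n", "instance": "a"}, 4.0),
]
print(aggregate(example_samples, sum, by=["job"]))
print(aggregate(example_samples, sum, without=["instance"]))
```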

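Binary operator precedence (highest to lowest)
1. ^
2. * / %
3. + -
4. == != > < >= <=
5. and unless
6. or

Operators on the same level are left-associative, except ^, which is right-associative.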

PromQL built-in functions

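  • absent: returns an empty vector (no data) if the vector passed to it has any elements, and a single-element vector with value 1 if it has none
  • ceil: rounds up to the nearest integer
  • count: counts elements (strictly speaking an aggregation operator, listed above, rather than a function)
  • floor: rounds down to the nearest integer
  • histogram_quantile: calculates a φ-quantile from histogram buckets
  • increase: the increase of a counter over the range vector
  • irate: per-second instant rate, computed from the last two samples in the range; rate instead averages over the whole range (first to last sample)
  • label_replace: adds or rewrites a label on each time series, e.g. to make client-side charts more readable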
#Signature
label_replace(v instant-vector, dst_label string, replacement string, src_label string, regex string)
#Example
label_replace(up, "host", "$1", "instance",  "(.*):.*")
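#For each series of up, the regex "(.*):.*" is matched against the value of the instance label; on a match, a new host label is set to the first capture group ($1). instance itself is kept, and series that do not match are returned unchanged.
#Output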
up{host="localhost",instance="localhost:9090",job="prometheus"}   1
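  • predict_linear: predicts the value of a series t seconds from now, using simple linear regression over the range vector v
  • rate: per-second average rate of increase over the range vector
  • sort: sorts elements by sample value, ascending
  • sum: sum (also an aggregation operator rather than a function)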
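  • <aggregation>_over_time(): aggregate each series of a range vector over time
  • avg_over_time(range-vector): the average value of all samples in the range, per series.
    min_over_time(range-vector): the minimum value of all samples in the range, per series.
    max_over_time(range-vector): the maximum value of all samples in the range, per series.
    sum_over_time(range-vector): the sum of all sample values in the range, per series.
    count_over_time(range-vector): the number of samples in the range, per series.
    quantile_over_time(scalar, range-vector): the φ-quantile (0 ≤ φ ≤ 1) of the sample values in the range, per series.
    stddev_over_time(range-vector): the population standard deviation of the sample values in the range, per series.
    stdvar_over_time(range-vector): the population standard variance of the sample values in the range, per series.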
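The label_replace example above can be modeled roughly like this: the regex is matched against the whole value of instance, and host receives the first capture group. It is a simplified sketch: the regex runs through Python's re instead of Go's RE2, and only the $N expansion form is handled.

```python
import re


def label_replace(series, dst_label, replacement, src_label, regex):
    """Simplified label_replace() over a list of label dicts.

    The regex is fully anchored and matched against the value of src_label.
    On a match, dst_label is set to `replacement` with $1, $2, ... expanded;
    an empty result removes dst_label. Non-matching series stay unchanged.
    """
    pattern = re.compile(regex)
    out = []
    for labels in series:
        new = dict(labels)  # never mutate the input series
        m = pattern.fullmatch(labels.get(src_label, ""))
        if m:
            def group(ref):
                n = int(ref.group(1))
                # Unknown or unmatched groups expand to the empty string.
                return (m.group(n) or "") if n <= pattern.groups else ""
            value = re.sub(r"\$(\d+)", group, replacement)
            if value:
                new[dst_label] = value
            else:
                new.pop(dst_label, None)
        out.append(new)
    return out


up = [{"__name__": "up", "instance": "localhost:9090", "job": "prometheus"}]
print(label_replace(up, "host", "$1", "instance", "(.*):.*"))
```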

HTTP API中使用PromQL


The current stable Prometheus HTTP API is served under /api/v1.

API response format

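The Prometheus API returns JSON. Successful calls return a 2xx status code; failures use one of the following:

  • 400 Bad Request: when parameters are missing or incorrect
  • 422 Unprocessable Entity: when an expression cannot be executed
  • 503 Service Unavailable: when a query times out or is aborted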

JSON format:

{
  "status": "success" | "error",
  "data": <data>,

  // Only set if status is "error". The data field may still hold
  // additional data.
  "errorType": "<string>",
  "error": "<string>"
}
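Every endpoint uses this envelope, so a client typically checks status first and then reads data. A minimal Python sketch (PrometheusAPIError and unwrap are hypothetical names, not from an official client library):

```python
import json


class PrometheusAPIError(Exception):
    """Raised when a response has status "error"."""


def unwrap(body):
    """Return the `data` field of a Prometheus API JSON response, or raise on error."""
    payload = json.loads(body)
    if payload.get("status") != "success":
        # errorType/error are only set on failures; data may still be present.
        raise PrometheusAPIError(f"{payload.get('errorType')}: {payload.get('error')}")
    return payload["data"]


print(unwrap('{"status": "success", "data": {"alerts": []}}'))  # {'alerts': []}
```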

Instant queries
GET /api/v1/query

Parameters:

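  • query=<string>: PromQL expression
  • time=<rfc3339 | unix_timestamp>: evaluation timestamp. Optional; defaults to the current server time
  • timeout=<duration>: evaluation timeout. Optional; defaults to the global -query.timeout flag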

Evaluate the expression up at the current time:


[root@david ~]$ curl -s  'http://127.0.0.1:9090/api/v1/query?query=up' | jq
{
  "status": "success",
  "data": {
    "resultType": "vector",
    "result": [
      {
        "metric": {
          "__name__": "up",
          "group": "production",
          "instance": "172.17.0.5:8080",
          "job": "example-random"
        },
        "value": [
          1603954696.79,
          "1"
        ]
      },
      {
        "metric": {
          "__name__": "up",
          "group": "production",
          "instance": "172.17.0.5:8081",
          "job": "example-random"
        },
        "value": [
          1603954696.79,
          "1"
        ]
      },
      {
        "metric": {
          "__name__": "up",
          "group": "test",
          "instance": "172.17.0.5:8082",
          "job": "example-random"
        },
        "value": [
          1603954696.79,
          "1"
        ]
      },
      {
        "metric": {
          "__name__": "up",
          "instance": "172.17.0.3:9100",
          "job": "node-exporter-1"
        },
        "value": [
          1603954696.79,
          "1"
        ]
      },
      {
        "metric": {
          "__name__": "up",
          "instance": "localhost:9090",
          "job": "prometheus"
        },
        "value": [
          1603954696.79,
          "1"
        ]
      }
    ]
  }
}
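The same instant query can be issued from code. The sketch below uses only the Python standard library; BASE_URL assumes the local server from the curl example, and query() needs that server to be reachable. Note that each sample value arrives as a string ("1") next to a float Unix timestamp, so it has to be converted.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://127.0.0.1:9090"  # assumption: same local server as the curl example


def instant_query_url(expr, time=None, base=BASE_URL):
    """Build a GET /api/v1/query URL; time is optional (server time if omitted)."""
    params = {"query": expr}
    if time is not None:
        params["time"] = str(time)
    return f"{base}/api/v1/query?{urllib.parse.urlencode(params)}"


def vector_samples(data):
    """Turn a resultType=vector `data` object into (labels, timestamp, float) tuples."""
    if data["resultType"] != "vector":
        raise ValueError(f"expected vector, got {data['resultType']}")
    # Sample values are JSON strings such as "1", so convert them explicitly.
    return [(r["metric"], r["value"][0], float(r["value"][1])) for r in data["result"]]


def query(expr, base=BASE_URL):
    """Run an instant query against a live server (requires network access)."""
    with urllib.request.urlopen(instant_query_url(expr, base=base), timeout=10) as resp:
        payload = json.load(resp)
    if payload["status"] != "success":
        raise RuntimeError(f"{payload.get('errorType')}: {payload.get('error')}")
    return vector_samples(payload["data"])


print(instant_query_url("up"))  # http://127.0.0.1:9090/api/v1/query?query=up
```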

Response data types
{
  "resultType": "matrix" | "vector" | "scalar" | "string",
  "result": <value>
}
  • vector: instant vector
  • matrix: range vector
  • scalar: scalar
  • string: string

Range queries
GET /api/v1/query_range

Parameters:

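  • query=<string>: PromQL expression
  • start=<rfc3339 | unix_timestamp>: start timestamp
  • end=<rfc3339 | unix_timestamp>: end timestamp
  • step=<duration | float>: query resolution step width, as a duration or a number of seconds
  • timeout=<duration>: evaluation timeout. Optional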

Demo: query the Prometheus HTTP request counters over the last 2 minutes at a 30s step:

#Note: Prometheus works in UTC. Unix timestamps from date +%s are timezone-independent, so they can be passed directly
[root@david ~]$ endtime=$(date +%s --utc);starttime=$(date +%s -d "-2 min" --utc)
[root@david ~]$ curl --location --request GET "http://127.0.0.1:9090/api/v1/query_range?query=prometheus_http_requests_total&start=${starttime}&end=${endtime}&step=30"
{"status":"success","data":{"resultType":"matrix","result":[{"metric":{"__name__":"prometheus_http_requests_total","code":"200","handler":"/api/v1/label/:name/values","instance":"localhost:9090","job":"prometheus"},"values":[[1603958295,"1"],[1603958325,"1"],[1603958355,"1"],[1603958385,"1"],[1603958415,"1"]]},{"metric":{"__name__":"prometheus_http_requests_total","code":"200","handler":"/api/v1/query","instance":"localhost:9090","job":"prometheus"},"values":[[1603958295,"34"],[1603958325,"34"],[1603958355,"34"],[1603958385,"34"],[1603958415,"34"]]},{"metric":{"__name__":"prometheus_http_requests_total","code":"200","handler":"/api/v1/query_range","instance":"localhost:9090","job":"prometheus"},"values":[[1603958295,"3"],[1603958325,"4"],[1603958355,"4"],[1603958385,"4"],[1603958415,"4"]]},{"metric":{"__name__":"prometheus_http_requests_total","code":"200","handler":"/metrics","instance":"localhost:9090","job":"prometheus"},"values":[[1603958295,"1273"],[1603958325,"1275"],[1603958355,"1277"],[1603958385,"1279"],[1603958415,"1281"]]},{"metric":{"__name__":"prometheus_http_requests_total","code":"200","handler":"/static/*filepath","instance":"localhost:9090","job":"prometheus"},"values":[[1603958295,"2"],[1603958325,"2"],[1603958355,"2"],[1603958385,"2"],[1603958415,"2"]]},{"metric":{"__name__":"prometheus_http_requests_total","code":"400","handler":"/api/v1/query","instance":"localhost:9090","job":"prometheus"},"values":[[1603958295,"9"],[1603958325,"9"],[1603958355,"9"],[1603958385,"9"],[1603958415,"9"]]},{"metric":{"__name__":"prometheus_http_requests_total","code":"400","handler":"/api/v1/query_range","instance":"localhost:9090","job":"prometheus"},"values":[[1603958295,"7"],[1603958325,"7"],[1603958355,"7"],[1603958385,"7"],[1603958415,"7"]]},{"metric":{"__name__":"prometheus_http_requests_total","code":"404","handler":"/static/*filepath","instance":"localhost:9090","job":"prometheus"},"values":[[1603958295,"2"],[1603958325,"2"],[1603958355,"2"],[1603958385,"2"]
,[1603958415,"2"]]}]}}
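With start and end two minutes apart and step=30, each series is evaluated at start, start+30, and so on up to end, which is why every series above has 5 values. A hedged sketch of building the URL and of that point count (range_query_url and expected_points are illustrative helpers; a series with gaps can return fewer points):

```python
import urllib.parse


def range_query_url(expr, start, end, step, base="http://127.0.0.1:9090"):
    """Build a GET /api/v1/query_range URL (start/end as Unix seconds, step in seconds)."""
    params = {"query": expr, "start": start, "end": end, "step": step}
    return f"{base}/api/v1/query_range?{urllib.parse.urlencode(params)}"


def expected_points(start, end, step):
    """Upper bound on points per series: start, start+step, ..., up to end."""
    return int((end - start) // step) + 1


print(expected_points(1603958295, 1603958415, 30))  # 5
```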

Querying metadata

Find series by label selectors

GET /api/v1/series

Parameters:

  • match[]=<series_selector>: repeated series selector argument that selects the series to return. At least one match[] argument must be provided.
  • start=<rfc3339 | unix_timestamp>: start timestamp.
  • end=<rfc3339 | unix_timestamp>: end timestamp.
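Because match[] is repeated, the query string needs one match[] pair per selector. A sketch with the standard library (series_url is a hypothetical helper; the default host matches the curl examples):

```python
import urllib.parse


def series_url(selectors, start=None, end=None, base="http://127.0.0.1:9090"):
    """Build GET /api/v1/series with one match[] parameter per selector."""
    if not selectors:
        raise ValueError("at least one match[] selector is required")
    params = [("match[]", s) for s in selectors]
    if start is not None:
        params.append(("start", str(start)))
    if end is not None:
        params.append(("end", str(end)))
    return f"{base}/api/v1/series?{urllib.parse.urlencode(params)}"


print(series_url(["up", 'prometheus_http_requests_total{code!="200"}']))
```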

Querying label values
GET /api/v1/label/<label_name>/values
[root@david ~]$ curl -s 'http://127.0.0.1:9090/api/v1/label/handler/values' | jq
{
  "status": "success",
  "data": [
    "/",
    "/-/reload",
    "/alerts",
    "/api/v1/label/:name/values",
    "/api/v1/query",
    "/api/v1/query_range",
    "/api/v1/series",
    "/graph",
    "/metrics",
    "/static/*filepath",
    "/targets"
  ]
}

Getting all label names
[root@david ~]$ curl -s 'http://127.0.0.1:9090/api/v1/labels' | jq
{
  "status": "success",
  "data": [
    "__name__",
    "address",
    "branch",
    "broadcast",
    "call",
    "cause",
    "code",
    "collector",
    "config",
    "cpu",
    "device",
    "dialer_name",
    "domainname",
    "duplex",
    "endpoint",
    "event",
    "fstype",
    "goversion",
    "group",
    "handler",
    "instance",
    "interval",
    "ip",
    "job",
    "le",
    "listener_name",
    "machine",
    "mode",
    "mountpoint",
    "name",
    "nodename",
    "operstate",
    "path",
    "quantile",
    "queue",
    "reason",
    "release",
    "revision",
    "role",
    "rule_group",
    "scrape_job",
    "service",
    "slice",
    "sysname",
    "type",
    "version"
  ]
}

Querying targets
[root@david ~]$ curl -s 'http://127.0.0.1:9090/api/v1/targets' | jq
{
  "status": "success",
  "data": {
    "activeTargets": [
      {
        "discoveredLabels": {
          "__address__": "172.17.0.5:8082",
          "__metrics_path__": "/metrics",
          "__scheme__": "http",
          "group": "test",
          "job": "example-random"
        },
        "labels": {
          "group": "test",
          "instance": "172.17.0.5:8082",
          "job": "example-random"
        },
        "scrapePool": "example-random",
        "scrapeUrl": "http://172.17.0.5:8082/metrics",
        "globalUrl": "http://172.17.0.5:8082/metrics",
        "lastError": "",
        "lastScrape": "2020-10-29T08:16:43.5058997Z",
        "lastScrapeDuration": 0.0018196,
        "health": "up"
      },
      {
        "discoveredLabels": {
          "__address__": "172.17.0.5:8080",
          "__metrics_path__": "/metrics",
          "__scheme__": "http",
          "group": "production",
          "job": "example-random"
        },
        "labels": {
          "group": "production",
          "instance": "172.17.0.5:8080",
          "job": "example-random"
        },
        "scrapePool": "example-random",
        "scrapeUrl": "http://172.17.0.5:8080/metrics",
        "globalUrl": "http://172.17.0.5:8080/metrics",
        "lastError": "",
        "lastScrape": "2020-10-29T08:16:40.5601044Z",
        "lastScrapeDuration": 0.0102266,
        "health": "up"
      },
      {
        "discoveredLabels": {
          "__address__": "172.17.0.5:8081",
          "__metrics_path__": "/metrics",
          "__scheme__": "http",
          "group": "production",
          "job": "example-random"
        },
        "labels": {
          "group": "production",
          "instance": "172.17.0.5:8081",
          "job": "example-random"
        },
        "scrapePool": "example-random",
        "scrapeUrl": "http://172.17.0.5:8081/metrics",
        "globalUrl": "http://172.17.0.5:8081/metrics",
        "lastError": "",
        "lastScrape": "2020-10-29T08:16:44.3559787Z",
        "lastScrapeDuration": 0.0029326,
        "health": "up"
      },
      {
        "discoveredLabels": {
          "__address__": "172.17.0.3:9100",
          "__metrics_path__": "/metrics",
          "__scheme__": "http",
          "job": "node-exporter-1"
        },
        "labels": {
          "instance": "172.17.0.3:9100",
          "job": "node-exporter-1"
        },
        "scrapePool": "node-exporter-1",
        "scrapeUrl": "http://172.17.0.3:9100/metrics",
        "globalUrl": "http://172.17.0.3:9100/metrics",
        "lastError": "",
        "lastScrape": "2020-10-29T08:16:32.9358176Z",
        "lastScrapeDuration": 0.0147601,
        "health": "up"
      },
      {
        "discoveredLabels": {
          "__address__": "localhost:9090",
          "__metrics_path__": "/metrics",
          "__scheme__": "http",
          "job": "prometheus"
        },
        "labels": {
          "instance": "localhost:9090",
          "job": "prometheus"
        },
        "scrapePool": "prometheus",
        "scrapeUrl": "http://localhost:9090/metrics",
        "globalUrl": "http://40c6141831f7:9090/metrics",
        "lastError": "",
        "lastScrape": "2020-10-29T08:16:44.7169385Z",
        "lastScrapeDuration": 0.003988,
        "health": "up"
      }
    ],
    "droppedTargets": []
  }
}

Querying Prometheus server runtime information
[root@david ~]$ curl -s 'http://127.0.0.1:9090/api/v1/status/runtimeinfo' | jq
{
  "status": "success",
  "data": {
    "startTime": "2020-10-29T02:39:47.5187166Z",
    "CWD": "/prometheus",
    "reloadConfigSuccess": true,
    "lastConfigTime": "2020-10-29T02:39:47Z",
    "corruptionCount": 0,
    "goroutineCount": 41,
    "GOMAXPROCS": 2,
    "GOGC": "",
    "GODEBUG": "",
    "storageRetention": "15d"
  }
}

Returning TSDB statistics
[root@david ~]$ curl -s 'http://127.0.0.1:9090/api/v1/status/tsdb' | jq
{
  "status": "success",
  "data": {
    "headStats": {
      "numSeries": 1640,
      "chunkCount": 9886,
      "minTime": 1603951200000,
      "maxTime": 1603959549668
    },
    "seriesCountByMetricName": [
      {
        "name": "prometheus_http_request_duration_seconds_bucket",
        "value": 90
      },
      {
        "name": "prometheus_http_response_size_bytes_bucket",
        "value": 81
        ......

Rules
[root@david ~]$ curl -s 'http://127.0.0.1:9090/api/v1/rules' | jq
{
  "status": "success",
  "data": {
    "groups": [
      {
        "name": "example",
        "file": "/etc/prometheus/prometheus.rules.yml",
        "rules": [
          {
            "name": "job_service:rpc_durations_seconds_count:avg_rate5m",
            "query": "avg by(job, service) (rate(rpc_durations_seconds_count[5m]))",
            "health": "ok",
            "evaluationTime": 0.0005387,
            "lastEvaluation": "2020-10-29T08:22:09.7348335Z",
            "type": "recording"
          }
        ],
        "interval": 15,
        "evaluationTime": 0.0005477,
        "lastEvaluation": "2020-10-29T08:22:09.7348282Z"
      }
    ]
  }
}

Alerts
[root@david ~]$ curl -s 'http://127.0.0.1:9090/api/v1/alerts' | jq
{
  "status": "success",
  "data": {
    "alerts": []
  }
}

This article was published with OpenWrite, a multi-platform blog publishing tool.