Exporter(Linux主机监控)

  由于 Linux 操作系统自身并不支持 Prometheus,所以 Prometheus 官方提供了 Go 语言编写的 Node exporter 来实现对 linux 操作系统主机的监控数据采集。它提供了系统内几乎所有的标准指标,如 CPU、内存、磁盘空间、磁盘I/O、系统负载和网络带宽。另外它还提供了由内核公开的大量额外监控指标,从负载平均到主板温度等。

  在安装之前,首先在官方下载页面 https://github.com/prometheus/node_exporter/releases 找到最新 Node exporter 版本,下载最新版本中特定平台的二进制文件,如下:

Quartz 监控页面_Quartz 监控页面

一、部署 Node exporter

  我这里都是kubernetes环境,就不讲二进制部署了。以下是 DaemonSet 运行的 Node export 配置清单 yaml 文件,Node exporter 版本:node-exporter:v1.3.1:

kind: DaemonSet
apiVersion: apps/v1
metadata:
  name: node-exporter
  namespace: kubesphere-monitoring-system
  labels:
    app.kubernetes.io/component: exporter
    app.kubernetes.io/name: node-exporter
    app.kubernetes.io/part-of: kube-prometheus
    app.kubernetes.io/version: 1.3.1
  annotations:
    deprecated.daemonset.template.generation: '1'
    kubectl.kubernetes.io/last-applied-configuration: >
      {"apiVersion":"apps/v1","kind":"DaemonSet","metadata":{"annotations":{},"labels":{"app.kubernetes.io/component":"exporter","app.kubernetes.io/name":"node-exporter","app.kubernetes.io/part-of":"kube-prometheus","app.kubernetes.io/version":"1.3.1"},"name":"node-exporter","namespace":"kubesphere-monitoring-system"},"spec":{"selector":{"matchLabels":{"app.kubernetes.io/component":"exporter","app.kubernetes.io/name":"node-exporter","app.kubernetes.io/part-of":"kube-prometheus"}},"template":{"metadata":{"labels":{"app.kubernetes.io/component":"exporter","app.kubernetes.io/name":"node-exporter","app.kubernetes.io/part-of":"kube-prometheus","app.kubernetes.io/version":"1.3.1"}},"spec":{"affinity":{"nodeAffinity":{"requiredDuringSchedulingIgnoredDuringExecution":{"nodeSelectorTerms":[{"matchExpressions":[{"key":"node-role.kubernetes.io/edge","operator":"DoesNotExist"}]}]}}},"containers":[{"args":["--web.listen-address=127.0.0.1:9100","--path.procfs=/host/proc","--path.sysfs=/host/sys","--path.rootfs=/host/root","--no-collector.wifi","--no-collector.hwmon","--collector.filesystem.ignored-mount-points=^/(dev|proc|sys|var/lib/docker/.+)($|/)","--collector.filesystem.ignored-fs-types=^(autofs|binfmt_misc|cgroup|configfs|debugfs|devpts|devtmpfs|fusectl|hugetlbfs|mqueue|overlay|proc|procfs|pstore|rpc_pipefs|securityfs|sysfs|tracefs)$"],"image":"registry.cn-beijing.aliyuncs.com/kubesphereio/node-exporter:v1.3.1","name":"node-exporter","resources":{"limits":{"cpu":1,"memory":"500Mi"},"requests":{"cpu":"102m","memory":"180Mi"}},"volumeMounts":[{"mountPath":"/host/proc","name":"proc","readOnly":true},{"mountPath":"/host/sys","name":"sys","readOnly":true},{"mountPath":"/host/root","mountPropagation":"HostToContainer","name":"root","readOnly":true}]},{"args":["--logtostderr","--secure-listen-address=[$(IP)]:9100","--tls-cipher-suites=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_RSA_WITH_AES_128_CBC_SHA256,TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA256,TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256","--upstream=http://127.0.0.1:9100/"],"env":[{"name":"IP","valueFrom":{"fieldRef":{"fieldPath":"status.podIP"}}}],"image":"registry.cn-beijing.aliyuncs.com/kubesphereio/kube-rbac-proxy:v0.11.0","name":"kube-rbac-proxy","ports":[{"containerPort":9100,"hostPort":9100,"name":"https"}],"resources":{"limits":{"cpu":1,"memory":"100Mi"},"requests":{"cpu":"10m","memory":"20Mi"}},"securityContext":{"runAsGroup":65532,"runAsNonRoot":true,"runAsUser":65532}}],"hostNetwork":true,"hostPID":true,"nodeSelector":{"kubernetes.io/os":"linux"},"securityContext":{"runAsNonRoot":true,"runAsUser":65534},"serviceAccountName":"node-exporter","tolerations":[{"operator":"Exists"}],"volumes":[{"hostPath":{"path":"/proc"},"name":"proc"},{"hostPath":{"path":"/sys"},"name":"sys"},{"hostPath":{"path":"/"},"name":"root"}]}}}}
spec:
  selector:
    matchLabels:
      app.kubernetes.io/component: exporter
      app.kubernetes.io/name: node-exporter
      app.kubernetes.io/part-of: kube-prometheus
  template:
    metadata:
      creationTimestamp: null
      labels:
        app.kubernetes.io/component: exporter
        app.kubernetes.io/name: node-exporter
        app.kubernetes.io/part-of: kube-prometheus
        app.kubernetes.io/version: 1.3.1
    spec:
      volumes:
        - name: proc
          hostPath:
            path: /proc
            type: ''
        - name: sys
          hostPath:
            path: /sys
            type: ''
        - name: root
          hostPath:
            path: /
            type: ''
      containers:
        - name: node-exporter
          image: 'registry.cn-beijing.aliyuncs.com/kubesphereio/node-exporter:v1.3.1'
          args:
            - '--web.listen-address=127.0.0.1:9100'
            - '--path.procfs=/host/proc'
            - '--path.sysfs=/host/sys'
            - '--path.rootfs=/host/root'
            - '--no-collector.wifi'
            - '--no-collector.hwmon'
            - >-
              --collector.filesystem.ignored-mount-points=^/(dev|proc|sys|var/lib/docker/.+)($|/)
            - >-
              --collector.filesystem.ignored-fs-types=^(autofs|binfmt_misc|cgroup|configfs|debugfs|devpts|devtmpfs|fusectl|hugetlbfs|mqueue|overlay|proc|procfs|pstore|rpc_pipefs|securityfs|sysfs|tracefs)$
          resources:
            limits:
              cpu: '1'
              memory: 500Mi
            requests:
              cpu: 102m
              memory: 180Mi
          volumeMounts:
            - name: proc
              readOnly: true
              mountPath: /host/proc
            - name: sys
              readOnly: true
              mountPath: /host/sys
            - name: root
              readOnly: true
              mountPath: /host/root
              mountPropagation: HostToContainer
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
          imagePullPolicy: IfNotPresent
        - name: kube-rbac-proxy
          image: >-
            registry.cn-beijing.aliyuncs.com/kubesphereio/kube-rbac-proxy:v0.11.0
          args:
            - '--logtostderr'
            - '--secure-listen-address=[$(IP)]:9100'
            - >-
              --tls-cipher-suites=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_RSA_WITH_AES_128_CBC_SHA256,TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA256,TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256
            - '--upstream=http://127.0.0.1:9100/'
          ports:
            - name: https
              hostPort: 9100
              containerPort: 9100
              protocol: TCP
          env:
            - name: IP
              valueFrom:
                fieldRef:
                  apiVersion: v1
                  fieldPath: status.podIP
          resources:
            limits:
              cpu: '1'
              memory: 100Mi
            requests:
              cpu: 10m
              memory: 20Mi
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
          imagePullPolicy: IfNotPresent
          securityContext:
            runAsUser: 65532
            runAsGroup: 65532
            runAsNonRoot: true
      restartPolicy: Always
      terminationGracePeriodSeconds: 30
      dnsPolicy: ClusterFirst
      nodeSelector:
        kubernetes.io/os: linux
      serviceAccountName: node-exporter
      serviceAccount: node-exporter
      hostNetwork: true
      hostPID: true
      securityContext:
        runAsUser: 65534
        runAsNonRoot: true
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: node-role.kubernetes.io/edge
                    operator: DoesNotExist
      schedulerName: default-scheduler
      tolerations:
        - operator: Exists
  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1
      maxSurge: 0
  revisionHistoryLimit: 10

二、与 Prometheus 集成

  Node exporter 和 Prometheus 启动后,没有经过配置文件配置,他们还是没有进行对接关联,此时,两个程序是各自独立运行的应用程序。

  现在需要将已部署好的 node_exporter 添加到 Prometheus 服务器中。在 Prometheus 主机目录中,找到主配置文件,使用其中的静态配置功能 static_configs 来采集 node_exporter 提供的数据。

~ # cat /etc/prometheus/prometheus.yml 
# my global config
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
          # - alertmanager:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: "prometheus"

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    static_configs:
      - targets: ["localhost:9090"]
~ #

  在默认的配置文件的基础上,重新编辑 /etc/prometheus/prometheus.yaml 文件,添加 job 与 node_exporter 进行关联的参考配置文件内容如下:

scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: "prometheus"

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    static_configs:
      - targets: ["localhost:9090"]
  - job_name: "node_exporter"
    static_configs:
      - targets: ["192.168.2.121:9100"]

  配置完成后,需要我们重新启动 prometheus 或 进行动态热加载操作,使操作修改后的配置文件加载生效。

  • 1、首先,可以在 Prometheus UI 首页点开 "status" 中的 "Targets",如下图所示:
  • 2、进入 Targets 页面后,可以在列表中看到刚才配置好的 node_exporter 的状态为 "UP",说明 Prometheus 最后一次从 Node exporter 中采集数据是成功的,此刻被监控的服务器主机工作状态是正常的,如下图所示:
  • 3、我们也可以在 Prometheus UI 提供的 graph 页面,在搜索框中查找,这里就不说了。
  • 4、metrics 查看:
  • CPU 数据采集。
  • 内存数据采集:数据源来源于 /proc/meminfo 文件。
  • 磁盘数据采集:数据来源于 /proc/diskstats 文件。
  • 文件系统数据采集。
  • 网络数据采集。