Monitoring a k8s cluster with standalone Prometheus: how to monitor k8s node status

Reposted

mob64ca140beea5 2024-03-25 20:50:29


25. Monitoring a k8s cluster with Prometheus

I. node-exporter

node_exporter collects a wide range of runtime metrics from server nodes, such as conntrack, cpu, diskstats, filesystem, loadavg, meminfo, and netstat.
For more, see: https://github.com/prometheus/node_exporter

1. Deploying node-exporter as a DaemonSet

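Pull the image:
docker pull prom/node-exporter:v1.1.2
Then create the manifest with vi node-exporter-dm.yaml. The image field below points at the author's private Harbor registry (harbor.hzwod.com), so change it to prom/node-exporter:v1.1.2 or to your own registry: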

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-exporter
  namespace: kube-mon
  labels:
    name: node-exporter
spec:
  selector:
    matchLabels:
      name: node-exporter
  template:
    metadata:
      labels:
        name: node-exporter
    spec:
      hostPID: true    # share the host's PID namespace
      hostIPC: true    # share the host's IPC namespace
      hostNetwork: true    # share the host's network namespace
      containers:
      - name: node-exporter
        image: harbor.hzwod.com/k8s/prom/node-exporter:v1.1.2
        ports:
        - containerPort: 9100
        resources:
          requests:
            cpu: 150m
#        securityContext:
#          privileged: true
        args:
        - --path.rootfs
        - /host
        volumeMounts:
        - name: rootfs
          mountPath: /host
      tolerations:
      - key: "node-role.kubernetes.io/master"
        operator: "Exists"
        effect: "NoSchedule"
      volumes:
        - name: rootfs
          hostPath:
            path: /

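hostPID: true, hostIPC: true and hostNetwork: true make the node-exporter container share the host's PID, IPC and network namespaces, giving it access to the host's processes and other resources.
Note that because the pod shares the host's network namespace, containerPort: 9100 is exposed directly on port 9100 of the host, and this port serves as the metrics endpoint.
The host's / directory is mounted at /host in the container, and the flag --path.rootfs=/host tells node-exporter where to find the host's files, from which it reads host information; for example, /proc/stat provides CPU information and /proc/meminfo provides memory information.
tolerations adds a toleration to the pod so that it can also run on master nodes, because we want the masters monitored as well. Handle any other tainted nodes the same way.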
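The path mapping created by the rootfs mount can be sketched in Python as follows. This is a minimal illustration, not node_exporter's own Go code. The helper names (to_container_path, parse_meminfo, read_host_meminfo) are hypothetical, and each node_exporter collector may apply --path.rootfs differently. The sketch only shows two things: a host path P is visible at /host/P inside the container, and a /proc/meminfo-style file breaks down into named fields.

```python
import os


def to_container_path(host_path: str, rootfs: str = "/host") -> str:
    """Map a path on the host to where it appears inside the container.

    The DaemonSet mounts the host's / at rootfs, so the host file
    /proc/meminfo is visible at /host/proc/meminfo inside the container.
    """
    return os.path.join(rootfs, host_path.lstrip("/"))


def parse_meminfo(text: str) -> dict:
    """Parse /proc/meminfo content into {field: value} (values are in kB)."""
    info = {}
    for line in text.splitlines():
        # Lines look like "MemTotal:       16318412 kB"
        key, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields:
            info[key.strip()] = int(fields[0])
    return info


def read_host_meminfo(rootfs: str = "/host") -> dict:
    """Read the host's memory information through the rootfs mount."""
    with open(to_container_path("/proc/meminfo", rootfs)) as f:
        return parse_meminfo(f.read())
```

Inside the container, read_host_meminfo() reads /host/proc/meminfo. Passing rootfs="/" reads the container's own /proc/meminfo instead.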

Running kubectl apply -f node-exporter-dm.yaml with securityContext.privileged: true enabled fails.

Run kube-apiserver -h and look for the description of the flag that controls privileged containers.

Add this startup flag to kube-apiserver:
--allow-privileged=true
which allows containers to request privileged mode.

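Alternatively, remove the
securityContext.privileged: true
setting from the manifest, as in the commented-out lines above. (TODO: the impact of this is not yet known.)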

Check the metrics:
curl http://172.10.10.100:9100/metrics returns a large number of metrics.
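The response uses the Prometheus text exposition format: # HELP and # TYPE comment lines, followed by samples of the form name{labels} value. Below is a simplified Python parser for experimenting with that output. It assumes label values contain no } characters or escaped quotes, and it ignores optional timestamps. parse_metrics is a hypothetical helper, not part of any Prometheus library.

```python
import re

# A sample line: metric_name, optional {labels}, whitespace, value.
# Simplification: label values must not contain '}' or escaped quotes;
# an optional trailing timestamp is ignored.
_SAMPLE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(\{[^}]*\})?\s+(\S+)')
_LABEL = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')


def parse_metrics(text):
    """Return a list of (name, labels_dict, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and # HELP / # TYPE comments
        m = _SAMPLE.match(line)
        if not m:
            continue  # not a sample line we understand
        name, labels, value = m.groups()
        samples.append((name, dict(_LABEL.findall(labels or "")), float(value)))
    return samples
```

To try it against a live node, pass it urllib.request.urlopen("http://172.10.10.100:9100/metrics").read().decode(). For more than quick experiments, the official prometheus_client package provides a complete parser (prometheus_client.parser.text_string_to_metric_families).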

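Every node now exposes a metrics endpoint. We could configure a scrape target for each node in Prometheus, but then every new node would require another change to the Prometheus configuration. Is there a simpler way to discover nodes automatically? Next, let's look at Prometheus service discovery.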

2. Service discovery

On Kubernetes, Prometheus integrates with the Kubernetes API and currently supports five service discovery modes: Node, Service, Pod, Endpoints, and Ingress.

a. Node discovery

Add the following to the Prometheus config:

- job_name: 'kubernetes-nodes'
  kubernetes_sd_configs:
  - role: node

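kubernetes_sd_configs is the Prometheus configuration for service discovery through the Kubernetes API.
role can be node, service, pod, endpoints, or ingress, and each role provides a different set of meta labels.
For more information, see the official documentation: kubernetes_sd_config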
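To make the node role concrete, the following Python sketch builds the label set that the Prometheus documentation describes for a single Node object. Prometheus implements this in Go, and the function name and the sample node in the test are hypothetical. The documented behaviour is as follows:

- __address__ uses the first available address in the order InternalIP, ExternalIP, LegacyHostIP, Hostname, plus the kubelet port from status.daemonEndpoints.
- instance is set to the node name.
- Each node label becomes a __meta_kubernetes_node_label_<name> meta label, with unsupported characters replaced by underscores.

The kubelet port is why the discovered targets point at 10250 by default.

```python
import re

# Address-type priority of the node role, as documented by Prometheus.
ADDRESS_ORDER = ["InternalIP", "ExternalIP", "LegacyHostIP", "Hostname"]


def node_target(node):
    """Sketch of the labels the node role produces for one Node object (a dict)."""
    status = node["status"]
    addrs = {}
    for a in status["addresses"]:
        addrs.setdefault(a["type"], a["address"])  # keep the first address of each type
    # First existing address in priority order (StopIteration if the node has none).
    host = next(addrs[t] for t in ADDRESS_ORDER if t in addrs)
    port = status["daemonEndpoints"]["kubeletEndpoint"]["Port"]  # kubelet port, e.g. 10250
    name = node["metadata"]["name"]
    labels = {
        "__address__": f"{host}:{port}",
        "instance": name,
        "__meta_kubernetes_node_name": name,
    }
    for key, value in node["metadata"].get("labels", {}).items():
        # Characters not allowed in Prometheus label names become underscores.
        safe = re.sub(r"[^a-zA-Z0-9_]", "_", key)
        labels["__meta_kubernetes_node_label_" + safe] = value
    for addr_type, addr in addrs.items():
        labels["__meta_kubernetes_node_address_" + addr_type] = addr
    return labels
```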

Besides kubernetes_sd_config, Prometheus has many other configuration options; see the Prometheus configuration documentation.

After reloading Prometheus, the Targets page shows that auto-discovery works, but every endpoint returns HTTP 400.


b. Using `relabel_config` to adjust the discovered endpoints

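After discovering the nodes, Prometheus scrapes port 10250, and the requests fail. Why?
Port 10250 is the kubelet's main API port. It serves HTTPS and requires authentication, but Prometheus scrapes over plain HTTP by default, so the kubelet answers with 400. The kubelet can also serve an unauthenticated, read-only HTTP port, 10255 by default (the kubelet in this article is v1.17.16), although many installations disable it.
What we actually want is for each discovered node to be scraped on port 9100, which node-exporter serves. Even to use the kubelet's built-in metrics over HTTP, the port would have to be changed to 10255; this comes up again when configuring cAdvisor below.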

In this setup the kubelet opens port 10255 when it starts (that is, the read-only port is enabled), and curl http://[nodeIP]:10255/metrics shows its metrics.

relabel_configs lets us change the port, or other parts, of each discovered endpoint.
Update the kubernetes-nodes job in prometheus.yaml:

- job_name: 'kubernetes-nodes'
  kubernetes_sd_configs:
  - role: node
  relabel_configs:
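  - action: replace    # replace action
    source_labels: [__address__]    # list of labels whose values are joined and matched by regex
    target_label: __address__    # label that receives the result
    regex: '(.*):10250'    # regex matched against the joined source_labels value
    replacement: '${1}:9100'    # new value for the target label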

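action: replace performs a replacement.
__address__ is the label that holds the target's <host>:<port> address.
In replacement: '${1}:9100', ${1} refers to the first capture group of regex.
For more information, see relabel_configs.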
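The replace action can be modelled in a few lines of Python. This sketch follows the documented semantics and is not Prometheus's Go implementation; relabel_replace and _expand are hypothetical names. The rules it models are:

- Source label values are joined with separator (default ;).
- The regex must match the whole joined value, because Prometheus anchors it at both ends.
- ${1} and $1 in replacement refer to capture groups.
- If the expanded result is empty, the target label is removed.

```python
import re


def _expand(template, m):
    """Expand Prometheus-style $1 / ${1} references using the match's groups."""
    def sub(ref):
        idx = int(ref.group(1) or ref.group(2))
        if idx > m.re.groups:
            return ""  # a non-existent group expands to an empty string
        return m.group(idx) or ""  # an unmatched optional group is also empty
    return re.sub(r"\$\{(\d+)\}|\$(\d+)", sub, template)


def relabel_replace(labels, source_labels, target_label,
                    regex="(.*)", replacement="$1", separator=";"):
    """Sketch of the relabel 'replace' action on one target's label set."""
    value = separator.join(labels.get(name, "") for name in source_labels)
    m = re.fullmatch(regex, value)  # anchored at both ends, like Prometheus
    out = dict(labels)
    if m is None:
        return out  # no match: labels are left unchanged
    result = _expand(replacement, m)
    if result:
        out[target_label] = result
    else:
        out.pop(target_label, None)  # an empty result removes the target label
    return out
```

Because of the anchoring, regex: '10250' on its own would not match 172.10.10.100:10250, so the (.*) prefix is required. A target already on another port passes through unchanged.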

The official documentation describes __address__ as follows:
The __address__ label is set to the <host>:<port> address of the target. After relabeling, the instance label is set to the value of __address__ by default if it was not set during relabeling. The __scheme__ and __metrics_path__ labels are set to the scheme and metrics path of the target respectively. The __param_<name> label is set to the value of the first passed URL parameter called <name>

Next, add a labelmap rule that copies the Kubernetes node labels into Prometheus labels, which makes it easier to filter the monitoring data later:

  - action: labelmap
    regex: __meta_kubernetes_node_label_(.*)
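A similar Python sketch for labelmap, assuming the same $1 replacement syntax. Every label whose name fully matches regex is copied under the name produced by replacement, and the original label is kept. The second helper reflects the documented rule that labels beginning with __ are dropped once target relabeling finishes. That rule is why the __meta_kubernetes_node_label_* values only survive through labelmap. Both function names are hypothetical.

```python
import re


def labelmap(labels, regex, replacement="$1"):
    """Sketch of the relabel 'labelmap' action: copy matching labels under new names."""
    # Convert Prometheus $1 / ${1} references to Python's \g<1> syntax.
    py_repl = re.sub(r"\$\{(\d+)\}|\$(\d+)",
                     lambda g: r"\g<" + (g.group(1) or g.group(2)) + ">",
                     replacement)
    out = dict(labels)
    for name, value in labels.items():
        m = re.fullmatch(regex, name)  # the label *name* must match entirely
        if m:
            out[m.expand(py_repl)] = value  # the original label is kept as well
    return out


def drop_internal(labels):
    """Labels starting with '__' are removed once target relabeling is done."""
    return {k: v for k, v in labels.items() if not k.startswith("__")}
```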

After updating prometheus.yaml and reloading, check the Prometheus Targets page again.


c. The complete prometheus.yaml

Here is the complete Prometheus ConfigMap. prometheus.yml is stored as a ConfigMap, which Kubernetes persists in etcd:
prometheus-cm.yaml

apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-config
  namespace: kube-mon
data:
  prometheus.yml: |
    global:
      scrape_interval: 15s
      scrape_timeout: 15s
    scrape_configs:
    - job_name: 'prometheus'
      static_configs:
      - targets: ['localhost:9090']
    - job_name: 'coredns'
      static_configs:
      - targets: ['kube-dns.kube-system:9153']
    - job_name: 'traefik'
      static_configs:
      - targets: ['traefiktcp.default:8180']
    - job_name: 'kubernetes-nodes'
      kubernetes_sd_configs:
      - role: node
      relabel_configs:
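      - action: replace    # replace action
        source_labels: [__address__]    # list of labels whose values are joined and matched by regex
        target_label: __address__    # label that receives the result
        regex: '(.*):10250'    # regex matched against the joined source_labels value
        replacement: '${1}:9100'    # new value for the target label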
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.*)

3. Configuring Grafana to display node monitoring data

Grafana is already installed and the Prometheus data source was configured earlier. Now we import a Grafana dashboard template to display the node-exporter data.

Download the template: https://grafana.com/api/dashboards/8919/revisions/24/download


II. kube-state-metrics + cAdvisor

1. Configuring Prometheus to monitor cAdvisor

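cAdvisor is built into the kubelet, so it can be used directly. The job below scrapes it at /metrics/cadvisor on the kubelet's read-only port 10255: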

- job_name: 'k8s-cadvisor'
  metrics_path: /metrics/cadvisor
  kubernetes_sd_configs:
  - role: node
  relabel_configs:
  - source_labels: [__address__]
    regex: '(.*):10250'
    replacement: '${1}:10255'
    target_label: __address__
    action: replace
  - action: labelmap
    regex: __meta_kubernetes_node_label_(.+)
  metric_relabel_configs:
  - source_labels: [instance]
    separator: ;
    regex: (.+)
    target_label: node
    replacement: $1
    action: replace
  - source_labels: [pod_name]
    separator: ;
    regex: (.+)
    target_label: pod
    replacement: $1
    action: replace
  - source_labels: [container_name]
    separator: ;
    regex: (.+)
    target_label: container
    replacement: $1
    action: replace
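relabel_configs rewrite target labels before the scrape, whereas metric_relabel_configs run on every sample after it has been scraped. The rules above copy instance into node, and copy the older cAdvisor label names pod_name and container_name into pod and container.

Below is a Python sketch of how such rules act on scraped samples. It assumes the rules are given as dicts mirroring the YAML keys and that replacement is always $1. apply_metric_relabel, RULES and the sample data are hypothetical. Because the regex is (.+), which cannot match an empty string, a sample with no pod_name is left unchanged by that rule.

```python
import re

# Hypothetical rule list mirroring the k8s-cadvisor metric_relabel_configs.
RULES = [
    {"source_labels": ["instance"], "target_label": "node"},
    {"source_labels": ["pod_name"], "target_label": "pod"},
    {"source_labels": ["container_name"], "target_label": "container"},
]


def apply_metric_relabel(samples, rules):
    """Apply replace rules (regex defaults to (.+), replacement $1) to each sample."""
    out = []
    for labels in samples:
        labels = dict(labels)
        for rule in rules:
            sep = rule.get("separator", ";")
            value = sep.join(labels.get(name, "") for name in rule["source_labels"])
            m = re.fullmatch(rule.get("regex", "(.+)"), value)
            if m:  # (.+) does not match "", so absent source labels change nothing
                labels[rule["target_label"]] = m.group(1)  # replacement: $1
        out.append(labels)
    return out
```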


2. Deploying kube-state-metrics

https://github.com/kubernetes/kube-state-metrics/tree/master/examples/standard

In this section kube-state-metrics is deployed in the kube-mon namespace.
The kube-state-metrics version is v1.9.8.

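Pull the image:
docker pull quay.mirrors.ustc.edu.cn/coreos/kube-state-metrics:v1.9.8
The Deployment below references a copy in the author's private Harbor registry (harbor.hzwod.com), so adjust its image field to your own registry.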
cluster-role-binding.yaml

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  labels:
    app.kubernetes.io/name: kube-state-metrics
    app.kubernetes.io/version: 1.9.8
  name: kube-state-metrics
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: kube-state-metrics
subjects:
- kind: ServiceAccount
  name: kube-state-metrics
  namespace: kube-mon
---
apiVersion: v1
kind: ServiceAccount
metadata:
  labels:
    app.kubernetes.io/name: kube-state-metrics
    app.kubernetes.io/version: 1.9.8
  name: kube-state-metrics
  namespace: kube-mon

cluster-role.yaml

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  labels:
    app.kubernetes.io/name: kube-state-metrics
    app.kubernetes.io/version: 1.9.8
  name: kube-state-metrics
rules:
- apiGroups:
  - ""
  resources:
  - configmaps
  - secrets
  - nodes
  - pods
  - services
  - resourcequotas
  - replicationcontrollers
  - limitranges
  - persistentvolumeclaims
  - persistentvolumes
  - namespaces
  - endpoints
  verbs:
  - list
  - watch
- apiGroups:
  - apps
  resources:
  - statefulsets
  - daemonsets
  - deployments
  - replicasets
  verbs:
  - list
  - watch
- apiGroups:
  - batch
  resources:
  - cronjobs
  - jobs
  verbs:
  - list
  - watch
- apiGroups:
  - autoscaling
  resources:
  - horizontalpodautoscalers
  verbs:
  - list
  - watch
- apiGroups:
  - authentication.k8s.io
  resources:
  - tokenreviews
  verbs:
  - create
- apiGroups:
  - authorization.k8s.io
  resources:
  - subjectaccessreviews
  verbs:
  - create
- apiGroups:
  - policy
  resources:
  - poddisruptionbudgets
  verbs:
  - list
  - watch
- apiGroups:
  - certificates.k8s.io
  resources:
  - certificatesigningrequests
  verbs:
  - list
  - watch
- apiGroups:
  - storage.k8s.io
  resources:
  - storageclasses
  - volumeattachments
  verbs:
  - list
  - watch
- apiGroups:
  - admissionregistration.k8s.io
  resources:
  - mutatingwebhookconfigurations
  - validatingwebhookconfigurations
  verbs:
  - list
  - watch
- apiGroups:
  - networking.k8s.io
  resources:
  - networkpolicies
  - ingresses
  verbs:
  - list
  - watch
- apiGroups:
  - coordination.k8s.io
  resources:
  - leases
  verbs:
  - list
  - watch

deployment.yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app.kubernetes.io/name: kube-state-metrics
    app.kubernetes.io/version: 1.9.8
  name: kube-state-metrics
  namespace: kube-mon
spec:
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: kube-state-metrics
  template:
    metadata:
      labels:
        app.kubernetes.io/name: kube-state-metrics
        app.kubernetes.io/version: 1.9.8
    spec:
      containers:
      - image: harbor.hzwod.com/k8s/kube-state-metrics:v1.9.8
        livenessProbe:
          httpGet:
            path: /healthz
            port: 8080
          initialDelaySeconds: 5
          timeoutSeconds: 5
        name: kube-state-metrics
        ports:
        - containerPort: 8080
          name: http-metrics
        - containerPort: 8081
          name: telemetry
        readinessProbe:
          httpGet:
            path: /
            port: 8081
          initialDelaySeconds: 5
          timeoutSeconds: 5
        securityContext:
          runAsUser: 65534
      nodeSelector:
        kubernetes.io/os: linux
      serviceAccountName: kube-state-metrics

service.yaml

apiVersion: v1
kind: Service
metadata:
  annotations:
    prometheus.io/scraped: "true"
  labels:
    app.kubernetes.io/name: kube-state-metrics
    app.kubernetes.io/version: 1.9.8
  name: kube-state-metrics
  namespace: kube-mon
spec:
  clusterIP: None
  ports:
  - name: http-metrics
    port: 8080
    targetPort: http-metrics
  - name: telemetry
    port: 8081
    targetPort: telemetry
  selector:
    app.kubernetes.io/name: kube-state-metrics

Run kubectl apply -f . to create these resources, which starts the kube-state-metrics container and its Service.

3. Configuring Prometheus to collect kube-state-metrics data

Add the following job to prometheus.yaml:

- job_name: kube-state-metrics
  kubernetes_sd_configs:
  - role: endpoints
    namespaces:
      names:
      - kube-mon
  relabel_configs:
  - source_labels: [__meta_kubernetes_service_label_app_kubernetes_io_name]
    regex: kube-state-metrics
    replacement: $1
    action: keep
  - action: labelmap
    regex: __meta_kubernetes_service_label_(.+)
  - source_labels: [__meta_kubernetes_namespace]
    action: replace
    target_label: k8s_namespace
  - source_labels: [__meta_kubernetes_service_name]
    action: replace
    target_label: k8s_sname
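The first rule uses the keep action: any discovered endpoint whose joined source label values do not fully match regex is dropped. The replacement: $1 line has no effect for keep. Here is a Python sketch with a hypothetical function name and hypothetical pod IPs.

The endpoints role yields one target per Service port, so the sketch keeps both http-metrics (8080) and telemetry (8081, kube-state-metrics' own metrics). To scrape only 8080, one option would be a further keep rule on __meta_kubernetes_endpoint_port_name with regex http-metrics.

```python
import re


def keep(targets, source_labels, regex, separator=";"):
    """Sketch of the relabel 'keep' action over a list of discovered targets."""
    kept = []
    for t in targets:
        value = separator.join(t.get(name, "") for name in source_labels)
        if re.fullmatch(regex, value):  # anchored match, as in Prometheus
            kept.append(t)
    return kept
```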