Kubernetes JVM container awareness: Kubernetes container monitoring

Repost

mob64ca13feda16 2024-07-30 15:21:24


===============================================

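2021/4/10, 4th revision, ccb_warlock

Changelog:

2021/4/10:

1. Added a screenshot of the email received when Alertmanager fires an alert;

2. Added descriptions of what cAdvisor, kube-state-metrics, Prometheus, Grafana, and Alertmanager do;

2021/2/16:

1. Added instructions for obtaining the kube-state-metrics image;

2. Completed the Prometheus and Grafana sections;

2021/2/15:

1. Added the Grafana section and some headings;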

===============================================

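In a container monitoring setup I wrote up years ago, I ran cAdvisor, Prometheus, and Grafana on Docker Swarm to monitor containers and their hosts. Because I know Docker, last month I was treated as the ops person and asked to deliver a monitoring solution on Kubernetes, so I needed to get the same stack running there.

While trying out the Grafana demo, I learned about kube-state-metrics, a service that is better suited than cAdvisor for monitoring Kubernetes resources.

cAdvisor: collects OS-level and Docker container metrics (already built into the kubelet)

kube-state-metrics: collects metrics about Kubernetes objects

Prometheus: retrieves and stores the data

Grafana: visualizes data obtained from a query service (for example Prometheus)

Alertmanager: sends notifications for the alerts that Prometheus fires according to its alerting rules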

1. Deploy Kubernetes

For CentOS 7, see:

PS. The version deployed while writing this tutorial was v1.20.1.

2. Create the namespace

kubectl create namespace monit


3. Deploy cAdvisor

Because this stack runs on Kubernetes and the kubelet already embeds cAdvisor, nothing extra needs to be installed; the kubelet's endpoint is used directly.
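
Since nothing is deployed in this step, it can help to confirm that the kubelet's built-in cAdvisor endpoint actually answers. The following is a minimal sketch, assuming kubectl is already configured with sufficient permissions; the function names are illustrative and not part of the original tutorial. It reads the same /metrics/cadvisor path that the k8s-cadvisor scrape job uses, but goes through the API server's node proxy.

```shell
#!/bin/sh
# Build the API-server proxy path for a node's cAdvisor metrics.
cadvisor_path() {
  printf '/api/v1/nodes/%s/proxy/metrics/cadvisor\n' "$1"
}

# Print the first few cAdvisor samples for a node (requires a reachable cluster).
show_cadvisor_metrics() {
  kubectl get --raw "$(cadvisor_path "$1")" | grep -v '^#' | head -n 5
}

# Usage (take the node name from `kubectl get nodes`):
#   show_cadvisor_metrics <node-name>
```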

4. Deploy kube-state-metrics

4.1 Create the manifest

# Create the directory

mkdir -p /opt/yaml

# Create the manifest

vi /opt/yaml/kube-state-metrics.yaml

Save the following content into kube-state-metrics.yaml, then write and quit with :wq.

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  labels:
    app.kubernetes.io/name: kube-state-metrics
    app.kubernetes.io/version: v2.0.0-beta
  name: kube-state-metrics
rules:
- apiGroups:
  - ""
  resources:
  - configmaps
  - secrets
  - nodes
  - pods
  - services
  - resourcequotas
  - replicationcontrollers
  - limitranges
  - persistentvolumeclaims
  - persistentvolumes
  - namespaces
  - endpoints
  verbs:
  - list
  - watch
- apiGroups:
  - extensions
  resources:
  - daemonsets
  - deployments
  - replicasets
  - ingresses
  verbs:
  - list
  - watch
- apiGroups:
  - apps
  resources:
  - statefulsets
  - daemonsets
  - deployments
  - replicasets
  verbs:
  - list
  - watch
- apiGroups:
  - batch
  resources:
  - cronjobs
  - jobs
  verbs:
  - list
  - watch
- apiGroups:
  - autoscaling
  resources:
  - horizontalpodautoscalers
  verbs:
  - list
  - watch
- apiGroups:
  - authentication.k8s.io
  resources:
  - tokenreviews
  verbs:
  - create
- apiGroups:
  - authorization.k8s.io
  resources:
  - subjectaccessreviews
  verbs:
  - create
- apiGroups:
  - policy
  resources:
  - poddisruptionbudgets
  verbs:
  - list
  - watch
- apiGroups:
  - certificates.k8s.io
  resources:
  - certificatesigningrequests
  verbs:
  - list
  - watch
- apiGroups:
  - storage.k8s.io
  resources:
  - storageclasses
  - volumeattachments
  verbs:
  - list
  - watch
- apiGroups:
  - admissionregistration.k8s.io
  resources:
  - mutatingwebhookconfigurations
  - validatingwebhookconfigurations
  verbs:
  - list
  - watch
- apiGroups:
  - networking.k8s.io
  resources:
  - networkpolicies
  - ingresses
  verbs:
  - list
  - watch

---
apiVersion: v1
kind: ServiceAccount
metadata:
  labels:
    app.kubernetes.io/name: kube-state-metrics
    app.kubernetes.io/version: v2.0.0-beta
  name: kube-state-metrics
  namespace: monit

---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  labels:
    app.kubernetes.io/name: kube-state-metrics
    app.kubernetes.io/version: v2.0.0-beta
  name: kube-state-metrics
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: kube-state-metrics
subjects:
- kind: ServiceAccount
  name: kube-state-metrics
  namespace: monit

---
apiVersion: v1
kind: Service
metadata:
  annotations:
    prometheus.io/scrape: 'true'
  labels:
    app.kubernetes.io/name: kube-state-metrics
    app.kubernetes.io/version: v2.0.0-beta
  name: kube-state-metrics
  namespace: monit
spec:
  type: NodePort
  ports:
  - name: http-metrics
    port: 8080
    targetPort: http-metrics
    #nodePort: 30001
  - name: telemetry
    port: 8081
    targetPort: telemetry
    #nodePort: 30002
  selector:
    app.kubernetes.io/name: kube-state-metrics

---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app.kubernetes.io/name: kube-state-metrics
    app.kubernetes.io/version: v2.0.0-beta
  name: kube-state-metrics
  namespace: monit
spec:
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: kube-state-metrics
  template:
    metadata:
      labels:
        app.kubernetes.io/name: kube-state-metrics
        app.kubernetes.io/version: v2.0.0-beta
    spec:
      containers:
      - image: k8s.gcr.io/kube-state-metrics/kube-state-metrics:v2.0.0-beta
        livenessProbe:
          httpGet:
            path: /healthz
            port: 8080
          initialDelaySeconds: 5
          timeoutSeconds: 5
        name: kube-state-metrics
        ports:
        - containerPort: 8080
          name: http-metrics
        - containerPort: 8081
          name: telemetry
        readinessProbe:
          httpGet:
            path: /
            port: 8081
          initialDelaySeconds: 5
          timeoutSeconds: 5
      nodeSelector:
        kubernetes.io/os: linux
      serviceAccountName: kube-state-metrics

PS. For how to obtain the kube-state-metrics image, see:

4.2 Deploy

# Apply the manifest

cd /opt/yaml
kubectl apply -f kube-state-metrics.yaml
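
After applying the manifest, a quick check that the Deployment rolled out and that the metrics endpoint serves kube_* metric families might look like the sketch below. It assumes a working kubectl context; the function names are illustrative. The metrics are read through the API server's service proxy, using the http-metrics port name defined in the Service above, so no NodePort lookup is needed.

```shell
#!/bin/sh
# Count kube-state-metrics families in Prometheus text format read from stdin.
count_kube_metric_families() {
  grep -c '^# TYPE kube_'
}

# Wait for the rollout, then count metric families via the service proxy
# (requires a reachable cluster).
check_kube_state_metrics() {
  kubectl -n monit rollout status deployment/kube-state-metrics --timeout=120s &&
    kubectl get --raw \
      /api/v1/namespaces/monit/services/kube-state-metrics:http-metrics/proxy/metrics |
      count_kube_metric_families
}

# Usage:
#   check_kube_state_metrics
```

A count of zero would suggest the pod is running but not exporting Kubernetes object metrics, which is usually worth checking against the pod logs.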


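5. Deploy Prometheus

5.1 Create the data persistence directory

mkdir -p /opt/vol/prometheus/data

5.2 Create the manifest

# Create the directory

mkdir -p /opt/yaml

# Create the manifest

vi /opt/yaml/prometheus.yaml

Save the following content into prometheus.yaml, then write and quit with :wq.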

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: warlock
rules:
- apiGroups: [""]
  resources:
  - nodes
  - nodes/proxy
  - nodes/metrics
  - services
  - services/proxy
  - endpoints
  - endpoints/proxy
  - pods
  - pods/proxy
  verbs: ["get", "list", "watch"]

---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: warlock
  namespace: monit

---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: warlock
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: warlock
subjects:
- kind: ServiceAccount
  name: warlock
  namespace: monit

---
apiVersion: v1
kind: Service
metadata:
  name: prometheus-service
  namespace: monit
  labels:
    app: prometheus-service
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: EnsureExists
spec:
  type: NodePort
  ports:
  - port: 9090
    targetPort: 9090
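    nodePort: 9090  # outside the default NodePort range (30000-32767); needs a wider --service-node-port-range on kube-apiserver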
  selector:
    app: prometheus-deployment

---
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-config
  namespace: monit
data:
  prometheus.yml: |
    global:
      scrape_interval:     15s
      evaluation_interval: 15s
    
    alerting:
      alertmanagers:
      - static_configs:
        - targets:
          - alertmanager-service:9093
    
    rule_files:
      - "node.yml"
    
    scrape_configs:
      - job_name: 'prometheus'
        static_configs:
          - targets: ['localhost:9090']

      - job_name: 'k8s-cadvisor'
        metrics_path: /metrics/cadvisor
        kubernetes_sd_configs:
        - role: node
        relabel_configs:
        - action: labelmap
          regex: __meta_kubernetes_node_label_(.+)
        scheme: https
        tls_config:
          insecure_skip_verify: true
        bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
        metric_relabel_configs:
        - source_labels: [instance]
          separator: ;
          regex: (.+)
          target_label: node
          replacement: $1
          action: replace
        - source_labels: [pod_name]
          separator: ;
          regex: (.+)
          target_label: pod
          replacement: $1
          action: replace
        - source_labels: [container_name]
          separator: ;
          regex: (.+)
          target_label: container
          replacement: $1
          action: replace
      
      - job_name: kube-state-metrics
        kubernetes_sd_configs:
        - role: endpoints
          namespaces:
            names:
            - monit
        relabel_configs:
        - source_labels: [__meta_kubernetes_service_label_app_kubernetes_io_name]
          regex: kube-state-metrics
          replacement: $1
          action: keep
        - action: labelmap
          regex: __meta_kubernetes_service_label_(.+)
        - source_labels: [__meta_kubernetes_namespace]
          action: replace
          target_label: k8s_namespace
        - source_labels: [__meta_kubernetes_service_name]
          action: replace
          target_label: k8s_sname

---
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-node
  namespace: monit
data:
  node.yml: |
    groups:
    - name: node
      rules:
      - alert: PrometheusEndpointDown
        expr: up == 0
        for: 10s
        labels:
          source: prometheus
        annotations:
          title: "Endpoint({{$labels.instance}}) Down"
          content: "The endpoint({{$labels.instance}}) of target({{$labels.job}}) has been down for more than 10 seconds."

---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: prometheus-deployment
  namespace: monit
spec:
  replicas: 1
  selector:
    matchLabels:
      app: prometheus-deployment
  template:
    metadata:
      labels:
        app: prometheus-deployment
    spec:
      serviceAccountName: warlock
      securityContext:
        runAsUser: 0
      volumes:
      - name: config
        projected:
          sources:
          - configMap:
              name: prometheus-config
          - configMap:
              name: prometheus-node
      - name: data-vol
        hostPath:
          path: /opt/vol/prometheus/data
      containers:
      - name: prometheus
        image: prom/prometheus:v2.24.1
        imagePullPolicy: IfNotPresent # Always
        env:
        - name: TZ
          value: "Asia/Shanghai"
        volumeMounts:
        - name: config
          mountPath: "/etc/prometheus/prometheus.yml"
          subPath: prometheus.yml
          readOnly: true
        - name: config
          mountPath: "/etc/prometheus/node.yml"
          subPath: node.yml
          readOnly: true
        - name: data-vol
          mountPath: /prometheus
        ports:
          - containerPort: 9090

5.3 Deploy

# Apply the manifest

cd /opt/yaml
kubectl apply -f prometheus.yaml
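
The Services in this tutorial pin nodePort to 9090, 3000, and 9093. Kubernetes allocates NodePorts from 30000-32767 by default, so the API server rejects these values unless it was started with a wider --service-node-port-range; the tutorial presumably relies on such a setting from its cluster setup. The sketch below (the helper name is illustrative) simply checks the three ports against the default range.

```shell
#!/bin/sh
# Return success if port $1 lies within the inclusive range $2..$3.
port_in_range() {
  [ "$1" -ge "$2" ] && [ "$1" -le "$3" ]
}

# Report which of the tutorial's NodePorts fit the Kubernetes default range.
for port in 9090 3000 9093; do
  if port_in_range "$port" 30000 32767; then
    echo "$port: inside default range"
  else
    echo "$port: outside default range, needs --service-node-port-range"
  fi
done
```

On a cluster with default settings, the options are to widen the range on kube-apiserver or to choose values inside the default range instead.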


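6. Deploy Grafana

6.1 Create the data persistence directory

mkdir -p /opt/vol/grafana

6.2 Create the manifest

# Create the directory

mkdir -p /opt/yaml

# Create the manifest

vi /opt/yaml/grafana.yaml

Save the following content into grafana.yaml, then write and quit with :wq.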

apiVersion: v1
kind: Service
metadata:
  name: grafana-service
  namespace: monit
  labels:
    app: grafana-service
spec:
  type: NodePort
  ports:
  - port: 3000
    targetPort: 3000
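    nodePort: 3000  # outside the default NodePort range (30000-32767); needs a wider --service-node-port-range on kube-apiserver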
  selector:
    app: grafana-deployment

---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: grafana-pv
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: "/opt/vol/grafana"

---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: grafana-pvc
  namespace: monit
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: "10Gi"

---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: grafana-deployment
  namespace: monit
spec:
  replicas: 1
  selector:
    matchLabels:
      app: grafana-deployment
  template:
    metadata:
      labels:
        app: grafana-deployment
    spec:
      volumes:
      - name: grafana-pvc
        persistentVolumeClaim:
          claimName: grafana-pvc 
      containers:
      - name: grafana
        image: grafana/grafana:7.4.1
        imagePullPolicy: IfNotPresent # Always
        env:
        - name: TZ
          value: "Asia/Shanghai"
        volumeMounts:
        - name: grafana-pvc
          mountPath: /var/lib/grafana
        ports:
          - containerPort: 3000
      initContainers:
      - name: init-chown-data
        image: busybox:1.33.0
        imagePullPolicy: IfNotPresent # Always
        command: ["chown", "-R", "472:472", "/var/lib/grafana"]
        volumeMounts:
          - name: grafana-pvc
            mountPath: /var/lib/grafana

6.3 Deploy

# Apply the manifest

cd /opt/yaml
kubectl apply -f grafana.yaml


6.4 Log in to Grafana

Log in with the initial username/password (admin/admin).

6.5 Configure the data source

1) Open the data sources page and click "Add data source".

2) Select "Prometheus".

3) Enter the URL of the Prometheus service and click "Save & Test"; if the service is reachable, a confirmation message appears.
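
The text does not spell out which URL to enter. Because Grafana and Prometheus both run in the monit namespace, one option is the Service's in-cluster DNS name rather than a node address. The sketch below builds that name; the helper is illustrative, and cluster.local is assumed as the cluster domain (the common default, which a given cluster may override).

```shell
#!/bin/sh
# Build the in-cluster URL of a Service:
#   http://<name>.<namespace>.svc.<cluster-domain>:<port>
service_url() {
  name=$1 namespace=$2 port=$3 domain=${4:-cluster.local}
  printf 'http://%s.%s.svc.%s:%s\n' "$name" "$namespace" "$domain" "$port"
}

# Prints http://prometheus-service.monit.svc.cluster.local:9090
service_url prometheus-service monit 9090
```

This form only works when Grafana's data source access mode makes requests from the Grafana server itself; a browser outside the cluster cannot resolve the name.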


6.6 Import the dashboard

The dashboard template used is https://grafana.com/grafana/dashboards/13105.

The template's author used kube-state-metrics v1.9.7, which according to the documentation (https://github.com/kubernetes/kube-state-metrics) only applies to Kubernetes 1.16.

Kubernetes 1.17 and later use kube-state-metrics v2.0.0-beta, and its changelog shows that 2.x changed some of the parameters.

Using the original author's dashboard as-is leaves much of the data missing, so I adjusted some of its parameters to match the latest version, after which it displays most of the data. The steps below use the modified dashboard.

1) Get the dashboard file

Link: https://pan.baidu.com/s/1BYnaczAeIRuJAK6LI8T7GQ

Extraction code: vvcp

2) Import the file

3) View the dashboard

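7. Deploy Alertmanager

7.1 Configure the sender mailbox

This example uses email alerts (QQ Mail) for the configuration below.

1) Enable the SMTP service

Under "Settings" > "Account", enable "POP3/SMTP service".

2) Click "Generate authorization code" and follow the steps to obtain the mailbox authorization code.

7.2 Create the manifest

# Create the directory

mkdir -p /opt/yaml

# Create the manifest

vi /opt/yaml/alertmanager.yaml

# Adjust the mail-related settings, save the following content into alertmanager.yaml, then write and quit with :wq.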

apiVersion: v1
kind: ConfigMap
metadata:
  name: alertmanager-config
  namespace: monit
data:
  alertmanager.yml: |
    global:
      resolve_timeout: 5m
      smtp_smarthost: '<smtp-server:port>'
      smtp_from: '<sender-address>'
      smtp_auth_username: '<sender-address>'
      smtp_auth_password: '<mailbox-authorization-code>'
      smtp_require_tls: false

    route:
      group_by: ['alertname']
      group_wait: 30s
      group_interval: 30s
      repeat_interval: 1h
      receiver: 'mail'
    
    receivers:
    - name: 'mail'
      email_configs:
      - to: '<recipient-address>'

---
apiVersion: v1
kind: Service
metadata:
  name: alertmanager-service
  namespace: monit
  labels:
    app: alertmanager-service
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: EnsureExists
spec:
  type: NodePort
  ports:
  - port: 9093
    targetPort: 9093
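    nodePort: 9093  # outside the default NodePort range (30000-32767); needs a wider --service-node-port-range on kube-apiserver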
  selector:
    app: alertmanager-deployment

---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: alertmanager-deployment
  namespace: monit
spec:
  replicas: 1
  selector:
    matchLabels:
      app: alertmanager-deployment
  template:
    metadata:
      labels:
        app: alertmanager-deployment
    spec:
      volumes:
      - name: config
        configMap:
          name: alertmanager-config
      containers:
      - name: alertmanager
        image: prom/alertmanager:v0.21.0
        imagePullPolicy: IfNotPresent # Always
        env:
        - name: TZ
          value: "Asia/Shanghai"
        volumeMounts:
        - name: config
          mountPath: "/etc/alertmanager"
          readOnly: true
        ports:
          - containerPort: 9093

7.3 Deploy

# Apply the manifest

cd /opt/yaml
kubectl apply -f alertmanager.yaml
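
Before breaking a real service, the mail path can be exercised on its own by posting a hand-made alert to Alertmanager. The sketch below assumes Alertmanager's v2 HTTP API at /api/v2/alerts and a base URL at which the service is reachable; the function names, the alert name, and the label values are illustrative only.

```shell
#!/bin/sh
# Build a one-alert JSON payload for Alertmanager's v2 API.
test_alert_payload() {
  printf '[{"labels":{"alertname":"%s","source":"manual"},"annotations":{"title":"%s"}}]\n' "$1" "$2"
}

# POST the payload to Alertmanager (requires network access to the service).
send_test_alert() {
  test_alert_payload "$2" "$3" |
    curl -sS -X POST -H 'Content-Type: application/json' \
      --data-binary @- "$1/api/v2/alerts"
}

# Usage:
#   send_test_alert http://<node-ip>:9093 ManualTest "Mail path check"
```

With group_wait set to 30s in the route above, the email should be sent roughly half a minute after the alert is accepted, provided the SMTP settings are valid.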


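7.4 Simulate an alert

The Prometheus configuration above includes one alerting rule, PrometheusEndpointDown, which fires when a target service is down.

Next, switch kube-state-metrics to an image that does not exist in order to trigger the rule.

# Make a copy of the manifest for the experiment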

cd /opt/yaml
cp kube-state-metrics.yaml kube-state-metrics-test.yaml

# Switch to a nonexistent image

cd /opt/yaml
sed -i 's/kube-state-metrics:v2.0.0-beta$/kube-state-metrics:abcd/g' kube-state-metrics-test.yaml
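
The trailing $ in the sed pattern matters: it restricts the substitution to lines that end with the image reference, so the app.kubernetes.io/version labels stay untouched. A dry run on two representative lines, without modifying any file, could look like this (the wrapper function is illustrative):

```shell
#!/bin/sh
# Apply the tutorial's sed expression to text on stdin.
swap_image_tag() {
  sed 's/kube-state-metrics:v2.0.0-beta$/kube-state-metrics:abcd/g'
}

# Two representative lines from kube-state-metrics.yaml: a label and the image.
printf '%s\n' \
  '    app.kubernetes.io/version: v2.0.0-beta' \
  '      - image: k8s.gcr.io/kube-state-metrics/kube-state-metrics:v2.0.0-beta' |
  swap_image_tag
```

Only the image line comes back changed, now ending in kube-state-metrics:abcd; the label line is printed as-is.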

# Redeploy kube-state-metrics

cd /opt/yaml
kubectl delete -f kube-state-metrics.yaml
kubectl create -f kube-state-metrics-test.yaml

Because the image with the tag abcd cannot be pulled, the pod for this service cannot start, and the alerting rule fires.

The recipient mailbox then receives the alert email.
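
To confirm the alert state without the web UI, and to put the cluster back afterwards, something like the following could be used. It is a sketch only: the function names are made up for illustration, the base URL is a placeholder for wherever Prometheus is reachable, and it relies on Prometheus's /api/v1/alerts HTTP endpoint. The restore step is not part of the original walkthrough.

```shell
#!/bin/sh
# Build the URL of Prometheus's active-alerts endpoint from a base URL.
alerts_url() {
  printf '%s/api/v1/alerts\n' "${1%/}"
}

# Print the currently pending/firing alerts as JSON (requires network access).
list_alerts() {
  curl -sS "$(alerts_url "$1")"
}

# Undo the experiment: remove the broken deployment and re-apply the original.
restore_kube_state_metrics() {
  kubectl delete -f /opt/yaml/kube-state-metrics-test.yaml
  kubectl apply -f /opt/yaml/kube-state-metrics.yaml
}

# Usage:
#   list_alerts http://<node-ip>:9090
#   restore_kube_state_metrics
```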

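This completes a basic demo of monitoring containers and cluster metrics. For a real project, the metrics and rules still need to be chosen and configured according to actual requirements.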
