Table of Contents

1. Create the monitor namespace

2. Deployment

2.1 Deploy cadvisor

2.2 Deploy node_exporter

2.3 Deploy prometheus

2.4 Deploy RBAC permissions

2.5 Deploy kube-state-metrics

2.6 Deploy grafana

3. Test the monitoring


Preparation:

Cluster nodes:

master: 192.168.136.21 (all of the following steps are performed on this node)

worker: 192.168.136.22

worker: 192.168.136.23

## If vim mangles the indentation when pasting YAML, enter paste mode with :set paste (in command-line mode) and leave it again with :set nopaste (the default). ##

1. Create the monitor namespace

kubectl create ns monitor
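
To confirm the namespace was created:

kubectl get ns monitor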


Pull the cadvisor image. The official image lives in Google's registry, which is not reachable from mainland China, so a mirrored copy is used here; just pull it directly and note that the image name is lagoudocker/cadvisor:v0.37.0.

docker pull lagoudocker/cadvisor:v0.37.0 
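
The DaemonSet in section 2.1 runs a cadvisor pod on every node with imagePullPolicy: IfNotPresent, so pulling the image on the worker nodes ahead of time is optional but saves the first-start pull:

docker pull lagoudocker/cadvisor:v0.37.0   # repeat on 192.168.136.22 and 192.168.136.23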


2. Deployment

Create the /opt/cadvisor_prome_gra directory; there are quite a few configuration files, so it is worth keeping them in a directory of their own.

2.1 Deploy cadvisor

Deploy cadvisor as a DaemonSet. A DaemonSet guarantees that every node in the cluster runs one copy of the same pod, and nodes that join later automatically get a pod as well.

 vim case1-daemonset-deploy-cadvisor.yaml

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: cadvisor
  namespace: monitor
spec:
  selector:
    matchLabels:
      app: cAdvisor
  template:
    metadata:
      labels:
        app: cAdvisor
    spec:
      tolerations:    # tolerate the master's NoSchedule taint
        - effect: NoSchedule
          key: node-role.kubernetes.io/master
      hostNetwork: true
      restartPolicy: Always   # restart policy
      containers:
      - name: cadvisor
        image: lagoudocker/cadvisor:v0.37.0
        imagePullPolicy: IfNotPresent  # image pull policy
        ports:
        - containerPort: 8080
        volumeMounts:
          - name: root
            mountPath: /rootfs
          - name: run
            mountPath: /var/run
          - name: sys
            mountPath: /sys
          - name: docker
            mountPath: /var/lib/containerd
      volumes:
      - name: root
        hostPath:
          path: /
      - name: run
        hostPath:
          path: /var/run
      - name: sys
        hostPath:
          path: /sys
      - name: docker
        hostPath:
          path: /var/lib/containerd

kubectl apply -f case1-daemonset-deploy-cadvisor.yaml

kubectl get pod -n monitor -o wide    # check that the pods are running

Since the cluster has three nodes there will be three pods, and if more worker nodes are added later the DaemonSet will create pods on them automatically.


Test cadvisor at <masterIP>:8080.
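
Because the DaemonSet uses hostNetwork: true, cadvisor answers directly on each node's IP. A quick check from the command line (the address assumes the master node from this setup):

curl -s http://192.168.136.21:8080/metrics | head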


2.2 Deploy node_exporter

Deploy node-exporter as a DaemonSet plus a Service. Note that the Service below requests nodePort 39100, which is outside the default NodePort range (30000-32767); unless the apiserver's --service-node-port-range has been widened, the Service may be rejected, but because the DaemonSet uses hostNetwork and hostPort 9100 the exporter is reachable on every node's port 9100 either way.

vim case2-daemonset-deploy-node-exporter.yaml

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-exporter
  namespace: monitor
  labels:
    k8s-app: node-exporter
spec:
  selector:
    matchLabels:
      k8s-app: node-exporter
  template:
    metadata:
      labels:
        k8s-app: node-exporter
    spec:
      tolerations:
        - effect: NoSchedule
          key: node-role.kubernetes.io/master
      containers:
      - image: prom/node-exporter:v1.3.1 
        imagePullPolicy: IfNotPresent
        name: prometheus-node-exporter
        ports:
        - containerPort: 9100
          hostPort: 9100
          protocol: TCP
          name: metrics
        volumeMounts:
        - mountPath: /host/proc
          name: proc
        - mountPath: /host/sys
          name: sys
        - mountPath: /host
          name: rootfs
        args:
        - --path.procfs=/host/proc
        - --path.sysfs=/host/sys
        - --path.rootfs=/host
      volumes:
        - name: proc
          hostPath:
            path: /proc
        - name: sys
          hostPath:
            path: /sys
        - name: rootfs
          hostPath:
            path: /
      hostNetwork: true
      hostPID: true
---
apiVersion: v1
kind: Service
metadata:
  annotations:
    prometheus.io/scrape: "true"
  labels:
    k8s-app: node-exporter
  name: node-exporter
  namespace: monitor
spec:
  type: NodePort
  ports:
  - name: http
    port: 9100
    nodePort: 39100
    protocol: TCP
  selector:
    k8s-app: node-exporter

 kubectl get pod -n monitor

 


Test node_exporter at <nodeIP>:9100.
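
Every node should answer on port 9100; for example (substitute any node's IP):

curl -s http://192.168.136.22:9100/metrics | grep '^node_load'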


2.3 Deploy prometheus

The prometheus deployment consists of a ConfigMap, a Deployment, and a Service.

vim case3-1-prometheus-cfg.yaml

---
kind: ConfigMap
apiVersion: v1
metadata:
  labels:
    app: prometheus
  name: prometheus-config
  namespace: monitor 
data:
  prometheus.yml: |
    global:
      scrape_interval: 15s
      scrape_timeout: 10s
      evaluation_interval: 1m
    scrape_configs:
    - job_name: 'kubernetes-node'
      kubernetes_sd_configs:
      - role: node
      relabel_configs:
      - source_labels: [__address__]
        regex: '(.*):10250'
        replacement: '${1}:9100'
        target_label: __address__
        action: replace
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.+)
    - job_name: 'kubernetes-node-cadvisor'
      kubernetes_sd_configs:
      - role:  node
      scheme: https
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      relabel_configs:
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.+)
      - target_label: __address__
        replacement: kubernetes.default.svc:443
      - source_labels: [__meta_kubernetes_node_name]
        regex: (.+)
        target_label: __metrics_path__
        replacement: /api/v1/nodes/${1}/proxy/metrics/cadvisor
    - job_name: 'kubernetes-apiserver'
      kubernetes_sd_configs:
      - role: endpoints
      scheme: https
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      relabel_configs:
      - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
        action: keep
        regex: default;kubernetes;https
    - job_name: 'kubernetes-service-endpoints'
      kubernetes_sd_configs:
      - role: endpoints
      relabel_configs:
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
        action: keep
        regex: true
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
        action: replace
        target_label: __scheme__
        regex: (https?)
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
        action: replace
        target_label: __metrics_path__
        regex: (.+)
      - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
        action: replace
        target_label: __address__
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2
      - action: labelmap
        regex: __meta_kubernetes_service_label_(.+)
      - source_labels: [__meta_kubernetes_namespace]
        action: replace
        target_label: kubernetes_namespace
      - source_labels: [__meta_kubernetes_service_name]
        action: replace
        target_label: kubernetes_service_name
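
The ConfigMap has to exist before the Deployment below can mount it, so apply it first (the file name matches the vim command above):

kubectl apply -f case3-1-prometheus-cfg.yaml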

Note: remember that the k8s-master in the case3-2 file refers to the node name and must be adjusted for your cluster; it cannot be changed to the host's IP address (reason unknown, see the note after the manifest).

On the 192.168.136.21 (k8s-master) node, prepare /data/prometheusdata as the prometheus data directory (this matches the hostPath in the Deployment below).
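
A minimal sketch of preparing that directory on k8s-master; the chown to uid 65534 assumes the prom/prometheus image's default non-root user (nobody), adjust if your setup differs:

mkdir -p /data/prometheusdata
chown 65534:65534 /data/prometheusdata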


vim case3-2-prometheus-deployment.yaml

---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: prometheus-server
  namespace: monitor
  labels:
    app: prometheus
spec:
  replicas: 1
  selector:
    matchLabels:
      app: prometheus
      component: server
    #matchExpressions:
    #- {key: app, operator: In, values: [prometheus]}
    #- {key: component, operator: In, values: [server]}
  template:
    metadata:
      labels:
        app: prometheus
        component: server
      annotations:
        prometheus.io/scrape: 'false'
    spec:
      nodeName: k8s-master
      serviceAccountName: monitor
      containers:
      - name: prometheus
        image: prom/prometheus:v2.31.2
        imagePullPolicy: IfNotPresent
        command:
          - prometheus
          - --config.file=/etc/prometheus/prometheus.yml
          - --storage.tsdb.path=/prometheus
          - --storage.tsdb.retention=720h
        ports:
        - containerPort: 9090
          protocol: TCP
        volumeMounts:
        - mountPath: /etc/prometheus/prometheus.yml
          name: prometheus-config
          subPath: prometheus.yml
        - mountPath: /prometheus/
          name: prometheus-storage-volume
      volumes:
        - name: prometheus-config
          configMap:
            name: prometheus-config
            items:
              - key: prometheus.yml
                path: prometheus.yml
                mode: 0644
        - name: prometheus-storage-volume
          hostPath:
           path: /data/prometheusdata
           type: Directory

Create the ServiceAccount and ClusterRoleBinding:

kubectl create serviceaccount monitor -n monitor

kubectl create clusterrolebinding monitor-clusterrolebinding -n monitor --clusterrole=cluster-admin --serviceaccount=monitor:monitor

kubectl apply -f case3-2-prometheus-deployment.yaml


This step in case3-2 has a big pitfall: "k8s-master" works, but "192.168.136.21" does not! The Deployment's pod would not come up, and the pod logs showed that the host "192.168.136.21" could not be found; switching back to "k8s-master" did not immediately help either, and a few days later it suddenly started working (the machine had been rebooted in between). (Reason unknown at the time; most likely nodeName has to match the node name reported by kubectl get nodes exactly, so when nodes are registered by hostname an IP address never matches and the pod stays Pending.)

 


vim case3-3-prometheus-svc.yaml

---
apiVersion: v1
kind: Service
metadata:
  name: prometheus
  namespace: monitor
  labels:
    app: prometheus
spec:
  type: NodePort
  ports:
    - port: 9090
      targetPort: 9090
      nodePort: 30090
      protocol: TCP
  selector:
    app: prometheus
    component: server

kubectl apply -f case3-3-prometheus-svc.yaml
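
Prometheus should now be reachable on any node at NodePort 30090, and Status → Targets in the web UI should show the kubernetes-node, kubernetes-node-cadvisor, kubernetes-apiserver, and kubernetes-service-endpoints jobs as UP. A quick command-line check (the address assumes the master node):

curl -s http://192.168.136.21:30090/api/v1/targets | grep -o '"health":"[^"]*"' | sort | uniq -c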


2.4 Deploy RBAC permissions

This step includes a Secret, a ServiceAccount, a ClusterRole, and a ClusterRoleBinding: the ServiceAccount is the service account, the ClusterRole defines the permission rules, and the ClusterRoleBinding binds the ServiceAccount to the ClusterRole.

The credentials a pod uses to authenticate to the apiserver are defined in a secret; because they are sensitive, they are kept in a Secret resource and mounted into the pod as a volume, so the application running in the pod can use that information to connect to the apiserver and complete authentication.

RBAC is Kubernetes' access-control system; the above is only a brief outline.

vim case4-prom-rbac.yaml

apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus
  namespace: monitor

---
apiVersion: v1
kind: Secret
type: kubernetes.io/service-account-token
metadata:
  name: monitor-token
  namespace: monitor
  annotations:
    kubernetes.io/service-account.name: "prometheus"
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus
rules:
- apiGroups:
  - ""
  resources:
  - nodes
  - services
  - endpoints
  - pods
  - nodes/proxy
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - "extensions"
  resources:
    - ingresses
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - ""
  resources:
  - configmaps
  - nodes/metrics
  verbs:
  - get
- nonResourceURLs:
  - /metrics
  verbs:
  - get
---
#apiVersion: rbac.authorization.k8s.io/v1beta1
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus
subjects:
- kind: ServiceAccount
  name: prometheus
  namespace: monitor

kubectl apply -f case4-prom-rbac.yaml
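
To confirm the objects were created and the token secret was populated for the prometheus service account:

kubectl get sa,secret -n monitor
kubectl describe clusterrolebinding prometheus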


2.5 Deploy kube-state-metrics

This includes a Deployment, a Service, a ServiceAccount, a ClusterRole, and a ClusterRoleBinding.

Note that these resources are deployed in the kube-system namespace!


vim case5-kube-state-metrics-deploy.yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  name: kube-state-metrics
  namespace: kube-system
spec:
  replicas: 1
  selector:
    matchLabels:
      app: kube-state-metrics
  template:
    metadata:
      labels:
        app: kube-state-metrics
    spec:
      serviceAccountName: kube-state-metrics
      containers:
      - name: kube-state-metrics
        image: registry.cn-hangzhou.aliyuncs.com/zhangshijie/kube-state-metrics:v2.6.0 
        ports:
        - containerPort: 8080

---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: kube-state-metrics
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: kube-state-metrics
rules:
- apiGroups: [""]
  resources: ["nodes", "pods", "services", "resourcequotas", "replicationcontrollers", "limitranges", "persistentvolumeclaims", "persistentvolumes", "namespaces", "endpoints"]
  verbs: ["list", "watch"]
- apiGroups: ["extensions"]
  resources: ["daemonsets", "deployments", "replicasets"]
  verbs: ["list", "watch"]
- apiGroups: ["apps"]
  resources: ["statefulsets"]
  verbs: ["list", "watch"]
- apiGroups: ["batch"]
  resources: ["cronjobs", "jobs"]
  verbs: ["list", "watch"]
- apiGroups: ["autoscaling"]
  resources: ["horizontalpodautoscalers"]
  verbs: ["list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: kube-state-metrics
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: kube-state-metrics
subjects:
- kind: ServiceAccount
  name: kube-state-metrics
  namespace: kube-system

---
apiVersion: v1
kind: Service
metadata:
  annotations:
    prometheus.io/scrape: 'true'
  name: kube-state-metrics
  namespace: kube-system
  labels:
    app: kube-state-metrics
spec:
  type: NodePort
  ports:
  - name: kube-state-metrics
    port: 8080
    targetPort: 8080
    nodePort: 31666
    protocol: TCP
  selector:
    app: kube-state-metrics

 kubectl apply -f case5-kube-state-metrics-deploy.yaml
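
kube-state-metrics is exposed through the NodePort 31666 Service above; a quick check that it is serving cluster-object metrics (any node IP works):

curl -s http://192.168.136.21:31666/metrics | grep '^kube_pod_status_phase' | head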


2.6 Deploy grafana

Grafana provides the dashboards on top of the prometheus data source; this step includes a Deployment and a Service.

vim grafana-enterprise.yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  name: grafana-enterprise
  namespace: monitor
spec:
  replicas: 1
  selector:
    matchLabels:
      app: grafana-enterprise
  template:
    metadata:
      labels:
        app: grafana-enterprise
    spec:
      containers:
      - image: grafana/grafana
        imagePullPolicy: Always
        #command:
        #  - "tail"
        #  - "-f"
        #  - "/dev/null"
        securityContext:
          allowPrivilegeEscalation: false
          runAsUser: 0
        name: grafana
        ports:
        - containerPort: 3000
          protocol: TCP
        volumeMounts:
        - mountPath: "/var/lib/grafana"
          name: data
        resources:
          requests:
            cpu: 100m
            memory: 100Mi
          limits:
            cpu: 500m
            memory: 2500Mi
      volumes:
      - name: data
        emptyDir: {}
---
apiVersion: v1
kind: Service
metadata:
  name: grafana
  namespace: monitor
spec:
  type: NodePort
  ports:
  - port: 80
    targetPort: 3000
    nodePort: 31000
  selector:
    app: grafana-enterprise

kubectl apply -f grafana-enterprise.yaml
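
Once the pod is Running, the Grafana UI is served on NodePort 31000 of any node, e.g. http://192.168.136.21:31000 (the address assumes the master from this setup). To check the rollout:

kubectl get pod,svc -n monitor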


The default login is username admin, password admin.

Add a data source under Data sources, name it prometheus, and note the port number 30090.
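
For the data source's HTTP URL, any node address on the Prometheus NodePort works; for example, with the master from this setup:

http://192.168.136.21:30090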


Import dashboard 13332; other dashboards can be added as well, for example 14981, 13824, and 14518.

Click the "+" icon on the left and choose "Import" to import a dashboard.


Dashboard 13332


The cadvisor dashboard is 14282. There is an unresolved glitch here: it can show resource usage for every container in the cluster, but selecting a single container shows no data. (This should be fixable.)


The dashboard currently shows pod IDs, which is awkward to browse; to show pod names instead, open the dashboard settings (gear icon on the right), choose "Variables", select the second variable, and change "name" to "pod".


Every panel on the dashboard needs the same change as well: click the panel title, choose "Edit", and change "name" to "pod".


3. Test the monitoring

Create a Deployment named nginx01 and check that it shows up in the monitoring.

vim nginx01.yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx01
spec:
  replicas: 2
  selector:
    matchLabels:
      app: nginx01
  template:
    metadata:
      labels:
        app: nginx01
    spec:
      containers:
      - name: nginx
        image: nginx:1.7.9

 kubectl apply -f nginx01.yaml 


Two nginx01 pods appear because replicas is set to 2.
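
You can also confirm through the Prometheus API that cadvisor is reporting the new containers; a sample query (the pod label name assumes cadvisor metrics scraped through the kubelet proxy as configured above, where the label is pod on recent Kubernetes versions):

curl -sG http://192.168.136.21:30090/api/v1/query --data-urlencode 'query=container_memory_working_set_bytes{pod=~"nginx01.*"}'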


With that, the cadvisor + prometheus + grafana cluster monitoring deployment is complete.