Building a Heapster + InfluxDB + Grafana Cluster Performance Monitoring Platform

  In a large container cluster, every Node and every container needs performance monitoring. Kubernetes recommends a set of tools for collecting, storing, and visualizing cluster performance data: Heapster, InfluxDB, and Grafana.

Heapster: collects and aggregates the cAdvisor data from every Node in the cluster. It calls the kubelet API on each Node, and the kubelet in turn calls the cAdvisor API to gather performance data for all containers on that Node. Heapster aggregates the performance data and writes the results to a backend storage system.

InfluxDB: a distributed time-series database (every record carries a timestamp attribute), used mainly for real-time metric collection, event tracking, and storage of time-series and raw data. InfluxDB provides a REST API for storing and querying data.

Grafana: presents the time-series data in InfluxDB as dashboards of charts and graphs, making it easy for operators to see the cluster's running state.
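
Since Heapster writes into InfluxDB over that REST API, a quick way to confirm data is flowing once the stack below is running is to query the database directly. A minimal sketch, assuming a 0.9+-style /query endpoint and the database name "k8s" that Heapster's InfluxDB sink creates by default (older 0.x builds of InfluxDB used a different /db/<name>/series API, so adjust to the version inside the image actually deployed; metric names also vary by Heapster version):

# List the measurements Heapster has written
curl -G 'http://monitoring-influxdb:8086/query' \
  --data-urlencode 'db=k8s' \
  --data-urlencode 'q=SHOW MEASUREMENTS'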

  The overall architecture of the Heapster + InfluxDB + Grafana monitoring system is shown in the figure below.

(Figure: overall architecture of the Heapster + InfluxDB + Grafana monitoring system)

   Heapster, InfluxDB, and Grafana all start and run as Pods.

1. Push the images to the private registry

[root@kub_master ~]# docker load -i docker_heapster.tar.gz 
c12ecfd4861d: Loading layer [==================================================>] 130.9 MB/130.9 MB
5f70bf18a086: Loading layer [==================================================>] 1.024 kB/1.024 kB
998608e2fcd4: Loading layer [==================================================>] 45.16 MB/45.16 MB
591569fa6c34: Loading layer [==================================================>] 126.5 MB/126.5 MB
0b2fe2c6ef6b: Loading layer [==================================================>] 136.2 MB/136.2 MB
f9f3fb66a490: Loading layer [==================================================>] 322.9 MB/322.9 MB
6e2e798f8998: Loading layer [==================================================>]  2.56 kB/2.56 kB
21ac53bc7cd6: Loading layer [==================================================>] 5.632 kB/5.632 kB
7f96c89af577: Loading layer [==================================================>] 79.98 MB/79.98 MB
4371d588893a: Loading layer [==================================================>] 150.2 MB/150.2 MB
Loaded image: docker.io/kubernetes/heapster:canary
[root@kub_master ~]# docker load -i docker_heapster_influxdb.tar.gz 
8ceab61e5aa8: Loading layer [==================================================>] 197.2 MB/197.2 MB
3c84ae1bbde2: Loading layer [==================================================>] 208.9 kB/208.9 kB
e8061ac24ae3: Loading layer [==================================================>] 4.608 kB/4.608 kB
5f70bf18a086: Loading layer [==================================================>] 1.024 kB/1.024 kB
58484cf9c5e7: Loading layer [==================================================>] 63.49 MB/63.49 MB
07d2297acddc: Loading layer [==================================================>] 4.608 kB/4.608 kB
Loaded image: docker.io/kubernetes/heapster_influxdb:v0.5
[root@kub_master ~]# docker images |grep heapster_influxdb
docker.io/kubernetes/heapster_influxdb                             v0.5                a47993810aac        5 years ago         251 MB
[root@kub_master ~]# docker load -i docker_heapster_grafana.tar.gz 
c69ae1aa4698: Loading layer [==================================================>]   131 MB/131 MB
5f70bf18a086: Loading layer [==================================================>] 1.024 kB/1.024 kB
75a5b97e491c: Loading layer [==================================================>] 127.1 MB/127.1 MB
e188e2340071: Loading layer [==================================================>] 16.92 MB/16.92 MB
5e43af080be6: Loading layer [==================================================>] 55.81 kB/55.81 kB
7ecd917a174c: Loading layer [==================================================>] 4.096 kB/4.096 kB
Loaded image: docker.io/kubernetes/heapster_grafana:v2.6.0
[root@kub_master ~]# docker images |grep heapster_grafana
docker.io/kubernetes/heapster_grafana                              v2.6.0              4fe73eb13e50        4 years ago         267 MB

# Push the three images above to the private registry

[root@kub_master ~]# docker tag docker.io/kubernetes/heapster:canary 192.168.0.212:5000/heapster:canary
[root@kub_master ~]# docker push 192.168.0.212:5000/heapster:canary
The push refers to a repository [192.168.0.212:5000/heapster]
5f70bf18a086: Mounted from kubernetes-dashboard-amd64 
4371d588893a: Pushed 
7f96c89af577: Pushed 
21ac53bc7cd6: Pushed 
6e2e798f8998: Pushed 
f9f3fb66a490: Pushed 
0b2fe2c6ef6b: Pushed 
591569fa6c34: Pushed 
998608e2fcd4: Pushed 
c12ecfd4861d: Pushed 
canary: digest: sha256:a024ec78a53b4b9b0bd13b12a6aec29913359e17d30c8d0c35d0a510d10a9d75 size: 4276
[root@kub_master ~]# docker tag docker.io/kubernetes/heapster_grafana:v2.6.0 192.168.0.212:5000/heapster_grafana:v2.6.0
[root@kub_master ~]# docker push 192.168.0.212:5000/heapster_grafana:v2.6.0
The push refers to a repository [192.168.0.212:5000/heapster_grafana]
5f70bf18a086: Mounted from heapster 
7ecd917a174c: Pushed 
5e43af080be6: Pushed 
e188e2340071: Pushed 
75a5b97e491c: Pushed 
c69ae1aa4698: Pushed 
v2.6.0: digest: sha256:1299d1ebb518416f90895a29fc58ad87149af610c8d5d9376c3379d65b6d9568 size: 2811
[root@kub_master ~]# docker tag docker.io/kubernetes/heapster_influxdb:v0.5 192.168.0.212:5000/heapster_influxdb:v0.5
[root@kub_master ~]# docker push 192.168.0.212:5000/heapster_influxdb:v0.5
The push refers to a repository [192.168.0.212:5000/heapster_influxdb]
5f70bf18a086: Mounted from heapster_grafana 
07d2297acddc: Pushed 
58484cf9c5e7: Pushed 
e8061ac24ae3: Pushed 
3c84ae1bbde2: Pushed 
8ceab61e5aa8: Pushed 
v0.5: digest: sha256:32cdba763a3fab92deeb8074b47ace45ee4a15d537e318ceae89f1e2dd069b53 size: 3012
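
Before moving on, it may be worth confirming the registry really holds all three images. A minimal check, assuming the registry speaks the Docker Registry v2 API (which the sha256: digests in the push output indicate):

# List all repositories in the registry, then the tags for one of them
curl http://192.168.0.212:5000/v2/_catalog
curl http://192.168.0.212:5000/v2/heapster/tags/list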

2. Download the Heapster resource bundle

[root@kub_master ~]# cd k8s/
[root@kub_master k8s]# mkdir heapster
[root@kub_master k8s]# cd heapster/
[root@kub_master heapster]# wget https://www.qstack.com.cn/heapster-influxdb.zip
--2020-09-28 20:29:42--  https://www.qstack.com.cn/heapster-influxdb.zip
Resolving www.qstack.com.cn (www.qstack.com.cn)... 180.96.32.89
Connecting to www.qstack.com.cn (www.qstack.com.cn)|180.96.32.89|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 2636 (2.6K) [application/zip]
Saving to: ‘heapster-influxdb.zip’

100%[===============================================================================================>] 2,636       --.-K/s   in 0s      

2020-09-28 20:29:43 (40.6 MB/s) - ‘heapster-influxdb.zip’ saved [2636/2636]

[root@kub_master heapster]# unzip heapster-influxdb.zip 
Archive:  heapster-influxdb.zip
   creating: heapster-influxdb/
  inflating: heapster-influxdb/grafana-service.yaml  
  inflating: heapster-influxdb/heapster-controller.yaml  
  inflating: heapster-influxdb/heapster-service.yaml  
  inflating: heapster-influxdb/influxdb-grafana-controller.yaml  
  inflating: heapster-influxdb/influxdb-service.yaml  
[root@kub_master heapster]# cd heapster-influxdb
[root@kub_master heapster-influxdb]# ll
total 20
-rw-r--r-- 1 root root  414 Sep 14  2016 grafana-service.yaml
-rw-r--r-- 1 root root  682 Jul  1  2019 heapster-controller.yaml
-rw-r--r-- 1 root root  249 Sep 14  2016 heapster-service.yaml
-rw-r--r-- 1 root root 1605 Jul  1  2019 influxdb-grafana-controller.yaml
-rw-r--r-- 1 root root  259 Sep 14  2016 influxdb-service.yaml

3. Deploy the Heapster container

[root@kub_master heapster-influxdb]# vim heapster-controller.yaml 
[root@kub_master heapster-influxdb]# cat heapster-controller.yaml
apiVersion: v1
kind: ReplicationController
metadata:
  labels:
    k8s-app: heapster
    name: heapster
    version: v6
  name: heapster
  namespace: kube-system
spec:
  replicas: 1
  selector:
    k8s-app: heapster
    version: v6
  template:
    metadata:
      labels:
        k8s-app: heapster
        version: v6
    spec:
      nodeSelector:
         kubernetes.io/hostname: 192.168.0.212
      containers:
      - name: heapster
        image: 192.168.0.212:5000/heapster:canary
        imagePullPolicy: IfNotPresent
        command:
        - /heapster
        - --source=kubernetes:http://192.168.0.212:8080?inClusterConfig=false     # data source: the master (API server) URL
        - --sink=influxdb:http://monitoring-influxdb:8086                         # backend storage: the InfluxDB service
[root@kub_master heapster-influxdb]# vim heapster-service.yaml 
[root@kub_master heapster-influxdb]# cat heapster-service.yaml 
apiVersion: v1
kind: Service
metadata:
  labels:
    kubernetes.io/cluster-service: 'true'
    kubernetes.io/name: Heapster
  name: heapster
  namespace: kube-system
spec:
  ports:
  - port: 80
    targetPort: 8082
  selector:
    k8s-app: heapster
[root@kub_master heapster-influxdb]# kubectl create -f heapster-controller.yaml
replicationcontroller "heapster" created
[root@kub_master heapster-influxdb]# kubectl create -f heapster-service.yaml 
service "heapster" created
[root@kub_master heapster-influxdb]# kubectl get all --namespace=kube-system
NAME                                 DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
deploy/kube-dns                      1         1         1            1           2d
deploy/kubernetes-dashboard-latest   1         1         1            1           1d

NAME          DESIRED   CURRENT   READY     AGE
rc/heapster   1         1         1         26s

NAME                       CLUSTER-IP        EXTERNAL-IP   PORT(S)         AGE
svc/heapster               192.168.211.92    <none>        80/TCP          11s
svc/kube-dns               192.168.230.254   <none>        53/UDP,53/TCP   2d
svc/kubernetes-dashboard   192.168.108.11    <none>        80/TCP          1d

NAME                                        DESIRED   CURRENT   READY     AGE
rs/kube-dns-4072910292                      1         1         1         2d
rs/kubernetes-dashboard-latest-3255858758   1         1         1         1d

NAME                                              READY     STATUS    RESTARTS   AGE
po/heapster-jhr57                                 1/1       Running   0          26s
po/kube-dns-4072910292-4qb6c                      4/4       Running   0          2d
po/kubernetes-dashboard-latest-3255858758-fqsbc   1/1       Running   0          1d
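
The Pod is Running, but it is also worth checking that Heapster is actually scraping the kubelets and writing to its sink. Two hedged checks (the proxy path below assumes the same /api/v1/proxy convention used elsewhere in this setup, and the pod name comes from the output above):

# Heapster's own logs should be free of source/sink errors
kubectl logs --namespace=kube-system heapster-jhr57

# Heapster also exposes a REST "model" API listing the metrics it aggregates
curl http://192.168.0.212:8080/api/v1/proxy/namespaces/kube-system/services/heapster/api/v1/model/metrics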

4. Deploy InfluxDB and Grafana

# RC definition for InfluxDB and Grafana:
[root@kub_master heapster-influxdb]# vim influxdb-grafana-controller.yaml 
[root@kub_master heapster-influxdb]# cat influxdb-grafana-controller.yaml 
apiVersion: v1
kind: ReplicationController
metadata:
  labels:
    name: influxGrafana
  name: influxdb-grafana
  namespace: kube-system
spec:
  replicas: 1
  selector:
    name: influxGrafana
  template:
    metadata:
      labels:
        name: influxGrafana
    spec:
      containers:
      - name: influxdb
        image: 192.168.0.212:5000/heapster_influxdb:v0.5
        imagePullPolicy: IfNotPresent
        volumeMounts:
        - mountPath: /data
          name: influxdb-storage
      - name: grafana
        imagePullPolicy: IfNotPresent
        image: 192.168.0.212:5000/heapster_grafana:v2.6.0
        env:
          - name: INFLUXDB_SERVICE_URL
            value: http://monitoring-influxdb:8086
            # The following env variables are required to make Grafana accessible via
            # the kubernetes api-server proxy. On production clusters, we recommend
            # removing these env variables, setup auth for grafana, and expose the grafana
            # service using a LoadBalancer or a public IP.
          - name: GF_AUTH_BASIC_ENABLED
            value: "false"
          - name: GF_AUTH_ANONYMOUS_ENABLED
            value: "true"
          - name: GF_AUTH_ANONYMOUS_ORG_ROLE
            value: Admin
          - name: GF_SERVER_ROOT_URL
            value: /api/v1/proxy/namespaces/kube-system/services/monitoring-grafana/
        volumeMounts:
        - mountPath: /var
          name: grafana-storage
      nodeSelector:
         kubernetes.io/hostname: 192.168.0.212
      volumes:
      - name: influxdb-storage
        emptyDir: {}
      - name: grafana-storage
        emptyDir: {}
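
Note that both volumes are emptyDir, so all collected metrics and Grafana settings are lost whenever the Pod is rescheduled. For anything beyond a demo, persistent storage is preferable; a minimal sketch using a hostPath volume instead (the /data/influxdb path is an assumed example, not from the original manifests):

      volumes:
      - name: influxdb-storage
        hostPath:
          path: /data/influxdb    # survives Pod restarts on this node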
# InfluxDB Service definition
[root@kub_master heapster-influxdb]# vim influxdb-service.yaml 
[root@kub_master heapster-influxdb]# cat influxdb-service.yaml 
apiVersion: v1
kind: Service
metadata:
  labels: null
  name: monitoring-influxdb
  namespace: kube-system
spec:
  ports:
  - name: http
    port: 8083
    targetPort: 8083
  - name: api
    port: 8086
    targetPort: 8086
  selector:
    name: influxGrafana
# Grafana Service definition
[root@kub_master heapster-influxdb]# vim grafana-service.yaml 
[root@kub_master heapster-influxdb]# cat grafana-service.yaml 
apiVersion: v1
kind: Service
metadata:
  labels:
    kubernetes.io/cluster-service: 'true'
    kubernetes.io/name: monitoring-grafana
  name: monitoring-grafana
  namespace: kube-system
spec:
  # In a production setup, we recommend accessing Grafana through an external Loadbalancer
  # or through a public IP. 
  # type: LoadBalancer
  ports:
  - port: 80
    targetPort: 3000
  selector:
    name: influxGrafana
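
As the comment in the manifest says, production access to Grafana should not go through the apiserver proxy. Besides a LoadBalancer, a NodePort Service is a simple alternative on a bare-metal cluster like this one; a sketch (the nodePort value is an assumed choice from the default 30000-32767 range):

spec:
  type: NodePort
  ports:
  - port: 80
    targetPort: 3000
    nodePort: 30030    # Grafana then reachable at http://<any-node-ip>:30030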
[root@kub_master heapster-influxdb]# kubectl create -f influxdb-grafana-controller.yaml 
replicationcontroller "influxdb-grafana" created
[root@kub_master heapster-influxdb]# kubectl get all --namespace=kube-system
NAME                                 DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
deploy/kube-dns                      1         1         1            1           2d
deploy/kubernetes-dashboard-latest   1         1         1            1           1d

NAME                  DESIRED   CURRENT   READY     AGE
rc/heapster           1         1         1         4m
rc/influxdb-grafana   1         1         1         5s

NAME                       CLUSTER-IP        EXTERNAL-IP   PORT(S)         AGE
svc/heapster               192.168.211.92    <none>        80/TCP          4m
svc/kube-dns               192.168.230.254   <none>        53/UDP,53/TCP   2d
svc/kubernetes-dashboard   192.168.108.11    <none>        80/TCP          1d

NAME                                        DESIRED   CURRENT   READY     AGE
rs/kube-dns-4072910292                      1         1         1         2d
rs/kubernetes-dashboard-latest-3255858758   1         1         1         1d

NAME                                              READY     STATUS    RESTARTS   AGE
po/heapster-jhr57                                 1/1       Running   0          4m
po/influxdb-grafana-thgp6                         2/2       Running   0          5s
po/kube-dns-4072910292-4qb6c                      4/4       Running   0          2d
po/kubernetes-dashboard-latest-3255858758-fqsbc   1/1       Running   0          1d
[root@kub_master heapster-influxdb]# kubectl create -f influxdb-service.yaml 
service "monitoring-influxdb" created
[root@kub_master heapster-influxdb]# kubectl create -f grafana-service.yaml 
service "monitoring-grafana" created
[root@kub_master heapster-influxdb]# kubectl get all --namespace=kube-system
NAME                                 DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
deploy/kube-dns                      1         1         1            1           2d
deploy/kubernetes-dashboard-latest   1         1         1            1           1d

NAME                  DESIRED   CURRENT   READY     AGE
rc/heapster           1         1         1         5m
rc/influxdb-grafana   1         1         1         44s

NAME                       CLUSTER-IP        EXTERNAL-IP   PORT(S)             AGE
svc/heapster               192.168.211.92    <none>        80/TCP              5m
svc/kube-dns               192.168.230.254   <none>        53/UDP,53/TCP       2d
svc/kubernetes-dashboard   192.168.108.11    <none>        80/TCP              1d
svc/monitoring-grafana     192.168.15.46     <none>        80/TCP              3s
svc/monitoring-influxdb    192.168.188.130   <none>        8083/TCP,8086/TCP   10s

NAME                                        DESIRED   CURRENT   READY     AGE
rs/kube-dns-4072910292                      1         1         1         2d
rs/kubernetes-dashboard-latest-3255858758   1         1         1         1d

NAME                                              READY     STATUS    RESTARTS   AGE
po/heapster-jhr57                                 1/1       Running   0          5m
po/influxdb-grafana-thgp6                         2/2       Running   0          44s
po/kube-dns-4072910292-4qb6c                      4/4       Running   0          2d
po/kubernetes-dashboard-latest-3255858758-fqsbc   1/1       Running   0          1d

5. Access the dashboard
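
With Heapster feeding InfluxDB, the collected metrics show up in the web UIs: the Kubernetes dashboard gains CPU and memory usage graphs for Nodes and Pods, and Grafana is reachable through the apiserver proxy path matching the GF_SERVER_ROOT_URL set in the RC above (sketch; the exact proxy path prefix depends on the Kubernetes version):

http://192.168.0.212:8080/api/v1/proxy/namespaces/kube-system/services/monitoring-grafana/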


6. View container status via the cAdvisor web page

    In Kubernetes, cAdvisor is integrated into the kubelet by default: when the kubelet service starts, it automatically starts cAdvisor, which then continuously collects performance metrics for the node itself and for every container running on it. The kubelet startup parameter --cadvisor-port sets the port on which cAdvisor serves externally; the default is 4194.

   cAdvisor provides a web page that can be viewed in a browser. For example, to monitor the performance metrics of node2:

[root@kub_node2 ~]# vim /etc/kubernetes/kubelet 
[root@kub_node2 ~]# cat /etc/kubernetes/kubelet 
###
# kubernetes kubelet (minion) config

# The address for the info server to serve on (set to 0.0.0.0 or "" for all interfaces)
KUBELET_ADDRESS="--address=0.0.0.0"

# The port for the info server to serve on
KUBELET_PORT="--port=10250"

# You may leave this blank to use the actual hostname
KUBELET_HOSTNAME="--hostname-override=192.168.0.208"

# location of the api-server
KUBELET_API_SERVER="--api-servers=http://192.168.0.212:8080"

# pod infrastructure container
KUBELET_POD_INFRA_CONTAINER="--pod-infra-container-image=192.168.0.212:5000/pod-infrastructure:latest"

# Add your own!
KUBELET_ARGS="--cluster_dns=192.168.230.254 --cluster_domain=cluster.local --cadvisor-port=4194"
[root@kub_node2 ~]# systemctl restart kubelet

Open http://192.168.0.208:4194 in a browser to reach cAdvisor's monitoring page on node2.

cAdvisor's home page shows the host's real-time state: CPU usage, memory usage, network throughput, filesystem usage, and so on.
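
Besides the web UI, cAdvisor serves a REST API on the same port, which is handy for scripted checks. A minimal sketch using the standard v1.3 API paths:

# Node hardware and topology information
curl http://192.168.0.208:4194/api/v1.3/machine

# Resource usage for the root container (i.e., the node as a whole)
curl http://192.168.0.208:4194/api/v1.3/containers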


   Container performance data is very useful for cluster monitoring, and administrators can use the data cAdvisor provides for analysis and alerting. However, cAdvisor runs on each Node individually and can only collect performance metrics for its own host, which is exactly the gap Heapster fills by aggregating the data cluster-wide.