1. Collecting metrics via cAdvisor

cAdvisor is the agent in the Kubernetes ecosystem that collects container monitoring data. It is built into the kubelet, so no separate deployment is required.

Before Kubernetes 1.7.3, cAdvisor metrics were merged into the kubelet metrics endpoint and could be fetched from port 4194 on each node.
Since Kubernetes 1.7.3, cAdvisor metrics have been split out of the kubelet metrics endpoint, so Prometheus scrapes them as two separate jobs. Many documents online still claim that each node exposes port 4194 for cAdvisor metrics, but newer kubelet versions no longer expose that port; the metrics can only be obtained through the proxy API provided by the apiserver.

  • Metric types collected by cAdvisor
    cAdvisor reports the resource usage of all containers running on the current node. Metric names are prefixed with container_*:
container_cpu_*
container_fs_*
container_memory_*
container_network_*
container_spec_*
container_last_seen
container_scrape_error
container_start_time_seconds
container_tasks_state
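As an illustration of how these series are used, the following PromQL queries (illustrative only; the pod_name/container_name label scheme matches the sample cAdvisor output below, which predates the label rename in newer Kubernetes releases) compute per-pod CPU and memory usage:

```promql
# Per-pod CPU usage in cores, averaged over 5 minutes
sum(rate(container_cpu_usage_seconds_total{image!=""}[5m])) by (namespace, pod_name)

# Per-pod working-set memory in bytes
sum(container_memory_working_set_bytes{image!=""}) by (namespace, pod_name)
```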
  • API

    Item: cAdvisor metrics
    API: /api/v1/nodes/{node}/proxy/metrics/cadvisor
    Prometheus config: replacement: /api/v1/nodes/${1}/proxy/metrics/cadvisor
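The replacement rule listed above belongs to a relabeling step that rewrites each discovered node into an apiserver proxy path. A minimal scrape job along those lines (a sketch that assumes Prometheus runs in-cluster with a service account allowed to access the node proxy API; adapt names and paths to your setup) could look like:

```yaml
- job_name: 'kubernetes-cadvisor'
  scheme: https
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  kubernetes_sd_configs:
  - role: node
  relabel_configs:
  # Copy node labels onto the target
  - action: labelmap
    regex: __meta_kubernetes_node_label_(.+)
  # Scrape through the apiserver instead of the node directly
  - target_label: __address__
    replacement: kubernetes.default.svc:443
  - source_labels: [__meta_kubernetes_node_name]
    regex: (.+)
    target_label: __metrics_path__
    replacement: /api/v1/nodes/${1}/proxy/metrics/cadvisor
```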

  • Testing the cAdvisor metrics endpoint
kubectl get --raw "/api/v1/nodes/ejucsnode-shqs-1/proxy/metrics/cadvisor"

Or alternatively:

kubectl proxy --port=6080
curl http://localhost:6080/api/v1/nodes/ejucsnode-shqs-1/proxy/metrics/cadvisor
		
	# HELP cadvisor_version_info A metric with a constant '1' value labeled by kernel version, OS version, docker version, cadvisor version & cadvisor revision.
	# TYPE cadvisor_version_info gauge
	cadvisor_version_info{cadvisorRevision="",cadvisorVersion="",dockerVersion="18.06.1-ce",kernelVersion="3.10.0-862.el7.x86_64",osVersion="CentOS Linux 7 (Core)"} 1
	# HELP container_cpu_cfs_periods_total Number of elapsed enforcement period intervals.
	# TYPE container_cpu_cfs_periods_total counter
	container_cpu_cfs_periods_total{container_name="",id="/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-poda5666056_3349_11e9_a0fa_14187763656c.slice",image="",name="",namespace="kube-system",pod_name="prometheus-576b4fb6bb-4947p"} 849686
	container_cpu_cfs_periods_total{container_name="prometheus",id="/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-poda5666056_3349_11e9_a0fa_14187763656c.slice/docker-58c340d3f0bbff761882a8a53127dc4e95b962514106748baa4a7dc01129962b.scope",image="sha256:690f4cf8dee25c239aa517a16dd73392ecd81485b29cad881d901a99b5b1a303",name="k8s_prometheus_prometheus-576b4fb6bb-4947p_kube-system_a5666056-3349-11e9-a0fa-14187763656c_0",namespace="kube-system",pod_name="prometheus-576b4fb6bb-4947p"} 849710
	# HELP container_cpu_cfs_throttled_periods_total Number of throttled period intervals.
	# TYPE container_cpu_cfs_throttled_periods_total counter
	container_cpu_cfs_throttled_periods_total{container_name="",id="/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-poda5666056_3349_11e9_a0fa_14187763656c.slice",image="",name="",namespace="kube-system",pod_name="prometheus-576b4fb6bb-4947p"} 10576
	container_cpu_cfs_throttled_periods_total{container_name="prometheus",id="/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-poda5666056_3349_11e9_a0fa_14187763656c.slice/docker-58c340d3f0bbff761882a8a53127dc4e95b962514106748baa4a7dc01129962b.scope",image="sha256:690f4cf8dee25c239aa517a16dd73392ecd81485b29cad881d901a99b5b1a303",name="k8s_prometheus_prometheus-576b4fb6bb-4947p_kube-system_a5666056-3349-11e9-a0fa-14187763656c_0",namespace="kube-system",pod_name="prometheus-576b4fb6bb-4947p"} 10266
	# HELP container_cpu_cfs_throttled_seconds_total Total time duration the container has been throttled.
	# TYPE container_cpu_cfs_throttled_seconds_total counter
	container_cpu_cfs_throttled_seconds_total{container_name="",id="/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-poda5666056_3349_11e9_a0fa_14187763656c.slice",image="",name="",namespace="kube-system",pod_name="prometheus-576b4fb6bb-4947p"} 16523.995575912
	container_cpu_cfs_throttled_seconds_total{container_name="prometheus",id="/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-poda5666056_3349_11e9_a0fa_14187763656c.slice/docker-58c340d3f0bbff761882a8a53127dc4e95b962514106748baa4a7dc01129962b.scope",image="sha256:690f4cf8dee25c239aa517a16dd73392ecd81485b29cad881d901a99b5b1a303",name="k8s_prometheus_prometheus-576b4fb6bb-4947p_kube-system_a5666056-3349-11e9-a0fa-14187763656c_0",namespace="kube-system",pod_name="prometheus-576b4fb6bb-4947p"} 10673.627579073

	... ... ... ... # (output truncated)

2. Collecting metrics via the kubelet

  • Metric types collected by the kubelet
    To be added.
  • API

    Item: kubelet metrics
    API: /api/v1/nodes/{node}/proxy/metrics
    Prometheus config: replacement: /api/v1/nodes/${1}/proxy/metrics

  • Testing the kubelet metrics endpoint
kubectl proxy --port=6080
curl http://localhost:6080/api/v1/nodes/ejucsnode-shqs-1/proxy/metrics
	# HELP apiserver_audit_event_total Counter of audit events generated and sent to the audit backend.
	# TYPE apiserver_audit_event_total counter
	apiserver_audit_event_total 0
	# HELP apiserver_audit_requests_rejected_total Counter of apiserver requests rejected due to an error in audit logging backend.
	# TYPE apiserver_audit_requests_rejected_total counter
	apiserver_audit_requests_rejected_total 0
	... ... ... ... # (output truncated)
	# HELP apiserver_storage_data_key_generation_latencies_microseconds Latencies in microseconds of data encryption key(DEK) generation operations.
	# TYPE apiserver_storage_data_key_generation_latencies_microseconds histogram
	apiserver_storage_data_key_generation_latencies_microseconds_bucket{le="5"} 0
	apiserver_storage_data_key_generation_latencies_microseconds_bucket{le="10"} 0
	apiserver_storage_data_key_generation_latencies_microseconds_bucket{le="20"} 0
	apiserver_storage_data_key_generation_latencies_microseconds_bucket{le="40"} 0

	... ... ... ... # (output truncated)

3. node_exporter

The NodeExporter project from Prometheus collects key host-level metrics. By deploying a NodeExporter instance on every node through a Kubernetes DaemonSet, host performance metrics can be monitored across the whole cluster.

  • Manifest: prometheus-node-exporter-daemonset.yaml
cat prometheus-node-exporter-daemonset.yaml 

apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  name: prometheus-node-exporter
  namespace: kube-system
  labels:
    app: prometheus-node-exporter
spec:
  template:
    metadata:
      name: prometheus-node-exporter
      labels:
        app: prometheus-node-exporter
    spec:
      containers:
      - image: prom/node-exporter:v0.17.0
        imagePullPolicy: IfNotPresent
        name: prometheus-node-exporter
        ports:
        - name: prom-node-exp
          #^ must be an IANA_SVC_NAME (at most 15 characters, ..)
          containerPort: 9100
          hostPort: 9100
      tolerations:
      - key: "node-role.kubernetes.io/master"
        effect: "NoSchedule"
      hostNetwork: true
      hostPID: true
      hostIPC: true
      restartPolicy: Always
---
apiVersion: v1
kind: Service
metadata:
  annotations:
    prometheus.io/scrape: 'true'
    prometheus.io/app-metrics: 'true'
    prometheus.io/app-metrics-path: '/metrics'
  name: prometheus-node-exporter
  namespace: kube-system
  labels:
    app: prometheus-node-exporter
spec:
  clusterIP: None
  ports:
    - name: prometheus-node-exporter
      port: 9100
      protocol: TCP
  selector:
    app: prometheus-node-exporter
  type: ClusterIP
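The prometheus.io/scrape annotation on the Service above only has an effect if the Prometheus configuration honors it. A typical endpoints-based scrape job implementing this convention (a sketch; the annotation-driven setup is a common pattern, not something the annotation enforces by itself) is:

```yaml
- job_name: 'kubernetes-service-endpoints'
  kubernetes_sd_configs:
  - role: endpoints
  relabel_configs:
  # Scrape only endpoints whose Service carries prometheus.io/scrape: "true"
  - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
    action: keep
    regex: true
  - source_labels: [__meta_kubernetes_namespace]
    target_label: kubernetes_namespace
  - source_labels: [__meta_kubernetes_service_name]
    target_label: kubernetes_name
```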
  • Apply the manifest
kubectl apply -f prometheus-node-exporter-daemonset.yaml 
    daemonset.extensions/prometheus-node-exporter created
    service/prometheus-node-exporter created

kubectl get -f prometheus-node-exporter-daemonset.yaml 
	NAME                                            DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
	daemonset.extensions/prometheus-node-exporter   6         6         6       6            6           <none>          5m
	
	NAME                               TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)    AGE
	service/prometheus-node-exporter   ClusterIP   None         <none>        9100/TCP   5m
  • Verify
On any node:
netstat -pltn |grep 9100
	tcp6       0      0 :::9100                 :::*                    LISTEN      104168/node_exporte 

curl {nodeIP}:9100/metrics
	# HELP go_gc_duration_seconds A summary of the GC invocation durations.
	# TYPE go_gc_duration_seconds summary
	go_gc_duration_seconds{quantile="0"} 0.000117217
	go_gc_duration_seconds{quantile="0.25"} 0.000159431
	go_gc_duration_seconds{quantile="0.5"} 0.000200323
	............. # (output truncated)

4. kube-state-metrics

kube-state-metrics collects metrics about the state of Kubernetes resource objects, including daemonsets, deployments, jobs, namespaces, nodes, PVCs, pod containers, pods, replicasets, services, and statefulsets.
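Once scraped, these state metrics answer questions that cAdvisor's resource counters cannot. A couple of illustrative PromQL queries (metric names as exposed by kube-state-metrics 1.5):

```promql
# Deployments whose available replicas lag behind the desired count
kube_deployment_spec_replicas != kube_deployment_status_replicas_available

# Pods that are not in the Running phase
kube_pod_status_phase{phase!="Running"} > 0
```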

  • Deploying kube-state-metrics
  1. Download the manifests
    The kube-state-metrics deployment manifests can be downloaded from GitHub (kube-state-metrics); the latest version at the time of writing is 1.5.0.
mkdir kube-state-metrics
cd kube-state-metrics
wget https://github.com/kubernetes/kube-state-metrics/archive/v1.5.0.zip
unzip v1.5.0.zip
cd kube-state-metrics-1.5.0/kubernetes/
tree
├── kube-state-metrics-cluster-role-binding.yaml
├── kube-state-metrics-cluster-role.yaml
├── kube-state-metrics-deployment.yaml
├── kube-state-metrics-role-binding.yaml
├── kube-state-metrics-role.yaml
├── kube-state-metrics-service-account.yaml
└── kube-state-metrics-service.yaml
  2. Adjust the manifests
    kube-state-metrics is deployed into the kube-system namespace by default; edit the manifests if you need to deploy it into a different namespace.
    The two Docker images referenced in kube-state-metrics-deployment.yaml can be swapped for mirrors that are reachable from inside China:
image: quay.io/coreos/kube-state-metrics:v1.5.0
change to:
mirrorgooglecontainers/kube-state-metrics:v1.5.0

k8s.gcr.io/addon-resizer:1.8.3
change to (a newer version):
mirrorgooglecontainers/addon-resizer:1.8.4
  3. Deploy
kubectl apply -f ./
	clusterrolebinding.rbac.authorization.k8s.io/kube-state-metrics unchanged
	clusterrole.rbac.authorization.k8s.io/kube-state-metrics unchanged
	deployment.apps/kube-state-metrics created
	rolebinding.rbac.authorization.k8s.io/kube-state-metrics unchanged
	role.rbac.authorization.k8s.io/kube-state-metrics-resizer unchanged
	serviceaccount/kube-state-metrics unchanged
	service/kube-state-metrics unchanged
kubectl get -f ./
	NAME                                                              AGE
	clusterrolebinding.rbac.authorization.k8s.io/kube-state-metrics   70s
	
	NAME                                                       AGE
	clusterrole.rbac.authorization.k8s.io/kube-state-metrics   70s
	
	NAME                                 READY   UP-TO-DATE   AVAILABLE   AGE
	deployment.apps/kube-state-metrics   1/1     1            1           50s
	
	NAME                                                       AGE
	rolebinding.rbac.authorization.k8s.io/kube-state-metrics   70s
	
	NAME                                                        AGE
	role.rbac.authorization.k8s.io/kube-state-metrics-resizer   70s
	
	NAME                                SECRETS   AGE
	serviceaccount/kube-state-metrics   1         70s
	
	NAME                         TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)             AGE
	service/kube-state-metrics   ClusterIP   10.106.107.13   <none>        8080/TCP,8081/TCP   70s
  • Testing the kube-state-metrics endpoint
kubectl get svc kube-state-metrics -n kube-system 
	NAME                 TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)             AGE
	kube-state-metrics   ClusterIP   10.106.107.13   <none>        8080/TCP,8081/TCP   5m33s

curl 10.106.107.13:8080/metrics
	# HELP kube_configmap_info Information about configmap.
	# TYPE kube_configmap_info gauge
	kube_configmap_info{namespace="kube-system",configmap="kubernetes-dashboard-settings"} 1
	kube_configmap_info{namespace="kube-system",configmap="prometheus-config"} 1
	kube_configmap_info{namespace="kube-system",configmap="kubeadm-config"} 1
	kube_configmap_info{namespace="kube-public",configmap="cluster-info"} 1
	kube_configmap_info{namespace="kube-system",configmap="calico-config"} 1
	kube_configmap_info{namespace="kube-system",configmap="coredns"} 1
	kube_configmap_info{namespace="kube-system",configmap="extension-apiserver-authentication"} 1
	kube_configmap_info{namespace="kube-system",configmap="kube-proxy"} 1
	kube_configmap_info{namespace="kube-system",configmap="kubelet-config-1.13"} 1
	
	... ... # (output truncated)

curl -I 10.106.107.13:8081/healthz
	HTTP/1.1 200 OK
	Date: Wed, 20 Feb 2019 02:20:10 GMT
	Content-Length: 264
	Content-Type: text/html; charset=utf-8

5. blackbox-exporter

blackbox-exporter is a black-box probing tool that can run network probes (HTTP, TCP, ICMP, and so on) against services. GitHub: blackbox-exporter; the latest version at the time of writing is v0.13.0.

  • Manifests
cat blackbox-exporter-configmap.yaml

apiVersion: v1
kind: ConfigMap
metadata:
  labels:
    app: blackbox-exporter
  name: blackbox-exporter
  namespace: kube-system
data:
  blackbox.yml: |-
    modules:
      http_2xx:
        prober: http
        timeout: 10s
        http:
          valid_http_versions: ["HTTP/1.1", "HTTP/2"]
          valid_status_codes: []
          method: GET
          preferred_ip_protocol: "ip4"
      http_post_2xx: 
        prober: http
        timeout: 10s
        http:
          valid_http_versions: ["HTTP/1.1", "HTTP/2"]
          method: POST
          preferred_ip_protocol: "ip4"
      tcp_connect:
        prober: tcp
        timeout: 10s
      icmp:
        prober: icmp
        timeout: 10s
        icmp:
          preferred_ip_protocol: "ip4"
cat blackbox-exporter-deployment.yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  name: blackbox-exporter
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: blackbox-exporter
  replicas: 1
  template:
    metadata:
      labels:
        app: blackbox-exporter
    spec:
      restartPolicy: Always
      containers:
      - name: blackbox-exporter
        image: prom/blackbox-exporter:v0.13.0
        imagePullPolicy: IfNotPresent
        ports:
        - name: blackbox-port
          containerPort: 9115
        readinessProbe:
          tcpSocket:
            port: 9115
          initialDelaySeconds: 5
          timeoutSeconds: 5
        resources:
          requests:
            memory: 50Mi
            cpu: 100m
          limits:
            memory: 60Mi
            cpu: 200m
        volumeMounts:
        - name: config
          mountPath: /etc/blackbox_exporter
        args:
        - --config.file=/etc/blackbox_exporter/blackbox.yml
        - --log.level=debug
        - --web.listen-address=:9115
      volumes:
      - name: config
        configMap:
          name: blackbox-exporter
      nodeSelector:
        node-role.kubernetes.io/master: ""
      tolerations:
      - key: "node-role.kubernetes.io/master"
        operator: "Equal"
        value: ""
        effect: "NoSchedule"
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app: blackbox-exporter
  name: blackbox-exporter
  namespace: kube-system
  annotations:
    prometheus.io/scrape: 'true'
spec:
  type: ClusterIP
  selector:
    app: blackbox-exporter
  ports:
  - name: blackbox
    port: 9115
    targetPort: 9115
    protocol: TCP
  • Deploy
kubectl apply -f blackbox-exporter-configmap.yaml
	configmap/blackbox-exporter created

kubectl apply -f blackbox-exporter-deployment.yaml 
	deployment.apps/blackbox-exporter created
	service/blackbox-exporter created

kubectl get -f blackbox-exporter-configmap.yaml 
	NAME                DATA   AGE
	blackbox-exporter   1      3m12s

kubectl get -f blackbox-exporter-deployment.yaml 
	NAME                                READY   UP-TO-DATE   AVAILABLE   AGE
	deployment.apps/blackbox-exporter   1/1     1            1           69s

	NAME                        TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE
	service/blackbox-exporter   ClusterIP   10.109.48.146   <none>        9115/TCP   69s
  • Validation
# Check (1): verify the exporter responds
curl 10.109.48.146:9115
	<html>
	    <head><title>Blackbox Exporter</title></head>
	    <body>
	    <h1>Blackbox Exporter</h1>
	    <p><a href="/probe?target=prometheus.io&module=http_2xx">Probe prometheus.io for http_2xx</a></p>
	    <p><a href="/probe?target=prometheus.io&module=http_2xx&debug=true">Debug probe prometheus.io for http_2xx</a></p>
	    <p><a href="/metrics">Metrics</a></p>
	    <p><a href="/config">Configuration</a></p>
	    <h2>Recent Probes</h2>
	    <table border='1'><tr><th>Module</th><th>Target</th><th>Result</th><th>Debug</th></table></body>
    </html>

# Check (2): test a TCP probe, using grafana as an example
kubectl get svc -n kube-system  -l app=blackbox-exporter
	NAME                TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE
	blackbox-exporter   ClusterIP   10.109.48.146   <none>        9115/TCP   27h
kubectl describe svc monitoring-grafana -n kube-system  
	Name:              monitoring-grafana
	Namespace:         kube-system
	Labels:            <none>
	Annotations:       kubectl.kubernetes.io/last-applied-configuration:
	                     {"apiVersion":"v1","kind":"Service","metadata":{"annotations":{"prometheus.io/scrape":"true","prometheus.io/tcp-probe":"true","prometheus....
	                   prometheus.io/scrape: true
	                   prometheus.io/tcp-probe: true
	                   prometheus.io/tcp-probe-port: 80
	Selector:          k8s-app=grafana
	Type:              ClusterIP
	IP:                10.99.65.209
	Port:              grafana  80/TCP
	TargetPort:        3000/TCP
	Endpoints:         192.168.1.6:3000
	Session Affinity:  None
	Events:            <none>	

curl '10.109.48.146:9115/probe?module=tcp_connect&target=monitoring-grafana.kube-system:80'
	# HELP probe_dns_lookup_time_seconds Returns the time taken for probe dns lookup in seconds
	# TYPE probe_dns_lookup_time_seconds gauge
	probe_dns_lookup_time_seconds 0.002059111
	# HELP probe_duration_seconds Returns how long the probe took to complete in seconds
	# TYPE probe_duration_seconds gauge
	probe_duration_seconds 0.002815779
	# HELP probe_failed_due_to_regex Indicates if probe failed due to regex
	# TYPE probe_failed_due_to_regex gauge
	probe_failed_due_to_regex 0
	# HELP probe_ip_protocol Specifies whether probe ip protocol is IP4 or IP6
	# TYPE probe_ip_protocol gauge
	probe_ip_protocol 4
	# HELP probe_success Displays whether or not the probe was a success
	# TYPE probe_success gauge
	probe_success 1
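To let Prometheus drive this probe automatically from the prometheus.io/tcp-probe annotations shown in the Service description above, a scrape job can route annotated Services through the exporter. A sketch (annotation names follow the convention used above; the blackbox-exporter address assumes the kube-system Service defined earlier):

```yaml
- job_name: 'kubernetes-service-tcp-probe'
  metrics_path: /probe
  params:
    module: [tcp_connect]
  kubernetes_sd_configs:
  - role: service
  relabel_configs:
  # Probe only Services annotated prometheus.io/tcp-probe: "true"
  - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_tcp_probe]
    action: keep
    regex: true
  # Build host:port from service name, namespace and the tcp-probe-port annotation
  - source_labels: [__meta_kubernetes_service_name, __meta_kubernetes_namespace, __meta_kubernetes_service_annotation_prometheus_io_tcp_probe_port]
    regex: (.+);(.+);(.+)
    target_label: __param_target
    replacement: $1.$2:$3
  # Send the actual scrape to the blackbox-exporter itself
  - target_label: __address__
    replacement: blackbox-exporter.kube-system:9115
```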

At this point, the exporters needed to monitor the Kubernetes cluster are all configured; the next step is to deploy Prometheus to scrape the metrics these exporters expose.