1. Collecting metrics via cAdvisor
cAdvisor is the agent in the Kubernetes ecosystem that collects container monitoring data. It is built into the kubelet, so it does not need to be deployed separately.
Before Kubernetes 1.7.3, cAdvisor's metrics were merged into the kubelet's metrics and were fetched from port 4194, which each node exposed.
Since Kubernetes 1.7.3, cAdvisor's metrics have been split out of the kubelet's metrics, so Prometheus scrapes them as two separate jobs. Many documents online still state that nodes expose port 4194 for fetching cAdvisor metrics; in newer kubelet versions cAdvisor no longer listens on 4194, and the metrics can only be obtained through the proxy API provided by the apiserver.
- Metric types collected by cAdvisor
cAdvisor reports the resource usage of all containers running on the current node. The metric names share the container_ prefix:
container_cpu_*
container_fs_*
container_memory_*
container_network_*
container_spec_*
container_last_seen
container_scrape_error
container_start_time_seconds
container_tasks_state
- API
Item | API | Prometheus config | Notes |
cAdvisor metrics | /api/v1/nodes/{node}/proxy/metrics/cadvisor | replacement: /api/v1/nodes/${1}/proxy/metrics/cadvisor |
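The relabeling shown in the table slots into a Prometheus scrape job. The sketch below follows the commonly used node-discovery pattern; the job name is illustrative, and the CA and token paths are the standard in-cluster service-account mount, so adjust them if your setup differs:

```yaml
- job_name: 'kubernetes-cadvisor'
  scheme: https
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  kubernetes_sd_configs:
  - role: node
  relabel_configs:
  # Keep the node's Kubernetes labels as Prometheus labels
  - action: labelmap
    regex: __meta_kubernetes_node_label_(.+)
  # Scrape through the apiserver proxy instead of hitting the node directly
  - target_label: __address__
    replacement: kubernetes.default.svc:443
  - source_labels: [__meta_kubernetes_node_name]
    regex: (.+)
    target_label: __metrics_path__
    replacement: /api/v1/nodes/${1}/proxy/metrics/cadvisor
```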
- Testing the cAdvisor metrics endpoint
kubectl get --raw "/api/v1/nodes/ejucsnode-shqs-1/proxy/metrics/cadvisor"
Alternatively:
kubectl proxy --port=6080
curl http://localhost:6080/api/v1/nodes/ejucsnode-shqs-1/proxy/metrics/cadvisor
# HELP cadvisor_version_info A metric with a constant '1' value labeled by kernel version, OS version, docker version, cadvisor version & cadvisor revision.
# TYPE cadvisor_version_info gauge
cadvisor_version_info{cadvisorRevision="",cadvisorVersion="",dockerVersion="18.06.1-ce",kernelVersion="3.10.0-862.el7.x86_64",osVersion="CentOS Linux 7 (Core)"} 1
# HELP container_cpu_cfs_periods_total Number of elapsed enforcement period intervals.
# TYPE container_cpu_cfs_periods_total counter
container_cpu_cfs_periods_total{container_name="",id="/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-poda5666056_3349_11e9_a0fa_14187763656c.slice",image="",name="",namespace="kube-system",pod_name="prometheus-576b4fb6bb-4947p"} 849686
container_cpu_cfs_periods_total{container_name="prometheus",id="/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-poda5666056_3349_11e9_a0fa_14187763656c.slice/docker-58c340d3f0bbff761882a8a53127dc4e95b962514106748baa4a7dc01129962b.scope",image="sha256:690f4cf8dee25c239aa517a16dd73392ecd81485b29cad881d901a99b5b1a303",name="k8s_prometheus_prometheus-576b4fb6bb-4947p_kube-system_a5666056-3349-11e9-a0fa-14187763656c_0",namespace="kube-system",pod_name="prometheus-576b4fb6bb-4947p"} 849710
# HELP container_cpu_cfs_throttled_periods_total Number of throttled period intervals.
# TYPE container_cpu_cfs_throttled_periods_total counter
container_cpu_cfs_throttled_periods_total{container_name="",id="/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-poda5666056_3349_11e9_a0fa_14187763656c.slice",image="",name="",namespace="kube-system",pod_name="prometheus-576b4fb6bb-4947p"} 10576
container_cpu_cfs_throttled_periods_total{container_name="prometheus",id="/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-poda5666056_3349_11e9_a0fa_14187763656c.slice/docker-58c340d3f0bbff761882a8a53127dc4e95b962514106748baa4a7dc01129962b.scope",image="sha256:690f4cf8dee25c239aa517a16dd73392ecd81485b29cad881d901a99b5b1a303",name="k8s_prometheus_prometheus-576b4fb6bb-4947p_kube-system_a5666056-3349-11e9-a0fa-14187763656c_0",namespace="kube-system",pod_name="prometheus-576b4fb6bb-4947p"} 10266
# HELP container_cpu_cfs_throttled_seconds_total Total time duration the container has been throttled.
# TYPE container_cpu_cfs_throttled_seconds_total counter
container_cpu_cfs_throttled_seconds_total{container_name="",id="/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-poda5666056_3349_11e9_a0fa_14187763656c.slice",image="",name="",namespace="kube-system",pod_name="prometheus-576b4fb6bb-4947p"} 16523.995575912
container_cpu_cfs_throttled_seconds_total{container_name="prometheus",id="/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-poda5666056_3349_11e9_a0fa_14187763656c.slice/docker-58c340d3f0bbff761882a8a53127dc4e95b962514106748baa4a7dc01129962b.scope",image="sha256:690f4cf8dee25c239aa517a16dd73392ecd81485b29cad881d901a99b5b1a303",name="k8s_prometheus_prometheus-576b4fb6bb-4947p_kube-system_a5666056-3349-11e9-a0fa-14187763656c_0",namespace="kube-system",pod_name="prometheus-576b4fb6bb-4947p"} 10673.627579073
... (many lines omitted)
2. Collecting metrics via the kubelet
- Metric types collected by the kubelet
(TBD)
- API
Item | API | Prometheus config | Notes |
kubelet metrics | /api/v1/nodes/{node}/proxy/metrics | replacement: /api/v1/nodes/${1}/proxy/metrics |
- Testing the kubelet metrics endpoint
kubectl proxy --port=6080
curl http://localhost:6080/api/v1/nodes/ejucsnode-shqs-1/proxy/metrics
# HELP apiserver_audit_event_total Counter of audit events generated and sent to the audit backend.
# TYPE apiserver_audit_event_total counter
apiserver_audit_event_total 0
# HELP apiserver_audit_requests_rejected_total Counter of apiserver requests rejected due to an error in audit logging backend.
# TYPE apiserver_audit_requests_rejected_total counter
apiserver_audit_requests_rejected_total 0
... (many lines omitted)
# HELP apiserver_storage_data_key_generation_latencies_microseconds Latencies in microseconds of data encryption key(DEK) generation operations.
# TYPE apiserver_storage_data_key_generation_latencies_microseconds histogram
apiserver_storage_data_key_generation_latencies_microseconds_bucket{le="5"} 0
apiserver_storage_data_key_generation_latencies_microseconds_bucket{le="10"} 0
apiserver_storage_data_key_generation_latencies_microseconds_bucket{le="20"} 0
apiserver_storage_data_key_generation_latencies_microseconds_bucket{le="40"} 0
... (many lines omitted)
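The metric inventory for the kubelet is left to be filled in above; one way to enumerate it is to filter the `# TYPE` lines of the scrape output, since each one names a metric family. The sketch below runs the filter over a small inlined sample (the HELP texts are paraphrased for illustration); in practice you would pipe the real curl output from above through the same filter.

```shell
# Save a small sample of Prometheus text-format output; in practice, replace
# the heredoc with: curl -s http://localhost:6080/api/v1/nodes/<node>/proxy/metrics
cat <<'EOF' > /tmp/kubelet-metrics.sample
# HELP kubelet_running_pod_count Number of pods currently running
# TYPE kubelet_running_pod_count gauge
kubelet_running_pod_count 12
# HELP apiserver_audit_event_total Counter of audit events
# TYPE apiserver_audit_event_total counter
apiserver_audit_event_total 0
EOF

# Each "# TYPE <name> <type>" line names one metric family.
grep '^# TYPE' /tmp/kubelet-metrics.sample | awk '{print $3}' | sort -u
```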
3. node_exporter
The NodeExporter project provided by Prometheus exposes key metrics of the host node. Running one NodeExporter instance on every node via a Kubernetes DaemonSet gives monitoring of host performance metrics across the cluster.
- Manifest: prometheus-node-exporter-daemonset.yaml
cat prometheus-node-exporter-daemonset.yaml
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  name: prometheus-node-exporter
  namespace: kube-system
  labels:
    app: prometheus-node-exporter
spec:
  template:
    metadata:
      name: prometheus-node-exporter
      labels:
        app: prometheus-node-exporter
    spec:
      containers:
      - image: prom/node-exporter:v0.17.0
        imagePullPolicy: IfNotPresent
        name: prometheus-node-exporter
        ports:
        - name: prom-node-exp
          # ^ must be an IANA_SVC_NAME (at most 15 characters, ..)
          containerPort: 9100
          hostPort: 9100
      tolerations:
      - key: "node-role.kubernetes.io/master"
        effect: "NoSchedule"
      hostNetwork: true
      hostPID: true
      hostIPC: true
      restartPolicy: Always
---
apiVersion: v1
kind: Service
metadata:
  annotations:
    prometheus.io/scrape: 'true'
    prometheus.io/app-metrics: 'true'
    prometheus.io/app-metrics-path: '/metrics'
  name: prometheus-node-exporter
  namespace: kube-system
  labels:
    app: prometheus-node-exporter
spec:
  clusterIP: None
  ports:
  - name: prometheus-node-exporter
    port: 9100
    protocol: TCP
  selector:
    app: prometheus-node-exporter
  type: ClusterIP
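The Service above carries the prometheus.io/scrape: 'true' annotation, but that annotation only takes effect if the Prometheus configuration contains a discovery job that honors it. A common sketch of such a job (the job name and the final label names are illustrative):

```yaml
- job_name: 'kubernetes-service-endpoints'
  kubernetes_sd_configs:
  - role: endpoints
  relabel_configs:
  # Keep only endpoints whose Service is annotated prometheus.io/scrape: "true"
  - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
    action: keep
    regex: true
  # Carry namespace and service name over as labels
  - source_labels: [__meta_kubernetes_namespace]
    target_label: kubernetes_namespace
  - source_labels: [__meta_kubernetes_service_name]
    target_label: kubernetes_name
```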
- Deployment commands
kubectl apply -f prometheus-node-exporter-daemonset.yaml
daemonset.extensions/prometheus-node-exporter created
service/prometheus-node-exporter created
kubectl get -f prometheus-node-exporter-daemonset.yaml
NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
daemonset.extensions/prometheus-node-exporter 6 6 6 6 6 <none> 5m
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/prometheus-node-exporter ClusterIP None <none> 9100/TCP 5m
- Verification
On any node:
netstat -pltn |grep 9100
tcp6 0 0 :::9100 :::* LISTEN 104168/node_exporte
curl {nodeIP}:9100/metrics
# HELP go_gc_duration_seconds A summary of the GC invocation durations.
# TYPE go_gc_duration_seconds summary
go_gc_duration_seconds{quantile="0"} 0.000117217
go_gc_duration_seconds{quantile="0.25"} 0.000159431
go_gc_duration_seconds{quantile="0.5"} 0.000200323
... (many lines omitted)
4. kube-state-metrics
kube-state-metrics collects metrics about the Kubernetes objects inside the cluster, covering daemonset, deployment, job, namespace, node, pvc, pod_container, pod, replicaset, service, and statefulset.
- Deploying kube-state-metrics
- Downloading the manifests
The deployment manifests for kube-state-metrics can be downloaded from the kube-state-metrics repository on GitHub; the latest release at the time of writing is 1.5.0.
mkdir kube-state-metrics
cd kube-state-metrics
wget https://github.com/kubernetes/kube-state-metrics/archive/v1.5.0.zip
unzip v1.5.0.zip
cd kube-state-metrics-1.5.0/kubernetes/
tree
├── kube-state-metrics-cluster-role-binding.yaml
├── kube-state-metrics-cluster-role.yaml
├── kube-state-metrics-deployment.yaml
├── kube-state-metrics-role-binding.yaml
├── kube-state-metrics-role.yaml
├── kube-state-metrics-service-account.yaml
└── kube-state-metrics-service.yaml
- Customizing the manifests
kube-state-metrics is deployed into the kube-system namespace by default; edit the manifests if it should run in a different namespace.
kube-state-metrics-deployment.yaml references two Docker images, which can be swapped for mirrors reachable from within China:
image: quay.io/coreos/kube-state-metrics:v1.5.0
becomes:
mirrorgooglecontainers/kube-state-metrics:v1.5.0
k8s.gcr.io/addon-resizer:1.8.3
becomes (latest available version):
mirrorgooglecontainers/addon-resizer:1.8.4
- Deployment
kubectl apply -f ./
clusterrolebinding.rbac.authorization.k8s.io/kube-state-metrics unchanged
clusterrole.rbac.authorization.k8s.io/kube-state-metrics unchanged
deployment.apps/kube-state-metrics created
rolebinding.rbac.authorization.k8s.io/kube-state-metrics unchanged
role.rbac.authorization.k8s.io/kube-state-metrics-resizer unchanged
serviceaccount/kube-state-metrics unchanged
service/kube-state-metrics unchanged
kubectl get -f ./
NAME AGE
clusterrolebinding.rbac.authorization.k8s.io/kube-state-metrics 70s
NAME AGE
clusterrole.rbac.authorization.k8s.io/kube-state-metrics 70s
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/kube-state-metrics 1/1 1 1 50s
NAME AGE
rolebinding.rbac.authorization.k8s.io/kube-state-metrics 70s
NAME AGE
role.rbac.authorization.k8s.io/kube-state-metrics-resizer 70s
NAME SECRETS AGE
serviceaccount/kube-state-metrics 1 70s
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/kube-state-metrics ClusterIP 10.106.107.13 <none> 8080/TCP,8081/TCP 70s
- Testing the kube-state-metrics endpoint
kubectl get svc kube-state-metrics -n kube-system
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kube-state-metrics ClusterIP 10.106.107.13 <none> 8080/TCP,8081/TCP 5m33s
curl 10.106.107.13:8080/metrics
# HELP kube_configmap_info Information about configmap.
# TYPE kube_configmap_info gauge
kube_configmap_info{namespace="kube-system",configmap="kubernetes-dashboard-settings"} 1
kube_configmap_info{namespace="kube-system",configmap="prometheus-config"} 1
kube_configmap_info{namespace="kube-system",configmap="kubeadm-config"} 1
kube_configmap_info{namespace="kube-public",configmap="cluster-info"} 1
kube_configmap_info{namespace="kube-system",configmap="calico-config"} 1
kube_configmap_info{namespace="kube-system",configmap="coredns"} 1
kube_configmap_info{namespace="kube-system",configmap="extension-apiserver-authentication"} 1
kube_configmap_info{namespace="kube-system",configmap="kube-proxy"} 1
kube_configmap_info{namespace="kube-system",configmap="kubelet-config-1.13"} 1
... (many lines omitted)
curl -I 10.106.107.13:8081/healthz
HTTP/1.1 200 OK
Date: Wed, 20 Feb 2019 02:20:10 GMT
Content-Length: 264
Content-Type: text/html; charset=utf-8
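The kube_configmap_info output above is typical of kube-state-metrics: a gauge fixed at 1 whose information lives entirely in the labels. Those label values can be pulled out with ordinary text tools; the sketch below counts series per namespace over an inlined sample, and in practice you would pipe the real curl output through the same filter.

```shell
# Inlined sample of kube-state-metrics output; in practice, replace the heredoc
# with: curl -s <cluster-ip>:8080/metrics | grep '^kube_configmap_info'
cat <<'EOF' > /tmp/ksm.sample
kube_configmap_info{namespace="kube-system",configmap="coredns"} 1
kube_configmap_info{namespace="kube-system",configmap="kube-proxy"} 1
kube_configmap_info{namespace="kube-public",configmap="cluster-info"} 1
EOF

# Count configmap series per namespace label.
grep -o 'namespace="[^"]*"' /tmp/ksm.sample | sort | uniq -c
```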
5. blackbox-exporter
blackbox-exporter is a black-box probing tool that can run HTTP, TCP, ICMP and other network probes against services. GitHub: blackbox-exporter; the latest release at the time of writing is v0.13.0.
- Manifests
cat blackbox-exporter-configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  labels:
    app: blackbox-exporter
  name: blackbox-exporter
  namespace: kube-system
data:
  blackbox.yml: |-
    modules:
      http_2xx:
        prober: http
        timeout: 10s
        http:
          valid_http_versions: ["HTTP/1.1", "HTTP/2"]
          valid_status_codes: []
          method: GET
          preferred_ip_protocol: "ip4"
      http_post_2xx:
        prober: http
        timeout: 10s
        http:
          valid_http_versions: ["HTTP/1.1", "HTTP/2"]
          method: POST
          preferred_ip_protocol: "ip4"
      tcp_connect:
        prober: tcp
        timeout: 10s
      icmp:
        prober: icmp
        timeout: 10s
        icmp:
          preferred_ip_protocol: "ip4"
cat blackbox-exporter-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: blackbox-exporter
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: blackbox-exporter
  replicas: 1
  template:
    metadata:
      labels:
        app: blackbox-exporter
    spec:
      restartPolicy: Always
      containers:
      - name: blackbox-exporter
        image: prom/blackbox-exporter:v0.13.0
        imagePullPolicy: IfNotPresent
        ports:
        - name: blackbox-port
          containerPort: 9115
        readinessProbe:
          tcpSocket:
            port: 9115
          initialDelaySeconds: 5
          timeoutSeconds: 5
        resources:
          requests:
            memory: 50Mi
            cpu: 100m
          limits:
            memory: 60Mi
            cpu: 200m
        volumeMounts:
        - name: config
          mountPath: /etc/blackbox_exporter
        args:
        - --config.file=/etc/blackbox_exporter/blackbox.yml
        - --log.level=debug
        - --web.listen-address=:9115
      volumes:
      - name: config
        configMap:
          name: blackbox-exporter
      nodeSelector:
        node-role.kubernetes.io/master: ""
      tolerations:
      - key: "node-role.kubernetes.io/master"
        operator: "Equal"
        value: ""
        effect: "NoSchedule"
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app: blackbox-exporter
  name: blackbox-exporter
  namespace: kube-system
  annotations:
    prometheus.io/scrape: 'true'
spec:
  type: ClusterIP
  selector:
    app: blackbox-exporter
  ports:
  - name: blackbox
    port: 9115
    targetPort: 9115
    protocol: TCP
- Deployment
kubectl apply -f blackbox-exporter-configmap.yaml
configmap/blackbox-exporter created
kubectl apply -f blackbox-exporter-deployment.yaml
deployment.apps/blackbox-exporter created
service/blackbox-exporter created
kubectl get -f blackbox-exporter-configmap.yaml
NAME DATA AGE
blackbox-exporter 1 3m12s
kubectl get -f blackbox-exporter-deployment.yaml
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/blackbox-exporter 1/1 1 1 69s
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/blackbox-exporter ClusterIP 10.109.48.146 <none> 9115/TCP 69s
- Verification
# Check (1): verify the service responds normally
curl 10.109.48.146:9115
<html>
<head><title>Blackbox Exporter</title></head>
<body>
<h1>Blackbox Exporter</h1>
<p><a href="/probe?target=prometheus.io&module=http_2xx">Probe prometheus.io for http_2xx</a></p>
<p><a href="/probe?target=prometheus.io&module=http_2xx&debug=true">Debug probe prometheus.io for http_2xx</a></p>
<p><a href="/metrics">Metrics</a></p>
<p><a href="/config">Configuration</a></p>
<h2>Recent Probes</h2>
<table border='1'><tr><th>Module</th><th>Target</th><th>Result</th><th>Debug</th></table></body>
</html>
# Check (2): verify a TCP probe, using grafana as the example
kubectl get svc -n kube-system -l app=blackbox-exporter
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
blackbox-exporter ClusterIP 10.109.48.146 <none> 9115/TCP 27h
kubectl describe svc monitoring-grafana -n kube-system
Name: monitoring-grafana
Namespace: kube-system
Labels: <none>
Annotations: kubectl.kubernetes.io/last-applied-configuration:
{"apiVersion":"v1","kind":"Service","metadata":{"annotations":{"prometheus.io/scrape":"true","prometheus.io/tcp-probe":"true","prometheus....
prometheus.io/scrape: true
prometheus.io/tcp-probe: true
prometheus.io/tcp-probe-port: 80
Selector: k8s-app=grafana
Type: ClusterIP
IP: 10.99.65.209
Port: grafana 80/TCP
TargetPort: 3000/TCP
Endpoints: 192.168.1.6:3000
Session Affinity: None
Events: <none>
curl '10.109.48.146:9115/probe?module=tcp_connect&target=monitoring-grafana.kube-system:80'
# HELP probe_dns_lookup_time_seconds Returns the time taken for probe dns lookup in seconds
# TYPE probe_dns_lookup_time_seconds gauge
probe_dns_lookup_time_seconds 0.002059111
# HELP probe_duration_seconds Returns how long the probe took to complete in seconds
# TYPE probe_duration_seconds gauge
probe_duration_seconds 0.002815779
# HELP probe_failed_due_to_regex Indicates if probe failed due to regex
# TYPE probe_failed_due_to_regex gauge
probe_failed_due_to_regex 0
# HELP probe_ip_protocol Specifies whether probe ip protocol is IP4 or IP6
# TYPE probe_ip_protocol gauge
probe_ip_protocol 4
# HELP probe_success Displays whether or not the probe was a success
# TYPE probe_success gauge
probe_success 1
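The manual probe above maps directly onto a Prometheus scrape job: the real target goes into the target URL parameter while the scrape address is rewritten to the exporter itself. A sketch driven by the prometheus.io/tcp-probe annotations shown on the grafana Service (job name and target format are illustrative, adjust to your conventions):

```yaml
- job_name: 'kubernetes-service-tcp-probe'
  metrics_path: /probe
  params:
    module: [tcp_connect]
  kubernetes_sd_configs:
  - role: service
  relabel_configs:
  # Probe only Services annotated prometheus.io/tcp-probe: "true"
  - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_tcp_probe]
    action: keep
    regex: true
  # Build <service>.<namespace>:<port> as the probe target
  - source_labels: [__meta_kubernetes_service_name, __meta_kubernetes_namespace, __meta_kubernetes_service_annotation_prometheus_io_tcp_probe_port]
    regex: (.+);(.+);(.+)
    replacement: $1.$2:$3
    target_label: __param_target
  # Send the actual scrape to the blackbox-exporter Service
  - target_label: __address__
    replacement: blackbox-exporter.kube-system:9115
  - source_labels: [__param_target]
    target_label: instance
```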
At this point, all the exporters needed to monitor the Kubernetes cluster are configured; the next step is to deploy Prometheus to scrape the metrics these exporters expose.