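A Pod uses two kinds of probes to check the health of its containers: LivenessProbe (liveness probe) and ReadinessProbe (readiness probe).
- LivenessProbe (liveness probe)
Determines whether a container is healthy (in the Running state) and reports the result to the kubelet.
Many applications gradually become unusable after running for a long time and can only be recovered by a restart. The Kubernetes liveness probe mechanism detects this kind of problem, and the kubelet then acts on the probe result according to the restart policy.
Liveness probing is configured at the container level; the kubelet uses it to decide when a container needs to be restarted.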
If a container does not define a LivenessProbe, the kubelet treats that container's liveness result as always Success.
- ReadinessProbe (readiness probe)
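Determines whether the service in a container is available (the Ready state). Only Pods that have reached the Ready state can receive requests.
For Pods managed by a Service, the association between the Service and the Pod Endpoint is also set according to whether the Pod is Ready.
If the Ready state turns False while the Pod is running, the system automatically removes the Pod from the Service's backend Endpoint list, and adds it back once it returns to the Ready state.
This ensures that clients accessing the Service are not forwarded to a Pod instance whose service is unavailable.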
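The readiness behaviour described above can be sketched with a manifest like the following. This is a minimal illustration, not taken from the original examples: the Pod name, the `app: web` label, and the choice of `/` as the readiness path are all assumptions.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: pod-with-readiness   # hypothetical name
  labels:
    app: web                 # hypothetical label a Service selector could match
spec:
  containers:
  - name: nginx
    image: nginx
    ports:
    - containerPort: 80
    readinessProbe:
      httpGet:
        path: /              # assumes the default nginx page is a good readiness signal
        port: 80
      initialDelaySeconds: 5
      timeoutSeconds: 1
```

Until the probe succeeds, `kubectl get pods` reports the Pod as 0/1 in the READY column. A failing readiness probe does not restart the container; it only keeps the Pod out of the Service's Endpoint list.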
Probe implementation methods
Both LivenessProbe and ReadinessProbe can be configured with any of the following three probe implementations:
- ExecAction
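Runs a user-defined command inside the target container to judge its health: if the command exits with code 0, the container is considered healthy.
In the example below, the command "cat /tmp/health" is used to check whether the container is running normally.
After the Pod starts, the container creates the /tmp/health file and deletes it 10s later.
The initial delay of the LivenessProbe health check (initialDelaySeconds) is 15s, so the file is already gone when probing begins. The probe result is Fail, which causes the kubelet to kill the container and restart it: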
vim liveness-exec.yaml
apiVersion: v1
kind: Pod
metadata:
  labels:
    test: liveness
  name: liveness-exec
spec:
  containers:
  - name: liveness
    image: busybox
    args: ["/bin/sh","-c","echo ok > /tmp/health; sleep 10; rm -rf /tmp/health; sleep 600"]
    livenessProbe:
      exec:
        command: ["cat","/tmp/health"]
      initialDelaySeconds: 15
      timeoutSeconds: 1
Create the Pod:
[root@bogon ~]# kubectl create -f liveness-exec.yaml
pod/liveness-exec created
[root@bogon ~]# kubectl get pods
NAME READY STATUS RESTARTS AGE
dapi-test-pod 0/1 Completed 0 8h
dapi-test-pod-container-vars 1/1 Running 0 7h8m
dapi-test-pod-volume 1/1 Running 0 4h49m
liveness-exec 1/1 Running 0 7s
View the details:
kubectl describe pods/liveness-exec
The events show that the container is being restarted:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled <unknown> default-scheduler Successfully assigned default/liveness-exec to server01
Normal Pulled 8s (x3 over 2m31s) kubelet, server01 Successfully pulled image "busybox"
Normal Created 8s (x3 over 2m31s) kubelet, server01 Created container liveness
Normal Started 8s (x3 over 2m31s) kubelet, server01 Started container liveness
Warning Unhealthy <invalid> (x9 over 2m11s) kubelet, server01 Liveness probe failed: cat: can't open '/tmp/health': No such file or directory
Normal Killing <invalid> (x3 over 112s) kubelet, server01 Container liveness failed liveness probe, will be restarted
Normal Pulling <invalid> (x4 over 2m36s) kubelet, server01 Pulling image "busybox"
- TCPSocketAction
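Performs a TCP check against the container's IP address and port: if a TCP connection can be established, the container is considered healthy.
In the example below, the health check is done by opening a TCP connection to port 80 of the container: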
vim pod-with-healthcheck.yaml
apiVersion: v1
kind: Pod
metadata:
  name: pod-with-healthcheck
spec:
  containers:
  - name: nginx
    image: nginx
    ports:
    - containerPort: 80
    livenessProbe:
      tcpSocket:
        port: 80
      initialDelaySeconds: 30
      timeoutSeconds: 1
Create the Pod:
[root@bogon ~]# kubectl create -f pod-with-healthcheck.yaml
[root@bogon ~]# kubectl get pods
NAME READY STATUS RESTARTS AGE
pod-with-healthcheck 1/1 Running 0 45s
View the details:
[root@bogon ~]# kubectl describe pod pod-with-healthcheck
Containers:
nginx:
Container ID: docker://19f987e8146d926fd28e04d1fa9677d60cb80f20e42a36fc20fba346412113c5
Image: nginx
Image ID: docker-pullable://nginx@sha256:a93c8a0b0974c967aebe868a186e5c205f4d3bcb5423a56559f2f9599074bbcd
Port: 80/TCP
Host Port: 0/TCP
State: Running
Started: Sat, 27 Jun 2020 21:11:23 +0800
Ready: True
Restart Count: 0
Liveness: tcp-socket :80 delay=30s timeout=1s period=10s #success=1 #failure=3
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-btm7g (ro)
- HTTPGetAction
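Issues an HTTP GET request against the container's IP address, port, and path: if the response status code is greater than or equal to 200 and less than 400, the container is considered healthy.
In the example below, the kubelet periodically sends an HTTP request to the path /_status/healthz on port 80 of the container to check the health of the application: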
cat pod-http-get-action.yaml
apiVersion: v1
kind: Pod
metadata:
  name: pod-http-get-action
spec:
  containers:
  - name: nginx
    image: nginx
    ports:
    - containerPort: 80
    livenessProbe:
      httpGet:
        path: /_status/healthz
        port: 80
      initialDelaySeconds: 30
      timeoutSeconds: 1
View the details:
kubectl describe pod pod-http-get-action
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled <unknown> default-scheduler Successfully assigned default/pod-http-get-action to server01
Normal Pulling <invalid> (x2 over 30s) kubelet, server01 Pulling image "nginx"
Normal Killing <invalid> kubelet, server01 Container nginx failed liveness probe, will be restarted
Normal Pulled <invalid> (x2 over 29s) kubelet, server01 Successfully pulled image "nginx"
Normal Created <invalid> (x2 over 29s) kubelet, server01 Created container nginx
Normal Started <invalid> (x2 over 28s) kubelet, server01 Started container nginx
Warning Unhealthy <invalid> (x4 over <invalid>) kubelet, server01 Liveness probe failed: HTTP probe failed with statuscode: 404
The Pod has already been restarted 3 times:
[root@bogon ~]# kubectl get pods
NAME READY STATUS RESTARTS AGE
pod-http-get-action 1/1 Running 3 3m28s
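The 404 in the events is expected here: the stock nginx image serves nothing at /_status/healthz, so every probe fails and the container keeps being restarted. As a rough sketch, assuming the default nginx welcome page at / is an acceptable health signal (that choice and the Pod name are assumptions, not part of the original example), pointing the probe at / keeps the status code inside the healthy 200 to 399 range:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: pod-http-get-action-ok   # hypothetical name
spec:
  containers:
  - name: nginx
    image: nginx
    ports:
    - containerPort: 80
    livenessProbe:
      httpGet:
        path: /                  # assumed health path; nginx answers 200 here by default
        port: 80
      initialDelaySeconds: 30
      timeoutSeconds: 1
```

In a real application the probe path would normally be a dedicated health endpoint exposed by the application itself.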
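Every probe type supports the initialDelaySeconds and timeoutSeconds parameters, which all of the examples above set explicitly. Their meanings are:
- initialDelaySeconds: how long to wait after the container starts before the first health check, in seconds.
- timeoutSeconds: how long to wait for a response after a health-check request is sent, in seconds.
When a liveness probe times out, that attempt counts as a failure. Once consecutive failures reach the failure threshold (shown as #failure=3 in the describe output above), the kubelet considers the container unable to serve and restarts it.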
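The describe output of the TCP example also reports period=10s, #success=1 and #failure=3. These correspond to the periodSeconds, successThreshold and failureThreshold fields of a probe. The sketch below writes them out explicitly next to the two parameters described above; the Pod name is hypothetical and the values simply mirror what that output shows.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: pod-with-probe-timing    # hypothetical name
spec:
  containers:
  - name: nginx
    image: nginx
    ports:
    - containerPort: 80
    livenessProbe:
      tcpSocket:
        port: 80
      initialDelaySeconds: 30    # wait 30s after the container starts
      timeoutSeconds: 1          # each attempt may take at most 1s
      periodSeconds: 10          # probe every 10s
      successThreshold: 1        # must be 1 for a liveness probe
      failureThreshold: 3        # restart after 3 consecutive failures
```

With these values, a container that stops responding is restarted roughly 30s after the first failed probe, since three consecutive failures are required at a 10s interval.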