(I) Probe Overview

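A probe is a periodic diagnostic that the kubelet performs on a container. Probing is not initiated by the Master node: the kubelet on each Node probes the containers running on that Node, which keeps this load off the Master node.
To perform a diagnostic, the kubelet calls a Handler implemented by the container. There are three types of handlers: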

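  • ExecAction: executes a specified command inside the container. The diagnostic is considered successful if the command exits with status code 0; any non-zero exit code counts as a failure.
  • TCPSocketAction: performs a TCP check against a specified port on the container's IP address. The diagnostic is considered successful if the port is open.
  • HTTPGetAction: performs an HTTP GET request against a specified port and path on the container's IP address. The diagnostic is considered successful if the response status code is greater than or equal to 200 and less than 400.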
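As a rough sketch, this is how the three handler types appear in a container spec. The Pod name handler-demo is a hypothetical placeholder; each probe uses exactly one handler, and startupProbe is explained in Section (III).

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: handler-demo                 # hypothetical name
spec:
  containers:
  - name: web
    image: nginx
    # ExecAction: success if the command exits with status 0
    livenessProbe:
      exec:
        command: ["test", "-e", "/usr/share/nginx/html/index.html"]
    # HTTPGetAction: success if the status code is >= 200 and < 400
    readinessProbe:
      httpGet:
        path: /
        port: 80
    # TCPSocketAction: success if a TCP connection to the port can be opened
    startupProbe:
      tcpSocket:
        port: 80
```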

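Each probe produces one of three results:

Success: the container passed the diagnostic.
Failure: the container failed the diagnostic.
Unknown: the diagnostic itself failed, so no action is taken.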

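Kubernetes provides three kinds of probes: livenessProbe, readinessProbe and startupProbe. This section covers the first two; startupProbe is described in Section (III).

  • livenessProbe: a liveness probe runs periodically for the entire lifetime of the Pod and checks whether the container is still running. If the liveness probe fails, the kubelet kills the container and then handles it according to the Pod's restartPolicy. With the default policy, Always, the container is restarted in place (the Pod itself is not recreated; its RESTARTS count increases).
  • readinessProbe: a readiness probe checks whether the Pod is ready to serve traffic. A Pod is considered ready only when all of its containers are ready. Use case: readiness probes are typically used for Pods behind a Service; while a Pod is not ready, it is removed from the Service's load-balancing endpoints.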
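The following sketch wires both probe types into a Pod and places it behind a Service. The names readiness-demo, readiness-demo-svc and the label app: readiness-demo are hypothetical. While the readiness probe fails, the Pod's IP is left out of the Service's endpoints; a liveness failure instead makes the kubelet restart the container according to restartPolicy.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: readiness-demo               # hypothetical name
  labels:
    app: readiness-demo              # label selected by the Service below
spec:
  restartPolicy: Always              # the default; a failed liveness probe restarts the container
  containers:
  - name: web
    image: nginx
    ports:
    - name: http
      containerPort: 80
    livenessProbe:                   # failure -> kubelet kills and restarts the container
      httpGet:
        path: /index.html
        port: http
    readinessProbe:                  # failure -> Pod removed from the Service endpoints
      httpGet:
        path: /index.html
        port: http
---
apiVersion: v1
kind: Service
metadata:
  name: readiness-demo-svc           # hypothetical name
spec:
  selector:
    app: readiness-demo
  ports:
  - port: 80
    targetPort: http
```

`kubectl get endpoints readiness-demo-svc` should list the Pod's IP only while the Pod shows READY 1/1.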

(II) Probes in Practice

2.1 Readiness probe - HTTP GET

(1) Write the YAML manifest and create the Pod

[root@k8s-master probe]# kubectl apply -f  readiness-probe.yml
pod/readiness-httpget created
[root@k8s-master probe]# cat readiness-probe.yml
apiVersion: v1
kind: Pod
metadata:
  name: readiness-httpget
  namespace: test
spec:
  containers:
  - name: readiness-httpget-container
    image: nginx
    imagePullPolicy: IfNotPresent
    readinessProbe:
      httpGet:
        path: /index1.html
        port: 80
      initialDelaySeconds: 3
      periodSeconds: 3
[root@k8s-master probe]# kubectl get pod -ntest
NAME                READY   STATUS    RESTARTS   AGE
readiness-httpget   1/1     Running   0          51s

(2) Check the Pod status and details

[root@k8s-master probe]# kubectl get pod -ntest -owide
NAME                READY   STATUS    RESTARTS   AGE     IP            NODE         NOMINATED NODE   READINESS GATES
readiness-httpget   0/1     Running   0          4m24s   10.244.1.91   k8s-node01   <none>           <none>
[root@k8s-master probe]# curl 10.244.1.91/index1.html
<html>
<head><title>404 Not Found</title></head>
<body>
<center>404 Not Found</center>
<hr><center>nginx/1.21.4</center>
</body>
[root@k8s-master probe]# kubectl describe pod/readiness-httpget -ntest
Name:         readiness-httpget
Namespace:    test
Priority:     0
Node:         k8s-node01/192.168.41.211
Start Time:   Thu, 20 Jan 2022 03:18:27 -0500
Labels:       <none>
Annotations:  <none>
Status:       Running
IP:           10.244.1.91
IPs:
  IP:  10.244.1.91
Containers:
  readiness-httpget-container:
    Container ID:   docker://12fe35cb83a09da564159e6c7916fc3625930f5e7edeace693a5682bcded9a59
    Image:          nginx
    Image ID:       docker-pullable://nginx@sha256:366e9f1ddebdb844044c2fafd13b75271a9f620819370f8971220c2b330a9254
    Port:           <none>
    Host Port:      <none>
    State:          Running
      Started:      Thu, 20 Jan 2022 03:18:28 -0500
    Ready:          False
    Restart Count:  0
    Readiness:      http-get http://:80/index1.html delay=3s timeout=1s period=3s #success=1 #failure=3
    Environment:    <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-8r24v (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  kube-api-access-8r24v:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason     Age                    From               Message
  ----     ------     ----                   ----               -------
  Normal   Scheduled  5m41s                  default-scheduler  Successfully assigned test/readiness-httpget to k8s-node01
  Normal   Pulled     5m41s                  kubelet            Container image "nginx" already present on machine
  Normal   Created    5m41s                  kubelet            Created container readiness-httpget-container
  Normal   Started    5m40s                  kubelet            Started container readiness-httpget-container
  Warning  Unhealthy  38s (x104 over 5m35s)  kubelet            Readiness probe failed: HTTP probe failed with statuscode: 404

(3) Exec into the Pod and create the index1.html page

[root@k8s-master probe]# kubectl exec -it pod/readiness-httpget -n test -- /bin/sh
# cd /usr/share/nginx/html
# ls
50x.html  index.html
# echo "hello world" >> index1.html
# cat index1.html
hello world
# exit

(4) Check the Pod status and the page again

[root@k8s-master probe]# kubectl get pod -ntest -owide
NAME                READY   STATUS    RESTARTS   AGE   IP            NODE         NOMINATED NODE   READINESS GATES
readiness-httpget   1/1     Running   0          12m   10.244.1.91   k8s-node01   <none>           <none>
[root@k8s-master probe]# curl 10.244.1.91/index1.html
hello world
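The httpGet handler used in readiness-probe.yml accepts a few more optional fields than the example shows. Below is a sketch of the same readinessProbe with scheme, httpHeaders and the thresholds written out explicitly. The X-Probe header is a hypothetical example; timeoutSeconds, successThreshold and failureThreshold are set to the defaults that kubectl describe printed above (timeout=1s #success=1 #failure=3).

```yaml
readinessProbe:
  httpGet:
    path: /index1.html
    port: 80
    scheme: HTTP                 # HTTP (default) or HTTPS
    httpHeaders:                 # optional extra request headers
    - name: X-Probe              # hypothetical header name
      value: readiness
  initialDelaySeconds: 3
  periodSeconds: 3
  timeoutSeconds: 1              # default (timeout=1s in describe)
  successThreshold: 1            # default (#success=1)
  failureThreshold: 3            # default (#failure=3)
```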

2.2 TCP probe (configured in this example as a readinessProbe)

(1) The YAML manifest and Pod creation

[root@k8s-master probe]# cat liveness-top.yml
apiVersion: v1
kind: Pod
metadata:
  name: liveness-tcp
  namespace: test
spec:
  containers:
  - name: liveness-tcp-container
    image: nginx
    imagePullPolicy: IfNotPresent
    readinessProbe:
      tcpSocket:
        port: 8080
      initialDelaySeconds: 3
      periodSeconds: 3
[root@k8s-master probe]# kubectl apply -f liveness-top.yml
pod/liveness-tcp created
[root@k8s-master probe]# kubectl get pod -ntest -owide
NAME                READY   STATUS    RESTARTS   AGE   IP            NODE         NOMINATED NODE   READINESS GATES
liveness-tcp        0/1     Running   0          13s   10.244.1.92   k8s-node01   <none>           <none>

(2) Check the Pod details

[root@k8s-master probe]# kubectl describe pod/liveness-tcp -ntest
Name:         liveness-tcp
Namespace:    test
Priority:     0
Node:         k8s-node01/192.168.41.211
Start Time:   Thu, 20 Jan 2022 04:05:48 -0500
Labels:       <none>
Annotations:  <none>
Status:       Running
IP:           10.244.1.92
IPs:
  IP:  10.244.1.92
Containers:
  liveness-tcp-container:
    Container ID:   docker://86e6f23e1ad66ab35158ebaa1dfbcf8a48b4afed99a52dfb8616436570592971
    Image:          nginx
    Image ID:       docker-pullable://nginx@sha256:366e9f1ddebdb844044c2fafd13b75271a9f620819370f8971220c2b330a9254
    Port:           <none>
    Host Port:      <none>
    State:          Running
      Started:      Thu, 20 Jan 2022 04:05:49 -0500
    Ready:          False
    Restart Count:  0
    Readiness:      tcp-socket :8080 delay=3s timeout=1s period=3s #success=1 #failure=3
    Environment:    <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-z2k8t (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  kube-api-access-z2k8t:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason     Age                 From               Message
  ----     ------     ----                ----               -------
  Normal   Scheduled  96s                 default-scheduler  Successfully assigned test/liveness-tcp to k8s-node01
  Normal   Pulled     96s                 kubelet            Container image "nginx" already present on machine
  Normal   Created    96s                 kubelet            Created container liveness-tcp-container
  Normal   Started    96s                 kubelet            Started container liveness-tcp-container
  Warning  Unhealthy  31s (x22 over 91s)  kubelet            Readiness probe failed: dial tcp 10.244.1.92:8080: connect: connection refused
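Note that the manifest above uses readinessProbe, so the failed TCP check only keeps the Pod at 0/1 while RESTARTS stays 0. The check fails because nginx listens on port 80, not 8080, so the connection is refused. A minimal sketch of an actual TCP liveness probe, assuming the same nginx image, is shown below. Delete the earlier liveness-tcp Pod first, because probe fields cannot be changed on an existing Pod.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: liveness-tcp
  namespace: test
spec:
  containers:
  - name: liveness-tcp-container
    image: nginx
    imagePullPolicy: IfNotPresent
    livenessProbe:                # liveness instead of readiness: failures restart the container
      tcpSocket:
        port: 80                  # nginx listens on 80; 8080 would be refused and cause repeated restarts
      initialDelaySeconds: 3
      periodSeconds: 3
```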

2.3 Liveness probe - HTTP GET

(1) liveness-httpget.yml

[root@k8s-master probe]# cat  liveness-httpget.yml
apiVersion: v1
kind: Pod
metadata:
  name: liveness-httpget
  namespace: test
spec:
  containers:
  - name: liveness-tcp-container
    image: nginx
    imagePullPolicy: IfNotPresent
    ports:
      - name: http
        containerPort: 80
    livenessProbe:
      httpGet:
        port: http
        path: /index.html
      initialDelaySeconds: 3
      periodSeconds: 3
[root@k8s-master probe]# kubectl apply -f liveness-httpget.yml
pod/liveness-httpget created
[root@k8s-master probe]# kubectl get pod -ntest
NAME                READY   STATUS    RESTARTS   AGE
liveness-httpget    1/1     Running   0          3s

(2) Delete index.html inside the container and observe the result

[root@k8s-master probe]# kubectl exec -it pod/liveness-httpget -ntest -- rm -rf /usr/share/nginx/html/index.html
[root@k8s-master probe]# kubectl get pod -ntest -owide
NAME                READY   STATUS    RESTARTS      AGE    IP            NODE         NOMINATED NODE   READINESS GATES
liveness-httpget    1/1     Running   1 (14s ago)   6m2s   10.244.1.93   k8s-node01   <none>           <none>

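The restart count is now 1. Because the liveness probe could no longer fetch index.html (HTTP 404), the kubelet killed the container and, following the default restartPolicy (Always), started a new one. The Pod itself was not recreated; only its container was restarted. The new container starts from the image again, so index.html is back and the liveness probe passes.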
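How quickly that restart happens follows from the probe fields. The sketch below is the livenessProbe from liveness-httpget.yml with the default values written out (the same defaults kubectl describe printed for the earlier probes). With periodSeconds: 3 and failureThreshold: 3, the kubelet needs three consecutive failed checks, i.e. about 6 to 9 seconds after index.html disappears, before it kills the container.

```yaml
livenessProbe:
  httpGet:
    port: http              # the named containerPort 80
    path: /index.html
  initialDelaySeconds: 3    # first check 3 s after the container starts
  periodSeconds: 3          # one check every 3 s
  timeoutSeconds: 1         # default: a check slower than 1 s counts as a failure
  successThreshold: 1       # default; must be 1 for liveness probes
  failureThreshold: 3       # default: 3 consecutive failures before a restart
```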

2.4 Liveness probe - exec

(1) liveness-exec.yml and Pod creation

[root@k8s-master probe]# cat  liveness-exec.yml
apiVersion: v1
kind: Pod
metadata:
  name: liveness-exec
  namespace: test
spec:
  containers:
  - name: liveness-exec-container
    image: nginx
    imagePullPolicy: IfNotPresent
    command: ["/bin/sh","-c","touch /tmp/live; sleep 60; rm -rf /tmp/live; sleep 3600"]
    livenessProbe:
      exec:
        command: ["test","-e","/tmp/live"]
      initialDelaySeconds: 3
      periodSeconds: 3
[root@k8s-master probe]# kubectl apply -f liveness-exec.yml
pod/liveness-exec created

(2) Watch the Pod status in real time with -w

[root@k8s-master probe]# kubectl get pod -ntest -w
NAME                READY   STATUS    RESTARTS      AGE
liveness-exec       1/1     Running   2 (27s ago)   112s

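RESTARTS shows that the container has already been restarted twice, and it will keep restarting in a loop: every time the container is restarted, its command creates /tmp/live again, but deletes it 60 seconds later, after which the liveness probe fails again.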
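An exec probe runs its command directly rather than through a shell, so pipes, `&&` or `$(...)` require an explicit `sh -c`, just as the container command above does. A minimal sketch of a heartbeat-style check, assuming a hypothetical application that touches /tmp/live periodically: the probe fails once the file is missing or has not been modified for a minute.

```yaml
livenessProbe:
  exec:
    command:
    - /bin/sh
    - -c
    # find prints /tmp/live only if it was modified less than 1 minute ago;
    # test -n then fails on empty output, i.e. the file is missing or stale
    - 'test -n "$(find /tmp/live -mmin -1 2>/dev/null)"'
  initialDelaySeconds: 3
  periodSeconds: 3
```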

2.5 Readiness probe + liveness probe - HTTP GET

(1) liv-read-httpget.yml and Pod creation

[root@k8s-master probe]# cat  liv-read-httpget.yml
apiVersion: v1
kind: Pod
metadata:
  name: live-read-httpget
  namespace: test
spec:
  containers:
  - name: liveness-exec-container
    image: nginx
    imagePullPolicy: IfNotPresent
    ports:
    - name: http
      containerPort: 80
    readinessProbe:
      httpGet:
        path: /index1.html
        port: 80
      initialDelaySeconds: 3
      periodSeconds: 3
    livenessProbe:
      httpGet:
        port: http
        path: /index.html
      initialDelaySeconds: 3
      periodSeconds: 3
[root@k8s-master probe]# kubectl apply -f liv-read-httpget.yml
pod/live-read-httpget created
[root@k8s-master probe]# kubectl get pod -ntest
NAME                READY   STATUS             RESTARTS      AGE
live-read-httpget   0/1     Running            0             9s

The Pod is Running but not yet Ready (0/1), because the readiness probe cannot find /index1.html, which does not exist in the nginx image.

(2) Exec into the container and create the files manually

[root@k8s-master probe]# kubectl exec -it pod/live-read-httpget -ntest  -- touch /usr/share/nginx/html/index.html
[root@k8s-master probe]# kubectl exec -it pod/live-read-httpget -ntest  -- touch /usr/share/nginx/html/index1.html
[root@k8s-master probe]# kubectl get pod -ntest
NAME                READY   STATUS    RESTARTS      AGE
live-read-httpget   1/1     Running   0             5m53s

After the files are created, the Pod becomes Ready (1/1).

(3) Test the liveness probe again by deleting index.html

[root@k8s-master probe]# kubectl exec -it pod/live-read-httpget -ntest  -- rm -rf  /usr/share/nginx/html/index.html
[root@k8s-master probe]# kubectl get pod -ntest
NAME                READY   STATUS             RESTARTS        AGE
live-read-httpget   0/1     Running            1 (41s ago)     8m56s

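READY drops back to 0/1 and the container has restarted once. Deleting index.html made the liveness probe fail, so the kubelet restarted the container. The new container starts from the image, which restores index.html (so the liveness probe passes) but does not contain the manually created index1.html, so the readiness probe fails again.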
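Because index1.html was created by hand, it is lost on every container restart and the Pod stays unready. One way around this, sketched below under the assumption that the stock nginx image is used, is to let the container recreate the readiness page each time it starts and then run nginx in the foreground. This overrides the image's entrypoint script, so treat it as an illustration rather than a recommended setup.

```yaml
containers:
- name: liveness-exec-container
  image: nginx
  imagePullPolicy: IfNotPresent
  # Recreate the readiness page every time the container starts, then run
  # nginx in the foreground; exec replaces the shell so nginx receives signals
  command: ["/bin/sh", "-c", "echo ready > /usr/share/nginx/html/index1.html && exec nginx -g 'daemon off;'"]
  ports:
  - name: http
    containerPort: 80
  # readinessProbe and livenessProbe unchanged from liv-read-httpget.yml
```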

(III) Related Configuration

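Protecting slow-starting containers with a startup probe
Some existing applications need extra time to initialize when they start. In such cases it is tricky to set liveness probe parameters without giving up the fast response to deadlocks that motivated the liveness probe in the first place. The trick is to set up a startup probe with the same command, HTTP or TCP check, and with failureThreshold * periodSeconds long enough to cover the worst-case startup time.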

startupProbe:
  httpGet:
    path: /test
    port: liveness-port
  failureThreshold: 30
  periodSeconds: 10

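The application then has at most 5 minutes (30 × 10 = 300 seconds) to finish starting up. Liveness and readiness checks are disabled until the startup probe succeeds; if it has not succeeded within that time, the kubelet kills the container, which is then handled according to the Pod's restartPolicy.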
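The fragment above refers to a port named liveness-port, which must be declared as a named containerPort. A sketch of a complete container that shares one endpoint between the startup and liveness probes, in the spirit of the upstream Kubernetes documentation; the Pod name, image and port number are hypothetical, and /test stands for whatever health endpoint the application exposes.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: slow-start-demo              # hypothetical name
spec:
  containers:
  - name: app
    image: example/slow-app:1.0      # hypothetical slow-starting image
    ports:
    - name: liveness-port            # the named port both probes refer to
      containerPort: 8080            # hypothetical application port
    livenessProbe:                   # only runs after the startup probe has succeeded
      httpGet:
        path: /test
        port: liveness-port
      failureThreshold: 1
      periodSeconds: 10
    startupProbe:                    # up to 30 x 10 s = 300 s to finish starting
      httpGet:
        path: /test
        port: liveness-port
      failureThreshold: 30
      periodSeconds: 10
```

Once started, the liveness probe reacts after a single failed check (failureThreshold: 1), while the slow start is tolerated only by the startup probe.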

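There are more configuration fields you can use to precisely control the behavior of liveness and readiness checks:

- initialDelaySeconds: number of seconds after the container has started before liveness and readiness probes are initiated. Defaults to 0 seconds; minimum value is 0.
- periodSeconds: how often (in seconds) to perform the probe. Defaults to 10 seconds; minimum value is 1.
- timeoutSeconds: number of seconds after which the probe times out. Defaults to 1 second; minimum value is 1.
- successThreshold: minimum number of consecutive successes for the probe to be considered successful after having failed. Defaults to 1. Must be 1 for liveness and startup probes. Minimum value is 1.
- failureThreshold: after a probe fails failureThreshold times in a row, Kubernetes considers the overall check to have failed. For a liveness probe, giving up means restarting the container; for a readiness probe, the Pod is marked as not ready. Defaults to 3; minimum value is 1.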
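A sketch of a readinessProbe that sets every field listed above. The values are illustrative, not recommendations. With these settings a Pod is marked not ready roughly 10 to 15 seconds after its checks start failing (3 consecutive failures, 5 seconds apart), and needs 2 consecutive successes to become ready again, which is allowed because successThreshold may exceed 1 only for readiness probes.

```yaml
readinessProbe:
  httpGet:
    path: /index1.html
    port: 80
  initialDelaySeconds: 5   # wait 5 s after container start before the first check
  periodSeconds: 5         # check every 5 s
  timeoutSeconds: 2        # a check taking longer than 2 s counts as a failure
  successThreshold: 2      # 2 consecutive successes to become Ready again
  failureThreshold: 3      # 3 consecutive failures to be marked not ready
```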