Pod resource object


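SupperXue, 2022-11-11 14:57:57. Blog category: kubernetes

Tags: Pod resource manifest. Category: operations

© Copyright belongs to the author: this is an original work by SupperXue on the 51CTO blog. Please contact the author for permission before reposting; otherwise legal liability will be pursued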

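When we create a Pod, we may do so from a YAML file; the content of that file is called the resource manifest

The structure of a Pod is as follows

[Figure: Pod structure]

Every Pod can contain one or more containers, which fall into two categories

User containers, which run the user's programs; there can be any number of them
The pause container: a root container that every Pod has. It serves two purposes

It is the basis for assessing the health of the whole Pod
An IP address can be set on the root container, and all other containers use this IP (the Pod IP) for network communication inside the Pod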
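The shared Pod IP can be seen with a small two-container manifest. This is a minimal sketch: the Pod name, container names and images are assumptions chosen for illustration, not taken from the original article.

```yaml
# Hypothetical example: all names and images are assumptions.
apiVersion: v1
kind: Pod
metadata:
  name: shared-network-demo
spec:
  containers:
  - name: web
    image: nginx
    ports:
    - containerPort: 80
  - name: probe
    image: busybox
    # Reaches the web container over localhost because both containers
    # share the network namespace held by the pause container.
    command: ["sh", "-c", "sleep 5; wget -qO- http://localhost:80; sleep 3600"]
```

If the request succeeds, kubectl logs shared-network-demo -c probe should show the nginx welcome page, fetched without leaving the Pod.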

1. Pod definition YAML

Below is the resource manifest of a Pod:

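apiVersion: v1            # required; API version, e.g. v1
kind: Pod                 # required; resource type, e.g. Pod
metadata:                 # required; metadata
  name: string            # required; Pod name
  namespace: string       # namespace the Pod belongs to; defaults to "default"
  labels:                 # custom labels (a map of key: value pairs)
    name: string
spec:                     # required; detailed definition of the Pod
  containers:             # required; list of containers in the Pod
  - name: string          # required; container name
    image: string         # required; container image name
    imagePullPolicy: [ Always|Never|IfNotPresent ]  # image pull policy
    command: [string]     # start command list; if omitted, the start command baked into the image is used
    args: [string]        # arguments to the start command
    workingDir: string    # working directory of the container
    volumeMounts:         # volumes mounted into the container
    - name: string        # name of a shared volume defined in the volumes[] section
      mountPath: string   # absolute mount path inside the container; should be shorter than 512 characters
      readOnly: boolean   # whether to mount read-only
    ports:                # list of ports to expose
    - name: string        # port name
      containerPort: int  # port the container listens on
      hostPort: int       # port on the host that maps to containerPort (optional)
      protocol: string    # port protocol, TCP or UDP; default TCP
    env:                  # environment variables set before the container runs
    - name: string        # variable name
      value: string       # variable value
    resources:            # resource limits and requests
      limits:             # resource limits
        cpu: string       # CPU limit, in cores
        memory: string    # memory limit, in units such as Mi or Gi
      requests:           # resource requests
        cpu: string       # CPU request: the amount reserved for the container at scheduling time
        memory: string    # memory request: the amount reserved for the container at scheduling time
    lifecycle:            # lifecycle hooks
      postStart:          # runs right after the container starts; on failure the container is restarted per the restart policy
      preStop:            # runs before the container terminates; the container terminates whatever the result
    livenessProbe:        # health check for the container; after several failed probes the container is restarted
      exec:               # probe by running a command inside the container
        command: [string] # command or script to run
      httpGet:            # probe with an HTTP GET; path and port are required
        path: string
        port: number
        host: string
        scheme: string
        httpHeaders:
        - name: string
          value: string
      tcpSocket:          # probe by opening a TCP connection
        port: number
      initialDelaySeconds: 0   # delay before the first probe after the container starts, in seconds
      timeoutSeconds: 0        # timeout waiting for a probe response, in seconds; default 1
      periodSeconds: 0         # interval between probes, in seconds; default 10
      successThreshold: 0
      failureThreshold: 0
    securityContext:
      privileged: false
  restartPolicy: [Always | Never | OnFailure]  # Pod restart policy
  nodeName: <string>      # schedules the Pod onto the node with this name
  nodeSelector: object    # schedules the Pod onto a node carrying these labels
  imagePullSecrets:       # names of the Secrets used when pulling images
  - name: string
  hostNetwork: false      # whether to use the host network; default false, true means the Pod uses the host's network
  volumes:                # shared volumes defined on this Pod
  - name: string          # volume name (there are many volume types; choose one per volume)
    emptyDir: {}          # emptyDir volume: a temporary directory with the same lifetime as the Pod; empty value
    hostPath:             # hostPath volume: mounts a directory of the host the Pod runs on
      path: string        # directory on the host, to be mounted in the container
    secret:               # secret volume: mounts a predefined Secret object into the container
      secretName: string
      items:
      - key: string
        path: string
    configMap:            # configMap volume: mounts a predefined ConfigMap object into the container
      name: string
      items:
      - key: string
        path: string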

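The configurable fields of each resource can be looked up with one command: kubectl explain <resource type> shows the resource, and kubectl explain <resource type>.<attribute> shows the sub-attributes of an attribute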

# View the top-level fields of pod; the output is long and has been truncated
[root@dce-10-6-215-215 ~]# kubectl explain pod
KIND:     Pod
VERSION:  v1

DESCRIPTION:
     Pod is a collection of containers that can run on a host. This resource is
     created by clients and scheduled onto hosts.

FIELDS:
   apiVersion    <string>

# View the fields under containers in the pod spec; the output is long and has been truncated
[root@dce-10-6-215-215 ~]# kubectl explain pod.spec.containers
KIND:     Pod
VERSION:  v1

RESOURCE: containers <[]Object>

DESCRIPTION:
     List of containers belonging to the pod. Containers cannot currently be
     added or removed. There must be at least one container in a Pod. Cannot be
     updated.

     A single application container that you want to run within a pod.

FIELDS:
   args    <[]string>
     Arguments to the entrypoint. The docker image's CMD is used if this is not
     provided. Variable references $(VAR_NAME) are expanded using the

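In Kubernetes, the top-level attributes of almost all resources are the same and consist of 5 parts:

apiVersion: the version, defined internally by Kubernetes; the value must be one listed by kubectl api-versions
kind: the type, defined internally by Kubernetes; the value can be looked up with kubectl api-resources
metadata: the metadata, mainly resource identification and description; name, namespace and labels are the most common
spec: the specification, the most important part of the configuration, holding the detailed description of the resource
status: the status information; it does not need to be defined and is generated by Kubernetes automatically

Among these attributes, spec is the focus of what follows. Its common sub-attributes are:

containers <[]Object>: the container list, defining the details of each container
nodeName: schedules the Pod onto the Node with the given name
nodeSelector <map[]>: schedules the Pod onto a Node carrying the labels defined in the selector
hostNetwork: whether to use the host network; default false, true means the Pod uses the host's network
volumes <[]Object>: the storage volumes, defining the storage mounted on the Pod
restartPolicy: the restart policy, describing how the Pod is handled when a failure occurs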

2. Common Pod commands

kubectl create -f nginx-01.yaml 
kubectl apply -f nginx-01.yaml 
kubectl get pod 
kubectl get pod -l name=nginx 
kubectl delete pod nginx 
kubectl delete pod --all
kubectl get pod -o wide
kubectl edit pod nginx
kubectl get pod nginx -o yaml
kubectl delete -f nginx-01.yaml
kubectl label pod nginx project=web
kubectl annotate pod nginx project=web
kubectl exec -it nginx -- /bin/bash
kubectl cp default/nginx:/etc/nginx/nginx.conf ~/nginx.conf 
kubectl cp ~/aa default/nginx:/tmp
kubectl logs nginx

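3. Pod lifecycle

[Figure: Pod lifecycle]

4. Pod restart policy

• The Pod restart policy (restartPolicy) can be Always, OnFailure or Never; the default is Always

• Always: whenever a container terminates, whatever its exit code, kubelet restarts it automatically

• OnFailure: when a container terminates with a non-zero exit code, kubelet restarts it automatically

• Never: kubelet never restarts the container, whatever its state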
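As a rough illustration of the three values, the sketch below uses OnFailure with a container that always exits with code 1. The Pod and container names are assumptions for illustration only.

```yaml
# Hypothetical example: names are assumptions.
apiVersion: v1
kind: Pod
metadata:
  name: restart-onfailure-demo
spec:
  restartPolicy: OnFailure
  containers:
  - name: flaky
    image: busybox
    # Exits with code 1, so kubelet restarts the container under OnFailure.
    # With "exit 0" instead, the container would not be restarted.
    command: ["sh", "-c", "echo failing; exit 1"]
```

The RESTARTS column of kubectl get pod should grow over time for this Pod; with restartPolicy: Never it would stay at 0.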

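5. Pod health checks

ReadinessProbe (readiness probing) and LivenessProbe (liveness probing) can each be implemented in one of the following three ways:

• ExecAction: runs the specified command inside the container. An exit code of 0 means the container is healthy

• TCPSocketAction: performs a TCP check against the container's IP address on the specified port. If the connection can be established, the container is healthy.

• HTTPGetAction: performs an HTTP GET request against the container's IP address on the specified port and path. A response status code of at least 200 and below 400 means the container is healthy

The initialDelaySeconds and timeoutSeconds parameters set the wait before the first check and the timeout of each check, respectively.

periodSeconds: 15 # interval between checks

failureThreshold: 3 # number of consecutive failures after which the probe counts as failed

successThreshold: 1 # minimum number of consecutive successes for the probe to count as successful after a failure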
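To see how these parameters fit together, here is a minimal sketch of a readiness probe that sets all five of them. The Pod name and the probed path / are assumptions; the values mirror the ones quoted above.

```yaml
# Hypothetical example: the Pod name and probe path are assumptions.
apiVersion: v1
kind: Pod
metadata:
  name: probe-params-demo
spec:
  containers:
  - name: nginx
    image: nginx
    ports:
    - containerPort: 80
    readinessProbe:
      httpGet:
        path: /
        port: 80
      initialDelaySeconds: 5   # wait 5s after the container starts
      timeoutSeconds: 1        # each probe must answer within 1s
      periodSeconds: 15        # probe every 15s
      failureThreshold: 3      # 3 consecutive failures mark the container not ready
      successThreshold: 1      # 1 success marks it ready again
```

With these values, a container that stops answering is marked not ready after roughly three probe periods.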

5.1 Readiness probe

5.1.1 readiness-exec probe example:

apiVersion: v1 
kind: Pod 
metadata:
  labels: 
    test: readiness-exec 
  name: readiness-exec 
spec: 
  containers:
  - name: liveness
    image: busybox 
    args: 
    - /bin/sh 
    - -c 
    - echo ok > /tmp/health; sleep 10; rm -rf /tmp/health; sleep 600 
    readinessProbe: 
      exec: 
        command: 
        - cat 
        - /tmp/health 
      initialDelaySeconds: 15 
      timeoutSeconds: 1

5.1.2 readiness-http probe example:

apiVersion: v1 
kind: Pod 
metadata:
  labels: 
    test: readiness-http
  name: readiness-http
spec: 
  containers:
  - name: nginx
    image: nginx
    ports:
    - containerPort: 80
    readinessProbe: 
      httpGet: 
        path: /_status/healthz
        port: 80
      initialDelaySeconds: 30
      timeoutSeconds: 1

5.1.3 readiness-tcp probe example:

apiVersion: v1 
kind: Pod 
metadata:
  labels: 
    test: readiness-tcp
  name: readiness-tcp
spec: 
  containers:
  - name: nginx
    image: nginx
    ports:
    - containerPort: 80
    readinessProbe: 
      tcpSocket: 
        port: 80
      initialDelaySeconds: 30
      timeoutSeconds: 1

5.2 Liveness probe

• LivenessProbe: liveness probing

5.2.1 liveness-exec probe example

apiVersion: v1 
kind: Pod 
metadata:
  labels: 
    test: liveness-exec 
  name: liveness-exec 
spec: 
  containers:
  - name: liveness
    image: busybox 
    args: 
    - /bin/sh 
    - -c 
    - echo ok > /tmp/health; sleep 10; rm -rf /tmp/health; sleep 600 
    livenessProbe: 
      exec: 
        command: 
        - cat 
        - /tmp/health 
      initialDelaySeconds: 15 
      timeoutSeconds: 1

5.2.2 liveness-http probe example

apiVersion: v1 
kind: Pod 
metadata:
  labels: 
    test: liveness-http
  name: liveness-http
spec: 
  containers:
  - name: nginx
    image: nginx
    ports:
    - containerPort: 80
    livenessProbe: 
      httpGet: 
        path: /_status/healthz
        port: 80
      initialDelaySeconds: 30
      timeoutSeconds: 1

5.2.3 liveness-tcp probe example:

apiVersion: v1 
kind: Pod 
metadata:
  labels: 
    test: liveness-tcp
  name: liveness-tcp
spec: 
  containers:
  - name: nginx
    image: nginx
    ports:
    - containerPort: 80
    livenessProbe: 
      tcpSocket: 
        port: 80
      initialDelaySeconds: 30
      timeoutSeconds: 1

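6. imagePullPolicy

There are three options, Always, Never and IfNotPresent. They control whether the image is checked against and updated from the registry each time a container starts:

Always: check the registry every time

Never: never check the registry, whether or not the image exists locally

IfNotPresent: if the image exists locally, do not check; if it does not, pull it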
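A minimal sketch of setting the policy on a container follows. The Pod name and the pinned image tag are assumptions chosen for illustration.

```yaml
# Hypothetical example: the Pod name and image tag are assumptions.
apiVersion: v1
kind: Pod
metadata:
  name: pull-policy-demo
spec:
  containers:
  - name: nginx
    image: nginx:1.25              # pinned tag, so a cached copy stays meaningful
    imagePullPolicy: IfNotPresent  # pull only when the node has no local copy
```

IfNotPresent pairs naturally with a pinned tag; with a moving tag such as latest, a locally cached image may be stale.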

7. Resource management

7.1 Tomcat example

This example configures an emptyDir volume and resources

apiVersion: v1 
kind: Pod 
metadata: 
  name: volume-pod 
spec: 
  containers: 
  - name: tomcat 
    image: tomcat 
    ports: 
    - containerPort: 8080 
    volumeMounts: 
    - name: app-logs 
      mountPath: /usr/local/tomcat/logs 
    resources: 
      requests:
        cpu: 0.1
        memory: 100Mi
      limits: 
        cpu: 0.2
        memory: 200Mi 
  - name: busybox 
    image: busybox 
    command: ["sh", "-c", "tail -f /logs/catalina*.log"] 
    volumeMounts: 
    - name: app-logs 
      mountPath: /logs 
  volumes: 
  - name: app-logs 
    emptyDir: {}

8. Lifecycle management

• postStart: a task that runs right after the container has been created

postStart example

apiVersion: v1 
kind: Pod 
metadata: 
  name: nginx 
  labels:
    name: nginx 
spec: 
  containers: 
  - name: nginx 
    image: nginx 
    ports: 
    - containerPort: 80 
    lifecycle: 
      postStart: 
        exec: 
          command: ["/bin/sh", "-c", "echo Hello from the postStart handler > /usr/share/message"]

• preStop: a task that runs before the container is shut down

preStop example

nginx-preStop-exec.yaml example

apiVersion: v1 
kind: Pod 
metadata: 
  name: nginx 
  labels:
    name: nginx 
spec: 
  containers: 
  - name: nginx 
    image: nginx 
    ports: 
    - containerPort: 80 
    lifecycle: 
      preStop: 
        exec: 
          command: ["/usr/sbin/nginx","-s","quit"]

8.2. nginx-preStop-httpGet.yaml example

apiVersion: v1 
kind: Pod 
metadata: 
  name: nginx 
  labels:
    name: nginx 
spec: 
  containers: 
  - name: nginx 
    image: nginx 
    ports: 
    - containerPort: 80 
    lifecycle: 
      preStop: 
        httpGet:
          host: 192.168.4.170
          path: /api/v2/devops/pkg/upload_hooks
          port: 8090

9.init Container

apiVersion: v1 
kind: Pod 
metadata: 
  name: myapp-pod 
  labels: 
    app: myapp 
spec: 
  containers: 
  - name: myapp-container 
    image: busybox 
    command: ['sh', '-c', 'echo The app is running! && sleep 3600'] 
  initContainers: 
  - name: init-myservice 
    image: busybox 
    command: ['sh', '-c', 'until nslookup myservice; do echo waiting for myservice; sleep 2; done;'] 
  - name: init-mydb 
    image: busybox 
    command: ['sh', '-c', 'until nslookup mydb; do echo waiting for mydb; sleep 2; done;']

---
kind: Service 
apiVersion: v1 
metadata: 
  name: myservice 
spec: 
  ports: 
  - protocol: TCP
    port: 80 
    targetPort: 9376

--- 
kind: Service 
apiVersion: v1 
metadata: 
  name: mydb 
spec:
  ports: 
  - protocol: TCP
    port: 80 
    targetPort: 9377

10. nodeSelector

apiVersion: v1 
kind: Pod 
metadata: 
  name: nginx 
  labels: 
    name: nginx 
spec: 
  nodeSelector: 
    zone: node1 
  containers: 
  - name: nginx 
    image: nginx 
    ports: 
    - containerPort: 80 
      hostPort: 80

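11. affinity

Affinity corresponds to the pod.spec.affinity field of a Pod. It offers several combinations of settings, so a few concepts need to be understood first.

Affinity and Anti-affinity

Affinity comes in a positive form, Affinity, meaning the Pod is willing to be placed on the target node, and a negative form, Anti-affinity, meaning it is unwilling to be placed there.

NodeAffinity and PodAffinity

These are affinity policies along two different dimensions. NodeAffinity decides placement based on node labels, while PodAffinity decides, based on Pod labels, whether to share a node with other Pods.

Soft policy and hard policy

Soft-policy fields start with preferred: they are satisfied when possible, and it does not matter if they are not. Multiple soft policies each carry their own weight. Hard-policy fields start with required and must be satisfied.

Hands-on

The concepts in this section are fairly easy to grasp, so we go straight to practice. Because there are many field combinations, it is worth using kubectl explain pod.spec.affinity to look up what each field means.

Hard policy

Use the following YAML file, test-nodeaffinity-hard.yaml, to verify the hard policy of node affinity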

apiVersion: v1
kind: Pod
metadata:
  name: test-nodeaffinity-hard
  labels:
    app: app1
spec:
  containers:
    - name: mynginx
      image: mynginx:v2
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:       # hard policy
        nodeSelectorTerms:
          - matchExpressions:
              - key: kubernetes.io/hostname     # label key
                operator: In            
                values:
                  - k8s-node3        # label value

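Here matchExpressions matches against the node's labels. Its three fields are as follows

Field	Type	Description
key	string	key of the node label
operator	string	comparison operator; one of In/NotIn/Exists/DoesNotExist/Gt/Lt
values	list	a set of values, evaluated together with the operator above

Node labels can be viewed as follows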

[root@k8s-master affinity]# kubectl get node --show-labels
NAME         STATUS   ROLES    AGE   VERSION    LABELS
k8s-master   Ready    master   14d   v1.15.11   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=k8s-master,kubernetes.io/os=linux,node-role.kubernetes.io/master=
k8s-node1    Ready    <none>   14d   v1.15.11   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=k8s-node1,kubernetes.io/os=linux

I have two nodes here, and neither has the hostname k8s-node3, so under this hard policy there is no node the Pod can be scheduled onto

[root@k8s-master affinity]# kubectl apply -f test-nodeaffinity-hard.yaml
pod/test-nodeaffinity-hard created
[root@k8s-master affinity]# kubectl get pod -o wide
NAME                     READY   STATUS    RESTARTS   AGE   IP       NODE     NOMINATED NODE   READINESS GATES
test-nodeaffinity-hard   0/1     Pending   0          6s    <none>   <none>   <none>           <none>

Now modify the Pod definition

apiVersion: v1
kind: Pod
metadata:
  name: test-nodeaffinity-hard
  labels:
    app: app1
spec:
  containers:
    - name: mynginx
      image: mynginx:v2
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: kubernetes.io/hostname
                operator: In
                values:
                  - k8s-master   # this value was changed

Start it again; no matter how many times it is started, it only ever lands on the master node

[root@k8s-master affinity]# kubectl apply -f test-nodeaffinity-hard.yaml
pod/test-nodeaffinity-hard created
[root@k8s-master affinity]# kubectl get pod -o wide
NAME                     READY   STATUS    RESTARTS   AGE   IP            NODE         NOMINATED NODE   READINESS GATES
test-nodeaffinity-hard   1/1     Running   0          5s    10.244.0.19   k8s-master   <none>           <none>

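I chose the master node here because I do not have enough nodes; normally Pods are not scheduled onto master nodes

Soft policy

Modify the YAML file above slightly into the following test-nodeaffinity-soft.yaml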

apiVersion: v1
kind: Pod
metadata:
  name: test-nodeaffinity-soft
  labels:
    app: app1
spec:
  containers:
    - name: mynginx
      image: mynginx:v2
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:    # soft policy
        - weight: 1
          preference:
            matchExpressions:
              - key: kubernetes.io/hostname
                operator: In
                values:
                  - k8s-node3

The Pod still prefers to be scheduled onto node3, but when no such node exists it is scheduled successfully anyway

[root@k8s-master affinity]# kubectl apply -f test-nodeaffinity-soft.yaml
pod/test-nodeaffinity-soft created
[root@k8s-master affinity]# kubectl get pod -o wide
NAME                     READY   STATUS    RESTARTS   AGE     IP             NODE         NOMINATED NODE   READINESS GATES
test-nodeaffinity-hard   1/1     Running   0          9m40s   10.244.0.19    k8s-master   <none>           <none>
test-nodeaffinity-soft   1/1     Running   0          5s      10.244.1.133   k8s-node1    <none>           <none>

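Because of the weights, several soft policies can be configured and are chosen between according to weight. The weight can be set from 1 to 100; the larger the weight, the higher the scheduling preference

Hard and soft policies can of course be configured together, with the hard policy considered first. If the hard policy cannot be satisfied, the Pod is not scheduled even when a soft policy could be.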
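A combined configuration could look like the sketch below. It reuses the image and node names from this article, but the Pod name, the choice of kubernetes.io/os as the hard condition and the two weights are assumptions made for illustration.

```yaml
# Hypothetical example: the Pod name, hard condition and weights are assumptions.
apiVersion: v1
kind: Pod
metadata:
  name: test-nodeaffinity-mixed
spec:
  containers:
    - name: mynginx
      image: mynginx:v2
  affinity:
    nodeAffinity:
      # Hard policy: only Linux nodes are eligible at all.
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: kubernetes.io/os
                operator: In
                values:
                  - linux
      # Soft policies: among eligible nodes, prefer k8s-node1 strongly
      # and k8s-master weakly.
      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 80
          preference:
            matchExpressions:
              - key: kubernetes.io/hostname
                operator: In
                values:
                  - k8s-node1
        - weight: 20
          preference:
            matchExpressions:
              - key: kubernetes.io/hostname
                operator: In
                values:
                  - k8s-master
```

The scheduler first filters nodes with the required block, then adds the weights of every matching preferred term to each remaining node's score.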

PodAffinity in practice

Between Pods, both Affinity and Anti-Affinity come into play

Hard policy

The two existing Pods are shown below for reference

[root@k8s-master affinity]# kubectl get pod --show-labels
NAME                     READY   STATUS    RESTARTS   AGE   LABELS
test-nodeaffinity-hard   1/1     Running   0          49m   app=app1
test-nodeaffinity-soft   1/1     Running   0          39m   app=app1

Create a Pod with the following YAML file, test-podaffinity-hard.yaml

apiVersion: v1
kind: Pod
metadata:
  name: test-podaffinity-hard
  labels:
    app: app2
spec:
  containers:
    - name: mynginx
      image: mynginx:v2
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchExpressions:
              - key: app
                operator: In
                values:
                  - app1
          topologyKey: kubernetes.io/hostname

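Note that Anti-Affinity is used here, so the Pod does not want to be together with the target Pods. There is also a topologyKey field, which names a node label: the value of that label on the node where the new Pod is placed must differ from its value on the nodes hosting the matched Pods. The logic is a little convoluted: the target Pods are selected first, and the labels of the nodes hosting them then determine which node the new Pod may go to.

The Pods selected here are those matching app: app1, which both Pods above satisfy. The node hostname is then compared; those two Pods already occupy both nodes in the cluster, so there is no node the new Pod can be scheduled onto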

[root@k8s-master affinity]# kubectl apply -f test-podaffinity-hard.yaml
pod/test-podaffinity-hard created
[root@k8s-master affinity]# kubectl get pod -o wide
NAME                     READY   STATUS    RESTARTS   AGE   IP             NODE         NOMINATED NODE   READINESS GATES
test-nodeaffinity-hard   1/1     Running   0          53m   10.244.0.19    k8s-master   <none>           <none>
test-nodeaffinity-soft   1/1     Running   0          43m   10.244.1.133   k8s-node1    <none>           <none>
test-podaffinity-hard    0/1     Pending   0          8s    <none>         <none>       <none>           <none>

Change the label of one of the Pods to app: test

[root@k8s-master affinity]# kubectl edit pod test-nodeaffinity-hard
pod/test-nodeaffinity-hard edited
[root@k8s-master affinity]# kubectl get pod --show-labels -o wide
NAME                     READY   STATUS    RESTARTS   AGE   IP             NODE         NOMINATED NODE   READINESS GATES   LABELS
test-nodeaffinity-hard   1/1     Running   0          63m   10.244.0.19    k8s-master   <none>           <none>            app=test
test-nodeaffinity-soft   1/1     Running   0          54m   10.244.1.133   k8s-node1    <none>           <none>            app=app1
test-podaffinity-hard    1/1     Running   0          10m   10.244.0.20    k8s-master   <none>           <none>            app=app2

The new Pod has now been created, and it was placed on master
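The effect of topologyKey is easier to see with a coarser label than the hostname. The following is a sketch only: it assumes the nodes carry the well-known topology.kubernetes.io/zone label, which the two-node cluster in this article may not have, and the Pod name is hypothetical.

```yaml
# Hypothetical example: assumes nodes are labeled with topology.kubernetes.io/zone.
apiVersion: v1
kind: Pod
metadata:
  name: test-podaffinity-zone
  labels:
    app: app2
spec:
  containers:
    - name: mynginx
      image: mynginx:v2
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchExpressions:
              - key: app
                operator: In
                values:
                  - app1
          # All nodes sharing one zone value count as a single domain, so
          # the Pod avoids every node in any zone that hosts an app1 Pod.
          topologyKey: topology.kubernetes.io/zone
```

With kubernetes.io/hostname each node is its own domain; with a zone label the exclusion widens to every node in the same zone.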

Soft policy

Next, the YAML file test-podaffinity-soft demonstrates positive Affinity. Here the new Pod prefers to be placed on the node with the same hostname as the app: test Pod, that is, master

apiVersion: v1
kind: Pod
metadata:
  name: test-podaffinity-soft
  labels:
    app: app2
spec:
  containers:
    - name: mynginx
      image: mynginx:v2
  affinity:
    podAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 1
          podAffinityTerm:
            labelSelector:
              matchExpressions:
                - key: app
                  operator: In
                  values:
                    - test
            topologyKey: kubernetes.io/hostname

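Note that the weight here is 1. After creation, the new Pod turns out not to be on master: a soft policy is a preference, not a guarantee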

[root@k8s-master affinity]# kubectl get pod -o wide --show-labels
NAME                     READY   STATUS    RESTARTS   AGE   IP             NODE         NOMINATED NODE   READINESS GATES   LABELS
test-nodeaffinity-hard   1/1     Running   0          71m   10.244.0.19    k8s-master   <none>           <none>            app=test
test-nodeaffinity-soft   1/1     Running   0          61m   10.244.1.133   k8s-node1    <none>           <none>            app=app1
test-podaffinity-hard    1/1     Running   0          17m   10.244.0.20    k8s-master   <none>           <none>            app=app2
test-podaffinity-soft    1/1     Running   0          23s   10.244.1.134   k8s-node1    <none>           <none>            app=app2

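Summary

To summarize the key points of this section

Affinity is divided into node-based NodeAffinity and Pod-based PodAffinity, but in the end both decide placement by looking at node labels; the former inspects them directly, the latter through the Pods that match the conditions

PodAffinity is divided into positive Affinity and negative Anti-Affinity

Hard policies must be satisfied, while soft policies carry weights and are satisfied as far as possible in order of weight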

apiVersion: v1 
kind: Pod 
metadata: 
  name: with-anti-affinity
spec: 
  affinity: 
    podAffinity: 
      requiredDuringSchedulingIgnoredDuringExecution: 
      - labelSelector: 
          matchExpressions: 
          - key: security 
            operator: In 
            values: 
            - S1 
        topologyKey: "kubernetes.io/hostname" 
    podAntiAffinity: 
      preferredDuringSchedulingIgnoredDuringExecution: 
      - weight: 100 
        podAffinityTerm: 
          labelSelector: 
            matchExpressions: 
            - key: security 
              operator: In 
              values: 
              - S2 
          topologyKey: kubernetes.io/hostname
  containers: 
  - name: with-anti-affinity 
    image: nginx 

---

apiVersion: v1 
kind: Pod 
metadata: 
  name: pod-flag-s2 
  labels: 
    security: "S2" 
    app: "nginx" 
spec: 
  containers: 
  - name: nginx 
    image: nginx 

--- 


apiVersion: v1 
kind: Pod 
metadata: 
  name: pod-flag-s1 
  labels: 
    security: "S1" 
    app: "nginx" 
spec:
  containers:  
  - name: nginx 
    image: nginx

---

apiVersion: v1 
kind: Pod 
metadata: 
  name: pod-affinity 
spec:
  affinity: 
    podAffinity: 
      requiredDuringSchedulingIgnoredDuringExecution: 
      - labelSelector: 
          matchExpressions: 
          - key: security 
            operator: In 
            values: 
            - S1
        topologyKey: kubernetes.io/hostname
  containers:                # added so the manifest is valid; the image is an assumption
  - name: nginx
    image: nginx

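12. activeDeadlineSeconds

On a Job, activeDeadlineSeconds applies to the duration of the Job, no matter how many Pods are created. Once the Job reaches activeDeadlineSeconds, all of its running Pods are terminated and the Job status becomes type: Failed with reason: DeadlineExceeded. The same field also exists in the Pod spec, where it limits how long the Pod itself may stay active, as in the following manifest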

apiVersion: v1 
kind: Pod 
metadata: 
  name: nginx 
  labels: 
    name: nginx 
spec: 
  activeDeadlineSeconds: 30 
  containers: 
  - name: nginx 
    image: nginx 
    ports: 
    - containerPort: 80 
      hostPort: 80
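For comparison, the Job-level form described in the text might look like the following sketch. The Job name, the backoffLimit value and the sleep command are assumptions for illustration.

```yaml
# Hypothetical example: the name, backoffLimit and command are assumptions.
apiVersion: batch/v1
kind: Job
metadata:
  name: deadline-demo
spec:
  activeDeadlineSeconds: 30      # applies to the whole Job, across all its Pods
  backoffLimit: 4
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: worker
        image: busybox
        # Runs longer than the deadline, so the Job should end up
        # Failed with reason DeadlineExceeded.
        command: ["sh", "-c", "sleep 300"]
```

kubectl describe job deadline-demo should then show the Failed condition once the 30 seconds have passed.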


13. Pod DNS policy

1) Pod DNS policy None

apiVersion: v1 
kind: Pod 
metadata: 
  namespace: default 
  name: dns-example 
spec: 
  containers: 
  - name: test
    image: busybox 
    args: 
    - "sh" 
    - "-c" 
    - "sleep 3600" 
  dnsPolicy: "None" 
  dnsConfig: 
    nameservers: 
      - 114.114.115.115 
    searches: 
      - ns1.svc.cluster.local 
      - my.dns.search.suffix 
    options: 
    - name: ndots 
      value: "2"

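14. dnsPolicy

Default: inherits the DNS settings of the host the Pod runs on
ClusterFirst: prefers the DNS service of the Kubernetes environment and forwards names it cannot resolve to the DNS servers inherited from the host
ClusterFirstWithHostNet: same as ClusterFirst; Pods running in hostNetwork mode should set this policy explicitly
None: ignores the DNS configuration of the Kubernetes environment; DNS is configured through spec.dnsConfig
Custom DNS configuration is set through the spec.dnsConfig field, which accepts the following

nameservers: a list of DNS servers, at most 3
searches: a list of DNS search domain suffixes, at most 6
options: other optional parameters, such as ndots and timeout

1. Default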

apiVersion: v1 
kind: Pod 
metadata: 
  name: dns-example 
spec: 
  containers: 
  - name: test 
    image: busybox 
    args: 
    - "sh" 
    - "-c" 
    - "sleep 3600" 
  dnsPolicy: "Default"

2. hostNetwork with ClusterFirstWithHostNet

apiVersion: v1 
kind: Pod 
metadata: 
  name: dns-example 
spec: 
  containers: 
  - name: test 
    image: busybox 
    args: 
    - "sh" 
    - "-c" 
    - "sleep 3600" 
  dnsPolicy: "ClusterFirstWithHostNet" 
  hostNetwork: true

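15. Ephemeral Containers

Ephemeral containers are useful for interactive troubleshooting when kubectl exec is of no use because a container has crashed or the container image does not include debugging tools.

In particular, distroless images let users deploy minimal container images, reducing the attack surface and the exposure to bugs and vulnerabilities. Because distroless images contain neither a shell nor any debugging tools, it is hard to troubleshoot them with kubectl exec alone.

When using ephemeral containers, it helps to enable process namespace sharing so that processes in other containers can be seen.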
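Process namespace sharing is switched on in the Pod spec. A minimal sketch, with a hypothetical Pod name:

```yaml
# Hypothetical example: the Pod name is an assumption.
apiVersion: v1
kind: Pod
metadata:
  name: shared-pid-demo
spec:
  shareProcessNamespace: true    # containers in the Pod see each other's processes
  containers:
  - name: nginx
    image: nginx
```

An ephemeral container added to such a Pod can list the nginx processes with ps without needing a per-container target.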

Enabling the ephemeral containers feature

Enable the feature gate

1. Configure the APIServer component on the master node
[root@vms120 ~]# cat /etc/kubernetes/manifests/kube-apiserver.yaml
- --feature-gates=EphemeralContainers=true
...

2. Configure controller-manager on the master node
[root@vms120 ~]# vim /etc/kubernetes/manifests/kube-controller-manager.yaml
spec:
  containers:
  - command:
    - --feature-gates=EphemeralContainers=true            # added
...

3. Configure kube-scheduler on the master node
[root@vms120 ~]# vim /etc/kubernetes/manifests/kube-scheduler.yaml
spec:
  containers:
  - command:
    - --feature-gates=EphemeralContainers=true            # added
# Restart the service
[root@vms120 ~]# systemctl restart kubelet.service

4. Configure the kubelet arguments on all nodes
Add --feature-gates=EphemeralContainers=true
[root@vms121 kubernetes]# cat /var/lib/kubelet/kubeadm-flags.env
KUBELET_KUBEADM_ARGS="--network-plugin=cni --pod-infra-container-image=registry.aliyuncs.com/google_containers/pause:3.6 --feature-gates=EphemeralContainers=true"
# Restart the kubelet service on the node
[root@vms121 kubernetes]# systemctl daemon-reload
[root@vms121 kubernetes]# systemctl restart kubelet

Test

1. Create the Pod
[root@vms120 ~]# kubectl run ephemeral-demo --image=registry.aliyuncs.com/google_containers/pause:3.2 --restart=Never
pod/ephemeral-demo created

[root@vms120 ~]# kubectl  exec -it ephemeral-demo  -- sh
OCI runtime exec failed: exec failed: unable to start container process: exec: "sh": executable file not found in $PATH: unknown
command terminated with exit code 126
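# kubectl exec does not work

To get around the failed exec, we create an ephemeral container and add it to this Pod

With the -i flag you are dropped straight into the console of the added ephemeral container. Because the Pod was created with kubectl run, the --target parameter is needed to target the process namespace of another container, since kubectl run does not enable process namespace sharing in the Pods it creates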

[root@vms120 ~]# kubectl debug -it ephemeral-demo --image=busybox --target=ephemeral-demo
Targeting container "ephemeral-demo". If you don't see processes from this container it may be because the container runtime doesn't support this feature.
Defaulting debug container name to debugger-bljnj.
If you don't see a command prompt, try pressing enter.
/ # ls
bin   dev   etc   home  proc  root  sys   tmp   usr   var
/ #

Looking at the Pod now, you will find that containers of type ephemeralContainers have been added

[root@vms120 ~]# kubectl  get pod ephemeral-demo   -o json|jq .spec
{
  "containers": [
    {
      "image": "registry.aliyuncs.com/google_containers/pause:3.2",
      "imagePullPolicy": "IfNotPresent",
      "name": "ephemeral-demo",
      "resources": {},
      "terminationMessagePath": "/dev/termination-log",
      "terminationMessagePolicy": "File",
      "volumeMounts": [
        {
          "mountPath": "/var/run/secrets/kubernetes.io/serviceaccount",
          "name": "kube-api-access-sqmzl",
          "readOnly": true
        }
      ]
    }
  ],
  "dnsPolicy": "ClusterFirst",
  "enableServiceLinks": true,
  "ephemeralContainers": [
    {
      "image": "busbox",
      "imagePullPolicy": "Always",
      "name": "debugger-9l8mw",
      "resources": {},
      "stdin": true,
      "targetContainerName": "ephemeral-demo",
      "terminationMessagePath": "/dev/termination-log",
      "terminationMessagePolicy": "File",
      "tty": true
    },
    {
      "image": "busybox",
      "imagePullPolicy": "Always",
      "name": "debugger-slx6g",
      "resources": {},
      "stdin": true,
      "targetContainerName": "ephemeral-demo",
      "terminationMessagePath": "/dev/termination-log",
      "terminationMessagePolicy": "File",
      "tty": true
    },
    {
      "image": "busybox",
      "imagePullPolicy": "Always",
      "name": "debugger-gw6zt",
      "resources": {},
      "stdin": true,
      "terminationMessagePath": "/dev/termination-log",
      "terminationMessagePolicy": "File"
    },
    {
      "image": "busybox",
      "imagePullPolicy": "Always",
      "name": "debugger-cxc8b",
      "resources": {},
      "stdin": true,
      "targetContainerName": "ephemeral-demo",
      "terminationMessagePath": "/dev/termination-log",
      "terminationMessagePolicy": "File",
      "tty": true
    },
    {
      "image": "busybox",
      "imagePullPolicy": "Always",
      "name": "debugger-bljnj",
      "resources": {},
      "stdin": true,
      "targetContainerName": "ephemeral-demo",
      "terminationMessagePath": "/dev/termination-log",
      "terminationMessagePolicy": "File",
      "tty": true
    }
  ],
  "nodeName": "vms121.rhce.cc",
  "preemptionPolicy": "PreemptLowerPriority",
  "priority": 0,
  "restartPolicy": "Never",
  "schedulerName": "default-scheduler",
  "securityContext": {},
  "serviceAccount": "default",
  "serviceAccountName": "default",
  "terminationGracePeriodSeconds": 30,
  "tolerations": [
    {
      "effect": "NoExecute",
      "key": "node.kubernetes.io/not-ready",
      "operator": "Exists",
      "tolerationSeconds": 300
    },
    {
      "effect": "NoExecute",
      "key": "node.kubernetes.io/unreachable",
      "operator": "Exists",
      "tolerationSeconds": 300
    }
  ],
  "volumes": [
    {
      "name": "kube-api-access-sqmzl",
      "projected": {
        "defaultMode": 420,
        "sources": [
          {
            "serviceAccountToken": {
              "expirationSeconds": 3607,
              "path": "token"
            }
          },
          {
            "configMap": {
              "items": [
                {
                  "key": "ca.crt",
                  "path": "ca.crt"
                }
              ],
              "name": "kube-root-ca.crt"
            }
          },
          {
            "downwardAPI": {
              "items": [
                {
                  "fieldRef": {
                    "apiVersion": "v1",
                    "fieldPath": "metadata.namespace"
                  },
                  "path": "namespace"
                }
              ]
            }
          }
        ]
      }
    }
  ]
}

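Sometimes the configuration of a Pod makes troubleshooting difficult. For example, when the container image does not include a shell, or when your application crashes on startup, you cannot troubleshoot the container by running kubectl exec. In these situations you can use kubectl debug to create a copy of the Pod with changed configuration to help with debugging

Error

error: ephemeral containers are disabled for this cluster (error from server: "the server could not find the requested resource").

This means the ephemeralContainers feature was not enabled successfully

JSON format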

{
    "apiVersion": "v1",
    "kind": "EphemeralContainers",
    "metadata": {
        "name": "nginx"
    },
    "ephemeralContainers": [
        {
            "command": [
                "bash"
            ],
            "image": "shoganator/rpi-alpine-tools",
            "imagePullPolicy": "Always",
            "name": "diagtools",
            "stdin": true,
            "tty": true,
            "terminationMessagePolicy": "File"
        }
    ]
}

kubectl -n default replace --raw /api/v1/namespaces/default/pods/nginx/ephemeralcontainers -f ./ephemeral.json

16. Configuring hosts entries with spec.hostAliases

apiVersion: v1 
kind: Pod 
metadata:
  name: hostaliases-pod
spec: 
  restartPolicy: Never 
  hostAliases: 
  - ip: "127.0.0.1" 
    hostnames: 
    - "foo.local" 
    - "bar.local" 
  - ip: "10.1.2.3" 
    hostnames: 
    - "foo.remote" 
    - "bar.remote" 
  containers:
  - name: cat-hosts 
    image: nginx 
    command: 
    - cat 
    args: 
    - "/etc/hosts"

17. spec.hostname

Fixes the hostname of the Pod

apiVersion: v1
kind: Pod
metadata:
  name: busybox2
  labels:
    name: busybox
spec:
  hostname: busybox-2
  containers:
  - image: busybox
    command:
      - sleep
      - "3600"
    name: busybox

As this shows, when a Pod is created on its own, the Pod name comes from the metadata.name parameter and the Pod's hostname comes from the spec.hostname parameter.

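18. spec.nodeName

spec.nodeName forces the Pod onto the specified Node. Although this is described as "scheduling", a Pod that sets nodeName actually skips the Scheduler's scheduling logic and is written straight into the PodList; the match is mandatory.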

apiVersion: v1 
kind: Pod 
metadata: 
  name: nodename-pod 
spec: 
  restartPolicy: Never 
  nodeName: k8s-master       # in my tests an IP address did not work here
  containers: 
  - name: cat-hosts 
    image: nginx

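19. preemptionPolicy (preemption mechanism)

The priority and preemption mechanisms address what to do when scheduling a Pod fails

Normally, when scheduling a Pod fails, the Pod is temporarily set aside until it is updated or the cluster state changes, at which point the scheduler tries to schedule it again

Scenario with special requirements:

When scheduling a high-priority Pod fails, the Pod is not set aside; instead it evicts some lower-priority Pods from a Node, which ensures that the high-priority Pod gets scheduled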

apiVersion: v1 
kind: Pod 
metadata: 
  name: preemption-pod
spec: 
  restartPolicy: Never 
  preemptionPolicy: PreemptLowerPriority
  containers: 
  - name: cat-hosts 
    image: nginx

20. priority (priority mechanism)

apiVersion: v1
kind: Pod 
metadata: 
  name: priority-pod 
spec: 
  restartPolicy: Never 
  preemptionPolicy: PreemptLowerPriority 
  priority: 1000 
  containers: 
  - name: cat-hosts 
    image: nginx

21.priorityClassName

apiVersion: v1 
kind: Pod 
metadata: 
  name: priorityclass-pod 
spec: 
  restartPolicy: Never 
  priorityClassName: high-priority    
  containers: 
  - name: cat-hosts 
    image: nginx 
    
---
apiVersion: scheduling.k8s.io/v1 
kind: PriorityClass 
metadata: 
  name: high-priority 
value: 1000000 
globalDefault: false 
description: "This priority class should be used for XYZ service pods only."

22.readinessGates

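Summary: upgrading only one or more containers in a Pod, without affecting the Pod object as a whole or its other containers, is what we call an in-place upgrade in Kubernetes.

During an in-place upgrade, we only update the image field of the foo container in the existing Pod object to trigger an upgrade of foo to the new version. The Pod object, the node, and the IP all stay the same, and the bar container keeps running while foo is being upgraded.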

apiVersion: v1 
kind: Pod 
metadata: 
  name: nginx 
  labels: 
    name: nginx 
spec: 
  readinessGates: 
  - conditionType: "www.example.com/feature-1" 
  containers: 
  - name: nginx 
    image: nginx 
    ports: 
    - containerPort: 80 
      hostPort: 80


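For a Pod that uses custom conditions, the Pod is evaluated as ready only when both of the following statements hold:

All containers in the Pod are ready.
All conditions specified in readinessGates are True.

When the Pod's containers are ready but at least one custom condition is missing or False, the kubelet sets the Pod's condition to ContainersReady.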
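
To make the two rules above concrete, here is an illustrative excerpt of what the Pod object might look like while the custom condition is still False. It is a sketch of observed state, not a manifest to apply; the assumption is that some external controller is responsible for writing the www.example.com/feature-1 condition into status.conditions.

```yaml
# Illustrative excerpt of a Pod object (observed state, not a manifest to apply).
spec:
  readinessGates:
  - conditionType: "www.example.com/feature-1"
status:
  conditions:
  - type: ContainersReady                  # built-in: all containers passed their readiness checks
    status: "True"
  - type: "www.example.com/feature-1"      # custom condition, assumed to be set by an external controller
    status: "False"
  - type: Ready                            # stays False until the custom condition becomes True
    status: "False"
```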

23.Security Context

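(1) Background

Note: privileged mode is not the same as root. For example, the root user in a container cannot bring down the container's network interface.

Question: which containers need to modify kernel parameters?

(2) Configuration levels of a security context

(3) Ways to set a security context

III. Setting a Security Context on a Pod

Usage: add a securityContext field to the Pod's manifest to specify security-context settings for the Pod.

Behavior: whatever this field specifies applies to all containers in the Pod.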

apiVersion: v1
kind: Pod
metadata:
  name: security-context
spec:
  volumes:
  - name: sec-vol
    emptyDir: {}
  securityContext:          # three commonly used settings
    runAsUser: 1000
    runAsGroup: 3000
    fsGroup: 2000
  containers:
  - name: sec-demo
    image: busybox
    command: ["sh", "-c", "sleep 3600"]
    volumeMounts:
    - name: sec-vol
      mountPath: /pod/sec
    securityContext:
      allowPrivilegeEscalation: false

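Something to think about:

The image was built with USER root, but the process runs as UID 1000 and cannot write files to the data volume. What can be done?

(1) Run the process as root (this addresses the root cause)

(2) Use an initContainer to change the permissions with chmod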
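
Option (2) could look roughly like the sketch below. It assumes the same sec-vol volume and /pod/sec mount path as the manifest above; the Pod name security-context-init and the init container name fix-perms are hypothetical. The init container runs as root only for the one-off permission change, and the main container still runs as UID 1000.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: security-context-init        # hypothetical name
spec:
  volumes:
  - name: sec-vol
    emptyDir: {}
  initContainers:
  - name: fix-perms                  # hypothetical name
    image: busybox
    # Hand the mount point to UID 1000 / GID 3000 and make it group-writable.
    command: ["sh", "-c", "chown -R 1000:3000 /pod/sec && chmod -R g+w /pod/sec"]
    securityContext:
      runAsUser: 0                   # root, for this one-off step only
    volumeMounts:
    - name: sec-vol
      mountPath: /pod/sec
  containers:
  - name: sec-demo
    image: busybox
    command: ["sh", "-c", "sleep 3600"]
    securityContext:
      runAsUser: 1000
      runAsGroup: 3000
      allowPrivilegeEscalation: false
    volumeMounts:
    - name: sec-vol
      mountPath: /pod/sec
```

For volume types that support it, setting fsGroup at Pod level (as in the manifest above) is an alternative that avoids the extra container.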

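(3) Inspect and verify

Note that runAsUser sets the user identity of the container's long-running process (the entrypoint or cmd); it is also the identity of a shell opened inside the container.

① Enter the container and check with top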

kubectl exec -it security-context -- top

Comparison

② Check the data volume

③ Combined view

④ Try to delete

IV. Setting a Security Context on a Container

kubectl explain pods.spec.containers.securityContext

Note: the parameters are much the same as at Pod level, so they are not covered in detail here.

apiVersion: v1
kind: Pod
metadata:
  name: security-context-container
spec:
  securityContext:
    runAsUser: 1000                              # compare: Pod-level value
  containers:
  - name: sec-ctx-demo
    image: busybox
    command: [ "sh", "-c", "sleep 60m" ]
    securityContext:
      runAsUser: 2000                            # compare: container-level value, which takes precedence
      allowPrivilegeEscalation: false


kubectl exec -it security-context-container -- top

V. Setting Linux Capabilities

(1) Background

(2) What are Capabilities

capabilities man page

(3) Setting them in Docker

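A Docker container is essentially a process, so in principle it would have the same default privileges as any other process. By default, Docker drops every capability except the ones it considers necessary. Containers often run as root, so with capabilities restricted, root inside the container has far fewer privileges than root on the host. Even if a vulnerability is exploited, it is hard to damage the host or obtain root on it. Capability support is therefore important for container security.

When running a container you can pass --privileged to give it full privileges. Use this flag with great care: it grants the container all the capabilities of the host's root user and exposes all host device files inside the container, which makes it a very dangerous operation.

If you do need some special privileges, you can adjust them with --cap-add and --cap-drop, which keeps the container as safe as possible.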

Docker grants containers a default set of Capabilities (for example CHOWN, NET_RAW, NET_BIND_SERVICE, and KILL); one or more of them can be removed with --cap-drop.

The file vendor/github.com/containerd/containerd/oci/spec_unix.go defines the default capabilities.

Newer versions

Other Capabilities (for example NET_ADMIN, SYS_ADMIN, and SYS_TIME) are dropped by Docker by default; one or more of them can be added with --cap-add.

Requirement: modify network interface settings

By default this is not permitted, because the required NET_ADMIN capability is dropped by default.

docker run -it --rm --cap-add=NET_ADMIN busybox /bin/sh

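Reference blog post

(4) Setting them at the operating-system level

Since kernel 2.2, Linux has divided the privileges traditionally associated with root into distinct units called capabilities; for a non-root process, the kernel checks whether it holds the corresponding capabilities.

Further reading

Conclusion: the ping command needs network access when it runs; the capabilities it requires are cap_net_admin and cap_net_raw.

Digging further

List the capabilities supported by the system

View the capabilities of the current process

(5) Setting them in Kubernetes

Result

Detail: if you bring the network interface down and then back up inside the container, the network stays unreachable.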
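
As a minimal sketch of the Kubernetes setting discussed in (5), capabilities are adjusted per container under securityContext.capabilities. The Pod name cap-net-admin and container name cap-demo are hypothetical; note that Kubernetes capability names omit the CAP_ prefix.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: cap-net-admin          # hypothetical name
spec:
  containers:
  - name: cap-demo             # hypothetical name
    image: busybox
    command: ["sh", "-c", "sleep 3600"]
    securityContext:
      capabilities:
        add: ["NET_ADMIN"]     # allow changes to network interfaces
        drop: ["KILL"]         # example of removing a default capability
```

This is the Pod-manifest counterpart of docker run --cap-add=NET_ADMIN; the capabilities field exists only at container level, not at Pod level.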

24.serviceAccountName

apiVersion: v1 
kind: Pod 
metadata: 
  name: nginx 
  labels: 
    name: nginx 
spec: 
  serviceAccountName: default 
  containers: 
  - name: nginx 
    image: nginx 
    ports: 
    - containerPort: 80 
      hostPort: 80

25.subdomain

apiVersion: v1 
kind: Pod 
metadata: 
  name: nginx 
  labels: 
    app: nginx-0 
spec: 
  hostname: mark 
  subdomain: com 
  containers: 
  - name: nginx 
    image: nginx 
    ports: 
    - containerPort: 80 
      hostPort: 80
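
For the hostname and subdomain above to produce a resolvable DNS name, Kubernetes expects a headless Service in the same namespace whose name equals the subdomain. The sketch below assumes the Pod runs in the default namespace and that the cluster domain is cluster.local; under those assumptions the Pod's fully qualified name would be mark.com.default.svc.cluster.local.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: com                 # must equal the Pod's spec.subdomain
spec:
  clusterIP: None           # headless Service
  selector:
    app: nginx-0            # matches the label on the Pod above
  ports:
  - port: 80
```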

26.terminationGracePeriodSeconds

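Grace period (defined by the terminationGracePeriodSeconds field; 30 seconds by default)

If the Pod defines a preStop handler, the handler is started synchronously when the Pod is marked "Terminating". If preStop has not finished when the grace period ends, the termination step is retried with a small extra grace period of 2 seconds. That is the final grace period, so keep the preStop duration in mind and set it together with terminationGracePeriodSeconds.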

apiVersion: v1 
kind: Pod
metadata: 
  name: nginx 
  labels: 
    name: nginx 
spec: 
  terminationGracePeriodSeconds: 0 
  containers: 
  - name: nginx 
    image: nginx 
    ports: 
    - containerPort: 80 
      hostPort: 80
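
The manifest above uses a grace period of 0, which leaves no room for a preStop hook. As a contrasting sketch, the example below pairs a preStop hook with a longer grace period. The Pod name nginx-graceful and the 60-second value are assumptions for illustration; the point is only that the grace period should comfortably exceed the time preStop needs.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: nginx-graceful                 # hypothetical name
spec:
  terminationGracePeriodSeconds: 60    # must cover preStop plus the process's own shutdown
  containers:
  - name: nginx
    image: nginx
    lifecycle:
      preStop:
        exec:
          # Ask nginx to finish in-flight requests, then give it a moment to exit.
          command: ["sh", "-c", "nginx -s quit; sleep 10"]
```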

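27.Taints and tolerations

First, what is a taint, and what is a toleration?

Taint

Toleration

Taints and tolerations apply to nodes and Pods respectively (taints are set on nodes; tolerations are set on Pods). Their purpose is to improve how Pods are scheduled across the cluster. They resemble node affinity but work in the opposite direction: a tainted node and a Pod repel each other, whereas a node and a Pod related by node affinity attract each other. In addition, you can label a node and set nodeSelector on a Pod to schedule the Pod onto nodes with matching labels.

The kubectl taint command sets a taint on a node. Once tainted, the node repels Pods: it can refuse to have Pods scheduled onto it, and can even evict Pods that are already running on it.

To some extent, Pods will then not be scheduled onto that node. However, a Pod can declare a toleration: a Pod with a matching toleration tolerates the taint and can be scheduled onto the tainted node.

Node affinity is a property of Pods (a preference or a hard requirement) that attracts them to a particular class of nodes. A taint is the opposite: it lets a node repel a particular class of Pods.

Taints and tolerations work together to keep Pods off unsuitable nodes. One or more taints can be applied to a node, which means the node will not accept any Pod that does not tolerate them. A toleration applied to a Pod means the Pod may (but is not required to) be scheduled onto nodes with matching taints.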
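
For comparison with the "attracting" mechanisms mentioned above, here is a minimal nodeSelector sketch. It assumes a node has already been labeled disktype=ssd; both the label and the Pod name nginx-ssd are hypothetical.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: nginx-ssd           # hypothetical name
spec:
  nodeSelector:
    disktype: ssd           # hypothetical node label; the Pod is only scheduled onto nodes carrying it
  containers:
  - name: nginx
    image: nginx
```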

1. Composition of a taint

Each taint has the following form:

key=value:effect

Each taint has a key and a value that act as its label; the value may be empty. The effect describes what the taint does.

The taint effect currently supports three options:

NoSchedule: Kubernetes will not schedule Pods onto a node with this taint
PreferNoSchedule: Kubernetes will try to avoid scheduling Pods onto a node with this taint
NoExecute: Kubernetes will not schedule Pods onto a node with this taint, and will also evict Pods already running on the node
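
The key=value:effect form maps onto three fields in the Node object. The excerpt below sketches how a taint key1=value:NoSchedule would appear under spec.taints; the node name node-1 is hypothetical.

```yaml
# Excerpt of a Node object carrying the taint key1=value:NoSchedule.
apiVersion: v1
kind: Node
metadata:
  name: node-1              # hypothetical node name
spec:
  taints:
  - key: key1
    value: value
    effect: NoSchedule
```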

2. Setting, viewing, and removing taints

The kubectl taint command sets a taint on a node; the commands below set, inspect, and remove one.

# Set a taint
kubectl taint nodes loc-node36 key1=value:NoSchedule

# Look for the Taints field in the node description
kubectl describe node loc-node36

# Remove the taint
kubectl taint nodes loc-node36 key1:NoSchedule-

A tainted node repels Pods according to the taint's effect (NoSchedule, PreferNoSchedule, or NoExecute), so to some extent Pods will not be scheduled onto it. A Pod with a matching toleration tolerates the taint and can still be scheduled onto the node.

kubectl taint node [node] [key]=[value]:[NoSchedule|NoExecute|PreferNoSchedule]

# The key and effect used when removing must match those used when creating; the node name must match one listed by kubectl get node

3. Tolerations in detail

pod.spec.tolerations

Next we write the YAML, for example adding tolerations to nginx and pinning it to k8s-01 with a hard affinity rule (a hard rule, a soft rule, or no affinity rule at all would all work here).

tolerations:                   # tolerations for this Pod
- key: "key1"                  # taint key set on the node
  operator: "Equal"            # operator
  value: "value"               # tolerated value; matches key1=value
  effect: "NoSchedule"         # must match the effect of the taint on the node
- key: "key1"
  operator: "Equal"
  value: "value"
  effect: "NoExecute"
  tolerationSeconds: 3600      # only valid together with NoExecute
- key: "key2"
  operator: "Exists"
  effect: "NoSchedule"

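The key, value, and effect must match the taint set on the node.
When operator is Exists, the value is ignored.
tolerationSeconds describes how long the Pod may keep running on the node once it is due to be evicted, similar to an eviction deadline.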

1. When no key is specified (with operator Exists), taints with any key are tolerated:

tolerations:
- operator: "Exists"

2. When no effect is specified, all effects of taints with the given key are tolerated:

tolerations:
- key: "key"
  operator: "Exists"

3. When there are multiple master nodes, you can set the following to avoid wasting resources:

kubectl taint nodes <Node-Name> node-role.kubernetes.io/master=:PreferNoSchedule

apiVersion: v1
kind: Pod
metadata:
  name: nginx
  labels:
    env: test
spec:
  containers:
  - name: nginx
    image: nginx
    imagePullPolicy: IfNotPresent
  tolerations:                  # a Pod-level field, not a container field
  - key: "example-key"
    operator: "Exists"
    effect: "NoSchedule"
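
The earlier description of giving nginx a toleration and pinning it to k8s-01 with a hard rule can be sketched as one manifest. This is an assumption-laden example rather than the author's file: it assumes the node k8s-01 carries the taint key1=value:NoSchedule and the standard kubernetes.io/hostname label, and the Pod name nginx-tolerant is hypothetical.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: nginx-tolerant                  # hypothetical name
spec:
  affinity:
    nodeAffinity:
      # Hard rule: the Pod may only run on the node whose hostname label is k8s-01.
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: kubernetes.io/hostname
            operator: In
            values: ["k8s-01"]
  tolerations:
  # Without this toleration the taint on k8s-01 would keep the Pod Pending.
  - key: "key1"
    operator: "Equal"
    value: "value"
    effect: "NoSchedule"
  containers:
  - name: nginx
    image: nginx
```

The toleration only permits scheduling onto the tainted node; it is the affinity rule that actually steers the Pod there.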
