1. Overview

1) The Kubernetes scheduler plays a bridging role in the system: upstream, it receives the Pods newly created through the Controller Manager and picks a Node for each of them to land on; downstream, once that placement is done, the kubelet on the target Node takes over the remaining work.
2) The scheduler's job is to bind each pending Pod to a suitable Node in the cluster according to specific scheduling algorithms and policies, and to write the binding into etcd. The whole process involves three objects: the list of Pods waiting to be scheduled, the list of available Nodes, and the scheduling algorithms and policies.
3) The kubelet on the target Node watches the API Server, picks up the Pod-binding event produced by the scheduler, fetches the corresponding Pod manifest, pulls the image, and starts the container.
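The binding the scheduler writes through the API Server is itself a small API object of kind Binding. A minimal sketch of what such a binding looks like (the pod and node names here are purely illustrative):

apiVersion: v1
kind: Binding
metadata:
  name: pod-demo          # the Pod being bound
  namespace: default
target:
  apiVersion: v1
  kind: Node
  name: docker77          # the Node the Pod is bound to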

2. Scheduling Workflow

1) Predicates: a pre-filtering pass that iterates over all candidate Nodes and keeps only those that meet the Pod's requirements.
2) Priorities: on the candidates that survived step 1, priority functions compute a score for each node; the highest-scoring node wins.
3) Select: the Pod is bound to the winning Node.

3. Predicate Policies

Every configured predicate must pass; a single failing predicate vetoes the node. Commonly used predicates include:

  1. CheckNodeCondition: checks whether the Node itself is in a healthy condition.
  2. GeneralPredicates (a bundle of several predicates):
     HostName: if the Pod defines pods.spec.hostname, checks whether a node's hostname matches it;
     PodFitsHostPorts: if the Pod defines pods.spec.containers.ports.hostPort, a node on which that port is already occupied is rejected;
     MatchNodeSelector: if the Pod defines pods.spec.nodeSelector, checks whether a node's labels match it;
     PodFitsResources: checks whether a node can satisfy the Pod's resource requests (see kubectl describe nodes NODE_NAME).
  3. NoDiskConflict: checks whether the storage volumes the Pod depends on can be satisfied without conflict; disabled by default.
  4. PodToleratesNodeTaints: checks whether the node's taints are fully covered by the Pod's pods.spec.tolerations; if so, the node passes. Taints added after scheduling do not evict the Pod.
  5. PodToleratesNodeNoExecuteTaints: checks whether the Pod tolerates the node's taints that carry the NoExecute effect; NoExecute taints added after scheduling do evict non-tolerating Pods.
  6. CheckNodeLabelPresence: checks whether the specified labels exist on a node.
  7. CheckServiceAffinity: based on the other Pods already belonging to the Pod's Service, tries to place this Pod on a node that already runs similar Pods; disabled by default.
  8. MaxEBSVolumeCount: for AWS environments; enabled by default.
  9. MaxGCEPDVolumeCount: for Google Cloud environments; enabled by default.
  10. MaxAzureDiskVolumeCount: for Microsoft Azure environments; enabled by default.
  11. CheckVolumeBinding: checks whether bound and unbound PVCs can satisfy the Pod's volume requirements.
  12. NoVolumeZoneConflict: checks for volume zone conflicts.
  13. CheckNodeMemoryPressure: checks whether the node is under memory pressure.
  14. CheckNodePIDPressure: checks whether the node is under PID pressure.
  15. CheckNodeDiskPressure: checks whether the node is under disk I/O pressure.
  16. MatchInterPodAffinity: must be enabled for inter-pod affinity to take effect.

4. Priority Policies

The scores of all matching priority functions are added up; the node with the highest total is selected.

  1. LeastRequested: score = (cpu((capacity - sum(requested)) * 10 / capacity) + memory((capacity - sum(requested)) * 10 / capacity)) / 2. For example, a node with 4 CPU cores of which 2 are requested and 8Gi of memory of which 4Gi is requested scores ((4-2)*10/4 + (8-4)*10/8) / 2 = (5 + 5) / 2 = 5.
  2. BalancedResourceAllocation: nodes whose CPU and memory utilization are close to each other win. Used together with the function above: if the CPU score and the memory score are, say, both 2, the two are close; this function exists to balance resource usage across a node.
  3. NodePreferAvoidPods: has a very high priority; it expresses that Pods should avoid running on the node. It is driven by the node annotation "scheduler.alpha.kubernetes.io/preferAvoidPods": a node without this annotation scores 10, and the function's weight is 10000.
  4. TaintToleration: checks the Pod's spec.tolerations list against the node's taints; the more entries that match, the lower the score.
  5. SelectorSpreading: spreads Pods out; it finds the Service / ReplicaSet / StatefulSet the current Pod belongs to and the existing nodes running matching Pods; nodes that already host such a Pod score lower.
  6. InterPodAffinity: iterates over the Pod's affinity terms and sums the weights of the terms a node matches; the larger the sum, the higher the score.
  7. NodeAffinity: scores nodes by node-affinity matching, based on the NodeSelector.
  8. MostRequested: the counterpart of LeastRequested: the node with the least capacity left is used first, packing Pods tightly so that other nodes stay free; disabled by default.
  9. NodeLabelPriority: looks only at the label itself: a node scores if the label exists and scores nothing otherwise; disabled by default.
  10. ImageLocality: scores by the total size of the images already present on a node that the Pod needs; disabled by default.
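Which predicates and priority functions are active can be tailored. A minimal sketch of a scheduler Policy file, assuming kube-scheduler is started with --policy-config-file pointing at it; the particular selection of entries and weights below is only illustrative:

{
  "kind": "Policy",
  "apiVersion": "v1",
  "predicates": [
    {"name": "PodFitsHostPorts"},
    {"name": "PodFitsResources"},
    {"name": "MatchNodeSelector"},
    {"name": "PodToleratesNodeTaints"}
  ],
  "priorities": [
    {"name": "LeastRequestedPriority", "weight": 1},
    {"name": "BalancedResourceAllocation", "weight": 1},
    {"name": "NodeAffinityPriority", "weight": 1}
  ]
}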

5. Advanced Scheduling Mechanisms

The following mechanisms influence node selection:

1) Node selectors
nodeName: pins the Pod to one specific node by name (set pod.spec.nodeName to the Node's name);
nodeSelector: selects a class of nodes by their labels (pod.spec.nodeSelector must match the node's labels).
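nodeSelector is demonstrated in Section 6 below. A minimal nodeName sketch, assuming a node named docker77 as in the later examples (the pod name is hypothetical):

apiVersion: v1
kind: Pod
metadata:
  name: pod-nodename-demo
spec:
  nodeName: docker77      # the scheduler is bypassed; the kubelet on docker77 runs the pod directly
  containers:
  - name: myapp
    image: ikubernetes/myapp:v1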

2) Node affinity scheduling
pods.spec.affinity.nodeAffinity supports two kinds of terms:

  • preferredDuringSchedulingIgnoredDuringExecution (soft: satisfied if possible)
  • requiredDuringSchedulingIgnoredDuringExecution (hard: must be satisfied)

3) Pod affinity scheduling
Suppose several Pods should run on the same node or the same rack: how does the scheduler know whether Pods are "in the same place"? The scheduler is allowed to place the first Pod on an arbitrary node and then schedule the other Pods onto that node; this is how pod affinity is realized. Whether two nodes count as the same place is decided by the value of topologyKey, and since location can be judged along many dimensions, the value of topologyKey varies accordingly. For example, topologyKey: kubernetes.io/hostname means the criterion is the node name, while topologyKey: zone means the criterion is a custom node label zone, and so on. Pod affinity likewise comes in a hard and a soft variant; pods.spec.affinity.podAffinity supports two kinds of terms (full examples follow in Section 6):

  • requiredDuringSchedulingIgnoredDuringExecution
  • preferredDuringSchedulingIgnoredDuringExecution

4) Pod anti-affinity scheduling
podAntiAffinity (structured like podAffinity, with the opposite effect)

Policy            Matches on    Supported operators                       Topology-aware   Purpose
nodeAffinity      node labels   In, NotIn, Exists, DoesNotExist, Gt, Lt   no               decides which nodes a Pod may be placed on
podAffinity       Pod labels    In, NotIn, Exists, DoesNotExist           yes              decides which Pods a Pod may share a topology domain with
podAntiAffinity   Pod labels    In, NotIn, Exists, DoesNotExist           yes              decides which Pods a Pod must not share a topology domain with
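The examples in Section 6 only exercise the In operator. A sketch of a nodeAffinity term combining some of the other operators (the labels zone and cpu-count are hypothetical; Gt and Lt expect a single integer string and compare it against the node's label value):

affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: zone         # the label merely has to exist, any value
          operator: Exists
        - key: cpu-count    # the node's label value must be greater than 4
          operator: Gt
          values: ["4"]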

5) Taint scheduling (applied to nodes)
Taints are key:value attribute data defined on nodes (the node declares the taint). Tolerations are key:value attribute data defined on pods (the pod declares which taints it tolerates).
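A sketch of where the two live in the objects, using the same key and value as the examples below:

# on the Node object (what kubectl taint writes)
spec:
  taints:
  - key: node-type
    value: production
    effect: NoSchedule

# on the Pod object
spec:
  tolerations:
  - key: node-type
    operator: Equal
    value: production
    effect: NoSchedule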

The taint's effect attribute defines how strongly the node repels pods; its values are:
NoSchedule: affects scheduling only; existing Pods on the node are not touched.
NoExecute: affects not only scheduling but also existing Pods (Pods that cannot tolerate the taint are evicted).
PreferNoSchedule: if a pod cannot tolerate the taint, it should preferably not be scheduled onto the node.

Syntax for adding a taint: kubectl taint node NODE_NAME key=value:NoSchedule. Syntax for removing a taint (by key): kubectl taint nodes slave2 name-

6. Examples

1) nodeSelector example

[root@docker79 scheduler]# cat pod-demo.yaml
apiVersion: v1
kind: Pod
metadata:
  name: pod-demo
  namespace: default
  labels:
    app: myapp
    tier: frontend
  annotations:
    inspiry.com/author: "cluster admin"
spec:
  containers:
  - name: myapp
    image: ikubernetes/myapp:v1
  nodeSelector:
    disktype: ssd
[root@docker79 scheduler]# kubectl apply -f pod-demo.yaml
pod/pod-demo created
[root@docker79 scheduler]#
[root@docker79 scheduler]# kubectl get nodes --show-labels
NAME       STATUS    ROLES     AGE       VERSION   LABELS
docker77   Ready     <none>    15d       v1.11.2   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,disktype=ssd,kubernetes.io/hostname=docker77
docker78   Ready     <none>    15d       v1.11.2   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/hostname=docker78
docker79   Ready     master    15d       v1.11.2   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/hostname=docker79,node-role.kubernetes.io/master=
[root@docker79 scheduler]# kubectl get pods -o wide
NAME       READY     STATUS    RESTARTS   AGE       IP           NODE       NOMINATED NODE
pod-demo   1/1       Running   0          1m        10.244.1.2   docker77   <none>
[root@docker79 scheduler]# kubectl delete -f pod-demo.yaml
pod "pod-demo" deleted
[root@docker79 scheduler]#

[root@docker79 scheduler]# vim pod-demo.yaml
[root@docker79 scheduler]# cat pod-demo.yaml
apiVersion: v1
kind: Pod
metadata:
  name: pod-demo
  namespace: default
  labels:
    app: myapp
    tier: frontend
  annotations:
    inspiry.com/author: "cluster admin"
spec:
  containers:
  - name: myapp
    image: ikubernetes/myapp:v1
  nodeSelector:
    disktype: harddisk
[root@docker79 scheduler]# kubectl apply -f pod-demo.yaml
pod/pod-demo created
[root@docker79 scheduler]# kubectl get pods
NAME       READY     STATUS    RESTARTS   AGE
pod-demo   0/1       Pending   0          6s
[root@docker79 scheduler]#

Note: the pod stays in the Pending state. nodeSelector is a hard constraint, so as long as no node satisfies it the pod remains Pending and never reaches Running.
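To confirm the reason, the pod's events can be inspected; a predicate failure like this surfaces as a FailedScheduling event:

kubectl describe pods pod-demo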

[root@docker79 scheduler]# kubectl label nodes docker78 disktype=harddisk
node/docker78 labeled

Note: after docker78 is labeled, it satisfies the nodeSelector, and the pod transitions to Running.

[root@docker79 scheduler]# kubectl get pods
NAME       READY     STATUS    RESTARTS   AGE
pod-demo   1/1       Running   0          1m
[root@docker79 scheduler]#
[root@docker79 scheduler]# kubectl delete -f pod-demo.yaml
pod "pod-demo" deleted
[root@docker79 scheduler]#

2) nodeAffinity required example

[root@docker79 scheduler]# vim pod-nodeaffinity-demo.yaml
[root@docker79 scheduler]# cat pod-nodeaffinity-demo.yaml
apiVersion: v1
kind: Pod
metadata:
  name: pod-node-affinity-demo
  namespace: default
  labels:
    app: myapp
    tier: frontend
  annotations:
    inspiry.com/author: "cluster admin"
spec:
  containers:
  - name: myapp
    image: ikubernetes/myapp:v1
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: zone
            operator: In
            values: ["foo","bar"]
[root@docker79 scheduler]# kubectl apply -f pod-nodeaffinity-demo.yaml
pod/pod-node-affinity-demo created
[root@docker79 scheduler]# kubectl get pods
NAME                     READY     STATUS    RESTARTS   AGE
pod-node-affinity-demo   0/1       Pending   0          8s
[root@docker79 scheduler]#

Note: requiredDuringScheduling affinity is hard affinity; the pod may only run on a node that satisfies the condition, so as long as no such node exists the pod stays in the Pending state.

[root@docker79 scheduler]# kubectl delete -f pod-nodeaffinity-demo.yaml
pod "pod-node-affinity-demo" deleted
[root@docker79 scheduler]#

3) nodeAffinity preferred example

[root@docker79 scheduler]# vim pod-nodeaffinity-demo-2.yaml
[root@docker79 scheduler]# cat pod-nodeaffinity-demo-2.yaml
apiVersion: v1
kind: Pod
metadata:
  name: pod-node-affinity-demo
  namespace: default
  labels:
    app: myapp
    tier: frontend
  annotations:
    inspiry.com/author: "cluster admin"
spec:
  containers:
  - name: myapp
    image: ikubernetes/myapp:v1
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - preference:
          matchExpressions:
          - key: zone
            operator: In
            values: ["foo","bar"]
        weight: 60
[root@docker79 scheduler]# kubectl apply -f pod-nodeaffinity-demo-2.yaml
pod/pod-node-affinity-demo created
[root@docker79 scheduler]# kubectl get pods
NAME                     READY     STATUS    RESTARTS   AGE
pod-node-affinity-demo   1/1       Running   0          5s
[root@docker79 scheduler]#

Note: preferredDuringScheduling affinity is soft affinity; the scheduler tries to run the pod on a matching node, but when no node matches, the pod may run elsewhere, so it reaches the Running state.

[root@docker79 scheduler]# kubectl delete -f pod-nodeaffinity-demo-2.yaml
pod "pod-node-affinity-demo" deleted
[root@docker79 scheduler]#

4) podAffinity required example (location judged by hostname)

[root@docker79 scheduler]# vim pod-required-affinity-demo.yaml
[root@docker79 scheduler]# cat pod-required-affinity-demo.yaml
apiVersion: v1
kind: Pod
metadata:
  name: pod-first
  labels:
    app: myapp
    tier: frontend
spec:
  containers:
  - name: myapp
    image: ikubernetes/myapp:v1
---
apiVersion: v1
kind: Pod
metadata:
  name: pod-second
  labels:
    app: db
    tier: db
spec:
  containers:
  - name: busybox
    image: busybox:latest
    imagePullPolicy: IfNotPresent
    command: ["sh","-c","sleep 3600"]
  affinity:
    podAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - topologyKey: kubernetes.io/hostname
        labelSelector:
          matchExpressions:
          - {key: app, operator: In, values: ["myapp"]}
[root@docker79 scheduler]# kubectl apply -f pod-required-affinity-demo.yaml
pod/pod-first unchanged
pod/pod-second created
[root@docker79 scheduler]# kubectl get pods -o wide
NAME         READY     STATUS    RESTARTS   AGE       IP           NODE       NOMINATED NODE
pod-first    1/1       Running   0          4m        10.244.2.7   docker78   <none>
pod-second   1/1       Running   0          22s       10.244.2.8   docker78   <none>
[root@docker79 scheduler]# kubectl delete -f pod-required-affinity-demo.yaml
pod "pod-first" deleted
pod "pod-second" deleted
[root@docker79 scheduler]#

Note: because of the affinity rule, the two pods run on the same node (the second pod is scheduled onto the node that runs the pod labeled app=myapp; the location is judged by hostname via topologyKey: kubernetes.io/hostname).

5) podAntiAffinity required example (location judged by hostname)

[root@docker79 scheduler]# cp pod-required-affinity-demo.yaml pod-required-antiaffinity-demo.yaml
[root@docker79 scheduler]# vim pod-required-antiaffinity-demo.yaml
[root@docker79 scheduler]# cat pod-required-antiaffinity-demo.yaml
apiVersion: v1
kind: Pod
metadata:
  name: pod-first
  labels:
    app: myapp
    tier: frontend
spec:
  containers:
  - name: myapp
    image: ikubernetes/myapp:v1
---
apiVersion: v1
kind: Pod
metadata:
  name: pod-second
  labels:
    app: db
    tier: db
spec:
  containers:
  - name: busybox
    image: busybox:latest
    imagePullPolicy: IfNotPresent
    command: ["sh","-c","sleep 3600"]
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - topologyKey: kubernetes.io/hostname
        labelSelector:
          matchExpressions:
          - {key: app, operator: In, values: ["myapp"]}
[root@docker79 scheduler]# kubectl apply -f pod-required-antiaffinity-demo.yaml
pod/pod-first created
pod/pod-second created
[root@docker79 scheduler]# kubectl get pods -o wide
NAME         READY     STATUS    RESTARTS   AGE       IP            NODE       NOMINATED NODE
pod-first    1/1       Running   0          9s        10.244.2.11   docker78   <none>
pod-second   1/1       Running   0          9s        10.244.1.3    docker77   <none>
[root@docker79 scheduler]#

Note: because of the anti-affinity rule, the two pods cannot run on the same node.

[root@docker79 scheduler]# kubectl delete -f pod-required-antiaffinity-demo.yaml
pod "pod-first" deleted
pod "pod-second" deleted
[root@docker79 scheduler]#

6) podAntiAffinity required example (location judged by node label)

[root@docker79 ~]# kubectl get nodes --show-labels
NAME       STATUS    ROLES     AGE       VERSION   LABELS
docker77   Ready     <none>    16d       v1.11.2   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,disktype=ssd,kubernetes.io/hostname=docker77
docker78   Ready     <none>    16d       v1.11.2   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,disktype=harddisk,kubernetes.io/hostname=docker78
docker79   Ready     master    16d       v1.11.2   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/hostname=docker79,node-role.kubernetes.io/master=
[root@docker79 ~]# kubectl label nodes docker77 zone=foo
node/docker77 labeled
[root@docker79 ~]# kubectl label nodes docker78 zone=foo
node/docker78 labeled
[root@docker79 ~]#
[root@docker79 scheduler]# vim pod-required-antiaffinity-demo.yaml
[root@docker79 scheduler]# cat pod-required-antiaffinity-demo.yaml
apiVersion: v1
kind: Pod
metadata:
  name: pod-first
  labels:
    app: myapp
    tier: frontend
spec:
  containers:
  - name: myapp
    image: ikubernetes/myapp:v1
---
apiVersion: v1
kind: Pod
metadata:
  name: pod-second
  labels:
    app: db
    tier: db
spec:
  containers:
  - name: busybox
    image: busybox:latest
    imagePullPolicy: IfNotPresent
    command: ["sh","-c","sleep 3600"]
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - topologyKey: zone
        labelSelector:
          matchExpressions:
          - {key: app, operator: In, values: ["myapp"]}
[root@docker79 scheduler]#
[root@docker79 scheduler]# kubectl apply -f pod-required-antiaffinity-demo.yaml
pod/pod-first created
pod/pod-second created
[root@docker79 scheduler]# kubectl get pods
NAME         READY     STATUS    RESTARTS   AGE
pod-first    1/1       Running   0          8s
pod-second   0/1       Pending   0          8s

Note: both docker77 and docker78 carry the label zone=foo, and podAntiAffinity over topologyKey zone forbids the two pods from sharing the zone, so one pod reaches Running and the other stays Pending.

[root@docker79 scheduler]# kubectl delete -f pod-required-antiaffinity-demo.yaml
pod "pod-first" deleted
pod "pod-second" deleted
[root@docker79 scheduler]# 

7) Taint scheduling: adding taints to a node

[root@docker79 scheduler]# kubectl taint node docker78 node-type=production:NoSchedule
node/docker78 tainted
[root@docker79 scheduler]# kubectl describe node docker78

Note: the commands above add a taint to docker78 with the effect NoSchedule (the Taints field in the describe output should now show node-type=production:NoSchedule). Next, define a Deployment as follows:

[root@docker79 scheduler]# cp ../deploy-demo.yaml ./
[root@docker79 scheduler]# vim deploy-demo.yaml
[root@docker79 scheduler]# cat deploy-demo.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp-deploy
  namespace: default
spec:
  replicas: 2
  selector:
    matchLabels:
      app: myapp
      release: canary
  template:
    metadata:
      labels:
        app: myapp
        release: canary
    spec:
      containers:
      - name: myapp
        image: ikubernetes/myapp:v1
        ports:
        - name: http
          containerPort: 80
[root@docker79 scheduler]#
[root@docker79 scheduler]# kubectl apply -f deploy-demo.yaml
deployment.apps/myapp-deploy created
[root@docker79 scheduler]# kubectl get pods -o wide
NAME                            READY     STATUS    RESTARTS   AGE       IP           NODE       NOMINATED NODE
myapp-deploy-69b47bc96d-2phfn   1/1       Running   0          8s        10.244.1.5   docker77   <none>
myapp-deploy-69b47bc96d-msfwq   1/1       Running   0          8s        10.244.1.4   docker77   <none>
[root@docker79 scheduler]# 

Note: the example above creates an ordinary Deployment. Its pods define no tolerations and therefore cannot tolerate any tainted node, so they all land on docker77. Now add a taint to docker77 as well, with the effect NoExecute, as follows:

[root@docker79 scheduler]# kubectl taint node docker77 node-type=dev:NoExecute
node/docker77 tainted
[root@docker79 scheduler]# kubectl get pods -o wide
NAME                            READY     STATUS        RESTARTS   AGE       IP           NODE       NOMINATED NODE
myapp-deploy-69b47bc96d-2phfn   0/1       Terminating   0          1m        10.244.1.5   docker77   <none>
myapp-deploy-69b47bc96d-6sv54   0/1       Pending       0          5s        <none>       <none>     <none>
myapp-deploy-69b47bc96d-z7qqh   0/1       Pending       0          5s        <none>       <none>     <none>
[root@docker79 scheduler]# 

Note: once docker77 is tainted with effect NoExecute, the existing pods are evicted; since the pods now match no node at all, they stay in the Pending state.

8) Taint scheduling: configuring tolerations on a pod

[root@docker79 scheduler]# vim deploy-demo.yaml
[root@docker79 scheduler]# cat deploy-demo.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp-deploy
  namespace: default
spec:
  replicas: 2
  selector:
    matchLabels:
      app: myapp
      release: canary
  template:
    metadata:
      labels:
        app: myapp
        release: canary
    spec:
      containers:
      - name: myapp
        image: ikubernetes/myapp:v1
        ports:
        - name: http
          containerPort: 80
      tolerations:
      - key: "node-type"
        operator: "Equal"
        value: "production"
        effect: "NoSchedule"
[root@docker79 scheduler]# kubectl apply -f deploy-demo.yaml
deployment.apps/myapp-deploy configured
[root@docker79 scheduler]# kubectl get pods -o wide
NAME                           READY     STATUS    RESTARTS   AGE       IP            NODE       NOMINATED NODE
myapp-deploy-98fddd79f-grf77   1/1       Running   0          10s       10.244.2.16   docker78   <none>
myapp-deploy-98fddd79f-xgtj6   1/1       Running   0          12s       10.244.2.15   docker78   <none>
[root@docker79 scheduler]#

Note: the pods now declare a toleration for the key node-type with value production and effect NoSchedule, so only docker78's taint is tolerated and the pods land there. Next, declare only the "node-type" key without requiring a specific value, as shown below:

[root@docker79 scheduler]# vim deploy-demo.yaml
[root@docker79 scheduler]# cat deploy-demo.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp-deploy
  namespace: default
spec:
  replicas: 2
  selector:
    matchLabels:
      app: myapp
      release: canary
  template:
    metadata:
      labels:
        app: myapp
        release: canary
    spec:
      containers:
      - name: myapp
        image: ikubernetes/myapp:v1
        ports:
        - name: http
          containerPort: 80
      tolerations:
      - key: "node-type"
        operator: "Exists"
        value: ""
        effect: "NoSchedule"
[root@docker79 scheduler]#
[root@docker79 scheduler]# kubectl apply -f deploy-demo.yaml
deployment.apps/myapp-deploy configured
[root@docker79 scheduler]# kubectl get pods -o wide
NAME                            READY     STATUS        RESTARTS   AGE       IP            NODE       NOMINATED NODE
myapp-deploy-7dd988dc9d-dg5zr   1/1       Running       0          3s        10.244.2.17   docker78   <none>
myapp-deploy-7dd988dc9d-pvw6r   1/1       Running       0          2s        10.244.2.18   docker78   <none>
myapp-deploy-98fddd79f-grf77    0/1       Terminating   0          4m        10.244.2.16   docker78   <none>
myapp-deploy-98fddd79f-xgtj6    1/1       Terminating   0          4m        10.244.2.15   docker78   <none>
[root@docker79 scheduler]#

Note: with operator Exists and no value, any node whose taint has the key "node-type" and the effect NoSchedule matches, so only docker78 qualifies; docker77, whose taint carries NoExecute, does not. (The output above shows the old pods being deleted and new ones created.) Next, keep only the "node-type" key and leave the effect empty as well, as shown below:

[root@docker79 scheduler]# vim deploy-demo.yaml
[root@docker79 scheduler]# cat deploy-demo.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp-deploy
  namespace: default
spec:
  replicas: 2
  selector:
    matchLabels:
      app: myapp
      release: canary
  template:
    metadata:
      labels:
        app: myapp
        release: canary
    spec:
      containers:
      - name: myapp
        image: ikubernetes/myapp:v1
        ports:
        - name: http
          containerPort: 80
      tolerations:
      - key: "node-type"
        operator: "Exists"
        value: ""
        effect: ""
[root@docker79 scheduler]# kubectl apply -f deploy-demo.yaml
deployment.apps/myapp-deploy configured
[root@docker79 scheduler]# kubectl get pods -o wide
NAME                            READY     STATUS              RESTARTS   AGE       IP            NODE       NOMINATED NODE
myapp-deploy-7dd988dc9d-dg5zr   1/1       Running             0          2m        10.244.2.17   docker78   <none>
myapp-deploy-7dd988dc9d-pvw6r   0/1       Terminating         0          2m        10.244.2.18   docker78   <none>
myapp-deploy-f9f87c46d-9skm9    0/1       ContainerCreating   0          1s        <none>        docker77   <none>
myapp-deploy-f9f87c46d-qr4d9    1/1       Running             0          3s        10.244.2.19   docker78   <none>
[root@docker79 scheduler]#

Note: with an empty effect, it is enough that a taint with the key "node-type" exists, whatever its effect, so both docker77 and docker78 match. Finally, set the effect to NoExecute, as shown below:

[root@docker79 scheduler]# vim deploy-demo.yaml
[root@docker79 scheduler]# cat deploy-demo.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp-deploy
  namespace: default
spec:
  replicas: 2
  selector:
    matchLabels:
      app: myapp
      release: canary
  template:
    metadata:
      labels:
        app: myapp
        release: canary
    spec:
      containers:
      - name: myapp
        image: ikubernetes/myapp:v1
        ports:
        - name: http
          containerPort: 80
      tolerations:
      - key: "node-type"
        operator: "Exists"
        value: ""
        effect: "NoExecute"
[root@docker79 scheduler]#
[root@docker79 scheduler]# kubectl apply -f deploy-demo.yaml
deployment.apps/myapp-deploy configured
[root@docker79 scheduler]# kubectl get pods -o wide
NAME                            READY     STATUS        RESTARTS   AGE       IP            NODE       NOMINATED NODE
myapp-deploy-765984bf98-hwdkv   1/1       Running       0          4s        10.244.1.7    docker77   <none>
myapp-deploy-765984bf98-mfr4j   1/1       Running       0          2s        10.244.1.8    docker77   <none>
myapp-deploy-f9f87c46d-9skm9    0/1       Terminating   0          1m        10.244.1.6    docker77   <none>
myapp-deploy-f9f87c46d-qr4d9    0/1       Terminating   0          1m        10.244.2.19   docker78   <none>
[root@docker79 scheduler]#

Note: with the effect set to NoExecute, only docker77's taint (which carries NoExecute) is tolerated, so only docker77 matches and the pods are rescheduled onto it.