Advanced Scheduling in K8s: Affinity and Taints

 

1 How the default scheduler schedules a Pod:

  1. Predicate phase: from all nodes, filter out those that do not meet the basic requirements and keep the candidates that do.
  2. Priority phase: run the priority (scoring) functions against the remaining candidates, compute each node's score, and rank them.
  3. Pick one node at random from the highest-scoring nodes and run the Pod there.

We can influence the predicate and priority phases with our own settings, steering scheduling toward the placement we expect.
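A quick way to see the outcome of this process: the scheduler records its final choice as a Scheduled event on the Pod, so after creating a Pod you can run the command below and look at the Events section for a line similar to "Successfully assigned <namespace>/<pod-name> to <node-name>" (the exact wording may vary slightly between versions).

kubectl describe pod POD_NAME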

2 Ways to influence scheduling:

  • Node selector: nodeSelector; you can even set nodeName to pick the node directly.
  • Affinity scheduling: nodeAffinity (node affinity), podAffinity (Pod affinity), podAntiAffinity (Pod anti-affinity)
  • Taints and tolerations: Taint, Toleration

 

 

3  Node selector: nodeSelector

If we want a Pod to land on one particular node, we can set Pod.spec.nodeName to that node's name (see the nodeName sketch just below). More commonly, we put a distinctive label on a subset of nodes and match that label in pod.spec.nodeSelector, which greatly narrows the predicate phase.
    Add labels to a node:
   kubectl label nodes NODE_NAME key1=value1 ... keyN=valueN

For example, label node01 with app=frontend and set the Pod's nodeSelector to that label; the Pod can then only run on nodes that carry the label.
      If no node has the label, the Pod cannot be scheduled and stays in the Pending state.
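A minimal sketch of the nodeName variant mentioned above (the file name pod-nodename.yaml is made up for this illustration; note that nodeName bypasses the scheduler entirely and binds the Pod directly to the node):

# cat pod-nodename.yaml
apiVersion: v1
kind: Pod
metadata:
  name: nginx-nodename
spec:
  nodeName: k8s-node-1   # run on this exact node, skipping the scheduler
  containers:
  - name: nginx
    image: nginx
    ports:
    - containerPort: 80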

 

First, let's put a label on one of the nodes:

[root@k8s-master ~]# kubectl get  nodes --show-labels
NAME         STATUS   ROLES    AGE   VERSION   LABELS
k8s-master   Ready    master   12d   v1.15.3   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=k8s-master,kubernetes.io/os=linux,node-role.kubernetes.io/master=
k8s-node-1   Ready    node     12d   v1.15.3   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=k8s-node-1,kubernetes.io/os=linux,node-role.kubernetes.io/node=
k8s-node-2   Ready    node     12d   v1.15.3   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=k8s-node-2,kubernetes.io/os=linux,node-role.kubernetes.io/node=
[root@k8s-master ~]#
[root@k8s-master ~]#
[root@k8s-master ~]# kubectl label nodes k8s-node-1 disk=ssd
node/k8s-node-1 labeled
[root@k8s-master ~]# kubectl get  nodes --show-labels
NAME         STATUS   ROLES    AGE   VERSION   LABELS
k8s-master   Ready    master   12d   v1.15.3   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=k8s-master,kubernetes.io/os=linux,node-role.kubernetes.io/master=
k8s-node-1   Ready    node     12d   v1.15.3   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,disk=ssd,kubernetes.io/arch=amd64,kubernetes.io/hostname=k8s-node-1,kubernetes.io/os=linux,node-role.kubernetes.io/node=
k8s-node-2   Ready    node     12d   v1.15.3   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=k8s-node-2,kubernetes.io/os=linux,node-role.kubernetes.io/node=
[root@k8s-master ~]# kubectl get  nodes --show-labels|grep ssd
k8s-node-1   Ready    node     12d   v1.15.3   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,disk=ssd,kubernetes.io/arch=amd64,kubernetes.io/hostname=k8s-node-1,kubernetes.io/os=linux,node-role.kubernetes.io/node=
[root@k8s-master ~]#

 

 

# cat nodeSelector.yaml
apiVersion: v1
kind: Pod
metadata:
  name: nginx-pod
  labels:
    app: my-pod

spec:
  containers:
  - name: my-pod
    image: nginx
    ports:
    - name: http
      containerPort: 80
  nodeSelector:
    disk: ssd  # if no node carries the label given in nodeSelector, the Pod stays Pending (the predicate phase fails)

 

[root@k8s-master schedule]# kubectl create -f nodeSelector.yaml
pod/nginx-pod created
[root@k8s-master schedule]# kubectl get pod
NAME        READY   STATUS    RESTARTS   AGE
nginx-pod   1/1     Running   0          6s
[root@k8s-master schedule]# kubectl describe pod nginx-pod | grep Node
Node:         k8s-node-1/10.6.76.23
Node-Selectors:  disk=ssd
[root@k8s-master schedule]#
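If you later want to undo the label, the same kubectl label command with a trailing minus sign removes it:

kubectl label nodes k8s-node-1 disk-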

4 Node affinity scheduling: nodeAffinity

requiredDuringSchedulingIgnoredDuringExecution  — hard affinity: the rule must be satisfied, otherwise the Pod is not scheduled.
preferredDuringSchedulingIgnoredDuringExecution — soft affinity: satisfied if possible; if no node satisfies it, the Pod is scheduled anyway.

4.1  Hard affinity

matchExpressions: label match expressions. For example, with key zone, operator In, and values foo and bar, the Pod may only be scheduled onto nodes whose zone label value is foo or bar.
matchFields: field match expressions. Same idea as above, except they match node fields (such as metadata.name) rather than labels, so no label value is involved; a small sketch follows below.
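A minimal matchFields sketch (illustration only, not used in the examples that follow) that matches the node's metadata.name field instead of a label:

      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchFields:
              - key: metadata.name   # node field, e.g. metadata.name
                operator: In
                values:
                - k8s-node-1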

 

 

Schedule the Pods onto nodes whose zone label value is foo or bbb:

 

[root@k8s-master ~]# kubectl get  nodes --show-labels| grep zone
k8s-node-1   Ready    node     46d   v1.15.3   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,disk=ssd,kubernetes.io/arch=amd64,kubernetes.io/hostname=k8s-node-1,kubernetes.io/os=linux,node-role.kubernetes.io/node=,zone=foo
[root@k8s-master ~]#

 

# cat node-affinity-1.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-hello-deployment
  namespace:
  labels:
    app: nginx-hello
spec:
  replicas: 2
  selector:
    matchLabels:
      app: nginx-hello
  template:
    metadata:    
      labels:
        app: nginx-hello
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: zone
                operator: In
                values:
                - foo
                - bbb
      containers:
      - name: nginx-hello
        image: nginx
        ports:
        - containerPort: 80

 

 

[root@k8s-master schedule]# kubectl get  pods -o wide
NAME                                     READY   STATUS    RESTARTS   AGE     IP             NODE         NOMINATED NODE   READINESS GATES
nginx-hello-deployment-d457bd7bc-fsjjn   1/1     Running   0          2m34s   10.254.1.124   k8s-node-1   <none>           <none>
nginx-hello-deployment-d457bd7bc-ntb8h   1/1     Running   0          2m34s   10.254.1.123   k8s-node-1   <none>           <none>
nginx-pod                                1/1     Running   0          58m     10.254.1.120   k8s-node-1   <none>           <none>
[root@k8s-master schedule]#

Both replicas were placed on node-1 according to the label. Now change the values in the manifest so that no node matches:

# cat node-affinity-1.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-hello-deployment
  namespace:
  labels:
    app: nginx-hello
spec:
  replicas: 2
  selector:
    matchLabels:
      app: nginx-hello
  template:
    metadata:
      labels:
        app: nginx-hello
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: zone
                operator: In
                values:
                - foo-no
                - bbb-no
      containers:
      - name: nginx-hello
        image: nginx
        ports:
        - containerPort: 80

 

 

# Check: no node has a zone label with a matching value, so the Pods stay Pending

[root@k8s-master schedule]# kubectl get  pods -o wide
NAME                                      READY   STATUS    RESTARTS   AGE   IP             NODE         NOMINATED NODE   READINESS GATES
nginx-hello-deployment-6c96b5675f-8jqnx   0/1     Pending   0          43s   <none>         <none>       <none>           <none>
nginx-hello-deployment-6c96b5675f-lbnsw   0/1     Pending   0          43s   <none>         <none>       <none>           <none>
nginx-pod                                 1/1     Running   0          60m   10.254.1.120   k8s-node-1   <none>           <none>
[root@k8s-master schedule]#
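To confirm why the Pods are Pending, check the Pod's events (the exact message wording depends on the Kubernetes version, but it should say that no node matched the node selector/affinity):

kubectl describe pod nginx-hello-deployment-6c96b5675f-8jqnx | tail -3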

 

 

4.2 Soft affinity

preferredDuringSchedulingIgnoredDuringExecution under nodeAffinity (soft affinity: the scheduler prefers the nodes that match the most, or most heavily weighted, terms, but the Pods are still created even if nothing matches)

 

# cat node-affinity-1.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-hello-deployment
  namespace:
  labels:
    app: nginx-hello
spec:
  replicas: 2
  selector:
    matchLabels:
      app: nginx-hello
  template:
    metadata:
      labels:
        app: nginx-hello
    spec:
      affinity:
        nodeAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - preference:
              matchExpressions:
              - key: zone
                operator: In
                values:
                - foo-no
                - bbb-no
            weight: 60 # weight associated with this preference term, 1-100
      containers:
      - name: nginx-hello
        image: nginx
        ports:
        - containerPort: 80

 

[root@k8s-master schedule]# kubectl get  pods -o wide
NAME                                     READY   STATUS    RESTARTS   AGE   IP             NODE         NOMINATED NODE   READINESS GATES
nginx-hello-deployment-98654dc57-cvvlb   1/1     Running   0          15s   10.254.1.125   k8s-node-1   <none>           <none>
nginx-hello-deployment-98654dc57-mglbx   1/1     Running   0          20s   10.254.2.90    k8s-node-2   <none>           <none>
nginx-pod                                1/1     Running   0          72m   10.254.1.120   k8s-node-1   <none>           <none>
[root@k8s-master schedule]#
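Weights matter once there is more than one preferred term: for every node, the scheduler adds up the weight of each term the node satisfies and folds that sum into the node's score, so higher-weighted terms pull harder. A small sketch with two terms (the labels reuse ones from this article; the weights are arbitrary):

          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 80
            preference:
              matchExpressions:
              - key: zone
                operator: In
                values:
                - foo
          - weight: 20
            preference:
              matchExpressions:
              - key: disk
                operator: In
                values:
                - ssd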

5 Pod affinity: podAffinity

A typical Pod-affinity scenario: the cluster's nodes are spread across different zones or server rooms, and service A and service B must be deployed in the same zone or the same room. That is when affinity scheduling is needed.

labelSelector: selects the group of Pods this Pod should be co-located with
namespaces: the namespaces in which to look for those Pods
topologyKey: the node label key that defines what counts as "the same location" (see the zone-level sketch below)
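topologyKey decides how large "the same location" is: kubernetes.io/hostname means the same node, while a zone label means the same zone. A hedged sketch that co-locates at zone level, assuming the nodes carry the zone label added earlier in this article:

      affinity:
        podAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchLabels:
                app: nginx
            topologyKey: zone   # the same zone is enough; it does not have to be the same node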

 

 

5.1  Affinity by labelSelector

Keep the two Pods in the same location:

# cat pod-affinity.yaml
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: nginx-deployment
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.15
        ports:
        - containerPort: 80
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment-pod-affinity
  namespace:
  labels:
    app: nginx-hello
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx-hello
  template:
    metadata:
      labels:
        app: nginx-hello
    spec:
      affinity:
        podAffinity:
          #preferredDuringSchedulingIgnoredDuringExecution:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app # label key, defined on the Pod template above
                operator: In  # In: the label value must be in the list below
                values:
                - nginx # value of the app label
            topologyKey: kubernetes.io/hostname # nodes with the same value for this key count as the same location; with affinity the Pod must run in the same location as the Pods matched by labelSelector in the given namespaces, with anti-affinity it must not
      containers:
      - name: nginx-hello
        image: nginx
        ports:
        - containerPort: 80

[root@k8s-master ~]# kubectl get  pod -o wide| grep nginx
nginx-deployment-6f6d9b887f-5mvqs                1/1     Running   0          6s    10.254.2.92    k8s-node-2   <none>    <none>
nginx-deployment-pod-affinity-5566c6d4fd-2tnrq   1/1     Running   0          6s    10.254.2.93    k8s-node-2   <none>    <none>
[root@k8s-master ~]#

5.2  Anti-affinity: podAntiAffinity

Keep this Pod off any node where a given Pod is running (the opposite of the above).

 

# cat pod-affinity.yaml
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: nginx-deployment
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.15
        ports:
        - containerPort: 80
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment-pod-affinity
  namespace:
  labels:
    app: nginx-hello
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx-hello
  template:
    metadata:
      labels:
        app: nginx-hello
    spec:
      affinity:
        #podAffinity:
        podAntiAffinity:  # the only change is here
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app # label key, defined on the Pod template above
                operator: In  # In: the label value must be in the list below
                values:
                - nginx # value of the app label
            topologyKey: kubernetes.io/hostname # nodes with the same value for this key count as the same location; with anti-affinity the Pod must NOT run in the same location as the Pods matched by labelSelector in the given namespaces
      containers:
      - name: nginx-hello
        image: nginx
        ports:
        - containerPort: 80
[root@k8s-master ~]# kubectl  apply -f  a.yaml
deployment.extensions/nginx-deployment unchanged
deployment.apps/nginx-deployment-pod-affinity configured
[root@k8s-master ~]# kubectl get  pod -o wide| grep nginx
nginx-deployment-6f6d9b887f-5mvqs                1/1     Running             0          68s   10.254.2.92    k8s-node-2   <none>           <none>
nginx-deployment-pod-affinity-5566c6d4fd-2tnrq   1/1     Running             0          68s   10.254.2.93    k8s-node-2   <none>           <none>
nginx-deployment-pod-affinity-86bdf6996b-fdb8f   0/1     ContainerCreating   0          4s    <none>         k8s-node-1   <none>           <none>
[root@k8s-master ~]#
[root@k8s-master ~]#
[root@k8s-master ~]# kubectl get  pod -o wide| grep nginx
nginx-deployment-6f6d9b887f-5mvqs                1/1     Running   0          73s   10.254.2.92    k8s-node-2   <none>    <none>
nginx-deployment-pod-affinity-86bdf6996b-fdb8f   1/1     Running   0          9s    10.254.1.56    k8s-node-1   <none>    <none>
[root@k8s-master ~]#
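A common use of podAntiAffinity is to spread the replicas of one Deployment across nodes by pointing labelSelector at the Deployment's own Pod label; with the required form, any replica beyond the number of nodes stays Pending. A minimal sketch reusing the nginx-hello label from above:

      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchLabels:
                app: nginx-hello   # this Deployment's own Pods
            topologyKey: kubernetes.io/hostname   # at most one such Pod per node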

6 Taint scheduling

taints and tolerations let you mark a node so that no Pod is scheduled onto it. A Pod that explicitly specifies a matching toleration, however, can still be scheduled onto the marked node as usual.

# A taint can be added to a node from the command line:

kubectl taint nodes node1 key=value:NoSchedule

 

The toleration operator can be:
Equal: the toleration's key and value must both equal the taint's (the default)
Exists: only the key has to exist; no value needs to be given

 

A taint's effect defines how Pods are repelled:
NoSchedule: affects scheduling only; Pods already running on the node are not touched
NoExecute: affects scheduling and existing Pods as well; running Pods that do not tolerate the taint are evicted
PreferNoSchedule: try to avoid scheduling onto the node, but do so if there is no alternative
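For example, adding a taint with the NoExecute effect evicts any running Pod on the node that does not tolerate it (key and value here are just illustrative):

kubectl taint nodes k8s-node-1 maintenance=true:NoExecute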

 

 

# View the taints

[root@k8s-master schedule]# kubectl  describe  node  k8s-master |grep Taints
Taints:             node-role.kubernetes.io/master:PreferNoSchedule
[root@k8s-master schedule]#

 

 

# Put a taint on node-1

#kubectl taint node k8s-node-1 node-type=production:NoSchedule
[root@k8s-master schedule]# kubectl  describe  node  k8s-node-1 |grep Taints
Taints:             node-type=production:NoSchedule
[root@k8s-master schedule]#

 

 

# cat deploy.yaml
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: nginx-deployment
spec:
  replicas: 2
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.15
        ports:
        - containerPort: 80

 

 

[root@k8s-master schedule]# kubectl apply -f deploy.yaml
deployment.extensions/nginx-deployment unchanged

 

 

# Both Pods run on node-2

[root@k8s-master schedule]# kubectl get pods  -o wide
NAME                                READY   STATUS    RESTARTS   AGE   IP            NODE         NOMINATED NODE   READINESS GATES
nginx-deployment-6f6d9b887f-j5nmz   1/1     Running   0          83s   10.254.2.94   k8s-node-2   <none>           <none>
nginx-deployment-6f6d9b887f-wjfpp   1/1     Running   0          83s   10.254.2.93   k8s-node-2   <none>           <none>
[root@k8s-master schedule]#

# Now taint node-2 as well

[root@k8s-master schedule]# kubectl delete deployments nginx-deployment
deployment.extensions "nginx-deployment" deleted
[root@k8s-master schedule]# kubectl get pods  -o wide
No resources found.
[root@k8s-master schedule]# kubectl taint node k8s-node-2 node-type=production:NoSchedule
node/k8s-node-2 tainted
[root@k8s-master schedule]# kubectl  describe  node  k8s-node-2 |grep Taints
Taints:             node-type=production:NoSchedule
[root@k8s-master schedule]#
[root@k8s-master schedule]# kubectl apply -f deploy.yaml
deployment.extensions/nginx-deployment created

 

# As a result, both Pods end up on the master

[root@k8s-master schedule]# kubectl get pods  -o wide
NAME                                READY   STATUS    RESTARTS   AGE   IP            NODE         NOMINATED NODE   READINESS GATES
nginx-deployment-6f6d9b887f-ck6pd   1/1     Running   0          15s   10.254.0.48   k8s-master   <none>           <none>
nginx-deployment-6f6d9b887f-gdwm6   1/1     Running   0          15s   10.254.0.49   k8s-master   <none>           <none>
[root@k8s-master schedule]#

 

 

# Taint the master too

[root@k8s-master schedule]# kubectl taint node k8s-master node-type=production:NoSchedule
node/k8s-master tainted
[root@k8s-master schedule]# kubectl delete deployments nginx-deployment
deployment.extensions "nginx-deployment" deleted
[root@k8s-master schedule]# kubectl apply -f deploy.yaml
deployment.extensions/nginx-deployment created
[root@k8s-master schedule]#

 

 

# No node is left that can run the Pods

[root@k8s-master schedule]# kubectl get pods  -o wide
NAME                                READY   STATUS    RESTARTS   AGE   IP       NODE     NOMINATED NODE   READINESS GATES
nginx-deployment-6f6d9b887f-mld4v   0/1     Pending   0          5s    <none>   <none>   <none>           <none>
nginx-deployment-6f6d9b887f-q4nfj   0/1     Pending   0          5s    <none>   <none>   <none>           <none>
[root@k8s-master schedule]#

 

 

# The Pods cannot tolerate the taints

[root@k8s-master schedule]# kubectl describe pod nginx-deployment-6f6d9b887f-mld4v |tail -1
  Warning  FailedScheduling  51s (x6 over 3m29s)  default-scheduler  0/3 nodes are available: 3 node(s) had taints that the pod didn't tolerate.

 

 

# Define a toleration

# cat deploy.yaml
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: nginx-deployment
spec:
  replicas: 2
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.15
        ports:
        - containerPort: 80
      tolerations:
      - key: "node-type"  # the taint key defined earlier
        operator: "Equal" # Equal requires an exact value match; Exists would tolerate any taint whose key is node-type
        value: "production"  # the taint value
        effect: "NoSchedule"  # the taint effect
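As the operator comment above says, Exists tolerates the taint without checking its value; and for NoExecute taints, tolerationSeconds limits how long the Pod may keep running after the taint appears. A minimal sketch of such a toleration (values are illustrative):

      tolerations:
      - key: "node-type"
        operator: "Exists"       # any taint with this key is tolerated, whatever its value
        effect: "NoExecute"
        tolerationSeconds: 3600  # evicted after at most one hour on the tainted node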

 

[root@k8s-master schedule]# kubectl apply -f deploy.yaml
deployment.extensions/nginx-deployment unchanged

# The two Pods are now spread across the two nodes

[root@k8s-master schedule]# kubectl get pods  -o wide
NAME                                READY   STATUS    RESTARTS   AGE   IP             NODE         NOMINATED NODE   READINESS GATES
nginx-deployment-565dd6b94d-4cdhz   1/1     Running   0          32s   10.254.1.130   k8s-node-1   <none>           <none>
nginx-deployment-565dd6b94d-fqzm7   1/1     Running   0          32s   10.254.2.95    k8s-node-2   <none>           <none>
[root@k8s-master schedule]#

 

 

# Remove a taint

[root@k8s-master schedule]# kubectl  describe  nodes k8s-master |grep Taints
Taints:             node-role.kubernetes.io/master:PreferNoSchedule
[root@k8s-master schedule]# kubectl  describe  nodes k8s-node-1 |grep Taints
Taints:             node-type=production:NoSchedule
[root@k8s-master schedule]# kubectl taint node k8s-node-1 node-type-
node/k8s-node-1 untainted
[root@k8s-master schedule]# kubectl  describe  nodes k8s-node-1 |grep Taints
Taints:             <none>
[root@k8s-master schedule]#
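The same trailing-minus syntax clears the test taints from the other two nodes as well:

kubectl taint node k8s-node-2 node-type-
kubectl taint node k8s-master node-type-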