Advanced Kubernetes Scheduling: Affinity and Taints
1 The default scheduler's scheduling process:
- Predicates: filter all nodes down to those that meet the Pod's basic requirements.
- Priorities: score each remaining node with priority functions and rank them by score.
- Randomly pick one of the top-scoring nodes to run the Pod.
We can influence the predicate and priority phases with our own settings to steer scheduling toward the placement we want.
2 Ways to influence scheduling:
- Node selector: nodeSelector; spec.nodeName can even pin a Pod to a specific node.
- Affinity scheduling: nodeAffinity (node affinity), podAffinity (Pod affinity), podAntiAffinity (Pod anti-affinity)
- Taints and tolerations: taint, toleration
3 Node selector: nodeSelector
If we want a Pod to run on one specific node, we can set Pod.spec.nodeName to that node's name. More flexibly, we can put distinctive labels on a subset of nodes and match them in pod.spec.nodeSelector, which greatly narrows the predicate phase.
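As a sketch of the nodeName route (the Pod name here is hypothetical; the node name must match a real node, such as the k8s-node-1 used later in this article):

```yaml
# spec.nodeName bypasses the scheduler entirely: the kubelet on the named
# node runs the Pod directly, with no predicate or priority evaluation.
apiVersion: v1
kind: Pod
metadata:
  name: pinned-pod        # hypothetical name
spec:
  nodeName: k8s-node-1    # must be the name of an existing node
  containers:
  - name: nginx
    image: nginx
```

Because the scheduler is skipped, nodeName placement ignores NoSchedule taints and scheduling-time resource checks, so nodeSelector or affinity is usually preferable.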
Add labels to a node:
kubectl label nodes NODE_NAME key1=value1 ... keyN=valueN
For example, label node01 with app=frontend and set that label in the Pod's nodeSelector; the Pod can then only run on nodes carrying that label.
If no node carries the label, the Pod cannot be scheduled and stays Pending.
First, label one of the nodes:
[root@k8s-master ~]# kubectl get nodes --show-labels
NAME STATUS ROLES AGE VERSION LABELS
k8s-master Ready master 12d v1.15.3 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=k8s-master,kubernetes.io/os=linux,node-role.kubernetes.io/master=
k8s-node-1 Ready node 12d v1.15.3 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=k8s-node-1,kubernetes.io/os=linux,node-role.kubernetes.io/node=
k8s-node-2 Ready node 12d v1.15.3 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=k8s-node-2,kubernetes.io/os=linux,node-role.kubernetes.io/node=
[root@k8s-master ~]#
[root@k8s-master ~]#
[root@k8s-master ~]# kubectl label nodes k8s-node-1 disk=ssd
node/k8s-node-1 labeled
[root@k8s-master ~]# kubectl get nodes --show-labels
NAME STATUS ROLES AGE VERSION LABELS
k8s-master Ready master 12d v1.15.3 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=k8s-master,kubernetes.io/os=linux,node-role.kubernetes.io/master=
k8s-node-1 Ready node 12d v1.15.3 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,disk=ssd,kubernetes.io/arch=amd64,kubernetes.io/hostname=k8s-node-1,kubernetes.io/os=linux,node-role.kubernetes.io/node=
k8s-node-2 Ready node 12d v1.15.3 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=k8s-node-2,kubernetes.io/os=linux,node-role.kubernetes.io/node=
[root@k8s-master ~]# kubectl get nodes --show-labels|grep ssd
k8s-node-1 Ready node 12d v1.15.3 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,disk=ssd,kubernetes.io/arch=amd64,kubernetes.io/hostname=k8s-node-1,kubernetes.io/os=linux,node-role.kubernetes.io/node=
[root@k8s-master ~]#
# cat nodeSelector.yaml
apiVersion: v1
kind: Pod
metadata:
  name: nginx-pod
  labels:
    app: my-pod
spec:
  containers:
  - name: my-pod
    image: nginx
    ports:
    - name: http
      containerPort: 80
  nodeSelector:
    disk: ssd    # if no node carries the labels listed here, the Pod stays Pending (predicate failure)
[root@k8s-master schedule]# kubectl create -f nodeSelector.yaml
pod/nginx-pod created
[root@k8s-master schedule]# kubectl get pod
NAME READY STATUS RESTARTS AGE
nginx-pod 1/1 Running 0 6s
[root@k8s-master schedule]# kubectl describe pod nginx-pod | grep Node
Node: k8s-node-1/10.6.76.23
Node-Selectors: disk=ssd
[root@k8s-master schedule]#
4 Node affinity scheduling: nodeAffinity
requiredDuringSchedulingIgnoredDuringExecution — hard affinity: the rules must be satisfied.
preferredDuringSchedulingIgnoredDuringExecution — soft affinity: satisfied if possible, but not mandatory.
4.1 Hard affinity
matchExpressions: label expressions. For example, with key zone, operator In (membership test), and values foo and bar, the Pod is scheduled onto nodes whose zone label value is foo or bar.
matchFields: same idea, but matches node fields (such as metadata.name) rather than node labels.
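Besides In, nodeAffinity matchExpressions also support NotIn, Exists, DoesNotExist, Gt and Lt. A hypothetical term (not part of the demo that follows) illustrating two of them:

```yaml
# Hypothetical nodeSelectorTerm: all expressions in one term must hold on the same node.
- matchExpressions:
  - key: zone
    operator: Exists      # the node must carry a zone label, any value
  - key: cpu-count        # hypothetical label
    operator: Gt          # label value, interpreted as an integer, must be > 8
    values:
    - "8"
```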
Run the Pod on nodes whose zone label value is foo or bbb:
[root@k8s-master ~]# kubectl get nodes --show-labels| grep zone
k8s-node-1 Ready node 46d v1.15.3 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,disk=ssd,kubernetes.io/arch=amd64,kubernetes.io/hostname=k8s-node-1,kubernetes.io/os=linux,node-role.kubernetes.io/node=,zone=foo
[root@k8s-master ~]#
# cat node-affinity-1.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-hello-deployment
  namespace:
  labels:
    app: nginx-hello
spec:
  replicas: 2
  selector:
    matchLabels:
      app: nginx-hello
  template:
    metadata:
      labels:
        app: nginx-hello
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: zone
                operator: In
                values:
                - foo
                - bbb
      containers:
      - name: nginx-hello
        image: nginx
        ports:
        - containerPort: 80
[root@k8s-master schedule]# kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
nginx-hello-deployment-d457bd7bc-fsjjn 1/1 Running 0 2m34s 10.254.1.124 k8s-node-1 <none> <none>
nginx-hello-deployment-d457bd7bc-ntb8h 1/1 Running 0 2m34s 10.254.1.123 k8s-node-1 <none> <none>
nginx-pod 1/1 Running 0 58m 10.254.1.120 k8s-node-1 <none> <none>
[root@k8s-master schedule]#
All the Pods were scheduled onto node-1 as directed by the label. Now change the values so that no node matches:
# cat node-affinity-1.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-hello-deployment
  namespace:
  labels:
    app: nginx-hello
spec:
  replicas: 2
  selector:
    matchLabels:
      app: nginx-hello
  template:
    metadata:
      labels:
        app: nginx-hello
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: zone
                operator: In
                values:
                - foo-no
                - bbb-no
      containers:
      - name: nginx-hello
        image: nginx
        ports:
        - containerPort: 80
# Check: no node's zone label matches these values, so the Pods stay Pending
[root@k8s-master schedule]# kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
nginx-hello-deployment-6c96b5675f-8jqnx 0/1 Pending 0 43s <none> <none> <none> <none>
nginx-hello-deployment-6c96b5675f-lbnsw 0/1 Pending 0 43s <none> <none> <none> <none>
nginx-pod 1/1 Running 0 60m 10.254.1.120 k8s-node-1 <none> <none>
[root@k8s-master schedule]#
4.2 Soft affinity
nodeAffinity's preferredDuringSchedulingIgnoredDuringExecution (soft affinity: nodes matching more preference terms score higher, but even if nothing matches, the Pods are still scheduled)
# cat node-affinity-1.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-hello-deployment
  namespace:
  labels:
    app: nginx-hello
spec:
  replicas: 2
  selector:
    matchLabels:
      app: nginx-hello
  template:
    metadata:
      labels:
        app: nginx-hello
    spec:
      affinity:
        nodeAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - preference:
              matchExpressions:
              - key: zone
                operator: In
                values:
                - foo-no
                - bbb-no
            weight: 60    # weight associated with this preference term, 1-100
      containers:
      - name: nginx-hello
        image: nginx
        ports:
        - containerPort: 80
[root@k8s-master schedule]# kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
nginx-hello-deployment-98654dc57-cvvlb 1/1 Running 0 15s 10.254.1.125 k8s-node-1 <none> <none>
nginx-hello-deployment-98654dc57-mglbx 1/1 Running 0 20s 10.254.2.90 k8s-node-2 <none> <none>
nginx-pod 1/1 Running 0 72m 10.254.1.120 k8s-node-1 <none> <none>
[root@k8s-master schedule]#
5 Pod affinity: podAffinity
Pod affinity scenario: the cluster's nodes are spread across different zones or machine rooms, and service A and service B must be deployed in the same zone or room; that is what affinity scheduling is for.
labelSelector: selects the group of Pods to be affine with
namespaces: the namespace(s) in which to look for those Pods
topologyKey: the node label key that defines a "location"
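topologyKey can be any node label key: with kubernetes.io/hostname, "same location" means "same node"; a zone label widens it to "same zone". A hedged sketch, assuming the nodes carry the standard topology.kubernetes.io/zone label and a hypothetical app: backend Pod group:

```yaml
affinity:
  podAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchLabels:
          app: backend                           # hypothetical Pods to co-locate with
      topologyKey: topology.kubernetes.io/zone   # same zone, not necessarily same node
```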
5.1 Affinity by labelSelector
Keep the two Pods in the same location:
# cat pod-affinity.yaml
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: nginx-deployment
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.15
        ports:
        - containerPort: 80
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment-pod-affinity
  namespace:
  labels:
    app: nginx-hello
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx-hello
  template:
    metadata:
      labels:
        app: nginx-hello
    spec:
      affinity:
        podAffinity:
          #preferredDuringSchedulingIgnoredDuringExecution:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app        # label key defined on the Pod above
                operator: In    # membership test
                values:
                - nginx         # value of the app label
            topologyKey: kubernetes.io/hostname    # nodes with the same kubernetes.io/hostname value count as one location; this Pod must run co-located with (affinity) or away from (anti-affinity) the Pods matched by labelSelector in the given namespaces
      containers:
      - name: nginx-hello
        image: nginx
        ports:
        - containerPort: 80
[root@k8s-master ~]# kubectl get pod -o wide| grep nginx
nginx-deployment-6f6d9b887f-5mvqs 1/1 Running 0 6s 10.254.2.92 k8s-node-2 <none> <none>
nginx-deployment-pod-affinity-5566c6d4fd-2tnrq 1/1 Running 0 6s 10.254.2.93 k8s-node-2 <none> <none>
[root@k8s-master ~]#
5.2 podAntiAffinity (anti-affinity)
Keep a Pod off any node that runs a given Pod; the opposite of the above:
# cat pod-affinity.yaml
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: nginx-deployment
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.15
        ports:
        - containerPort: 80
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment-pod-affinity
  namespace:
  labels:
    app: nginx-hello
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx-hello
  template:
    metadata:
      labels:
        app: nginx-hello
    spec:
      affinity:
        #podAffinity:
        podAntiAffinity:    # the only change from the previous manifest
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app        # label key defined on the Pod above
                operator: In    # membership test
                values:
                - nginx         # value of the app label
            topologyKey: kubernetes.io/hostname    # nodes with the same kubernetes.io/hostname value count as one location; with anti-affinity, this Pod must avoid nodes running the Pods matched by labelSelector
      containers:
      - name: nginx-hello
        image: nginx
        ports:
        - containerPort: 80
[root@k8s-master ~]# kubectl apply -f a.yaml
deployment.extensions/nginx-deployment unchanged
deployment.apps/nginx-deployment-pod-affinity configured
[root@k8s-master ~]# kubectl get pod -o wide| grep nginx
nginx-deployment-6f6d9b887f-5mvqs 1/1 Running 0 68s 10.254.2.92 k8s-node-2 <none> <none>
nginx-deployment-pod-affinity-5566c6d4fd-2tnrq 1/1 Running 0 68s 10.254.2.93 k8s-node-2 <none> <none>
nginx-deployment-pod-affinity-86bdf6996b-fdb8f 0/1 ContainerCreating 0 4s <none> k8s-node-1 <none> <none>
[root@k8s-master ~]#
[root@k8s-master ~]#
[root@k8s-master ~]# kubectl get pod -o wide| grep nginx
nginx-deployment-6f6d9b887f-5mvqs 1/1 Running 0 73s 10.254.2.92 k8s-node-2 <none> <none>
nginx-deployment-pod-affinity-86bdf6996b-fdb8f 1/1 Running 0 9s 10.254.1.56 k8s-node-1 <none> <none>
[root@k8s-master ~]#
6 Taint-based scheduling
Taints and tolerations mark a node so that no Pod is scheduled onto it; however, a Pod that explicitly specifies a matching toleration can still be scheduled onto the marked node normally.
# Add a taint to a node from the command line:
kubectl taint nodes node1 key=value:NoSchedule
A toleration's operator can be:
Equal: the key must equal the value (the default)
Exists: only the key has to exist; no value is defined
A taint's effect defines how it repels Pods:
NoSchedule: affects only scheduling; existing Pods on the node are left alone
NoExecute: affects both scheduling and existing Pods; Pods that do not tolerate the taint are evicted
PreferNoSchedule: try not to schedule here, but allow it if nothing else fits
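Two toleration patterns worth noting before the demo: operator Exists tolerates a taint key regardless of its value, and a NoExecute toleration can cap how long an already-running Pod survives via tolerationSeconds. A sketch (the node-type key matches the taint used below; the not-ready taint is the standard one the node controller applies):

```yaml
tolerations:
- key: "node-type"
  operator: "Exists"              # tolerate any value of the node-type taint
  effect: "NoSchedule"
- key: "node.kubernetes.io/not-ready"
  operator: "Exists"
  effect: "NoExecute"
  tolerationSeconds: 300          # stay 5 minutes after the taint appears, then be evicted
```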
# View taints
[root@k8s-master schedule]# kubectl describe node k8s-master |grep Taints
Taints: node-role.kubernetes.io/master:PreferNoSchedule
[root@k8s-master schedule]#
# Taint node-1
#kubectl taint node k8s-node-1 node-type=production:NoSchedule
[root@k8s-master schedule]# kubectl describe node k8s-node-1 |grep Taints
Taints: node-type=production:NoSchedule
[root@k8s-master schedule]#
# cat deploy.yaml
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: nginx-deployment
spec:
  replicas: 2
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.15
        ports:
        - containerPort: 80
[root@k8s-master schedule]# kubectl apply -f deploy.yaml
deployment.extensions/nginx-deployment unchanged
# Both Pods run on node-2
[root@k8s-master schedule]# kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
nginx-deployment-6f6d9b887f-j5nmz 1/1 Running 0 83s 10.254.2.94 k8s-node-2 <none> <none>
nginx-deployment-6f6d9b887f-wjfpp 1/1 Running 0 83s 10.254.2.93 k8s-node-2 <none> <none>
[root@k8s-master schedule]#
# Taint node-2 as well
[root@k8s-master schedule]# kubectl delete deployments nginx-deployment
deployment.extensions "nginx-deployment" deleted
[root@k8s-master schedule]# kubectl get pods -o wide
No resources found.
[root@k8s-master schedule]# kubectl taint node k8s-node-2 node-type=production:NoSchedule
node/k8s-node-2 tainted
[root@k8s-master schedule]# kubectl describe node k8s-node-2 |grep Taints
Taints: node-type=production:NoSchedule
[root@k8s-master schedule]#
[root@k8s-master schedule]# kubectl apply -f deploy.yaml
deployment.extensions/nginx-deployment created
# Now the Pods run on the master (its taint is only PreferNoSchedule, so it is still used as a last resort)
[root@k8s-master schedule]# kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
nginx-deployment-6f6d9b887f-ck6pd 1/1 Running 0 15s 10.254.0.48 k8s-master <none> <none>
nginx-deployment-6f6d9b887f-gdwm6 1/1 Running 0 15s 10.254.0.49 k8s-master <none> <none>
[root@k8s-master schedule]#
# Taint the master too
[root@k8s-master schedule]# kubectl taint node k8s-master node-type=production:NoSchedule
node/k8s-master tainted
[root@k8s-master schedule]# kubectl delete deployments nginx-deployment
deployment.extensions "nginx-deployment" deleted
[root@k8s-master schedule]# kubectl apply -f deploy.yaml
deployment.extensions/nginx-deployment created
[root@k8s-master schedule]#
# No node can run the Pods
[root@k8s-master schedule]# kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
nginx-deployment-6f6d9b887f-mld4v 0/1 Pending 0 5s <none> <none> <none> <none>
nginx-deployment-6f6d9b887f-q4nfj 0/1 Pending 0 5s <none> <none> <none> <none>
[root@k8s-master schedule]#
# The Pods cannot tolerate the taints
[root@k8s-master schedule]# kubectl describe pod nginx-deployment-6f6d9b887f-mld4v |tail -1
Warning FailedScheduling 51s (x6 over 3m29s) default-scheduler 0/3 nodes are available: 3 node(s) had taints that the pod didn't tolerate.
# Define a toleration
# cat deploy.yaml
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: nginx-deployment
spec:
  replicas: 2
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.15
        ports:
        - containerPort: 80
      tolerations:
      - key: "node-type"        # taint key defined earlier
        operator: "Equal"       # Equal requires key and value to match exactly; Exists only requires the key to be present
        value: "production"     # taint value
        effect: "NoSchedule"    # taint effect
[root@k8s-master schedule]# kubectl apply -f deploy.yaml
deployment.extensions/nginx-deployment unchanged
# The two Pods are spread evenly across the two nodes
[root@k8s-master schedule]# kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
nginx-deployment-565dd6b94d-4cdhz 1/1 Running 0 32s 10.254.1.130 k8s-node-1 <none> <none>
nginx-deployment-565dd6b94d-fqzm7 1/1 Running 0 32s 10.254.2.95 k8s-node-2 <none> <none>
[root@k8s-master schedule]#
# Remove a taint
[root@k8s-master schedule]# kubectl describe nodes k8s-master |grep Taints
Taints: node-role.kubernetes.io/master:PreferNoSchedule
[root@k8s-master schedule]# kubectl describe nodes k8s-node-1 |grep Taints
Taints: node-type=production:NoSchedule
[root@k8s-master schedule]# kubectl taint node k8s-node-1 node-type-
node/k8s-node-1 untainted
[root@k8s-master schedule]# kubectl describe nodes k8s-node-1 |grep Taints
Taints: <none>
[root@k8s-master schedule]#