The Default Scheduler in Kubernetes: Writing a Custom Kubernetes Scheduler

Reposted

mob6454cc68daf3 2024-04-19 14:11:58


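Contents

Scheduling Framework
How to get started?

1. Write a `KubeSchedulerConfiguration` YAML file
2. Modify the kube-scheduler container configuration

Developing a new plugin

Coding
Update the plugin configuration to use the NoOp plugin in the QueueSort phase
Testing the result

Summary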

Scheduling Framework

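A custom Kubernetes scheduler is built on the scheduling framework, which requires custom scheduling logic to be implemented as plugins, similar to callback functions.

Reference: https://kubernetes.io/docs/concepts/scheduling-eviction/scheduling-framework/

[Figure: overview of the scheduling process]

As the figure above shows, scheduling is roughly divided into two phases, filtering and scoring. Once the most suitable node has been selected, the Pod is bound to that node.

You can implement the interface predefined for each phase and put your custom logic there. For example, the QueueSort phase (the first phase of scheduling, which orders the Pods in the scheduling queue) requires implementing an interface that compares two queued Pods.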

Less(*framework.QueuedPodInfo, *framework.QueuedPodInfo) bool
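
To make the role of this comparison concrete, here is a minimal sketch of a priority-based comparator driving the order of a queue. The queuedPod type and its Seq field are hypothetical stand-ins for framework.QueuedPodInfo, so the example runs without any Kubernetes dependency; it is not the scheduler's own implementation.

```go
package main

import (
	"fmt"
	"sort"
)

// queuedPod is a hypothetical stand-in for framework.QueuedPodInfo.
type queuedPod struct {
	Name     string
	Priority int32
	Seq      int64 // arrival order; stands in for the queue timestamp
}

// less reports whether a should be scheduled before b:
// higher priority first, earlier arrival breaks ties.
func less(a, b *queuedPod) bool {
	if a.Priority != b.Priority {
		return a.Priority > b.Priority
	}
	return a.Seq < b.Seq
}

// sortQueue returns the Pod names in the order a queue driven by less would pop them.
func sortQueue(pods []*queuedPod) []string {
	sort.SliceStable(pods, func(i, j int) bool { return less(pods[i], pods[j]) })
	names := make([]string, 0, len(pods))
	for _, p := range pods {
		names = append(names, p.Name)
	}
	return names
}

func main() {
	pods := []*queuedPod{
		{Name: "low", Priority: 0, Seq: 1},
		{Name: "high-late", Priority: 10, Seq: 3},
		{Name: "high-early", Priority: 10, Seq: 2},
	}
	fmt.Println(sortQueue(pods)) // [high-early high-late low]
}
```

Breaking ties on arrival order keeps the ordering deterministic when two Pods have the same priority.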

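How to get started?

Building on the open-source project on GitHub is enough.

GitHub repository: https://github.com/kubernetes-sigs/scheduler-plugins.git
Kubernetes version: v1.23.5

The rough procedure for inserting custom logic into the default Kubernetes scheduling flow is:

1. Write a KubeSchedulerConfiguration file and mount it into the kube-scheduler container. The KubeSchedulerConfiguration specifies which plugin is called back at each phase.
2. Modify the kube-scheduler container configuration, usually /etc/kubernetes/manifests/kube-scheduler.yaml, and set the --config flag to the path of the KubeSchedulerConfiguration file inside the container. Once the change is saved, the kubelet automatically restarts the scheduler Pod in the kube-system namespace so that the configuration takes effect.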

1. Write a `KubeSchedulerConfiguration` YAML file

/etc/kubernetes/sched-cc.yaml

apiVersion: kubescheduler.config.k8s.io/v1beta2
kind: KubeSchedulerConfiguration
leaderElection:
  # (Optional) Change true to false if you are not running a HA control-plane.
  leaderElect: true
clientConnection:
  kubeconfig: /etc/kubernetes/scheduler.conf
profiles:
  - schedulerName: default-scheduler
    plugins:
      # Use only the Coscheduling plugin for queueSort
      queueSort:
        enabled:
          - name: Coscheduling
        disabled:
          - name: "*"
      preFilter:
        enabled:
          - name: Coscheduling
      postFilter:
        enabled:
          - name: Coscheduling
      permit:
        enabled:
          - name: Coscheduling
      reserve:
        enabled:
          - name: Coscheduling
      postBind:
        enabled:
          - name: Coscheduling
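
The enabled/disabled lists in the profile can be read as a small merge rule: drop the disabled defaults (all of them when the name is "*"), then append the explicitly enabled plugins. The sketch below models that rule with a hypothetical pluginSet type; it is a simplification for illustration, not the kube-scheduler's actual merge code, and the preFilter defaults used in main are only illustrative.

```go
package main

import "fmt"

// pluginSet is a hypothetical, simplified mirror of one extension point's
// enabled/disabled lists in a KubeSchedulerConfiguration profile.
type pluginSet struct {
	Enabled  []string
	Disabled []string
}

// resolve returns the plugins that would run at an extension point:
// the defaults that are not disabled, followed by the enabled ones.
func resolve(defaults []string, custom pluginSet) []string {
	disabled := map[string]bool{}
	for _, name := range custom.Disabled {
		disabled[name] = true
	}
	seen := map[string]bool{}
	var result []string
	for _, name := range defaults {
		// "*" disables every default plugin at this extension point.
		if disabled["*"] || disabled[name] {
			continue
		}
		result = append(result, name)
		seen[name] = true
	}
	for _, name := range custom.Enabled {
		if !seen[name] {
			result = append(result, name)
			seen[name] = true
		}
	}
	return result
}

func main() {
	// queueSort: the default PrioritySort is disabled via "*", Coscheduling replaces it.
	queueSort := resolve([]string{"PrioritySort"},
		pluginSet{Enabled: []string{"Coscheduling"}, Disabled: []string{"*"}})
	fmt.Println(queueSort) // [Coscheduling]

	// preFilter: Coscheduling is added alongside the (illustrative) defaults.
	preFilter := resolve([]string{"NodeResourcesFit", "NodePorts"},
		pluginSet{Enabled: []string{"Coscheduling"}})
	fmt.Println(preFilter) // [NodeResourcesFit NodePorts Coscheduling]
}
```

The wildcard matters for queueSort in particular, because only one queue sort plugin can be active in a profile.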

2. Modify the kube-scheduler container configuration

/etc/kubernetes/manifests/kube-scheduler.yaml

apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: null
  labels:
    component: kube-scheduler
    tier: control-plane
  name: kube-scheduler
  namespace: kube-system
spec:
  containers:
  - command:
    # Modify the kube-scheduler startup arguments here
    - kube-scheduler
    - --authentication-kubeconfig=/etc/kubernetes/scheduler.conf
    - --authorization-kubeconfig=/etc/kubernetes/scheduler.conf
    - --bind-address=127.0.0.1
    # Path of the KubeSchedulerConfiguration file inside the container
    - --config=/etc/kubernetes/sched-cc.yaml
    - -v=9
    # Tag of the image built manually after developing the custom logic
    image: localhost:5000/scheduler-plugins/kube-scheduler:latest
    imagePullPolicy: IfNotPresent
    livenessProbe:
      failureThreshold: 8
      httpGet:
        host: 127.0.0.1
        path: /healthz
        port: 10259
        scheme: HTTPS
      initialDelaySeconds: 10
      periodSeconds: 10
      timeoutSeconds: 15
    name: kube-scheduler
    resources:
      requests:
        cpu: 100m
    startupProbe:
      failureThreshold: 24
      httpGet:
        host: 127.0.0.1
        path: /healthz
        port: 10259
        scheme: HTTPS
      initialDelaySeconds: 10
      periodSeconds: 10
      timeoutSeconds: 15
    volumeMounts:
    - mountPath: /etc/kubernetes
      name: kubeconfig
  hostNetwork: true
  priorityClassName: system-node-critical
  securityContext:
    seccompProfile:
      type: RuntimeDefault
  volumes:
  # Mount the directory that contains the custom KubeSchedulerConfiguration file
  - hostPath:
      path: /etc/kubernetes/
      type: Directory
    name: kubeconfig
status: {}

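Developing a new plugin

Coding

Besides the plugins that already ship with the repository, you can develop your own plugin by following the same conventions.

Register your plugin in cmd/scheduler/main.go of the project and implement the custom logic by following the code of the other plugins. After rebuilding the image, the custom logic becomes part of the default Kubernetes scheduling flow.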

func main() {
	rand.Seed(time.Now().UnixNano())

	// Register custom plugins to the scheduler framework.
	// Later they can consist of scheduler profile(s) and hence
	// used by various kinds of workloads.
	command := app.NewSchedulerCommand(
		app.WithPlugin(capacityscheduling.Name, capacityscheduling.New),
		app.WithPlugin(coscheduling.Name, coscheduling.New),
		app.WithPlugin(loadvariationriskbalancing.Name, loadvariationriskbalancing.New),
		app.WithPlugin(noderesources.AllocatableName, noderesources.NewAllocatable),
		app.WithPlugin(noderesourcetopology.Name, noderesourcetopology.New),
		app.WithPlugin(preemptiontoleration.Name, preemptiontoleration.New),
		app.WithPlugin(targetloadpacking.Name, targetloadpacking.New),

		// Newly added line: a plugin package that provides Name and New
		app.WithPlugin(noop.Name, noop.New),
	)

	// TODO: once we switch everything over to Cobra commands, we can go back to calling
	// utilflag.InitFlags() (by removing its pflag.Parse() call). For now, we have to set the
	// normalize func and add the go flag set by hand.
	// utilflag.InitFlags()
	logs.InitLogs()
	defer logs.FlushLogs()

	if err := command.Execute(); err != nil {
		os.Exit(1)
	}
}
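
Conceptually, each app.WithPlugin call adds a name-to-constructor entry to a registry, and the scheduler later looks plugins up by the names used in the configuration file. The following sketch illustrates that idea only; the plugin, factory, and registry types are hypothetical stand-ins, not the scheduler's real types.

```go
package main

import "fmt"

// plugin is a hypothetical stand-in for framework.Plugin.
type plugin interface{ Name() string }

// factory is a stand-in for the constructor passed to app.WithPlugin.
type factory func() (plugin, error)

// registry maps plugin names to their factories.
type registry map[string]factory

// register adds a factory and rejects duplicate names.
func (r registry) register(name string, f factory) error {
	if _, exists := r[name]; exists {
		return fmt.Errorf("plugin %q is already registered", name)
	}
	r[name] = f
	return nil
}

// build instantiates the plugin referenced by name in the configuration.
func (r registry) build(name string) (plugin, error) {
	f, ok := r[name]
	if !ok {
		return nil, fmt.Errorf("plugin %q is not registered", name)
	}
	return f()
}

// noOp is a trivial plugin used only for this illustration.
type noOp struct{}

func (noOp) Name() string { return "NoOp" }

func main() {
	r := registry{}
	if err := r.register("NoOp", func() (plugin, error) { return noOp{}, nil }); err != nil {
		fmt.Println("register failed:", err)
		return
	}
	p, err := r.build("NoOp")
	if err != nil {
		fmt.Println("build failed:", err)
		return
	}
	fmt.Println(p.Name()) // NoOp

	// A name that appears in the configuration but was never registered fails.
	_, err = r.build("Missing")
	fmt.Println(err != nil) // true
}
```

This is why a plugin must be both registered in main.go and named in the KubeSchedulerConfiguration: either step alone has no effect.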

Custom scheduling logic:

In short: in an emergency, Pods labeled emergency: red must be scheduled first. Pods that are not emergency Pods are ordered by priority.

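package noop

import (
	"k8s.io/apimachinery/pkg/runtime"
	corev1helpers "k8s.io/component-helpers/scheduling/corev1"
	"k8s.io/kubernetes/pkg/scheduler/framework"
)

// Name is the plugin name used in the registry and in KubeSchedulerConfiguration.
const Name = "NoOp"

// NoOp is a QueueSort plugin that moves Pods labeled emergency=red to the
// front of the scheduling queue and orders all other Pods by priority.
type NoOp struct{}

var _ framework.QueueSortPlugin = &NoOp{}

// Less reports whether info should be scheduled before info2.
func (pl *NoOp) Less(info *framework.QueuedPodInfo, info2 *framework.QueuedPodInfo) bool {
	// Check the emergency label on both Pods. Checking only the first Pod
	// would make Less(a, b) and Less(b, a) both true for two emergency Pods.
	e1 := info.Pod.Labels["emergency"] == "red"
	e2 := info2.Pod.Labels["emergency"] == "red"
	if e1 != e2 {
		return e1
	}
	// Same emergency status: the Pod with the higher priority goes first.
	return corev1helpers.PodPriority(info.Pod) > corev1helpers.PodPriority(info2.Pod)
}

// Name returns name of the plugin.
func (pl *NoOp) Name() string {
	return Name
}

// New initializes a new plugin and returns it.
func New(_ runtime.Object, _ framework.Handle) (framework.Plugin, error) {
	return &NoOp{}, nil
}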
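
A queue comparator should never report both Less(a, b) and Less(b, a) as true, otherwise the queue order is not well defined. The sketch below checks that property with a hypothetical pod stand-in type, comparing a variant that inspects only the first Pod's emergency label against one that inspects both; the helper names are illustrative and not part of the scheduler framework.

```go
package main

import "fmt"

// pod is a hypothetical stand-in for the fields of framework.QueuedPodInfo used here.
type pod struct {
	Name     string
	Labels   map[string]string
	Priority int32
}

// isEmergency reports whether the Pod carries the label emergency=red.
func isEmergency(p pod) bool { return p.Labels["emergency"] == "red" }

// lessFirstOnly inspects only the first Pod's emergency label.
func lessFirstOnly(a, b pod) bool {
	if isEmergency(a) {
		return true
	}
	return a.Priority > b.Priority
}

// lessBoth inspects both labels and falls back to priority.
func lessBoth(a, b pod) bool {
	ea, eb := isEmergency(a), isEmergency(b)
	if ea != eb {
		return ea
	}
	return a.Priority > b.Priority
}

// asymmetric reports whether less avoids claiming both a<b and b<a.
func asymmetric(less func(a, b pod) bool, a, b pod) bool {
	return !(less(a, b) && less(b, a))
}

func main() {
	red := pod{Name: "red", Labels: map[string]string{"emergency": "red"}}
	vip := pod{Name: "vip", Priority: 100}

	// An emergency Pod versus a high-priority ordinary Pod.
	fmt.Println(asymmetric(lessFirstOnly, red, vip)) // false
	fmt.Println(asymmetric(lessBoth, red, vip))      // true
	fmt.Println(lessBoth(red, vip))                  // true: the emergency Pod goes first
}
```

With the first-only variant, a high-priority ordinary Pod and an emergency Pod each claim to come before the other, so which one is scheduled first depends on the order of comparison.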

Update the plugin configuration to use the NoOp plugin in the QueueSort phase

Change /etc/kubernetes/sched-cc.yaml to:

apiVersion: kubescheduler.config.k8s.io/v1beta2
kind: KubeSchedulerConfiguration
leaderElection:
  # (Optional) Change true to false if you are not running a HA control-plane.
  leaderElect: true
clientConnection:
  kubeconfig: /etc/kubernetes/scheduler.conf
profiles:
  - schedulerName: default-scheduler
    plugins:
      # Use only the NoOp plugin for queueSort
      queueSort:
        enabled:
          - name: NoOp
        disabled:
          - name: "*"

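After rebuilding the image with make local-image, restart the scheduler container in kube-system (simply kill it) so that the plugin configuration takes effect.

Testing the result

Run kubectl apply -f on the following files, creating the non-emergency Deployment first and then the emergency one.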

# deploy.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: pause
spec:
  replicas: 4
  selector:
    matchLabels:
      app: pause
  template:
    metadata:
      labels:
        app: pause
        pod-group.scheduling.sigs.k8s.io: pg1
    spec:
      containers:
        - name: nginx
          image: nginx

---

# deploy.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: red
spec:
  replicas: 1
  selector:
    matchLabels:
      app: test
  template:
    metadata:
      labels:
        app: test
        emergency: red
    spec:
      containers:
        - name: nginx
          image: nginx

The result shows that although the emergency Pod was applied later, it was scheduled ahead of the non-emergency Pods.

[root@localhost kubernetes]# kubectl get pods -w
NAME                    READY   STATUS    RESTARTS   AGE
pause-c9db6b47f-l2xg9   0/1     Pending   0          0s
red-77bffc7d58-zsvjc    0/1     Pending   0          0s
pause-c9db6b47f-k7hgz   0/1     Pending   0          0s
pause-c9db6b47f-l2xg9   0/1     Pending   0          0s
pause-c9db6b47f-gp5sw   0/1     Pending   0          0s
red-77bffc7d58-zsvjc    0/1     Pending   0          0s
pause-c9db6b47f-k7hgz   0/1     Pending   0          0s
pause-c9db6b47f-hhs7d   0/1     Pending   0          0s
pause-c9db6b47f-gp5sw   0/1     Pending   0          0s
pause-c9db6b47f-hhs7d   0/1     Pending   0          0s
pause-c9db6b47f-l2xg9   0/1     ContainerCreating   0          0s
red-77bffc7d58-zsvjc    0/1     ContainerCreating   0          0s
pause-c9db6b47f-gp5sw   0/1     ContainerCreating   0          0s
pause-c9db6b47f-k7hgz   0/1     ContainerCreating   0          0s
pause-c9db6b47f-hhs7d   0/1     ContainerCreating   0          0s
# The emergency pod started first
red-77bffc7d58-zsvjc    1/1     Running             0          4s
pause-c9db6b47f-hhs7d   1/1     Running             0          6s
pause-c9db6b47f-k7hgz   1/1     Running             0          8s
pause-c9db6b47f-l2xg9   1/1     Running             0          9s
pause-c9db6b47f-gp5sw   1/1     Running             0          10s

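Summary

This article is only a proof of concept. Its focus is on how to insert custom code into the default scheduling flow.

Once you know how to develop a simple plugin, you can implement more complex logic, for example:

Placing Pods on nodes based on Prometheus monitoring data to even out unbalanced cluster load
Starting 100 Pods at the same time for a Spark computation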
