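After a Kubernetes cluster is first set up, the most common operation is scaling it out by adding nodes that run workloads (containers and Pods). How you scale the cluster depends on the tool that was used to bootstrap it. This guide shows how to add worker nodes to a Kubernetes cluster with the kubeadm command-line tool.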

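Example cluster

The example cluster has one control-plane (master) node and two worker nodes. The container runtime is containerd, the network plugin is Calico, and the operating system is Ubuntu 20.04.

The cluster was initialized with --control-plane-endpoint k8s-cluster.test.com.

The following entries have been added to /etc/hosts on every node: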

192.168.1.140   k8s-cluster.test.com
192.168.1.140 k8s-master-01  k8s-master-01.test.com
192.168.1.141 k8s-worker-01  k8s-worker-01.test.com
192.168.1.142 k8s-worker-02  k8s-worker-02.test.com

The nodes currently in the cluster:

$ kubectl get nodes -o wide
NAME            STATUS   ROLES                  AGE     VERSION   INTERNAL-IP     EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION     CONTAINER-RUNTIME
k8s-master-01   Ready    control-plane,master   4h20m   v1.23.2   192.168.1.140   <none>        Ubuntu 20.04.3 LTS   5.4.0-96-generic   containerd://1.4.12
k8s-worker-01   Ready    <none>                 3h28m   v1.23.2   192.168.1.141   <none>        Ubuntu 20.04.3 LTS   5.4.0-96-generic   containerd://1.4.12
k8s-worker-02   Ready    <none>                 3h20m   v1.23.2   192.168.1.142   <none>        Ubuntu 20.04.3 LTS   5.4.0-96-generic   containerd://1.4.12

Adding a worker node

Node information

  • Operating system: Ubuntu 20.04
  • CPU: 4 vCPU
  • Memory: 4 GB
  • Hostname: k8s-worker-03

Firewall

# SSH
sudo ufw allow 22/tcp
# Kubelet API
sudo ufw allow 10250/tcp
# NodePort Services
sudo ufw allow 30000:32767/tcp
# Calico: BGP (179/tcp), Typha (5473/tcp), VXLAN (4789/udp)
sudo ufw allow 179/tcp
sudo ufw allow 5473/tcp
sudo ufw allow 4789/udp
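
To confirm that the rules took effect, the output of `sudo ufw status` can be compared with the ports opened in this section. The following is a minimal sketch, not part of the original procedure: the function name `missing_ufw_rules` is hypothetical, and it assumes the usual `ufw status` layout where the first column is the port/protocol and the second is the action.

```shell
# Ports opened in this guide: SSH, kubelet API, NodePort range, Calico.
REQUIRED_RULES="22/tcp 10250/tcp 30000:32767/tcp 179/tcp 5473/tcp 4789/udp"

# Reads `ufw status` output on stdin and prints every required rule that
# has no matching ALLOW line. Prints nothing when all rules are present.
missing_ufw_rules() {
  status=$(cat)
  for rule in $REQUIRED_RULES; do
    if ! printf '%s\n' "$status" |
      awk -v r="$rule" '$1 == r && $2 == "ALLOW" { found = 1 } END { exit !found }'; then
      printf '%s\n' "$rule"
    fi
  done
}
```

A possible invocation is `sudo ufw status | missing_ufw_rules`; empty output means every port listed above is allowed.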

/etc/hosts configuration

Note: update /etc/hosts on every node in the cluster, not only on the new one.

192.168.1.140   k8s-cluster.test.com
192.168.1.140 k8s-master-01  k8s-master-01.test.com
192.168.1.141 k8s-worker-01  k8s-worker-01.test.com
192.168.1.142 k8s-worker-02  k8s-worker-02.test.com
192.168.1.143 k8s-worker-03  k8s-worker-03.test.com
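
Because the same entry has to be added on every node, an idempotent helper avoids duplicate lines when the step is repeated. This is a sketch under assumptions: `add_host_entry` is a hypothetical name, and the hosts file path is a parameter so the function can be tried on a copy before touching /etc/hosts.

```shell
# Appends "<ip> <names...>" to the hosts file unless the first name is
# already present as a whole field on a non-comment line.
add_host_entry() {
  hosts_file=$1
  ip=$2
  shift 2
  first_name=$1
  if awk -v n="$first_name" '
      !/^[[:space:]]*#/ { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
      END { exit !found }' "$hosts_file"; then
    return 0   # entry already there, nothing to do
  fi
  printf '%s %s\n' "$ip" "$*" >> "$hosts_file"
}
```

For example, from a root shell: `add_host_entry /etc/hosts 192.168.1.143 k8s-worker-03 k8s-worker-03.test.com`. Running it a second time leaves the file unchanged.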

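Disable swap

# Comment out every active swap entry so swap stays off after a reboot
sudo sed -i '/^[^#].*swap/ s/^/#/' /etc/fstab
# Turn swap off for the running system
sudo swapoff -a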
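
The kubelet expects swap to be off, so it is worth verifying the result. The sketch below reads the kernel's table of active swap areas, /proc/swaps; the function name `swap_is_active` is hypothetical, and the optional file argument exists only so the check can be exercised against sample data.

```shell
# Returns 0 when the swaps table (default /proc/swaps) lists at least one
# active swap area, 1 otherwise. The first line of the table is a header.
swap_is_active() {
  awk 'NR > 1 && NF > 0 { active = 1 } END { exit !active }' "${1:-/proc/swaps}"
}
```

A possible use is `if swap_is_active; then echo "swap is still enabled"; fi`.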

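Install the Kubernetes packages

# Add the Kubernetes apt repository (Aliyun mirror) and its signing key
sudo apt update
sudo apt -y install curl apt-transport-https
curl -s https://mirrors.aliyun.com/kubernetes/apt/doc/apt-key.gpg | sudo apt-key add -
echo "deb https://mirrors.aliyun.com/kubernetes/apt/ kubernetes-xenial main" | sudo tee /etc/apt/sources.list.d/kubernetes.list

# Install kubelet, kubeadm and kubectl, then hold them so apt does not upgrade them.
# This installs the newest packages in the repository; the example cluster runs v1.23.2.
sudo apt update
sudo apt -y install vim git wget
sudo apt -y install kubelet kubeadm kubectl
sudo apt-mark hold kubelet kubeadm kubectl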
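
Since the install command does not pin a version, the new node may end up with packages from a different release than the cluster. The following sketch compares only the major.minor part of two version strings; the helper names `minor_of` and `same_minor` are hypothetical.

```shell
# Prints the "major.minor" part of a version string such as "v1.23.2".
minor_of() {
  printf '%s\n' "$1" | sed 's/^v//' | cut -d. -f1,2
}

# Returns 0 when both versions share the same major.minor.
same_minor() {
  [ "$(minor_of "$1")" = "$(minor_of "$2")" ]
}
```

For instance, `same_minor "$(kubeadm version -o short)" v1.23.2 || echo "kubeadm does not match the cluster release"` compares the installed kubeadm with the version shown by kubectl get nodes on the control plane.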

Install the containerd container runtime

# Configure persistent loading of modules
sudo tee /etc/modules-load.d/containerd.conf <<EOF
overlay
br_netfilter
EOF

# Load at runtime
sudo modprobe overlay
sudo modprobe br_netfilter

# Ensure sysctl params are set
sudo tee /etc/sysctl.d/kubernetes.conf<<EOF
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
net.ipv4.ip_forward = 1
EOF

# Reload configs
sudo sysctl --system

# Install required packages
sudo apt install -y curl gnupg2 software-properties-common apt-transport-https ca-certificates

# Add Docker repo
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
sudo add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable"

# Install containerd
sudo apt update
sudo apt install -y containerd.io

# Configure containerd (the remaining commands in this section run in a root shell)
sudo su -
mkdir -p /etc/containerd
containerd config default > /etc/containerd/config.toml
# Pull images from the Aliyun mirror instead of k8s.gcr.io
sed -i 's/k8s.gcr.io/registry.aliyuncs.com\/google_containers/g' /etc/containerd/config.toml

To use the systemd cgroup driver, set the following in /etc/containerd/config.toml:

...
        [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]
          ...
          [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
            SystemdCgroup = true
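
The setting can also be changed without opening an editor. The sketch below assumes the generated config already contains a `SystemdCgroup = false` line; depending on the containerd version that line may be absent, in which case the function reports this and the line has to be added by hand. The function name `enable_systemd_cgroup` is hypothetical.

```shell
# Switches "SystemdCgroup = false" to "SystemdCgroup = true" in the given
# containerd config file. Returns 1 when the file has no SystemdCgroup line.
enable_systemd_cgroup() {
  config=$1
  if ! grep -q '^[[:space:]]*SystemdCgroup[[:space:]]*=' "$config"; then
    echo "no SystemdCgroup line in $config; add it under runc.options by hand" >&2
    return 1
  fi
  # Write to a temporary file first, then copy back so the original file
  # keeps its ownership and permissions.
  sed 's/^\([[:space:]]*SystemdCgroup[[:space:]]*=[[:space:]]*\)false/\1true/' \
    "$config" > "$config.tmp" &&
    cat "$config.tmp" > "$config" &&
    rm -f "$config.tmp"
}
```

From a root shell this would be called as `enable_systemd_cgroup /etc/containerd/config.toml`, followed by the containerd restart described further down.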

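Change the sandbox_image registry. This is the same substitution as in the previous step, so it is a no-op if that step already ran:

sudo sed -i 's/k8s.gcr.io/registry.aliyuncs.com\/google_containers/g' /etc/containerd/config.toml

Afterwards the setting should read: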

[plugins."io.containerd.grpc.v1.cri"]
    ...
    sandbox_image = "registry.aliyuncs.com/google_containers/pause:3.2"

Restart the service:

# Restart containerd and enable it at boot
systemctl restart containerd
systemctl enable containerd
systemctl status containerd

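Get a join token

Joining a new worker node to the cluster requires a token. kubeadm generates one when it initializes the cluster, and that token expires after 24 hours.
To check whether a valid token exists, run the following on the control-plane node: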

$ kubeadm token list
TOKEN                     TTL         EXPIRES                USAGES                   DESCRIPTION                                                EXTRA GROUPS
bdqsdw.2uf50yfvo3uwy93w   19h         2022-01-23T01:52:48Z   authentication,signing   The default bootstrap token generated by 'kubeadm init'.   system:bootstrappers:kubeadm:default-node-token

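If the token has expired, create a new one:

sudo kubeadm token create

List the tokens to retrieve it:

kubeadm token list

Alternatively, create a token and print the complete join command in one step: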

kubeadm token create --print-join-command
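
When a token is copied by hand between machines, a quick format check catches truncation before kubeadm join fails. kubeadm bootstrap tokens consist of six characters, a dot, and sixteen characters, all lowercase letters or digits. The function name `is_bootstrap_token` in this sketch is hypothetical.

```shell
# Returns 0 when the argument has the shape of a kubeadm bootstrap token:
# [a-z0-9]{6}.[a-z0-9]{16}
is_bootstrap_token() {
  printf '%s\n' "$1" | grep -Eq '^[a-z0-9]{6}\.[a-z0-9]{16}$'
}
```

For example, `is_bootstrap_token "$TOKEN" || echo "token looks malformed"`, where `TOKEN` holds the value taken from kubeadm token list.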

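Get the discovery token CA certificate hash

kubeadm join verifies the cluster's root CA public key by comparing its hash with the hash you supply. Run the following on the control-plane node to compute that hash: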

openssl x509 -pubkey -in /etc/kubernetes/pki/ca.crt | openssl rsa -pubin -outform der 2>/dev/null | openssl dgst -sha256 -hex | sed 's/^.* //'
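
The same pipeline can be wrapped in a function that takes the certificate path as an argument. This sketch is a variation, not the original command: it uses `openssl pkey`, which accepts any key type, where the command above uses `openssl rsa`. The function name `ca_cert_hash` is hypothetical.

```shell
# Prints the SHA-256 digest (64 hex characters) of the DER-encoded public
# key contained in the given certificate.
ca_cert_hash() {
  openssl x509 -pubkey -noout -in "$1" |
    openssl pkey -pubin -outform DER |
    openssl dgst -sha256 -hex |
    sed 's/^.* //'
}
```

On the control-plane node, `ca_cert_hash /etc/kubernetes/pki/ca.crt` should print the value that goes after `sha256:` in the join command.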

Get the API server endpoint

On the control-plane node, use kubectl cluster-info:

$ kubectl cluster-info
Kubernetes control plane is running at https://k8s-cluster.test.com:6443
CoreDNS is running at https://k8s-cluster.test.com:6443/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy

To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.

As the output shows, the endpoint in this example is https://k8s-cluster.test.com:6443.
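
kubeadm join expects the endpoint as host:port, without the https:// scheme. The sketch below extracts that form from kubectl cluster-info output. It assumes the first line reads "Kubernetes control plane is running at ...", as in the output above, and it strips ANSI color codes first because kubectl may wrap parts of that line in them. The function name `api_endpoint_from_cluster_info` is hypothetical.

```shell
# Reads `kubectl cluster-info` output on stdin and prints the control
# plane endpoint as host:port.
api_endpoint_from_cluster_info() {
  esc=$(printf '\033')
  # First sed: remove ANSI color sequences. Second sed: keep host:port.
  sed "s/${esc}\[[0-9;]*m//g" |
    sed -n 's#^Kubernetes control plane is running at https://\([^/[:space:]]*\).*$#\1#p' |
    head -n 1
}
```

A possible invocation is `kubectl cluster-info | api_endpoint_from_cluster_info`, which for the example cluster would yield k8s-cluster.test.com:6443.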

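Joining the worker node to the cluster

The kubeadm join command joins a worker node or an additional control-plane node to a cluster. The syntax for joining a worker node is:

kubeadm join [api-server-endpoint] [flags]

The flags usually required are:

  • --token <string>: the bootstrap token to use
  • --discovery-token-ca-cert-hash: the CA public key hash, in the form <type>:<value>

The complete command has the following form: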

kubeadm join \
  <control-plane-host>:<control-plane-port> \
  --token <token> \
  --discovery-token-ca-cert-hash sha256:<hash>
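
The three values gathered in the previous sections can be assembled into that command with a small helper. This is a sketch only: `build_join_command` is a hypothetical name, it prints the command instead of running it, and its checks (a colon in the endpoint, 64 hex characters in the hash) are simple sanity tests, not full validation.

```shell
# Prints a kubeadm join command line for the given endpoint (host:port),
# token, and CA public key hash (hex only, without the "sha256:" prefix).
build_join_command() {
  endpoint=$1
  token=$2
  hash=$3
  case $endpoint in
    *:*) ;;
    *) echo "endpoint must be host:port" >&2; return 1 ;;
  esac
  printf '%s\n' "$hash" | grep -Eq '^[0-9a-f]{64}$' || {
    echo "hash must be 64 hex characters (without the sha256: prefix)" >&2
    return 1
  }
  printf 'kubeadm join %s --token %s --discovery-token-ca-cert-hash sha256:%s\n' \
    "$endpoint" "$token" "$hash"
}
```

Printing first and executing second leaves room to review the command before it is run as root on the new node.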

Example:

$ kubeadm join k8s-cluster.test.com:6443 --token bdqsdw.2uf50yfvo3uwy93w \
	--discovery-token-ca-cert-hash sha256:2a6f431cc99860ff6e15519e08e62f01b9b0cb051380031582bd5cc22efbc084
[preflight] Running pre-flight checks
[preflight] Reading configuration from the cluster...
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
W0122 03:39:25.496531   74229 utils.go:69] The recommended value for "resolvConf" in "KubeletConfiguration" is: /run/systemd/resolve/resolv.conf; the provided value is: /run/systemd/resolve/resolv.conf
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Starting the kubelet
[kubelet-start] Waiting for the kubelet to perform the TLS Bootstrap...

This node has joined the cluster:
* Certificate signing request was sent to apiserver and a response was received.
* The Kubelet was informed of the new secure connection details.

Run 'kubectl get nodes' on the control-plane to see this node join the cluster.

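Wait for the node to reach the Ready state, checking from the control-plane node. This can take a few minutes because container images are pulled before the node's services are configured and started.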

$ kubectl get nodes
NAME            STATUS   ROLES                  AGE     VERSION
k8s-master-01   Ready    control-plane,master   5h47m   v1.23.2
k8s-worker-01   Ready    <none>                 4h55m   v1.23.2
k8s-worker-02   Ready    <none>                 4h47m   v1.23.2
k8s-worker-03   Ready    <none>                 5m      v1.23.2
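
Instead of re-running the command by hand, the STATUS column can be checked in a loop. The sketch below parses kubectl get nodes output; the function name `node_is_ready` is hypothetical. It requires STATUS to be exactly "Ready", so a cordoned node shown as "Ready,SchedulingDisabled" does not count.

```shell
# Reads `kubectl get nodes` output on stdin and returns 0 when the named
# node is listed with STATUS exactly "Ready".
node_is_ready() {
  awk -v n="$1" '$1 == n && $2 == "Ready" { ok = 1 } END { exit !ok }'
}
```

A possible polling loop is `until kubectl get nodes --no-headers | node_is_ready k8s-worker-03; do sleep 10; done`. kubectl also offers a built-in alternative: `kubectl wait --for=condition=Ready node/k8s-worker-03 --timeout=300s`.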

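Removing a worker node from the cluster

To remove a worker node from the cluster, follow these steps.

Evict Pods from the node

kubectl drain <node-name> --delete-emptydir-data --ignore-daemonsets

(--delete-emptydir-data replaces the deprecated --delete-local-data flag.)

Mark the node as unschedulable

kubectl drain already cordons the node. To stop new Pods from being scheduled onto a node without evicting the running ones, use:

kubectl cordon <node-name>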

Reset the removed node

On the removed node, undo the changes made by kubeadm join:

kubeadm reset

Once kubeadm reset completes successfully, the node can be joined to the cluster again.
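
The removal steps can be collected into a dry-run helper that prints the commands in order without executing anything. This is a sketch under assumptions: `removal_plan` is a hypothetical name, `--delete-emptydir-data` is the current name of the flag that older kubectl releases call `--delete-local-data`, and the `kubectl delete node` line is not one of the steps above. It is included on the assumption that the Node object should also disappear from kubectl get nodes.

```shell
# Prints, in order, the commands for taking a node out of the cluster.
# Nothing is executed; review the output before running it.
removal_plan() {
  node=$1
  [ -n "$node" ] || { echo "usage: removal_plan <node-name>" >&2; return 1; }
  # On the control plane: evict Pods (this also cordons the node).
  printf 'kubectl drain %s --delete-emptydir-data --ignore-daemonsets\n' "$node"
  # On the control plane: remove the Node object (assumed extra step).
  printf 'kubectl delete node %s\n' "$node"
  # On the node being removed: undo the kubeadm join changes.
  printf 'sudo kubeadm reset\n'
}
```

For example, `removal_plan k8s-worker-03` prints three lines: the first two are meant for the control-plane node and the last for k8s-worker-03 itself.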