Recovering from a Kubernetes node disk failure: adding nodes to a Kubernetes cluster

Reprinted

mob64ca13fd9f8e 2023-11-23 16:46:11

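After the initial setup of a Kubernetes cluster, the most common operation is scaling it out by adding more nodes to run workloads (containers and Pods). How you scale a cluster depends on the tool that was used to bootstrap it. This guide shows how to add worker nodes to a Kubernetes cluster with the kubeadm command-line tool.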

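Example cluster

The example cluster has two worker nodes and one master (control-plane) node. The container runtime is containerd, the network plugin is Calico, and the operating system is Ubuntu 20.04.

The cluster was initialized with --control-plane-endpoint set to k8s-cluster.test.com.

Entries currently added to /etc/hosts on every node: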

192.168.1.140   k8s-cluster.test.com
192.168.1.140 k8s-master-01  k8s-master-01.test.com
192.168.1.141 k8s-worker-01  k8s-worker-01.test.com
192.168.1.142 k8s-worker-02  k8s-worker-02.test.com

$ kubectl get nodes -o wide
NAME            STATUS   ROLES                  AGE     VERSION   INTERNAL-IP     EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION     CONTAINER-RUNTIME
k8s-master-01   Ready    control-plane,master   4h20m   v1.23.2   192.168.1.140   <none>        Ubuntu 20.04.3 LTS   5.4.0-96-generic   containerd://1.4.12
k8s-worker-01   Ready    <none>                 3h28m   v1.23.2   192.168.1.141   <none>        Ubuntu 20.04.3 LTS   5.4.0-96-generic   containerd://1.4.12
k8s-worker-02   Ready    <none>                 3h20m   v1.23.2   192.168.1.142   <none>        Ubuntu 20.04.3 LTS   5.4.0-96-generic   containerd://1.4.12

Adding a worker node

Node information

Operating system: Ubuntu 20.04
CPU: 4 vCPU
Memory: 4 GB
Hostname: k8s-worker-03

Firewall

# SSH
sudo ufw allow 22/tcp
# Kubelet API
sudo ufw allow 10250/tcp
# NodePort Services
sudo ufw allow 30000:32767/tcp
# Calico: BGP (179/tcp), Typha (5473/tcp), VXLAN (4789/udp)
sudo ufw allow 179/tcp
sudo ufw allow 5473/tcp
sudo ufw allow 4789/udp

/etc/hosts configuration

Note: update this file on every node, not only on the new one.

192.168.1.140   k8s-cluster.test.com
192.168.1.140 k8s-master-01  k8s-master-01.test.com
192.168.1.141 k8s-worker-01  k8s-worker-01.test.com
192.168.1.142 k8s-worker-02  k8s-worker-02.test.com
192.168.1.143 k8s-worker-03  k8s-worker-03.test.com
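
Because the same entries have to exist on every node, it can help to add them idempotently. The following is a minimal sketch and not part of the original procedure: add_host_entry is a hypothetical helper name, and the demo writes to a scratch file instead of /etc/hosts.

```shell
# Hypothetical helper: append "IP name..." to a hosts-style file unless an
# uncommented line with the same IP and first hostname already exists.
add_host_entry() (
  hosts_file=$1
  ip=$2
  shift 2
  if ! awk -v ip="$ip" -v name="$1" '
         $1 == ip { for (i = 2; i <= NF; i++) if ($i == name) found = 1 }
         END { exit (found ? 0 : 1) }' "$hosts_file"; then
    printf '%s %s\n' "$ip" "$*" >> "$hosts_file"
  fi
)

# Demo against a scratch file; the second call changes nothing.
demo_hosts=$(mktemp)
add_host_entry "$demo_hosts" 192.168.1.143 k8s-worker-03 k8s-worker-03.test.com
add_host_entry "$demo_hosts" 192.168.1.143 k8s-worker-03 k8s-worker-03.test.com
cat "$demo_hosts"
rm -f "$demo_hosts"
```

To use it on a real node, the function would have to run as root with /etc/hosts as the first argument.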

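Disable swap

# Comment out the swap entries in /etc/fstab so swap stays off after a reboot
sudo sed -i 's/^\(.*swap.*\)$/#\1/g' /etc/fstab
# Turn swap off for the running system
sudo swapoff -a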
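
The sed expression above prefixes every line that contains the word "swap", including lines that are already commented out, so running it twice stacks a second "#". A narrower variant could look like the sketch below. It is an illustration only: comment_out_swap is a hypothetical name, it works as a stdin-to-stdout filter so it can be tried safely, and the sample fstab lines are made up.

```shell
# Filter: read fstab-style text on stdin, comment out active swap entries,
# and leave comments and already-disabled lines untouched.
comment_out_swap() {
  # Field 3 of an fstab entry is the filesystem type.
  awk '$1 !~ /^#/ && $3 == "swap" { print "#" $0; next } { print }'
}

# Demo on sample lines (device names are invented for illustration).
printf '%s\n' \
  '/dev/sda1 / ext4 defaults 0 1' \
  '/swap.img none swap sw 0 0' \
  '#/old.swap none swap sw 0 0' | comment_out_swap
```

Only the second sample line gains a "#"; feeding the output through the filter again leaves it unchanged.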

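Install Kubernetes

sudo apt update
sudo apt -y install curl apt-transport-https
# Add the signing key and the Kubernetes apt repository (Aliyun mirror)
curl -s https://mirrors.aliyun.com/kubernetes/apt/doc/apt-key.gpg | sudo apt-key add -
echo "deb https://mirrors.aliyun.com/kubernetes/apt/ kubernetes-xenial main" | sudo tee /etc/apt/sources.list.d/kubernetes.list

sudo apt update
sudo apt -y install vim git wget
# This installs the newest packages in the repository. To match the example cluster (v1.23.2),
# a version can be appended to each package name, e.g. kubelet=1.23.2-00 (check `apt-cache madison kubelet` for the exact string).
sudo apt -y install kubelet kubeadm kubectl
# Keep apt from upgrading these packages automatically
sudo apt-mark hold kubelet kubeadm kubectl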
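
The kubelet on a node must not be newer than the control plane's API server, so it is worth comparing versions before joining. The sketch below is one way to do that check; minor_of and kubelet_not_newer are hypothetical helper names, and the comparison assumes both versions share major version 1.

```shell
# Extract the minor number from a version such as "v1.23.2" or "1.23.2-00".
minor_of() {
  printf '%s\n' "$1" | sed 's/^v//' | cut -d. -f2
}

# Succeed when the kubelet minor version ($1) is not newer than the
# control plane's minor version ($2).
kubelet_not_newer() {
  [ "$(minor_of "$1")" -le "$(minor_of "$2")" ]
}

# Demo with literal versions; v1.23.2 is the version of the example cluster.
kubelet_not_newer v1.23.2 v1.23.2 && echo "ok: kubelet matches the control plane"
kubelet_not_newer v1.28.4 v1.23.2 || echo "skew: kubelet is newer than the control plane"
```

On a real node the first argument could be taken from kubelet --version, which prints a line such as "Kubernetes v1.23.2".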

Install the containerd container runtime

# Configure persistent loading of modules
sudo tee /etc/modules-load.d/containerd.conf <<EOF
overlay
br_netfilter
EOF

# Load at runtime
sudo modprobe overlay
sudo modprobe br_netfilter

# Ensure sysctl params are set
sudo tee /etc/sysctl.d/kubernetes.conf<<EOF
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
net.ipv4.ip_forward = 1
EOF

# Reload configs
sudo sysctl --system

# Install required packages
sudo apt install -y curl gnupg2 software-properties-common apt-transport-https ca-certificates

# Add Docker repo
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
sudo add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable"

# Install containerd
sudo apt update
sudo apt install -y containerd.io

# Configure containerd and start service
sudo su -
mkdir -p /etc/containerd
containerd config default>/etc/containerd/config.toml
# Change image repository
sed -i 's/k8s.gcr.io/registry.aliyuncs.com\/google_containers/g' /etc/containerd/config.toml

To use the systemd cgroup driver, set the following in /etc/containerd/config.toml:

...
        [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]
          ...
          [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
            SystemdCgroup = true
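
If the generated file already contains a "SystemdCgroup = false" line, the change can be scripted instead of edited by hand. This is a sketch under that assumption; older default configs may not contain the key at all, in which case it has to be added manually under the runc options table. enable_systemd_cgroup is a hypothetical name, and the function is written as a stdin-to-stdout filter so it can be tried without touching the real file.

```shell
# Filter: switch an existing "SystemdCgroup = false" line to true,
# preserving the line's indentation and spacing.
enable_systemd_cgroup() {
  sed 's/^\([[:space:]]*SystemdCgroup[[:space:]]*=[[:space:]]*\)false/\1true/'
}

# Demo on a two-line excerpt of a config file.
printf '%s\n' \
  '[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]' \
  '  SystemdCgroup = false' | enable_systemd_cgroup
```

The same sed expression can be applied in place with sed -i on /etc/containerd/config.toml once the result looks right.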

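Adjust the sandbox_image address. This is the same substitution as the sed command run earlier, so the value may already be correct: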

sudo sed -i 's/k8s.gcr.io/registry.aliyuncs.com\/google_containers/g' /etc/containerd/config.toml

[plugins."io.containerd.grpc.v1.cri"]
    ...
    sandbox_image = "registry.aliyuncs.com/google_containers/pause:3.2"

Restart the service:

# Restart containerd, enable it at boot, and check its status
systemctl restart containerd
systemctl enable containerd
systemctl status containerd

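Get a join token

A token is required to join a new worker node to the Kubernetes cluster. When the cluster is initialized with kubeadm, a token is generated that expires after 24 hours.
To check whether a token exists, run the following on the control-plane node: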

$ kubeadm token list
TOKEN                     TTL         EXPIRES                USAGES                   DESCRIPTION                                                EXTRA GROUPS
bdqsdw.2uf50yfvo3uwy93w   19h         2022-01-23T01:52:48Z   authentication,signing   The default bootstrap token generated by 'kubeadm init'.   system:bootstrappers:kubeadm:default-node-token

If the token has expired, generate a new one with:

sudo kubeadm token create

List the generated token with:

kubeadm token list

You can also generate a token and print the complete join command in one step:

kubeadm token create --print-join-command
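
Bootstrap tokens have a fixed shape: six characters, a dot, then sixteen characters, all lowercase letters or digits. A quick format check can catch copy-and-paste mistakes before the join is attempted. The helper name is_bootstrap_token is hypothetical, and the check validates the format only, not whether the token is still valid on the cluster.

```shell
# Succeed when the argument looks like a kubeadm bootstrap token.
is_bootstrap_token() {
  printf '%s\n' "$1" | grep -Eq '^[a-z0-9]{6}\.[a-z0-9]{16}$'
}

# Demo with the token shown in the listing above and with a truncated copy.
is_bootstrap_token bdqsdw.2uf50yfvo3uwy93w && echo "token format ok"
is_bootstrap_token bdqsdw.2uf50 || echo "token format invalid"
```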

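Get the CA certificate hash for token discovery

The kubeadm join command validates the cluster's root CA public key by comparing its hash with the hash you supply. Run the following on the master node to obtain that hash.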

openssl x509 -pubkey -in /etc/kubernetes/pki/ca.crt | openssl rsa -pubin -outform der 2>/dev/null | openssl dgst -sha256 -hex | sed 's/^.* //'
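
The pipeline prints a bare SHA-256 digest, which is 64 lowercase hexadecimal characters. The sketch below checks that shape before the value is used; is_sha256_hex is a hypothetical helper, and the digest in the demo is the one from this article's join example.

```shell
# Succeed when the argument is exactly 64 lowercase hex characters.
is_sha256_hex() {
  printf '%s\n' "$1" | grep -Eq '^[0-9a-f]{64}$'
}

# Demo: print the value in the sha256:<hex> form once it passes the check.
digest=2a6f431cc99860ff6e15519e08e62f01b9b0cb051380031582bd5cc22efbc084
if is_sha256_hex "$digest"; then
  echo "sha256:$digest"
fi
```

An empty or shorter result usually means the openssl pipeline failed, for example because it was run on a node that has no /etc/kubernetes/pki/ca.crt.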

Get the `api-server-endpoint` address

On the master node, use the kubectl cluster-info command:

$ kubectl cluster-info
Kubernetes control plane is running at https://k8s-cluster.test.com:6443
CoreDNS is running at https://k8s-cluster.test.com:6443/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy

To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.

As the output shows, the endpoint in this example is https://k8s-cluster.test.com:6443.
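
When this step is scripted, the host and port can be pulled out of the cluster-info text. The following is a sketch that assumes the first line reads "Kubernetes control plane is running at ...", as in the output above; it also strips ANSI colour codes, which kubectl may include in this output. endpoint_from_cluster_info is a hypothetical name.

```shell
# Read `kubectl cluster-info` output on stdin and print host:port of the control plane.
endpoint_from_cluster_info() (
  esc=$(printf '\033')
  # Remove colour escape sequences, then capture everything between
  # "https://" and the next "/" or space on the control plane line.
  sed "s/${esc}\[[0-9;]*m//g" |
    sed -n 's#^Kubernetes control plane is running at https://\([^/ ]*\).*#\1#p' |
    head -n 1
)

# Demo on the two lines shown above.
printf '%s\n' \
  'Kubernetes control plane is running at https://k8s-cluster.test.com:6443' \
  'CoreDNS is running at https://k8s-cluster.test.com:6443/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy' |
  endpoint_from_cluster_info
```

On the master node the input would come from kubectl cluster-info piped into the function.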

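Join the worker node to the cluster

The kubeadm join command joins a worker node or an additional master node to a cluster. The syntax for joining a worker node is: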

kubeadm join [api-server-endpoint] [flags]

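The commonly required flags are:

--token string: the token to use
--discovery-token-ca-cert-hash: the root CA public key hash, in the form <type>:<value>

The complete command has the following form: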

kubeadm join \
  <control-plane-host>:<control-plane-port> \
  --token <token> \
  --discovery-token-ca-cert-hash sha256:<hash>
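
The three values gathered in the previous steps slot into that template. As a small sketch, a hypothetical build_join_command helper can assemble the line so it can be reviewed before it is run on the new node; it only prints the command and does not execute it.

```shell
# Print a kubeadm join command from endpoint ($1), token ($2) and bare hex hash ($3).
build_join_command() {
  printf 'kubeadm join %s --token %s --discovery-token-ca-cert-hash sha256:%s\n' "$1" "$2" "$3"
}

# Demo with the values used in this article.
build_join_command \
  k8s-cluster.test.com:6443 \
  bdqsdw.2uf50yfvo3uwy93w \
  2a6f431cc99860ff6e15519e08e62f01b9b0cb051380031582bd5cc22efbc084
```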

Example:

$ kubeadm join k8s-cluster.test.com:6443 --token bdqsdw.2uf50yfvo3uwy93w \
	--discovery-token-ca-cert-hash sha256:2a6f431cc99860ff6e15519e08e62f01b9b0cb051380031582bd5cc22efbc084
[preflight] Running pre-flight checks
[preflight] Reading configuration from the cluster...
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
W0122 03:39:25.496531   74229 utils.go:69] The recommended value for "resolvConf" in "KubeletConfiguration" is: /run/systemd/resolve/resolv.conf; the provided value is: /run/systemd/resolve/resolv.conf
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Starting the kubelet
[kubelet-start] Waiting for the kubelet to perform the TLS Bootstrap...

This node has joined the cluster:
* Certificate signing request was sent to apiserver and a response was received.
* The Kubelet was informed of the new secure connection details.

Run 'kubectl get nodes' on the control-plane to see this node join the cluster.

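Wait for the node to reach the Ready state, checking from the control-plane node. This can take a few minutes because container images are pulled before the services are configured and started.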

$ kubectl get nodes
NAME            STATUS   ROLES                  AGE     VERSION
k8s-master-01   Ready    control-plane,master   5h47m   v1.23.2
k8s-worker-01   Ready    <none>                 4h55m   v1.23.2
k8s-worker-02   Ready    <none>                 4h47m   v1.23.2
k8s-worker-03   Ready    <none>                 5m      v1.23.2
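
In a script, the STATUS column for the new node can be read from that listing. The sketch below parses the plain-text table; node_status is a hypothetical helper, and the demo input is the listing shown above.

```shell
# Read `kubectl get nodes` output on stdin and print the STATUS of the named node.
node_status() {
  # Skip the header row, then match on the NAME column.
  awk -v node="$1" 'NR > 1 && $1 == node { print $2 }'
}

# Demo on the listing above.
printf '%s\n' \
  'NAME            STATUS   ROLES                  AGE     VERSION' \
  'k8s-master-01   Ready    control-plane,master   5h47m   v1.23.2' \
  'k8s-worker-03   Ready    <none>                 5m      v1.23.2' |
  node_status k8s-worker-03
```

As an alternative that blocks until the condition holds, kubectl wait --for=condition=Ready node/k8s-worker-03 --timeout=5m can be run on the control-plane node.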

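Removing a worker node from the cluster

To remove a worker node from the cluster, do the following.

Evict Pods from the node

# --delete-emptydir-data replaces --delete-local-data, which is deprecated in recent kubectl releases
kubectl drain <node-name> --delete-emptydir-data --ignore-daemonsets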

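Mark the node as unschedulable

This prevents new Pods from being scheduled onto the node. Because kubectl drain already cordons the node, a separate cordon is only needed when you want to block scheduling before draining.

kubectl cordon <node-name>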

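Reset the removed node

Run the following on the node being removed to revert the changes that kubeadm join made to it.

kubeadm reset

Once kubeadm reset has completed successfully, the node can be joined to the cluster again. If the node is leaving for good, also run kubectl delete node <node-name> on the control-plane node to remove its Node object.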
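
The removal steps can be summarized as an ordered plan. The sketch below only prints the commands for review and runs nothing. It is an illustration with stated assumptions: removal_plan is a hypothetical helper, it uses the --delete-emptydir-data flag name (the replacement for the deprecated --delete-local-data), and it includes a final kubectl delete node step, which removes the Node object from the cluster.

```shell
# Print the removal commands for one node, in the order they would be run.
removal_plan() {
  printf 'kubectl drain %s --delete-emptydir-data --ignore-daemonsets\n' "$1"
  printf 'kubeadm reset   # run on %s itself\n' "$1"
  printf 'kubectl delete node %s\n' "$1"
}

removal_plan k8s-worker-03
```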
