Environment: a minimally installed system (Red Hat family distributions are used as the example).


### Incident case 1

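Symptom: the k8s cluster was up and ceph was up as well, but while installing the Docker image registry, the registry stayed in the Pending state, and restarting it did not help.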
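Troubleshooting: we first ran kubectl describe pod -n namespace (using the registry's namespace) and found that the PV and PVC had failed.
Next we looked at the PV and PVC themselves and found that ceph was reporting errors.
Finally we checked the logs of the ceph container and found that the ceph configuration file inside the container was wrong.

Solution: ceph runs inside a container here. The default cephmount configuration expects the xfs filesystem, but our operating system was set up with ext4. The only fix was to change the filesystem type for this container's ceph in ceph.conf and then restart ceph. After that, redeploying the Docker registry succeeded.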
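The first troubleshooting step (finding the claims that never bound) can be sketched in a few lines. This is only an illustration, assuming the output of `kubectl get pvc -n <namespace> -o json` has been captured as text; the function name `unbound_pvcs` and the sample PVC names are made up for the example and do not come from the incident.

```python
import json
from typing import List, Tuple


def unbound_pvcs(pvc_list_json: str) -> List[Tuple[str, str]]:
    """Return (name, phase) for every PVC whose phase is not 'Bound'."""
    doc = json.loads(pvc_list_json)
    result = []
    for item in doc.get("items", []):
        name = item["metadata"]["name"]
        # A PVC reports its state in status.phase (Pending, Bound or Lost).
        phase = item.get("status", {}).get("phase", "Unknown")
        if phase != "Bound":
            result.append((name, phase))
    return result


# Hypothetical sample shaped like `kubectl get pvc -o json` output.
SAMPLE = json.dumps({
    "items": [
        {"metadata": {"name": "registry-data"}, "status": {"phase": "Pending"}},
        {"metadata": {"name": "other-data"}, "status": {"phase": "Bound"}},
    ]
})

if __name__ == "__main__":
    for pvc_name, pvc_phase in unbound_pvcs(SAMPLE):
        print(f"{pvc_name}: {pvc_phase}")
```

A pod that mounts a claim listed here cannot be scheduled until the claim binds, which matches the Pending registry seen in this case; the storage backend's own logs are then the next place to look.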


----
### Incident log 2
```html/xml
2022-07-04 10:02:18 [INFO] - fatal: [master01]: FAILED! => {"changed": false, "failures": [], "msg": "Depsolve Error occured: \n Problem: conflicting requests\n  - nothing provides libcrypto.so.10()(64bit) needed by cephmount-1.0.0-1.aarch64\n  - nothing provides libcrypto.so.10(libcrypto.so.10)(64bit) needed by cephmount-1.0.0-1.aarch64", "rc": 1, "results": []}
```

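Note: the cause is a missing openssl package and its dependencies. After downloading the latest openssl-1:1.1.1m-5 from the internet, the same error still appeared.

Solution: first, install the package from our self-built software repository with rpm -ivh cephmount-1.0.0-1.aarch64.rpm. It fails because it needs the older openssl-1:1.0.2k: libcrypto.so.10 is the library name shipped by the OpenSSL 1.0.x series, while OpenSSL 1.1.1 ships libcrypto.so.1.1, so a newer openssl cannot satisfy the dependency. On our side we only need to rebuild an up-to-date cephmount RPM package. Alternatively, if an open-source build is in use, download the latest cephmount from the internet.

Summary: the cephmount version is too old to be compatible with the latest openssl.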
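To see at a glance which shared library is missing, the "nothing provides ... needed by ..." pairs can be pulled out of the depsolve message. The sketch below is an assumption-laden illustration: the message text is re-joined from the log in this section, and the helper names `missing_capabilities` and `missing_sonames` are invented for the example.

```python
import re
from typing import List, Set, Tuple

# Matches "nothing provides <capability> needed by <package>".
PATTERN = re.compile(r"nothing provides (\S+) needed by (\S+)")


def missing_capabilities(msg: str) -> List[Tuple[str, str]]:
    """Return (capability, package) pairs found in a depsolve message."""
    return PATTERN.findall(msg)


def missing_sonames(msg: str) -> Set[str]:
    """Reduce capabilities such as 'libcrypto.so.10()(64bit)' to the library name."""
    return {capability.split("(")[0] for capability, _ in missing_capabilities(msg)}


# Message text re-joined from the log in this section.
MSG = (
    "Depsolve Error occured: \n Problem: conflicting requests\n"
    "  - nothing provides libcrypto.so.10()(64bit) needed by cephmount-1.0.0-1.aarch64\n"
    "  - nothing provides libcrypto.so.10(libcrypto.so.10)(64bit) needed by cephmount-1.0.0-1.aarch64"
)

if __name__ == "__main__":
    for soname in sorted(missing_sonames(MSG)):
        print(soname)
```

Both reported lines collapse to the same library name, which makes it clear that a single missing library version, rather than several unrelated packages, blocks the install.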


### Incident log 3

fatal: [master01]: FAILED! => {"changed": true, "cmd": "kubeadm init --config /etc/kubernetes/kubeadm-config.yaml -v=5", "delta": "0:00:00.380079", "end": "2022-07-04 09:30:56.707434", "msg": "non-zero return code", "rc": 1, "start": "2022-07-04 09:30:56.327355", "stderr": "I0704 09:30:56.427082 207693 initconfiguration.go:246] loading configuration from \"/etc/kubernetes/kubeadm-config.yaml\"\nI0704 09:30:56.440094 207693 interface.go:431] Looking for default routes with IPv4 addresses\nI0704 09:30:56.440134 207693 interface.go:436] Default route transits interface \"enp0s18\"\nI0704 09:30:56.440506 207693 interface.go:208] Interface enp0s18 is up\nI0704 09:30:56.440690 207693 interface.go:256] Interface \"enp0s18\" has 2 addresses :[10.165.141.79/24 fe80::9c6d:aaa3:9e6b:2d60/64].\nI0704 09:30:56.440742 207693 interface.go:223] Checking addr 10.165.141.79/24.\nI0704 09:30:56.440759 207693 interface.go:230] IP found 10.165.141.79\nI0704 09:30:56.440775 207693 interface.go:262] Found valid IPv4 address 10.165.141.79 for interface \"enp0s18\".\nI0704 09:30:56.440789 207693 interface.go:442] Found active IP 10.165.141.79 \nI0704 09:30:56.452720 207693 checks.go:582] validating Kubernetes and kubeadm version\nI0704 09:30:56.452789 207693 checks.go:167] validating if the firewall is enabled and active\nI0704 09:30:56.486520 207693 checks.go:202] validating availability of port 6443\nI0704 09:30:56.487059 207693 checks.go:202] validating availability of port 10259\nI0704 09:30:56.487141 207693 checks.go:202] validating availability of port 10257\nI0704 09:30:56.487196 207693 checks.go:287] validating the existence of file /etc/kubernetes/manifests/kube-apiserver.yaml\nI0704 09:30:56.487248 207693 checks.go:287] validating the existence of file /etc/kubernetes/manifests/kube-controller-manager.yaml\nI0704 09:30:56.487270 207693 checks.go:287] validating the existence of file /etc/kubernetes/manifests/kube-scheduler.yaml\nI0704 09:30:56.487286 207693 checks.go:287] 
validating the existence of file /etc/kubernetes/manifests/etcd.yaml\nI0704 09:30:56.487302 207693 checks.go:437] validating if the connectivity type is via proxy or direct\nI0704 09:30:56.487371 207693 checks.go:476] validating http connectivity to first IP address in the CIDR\nI0704 09:30:56.487421 207693 checks.go:476] validating http connectivity to first IP address in the CIDR\nI0704 09:30:56.487446 207693 checks.go:103] validating the container runtime\nI0704 09:30:56.513452 207693 checks.go:377] validating the presence of executable crictl\nI0704 09:30:56.513568 207693 checks.go:336] validating the contents of file /proc/sys/net/bridge/bridge-nf-call-iptables\nI0704 09:30:56.513709 207693 checks.go:336] validating the contents of file /proc/sys/net/ipv4/ip_forward\nI0704 09:30:56.513791 207693 checks.go:654] validating whether swap is enabled or not\nI0704 09:30:56.513882 207693 checks.go:377] validating the presence of executable conntrack\nI0704 09:30:56.513934 207693 checks.go:377] validating the presence of executable ip\nI0704 09:30:56.513964 207693 checks.go:377] validating the presence of executable iptables\nI0704 09:30:56.514102 207693 checks.go:377] validating the presence of executable mount\nI0704 09:30:56.514147 207693 checks.go:377] validating the presence of executable nsenter\nI0704 09:30:56.514176 207693 checks.go:377] validating the presence of executable ebtables\nI0704 09:30:56.514272 207693 checks.go:377] validating the presence of executable ethtool\nI0704 09:30:56.514315 207693 checks.go:377] validating the presence of executable socat\nI0704 09:30:56.514354 207693 checks.go:377] validating the presence of executable tc\nI0704 09:30:56.514400 207693 checks.go:377] validating the presence of executable touch\nI0704 09:30:56.514435 207693 checks.go:525] running all checks\nI0704 09:30:56.531998 207693 checks.go:408] checking whether the given node name is valid and reachable using net.LookupHost\nI0704 09:30:56.532455 207693 
checks.go:623] validating kubelet version\nI0704 09:30:56.668051 207693 checks.go:129] validating if the \"kubelet\" service is enabled and active\n\t[WARNING Service-Kubelet]: kubelet service is not enabled, please run 'systemctl enable kubelet.service'\nI0704 09:30:56.702480 207693 checks.go:202] validating availability of port 10250\nI0704 09:30:56.702604 207693 checks.go:202] validating availability of port 2379\nI0704 09:30:56.702683 207693 checks.go:202] validating availability of port 2380\nI0704 09:30:56.702748 207693 checks.go:250] validating the existence and emptiness of directory /u01/local/kube-system/etcd/\n[preflight] Some fatal errors occurred:\n\t[ERROR KubeletVersion]: the kubelet version is higher than the control plane version. This is not a supported version skew and may lead to a malfunctional cluster. Kubelet version: \"1.24.2\" Control plane version: \"1.21.8\"\n[preflight] If you know what you are doing, you can make a check non-fatal with --ignore-preflight-errors=...\nerror execution phase 
preflight\nk8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(Runner).Run.func1\n\t/root/kubernetes-1.21.8/_output/local/go/src/k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow/runner.go:235\nk8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(Runner).visitAll\n\t/root/kubernetes-1.21.8/_output/local/go/src/k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow/runner.go:421\nk8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(Runner).Run\n\t/root/kubernetes-1.21.8/_output/local/go/src/k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow/runner.go:207\nk8s.io/kubernetes/cmd/kubeadm/app/cmd.newCmdInit.func1\n\t/root/kubernetes-1.21.8/_output/local/go/src/k8s.io/kubernetes/cmd/kubeadm/app/cmd/init.go:152\nk8s.io/kubernetes/vendor/github.com/spf13/cobra.(Command).execute\n\t/root/kubernetes-1.21.8/_output/local/go/src/k8s.io/kubernetes/vendor/github.com/spf13/cobra/command.go:850\nk8s.io/kubernetes/vendor/github.com/spf13/cobra.(Command).ExecuteC\n\t/root/kubernetes-1.21.8/_output/local/go/src/k8s.io/kubernetes/vendor/github.com/spf13/cobra/command.go:958\nk8s.io/kubernetes/vendor/github.com/spf13/cobra.(Command).Execute\n\t/root/kubernetes-1.21.8/_output/local/go/src/k8s.io/kubernetes/vendor/github.com/spf13/cobra/command.go:895\nk8s.io/kubernetes/cmd/kubeadm/app.Run\n\t/root/kubernetes-1.21.8/_output/local/go/src/k8s.io/kubernetes/cmd/kubeadm/app/kubeadm.go:50\nmain.main\n\t_output/local/go/src/k8s.io/kubernetes/cmd/kubeadm/kubeadm.go:25\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:255\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_arm64.s:1133", "stderr_lines": ["I0704 09:30:56.427082 207693 initconfiguration.go:246] loading configuration from \"/etc/kubernetes/kubeadm-config.yaml\"", "I0704 09:30:56.440094 207693 interface.go:431] Looking for default routes with IPv4 addresses", "I0704 09:30:56.440134 207693 interface.go:436] Default route transits interface \"enp0s18\"", "I0704 09:30:56.440506 207693 interface.go:208] Interface enp0s18 is 
up", "I0704 09:30:56.440690 207693 interface.go:256] Interface \"enp0s18\" has 2 addresses :[10.165.141.79/24 fe80::9c6d:aaa3:9e6b:2d60/64].", "I0704 09:30:56.440742 207693 interface.go:223] Checking addr 10.165.141.79/24.", "I0704 09:30:56.440759 207693 interface.go:230] IP found 10.165.141.79", "I0704 09:30:56.440775 207693 interface.go:262] Found valid IPv4 address 10.165.141.79 for interface \"enp0s18\".", "I0704 09:30:56.440789 207693 interface.go:442] Found active IP 10.165.141.79 ", "I0704 09:30:56.452720 207693 checks.go:582] validating Kubernetes and kubeadm version", "I0704 09:30:56.452789 207693 checks.go:167] validating if the firewall is enabled and active", "I0704 09:30:56.486520 207693 checks.go:202] validating availability of port 6443", "I0704 09:30:56.487059 207693 checks.go:202] validating availability of port 10259", "I0704 09:30:56.487141 207693 checks.go:202] validating availability of port 10257", "I0704 09:30:56.487196 207693 checks.go:287] validating the existence of file /etc/kubernetes/manifests/kube-apiserver.yaml", "I0704 09:30:56.487248 207693 checks.go:287] validating the existence of file /etc/kubernetes/manifests/kube-controller-manager.yaml", "I0704 09:30:56.487270 207693 checks.go:287] validating the existence of file /etc/kubernetes/manifests/kube-scheduler.yaml", "I0704 09:30:56.487286 207693 checks.go:287] validating the existence of file /etc/kubernetes/manifests/etcd.yaml", "I0704 09:30:56.487302 207693 checks.go:437] validating if the connectivity type is via proxy or direct", "I0704 09:30:56.487371 207693 checks.go:476] validating http connectivity to first IP address in the CIDR", "I0704 09:30:56.487421 207693 checks.go:476] validating http connectivity to first IP address in the CIDR", "I0704 09:30:56.487446 207693 checks.go:103] validating the container runtime", "I0704 09:30:56.513452 207693 checks.go:377] validating the presence of executable crictl", "I0704 09:30:56.513568 207693 checks.go:336] validating the contents 
of file /proc/sys/net/bridge/bridge-nf-call-iptables", "I0704 09:30:56.513709 207693 checks.go:336] validating the contents of file /proc/sys/net/ipv4/ip_forward", "I0704 09:30:56.513791 207693 checks.go:654] validating whether swap is enabled or not", "I0704 09:30:56.513882 207693 checks.go:377] validating the presence of executable conntrack", "I0704 09:30:56.513934 207693 checks.go:377] validating the presence of executable ip", "I0704 09:30:56.513964 207693 checks.go:377] validating the presence of executable iptables", "I0704 09:30:56.514102 207693 checks.go:377] validating the presence of executable mount", "I0704 09:30:56.514147 207693 checks.go:377] validating the presence of executable nsenter", "I0704 09:30:56.514176 207693 checks.go:377] validating the presence of executable ebtables", "I0704 09:30:56.514272 207693 checks.go:377] validating the presence of executable ethtool", "I0704 09:30:56.514315 207693 checks.go:377] validating the presence of executable socat", "I0704 09:30:56.514354 207693 checks.go:377] validating the presence of executable tc", "I0704 09:30:56.514400 207693 checks.go:377] validating the presence of executable touch", "I0704 09:30:56.514435 207693 checks.go:525] running all checks", "I0704 09:30:56.531998 207693 checks.go:408] checking whether the given node name is valid and reachable using net.LookupHost", "I0704 09:30:56.532455 207693 checks.go:623] validating kubelet version", "I0704 09:30:56.668051 207693 checks.go:129] validating if the \"kubelet\" service is enabled and active", "\t[WARNING Service-Kubelet]: kubelet service is not enabled, please run 'systemctl enable kubelet.service'", "I0704 09:30:56.702480 207693 checks.go:202] validating availability of port 10250", "I0704 09:30:56.702604 207693 checks.go:202] validating availability of port 2379", "I0704 09:30:56.702683 207693 checks.go:202] validating availability of port 2380", "I0704 09:30:56.702748 207693 checks.go:250] validating the existence and emptiness of 
directory /u01/local/kube-system/etcd/", "[preflight] Some fatal errors occurred:", "\t[ERROR KubeletVersion]: the kubelet version is higher than the control plane version. This is not a supported version skew and may lead to a malfunctional cluster. Kubelet version: \"1.24.2\" Control plane version: \"1.21.8\"", "[preflight] If you know what you are doing, you can make a check non-fatal with --ignore-preflight-errors=...", "error execution phase preflight", "k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(Runner).Run.func1", "\t/root/kubernetes-1.21.8/_output/local/go/src/k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow/runner.go:235", "k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(Runner).visitAll", "\t/root/kubernetes-1.21.8/_output/local/go/src/k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow/runner.go:421", "k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(Runner).Run", "\t/root/kubernetes-1.21.8/_output/local/go/src/k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow/runner.go:207", "k8s.io/kubernetes/cmd/kubeadm/app/cmd.newCmdInit.func1", "\t/root/kubernetes-1.21.8/_output/local/go/src/k8s.io/kubernetes/cmd/kubeadm/app/cmd/init.go:152", "k8s.io/kubernetes/vendor/github.com/spf13/cobra.(Command).execute", "\t/root/kubernetes-1.21.8/_output/local/go/src/k8s.io/kubernetes/vendor/github.com/spf13/cobra/command.go:850", "k8s.io/kubernetes/vendor/github.com/spf13/cobra.(Command).ExecuteC", "\t/root/kubernetes-1.21.8/_output/local/go/src/k8s.io/kubernetes/vendor/github.com/spf13/cobra/command.go:958", "k8s.io/kubernetes/vendor/github.com/spf13/cobra.(Command).Execute", "\t/root/kubernetes-1.21.8/_output/local/go/src/k8s.io/kubernetes/vendor/github.com/spf13/cobra/command.go:895", "k8s.io/kubernetes/cmd/kubeadm/app.Run", "\t/root/kubernetes-1.21.8/_output/local/go/src/k8s.io/kubernetes/cmd/kubeadm/app/kubeadm.go:50", "main.main", "\t_output/local/go/src/k8s.io/kubernetes/cmd/kubeadm/kubeadm.go:25", "runtime.main", 
"\t/usr/local/go/src/runtime/proc.go:255", "runtime.goexit", "\t/usr/local/go/src/runtime/asm_arm64.s:1133"], "stdout": "[config] WARNING: Ignored YAML document with GroupVersionKind kubeadm.k8s.io/v1beta2, Kind=JoinConfiguration\n[init] Using Kubernetes version: v1.21.8\n[preflight] Running pre-flight checks", "stdout_lines": ["[config] WARNING: Ignored YAML document with GroupVersionKind kubeadm.k8s.io/v1beta2, Kind=JoinConfiguration", "[init] Using Kubernetes version: v1.21.8", "[preflight] Running pre-flight checks"]}
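Cause: a newer k8s was already present on the machine at install time. Our product requires version 1.21.8, but this machine, probably because it was connected to an external yum repository, had pulled in the latest 1.24.2 when k8s was installed (the preflight error reports kubelet 1.24.2 against control plane 1.21.8). As a result, the 1.21.8 packages from our self-built software repository could not be installed.

Solution: before deploying the product, uninstall the 1.24.2 k8s packages, delete the corresponding package cache, and then reinstall the specified 1.21.8 version of k8s.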
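The version mismatch can be caught before running kubeadm init with a small check. This is a simplified sketch, not kubeadm's own code: it assumes that comparing only the major and minor numbers is enough to flag a kubelet newer than the control plane, and the function names `major_minor` and `kubelet_too_new` are made up for the illustration.

```python
from typing import Tuple


def major_minor(version: str) -> Tuple[int, int]:
    """Parse '1.24.2' or 'v1.21.8' into (major, minor); the patch level is ignored."""
    parts = version.lstrip("v").split(".")
    return int(parts[0]), int(parts[1])


def kubelet_too_new(kubelet: str, control_plane: str) -> bool:
    """True when the kubelet's major.minor is higher than the control plane's."""
    return major_minor(kubelet) > major_minor(control_plane)


if __name__ == "__main__":
    # Versions taken from the preflight error in this log.
    if kubelet_too_new("1.24.2", "1.21.8"):
        print("kubelet is newer than the control plane: reinstall the matching version")
```

Running such a check against the installed kubelet version and the version pinned in the product's repository would have flagged this machine before the deployment started.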