22:58:25 CST stdout: [master2]
Job for etcd.service failed because a timeout was exceeded.
See "systemctl status etcd.service" and "journalctl -xeu etcd.service" for details.
22:58:25 CST message: [master2]
start etcd failed: Failed to exec command: sudo -E /bin/bash -c "systemctl daemon-reload && systemctl restart etcd && systemctl enable etcd"
Job for etcd.service failed because a timeout was exceeded.
See "systemctl status etcd.service" and "journalctl -xeu etcd.service" for details.: Process exited with status 1
22:59:51 CST stdout: [master1]
Job for etcd.service failed because a timeout was exceeded.
See "systemctl status etcd.service" and "journalctl -xeu etcd.service" for details.
22:59:51 CST message: [master1]
start etcd failed: Failed to exec command: sudo -E /bin/bash -c "systemctl daemon-reload && systemctl restart etcd && systemctl enable etcd"
Job for etcd.service failed because a timeout was exceeded.
See "systemctl status etcd.service" and "journalctl -xeu etcd.service" for details.: Process exited with status 1


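At first I assumed this was a bug in kk, the KubeSphere installer, or a version problem. I tried several versions and the problem was still there, so I checked the system log.

All it showed was: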

Oct 20 22:59:45 v2141 etcd[2304]: rejected connection from "192.168.122.142:46836" (error "remote error: tls: bad certificate", ServerName "")
Oct 20 22:59:45 v2141 etcd[2304]: rejected connection from "192.168.122.142:46848" (error "remote error: tls: bad certificate", ServerName "")
Oct 20 22:59:45 v2141 etcd[2304]: rejected connection from "192.168.122.143:47460" (error "remote error: tls: bad certificate", ServerName "")
Oct 20 22:59:45 v2141 etcd[2304]: rejected connection from "192.168.122.143:47470" (error "remote error: tls: bad certificate", ServerName "")

The cause only became visible when I ran the commands by hand:

systemctl daemon-reload && systemctl restart etcd && systemctl enable etcd

tail -f /var/log/syslog

The log then showed the following:

Oct 20 22:59:45 v2141 etcd[2304]: rejected connection from "192.168.122.143:47470" (error "remote error: tls: bad certificate", ServerName "")
Oct 20 22:59:45 v2141 etcd[2304]: health check for peer 4b6b6b04950cb4b0 could not connect: x509: certificate is valid for 127.0.0.1, ::1, 192.168.122.121, 192.168.122.122, 192.168.122.123, 192.168.122.124, 192.168.122.125, 192.168.122.126, 192.168.122.127, not 192.168.122.143
Oct 20 22:59:45 v2141 etcd[2304]: health check for peer 4b6b6b04950cb4b0 could not connect: x509: certificate is valid for 127.0.0.1, ::1, 192.168.122.121, 192.168.122.122, 192.168.122.123, 192.168.122.124, 192.168.122.125, 192.168.122.126, 192.168.122.127, not 192.168.122.143
Oct 20 22:59:45 v2141 etcd[2304]: rejected connection from "192.168.122.142:46860" (error "remote error: tls: bad certificate", ServerName "")
Oct 20 22:59:45 v2141 etcd[2304]: rejected connection from "192.168.122.142:46862" (error "remote error: tls: bad certificate", ServerName "")
Oct 20 22:59:45 v2141 etcd[2304]: health check for peer 8959ed642c954627 could not connect: x509: certificate is valid for 127.0.0.1, ::1, 192.168.122.121, 192.168.122.122, 192.168.122.123, 192.168.122.124, 192.168.122.125, 192.168.122.126, 192.168.122.127, not 192.168.122.142
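
The x509 lines above already name the problem: the certificate lists one set of IPs, and the peer being contacted is not in it. A minimal sketch for pulling the rejected peer IPs out of a log, assuming the message format shown above (the function name rejected_peers is made up for illustration):

```shell
# Print each unique peer IP that etcd reported as missing from the certificate.
# Assumes lines of the form "x509: certificate is valid for ..., not <ip>".
rejected_peers() {
    sed -n -E 's/.*x509: certificate is valid for .*, not ([0-9a-fA-F.:]+)[[:space:]]*$/\1/p' \
        | sort -u
}

# Example: feed it the syslog used in this post.
# rejected_peers < /var/log/syslog
```

To read the same IP list straight from a certificate file, openssl x509 -noout -text -in <cert> prints its Subject Alternative Name section.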


The certificates turned out to be left over from another, earlier cluster. md5sum confirmed it:

root@master1:~# ip a s eth0
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 96:2c:8f:a0:7b:02 brd ff:ff:ff:ff:ff:ff
    altname enp6s18
    inet 192.168.122.121/24 brd 192.168.122.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::942c:8fff:fea0:7b02/64 scope link 
       valid_lft forever preferred_lft forever
root@master1:~# md5sum /etc/ssl/etcd/ssl/admin-master1-key.pem
c29eacd168e16ce0acb23f7b9540dd18  /etc/ssl/etcd/ssl/admin-master1-key.pem
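
The same check can be scripted for any pair of files. A rough sketch, assuming you have a second copy of the key to compare against, for example one from the old cluster (same_md5 is a made-up helper name, and the second path in the usage line is a placeholder):

```shell
# Succeed (exit 0) when two files have identical MD5 checksums,
# exit 1 when they differ, exit 2 when either file is not readable.
same_md5() {
    local a b
    [ -r "$1" ] && [ -r "$2" ] || return 2
    a=$(md5sum < "$1" | cut -d' ' -f1)
    b=$(md5sum < "$2" | cut -d' ' -f1)
    [ "$a" = "$b" ]
}

# Usage (second path is a placeholder for wherever the other copy lives):
# same_md5 /etc/ssl/etcd/ssl/admin-master1-key.pem /path/to/old/admin-master1-key.pem \
#     && echo "same key: the certificate is stale"
```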


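Cleaning up the cluster with ./kk delete cluster -f config-sample.yaml does not delete the configuration that was previously generated in the kubekey folder, such as the keys, so the folder has to be removed as well: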


./kk delete cluster -f config-sample.yaml  && rm -rf kubekey

After that, everything worked.
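
To avoid hitting this again, it may help to check for leftovers before re-creating a cluster. A small sketch, assuming the stale material is .pem files somewhere under the kubekey folder (the .pem pattern and the helper name leftover_pems are assumptions, not something kk provides):

```shell
# List leftover *.pem files under a kubekey working directory (default: ./kubekey).
# Prints nothing and returns 1 when the directory is absent or holds no .pem files.
leftover_pems() {
    local dir=${1:-./kubekey} found
    [ -d "$dir" ] || return 1
    found=$(find "$dir" -type f -name '*.pem')
    [ -n "$found" ] || return 1
    printf '%s\n' "$found"
}

# Usage: warn before creating the cluster again.
# leftover_pems && echo "stale files found: remove ./kubekey first"
```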
