Kubernetes Error Summary


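饭桶也得吃饭 · 2022-01-15 10:07:40 · Blog category: Cloud Native

Tags: linux, k8s · Category: Virtualization and Cloud Computing

© Copyright belongs to the author. This is an original work by 51CTO blog author 饭桶也得吃饭. Contact the author for permission before reposting; otherwise legal action may be pursued.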

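Problems encountered while using K8s

1. The connection to the server localhost:8080 was refused - did you specify the right host or port?

Problem analysis

This is an environment variable problem: kubectl on this machine has no kubeconfig pointing at the cluster (none was set up when the cluster was initialized), so it falls back to localhost:8080. Setting the KUBECONFIG environment variable on this machine resolves it.

Fix, step 1: set the environment variable

## Adjust to your setup; the commands below set the variable on Linux
## (on a kubeadm control-plane node, /etc/kubernetes/admin.conf is the usual file to point at)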
ll /etc/kubernetes/kubelet.conf
-rw------- 1 root root 1906 1月  15 09:52 /etc/kubernetes/kubelet.conf

## Option 1: edit the file
vim /etc/profile
>> Append the new environment variable at the bottom: export KUBECONFIG=/etc/kubernetes/kubelet.conf

## Option 2: append the line directly
echo "export KUBECONFIG=/etc/kubernetes/kubelet.conf" >> /etc/profile

Step 2: apply the change

source /etc/profile

Fix for AWS EKS:

aws eks update-kubeconfig --region cn-north-1 --name <cluster_name>
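
To confirm which kubeconfig kubectl will pick up after either fix, a small check like the following may help. It is a minimal sketch: the function name resolve_kubeconfig is made up for illustration, and it only looks at the first entry when KUBECONFIG holds a colon-separated list.

```shell
# Print the kubeconfig path kubectl would try first, or fail if it is
# missing. Simplification: only the first entry of a colon-separated
# KUBECONFIG list is checked.
resolve_kubeconfig() {
  candidate=${KUBECONFIG:-$HOME/.kube/config}
  candidate=${candidate%%:*}
  if [ -r "$candidate" ]; then
    printf '%s\n' "$candidate"
  else
    printf 'no readable kubeconfig at %s\n' "$candidate" >&2
    return 1
  fi
}
```

Running resolve_kubeconfig after source /etc/profile should print /etc/kubernetes/kubelet.conf if the variable was exported as described above.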

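2. INSTALLATION FAILED: cannot re-use a name that is still in use

# Error reported when running helm install: Error: INSTALLATION FAILED: cannot re-use a name that is still in use

Fix (note that deleting the namespace also deletes every resource in it):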

helm ls --all-namespaces
kubectl delete namespace qsh-test
kubectl create namespace qsh-test
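
When other workloads share the namespace, removing only the conflicting release may be a gentler alternative to recreating the whole namespace. The sketch below wraps helm uninstall; the function name free_release_name and the HELM override variable are illustrative assumptions, not part of the original steps.

```shell
# HELM can be overridden (for example HELM=echo) to preview the command
# without touching the cluster.
HELM=${HELM:-helm}

# Remove a single release so that its name can be reused. Other
# resources in the namespace are left alone.
free_release_name() {
  release=$1
  namespace=$2
  "$HELM" uninstall "$release" -n "$namespace"
}
```

For example, free_release_name my-release qsh-test, where my-release stands for the release name reported by helm ls --all-namespaces.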

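3. Pod cannot be deleted

When deleting a namespace, pod, or other Kubernetes resource, the resource sometimes gets stuck in Terminating for a long time, and occasionally even the --force flag (forced deletion) does not remove it. In that case, edit the resource and set its finalizers field to null; Kubernetes then deletes the resource normally.

When a pod is deleted, it sometimes hangs: the pod status changes to Terminating and the pod cannot be removed.

Force deletion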

kubectl delete pod xxx -n xxx --force --grace-period=0

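If force deletion still does not work, set finalizers to empty

(If a container is already running and some of its properties need to change, but you do not want to delete it or cannot conveniently update it with replace, Kubernetes also offers a way to modify the object in place while it runs: the patch command.)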

kubectl patch pod xxx -n xxx -p '{"metadata":{"finalizers":null}}'

The pod can then be deleted.
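
The two steps above can be chained into one helper. This is a sketch under assumptions: the name force_remove_pod and the KUBECTL override variable are invented here, and --wait=false is added so that the delete call returns instead of blocking on a stuck pod.

```shell
# KUBECTL can be overridden (for example KUBECTL=echo) for a dry run.
KUBECTL=${KUBECTL:-kubectl}

force_remove_pod() {
  pod=$1
  namespace=$2
  # Step 1: skip the graceful shutdown period and do not wait.
  "$KUBECTL" delete pod "$pod" -n "$namespace" --force --grace-period=0 --wait=false
  # Stop here if the pod is already gone.
  "$KUBECTL" get pod "$pod" -n "$namespace" >/dev/null 2>&1 || return 0
  # Step 2: clear the finalizers so the API server can drop the object.
  "$KUBECTL" patch pod "$pod" -n "$namespace" -p '{"metadata":{"finalizers":null}}'
}
```

Clearing finalizers skips whatever cleanup they were guarding, so it is best kept as a last resort.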

4. Namespace cannot be deleted

unable to create new content in namespace posthog because it is being terminated

Symptom:

## The namespace stays in the Terminating state
[ec2-user@eks posthog]$ kubectl get ns -owide
NAME                       STATUS        AGE
default                    Active        3d
kube-node-lease            Active        3d
kube-public                Active        3d
kube-system                Active        3d
posthog                    Terminating   3h23m

## The force delete command hangs indefinitely
[ec2-user@eks posthog]$ kubectl delete ns posthog --grace-period=0 --force
warning: Immediate deletion does not wait for confirmation that the running resource has been terminated. The resource may continue to run on the cluster indefinitely.
namespace "posthog" force deleted

Fix:

## Dump the definition of the posthog namespace
kubectl get ns posthog -o json > ns-posthog.json

## Remove the contents of spec
### Before the edit:
    "spec": {
        "finalizers": [
            "kubernetes"
        ]
    },

### After the edit:
    "spec": {
    },

## In a new terminal, run kubectl proxy to serve an API proxy on local port 8081
kubectl proxy --port=8081

## Submit the edited JSON to the finalize endpoint with curl
curl -k -H "Content-Type:application/json" -X PUT --data-binary @ns-posthog.json http://127.0.0.1:8081/api/v1/namespaces/posthog/finalize

## Check again; the namespace is gone
kubectl get ns
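
The manual edit of ns-posthog.json can be scripted. The sketch below empties every "finalizers" array instead of deleting the key, which has the same effect. It assumes the pretty-printed layout that kubectl emits (the opening bracket ends its line, the closing bracket sits on its own line); the function name strip_finalizers is illustrative, and a real JSON tool would be more robust where one is available.

```shell
# Read Namespace JSON on stdin and print it with each "finalizers"
# array emptied. Assumes kubectl's pretty-printed layout.
strip_finalizers() {
  awk '
    skipping {
      # Inside the array: drop entries until the closing bracket,
      # keeping a trailing comma if the original had one.
      if ($0 ~ /^[ \t]*\],?[ \t]*$/) {
        suffix = ($0 ~ /,/) ? "," : ""
        print held suffix
        skipping = 0
      }
      next
    }
    /"finalizers":[ \t]*\[[ \t]*$/ {
      held = $0
      sub(/\[[ \t]*$/, "[]", held)
      skipping = 1
      next
    }
    { print }
  '
}
```

Usage, feeding the result into the curl step above: kubectl get ns posthog -o json | strip_finalizers > ns-posthog.json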

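5. PV cannot be deleted

The K8s cluster contains a PV that is no longer used. Its Pod and PVC have already been deleted and a delete command has been issued against the PV, yet it cannot be removed and stays in the Terminating state.

Fix:

## Run the following command to force deletion (replace efs-pv with the name of the PV to delete):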
kubectl patch pv efs-pv -p '{"metadata":{"finalizers":null}}'

## Check again; the PV has been deleted

6. Error when creating nginx-ingress-controller

Error: INSTALLATION FAILED: rendered manifests contain a resource that already exists. Unable to continue with install: IngressClass "nginx" in namespace "" exists and cannot be imported into the current release: invalid ownership metadata; annotation validation error: missing key "meta.helm.sh/release-name": must be set to "k8s-nginx"; annotation validation error: missing key "meta.helm.sh/release-namespace": must be set to "nginx-ingress-controller"
helm.go:84: [debug] IngressClass "nginx" in namespace "" exists and cannot be imported into the current release: invalid ownership metadata; annotation validation error: missing key "meta.helm.sh/release-name": must be set to "k8s-nginx"; annotation validation error: missing key "meta.helm.sh/release-namespace": must be set to "nginx-ingress-controller"
rendered manifests contain a resource that already exists. Unable to continue with install

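Cause: the error occurs when creating nginx-ingress-controller with helm, because an IngressClass named "nginx" already exists in the cluster and does not belong to this release.

Check the values.yaml file in the helm chart repository

#... ... ...
## Look at the following fields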
ingressClassResource:
  name: nginx
  enabled: true
  default: false
  controllerClass: "k8s.io/ingress-nginx"
  parameters: {}
#... ... ...

Fix:

helm install k8s-nginx mynginx/nginx-ingress-controller -n nginx-ingress-controller --create-namespace --set ingressClassResource.name="nginx-new"

If that has no effect, use the following command instead:

helm install k8s-nginx mynginx/nginx-ingress-controller -n nginx-ingress-controller --create-namespace --set controller.ingressClassResource.name="nginx-new"
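
If the existing IngressClass should be reused under its current name, another option is to add the ownership metadata that the error message asks for, so that Helm can adopt the resource. This is a sketch under assumptions: to my understanding Helm 3.2 and later adopt resources that carry these two annotations plus the managed-by label, and the function name adopt_ingressclass and the KUBECTL override variable are invented for illustration. Do not do this if another ingress controller still owns the class.

```shell
# KUBECTL can be overridden (for example KUBECTL=echo) for a dry run.
KUBECTL=${KUBECTL:-kubectl}

# Mark an existing IngressClass as belonging to a Helm release.
adopt_ingressclass() {
  class=$1
  release=$2
  namespace=$3
  "$KUBECTL" annotate ingressclass "$class" \
    "meta.helm.sh/release-name=$release" \
    "meta.helm.sh/release-namespace=$namespace" --overwrite
  "$KUBECTL" label ingressclass "$class" app.kubernetes.io/managed-by=Helm --overwrite
}
```

With the names from the error above this would be adopt_ingressclass nginx k8s-nginx nginx-ingress-controller, followed by the original helm install without the --set override.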

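Problems encountered while deploying an etcd cluster

1. etcd.service fails to start and reports a problem with --logger=zap

Fix: edit the configuration file, remove that parameter, and restart the service.

2. publish error: etcdserver: request timed out, caused by the etcd cluster members not being started at the same time

Fix: on every node where etcd is deployed, start the etcd service at the same time with systemctl start etcd.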

3. error #1: dial tcp 127.0.0.1:2379: connect: connection refused, because the ETCD_LISTEN_CLIENT_URLS parameter does not include 127.0.0.1:2379

Fix: add https://127.0.0.1:2379 to ETCD_LISTEN_CLIENT_URLS, or change the listen address to 0.0.0.0:2379.
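
A quick way to check whether a given ETCD_LISTEN_CLIENT_URLS value already covers the loopback address is sketched below. The function name listens_on_loopback is illustrative, and the check is a plain string match on port 2379, not a full URL parser.

```shell
# Succeed if a comma-separated URL list, as used in
# ETCD_LISTEN_CLIENT_URLS, includes 127.0.0.1:2379 or 0.0.0.0:2379.
listens_on_loopback() {
  case ",$1," in
    *://127.0.0.1:2379,* | *://0.0.0.0:2379,*) return 0 ;;
    *) return 1 ;;
  esac
}
```

For example, listens_on_loopback "https://172.16.0.10:2379,https://127.0.0.1:2379" succeeds, while a list containing only the node address fails; 172.16.0.10 is a made-up node address.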

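4. error #1: dial tcp 127.0.0.1:4001: connect: connection refused, because older versions expect etcd to listen on port 4001

Fix: add a listener on port 4001 to the ETCD_LISTEN_CLIENT_URLS and ETCD_ADVERTISE_CLIENT_URLS parameters.

5. error #1: net/http: HTTP/1.x transport connection broken: malformed HTTP response "\x15\x03\x01\x00\x02\x02", because an address in the configuration was written as http:// while the endpoint serves TLS

Fix: change the address to https://.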

Errors encountered with kube-apiserver.service

1. error: unable to find suitable network address.error='no default routes found in "/proc/net/route" or "/proc/net/ipv6_route"'. Tr… to fix this, caused by a missing default gateway route

Fix:

route add default gw 172.16.0.1
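
Before and after adding the gateway, the presence of an IPv4 default route can be checked against the same file that the error message mentions. This is a minimal sketch; the function name has_default_route is illustrative. In /proc/net/route a default route has both the Destination and the Mask columns set to 00000000.

```shell
# Succeed if the routing table file (default: /proc/net/route)
# contains an IPv4 default route.
has_default_route() {
  route_file=${1:-/proc/net/route}
  awk 'NR > 1 && $2 == "00000000" && $8 == "00000000" { found = 1 }
       END { exit !found }' "$route_file"
}
```

On systems without the legacy route command, ip route add default via 172.16.0.1 should be the iproute2 equivalent of the command above.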

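2. error: --etcd-servers must be specified

Fix: sudo journalctl -xe -u kube-apiserver | more shows the surrounding log, but nothing appears beyond error: --etcd-servers must be specified. Running the command from the systemd unit by hand, to check whether the configuration is wrong, starts the service normally. That suggests the configuration file contains a stray character; after rewriting the configuration, the service starts normally.

3. watch chan error: etcdserver: mvcc: required revision has been compacted, caused by an etcd version mismatch; it does not affect functionality

Fix: install the matching version of etcd.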

Errors encountered while deploying kubelet and kube-proxy

1. failed complete: v1alpha1.KubeProxyConfiguration.ClientConnection

failed complete: v1alpha1.KubeProxyConfiguration.ClientConnection: readObjectStart: expect { or n, but found ", error found in #10 byte of …|nection":"kubeconfig|…, bigger context …|pha1","bindAddress":"0.0.0.0","clientConnection":"kubeconfig:/data/kubernetes/cfg/kube-proxy.kubecon|…

Fix: check that the yml file is well formed. In a yml configuration file, ":" and "-" must always be followed by a space!
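
A missing space after a colon can be hard to spot by eye, so a rough scan may help. The sketch below is a heuristic and not a YAML parser: it only flags lines where a plain mapping key is followed directly by a value, and the function name find_missing_space_after_colon is illustrative.

```shell
# Print, with line numbers, every line whose mapping key is followed
# by a colon with no space after it, e.g. "kubeconfig:/some/path".
find_missing_space_after_colon() {
  grep -nE '^[[:space:]]*(-[[:space:]]+)?[A-Za-z0-9_.-]+:[^[:space:]]' "$1"
}
```

Run it against the kube-proxy configuration file; any line it prints is a candidate for the error above, and no output (exit status 1) means this particular mistake was not found.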

2. network plugin is not ready: cni config uninitialized

Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized, caused by the cni plugin not being installed

Fix: edit the kubelet.conf configuration file and remove the --network-plugin=cni parameter, then restart the service. Alternatively, download and install the cni plugin.
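
To tell whether a cni configuration is present at all, the configuration directory can be inspected. This sketch assumes the conventional default directory /etc/cni/net.d; the function name cni_configured is illustrative.

```shell
# Succeed if the CNI configuration directory holds at least one
# config file (.conf, .conflist or .json).
cni_configured() {
  conf_dir=${1:-/etc/cni/net.d}
  for f in "$conf_dir"/*.conf "$conf_dir"/*.conflist "$conf_dir"/*.json; do
    [ -e "$f" ] && return 0
  done
  return 1
}
```

If cni_configured fails on the node, the network plugin has either not been installed or has not yet written its configuration.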

3. failed to get imageFs info: unable to find data in memory cache

Found in the error log: E0927 15:38:12.475997 16586 kubelet.go:1308] Image garbage collection failed once. Stats initialization may not have completed yet: failed to get imageFs info: unable to find data in memory cache

Fix: