K8S使用过程遇到的问题

1、The connection to the server localhost:8080 was refused - did you specify the right host or port?

问题分析

环境变量 原因:kubernetes master没有与本机绑定,集群初始化的时候没有绑定,此时设置在本机的环境变量即可解决问题。

解决: 步骤一:设置环境变量

##具体根据情况,此处记录linux设置该环境变量
ll /etc/kubernetes/kubelet.conf
-rw------- 1 root root 1906 1月  15 09:52 /etc/kubernetes/kubelet.conf

##方式一:编辑文件设置
vim /etc/profile
>>在底部增加新的环境变量 export KUBECONFIG=/etc/kubernetes/kubelet.conf

##方式二:直接追加文件内容
echo "export KUBECONFIG=/etc/kubernetes/kubelet.conf" >> /etc/profile

步骤二:使生效

source /etc/profile

AWS EKS 解决方法:

aws eks update-kubeconfig --region cn-north-1 --name <cluster_name>

2、INSTALLATION FAILED: cannot re-use a name that is still in use

#执行helm安装时报错 Error: INSTALLATION FAILED: cannot re-use a name that is still in use

解决:

helm ls --all-namespaces
kubectl delete namespace qsh-test
kubectl create namespace qsh-test

3、Pod无法删除

每当删除namespace或pod 等一些Kubernetes资源时,有时资源状态会卡在terminating,很长时间无法删除,甚至有时增加--force flag(强制删除)之后还是无法正常删除。这时就需要edit该资源,将字段finalizers设置为null,之后Kubernetes资源就正常删除了。

当删除pod时有时会卡住,pod状态变为terminating,无法删除pod

强制删除

kubectl delete pod xxx -n xxx --force --grace-period=0

如果强制删除还不行,设置finalizers为空

(如果一个容器已经在运行,这时需要对一些容器属性进行修改,又不想删除容器,或不方便通过replace的方式进行更新。kubernetes还提供了一种在容器运行时,直接对容器进行修改的方式,就是patch命令。)

kubectl patch pod xxx -n xxx -p '{"metadata":{"finalizers":null}}'

这样pod就可以删除了。

4、Namespace无法删除

unable to create new content in namespace posthog because it is being terminated

现象:

##命名空间一直处于Terminating状态
[ec2-user@eks posthog]$ kubectl get ns -owide
NAME                       STATUS        AGE
default                    Active        3d
kube-node-lease            Active        3d
kube-public                Active        3d
kube-system                Active        3d
posthog                    Terminating   3h23m

##执行强制删除命令会一直卡住
[ec2-user@eks posthog]$ kubectl delete ns posthog --grace-period=0 --force
warning: Immediate deletion does not wait for confirmation that the running resource has been terminated. The resource may continue to run on the cluster indefinitely.
\namespace "posthog" force deleted

解决:

##查看posthog的命名空间描述
kubectl get ns posthog -o json > ns-posthog.json

##删除spec
###删除前内容如下:
    "spec": {
        "finalizers": [
            "kubernetes"
        ]
    },

###删除后内容如下:
    "spec": {
    },

##打开一个新窗口运行kubectl proxy跑一个API代理在本地的8081端口
kubectl proxy --port=8081

##curl删除
curl -k -H "Content-Type:application/json" -X PUT --data-binary @ns-posthog.json http://127.0.0.1:8081/api/v1/namespaces/posthog/finalize

##重新检查,发现已删除
kubectl get ns

5、PV无法删除

K8s 集群内有一个已经不再使用的 PV,虽然已经删除了与其关联的 Pod 及 PVC,并对其执行了删除命令,但仍无法正常删除,一直处于 Terminating 状态:

image.png

解决方法:

##执行如下命令强制删除(efs-pv 替换成实际需要删除的 pv 名称):
kubectl patch pv efs-pv -p '{"metadata":{"finalizers":null}}'

##再次查看可以发现该 pv 已被删除

6、创建nginx-ingress-controller 出错

Error: INSTALLATION FAILED: rendered manifests contain a resource that already exists. Unable to continue with install: IngressClass "nginx" in namespace "" exists and cannot be imported into the current release: invalid ownership metadata; annotation validation error: missing key "meta.helm.sh/release-name": must be set to "k8s-nginx"; annotation validation error: missing key "meta.helm.sh/release-namespace": must be set to "nginx-ingress-controller" helm.go:84: [debug] IngressClass "nginx" in namespace "" exists and cannot be imported into the current release: invalid ownership metadata; annotation validation error: missing key "meta.helm.sh/release-name": must be set to "k8s-nginx"; annotation validation error: missing key "meta.helm.sh/release-namespace": must be set to "nginx-ingress-controller" rendered manifests contain a resource that already exists. Unable to continue with install

原因分析: 使用 helm 创建nginx-ingress-controller时出错

查看helm chart仓库values.yaml文件

#... ... ...
##查看以下字段
ingressClassResource:
  name: nginx
  enabled: true
  default: false
  controllerClass: "k8s.io/ingress-nginx"
  parameters: {}
#... ... ...

解决方法:

helm install k8s-nginx mynginx/nginx-ingress-controller -n nginx-ingress-controller --create-namespace --set ingressClassResource.name="nginx-new"

如果没生效,使用以下命令:

helm install k8s-nginx mynginx/nginx-ingress-controller -n nginx-ingress-controller --create-namespace --set controller.ingressClassResource.name="nginx-new"

etcd集群部署遇到的问题

1、etcd.serverice启动报错,显示–logger=zap有问题

解决方法: 修改配置文件,去掉该参数,重新启动服务

2、publish error: etcdserver: request timed out,由于etcd集群没有同时启动导致

解决方法: 在部署了etcd的节点上,同时启动etcd服务systemctl start etcd

3、error #1: dial tcp 127.0.0.1:2379: connect: connection refused,由于参数ETCD_LISTEN_CLIENT_URLS没有将172.0.0.1:2379包含在内

解决方法: ETCD_LISTEN_CLIENT_URLS添加https://172.0.0.1:2379或者直接改成0.0.0.0:2379

4、error #1: dial tcp 127.0.0.1:4001: connect: connection refused,由于低版本的peer的监听端口是否4001

解决方法: ETCD_LISTEN_CLIENT_URLS和ETCD_ADVERTISE_CLIENT_URLS参数上配置4001的监听端口

5、error #1: net/http: HTTP/1.x transport connection broken: malformed HTTP response “\x15\x03\x01\x00\x02\x02”,由于配置信息监听地址写成了http://

解决方法: 将监听地址改成https://

kube-apiserver.service 遇到的错误

1、error: unable to find suitable network address.error=‘no default routes found in “/proc/net/route” or “/proc/net/ipv6_route”’. Tr… to fix this,由于没有配置网关路由问题

解决方法:

route add default gw 172.16.0.1

2、error: --etcd-servers must be specified

解决:sudo journalctl -xe -u kube-apiserver | more通过查看更多错误信息,除了error: --etcd-servers must be specified错误提示外无其他错误信息,通过手动执行system unit检查是否配置有误,手动能正常启动,说明配置文件可能存在字符错误,重新写入配置后,启动正常

3、watch chan error: etcdserver: mvcc: required revision has been compacted,由于etcd的版本问题导致的,不影响功能的使用

解决方法: 可以安装对应版本的etcd

kubelet和kube-proxy 部署遇到的错误

1、failed complete: v1alpha1.KubeProxyConfiguration.ClientConnection

failed complete: v1alpha1.KubeProxyConfiguration.ClientConnection: readObjectStart: expect { or n, but found “, error found in #10 byte of …|nection”:“kubeconfig|…, bigger context …|pha1”,“bindAddress”:“0.0.0.0”,“clientConnection”:"kubeconfig:/data/kubernetes/cfg/kube-proxy.kubecon|…

解决方法: 检查yml文件格式是否正确,yml配置文件遇到":“或者”-"后面必须留一个空格!

2、network plugin is not ready: cni config uninitialized

Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized,由于没有插件cni

解决方法: 修改kubelet.conf配置文件去掉相关配置参数–network-plugin=cni,重启服务即可或者下在安装cni插件

3、failed to get imageFs info: unable to find data in memory cache

在错误日志中发现:E0927 15:38:12.475997 16586 kubelet.go:1308] Image garbage collection failed once. Stats initialization may not have completed yet: failed to get imageFs info: unable to find data in memory cache

解决方法:

yum -y upgrade systemd

4、failed to run Kubelet: Running with swap on is not supported, please disable swap! or set --fail-swap-on flag to f…

解决方法: node节点上没有关闭交换分区,临时关闭的swapoff -a,最好就是永久关闭

————————————————