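With Istio CNI enabled, every auto-injected pod starts an istio-validation init container that checks whether the pod's network setup works. While setting up a test environment for another of our company's business lines, I found that the istio-validation container would not start. Its log output was: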

Error connecting to 127.0.0.6:15002: dial tcp 127.0.0.1:0->127.0.0.6:15002: connect: connection refused

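A refused connection here means the validation traffic was not being redirected, so the pod's iptables rules had never been set up. After a lot of troubleshooting, I checked the system log with journalctl -ex: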

Nov 02 14:50:30 k8s-worker-03 kubelet[1029]: W1102 14:50:30.291177    1029 cni.go:202] Error validating CNI config list {
Nov 02 14:50:30 k8s-worker-03 kubelet[1029]: "name": "cbr0",
Nov 02 14:50:30 k8s-worker-03 kubelet[1029]: "cniVersion": "0.3.1",
Nov 02 14:50:30 k8s-worker-03 kubelet[1029]: "plugins": [
Nov 02 14:50:30 k8s-worker-03 kubelet[1029]: {
Nov 02 14:50:30 k8s-worker-03 kubelet[1029]: "type": "flannel",
Nov 02 14:50:30 k8s-worker-03 kubelet[1029]: "delegate": {
Nov 02 14:50:30 k8s-worker-03 kubelet[1029]: "hairpinMode": true,
Nov 02 14:50:30 k8s-worker-03 kubelet[1029]: "isDefaultGateway": true
Nov 02 14:50:30 k8s-worker-03 kubelet[1029]: }
Nov 02 14:50:30 k8s-worker-03 kubelet[1029]: },
Nov 02 14:50:30 k8s-worker-03 kubelet[1029]: {
Nov 02 14:50:30 k8s-worker-03 kubelet[1029]: "type": "portmap",
Nov 02 14:50:30 k8s-worker-03 kubelet[1029]: "capabilities": {
Nov 02 14:50:30 k8s-worker-03 kubelet[1029]: "portMappings": true
Nov 02 14:50:30 k8s-worker-03 kubelet[1029]: }
Nov 02 14:50:30 k8s-worker-03 kubelet[1029]: },
Nov 02 14:50:30 k8s-worker-03 kubelet[1029]: {
Nov 02 14:50:30 k8s-worker-03 kubelet[1029]: "cniVersion": "0.3.1",
Nov 02 14:50:30 k8s-worker-03 kubelet[1029]: "name": "istio-cni",
Nov 02 14:50:30 k8s-worker-03 kubelet[1029]: "type": "istio-cni",
Nov 02 14:50:30 k8s-worker-03 kubelet[1029]: "log_level": "info",
Nov 02 14:50:30 k8s-worker-03 kubelet[1029]: "kubernetes": {
Nov 02 14:50:30 k8s-worker-03 kubelet[1029]: "kubeconfig": "/etc/cni/net.d/ZZZ-istio-cni-kubeconfig",
Nov 02 14:50:30 k8s-worker-03 kubelet[1029]: "cni_bin_dir": "/opt/cni/bin",
Nov 02 14:50:30 k8s-worker-03 kubelet[1029]: "exclude_namespaces": [
Nov 02 14:50:30 k8s-worker-03 kubelet[1029]: "istio-system",
Nov 02 14:50:30 k8s-worker-03 kubelet[1029]: "kube-system"
Nov 02 14:50:30 k8s-worker-03 kubelet[1029]: ]
Nov 02 14:50:30 k8s-worker-03 kubelet[1029]: }
Nov 02 14:50:30 k8s-worker-03 kubelet[1029]: }
Nov 02 14:50:30 k8s-worker-03 kubelet[1029]: ]
Nov 02 14:50:30 k8s-worker-03 kubelet[1029]: }
Nov 02 14:50:30 k8s-worker-03 kubelet[1029]: : [failed to find plugin "istio-cni" in path [/opt/kube/bin]]
Nov 02 14:50:30 k8s-worker-03 kubelet[1029]: W1102 14:50:30.291194    1029 cni.go:237] Unable to update cni config: no valid networks found in /etc/cni/net.d

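The root cause is a mismatch between the CNI binary directory configured for istio-cni and the one kubelet uses. The istio-cni DaemonSet installs the istio-cni binary into its configured directory, which is /opt/cni/bin by default. Kubelet only searches /opt/kube/bin, so it never invokes the plugin when pods are created, and the iptables rules that redirect pod traffic are never created.

This mismatch is most likely on clusters deployed with third-party tooling. For example, a Kubernetes cluster deployed with Ansible defaults to /opt/kube/bin as its CNI binary directory, while Istio defaults to /opt/cni/bin. You can find the configured path either in the configmap or in the logs of the istio-cni pods.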
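To make the failure mode concrete, here is a minimal Python sketch of the kind of lookup behind the "failed to find plugin" error above. The plugin "type" from the CNI config is treated as a file name, and only the configured bin directories are searched. This is an illustration of the idea, not kubelet's actual code. find_plugin is a hypothetical helper, and the temporary directories stand in for /opt/cni/bin and /opt/kube/bin.

```python
import os
import tempfile
from typing import Sequence


def find_plugin(plugin: str, bin_dirs: Sequence[str]) -> str:
    """Hypothetical helper: return <dir>/<plugin> for the first dir that has it.

    Only the directories passed in are searched, so a binary installed in any
    other directory is invisible, which is the situation the kubelet log shows.
    """
    for d in bin_dirs:
        candidate = os.path.join(d, plugin)
        if os.path.isfile(candidate):
            return candidate
    raise FileNotFoundError(f'failed to find plugin "{plugin}" in path {list(bin_dirs)}')


if __name__ == "__main__":
    # Temporary stand-ins for /opt/cni/bin (istio-cni default) and /opt/kube/bin (kubelet).
    with tempfile.TemporaryDirectory() as istio_bin, tempfile.TemporaryDirectory() as kube_bin:
        # Simulate the DaemonSet installing the binary into its own default dir.
        open(os.path.join(istio_bin, "istio-cni"), "w").close()

        try:
            find_plugin("istio-cni", [kube_bin])  # kubelet searches only its own dir
        except FileNotFoundError as err:
            print("kubelet view:", err)

        # Once the directories agree, the same lookup succeeds.
        print("aligned view:", os.path.basename(find_plugin("istio-cni", [istio_bin])))
```

The binary is present on the node in both cases. Only the directory being searched decides whether the plugin is found.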

Solutions:

Option 1:

Edit the YAML file used to deploy Istio and add cniBinDir: <your path>, as described in the official documentation:

  cni:
    excludeNamespaces:
      - istio-system
      - kube-system
    logLevel: info
    cniBinDir: /opt/kube/bin
    repair:
      enabled: true
      deletePods: false

Alternatively, if you install from the command line, add the --set values.cni.cniBinDir=... and --set values.cni.cniConfDir=... options.

Option 2:

Edit the configmap named istio-cni-config in the istio-system namespace. Find cniBinDir, change it to the correct path, and then recreate all the pods.

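The examples above cover only a wrong bin directory. In other environments, cniConfDir may be the wrong one instead, and the fix is the same: set it to the correct path.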
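Either directory can be the wrong one, so a quick pre-flight check on a node can help you decide which setting to fix. The sketch below assumes the standard CNI layout: a *.conflist JSON file with a "plugins" array, where each entry's "type" names a binary. It reports any plugin types that have no matching file in the given bin dir. Single-plugin *.conf files are ignored. missing_plugins is a hypothetical helper. Pass in the directories your kubelet is actually configured with, for example /etc/cni/net.d and /opt/kube/bin from the log above.

```python
import glob
import json
import os
import sys


def missing_plugins(conf_dir: str, bin_dir: str) -> dict:
    """Hypothetical pre-flight check: map each *.conflist in conf_dir to the
    plugin types it references that have no matching file in bin_dir."""
    report = {}
    for conf_path in sorted(glob.glob(os.path.join(conf_dir, "*.conflist"))):
        with open(conf_path) as f:
            conf = json.load(f)
        # Each entry's "type" is the executable name looked up in bin_dir.
        types = [p.get("type", "") for p in conf.get("plugins", [])]
        absent = [t for t in types if not os.path.isfile(os.path.join(bin_dir, t))]
        if absent:
            report[os.path.basename(conf_path)] = absent
    return report


if __name__ == "__main__":
    # Usage: python check_cni.py <cni conf dir> <cni bin dir>
    if len(sys.argv) == 3:
        for name, absent in missing_plugins(sys.argv[1], sys.argv[2]).items():
            print(f"{name}: missing {absent}")
```

In the broken state described here, istio-cni should show up as missing from /opt/kube/bin. Once cniBinDir (or cniConfDir) points at the right place, the report should come back empty.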