1准备工作

关闭SELinux和防火墙

这样做的目的是为了允许容器访问主机文件系统,让 Pod 网络工作正常。

systemctl stop firewalld && systemctl disable firewalld


sudo setenforce 0
sudo sed -i 's/^SELINUX=enforcing$/SELINUX=permissive/' /etc/selinux/config

关闭swap

swapoff -a
sed -i 's/^.*centos-swap/#&/g' /etc/fstab

内核参数修改

# 激活 br_netfilter 模块
modprobe br_netfilter
cat << EOF > /etc/modules-load.d/k8s.conf
br_netfilter
EOF

# 内核参数设置:开启IP转发,允许iptables对bridge的数据进行处理
cat << EOF > /etc/sysctl.d/k8s.conf 
net.ipv4.ip_forward = 1
net.bridge.bridge-nf-call-iptables = 1
net.bridge.bridge-nf-call-ip6tables = 1
EOF

# 立即生效
sysctl --system

修改hostname

hostnamectl set-hostname kube-master-1

配置hosts

在所有master和node节点上配置hosts映射。例如

echo "192.168.252.79 kube-master-1" >> /etc/hosts
echo "192.168.252.88 node1" >> /etc/hosts
echo "192.168.252.89 node2" >> /etc/hosts

集群时间同步

可以使用chrony,此处假定各节点时间是同步的

centos8需要安装iproute-tc

dnf install -y iproute-tc

2安装Docker

安装

yum install -y yum-utils device-mapper-persistent-data lvm2
yum-config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo
yum install -y docker-ce
systemctl enable docker && systemctl start docker

配置

Kubernetes 推荐使用 systemd 来代替 cgroupfs。因为 docker 容器默认 cgroup 驱动为 cgroupfs,而 kubelet 默认为 systemd,所以为了使系统更为稳定,容器和 kubelet 应该都使用 systemd 作为 cgroup 驱动。

如果没有该文件,创建即可

cat << EOF > /etc/docker/daemon.json
{
  "registry-mirrors": ["https://registry.docker-cn.com"],
  "exec-opts": ["native.cgroupdriver=systemd"]
}
EOF

systemctl daemon-reload
systemctl restart docker

3安装kubeadm及相关工具

配置yum源

cat << EOF > /etc/yum.repos.d/kubernetes.repo
[kubernetes]
name=Kubernetes
baseurl=https://mirrors.aliyun.com/kubernetes/yum/repos/kubernetes-el7-x86_64/
enabled=1
gpgcheck=1
repo_gpgcheck=1
gpgkey=https://mirrors.aliyun.com/kubernetes/yum/doc/yum-key.gpg https://mirrors.aliyun.com/kubernetes/yum/doc/rpm-package-key.gpg
EOF

安装并配置自动补全

yum install -y kubelet kubeadm kubectl
systemctl enable kubelet && systemctl start kubelet

#自动补全
kubeadm completion bash > /etc/bash_completion.d/kubeadm
kubectl completion bash >/etc/bash_completion.d/kubectl
source /etc/bash_completion.d/kubeadm
source /etc/bash_completion.d/kubectl

4安装master

下载kubernetes相关镜像

#查看需要下载的镜像列表
kubeadm config images list

将需要下载:
registry.k8s.io/kube-apiserver:v1.26.0
registry.k8s.io/kube-controller-manager:v1.26.0
registry.k8s.io/kube-scheduler:v1.26.0
registry.k8s.io/kube-proxy:v1.26.0
registry.k8s.io/pause:3.9
registry.k8s.io/etcd:3.5.6-0
registry.k8s.io/coredns/coredns:v1.9.3

安装

kubeadm init --image-repository=registry.aliyuncs.com/google_containers  --kubernetes-version=v1.26.0 --service-cidr=10.1.0.0/16 --pod-network-cidr=10.244.0.0/16

选项说明:
–image-repository:选择用于拉取镜像的镜像仓库(默认为“k8s.gcr.io” )
–kubernetes-version:选择特定的Kubernetes版本(默认为“stable-1”)
–service-cidr:为服务的VIP指定使用的IP地址范围(默认为“10.96.0.0/12”)
–pod-network-cidr:指定Pod网络的IP地址范围。如果设置,则将自动为每个节点分配CIDR。
–control-plane-endpoint=xxx.xxx.xxx.xxx:6443 。指定api-server地址,如果是公网,最好指定

注:
因为后面要部署 flannel,参照flannel文档,如果想不改flannel配置,此处pod网络范围需配置为10.244.0.0/16(和falnnel默认的一致)

如果报错
[ERROR CRI]: container runtime is not running: output: time=“2020-09-24T11:49:16Z” level=fatal msg=“getting status of runtime failed: rpc error: code = Unimplemented desc = unknown service runtime.v1alpha2.RuntimeService”
, error: exit status 1
执行如下命令,再重试

rm /etc/containerd/config.toml
systemctl restart containerd

报错失败之后

解决问题之后,reset kubeadm再重试

kubeadm reset

日志查看

tail -f  /var/log/messages

安装的yaml文件位置

/etc/kubernetes/manifests/

遇到的问题

通过 tail -f /var/log/messages 命令查看到报错:
Error syncing pod, skipping" err=“failed to “CreatePodSandbox” for “kube-controller-manager-kube-master-1_kube-system(9609d43a045e55d723a6a6e70d0b050f)” with CreatePodSandboxError: “Failed to create sandbox for pod \“kube-controller-manager-kube-master-1_kube-system(9609d43a045e55d723a6a6e70d0b050f)\”: rpc error: code = Unknown desc = failed to get sandbox image \“registry.k8s.io/pause:3.6\”: failed to pull image \“registry.k8s.io/pause:3.6\”: failed to pull and unpack image \“registry.k8s.io/pause:3.6\”: failed to resolve reference \“registry.k8s.io/pause:3.6\”: failed to do request: Head \“https://asia-east1-docker.pkg.dev/v2/k8s-artifacts-prod/images/pause/manifests/3.6\”: dial tcp 108.177.125.82:443: i/o timeout”” pod=“kube-system/kube-controller-manager-kube-master-1” podUID=9609d43a045e55d723a6a6e70d0b050f

在拉取pause镜像是,超时了。 这个是CRI containerd 报的错,所以改docker的镜像地址不管用,需要修改/etc/containerd/config.toml文件

解决办法

生成默认配置

containerd config default > /etc/containerd/config.toml

覆盖配置文件
参考

disabled_plugins = []
imports = []
oom_score = 0
plugin_dir = ""
required_plugins = []
root = "/var/lib/containerd"
state = "/run/containerd"
temp = ""
version = 2

[cgroup]
  path = ""

[debug]
  address = ""
  format = ""
  gid = 0
  level = ""
  uid = 0

[grpc]
  address = "/run/containerd/containerd.sock"
  gid = 0
  max_recv_message_size = 16777216
  max_send_message_size = 16777216
  tcp_address = ""
  tcp_tls_ca = ""
  tcp_tls_cert = ""
  tcp_tls_key = ""
  uid = 0

[metrics]
  address = ""
  grpc_histogram = false

[plugins]

  [plugins."io.containerd.gc.v1.scheduler"]
    deletion_threshold = 0
    mutation_threshold = 100
    pause_threshold = 0.02
    schedule_delay = "0s"
    startup_delay = "100ms"

  [plugins."io.containerd.grpc.v1.cri"]
    device_ownership_from_security_context = false
    disable_apparmor = false
    disable_cgroup = false
    disable_hugetlb_controller = true
    disable_proc_mount = false
    disable_tcp_service = true
    enable_selinux = false
    enable_tls_streaming = false
    enable_unprivileged_icmp = false
    enable_unprivileged_ports = false
    ignore_image_defined_volumes = false
    max_concurrent_downloads = 3
    max_container_log_line_size = 16384
    netns_mounts_under_state_dir = false
    restrict_oom_score_adj = false
    sandbox_image = "registry.aliyuncs.com/google_containers/pause:3.9"
    selinux_category_range = 1024
    stats_collect_period = 10
    stream_idle_timeout = "4h0m0s"
    stream_server_address = "127.0.0.1"
    stream_server_port = "0"
    systemd_cgroup = false
    tolerate_missing_hugetlb_controller = true
    unset_seccomp_profile = ""

    [plugins."io.containerd.grpc.v1.cri".cni]
      bin_dir = "/opt/cni/bin"
      conf_dir = "/etc/cni/net.d"
      conf_template = ""
      ip_pref = ""
      max_conf_num = 1

    [plugins."io.containerd.grpc.v1.cri".containerd]
      default_runtime_name = "runc"
      disable_snapshot_annotations = true
      discard_unpacked_layers = false
      ignore_rdt_not_enabled_errors = false
      no_pivot = false
      snapshotter = "overlayfs"

      [plugins."io.containerd.grpc.v1.cri".containerd.default_runtime]
        base_runtime_spec = ""
        cni_conf_dir = ""
        cni_max_conf_num = 0
        container_annotations = []
        pod_annotations = []
        privileged_without_host_devices = false
        runtime_engine = ""
        runtime_path = ""
        runtime_root = ""
        runtime_type = ""

        [plugins."io.containerd.grpc.v1.cri".containerd.default_runtime.options]

      [plugins."io.containerd.grpc.v1.cri".containerd.runtimes]

        [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]
          base_runtime_spec = ""
          cni_conf_dir = ""
          cni_max_conf_num = 0
          container_annotations = []
          pod_annotations = []
          privileged_without_host_devices = false
          runtime_engine = ""
          runtime_path = ""
          runtime_root = ""
          runtime_type = "io.containerd.runc.v2"

          [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
            BinaryName = ""
            CriuImagePath = ""
            CriuPath = ""
            CriuWorkPath = ""
            IoGid = 0
            IoUid = 0
            NoNewKeyring = false
            NoPivotRoot = false
            Root = ""
            ShimCgroup = ""
            SystemdCgroup = true

      [plugins."io.containerd.grpc.v1.cri".containerd.untrusted_workload_runtime]
        base_runtime_spec = ""
        cni_conf_dir = ""
        cni_max_conf_num = 0
        container_annotations = []
        pod_annotations = []
        privileged_without_host_devices = false
        runtime_engine = ""
        runtime_path = ""
        runtime_root = ""
        runtime_type = ""

        [plugins."io.containerd.grpc.v1.cri".containerd.untrusted_workload_runtime.options]

    [plugins."io.containerd.grpc.v1.cri".image_decryption]
      key_model = "node"

    [plugins."io.containerd.grpc.v1.cri".registry]
      config_path = ""

      [plugins."io.containerd.grpc.v1.cri".registry.auths]

      [plugins."io.containerd.grpc.v1.cri".registry.configs]

      [plugins."io.containerd.grpc.v1.cri".registry.headers]

      [plugins."io.containerd.grpc.v1.cri".registry.mirrors]

    [plugins."io.containerd.grpc.v1.cri".x509_key_pair_streaming]
      tls_cert_file = ""
      tls_key_file = ""

  [plugins."io.containerd.internal.v1.opt"]
    path = "/opt/containerd"

  [plugins."io.containerd.internal.v1.restart"]
    interval = "10s"

  [plugins."io.containerd.internal.v1.tracing"]
    sampling_ratio = 1.0
    service_name = "containerd"

  [plugins."io.containerd.metadata.v1.bolt"]
    content_sharing_policy = "shared"

  [plugins."io.containerd.monitor.v1.cgroups"]
    no_prometheus = false

  [plugins."io.containerd.runtime.v1.linux"]
    no_shim = false
    runtime = "runc"
    runtime_root = ""
    shim = "containerd-shim"
    shim_debug = false

  [plugins."io.containerd.runtime.v2.task"]
    platforms = ["linux/amd64"]
    sched_core = false

  [plugins."io.containerd.service.v1.diff-service"]
    default = ["walking"]

  [plugins."io.containerd.service.v1.tasks-service"]
    rdt_config_file = ""

  [plugins."io.containerd.snapshotter.v1.aufs"]
    root_path = ""

  [plugins."io.containerd.snapshotter.v1.btrfs"]
    root_path = ""

  [plugins."io.containerd.snapshotter.v1.devmapper"]
    async_remove = false
    base_image_size = ""
    discard_blocks = false
    fs_options = ""
    fs_type = ""
    pool_name = ""
    root_path = ""

  [plugins."io.containerd.snapshotter.v1.native"]
    root_path = ""

  [plugins."io.containerd.snapshotter.v1.overlayfs"]
    root_path = ""
    upperdir_label = false

  [plugins."io.containerd.snapshotter.v1.zfs"]
    root_path = ""

  [plugins."io.containerd.tracing.processor.v1.otlp"]
    endpoint = ""
    insecure = false
    protocol = ""

[proxy_plugins]

[stream_processors]

  [stream_processors."io.containerd.ocicrypt.decoder.v1.tar"]
    accepts = ["application/vnd.oci.image.layer.v1.tar+encrypted"]
    args = ["--decryption-keys-path", "/etc/containerd/ocicrypt/keys"]
    env = ["OCICRYPT_KEYPROVIDER_CONFIG=/etc/containerd/ocicrypt/ocicrypt_keyprovider.conf"]
    path = "ctd-decoder"
    returns = "application/vnd.oci.image.layer.v1.tar"

  [stream_processors."io.containerd.ocicrypt.decoder.v1.tar.gzip"]
    accepts = ["application/vnd.oci.image.layer.v1.tar+gzip+encrypted"]
    args = ["--decryption-keys-path", "/etc/containerd/ocicrypt/keys"]
    env = ["OCICRYPT_KEYPROVIDER_CONFIG=/etc/containerd/ocicrypt/ocicrypt_keyprovider.conf"]
    path = "ctd-decoder"
    returns = "application/vnd.oci.image.layer.v1.tar+gzip"

[timeouts]
  "io.containerd.timeout.bolt.open" = "0s"
  "io.containerd.timeout.shim.cleanup" = "5s"
  "io.containerd.timeout.shim.load" = "5s"
  "io.containerd.timeout.shim.shutdown" = "3s"
  "io.containerd.timeout.task.state" = "2s"

[ttrpc]
  address = ""
  gid = 0
  uid = 0

让配置文件生效

$ systemctl daemon-reload && systemctl restart containerd

安装成功后将提示如下信息:

Your Kubernetes control-plane has initialized successfully!

To start using your cluster, you need to run the following as a regular user:

  mkdir -p $HOME/.kube
  sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
  sudo chown $(id -u):$(id -g) $HOME/.kube/config

Alternatively, if you are the root user, you can run:

  export KUBECONFIG=/etc/kubernetes/admin.conf

You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
  https://kubernetes.io/docs/concepts/cluster-administration/addons/

按照提示执行以上三条命令

mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config

获取join命令

kubeadm token create --print-join-command

5安装网络插件

此处选用flannel网络插件

# 安装
kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml

# 查看状态
kubectl get pods -A | grep flannel

6添加Node节点

选定一个node节点,运行上述步骤 1-3

执行安装master之后生成的 kubeadm join 命令

如果提示证书ip校验失败,需要重新生成证书,并制定ip

重新生成证书

7验证

8问题记录

kubeadm join 时报错健康检测失败

先检查kubelet 是否正常
systemctl status kubelet

显示错误
 ExecStart=/usr/bin/kubelet $KUBELET_KUBECONFIG_ARGS $KUBELET_CONFIG_ARGS $KUBELET_KUBEADM_ARGS $KUBELET_EXTRA_ARGS (code=exited, status=1/FAILURE)

查看kubelet日志
journalctl -u kubelet --no-pager

显示提示
"command failed" err="failed to validate kubelet flags: the container runtime endpoint address was not specified or empty, use --container-runtime-endpoint to set"

原因应该是containerd的默认配置有问题,替换为步骤4中containerd的配置文件,并让配置文件生效,重启containerd,然后重新join