1准备工作
关闭SELinux和防火墙
这样做的目的是为了允许容器访问主机文件系统,让 Pod 网络工作正常。
systemctl stop firewalld && systemctl disable firewalld
sudo setenforce 0
sudo sed -i 's/^SELINUX=enforcing$/SELINUX=permissive/' /etc/selinux/config
关闭swap
swapoff -a
sed -i 's/^.*centos-swap/#&/g' /etc/fstab
内核参数修改
# 激活 br_netfilter 模块
modprobe br_netfilter
cat << EOF > /etc/modules-load.d/k8s.conf
br_netfilter
EOF
# 内核参数设置:开启IP转发,允许iptables对bridge的数据进行处理
cat << EOF > /etc/sysctl.d/k8s.conf
net.ipv4.ip_forward = 1
net.bridge.bridge-nf-call-iptables = 1
net.bridge.bridge-nf-call-ip6tables = 1
EOF
# 立即生效
sysctl --system
修改hostname
hostnamectl set-hostname kube-master-1
配置hosts
在所有master和node节点上配置hosts映射。例如
echo "192.168.252.79 kube-master-1" >> /etc/hosts
echo "192.168.252.88 node1" >> /etc/hosts
echo "192.168.252.89 node2" >> /etc/hosts
集群时间同步
可以使用chrony,此处假定各节点时间是同步的
centos8需要安装iproute-tc
dnf install -y iproute-tc
2安装Docker
安装
yum install -y yum-utils device-mapper-persistent-data lvm2
yum-config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo
yum install -y docker-ce
systemctl enable docker && systemctl start docker
配置
Kubernetes 推荐使用 systemd 来代替 cgroupfs。因为 docker 容器默认 cgroup 驱动为 cgroupfs,而 kubelet 默认为 systemd,所以为了使系统更为稳定,容器和 kubelet 应该都使用 systemd 作为 cgroup 驱动。
如果没有该文件,创建即可
cat << EOF > /etc/docker/daemon.json
{
"registry-mirrors": ["https://registry.docker-cn.com"],
"exec-opts": ["native.cgroupdriver=systemd"]
}
EOF
systemctl daemon-reload
systemctl restart docker
3安装kubeadm及相关工具
配置yum源
cat << EOF > /etc/yum.repos.d/kubernetes.repo
[kubernetes]
name=Kubernetes
baseurl=https://mirrors.aliyun.com/kubernetes/yum/repos/kubernetes-el7-x86_64/
enabled=1
gpgcheck=1
repo_gpgcheck=1
gpgkey=https://mirrors.aliyun.com/kubernetes/yum/doc/yum-key.gpg https://mirrors.aliyun.com/kubernetes/yum/doc/rpm-package-key.gpg
EOF
安装并配置自动补全
yum install -y kubelet kubeadm kubectl
systemctl enable kubelet && systemctl start kubelet
#自动补全
kubeadm completion bash > /etc/bash_completion.d/kubeadm
kubectl completion bash >/etc/bash_completion.d/kubectl
source /etc/bash_completion.d/kubeadm
source /etc/bash_completion.d/kubectl
4安装master
下载kubernetes相关镜像
#查看需要下载的镜像列表
kubeadm config images list
将需要下载:
registry.k8s.io/kube-apiserver:v1.26.0
registry.k8s.io/kube-controller-manager:v1.26.0
registry.k8s.io/kube-scheduler:v1.26.0
registry.k8s.io/kube-proxy:v1.26.0
registry.k8s.io/pause:3.9
registry.k8s.io/etcd:3.5.6-0
registry.k8s.io/coredns/coredns:v1.9.3
安装
kubeadm init --image-repository=registry.aliyuncs.com/google_containers --kubernetes-version=v1.26.0 --service-cidr=10.1.0.0/16 --pod-network-cidr=10.244.0.0/16
选项说明:
–image-repository:选择用于拉取镜像的镜像仓库(默认为“k8s.gcr.io” )
–kubernetes-version:选择特定的Kubernetes版本(默认为“stable-1”)
–service-cidr:为服务的VIP指定使用的IP地址范围(默认为“10.96.0.0/12”)
–pod-network-cidr:指定Pod网络的IP地址范围。如果设置,则将自动为每个节点分配CIDR。
–control-plane-endpoint=xxx.xxx.xxx.xxx:6443 。指定api-server地址,如果是公网,最好指定
注:
因为后面要部署 flannel,参照flannel文档,如果想不改flannel配置,此处pod网络范围需配置为10.244.0.0/16(和falnnel默认的一致)
如果报错
[ERROR CRI]: container runtime is not running: output: time=“2020-09-24T11:49:16Z” level=fatal msg=“getting status of runtime failed: rpc error: code = Unimplemented desc = unknown service runtime.v1alpha2.RuntimeService”
, error: exit status 1
执行如下命令,再重试
rm /etc/containerd/config.toml
systemctl restart containerd
报错失败之后
解决问题之后,reset kubeadm再重试
kubeadm reset
日志查看
tail -f /var/log/messages
安装的yaml文件位置
/etc/kubernetes/manifests/
遇到的问题
通过 tail -f /var/log/messages 命令查看到报错:
Error syncing pod, skipping" err=“failed to “CreatePodSandbox” for “kube-controller-manager-kube-master-1_kube-system(9609d43a045e55d723a6a6e70d0b050f)” with CreatePodSandboxError: “Failed to create sandbox for pod \“kube-controller-manager-kube-master-1_kube-system(9609d43a045e55d723a6a6e70d0b050f)\”: rpc error: code = Unknown desc = failed to get sandbox image \“registry.k8s.io/pause:3.6\”: failed to pull image \“registry.k8s.io/pause:3.6\”: failed to pull and unpack image \“registry.k8s.io/pause:3.6\”: failed to resolve reference \“registry.k8s.io/pause:3.6\”: failed to do request: Head \“https://asia-east1-docker.pkg.dev/v2/k8s-artifacts-prod/images/pause/manifests/3.6\”: dial tcp 108.177.125.82:443: i/o timeout”” pod=“kube-system/kube-controller-manager-kube-master-1” podUID=9609d43a045e55d723a6a6e70d0b050f
在拉取pause镜像是,超时了。 这个是CRI containerd 报的错,所以改docker的镜像地址不管用,需要修改/etc/containerd/config.toml文件
解决办法
生成默认配置
containerd config default > /etc/containerd/config.toml
覆盖配置文件
参考
disabled_plugins = []
imports = []
oom_score = 0
plugin_dir = ""
required_plugins = []
root = "/var/lib/containerd"
state = "/run/containerd"
temp = ""
version = 2
[cgroup]
path = ""
[debug]
address = ""
format = ""
gid = 0
level = ""
uid = 0
[grpc]
address = "/run/containerd/containerd.sock"
gid = 0
max_recv_message_size = 16777216
max_send_message_size = 16777216
tcp_address = ""
tcp_tls_ca = ""
tcp_tls_cert = ""
tcp_tls_key = ""
uid = 0
[metrics]
address = ""
grpc_histogram = false
[plugins]
[plugins."io.containerd.gc.v1.scheduler"]
deletion_threshold = 0
mutation_threshold = 100
pause_threshold = 0.02
schedule_delay = "0s"
startup_delay = "100ms"
[plugins."io.containerd.grpc.v1.cri"]
device_ownership_from_security_context = false
disable_apparmor = false
disable_cgroup = false
disable_hugetlb_controller = true
disable_proc_mount = false
disable_tcp_service = true
enable_selinux = false
enable_tls_streaming = false
enable_unprivileged_icmp = false
enable_unprivileged_ports = false
ignore_image_defined_volumes = false
max_concurrent_downloads = 3
max_container_log_line_size = 16384
netns_mounts_under_state_dir = false
restrict_oom_score_adj = false
sandbox_image = "registry.aliyuncs.com/google_containers/pause:3.9"
selinux_category_range = 1024
stats_collect_period = 10
stream_idle_timeout = "4h0m0s"
stream_server_address = "127.0.0.1"
stream_server_port = "0"
systemd_cgroup = false
tolerate_missing_hugetlb_controller = true
unset_seccomp_profile = ""
[plugins."io.containerd.grpc.v1.cri".cni]
bin_dir = "/opt/cni/bin"
conf_dir = "/etc/cni/net.d"
conf_template = ""
ip_pref = ""
max_conf_num = 1
[plugins."io.containerd.grpc.v1.cri".containerd]
default_runtime_name = "runc"
disable_snapshot_annotations = true
discard_unpacked_layers = false
ignore_rdt_not_enabled_errors = false
no_pivot = false
snapshotter = "overlayfs"
[plugins."io.containerd.grpc.v1.cri".containerd.default_runtime]
base_runtime_spec = ""
cni_conf_dir = ""
cni_max_conf_num = 0
container_annotations = []
pod_annotations = []
privileged_without_host_devices = false
runtime_engine = ""
runtime_path = ""
runtime_root = ""
runtime_type = ""
[plugins."io.containerd.grpc.v1.cri".containerd.default_runtime.options]
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes]
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]
base_runtime_spec = ""
cni_conf_dir = ""
cni_max_conf_num = 0
container_annotations = []
pod_annotations = []
privileged_without_host_devices = false
runtime_engine = ""
runtime_path = ""
runtime_root = ""
runtime_type = "io.containerd.runc.v2"
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
BinaryName = ""
CriuImagePath = ""
CriuPath = ""
CriuWorkPath = ""
IoGid = 0
IoUid = 0
NoNewKeyring = false
NoPivotRoot = false
Root = ""
ShimCgroup = ""
SystemdCgroup = true
[plugins."io.containerd.grpc.v1.cri".containerd.untrusted_workload_runtime]
base_runtime_spec = ""
cni_conf_dir = ""
cni_max_conf_num = 0
container_annotations = []
pod_annotations = []
privileged_without_host_devices = false
runtime_engine = ""
runtime_path = ""
runtime_root = ""
runtime_type = ""
[plugins."io.containerd.grpc.v1.cri".containerd.untrusted_workload_runtime.options]
[plugins."io.containerd.grpc.v1.cri".image_decryption]
key_model = "node"
[plugins."io.containerd.grpc.v1.cri".registry]
config_path = ""
[plugins."io.containerd.grpc.v1.cri".registry.auths]
[plugins."io.containerd.grpc.v1.cri".registry.configs]
[plugins."io.containerd.grpc.v1.cri".registry.headers]
[plugins."io.containerd.grpc.v1.cri".registry.mirrors]
[plugins."io.containerd.grpc.v1.cri".x509_key_pair_streaming]
tls_cert_file = ""
tls_key_file = ""
[plugins."io.containerd.internal.v1.opt"]
path = "/opt/containerd"
[plugins."io.containerd.internal.v1.restart"]
interval = "10s"
[plugins."io.containerd.internal.v1.tracing"]
sampling_ratio = 1.0
service_name = "containerd"
[plugins."io.containerd.metadata.v1.bolt"]
content_sharing_policy = "shared"
[plugins."io.containerd.monitor.v1.cgroups"]
no_prometheus = false
[plugins."io.containerd.runtime.v1.linux"]
no_shim = false
runtime = "runc"
runtime_root = ""
shim = "containerd-shim"
shim_debug = false
[plugins."io.containerd.runtime.v2.task"]
platforms = ["linux/amd64"]
sched_core = false
[plugins."io.containerd.service.v1.diff-service"]
default = ["walking"]
[plugins."io.containerd.service.v1.tasks-service"]
rdt_config_file = ""
[plugins."io.containerd.snapshotter.v1.aufs"]
root_path = ""
[plugins."io.containerd.snapshotter.v1.btrfs"]
root_path = ""
[plugins."io.containerd.snapshotter.v1.devmapper"]
async_remove = false
base_image_size = ""
discard_blocks = false
fs_options = ""
fs_type = ""
pool_name = ""
root_path = ""
[plugins."io.containerd.snapshotter.v1.native"]
root_path = ""
[plugins."io.containerd.snapshotter.v1.overlayfs"]
root_path = ""
upperdir_label = false
[plugins."io.containerd.snapshotter.v1.zfs"]
root_path = ""
[plugins."io.containerd.tracing.processor.v1.otlp"]
endpoint = ""
insecure = false
protocol = ""
[proxy_plugins]
[stream_processors]
[stream_processors."io.containerd.ocicrypt.decoder.v1.tar"]
accepts = ["application/vnd.oci.image.layer.v1.tar+encrypted"]
args = ["--decryption-keys-path", "/etc/containerd/ocicrypt/keys"]
env = ["OCICRYPT_KEYPROVIDER_CONFIG=/etc/containerd/ocicrypt/ocicrypt_keyprovider.conf"]
path = "ctd-decoder"
returns = "application/vnd.oci.image.layer.v1.tar"
[stream_processors."io.containerd.ocicrypt.decoder.v1.tar.gzip"]
accepts = ["application/vnd.oci.image.layer.v1.tar+gzip+encrypted"]
args = ["--decryption-keys-path", "/etc/containerd/ocicrypt/keys"]
env = ["OCICRYPT_KEYPROVIDER_CONFIG=/etc/containerd/ocicrypt/ocicrypt_keyprovider.conf"]
path = "ctd-decoder"
returns = "application/vnd.oci.image.layer.v1.tar+gzip"
[timeouts]
"io.containerd.timeout.bolt.open" = "0s"
"io.containerd.timeout.shim.cleanup" = "5s"
"io.containerd.timeout.shim.load" = "5s"
"io.containerd.timeout.shim.shutdown" = "3s"
"io.containerd.timeout.task.state" = "2s"
[ttrpc]
address = ""
gid = 0
uid = 0
让配置文件生效
$ systemctl daemon-reload && systemctl restart containerd
安装成功后将提示如下信息:
Your Kubernetes control-plane has initialized successfully!
To start using your cluster, you need to run the following as a regular user:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
Alternatively, if you are the root user, you can run:
export KUBECONFIG=/etc/kubernetes/admin.conf
You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
https://kubernetes.io/docs/concepts/cluster-administration/addons/
按照提示执行以上三条命令
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
获取join命令
kubeadm token create --print-join-command
5安装网络插件
此处选用flannel网络插件
# 安装
kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
# 查看状态
kubectl get pods -A | grep flannel
6添加Node节点
选定一个node节点,运行上述步骤 1-3
执行安装master之后生成的 kubeadm join 命令
如果提示证书ip校验失败,需要重新生成证书,并制定ip
7验证
8问题记录
kubeadm join 时报错健康检测失败
先检查kubelet 是否正常
systemctl status kubelet
显示错误
ExecStart=/usr/bin/kubelet $KUBELET_KUBECONFIG_ARGS $KUBELET_CONFIG_ARGS $KUBELET_KUBEADM_ARGS $KUBELET_EXTRA_ARGS (code=exited, status=1/FAILURE)
查看kubelet日志
journalctl -u kubelet --no-pager
显示提示
"command failed" err="failed to validate kubelet flags: the container runtime endpoint address was not specified or empty, use --container-runtime-endpoint to set"
原因应该是containerd的默认配置有问题,替换为步骤4中containerd的配置文件,并让配置文件生效,重启containerd,然后重新join