1、Summarize the differences between Underlay and Overlay networks, and their pros and cons
- Underlay: uses the host network directly, with no encapsulation or decapsulation, so performance is good. Multiple virtual interfaces (sub-interfaces) are created on top of the host's physical NIC; each virtual interface has its own unique MAC address and can be assigned a sub-interface IP. Strongly depends on the physical network.
  Drawbacks: consumes many IP addresses, so the subnet must be sized large enough, and broadcast traffic is heavier.
- Overlay: a stacked network. The container network is encapsulated inside the host network: container MAC addresses are wrapped into host-network packets, and the host network carries frames containing the container MACs, so the effect is like L2 Ethernet frames traveling within one broadcast domain. One of the most widely used network types in private clouds.
  Advantages: good compatibility with the physical network, with no extra requirements on it; pods can communicate across host subnets; network plugins such as calico and flannel support overlay networks; widely used in private clouds.
  Drawbacks: extra encapsulation/decapsulation overhead.
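The overlay overhead mentioned above can be made concrete. A minimal sketch, assuming IPv4 VXLAN (the encapsulation flannel and hybridnet use later in this document): each packet gains an outer Ethernet, outer IP, UDP, and VXLAN header, which is why the vxlan interfaces shown below have MTU 1450 against a 1500-byte physical NIC.

```shell
# VXLAN per-packet overhead: outer Ethernet + outer IPv4 + UDP + VXLAN header.
OUTER_ETH=14; OUTER_IP=20; UDP=8; VXLAN=8
OVERHEAD=$((OUTER_ETH + OUTER_IP + UDP + VXLAN))
PHY_MTU=1500
echo "overhead: ${OVERHEAD} bytes, overlay MTU: $((PHY_MTU - OVERHEAD))"
# prints: overhead: 50 bytes, overlay MTU: 1450
```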
2、Implement an underlay network in a Kubernetes cluster
# Run on all nodes
apt-get -y update
apt -y install apt-transport-https ca-certificates curl software-properties-common
# Install the GPG key
curl -fsSL http://mirrors.aliyun.com/docker-ce/linux/ubuntu/gpg | apt-key add -
# Add the repository
add-apt-repository "deb [arch=amd64] http://mirrors.aliyun.com/docker-ce/linux/ubuntu $(lsb_release -cs) stable"
# Refresh the package index
apt-get -y update
# List installable Docker versions
apt-cache madison docker-ce docker-ce-cli
apt install -y docker-ce=5:20.10.23~3-0~ubuntu-jammy docker-ce-cli=5:20.10.23~3-0~ubuntu-jammy
systemctl start docker && systemctl enable docker
mkdir -p /etc/docker
tee /etc/docker/daemon.json <<-'EOF'
{
  "exec-opts": ["native.cgroupdriver=systemd"],
  "registry-mirrors": ["https://9916w1ow.mirror.aliyuncs.com"]
}
EOF
sudo systemctl daemon-reload && sudo systemctl restart docker
# Install cri-dockerd
cd /usr/local/src/
wget https://github.com/Mirantis/cri-dockerd/releases/download/v0.3.1/cri-dockerd-0.3.1.amd64.tgz
tar xvf cri-dockerd-0.3.1.amd64.tgz
cp cri-dockerd/cri-dockerd /usr/local/bin
tee /lib/systemd/system/cri-docker.service << "EOF"
[Unit]
Description=CRI Interface for Docker Application Container Engine
Documentation=https://docs.mirantis.com
After=network-online.target firewalld.service docker.service
Wants=network-online.target
Requires=cri-docker.socket
[Service]
Type=notify
ExecStart=/usr/local/bin/cri-dockerd --network-plugin=cni --pod-infra-container-image=registry.aliyuncs.com/google_containers/pause:3.9
ExecReload=/bin/kill -s HUP $MAINPID
TimeoutSec=0
RestartSec=2
Restart=always
StartLimitBurst=3
StartLimitInterval=60s
LimitNOFILE=infinity
LimitNPROC=infinity
LimitCORE=infinity
TasksMax=infinity
Delegate=yes
KillMode=process
[Install]
WantedBy=multi-user.target
EOF
tee /etc/systemd/system/cri-docker.socket << "EOF"
[Unit]
Description=CRI Docker Socket for the API
PartOf=cri-docker.service
[Socket]
ListenStream=%t/cri-dockerd.sock
SocketMode=0660
SocketUser=root
SocketGroup=docker
[Install]
WantedBy=sockets.target
EOF
systemctl daemon-reload && systemctl restart cri-docker && systemctl enable cri-docker && systemctl enable --now cri-docker.socket
systemctl status cri-docker.service
# Check the CRI socket file
ls /var/run/cri-dockerd.sock
# Install kubeadm
## Configure the Kubernetes apt mirror
apt-get update && apt-get install -y apt-transport-https
curl https://mirrors.aliyun.com/kubernetes/apt/doc/apt-key.gpg | apt-key add -
cat <<EOF >/etc/apt/sources.list.d/kubernetes.list
deb https://mirrors.aliyun.com/kubernetes/apt/ kubernetes-xenial main
EOF
## Install kubeadm:
apt-get update
apt-cache madison kubeadm
apt-get install -y kubelet=1.24.10-00 kubeadm=1.24.10-00 kubectl=1.24.10-00
kubeadm config images list --kubernetes-version v1.24.10
tee /opt/images-download.sh << "EOF"
#!/bin/bash
docker pull registry.cn-hangzhou.aliyuncs.com/google_containers/kube-apiserver:v1.24.10
docker pull registry.cn-hangzhou.aliyuncs.com/google_containers/kube-controller-manager:v1.24.10
docker pull registry.cn-hangzhou.aliyuncs.com/google_containers/kube-scheduler:v1.24.10
docker pull registry.cn-hangzhou.aliyuncs.com/google_containers/kube-proxy:v1.24.10
docker pull registry.cn-hangzhou.aliyuncs.com/google_containers/pause:3.9
docker pull registry.cn-hangzhou.aliyuncs.com/google_containers/etcd:3.5.6-0
docker pull registry.cn-hangzhou.aliyuncs.com/google_containers/coredns:1.8.6
EOF
bash /opt/images-download.sh
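The pull script above hard-codes the image list; it can also be generated by rewriting the registry prefix from the `kubeadm config images list` output. A sketch, assuming that output lists images as `k8s.gcr.io/<name>:<tag>` and that the aliyun mirror hosts the same names and tags:

```shell
# Rewrite each official image reference to the mirror registry.
mirror="registry.cn-hangzhou.aliyuncs.com/google_containers"
official="k8s.gcr.io/kube-apiserver:v1.24.10
k8s.gcr.io/kube-proxy:v1.24.10
k8s.gcr.io/pause:3.9"
for img in $official; do
  name="${img##*/}"              # keep only <name>:<tag> after the last /
  echo "docker pull ${mirror}/${name}"
done
```

Piping the real `kubeadm config images list` output through the same rewrite keeps the script in step with the chosen `--kubernetes-version`.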
'''
Scenario 1: pods can use either overlay or underlay, and the SVC uses overlay; if underlay is wanted, the SVC must be configured with a subnet from the host network.
For example, the following is an overlay setup: the pod CIDR will later be used for overlay pods, and the service CIDR for overlay SVCs.
# kubeadm init --apiserver-advertise-address=172.31.6.201 --apiserver-bind-port=6443 --kubernetes-version=v1.24.4 --pod-network-cidr=10.200.0.0/16 --service-cidr=10.100.0.0/16 --service-dns-domain=cluster.local --image-repository=registry.cn-hangzhou.aliyuncs.com/google_containers --ignore-preflight-errors=swap --cri-socket unix:///var/run/cri-dockerd.sock
Scenario 2: pods can use either overlay or underlay, and the SVC uses underlay.
Underlay initialization: --pod-network-cidr=10.200.0.0/16 will be used for the later overlay scenario, while the underlay CIDR is specified separately afterwards; overlay and underlay will coexist. --service-cidr=172.31.5.0/24 is used for the later underlay SVCs, through which pods can be reached directly.
# Demo underlay init command:
# kubeadm init --apiserver-advertise-address=172.31.6.201 --apiserver-bind-port=6443 --kubernetes-version=v1.24.10 --pod-network-cidr=10.200.0.0/16 --service-cidr=172.31.5.0/24 --service-dns-domain=cluster.local --image-repository=registry.cn-hangzhou.aliyuncs.com/google_containers --ignore-preflight-errors=swap --cri-socket unix:///var/run/cri-dockerd.sock
Note: to reach an SVC later, a static route must be configured on the network devices, because an SVC is only iptables or IPVS rules and does not answer ARP broadcasts:
-A KUBE-SERVICES -d 172.31.5.148/32 -p tcp -m comment --comment "myserver/myserver-tomcat-app1-service-underlay:http cluster IP" -m tcp --dport 80 -j KUBE-SVC-DXPW2IL54XTPIKP5
-A KUBE-SVC-DXPW2IL54XTPIKP5 ! -s 10.200.0.0/16 -d 172.31.5.148/32 -p tcp -m comment --comment "myserver/myserver-tomcat-app1-service-underlay:http cluster IP" -m tcp --dport 80 -j KUBE-MARK-MASQ
Chain KUBE-POSTROUTING (1 references)
pkts bytes target prot opt in out source destination
1260 83666 RETURN all -- * * 0.0.0.0/0 0.0.0.0/0 mark match ! 0x4000/0x4000
5 312 MARK all -- * * 0.0.0.0/0 0.0.0.0/0 MARK xor 0x4000
5 312 MASQUERADE all -- * * 0.0.0.0/0 0.0.0.0/0 /* kubernetes service traffic requiring SNAT */ rando
'''
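The static-route note above can be sketched as a one-line config on a test client (or, in production, on the site router). Assumptions: 172.31.5.0/24 is the underlay service CIDR from the init command, and 172.31.6.202 stands in for any cluster node; kube-proxy's iptables/IPVS rules on that node then DNAT the traffic on to a pod.

```shell
# Test-machine route only: send the underlay SVC CIDR via one cluster node.
# (In production, have the network team add this on the network devices.)
ip route add 172.31.5.0/24 via 172.31.6.202
```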
## Initialize Kubernetes with the underlay network
# Run on the master only
kubeadm init --apiserver-advertise-address=172.31.6.201 --apiserver-bind-port=6443 --kubernetes-version=v1.24.10 --pod-network-cidr=10.200.0.0/16 --service-cidr=172.31.5.0/24 --service-dns-domain=cluster.local --image-repository=registry.cn-hangzhou.aliyuncs.com/google_containers --ignore-preflight-errors=swap --cri-socket unix:///var/run/cri-dockerd.sock
'''
Key information after initialization completes
[addons] Applied essential addon: CoreDNS
[addons] Applied essential addon: kube-proxy
Your Kubernetes control-plane has initialized successfully!
To start using your cluster, you need to run the following as a regular user:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
Alternatively, if you are the root user, you can run:
export KUBECONFIG=/etc/kubernetes/admin.conf
You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
https://kubernetes.io/docs/concepts/cluster-administration/addons/
Then you can join any number of worker nodes by running the following on each as root:
kubeadm join 172.31.6.201:6443 --token t0ruut.j6toxlfjte31sngo \
--discovery-token-ca-cert-hash sha256:c78950990035913274f57d8e62f56c2502dab04ec2f578b6ccd58d788f3932c7
'''
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
root@k8s-master:~# kubectl get nodes
NAME STATUS ROLES AGE VERSION
k8s-master.iclinux.com NotReady control-plane 4m4s v1.24.10
# Run on the worker nodes
## Join the worker nodes; run only on the workers
kubeadm join 172.31.6.201:6443 --token t0ruut.j6toxlfjte31sngo --discovery-token-ca-cert-hash sha256:c78950990035913274f57d8e62f56c2502dab04ec2f578b6ccd58d788f3932c7 --cri-socket unix:///var/run/cri-dockerd.sock
root@k8s-master:~# kubectl get nodes
NAME STATUS ROLES AGE VERSION
k8s-master.iclinux.com NotReady control-plane 10m v1.24.10
k8s-node1.iclinux.com NotReady <none> 20s v1.24.10
k8s-node2.iclinux.com NotReady <none> 11s v1.24.10
k8s-node3.iclinux.com NotReady <none> 8s v1.24.10
## Distribute the kubeconfig; run on the master
scp /root/.kube/config 172.31.6.204:/root/.kube
# Deploy the hybridnet network component via helm; master node
## Install helm
cd /usr/local/src && wget https://get.helm.sh/helm-v3.9.0-linux-amd64.tar.gz
tar xvf helm-v3.9.0-linux-amd64.tar.gz
mv linux-amd64/helm /usr/local/bin/
# Add the helm repository
helm repo add hybridnet https://alibaba.github.io/hybridnet/
helm repo update
## Install the network component
helm install hybridnet hybridnet/hybridnet -n kube-system --set init.cidr=10.200.0.0/16
# Note: the cidr is the pod CIDR the cluster was initialized with
## Check
root@k8s-master:/usr/local/src# kubectl get pod -A
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system calico-typha-6f55876f98-hr7sm 1/1 Running 0 2m7s
kube-system calico-typha-6f55876f98-kp47t 1/1 Running 0 2m7s
kube-system calico-typha-6f55876f98-sdzmz 1/1 Running 0 2m7s
kube-system coredns-7f74c56694-bfwqd 0/1 Pending 0 85m
kube-system coredns-7f74c56694-blmcv 0/1 Pending 0 85m
kube-system etcd-k8s-master.iclinux.com 1/1 Running 0 85m
kube-system hybridnet-daemon-5c2fh 2/2 Running 0 2m7s
kube-system hybridnet-daemon-5p4fn 0/2 Init:0/1 0 2m7s
kube-system hybridnet-daemon-6hvz6 0/2 Init:0/1 0 2m7s
kube-system hybridnet-daemon-tvb7x 2/2 Running 0 2m7s
kube-system hybridnet-manager-6574dcc5fb-2wm76 0/1 Pending 0 2m7s
kube-system hybridnet-manager-6574dcc5fb-ghxn6 0/1 Pending 0 2m7s
kube-system hybridnet-manager-6574dcc5fb-l6gx6 0/1 Pending 0 2m7s
kube-system hybridnet-webhook-76dc57b4bf-cf9mj 0/1 Pending 0 2m10s
kube-system hybridnet-webhook-76dc57b4bf-klqzt 0/1 Pending 0 2m10s
kube-system hybridnet-webhook-76dc57b4bf-wbsnj 0/1 Pending 0 2m10s
kube-system kube-apiserver-k8s-master.iclinux.com 1/1 Running 0 85m
kube-system kube-controller-manager-k8s-master.iclinux.com 1/1 Running 0 85m
kube-system kube-proxy-864vx 1/1 Running 0 75m
kube-system kube-proxy-h585r 1/1 Running 0 75m
kube-system kube-proxy-m5wd5 1/1 Running 0 75m
kube-system kube-proxy-vctbh 1/1 Running 0 85m
kube-system kube-scheduler-k8s-master.iclinux.com 1/1 Running 0 85m
# Add the master label so the Pending hybridnet-manager/webhook pods above can be scheduled
kubectl label node k8s-node1.iclinux.com node-role.kubernetes.io/master=
kubectl label node k8s-node2.iclinux.com node-role.kubernetes.io/master=
kubectl label node k8s-node3.iclinux.com node-role.kubernetes.io/master=
# Make sure all pods are up
root@k8s-master:/usr/local/src#
root@k8s-master:/usr/local/src# kubectl get pods -A
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system calico-typha-6f55876f98-hr7sm 1/1 Running 0 5m37s
kube-system calico-typha-6f55876f98-kp47t 1/1 Running 0 5m37s
kube-system calico-typha-6f55876f98-sdzmz 1/1 Running 0 5m37s
kube-system coredns-7f74c56694-bfwqd 1/1 Running 0 89m
kube-system coredns-7f74c56694-blmcv 1/1 Running 0 89m
kube-system etcd-k8s-master.iclinux.com 1/1 Running 0 89m
kube-system hybridnet-daemon-5c2fh 2/2 Running 1 (3m2s ago) 5m37s
kube-system hybridnet-daemon-5p4fn 2/2 Running 1 (43s ago) 5m37s
kube-system hybridnet-daemon-6hvz6 2/2 Running 1 (2m44s ago) 5m37s
kube-system hybridnet-daemon-tvb7x 2/2 Running 1 (3m2s ago) 5m37s
kube-system hybridnet-manager-6574dcc5fb-2wm76 1/1 Running 0 5m37s
kube-system hybridnet-manager-6574dcc5fb-ghxn6 1/1 Running 0 5m37s
kube-system hybridnet-manager-6574dcc5fb-l6gx6 1/1 Running 0 5m37s
kube-system hybridnet-webhook-76dc57b4bf-cf9mj 1/1 Running 0 5m40s
kube-system hybridnet-webhook-76dc57b4bf-klqzt 1/1 Running 0 5m40s
kube-system hybridnet-webhook-76dc57b4bf-wbsnj 1/1 Running 0 5m40s
kube-system kube-apiserver-k8s-master.iclinux.com 1/1 Running 0 89m
kube-system kube-controller-manager-k8s-master.iclinux.com 1/1 Running 0 89m
kube-system kube-proxy-864vx 1/1 Running 0 79m
kube-system kube-proxy-h585r 1/1 Running 0 79m
kube-system kube-proxy-m5wd5 1/1 Running 0 79m
# Inspect the network
root@k8s-node3:/usr/local/src# ifconfig
docker0: flags=4099<UP,BROADCAST,MULTICAST> mtu 1500
inet 172.17.0.1 netmask 255.255.0.0 broadcast 172.17.255.255
ether 02:42:0e:b8:b1:9c txqueuelen 0 (Ethernet)
RX packets 0 bytes 0 (0.0 B)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 0 bytes 0 (0.0 B)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 172.31.6.204 netmask 255.255.248.0 broadcast 172.31.7.255
inet6 fe80::20c:29ff:feda:8719 prefixlen 64 scopeid 0x20<link>
ether 00:0c:29:da:87:19 txqueuelen 1000 (Ethernet)
RX packets 545289 bytes 695214333 (695.2 MB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 231360 bytes 42214413 (42.2 MB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
eth0.vxlan4: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1450
inet 172.31.6.204 netmask 255.255.248.0 broadcast 172.31.7.255
inet6 fe80::20c:29ff:feda:8719 prefixlen 64 scopeid 0x20<link>
ether 00:0c:29:da:87:19 txqueuelen 0 (Ethernet)
RX packets 27 bytes 2580 (2.5 KB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 12 bytes 1188 (1.1 KB)
TX errors 0 dropped 1 overruns 0 carrier 0 collisions 0
hybr2f5133a0152: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1450
inet6 fe80::ecee:eeff:feee:eeee prefixlen 64 scopeid 0x20<link>
ether ee:ee:ee:ee:ee:ee txqueuelen 0 (Ethernet)
RX packets 124 bytes 12975 (12.9 KB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 134 bytes 33940 (33.9 KB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
hybrf4727d8f411: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1450
inet6 fe80::ecee:eeff:feee:eeee prefixlen 64 scopeid 0x20<link>
ether ee:ee:ee:ee:ee:ee txqueuelen 0 (Ethernet)
RX packets 124 bytes 12904 (12.9 KB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 131 bytes 33766 (33.7 KB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
inet 127.0.0.1 netmask 255.0.0.0
inet6 ::1 prefixlen 128 scopeid 0x10<host>
loop txqueuelen 1000 (Local Loopback)
RX packets 4030 bytes 374007 (374.0 KB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 4030 bytes 374007 (374.0 KB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
root@k8s-node3:/usr/local/src#
root@k8s-node3:/usr/local/src# route -n
Kernel IP routing table
Destination Gateway Genmask Flags Metric Ref Use Iface
0.0.0.0 172.31.0.2 0.0.0.0 UG 0 0 0 eth0
172.17.0.0 0.0.0.0 255.255.0.0 U 0 0 0 docker0
172.31.0.0 0.0.0.0 255.255.248.0 U 0 0 0 eth0
root@k8s-node3:/usr/local/src#
# Configure the underlay network
## Label the nodes that support underlay
kubectl label node k8s-node1.iclinux.com network=underlay-nethost
kubectl label node k8s-node2.iclinux.com network=underlay-nethost
kubectl label node k8s-node3.iclinux.com network=underlay-nethost
## If a label was applied wrongly, overwrite it
kubectl label --overwrite node k8s-node1.iclinux.com network=underlay-nethost
kubectl label --overwrite node k8s-node2.iclinux.com network=underlay-nethost
kubectl label --overwrite node k8s-node3.iclinux.com network=underlay-nethost
root@k8s-master:~/underlay-cases-files# cat 1.create-underlay-network.yaml
---
apiVersion: networking.alibaba.com/v1
kind: Network
metadata:
  name: underlay-network1
spec:
  netID: 0
  type: Underlay
  nodeSelector:
    network: "underlay-nethost"
---
apiVersion: networking.alibaba.com/v1
kind: Subnet
metadata:
  name: underlay-network1
spec:
  network: underlay-network1
  netID: 0
  range:
    version: "4" # IPv4
    cidr: "172.31.0.0/21" # CIDR plan for the whole subnet
    gateway: "172.31.0.2" # external gateway address
    start: "172.31.6.1"
    end: "172.31.6.254"
root@k8s-master:~/underlay-cases-files# kubectl apply -f 1.create-underlay-network.yaml
network.networking.alibaba.com/underlay-network1 created
subnet.networking.alibaba.com/underlay-network1 created
root@k8s-master:~/underlay-cases-files# kubectl get network
NAME NETID TYPE MODE V4TOTAL V4USED V4AVAILABLE LASTALLOCATEDV4SUBNET V6TOTAL V6USED V6AVAILABLE LASTALLOCATEDV6SUBNET
init 4 Overlay 65534 2 65532 init 0 0 0
underlay-network1 0 Underlay 254 0 254 underlay-network1 0 0 0
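The V4TOTAL=254 shown for underlay-network1 is simply the size of the start..end range in the Subnet spec. A quick sanity check (pure arithmetic, no cluster needed):

```shell
# Convert a dotted quad to an integer, then count addresses inclusively.
ip2int() {
  IFS=. read -r a b c d <<EOF
$1
EOF
  echo $(( (a << 24) + (b << 16) + (c << 8) + d ))
}
start=$(ip2int 172.31.6.1)
end=$(ip2int 172.31.6.254)
echo "addresses in range: $(( end - start + 1 ))"   # prints 254
```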
# Verify
root@k8s-master:~/underlay-cases-files# kubectl create ns myserver
namespace/myserver created
k8s-master:~/underlay-cases-files#
root@k8s-master:~/underlay-cases-files# cat 2.tomcat-app1-overlay.yaml
kind: Deployment
apiVersion: apps/v1
metadata:
  labels:
    app: myserver-tomcat-app1-deployment-overlay-label
  name: myserver-tomcat-app1-deployment-overlay
  namespace: myserver
spec:
  replicas: 1
  selector:
    matchLabels:
      app: myserver-tomcat-app1-overlay-selector
  template:
    metadata:
      labels:
        app: myserver-tomcat-app1-overlay-selector
    spec:
      #nodeName: k8s-node2.example.com
      containers:
      - name: myserver-tomcat-app1-container
        #image: tomcat:7.0.93-alpine
        image: registry.cn-hangzhou.aliyuncs.com/zhangshijie/tomcat-app1:v1
        imagePullPolicy: IfNotPresent
        ##imagePullPolicy: Always
        ports:
        - containerPort: 8080
          protocol: TCP
          name: http
        env:
        - name: "password"
          value: "123456"
        - name: "age"
          value: "18"
#        resources:
#          limits:
#            cpu: 0.5
#            memory: "512Mi"
#          requests:
#            cpu: 0.5
#            memory: "512Mi"
---
kind: Service
apiVersion: v1
metadata:
  labels:
    app: myserver-tomcat-app1-service-overlay-label
  name: myserver-tomcat-app1-service-overlay
  namespace: myserver
spec:
  type: NodePort
  ports:
  - name: http
    port: 80
    protocol: TCP
    targetPort: 8080
    nodePort: 30003
  selector:
    app: myserver-tomcat-app1-overlay-selector
root@k8s-master:~/underlay-cases-files# kubectl apply -f 2.tomcat-app1-overlay.yaml
deployment.apps/myserver-tomcat-app1-deployment-overlay created
service/myserver-tomcat-app1-service-overlay created
root@k8s-master:~/underlay-cases-files# kubectl get pod -n myserver
NAME READY STATUS RESTARTS AGE
myserver-tomcat-app1-deployment-overlay-69dfff68d9-jjg45 1/1 Running 0 2m49s
root@k8s-master:~/underlay-cases-files#
root@k8s-master:~/underlay-cases-files# kubectl get pod -n myserver -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
myserver-tomcat-app1-deployment-overlay-69dfff68d9-jjg45 1/1 Running 0 3m26s 10.200.0.3 k8s-node2.iclinux.com <none> <none>
# Note that the pod defaults to the overlay network
root@k8s-master:~/underlay-cases-files# kubectl get svc -n myserver -o wide
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE SELECTOR
myserver-tomcat-app1-service-overlay NodePort 172.31.5.162 <none> 80:30003/TCP 5m4s app=myserver-tomcat-app1-overlay-selector
root@k8s-master:~/underlay-cases-files#
# Verify by visiting http://172.31.6.203:30003/myapp/
# Create a pod on the underlay network
root@k8s-master:~/underlay-cases-files# cat 3.tomcat-app1-underlay.yaml
kind: Deployment
#apiVersion: extensions/v1beta1
apiVersion: apps/v1
metadata:
  labels:
    app: myserver-tomcat-app1-deployment-underlay-label
  name: myserver-tomcat-app1-deployment-underlay
  namespace: myserver
spec:
  replicas: 1
  selector:
    matchLabels:
      app: myserver-tomcat-app1-underlay-selector
  template:
    metadata:
      labels:
        app: myserver-tomcat-app1-underlay-selector
      annotations: # choose the Underlay or the Overlay network
        networking.alibaba.com/network-type: Underlay
    spec:
      #nodeName: k8s-node2.example.com
      containers:
      - name: myserver-tomcat-app1-container
        #image: tomcat:7.0.93-alpine
        image: registry.cn-hangzhou.aliyuncs.com/zhangshijie/tomcat-app1:v2
        imagePullPolicy: IfNotPresent
        ##imagePullPolicy: Always
        ports:
        - containerPort: 8080
          protocol: TCP
          name: http
        env:
        - name: "password"
          value: "123456"
        - name: "age"
          value: "18"
#        resources:
#          limits:
#            cpu: 0.5
#            memory: "512Mi"
#          requests:
#            cpu: 0.5
#            memory: "512Mi"
---
kind: Service
apiVersion: v1
metadata:
  labels:
    app: myserver-tomcat-app1-service-underlay-label
  name: myserver-tomcat-app1-service-underlay
  namespace: myserver
spec:
#  type: NodePort
  ports:
  - name: http
    port: 80
    protocol: TCP
    targetPort: 8080
    #nodePort: 40003
  selector:
    app: myserver-tomcat-app1-underlay-selector
root@k8s-master:~/underlay-cases-files# kubectl apply -f 3.tomcat-app1-underlay.yaml
deployment.apps/myserver-tomcat-app1-deployment-underlay created
service/myserver-tomcat-app1-service-underlay created
root@k8s-master:~/underlay-cases-files# kubectl get pod -n myserver -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
myserver-tomcat-app1-deployment-overlay-69dfff68d9-hgr2h 1/1 Running 0 26m 10.200.0.4 k8s-node2.iclinux.com <none> <none>
myserver-tomcat-app1-deployment-underlay-bd7cd59cf-nskp9 1/1 Running 0 54s 172.31.6.3 k8s-node1.iclinux.com <none> <none>
root@k8s-master:~/underlay-cases-files#
# Verify at: http://172.31.6.4:8080/myapp
# Observe the network on node1
root@k8s-node1:~# ifconfig
docker0: flags=4099<UP,BROADCAST,MULTICAST> mtu 1500
inet 172.17.0.1 netmask 255.255.0.0 broadcast 172.17.255.255
ether 02:42:af:7b:06:5e txqueuelen 0 (Ethernet)
RX packets 0 bytes 0 (0.0 B)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 0 bytes 0 (0.0 B)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 172.31.6.202 netmask 255.255.248.0 broadcast 172.31.7.255
inet6 fe80::20c:29ff:fe76:cc0a prefixlen 64 scopeid 0x20<link>
ether 00:0c:29:76:cc:0a txqueuelen 1000 (Ethernet)
RX packets 1258413 bytes 1662158036 (1.6 GB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 495846 bytes 142946399 (142.9 MB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
eth0.vxlan4: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1450
inet 172.31.6.202 netmask 255.255.248.0 broadcast 172.31.7.255
inet6 fe80::20c:29ff:fe76:cc0a prefixlen 64 scopeid 0x20<link>
ether 00:0c:29:76:cc:0a txqueuelen 0 (Ethernet)
RX packets 27 bytes 2594 (2.5 KB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 14 bytes 1394 (1.3 KB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
hybrcf33ee38b06: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet6 fe80::ecee:eeff:feee:eeee prefixlen 64 scopeid 0x20<link>
ether ee:ee:ee:ee:ee:ee txqueuelen 0 (Ethernet)
RX packets 25 bytes 2961 (2.9 KB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 35 bytes 3182 (3.1 KB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
inet 127.0.0.1 netmask 255.0.0.0
inet6 ::1 prefixlen 128 scopeid 0x10<host>
loop txqueuelen 1000 (Local Loopback)
RX packets 25208 bytes 2007678 (2.0 MB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 25208 bytes 2007678 (2.0 MB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
root@k8s-node1:~# route -n
Kernel IP routing table
Destination Gateway Genmask Flags Metric Ref Use Iface
0.0.0.0 172.31.0.2 0.0.0.0 UG 0 0 0 eth0
172.17.0.0 0.0.0.0 255.255.0.0 U 0 0 0 docker0
172.31.0.0 0.0.0.0 255.255.248.0 U 0 0 0 eth0
# The pod address can be pinned
# Access the pod through the service
root@k8s-node1:~# kubectl get svc -n myserver -o wide
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE SELECTOR
myserver-tomcat-app1-service-overlay NodePort 172.31.5.47 <none> 80:30003/TCP 36m app=myserver-tomcat-app1-overlay-selector
myserver-tomcat-app1-service-underlay ClusterIP 172.31.5.86 <none> 80/TCP 10m app=myserver-tomcat-app1-underlay-selector
# The SVC is normally not reachable directly; client-to-SVC connectivity must be established first. In production, have the network team add a route; in a test environment, add a local route on the test machine.
## Change hybridnet's default network type from underlay to overlay
helm upgrade hybridnet hybridnet/hybridnet -n kube-system --set defaultNetworkType=Overlay
Or:
kubectl edit deploy hybridnet-webhook -n kube-system
    env:
    - name: DEFAULT_NETWORK_TYPE
      value: Overlay
kubectl edit deploy hybridnet-manager -n kube-system
    env:
    - name: DEFAULT_NETWORK_TYPE
      value: Overlay
3、Summarize the network communication flow of flannel in VXLAN mode
1. The source pod sends a request. At this point the packet's source IP is the pod's eth0 IP, the source MAC is the pod's eth0 MAC, the destination IP is the target pod's IP, and the destination MAC is the gateway's (cni0's) MAC.
Capture command: tcpdump -nn -vvv -i veth91d6f855 -vvv -nn ! port 22 and ! port 2379 and ! port 6443 and ! port 10250 and ! arp and ! port 53
2. The packet travels over the veth pair to the gateway cni0, which confirms the destination MAC is its own. cni0 then checks the destination IP: a packet for the same bridge is forwarded directly, otherwise it is handed to flannel.1, and at this point the packet looks like:
Source IP: pod IP, 10.100.2.2
Destination IP: pod IP, 10.100.1.6
Source MAC: source pod MAC
Destination MAC: cni0 MAC
Capture: tcpdump -nn -vvv -i cni0 -vvv -nn ! port 22 and ! port 2379 and ! port 6443 and ! port 10250 and ! arp and ! port 53
3. The packet reaches flannel.1, which confirms the destination MAC is its own, matches the routing table, and performs the inner encapsulation of the overlay packet (mainly rewriting the destination MAC to the peer flannel.1's MAC on the target host, and the source MAC to this host's flannel.1 MAC).
bridge fdb show dev flannel.1
Capture command: tcpdump -nn -vvv -i flannel.1 -vvv -nn ! port 22 and ! port 2379 and ! port 6443 and ! port 10250 and ! arp and ! port 53
4. The source host encapsulates the VXLAN packet in UDP
UDP source port: random
UDP destination port: 8472
Source IP: the physical NIC IP of the source pod's host
Destination IP: the physical NIC IP of the target pod's host
Source MAC: the physical NIC of the source pod's host
Destination MAC: the physical NIC of the target pod's host
Capture command: tcpdump -nn -vvv -i eth0 -vvv -nn ! port 22 and ! port 2379 and ! port 6443 and ! port 10250 and ! arp and ! port 53
5. The packet arrives at the destination host's physical NIC and is decapsulated
The outer destination IP is the local physical NIC; after stripping the outer headers, an inner destination IP and MAC remain: the destination IP is 10.100.1.6 and the destination MAC is xxx (the destination flannel.1's MAC), so the packet is handed to flannel.1.
Capture command: tcpdump -nn -vvv -i eth0 -vvv -nn ! port 22 and ! port 2379 and ! port 6443 and ! port 10250 and ! arp and ! port 53
6. The packet reaches the destination host's flannel.1
flannel.1 checks the packet's destination IP, finds it belongs to the local cni0 subnet, and forwards the request to cni0.
Capture: tcpdump -nn -vvv -i flannel.1 -vvv -nn ! port 22 and ! port 2379 and ! port 6443 and ! port 10250 and ! arp and ! port 53
Destination IP: 10.100.1.6, target pod
Source IP: 10.100.2.2, source pod
Destination MAC: flannel.1 MAC on the target pod's host
Source MAC: flannel.1 MAC on the source pod's host
7. The packet reaches the destination host's cni0
cni0 looks up its MAC table by destination IP, rewrites the destination MAC to the target pod's MAC, and forwards the request to the pod.
Source IP: source pod IP
Destination IP: target pod IP
Source MAC: cni0 MAC
Destination MAC: target pod MAC
Capture: tcpdump -nn -vvv -i cni0 -vvv -nn ! port 22 and ! port 2379 and ! port 6443 and ! port 10250 and ! arp and ! port 53 -w 7.flannel-flannel-vxlan-cni0-in.pcap
8. The packet reaches the target pod on the destination host
cni0 receives the packet, sees it is destined for 10.100.1.6, finds in its MAC table that this is a local interface, and delivers it to the pod through the bridge port.
Destination IP: target pod IP
Source IP: source pod IP
Destination MAC: target pod MAC
Source MAC: cni0 MAC
Capture: tcpdump -nn -vvv -i vethf38183ee -vvv -nn ! port 22 and ! port 2379 and ! port 6443 and ! port 10250 and ! arp and ! port 53 -w 8.flannel-vxlan-vethf38183ee-in.pcap
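Every step above repeats the same long tcpdump filter, so it can be kept in one small helper. A sketch: the interface names (cni0, flannel.1, the veth names) are the ones from the steps above, and the excluded ports are ssh, etcd, kube-apiserver, kubelet, and DNS, exactly as in the original commands.

```shell
# cap <interface> [extra tcpdump args]: capture pod traffic on one hop of the
# flannel VXLAN path while filtering out control-plane and ssh chatter.
cap() {
  dev="$1"; shift
  tcpdump -nn -vvv -i "$dev" \
    ! port 22 and ! port 2379 and ! port 6443 and ! port 10250 \
    and ! arp and ! port 53 "$@"
}
# usage: cap cni0                      # steps 2/7: on the bridge
#        cap flannel.1                 # steps 3/6: inner overlay frames
#        cap eth0 -w vxlan-outer.pcap  # steps 4/5: the UDP/8472 VXLAN packets
```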
4、Summarize the network communication flow of calico in IPIP mode
1. The source pod sends a request; the packet reaches the host-side NIC paired with the pod
2. The packet arrives at the host-side NIC corresponding to the pod
Capture: tcpdump -nn -vvv -i cali2b2e7c9e43e -vvv -nn ! port 22 and ! port 2379 and ! port 6443 and ! port 10250 and ! arp and ! port 53
At this point the next-hop gateway is always 169.254.1.1, and the destination MAC is always ee:ee:ee:ee:ee:ee.
3. The packet reaches the host's tunl0
Capture: tcpdump -nn -vvv -i tunl0 -vvv -nn ! port 22 and ! port 2379 and ! port 6443 and ! port 10250 and ! arp and ! port 53
4. The packet reaches the source host's eth0
Capture: # tcpdump -nn -vvv -i eth0 -vvv -nn ! port 22 and ! port 2379 and ! port 6443 and ! port 10250 and ! arp and ! port 53 and ! port 2380 and ! host 172.31.7.101 -w 3.eth0.pca
5. The packet reaches the destination host's eth0
What arrives is the source host's IPinIP packet: the outer layer carries the two hosts' source/destination MACs and IPs, while the inner layer carries only the source and destination pod IPs, with no MAC addresses. After decapsulation the packet is found to be destined for 10.200.151.205.
6. The packet reaches the destination host's tunl0
Capture: # tcpdump -nn -vvv -i tunl0 -vvv -nn ! port 22 and ! port 2379 and ! port 6443 and ! port 10250 and ! arp and ! port 53 and ! port 2380 and ! host 172.31.7.101 -w 5-tunl0.pca
7. The packet reaches the host-side NIC paired with the target pod
Source IP: source pod IP
Source MAC: tunl0 MAC
Destination IP: target pod IP
Destination MAC: target pod MAC
The packet is then forwarded to the destination MAC (the target pod's MAC).
Capture: tcpdump -nn -vvv -i cali32ecf57bfbe -vvv -nn ! port 22 and ! port 2379 and ! port 6443 and ! port 10250 and ! arp and ! port 53 and ! port 2380 and ! host 172.31.7.101 -w 6-cali32ecf57bfbe.pca
8. The packet reaches the target pod
Capture: tcpdump -i eth0 -vvv -nn -w 7-dst-pod.pcap
The target pod accepts the request, builds a response, and returns it to the source pod along the reverse path.
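For comparison with flannel's VXLAN mode above: IPIP (step 5) nests an IP packet directly inside another IPv4 header, with no inner Ethernet frame and no UDP/VXLAN wrapper, so the per-packet overhead is only one outer IP header. That is why a calico tunl0 interface typically shows MTU 1480 against a 1500-byte physical NIC (the exact value depends on calico's MTU settings).

```shell
# IPIP overhead: one extra outer IPv4 header, nothing else.
OUTER_IP=20
PHY_MTU=1500
echo "IPIP overhead: ${OUTER_IP} bytes, tunl0 MTU: $((PHY_MTU - OUTER_IP))"
# prints: IPIP overhead: 20 bytes, tunl0 MTU: 1480
```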