Overview
The official upgrade documentation calls out the following points:
- 2.6.x and 3.x use different etcd APIs (this only concerns the etcd datastore): 2.6.x uses etcdv2, while 3.x uses etcdv3.
- To upgrade from 2.6.x to 3.x you must first be running at least 2.6.5.
Given the current environment, the cluster therefore has to be upgraded to 2.6.5+ first, and only then to 3.x.
Upgrading 2.6.1 to 2.6.12
2019/12/25
Current environment: Calico data is stored in etcd through the etcdv2 API.
[root@k8s-1 kubelet]# which etcdv2
alias etcdv2='export ETCDCTL_API=2; /bin/etcdctl --ca-file /etc/etcd/ssl/etcd-root-ca.pem --cert-file /etc/etcd/ssl/etcd.pem --key-file /etc/etcd/ssl/etcd-key.pem --endpoints https://10.111.32.239:2379,https://10.111.32.241:2379,https://10.111.32.242:2379'
[root@k8s-1 kubelet]# etcdv2 ls /calico/ipam/v2/assignment/ipv4
/calico/ipam/v2/assignment/ipv4/block
[root@k8s-1 kubelet]# etcdv2 ls /calico/ipam/v2/assignment/ipv4/block
/calico/ipam/v2/assignment/ipv4/block/10.20.134.64-26
/calico/ipam/v2/assignment/ipv4/block/10.20.253.64-26
/calico/ipam/v2/assignment/ipv4/block/10.20.28.192-26
/calico/ipam/v2/assignment/ipv4/block/10.20.51.128-26
/calico/ipam/v2/assignment/ipv4/block/10.20.78.0-26
/calico/ipam/v2/assignment/ipv4/block/10.20.112.64-26
/calico/ipam/v2/assignment/ipv4/block/10.20.15.128-26
/calico/ipam/v2/assignment/ipv4/block/10.20.235.0-26
/calico/ipam/v2/assignment/ipv4/block/10.20.53.64-26
/calico/ipam/v2/assignment/ipv4/block/10.20.72.128-26
As the documentation states, upgrading to 3.0 requires at least 2.6.5 and involves some manual steps, because 3.x uses etcdv3 while 2.6.x uses etcdv2.
The cluster currently runs 2.6.1, so upgrade it to 2.6.5+ first.
Here we pick 2.6.12, the latest 2.6.x release.
Download the calico.yaml manifest:
[root@docker-182 v2.6]# wget https://docs.projectcalico.org/v2.6/getting-started/kubernetes/installation/rbac.yaml
[root@docker-182 v2.6]# wget https://docs.projectcalico.org/v2.6/getting-started/kubernetes/installation/hosted/calico.yaml
# adjust the configuration in calico.yaml
[root@docker-182 v2.6]# sh -x modify_calico_yaml.sh
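The contents of modify_calico_yaml.sh are not shown above. A minimal sketch of what such a script typically does, assuming the standard hosted-install manifest keys and the /calico-secrets mount paths (the endpoint list is taken from the etcdv2 alias shown earlier; everything else here is illustrative):

```shell
#!/bin/bash
# Hypothetical sketch of modify_calico_yaml.sh -- the real script is not shown.
# It points the manifest at the cluster's TLS-enabled etcd and fills in the
# certificate paths that the hosted install mounts under /calico-secrets.
set -e

ETCD_ENDPOINTS="https://10.111.32.239:2379,https://10.111.32.241:2379,https://10.111.32.242:2379"

# Stand-in for the downloaded calico.yaml so the sketch runs end to end.
cat > calico.yaml <<'EOF'
  etcd_endpoints: "http://127.0.0.1:2379"
  etcd_ca: ""
  etcd_cert: ""
  etcd_key: ""
EOF

sed -i \
  -e "s|etcd_endpoints:.*|etcd_endpoints: \"${ETCD_ENDPOINTS}\"|" \
  -e 's|etcd_ca: ""|etcd_ca: "/calico-secrets/etcd-ca"|' \
  -e 's|etcd_cert: ""|etcd_cert: "/calico-secrets/etcd-cert"|' \
  -e 's|etcd_key: ""|etcd_key: "/calico-secrets/etcd-key"|' \
  calico.yaml

grep etcd calico.yaml
```

The real script would of course edit the downloaded manifest in place rather than generate a stand-in.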
Pre-pull the images:
[root@docker-182 v2.6]# grep image calico.yaml
image: quay.io/calico/node:v2.6.12
image: quay.io/calico/cni:v1.11.8
image: quay.io/calico/kube-controllers:v1.0.5
image: quay.io/calico/kube-controllers:v1.0.5
The documentation describes an ordered procedure (upgrade calico-kube-controllers first, then the calico-node DaemonSet); here we simply apply the new manifest in one step. Note that it does not include the Calico RBAC resources.
[root@docker-182 v2.6]# k239 apply -f calico.yaml
configmap "calico-config" unchanged
secret "calico-etcd-secrets" unchanged
daemonset "calico-node" configured
deployment "calico-kube-controllers" configured
deployment "calico-policy-controller" configured
serviceaccount "calico-kube-controllers" unchanged
serviceaccount "calico-node" unchanged
Apply the update
After applying, the calico-node DaemonSet pods are not recreated automatically (presumably because the DaemonSet uses the OnDelete update strategy), so delete the pods to force the update:
[root@k8s-1 v2.6]# kubectl -n kube-system get pod -o wide |grep calico
calico-kube-controllers-6768b96c5f-rdbjp 1/1 Running 0 4m 10.111.32.243 k8s-4.geotmt.com
calico-node-45lnh 0/1 ContainerCreating 0 4h 10.111.32.241 k8s-2.geotmt.com
calico-node-49mq7 1/1 Running 1 5h 10.111.32.243 k8s-4.geotmt.com
calico-node-m86hr 1/1 Running 0 5h 10.111.32.244 k8s-5.geotmt.com
calico-node-mm5fz 0/1 ContainerCreating 0 4h 10.111.32.239 k8s-1.geotmt.com
calico-node-shrfw 1/1 Running 0 4h 10.111.32.242 k8s-3.geotmt.com
calico-node-xx8hk 1/1 Running 0 5h 10.111.32.245 k8s-6.geotmt.com
Verifying the update
An example from one node: the new calico-node pod now runs two containers.
[root@k8s-1 v2.6]# kubectl -n kube-system get pod -o wide |grep calico |grep k8s-6
calico-node-fj4t8 2/2 Running 0 25s 10.111.32.245 k8s-6.geotmt.com
Pinging a pod on another node from inside a pod works:
bash-4.4# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
2: tunl0@NONE: <NOARP> mtu 1480 qdisc noop state DOWN qlen 1000
link/ipip 0.0.0.0 brd 0.0.0.0
4: eth0@if30: <BROADCAST,MULTICAST,UP,LOWER_UP,M-DOWN> mtu 1500 qdisc noqueue state UP
link/ether 6e:20:a3:45:42:49 brd ff:ff:ff:ff:ff:ff
inet 10.20.235.12/32 scope global eth0
valid_lft forever preferred_lft forever
bash-4.4# ping 10.20.15.135
PING 10.20.15.135 (10.20.15.135): 56 data bytes
64 bytes from 10.20.15.135: seq=0 ttl=62 time=1.133 ms
64 bytes from 10.20.15.135: seq=1 ttl=62 time=0.631 ms
This version still requires manually adding a toleration so that pods can be scheduled on master nodes.
Upgrade to 2.6.12 complete.
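For reference, the kind of toleration that has to be added to the calico-node pod spec is sketched below; the exact taint key is an assumption and depends on how the masters in this cluster were tainted:

```yaml
      tolerations:
        # Assumed taint key: tolerate the master taint so calico-node
        # is also scheduled on master nodes.
        - key: node-role.kubernetes.io/master
          operator: Exists
          effect: NoSchedule
```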
Upgrading 2.6.12 to 3.0
- Pre-upgrade requirements (from the official docs):
- You must first upgrade to Calico v2.6.5 (or a later v2.6.x release) before you can upgrade to Calico v3.0.12. (Important: Calico v2.6.5 was a special transitional release that included changes to enable upgrade to v3.0.1+; do not skip this step!)
- If you are using the etcd datastore, you should upgrade etcd to the latest stable v3 release.
Both requirements are met here.
[root@k8s-1 net.d]# etcdctl version
etcdctl version: 3.3.11
API version: 3.3
- etcd datastore upgrade steps
- Install and configure calico-upgrade
- Test the data migration and check for errors
- Migrate Calico data
- Upgrade Calico
Install and configure calico-upgrade:
[root@docker-182 ansible]# wget https://github.com/projectcalico/calico-upgrade/releases/download/v1.0.5/calico-upgrade
[root@docker-182 k8s_239]# ansible-playbook install_calico-upgrade.yml
Run a dry-run first to validate the migration:
[root@k8s-1 calico-upgrade]# calico-upgrade dry-run --output-dir=tmp --apiconfigv1 /etc/calico/apiconfigv1.cfg --apiconfigv3 /etc/calico/apiconfigv3.cfg
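The two config files passed to calico-upgrade are not shown above; they are standard calicoctl datastore configs, one per API version. A sketch of what they would contain for this cluster, reusing the endpoints and TLS paths from the etcdv2 alias (check the exact field names against the calico-upgrade docs for your release):

```yaml
# /etc/calico/apiconfigv1.cfg -- access to the v1 (etcdv2) data
apiVersion: v1
kind: calicoApiConfig
spec:
  datastoreType: etcdv2
  etcdEndpoints: https://10.111.32.239:2379,https://10.111.32.241:2379,https://10.111.32.242:2379
  etcdCACertFile: /etc/etcd/ssl/etcd-root-ca.pem
  etcdCertFile: /etc/etcd/ssl/etcd.pem
  etcdKeyFile: /etc/etcd/ssl/etcd-key.pem
---
# /etc/calico/apiconfigv3.cfg -- access to the v3 (etcdv3) data
apiVersion: projectcalico.org/v3
kind: CalicoAPIConfig
spec:
  datastoreType: etcdv3
  etcdEndpoints: https://10.111.32.239:2379,https://10.111.32.241:2379,https://10.111.32.242:2379
  etcdCACertFile: /etc/etcd/ssl/etcd-root-ca.pem
  etcdCertFile: /etc/etcd/ssl/etcd.pem
  etcdKeyFile: /etc/etcd/ssl/etcd-key.pem
```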
Run the migration:
[root@k8s-1 calico-upgrade]# calico-upgrade start --ignore-v3-data --apiconfigv1 /etc/calico/apiconfigv1.cfg --apiconfigv3 /etc/calico/apiconfigv3.cfg
Preparing reports directory
* creating report directory if it does not exist
* validating permissions and removing old reports
Checking Calico version is suitable for migration
* determined Calico version of: v2.6.12
* the v1 API data can be migrated to the v3 API
Validating conversion of v1 data to v3
* handling FelixConfiguration (global) resource
* handling ClusterInformation (global) resource
* handling FelixConfiguration (per-node) resources
* handling BGPConfiguration (global) resource
* handling Node resources
* handling BGPPeer (global) resources
* handling BGPPeer (node) resources
* handling HostEndpoint resources
* handling IPPool resources
* handling GlobalNetworkPolicy resources
* handling Profile resources
* handling WorkloadEndpoint resources
* data conversion successful
Data conversion validated successfully
Validating the v3 datastore
* the v3 datastore is not empty
-------------------------------------------------------------------------------
Successfully validated v1 to v3 conversion.
You are about to start the migration of Calico v1 data format to Calico v3 data
format. During this time and until the upgrade is completed Calico networking
will be paused - which means no new Calico networked endpoints can be created.
No Calico configuration should be modified using calicoctl during this time.
Type "yes" to proceed (any other input cancels): yes
Pausing Calico networking
* successfully paused Calico networking in the v1 configuration
Calico networking is now paused - waiting for 15s
Querying current v1 snapshot and converting to v3
* handling FelixConfiguration (global) resource
* handling ClusterInformation (global) resource
* handling FelixConfiguration (per-node) resources
* handling BGPConfiguration (global) resource
* handling Node resources
* handling BGPPeer (global) resources
* handling BGPPeer (node) resources
* handling HostEndpoint resources
* handling IPPool resources
* handling GlobalNetworkPolicy resources
* handling Profile resources
* handling WorkloadEndpoint resources
* data converted successfully
Storing v3 data
* Storing resources in v3 format
* success: resources stored in v3 datastore
Migrating IPAM data
* listing and converting IPAM allocation blocks
* listing and converting IPAM affinity blocks
* listing IPAM handles
* storing IPAM data in v3 format
* IPAM data migrated successfully
Data migration from v1 to v3 successful
* check the output for details of the migrated resources
* continue by upgrading your calico/node versions to Calico v3.x
-------------------------------------------------------------------------------
Successfully migrated Calico v1 data to v3 format.
Follow the detailed upgrade instructions available in the release documentation
to complete the upgrade. This includes:
* upgrading your calico/node instances and orchestrator plugins (e.g. CNI) to
the required v3.x release
* running 'calico-upgrade complete' to complete the upgrade and resume Calico
networking
See report(s) below for details of the migrated data.
Reports:
- name conversion: /root/calico-upgrade/calico-upgrade-report/convertednames
Download the v3.0 manifests:
[root@docker-182 v3.0]# wget https://docs.projectcalico.org/v3.0/getting-started/kubernetes/installation/rbac.yaml
[root@docker-182 v3.0]# wget https://docs.projectcalico.org/v3.0/getting-started/kubernetes/installation/hosted/calico.yaml
For the changes in 3.0, see the 3.0 release notes.
Pre-pull the required images:
[root@docker-182 v3.0]# grep image calico.yaml
image: quay.io/calico/node:v3.0.12
image: quay.io/calico/cni:v3.0.12
image: quay.io/calico/kube-controllers:v3.0.12
Apply the upgrade:
[root@docker-182 v3.0]# k239 apply -f calico.yaml
configmap "calico-config" configured
secret "calico-etcd-secrets" unchanged
daemonset "calico-node" configured
deployment "calico-kube-controllers" configured
serviceaccount "calico-kube-controllers" unchanged
serviceaccount "calico-node" unchanged
This time the pods restart in a rolling fashion; wait until all of them have been upgraded.
Then run calico-upgrade complete to finish the upgrade:
[root@k8s-1 calico-upgrade]# calico-upgrade complete --apiconfigv1 /etc/calico/apiconfigv1.cfg --apiconfigv3 /etc/calico/apiconfigv3.cfg
You are about to complete the upgrade process to Calico v3. At this point, the
v1 format data should have been successfully converted to v3 format, and all
calico/node instances and orchestrator plugins (e.g. CNI) should be running
Calico v3.x.
Type "yes" to proceed (any other input cancels): yes
Completing upgrade
Enabling Calico networking for v3
* successfully resumed Calico networking in the v3 configuration (updated
ClusterInformation)
Upgrade completed successfully
-------------------------------------------------------------------------------
Successfully completed the upgrade process.
If this command is skipped, errors like the following appear in the kubelet log:
E1225 19:56:04.837028 3281 kuberuntime_sandbox.go:54] CreatePodSandbox for pod "demo-deployment-6f4c6779b-b8zqq_default(1dd28cf0-270d-11ea-bd6c-c6a864ab864a)" failed: rpc error: code = Unknown desc = NetworkPlugin cni failed to set up pod "demo-deployment-6f4c6779b-b8zqq_default" network: Calico is currently not ready to process requests
E1225 19:56:04.837049 3281 kuberuntime_manager.go:647] createPodSandbox for pod "demo-deployment-6f4c6779b-b8zqq_default(1dd28cf0-270d-11ea-bd6c-c6a864ab864a)" failed: rpc error: code = Unknown desc = NetworkPlugin cni failed to set up pod "demo-deployment-6f4c6779b-b8zqq_default" network: Calico is currently not ready to process requests
E1225 19:56:04.837167 3281 pod_workers.go:186] Error syncing pod 1dd28cf0-270d-11ea-bd6c-c6a864ab864a ("demo-deployment-6f4c6779b-b8zqq_default(1dd28cf0-270d-11ea-bd6c-c6a864ab864a)"), skipping: failed to "CreatePodSandbox" for "demo-deployment-6f4c6779b-b8zqq_default(1dd28cf0-270d-11ea-bd6c-c6a864ab864a)" with CreatePodSandboxError: "CreatePodSandbox for pod \"demo-deployment-6f4c6779b-b8zqq_default(1dd28cf0-270d-11ea-bd6c-c6a864ab864a)\" failed: rpc error: code = Unknown desc = NetworkPlugin cni failed to set up pod \"demo-deployment-6f4c6779b-b8zqq_default\" network: Calico is currently not ready to process requests"
Upgrade to 3.0.12 complete.
Upgrading 3.0.12 to 3.11
According to the 3.11 document Upgrading Calico on Kubernetes, this upgrade only requires applying the new manifest (this environment does not use Application Layer Policy).
This release fully supports the Kubernetes API datastore, so when downloading the manifest make sure it matches your environment.
This environment uses the etcd datastore variant.
Download the manifest:
[root@docker-182 v3.11]# wget https://docs.projectcalico.org/v3.11/manifests/calico-etcd.yaml
# adjust the etcd configuration in the manifest
[root@docker-182 v3.11]# bash -x modify_calico_yaml.sh
Pre-pull the images:
[root@docker-182 v3.11]# grep image calico-etcd.yaml
image: calico/cni:v3.11.1
image: calico/pod2daemon-flexvol:v3.11.1
image: calico/node:v3.11.1
image: calico/kube-controllers:v3.11.1
Apply the new version:
[root@docker-182 v3.11]# k239 apply -f calico-etcd.yaml
secret "calico-etcd-secrets" unchanged
configmap "calico-config" configured
clusterrole "calico-kube-controllers" configured
clusterrolebinding "calico-kube-controllers" configured
clusterrole "calico-node" configured
clusterrolebinding "calico-node" configured
daemonset "calico-node" configured
serviceaccount "calico-node" unchanged
deployment "calico-kube-controllers" configured
serviceaccount "calico-kube-controllers" unchanged
Verifying the new version
Each new pod now has only one long-running container: this release moves install-cni and flexvol-driver (which did not exist in older versions) into initContainers, so only the calico-node container remains resident.
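As a condensed sketch (paraphrased from the v3.11 calico-etcd.yaml, not copied verbatim), the DaemonSet pod spec now looks roughly like this:

```yaml
# Condensed sketch of the calico-node DaemonSet in the v3.11 manifest:
# install-cni and flexvol-driver run once as initContainers, so only the
# calico-node container stays resident, which is why READY shows 1/1.
spec:
  template:
    spec:
      initContainers:
        - name: install-cni
          image: calico/cni:v3.11.1
        - name: flexvol-driver
          image: calico/pod2daemon-flexvol:v3.11.1
      containers:
        - name: calico-node
          image: calico/node:v3.11.1
```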
[root@docker-182 ~]# k239 -n kube-system get pod -o wide |grep calico
calico-kube-controllers-85dc4fd46b-4wnmt 1/1 Running 0 1m 10.111.32.243 k8s-4.geotmt.com
calico-node-4bgkc 1/1 Running 0 59s 10.111.32.241 k8s-2.geotmt.com
calico-node-5jg2t 1/1 Running 0 31s 10.111.32.244 k8s-5.geotmt.com
calico-node-9fn6r 1/1 Running 0 43s 10.111.32.245 k8s-6.geotmt.com
calico-node-9n7dn 1/1 Running 0 1m 10.111.32.243 k8s-4.geotmt.com
calico-node-fxr46 1/1 Running 0 1m 10.111.32.239 k8s-1.geotmt.com
calico-node-pgh5c 1/1 Running 0 1m 10.111.32.242 k8s-3.geotmt.com
Test cross-host pod-to-pod connectivity:
[root@k8s-1 ~]# kubectl exec -it demo-deployment-6f4c6779b-b8zqq /bin/bash
bash-4.4# ping 10.20.235.12
PING 10.20.235.12 (10.20.235.12): 56 data bytes
64 bytes from 10.20.235.12: seq=0 ttl=62 time=1.232 ms
^C
--- 10.20.235.12 ping statistics ---
1 packets transmitted, 1 packets received, 0% packet loss
round-trip min/avg/max = 1.232/1.232/1.232 ms
bash-4.4# ping 10.20.253.80
PING 10.20.253.80 (10.20.253.80): 56 data bytes
64 bytes from 10.20.253.80: seq=0 ttl=62 time=1.730 ms
64 bytes from 10.20.253.80: seq=1 ttl=62 time=1.385 ms
^C
--- 10.20.253.80 ping statistics ---
2 packets transmitted, 2 packets received, 0% packet loss
round-trip min/avg/max = 1.385/1.557/1.730 ms
bash-4.4# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
2: tunl0@NONE: <NOARP> mtu 1480 qdisc noop state DOWN qlen 1000
link/ipip 0.0.0.0 brd 0.0.0.0
4: eth0@if51: <BROADCAST,MULTICAST,UP,LOWER_UP,M-DOWN> mtu 1500 qdisc noqueue state UP
link/ether fa:d1:55:42:ab:6c brd ff:ff:ff:ff:ff:ff
inet 10.20.15.163/32 scope global eth0
valid_lft forever preferred_lft forever
Test that a recreated pod gets an IP address assigned; it succeeds:
[root@k8s-1 ~]# kubectl delete pod nginx-deployment-7b66d98974-2rh87
pod "nginx-deployment-7b66d98974-2rh87" deleted
[root@k8s-1 ~]# kubectl get pod nginx-deployment-7b66d98974-nd8h7 -o wide
NAME READY STATUS RESTARTS AGE IP NODE
nginx-deployment-7b66d98974-nd8h7 1/1 Running 0 1m 10.20.253.86 k8s-4.geotmt.com
Calico upgrade from 3.0.12 to 3.11.1 complete.
References
- https://docs.projectcalico.org/v3.0/getting-started/kubernetes/upgrade/ : Calico 3.0 upgrade documentation
- https://docs.projectcalico.org/v2.6/getting-started/kubernetes/installation/hosted/hosted : Standard Hosted Install (Calico 2.6 installation docs)
- https://docs.projectcalico.org/v2.6/getting-started/kubernetes/upgrade : Calico 2.6 upgrade documentation
- https://docs.projectcalico.org/v3.0/getting-started/kubernetes/installation/hosted/hosted : Calico 3.0 installation documentation
- https://docs.projectcalico.org/v3.11/maintenance/kubernetes-upgrade : Upgrading Calico on Kubernetes (3.11 upgrade documentation)