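1. Overview of container resource limits
When Docker is the container engine, flags such as --memory and --cpus limit the CPU and memory available to a container; see Docker resource constraints[1] for the full list of options. Under the hood, Docker enforces these limits with Linux kernel cgroups, which can limit, account for, and isolate the physical resources (CPU, memory, I/O, and so on) used by a group of processes. cgroups are one of the basic guarantees behind container virtualization and a cornerstone of tools built on it, such as Docker.
For how cgroups implement resource limits, see "The kernel knowledge behind Docker: cgroups resource limits"[2].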
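2. Background
Some services running in containers detect the resources available in their environment and then size themselves accordingly.
nginx is one example: the worker_processes[3] directive in its configuration file sets the number of worker processes. The default value is 1, meaning nginx starts a single worker process.
To tune nginx for high concurrency, you can set this value by hand to the number of CPU cores, or set it to auto and let nginx detect the count itself. The first approach requires mounting the configuration file and editing it manually. The second is more flexible but misbehaves in containers: whether a container is run directly by docker or inside a Pod (the smallest deployable unit in k8s), the CPU and memory it sees are those of the node it runs on. nginx therefore cannot detect its CPU allocation correctly with auto. For example, here is one of my nodes and the resources seen by a pod running on it: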
# kubectl describe nodes k8s-node-07|grep -A 5 "Capacity"
Capacity:
cpu: 16
ephemeral-storage: 74408452Ki
hugepages-2Mi: 0
memory: 16430184Ki
pods: 110
# docker info|grep -A 6 "Kernel"
Kernel Version: 4.4.247-1.el7.elrepo.x86_64
Operating System: CentOS Linux 7 (Core)
OSType: linux
Architecture: x86_64
CPUs: 16
Total Memory: 15.67GiB
Name: k8s-node-07
# kubectl exec -it test-pod-5dff4b89fd-bsh6b -- bash
root@test-pod-5dff4b89fd-bsh6b:/# free -m
total used free shared buff/cache available
Mem: 16045 7915 2354 1002 5775 6222
Swap: 0 0 0
root@test-pod-5dff4b89fd-bsh6b:/# head -2 /proc/meminfo
MemTotal: 16430184 kB
MemFree: 2374064 kB
If the Pod's CPU and memory are limited through resources in k8s, for example:
resources:
limits:
cpu: "1"
memory: 2Gi
requests:
cpu: 200m
memory: 512Mi
you can inspect the resulting cgroup settings with docker on the node where the pod is running:
# docker inspect b1f4bfb53a2c|grep -i cgroup
"Cgroup": "",
"CgroupParent": "/kubepods/burstable/podc4a25564-225b-4562-afee-fab8cc5d694f",
"DeviceCgroupRules": null,
# cat /sys/fs/cgroup/cpu/kubepods/burstable/podc4a25564-225b-4562-afee-fab8cc5d694f/cpu.cfs_quota_us
100000
# cat /sys/fs/cgroup/cpu/kubepods/burstable/podc4a25564-225b-4562-afee-fab8cc5d694f/cpu.cfs_period_us
100000
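The two values above map back to the CPU limit as quota divided by period. The following is a minimal sketch of that arithmetic, assuming cgroup v1 CFS semantics where a quota of -1 means "no limit". The function names and the idea of rounding up to get a worker count are my own illustration, not something nginx or k8s provides.

```python
import math
from typing import Optional


def cpu_limit_from_cfs(quota_us: int, period_us: int) -> Optional[float]:
    """Return the CPU limit in cores, or None when no quota is set."""
    # In cgroup v1, cpu.cfs_quota_us is -1 when the group is unlimited.
    if quota_us <= 0 or period_us <= 0:
        return None
    return quota_us / period_us


def suggested_workers(quota_us: int, period_us: int, host_cpus: int) -> int:
    """Hypothetical helper: pick a worker count that respects the limit."""
    limit = cpu_limit_from_cfs(quota_us, period_us)
    if limit is None:
        return host_cpus
    # Round fractional limits up, but never exceed the host CPU count.
    return max(1, min(host_cpus, math.ceil(limit)))


if __name__ == "__main__":
    # Values read from the pod cgroup above, on a 16-CPU node.
    print(suggested_workers(100000, 100000, 16))  # prints 1
```

With the limit of cpu: "1" the quota equals the period, so the pod is entitled to one core's worth of CPU time per period even though the node has 16 CPUs.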
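From what I could find, nginx obtains the number of CPU cores by calling sysconf(_SC_NPROCESSORS_ONLN). This is a libc function rather than a system call, and with glibc it typically gets the value by reading /sys/devices/system/cpu/online. By default this file has the same content in a pod as on the host, so with worker_processes set to auto nginx ends up starting 16 worker processes. When the Pod's own CPU limit is small, each worker gets only a small slice of CPU time, which causes noticeably slow responses.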
# kubectl exec -it test-pod-5dff4b89fd-bsh6b -- cat /sys/devices/system/cpu/online
0-15
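The file uses the kernel's CPU list format: comma-separated entries that are either a single index or an inclusive range. As a rough sketch of how a CPU count is derived from it (the function name is mine, and this illustrates the format rather than the actual libc code):

```python
def count_online_cpus(cpu_list: str) -> int:
    """Count CPUs in a kernel CPU list string such as "0-15" or "0-3,8"."""
    count = 0
    for part in cpu_list.strip().split(","):
        if not part:
            continue
        if "-" in part:
            # An inclusive range, e.g. "0-15" covers 16 CPUs.
            start, end = part.split("-", 1)
            count += int(end) - int(start) + 1
        else:
            # A single CPU index; int() validates that it is numeric.
            int(part)
            count += 1
    return count


if __name__ == "__main__":
    print(count_online_cpus("0-15"))  # prints 16
```

This is why the pod reports 16 CPUs: the string 0-15 describes the node, and nothing in it reflects the pod's cgroup limit.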
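3. Introducing lxcfs
lxcfs[4] is a small FUSE filesystem designed to make a Linux container feel more like a virtual machine. It helps a container see its own resources correctly by serving container-aware versions of the following files: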
/proc/cpuinfo
/proc/diskstats
/proc/meminfo
/proc/stat
/proc/swaps
/proc/uptime
/sys/devices/system/cpu/online
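When a container starts, these files inside the container are bind-mounted from the lxcfs filesystem on the host. When an application in the container reads /proc/meminfo, for example, the request is routed to lxcfs, which uses the container's cgroup information to return the correct values, so the application sees the right resources.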
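To make the idea concrete, here is a simplified sketch of the kind of calculation involved for MemTotal. It assumes the cgroup memory limit is available in bytes and that an unlimited cgroup reports a value larger than the host total. The function names are hypothetical and this is not the actual lxcfs code, which is written in C and handles many more fields.

```python
def container_mem_total_kb(cgroup_limit_bytes: int, host_mem_total_kb: int) -> int:
    """Return the MemTotal value (in kB) a container should see."""
    limit_kb = cgroup_limit_bytes // 1024
    # An unlimited cgroup reports a huge number, so cap at the host total.
    return min(limit_kb, host_mem_total_kb)


def meminfo_total_line(mem_total_kb: int) -> str:
    """Format the value like the first line of /proc/meminfo."""
    return f"MemTotal: {mem_total_kb} kB"


if __name__ == "__main__":
    two_gib = 2 * 1024 ** 3    # the 2Gi limit used in the example above
    host_total_kb = 16430184   # MemTotal of the node shown earlier
    print(meminfo_total_line(container_mem_total_kb(two_gib, host_total_kb)))
```

For the 2Gi limit this yields 2097152 kB instead of the node's 16430184 kB.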
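3.1 Deploying lxcfs in k8s
A project for deploying lxcfs on k8s is available at https://github.com/denverdino/lxcfs-admission-webhook
It is built on k8s dynamic admission control, specifically admission webhooks[5].
My k8s cluster version is as follows: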
# kubectl version -o yaml
clientVersion:
buildDate: "2020-12-08T17:59:43Z"
compiler: gc
gitCommit: af46c47ce925f4c4ad5cc8d1fca46c7b77d13b38
gitTreeState: clean
gitVersion: v1.20.0
goVersion: go1.15.5
major: "1"
minor: "20"
platform: darwin/amd64
serverVersion:
buildDate: "2019-06-19T16:32:14Z"
compiler: gc
gitCommit: e8462b5b5dc2584fdcd18e6bcfe9f1e4d970a529
gitTreeState: clean
gitVersion: v1.15.0
goVersion: go1.12.5
major: "1"
minor: "15"
platform: linux/amd64
First fetch the manifests, then deploy with the provided script:
# git clone https://github.com/denverdino/lxcfs-admission-webhook.git
# cd lxcfs-admission-webhook
# ls deployment
deployment.yaml lxcfs-daemonset.yaml mutatingwebhook.yaml uninstall.sh web.yaml webhook-patch-ca-bundle.sh
install.sh mutatingwebhook-ca-bundle.yaml service.yaml validatingwebhook.yaml webhook-create-signed-cert.sh
# kubectl apply -f deployment/lxcfs-daemonset.yaml
daemonset.apps/lxcfs created
# ./deployment/install.sh
creating certs in tmpdir /var/folders/8n/11ndbfq95jv79gds8wqj2scc0000gn/T/tmp.c6OKXi4L
Generating RSA private key, 2048 bit long modulus
.......................................+++
...............+++
e is 65537 (0x10001)
certificatesigningrequest.certificates.k8s.io/lxcfs-admission-webhook-svc.default created
NAME AGE REQUESTOR CONDITION
lxcfs-admission-webhook-svc.default 0s admin Pending
certificatesigningrequest.certificates.k8s.io/lxcfs-admission-webhook-svc.default approved
W0327 16:35:14.764281 8953 helpers.go:553] --dry-run is deprecated and can be replaced with --dry-run=client.
secret/lxcfs-admission-webhook-certs created
NAME TYPE DATA AGE
lxcfs-admission-webhook-certs Opaque 2 0s
deployment.apps/lxcfs-admission-webhook-deployment created
service/lxcfs-admission-webhook-svc created
mutatingwebhookconfiguration.admissionregistration.k8s.io/mutating-lxcfs-admission-webhook-cfg created
Check the result: there is one pod named lxcfs-admission-webhook-deployment, plus one lxcfs pod on every node, run by a DaemonSet:
# kubectl get pods -o wide|grep lxcfs
lxcfs-admission-webhook-deployment-6896958c4c-56k54 1/1 Running 0 80s 172.20.7.51 172.16.1.111 <none> <none>
lxcfs-67cgk 1/1 Running 0 94s 172.20.0.25 172.16.1.100 <none> <none>
lxcfs-c4lkx 1/1 Running 0 93s 172.20.1.25 172.16.1.101 <none> <none>
...
3.2 Enabling injection for a namespace
# kubectl label namespace default lxcfs-admission-webhook=enabled
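This enables lxcfs injection for the given namespace. From then on, every newly created Pod in that namespace has lxcfs injected; Pods that already exist are not changed until they are recreated.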
3.3 Uninstalling
To restore the environment, run the uninstall script in the same directory and then delete the DaemonSet:
# ./deployment/uninstall.sh
mutatingwebhookconfiguration.admissionregistration.k8s.io "mutating-lxcfs-admission-webhook-cfg" deleted
service "lxcfs-admission-webhook-svc" deleted
deployment.apps "lxcfs-admission-webhook-deployment" deleted
secret "lxcfs-admission-webhook-certs" deleted
# kubectl delete -f deployment/lxcfs-daemonset.yaml
daemonset.apps "lxcfs" deleted
4. Testing
The cloned repository includes a YAML manifest for a test httpd pod that can be deployed directly:
# kubectl apply -f deployment/web.yaml
deployment.apps/web created
# kubectl get pods -l app=web
NAME READY STATUS RESTARTS AGE
web-5ff5cd75f8-74pr6 1/1 Running 0 27s
web-5ff5cd75f8-bcm2x 1/1 Running 0 27s
Enter one of the containers and check its resources:
# kubectl exec -it web-5ff5cd75f8-74pr6 -- bash
root@web-5ff5cd75f8-74pr6:/usr/local/apache2# free -m
total used free shared buffers cached
Mem: 256 15 240 0 0 0
-/+ buffers/cache: 14 241
Swap: 0 0 0
root@web-5ff5cd75f8-74pr6:/usr/local/apache2# cat /proc/cpuinfo| grep "processor"| wc -l
1
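What actually happens is that lxcfs together with dynamic admission control automatically mounts the lxcfs-backed files from the host into each new pod at creation time. You can see the mounts like this: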
# kubectl describe pods web-5ff5cd75f8-74pr6
...
Mounts:
/proc/cpuinfo from lxcfs-proc-cpuinfo (rw)
/proc/diskstats from lxcfs-proc-diskstats (rw)
/proc/loadavg from lxcfs-proc-loadavg (rw)
/proc/meminfo from lxcfs-proc-meminfo (rw)
/proc/stat from lxcfs-proc-stat (rw)
/proc/swaps from lxcfs-proc-swaps (rw)
/proc/uptime from lxcfs-proc-uptime (rw)
/sys/devices/system/cpu/online from lxcfs-sys-devices-system-cpu-online (rw)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-jtj98 (ro)
...
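The mount list above suggests what the webhook adds to each pod spec: one hostPath volume and one volumeMount per file. Below is a hedged sketch that generates such a fragment. The volume naming rule is inferred from the names in the output above, and the host path /var/lib/lxcfs is an assumption about where lxcfs is mounted on the node. The real webhook is a Go program and may build its patch differently.

```python
import json

# Assumption: lxcfs is mounted at this path on every node.
LXCFS_HOST_ROOT = "/var/lib/lxcfs"

# Container paths taken from the Mounts section above.
TARGETS = [
    "/proc/cpuinfo",
    "/proc/diskstats",
    "/proc/loadavg",
    "/proc/meminfo",
    "/proc/stat",
    "/proc/swaps",
    "/proc/uptime",
    "/sys/devices/system/cpu/online",
]


def volume_name(path: str) -> str:
    """Derive a volume name, e.g. /proc/cpuinfo -> lxcfs-proc-cpuinfo."""
    return "lxcfs" + path.replace("/", "-")


def build_injection(targets=TARGETS, host_root=LXCFS_HOST_ROOT) -> dict:
    """Build the volumes and volumeMounts to add to a pod spec."""
    volumes, mounts = [], []
    for path in targets:
        name = volume_name(path)
        volumes.append({"name": name, "hostPath": {"path": host_root + path}})
        mounts.append({"name": name, "mountPath": path})
    return {"volumes": volumes, "volumeMounts": mounts}


if __name__ == "__main__":
    print(json.dumps(build_injection(), indent=2))
```

Each container in the pod would receive the volumeMounts entries, while the volumes entries go on the pod spec itself.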
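5. Summary
Containers in the pod can now read their CPU and memory limits correctly. If your own application reads the resource configuration of its environment and something goes wrong, be sure to work out at the lowest level how those values are actually obtained.
The test above also shows that lxcfs mounted /sys/devices/system/cpu/online, the file nginx relies on, into the pod. The automatic worker process setting in the nginx container was therefore fixed as well, which I confirmed by testing.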
References
[1] Docker resource constraints: https://docs.docker.com/config/containers/resource_constraints/
[2] The kernel knowledge behind Docker: cgroups resource limits
[3] nginx worker_processes: http://nginx.org/en/docs/ngx_core_module.html#worker_processes
[4] lxcfs: https://github.com/lxc/lxcfs
    lxcfs-admission-webhook: https://github.com/denverdino/lxcfs-admission-webhook
[5] Dynamic admission control (admission webhooks): https://kubernetes.io/docs/reference/access-authn-authz/extensible-admission-controllers/#admission-webhooks