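Background

As microservices have gone mainstream and Docker-based development continues, we produce a large number of Docker images in production. As versions iterate, these images take up a great deal of storage. This article analyzes the factors that affect image size and then offers an approach to slimming images down.

The relationship between Dockerfile, Docker image, and Docker container

When learning Docker, you inevitably have to understand how these three relate. From the development workflow's point of view, the Dockerfile is the software's raw material, the Docker image is the deliverable, and the Docker container is the software in its running state. From the application's point of view, the three represent three stages of the software: the Dockerfile faces development, the Docker image is the delivery standard, and the Docker container concerns deployment and operations. None of the three can be left out; together they form the foundation of the Docker ecosystem.
In short, a Dockerfile is built into a Docker image, and a Docker container is run from a Docker image. A detailed analysis of the relationship is left for a later article.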

Slimming images

Our goals in slimming images are:

  • Faster builds
  • Smaller Docker images
  • Fewer image layers
  • Full use of the build cache
  • More readable Dockerfiles
  • Containers that are simpler to use

Principle

In short: start from the smallest suitable base image and merge instructions (that is, use fewer instructions).
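
A minimal sketch of why merging matters. The debian:12-slim base and the curl package are assumptions chosen for illustration; neither is part of the experiments below. Files deleted by a later instruction still occupy space in the earlier layer that created them, so cleanup only shrinks the image when it happens in the same RUN.

```dockerfile
# Illustration only: the base image and the curl package are hypothetical choices.
FROM debian:12-slim

# Anti-pattern (left commented out): three layers, and the package lists
# written by the first RUN stay in that layer even though the third RUN
# deletes them.
# RUN apt-get update
# RUN apt-get install -y --no-install-recommends curl
# RUN rm -rf /var/lib/apt/lists/*

# Merged: a single layer, and the package lists are removed before the
# layer is committed, so they never add to the image size.
RUN apt-get update && \
    apt-get install -y --no-install-recommends curl && \
    rm -rf /var/lib/apt/lists/*
```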

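Slimming in action

We will now write a series of Dockerfiles and rebuild the image again and again, trying to shrink it as much as we can.
Original file
This is a deliberately poor Dockerfile, used only as the baseline for analysis.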

FROM python:3.5.6-slim-stretch
RUN apt-get update
RUN echo python -V 
RUN mkdir /code 
RUN mkdir /code/db
ADD . /code/
WORKDIR /code
RUN pip install -U pip
RUN pip install -r requirement

Run the command: docker build -t spidermax:test_01 .
This builds the image. Docker reads the Dockerfile and creates the image layer by layer, as the build log below shows:

Sending build context to Docker daemon 289.8kB
Step 1/9 : FROM python:3.5.6-slim-stretch
---> 86669b8e5771
Step 2/9 : RUN apt-get update
---> Using cache
---> 1b77542b82f4
Step 3/9 : RUN echo python -V
---> Using cache
---> de6cc0a916d7
Step 4/9 : RUN mkdir /code
---> Using cache
---> 996828abe04e
Step 5/9 : RUN mkdir /code/db
---> Using cache
---> 2db7c4dfb918
Step 6/9 : ADD . /code/
---> d9d92db45b29
Step 7/9 : WORKDIR /code
---> Running in 5ae405de28fc
Removing intermediate container 5ae405de28fc
---> a7e3696c6a5b
Step 8/9 : RUN pip install -U pip
---> Running in dcf5b22f6f33
Collecting pip
Downloading https://files.pythonhosted.org/packages/d8/f3/413bab4ff08e1fc4828dfc59996d721917df8e8583ea85385d51125dceff/pip-19.0.3-py2.py3-none-any.whl (1.4MB)
Installing collected packages: pip
Found existing installation: pip 18.1
Uninstalling pip-18.1:
Successfully uninstalled pip-18.1
Successfully installed pip-19.0.3
Removing intermediate container dcf5b22f6f33
---> 949d89aabe2b
Step 9/9 : RUN pip install -r requirement
---> Running in 697c65e4161a
Collecting Django==2.1.7 (from -r requirement (line 1))
Downloading https://files.pythonhosted.org/packages/c7/87/fbd666c4f87591ae25b7bb374298e8629816e87193c4099d3608ef11fab9/Django-2.1.7-py3-none-any.whl (7.3MB)
Collecting EasyProcess==0.2.5 (from -r requirement (line 2))
Downloading https://files.pythonhosted.org/packages/45/3a/4eecc0c7995a13a64739bbedc0d3691fc574245b7e79cff81905aa0c2b38/EasyProcess-0.2.5.tar.gz
Collecting PyMySQL==0.9.3 (from -r requirement (line 3))
Downloading https://files.pythonhosted.org/packages/ed/39/15045ae46f2a123019aa968dfcba0396c161c20f855f11dea6796bcaae95/PyMySQL-0.9.3-py2.py3-none-any.whl (47kB)
Collecting pytz==2018.9 (from -r requirement (line 4))
Downloading https://files.pythonhosted.org/packages/61/28/1d3920e4d1d50b19bc5d24398a7cd85cc7b9a75a490570d5a30c57622d34/pytz-2018.9-py2.py3-none-any.whl (510kB)
Collecting selenium==3.141.0 (from -r requirement (line 5))
Downloading https://files.pythonhosted.org/packages/80/d6/4294f0b4bce4de0abf13e17190289f9d0613b0a44e5dd6a7f5ca98459853/selenium-3.141.0-py2.py3-none-any.whl (904kB)
Collecting urllib3==1.24.1 (from -r requirement (line 6))
Downloading https://files.pythonhosted.org/packages/62/00/ee1d7de624db8ba7090d1226aebefab96a2c71cd5cfa7629d6ad3f61b79e/urllib3-1.24.1-py2.py3-none-any.whl (118kB)
Building wheels for collected packages: EasyProcess
Building wheel for EasyProcess (setup.py): started
Building wheel for EasyProcess (setup.py): finished with status 'done'
Stored in directory: /root/.cache/pip/wheels/41/22/19/af15ef6264c58b625a82641ed7483ad05e258fbd8925505227
Successfully built EasyProcess
Installing collected packages: pytz, Django, EasyProcess, PyMySQL, urllib3, selenium
Successfully installed Django-2.1.7 EasyProcess-0.2.5 PyMySQL-0.9.3 pytz-2018.9 selenium-3.141.0 urllib3-1.24.1
Removing intermediate container 697c65e4161a
---> f59692faac28
Successfully built f59692faac28
Successfully tagged spidermax:test_01

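The image size can be checked with docker images, and docker history shows how the layers were built.
In the docker history output, several layers are listed as <missing>. These layers came with the base image pulled from the registry: their intermediate image IDs are not available locally, so nothing is actually lost. To speed up builds, Docker also caches every layer it builds. This feature can save a lot of time, but the conditions for a cache hit are rather strict:

  • The previous instruction (the parent layer) must itself be found in the cache
  • The cache must contain a layer created by exactly the same instruction as the one about to be built; even a difference in whitespace can cause a miss
  • For COPY and ADD, the cache is also invalidated if the checksum Docker computes over the referenced files has changed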
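
One practical consequence of the third rule can be sketched as follows, assuming the dependency list lives in a file named requirement as it does in this article. Copying that file on its own and installing from it before copying the rest of the source means that editing application code no longer invalidates the cached pip layer. This layout is a suggestion, not one of the builds measured here.

```dockerfile
FROM python:3.5.6-slim-stretch
WORKDIR /code

# Copy only the dependency list first. This layer and the pip layer below are
# reused from the cache as long as the requirement file itself is unchanged.
COPY requirement /code/requirement
RUN pip install -r requirement

# Copy the rest of the source last, so that code edits only invalidate
# the layers from this point on.
COPY . /code/
```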

Optimization 1

Next we merge instructions of the same kind, such as consecutive RUN commands:

FROM python:3.5.6-slim-stretch
RUN apt-get update &&\
    echo python -V &&\
    mkdir /code &&\
    mkdir /code/db
ADD . /code/
WORKDIR /code
RUN pip install -U pip &&\
    pip install -r requirement

The build log:

docker build -t spidermax:test_02 .
Sending build context to Docker daemon 289.8kB
Step 1/5 : FROM python:3.5.6-slim-stretch
---> 86669b8e5771
Step 2/5 : RUN apt-get update && echo python -V && mkdir /code && mkdir /code/db
---> Running in 2a879b85221f
Get:1 http://security.debian.org/debian-security stretch/updates InRelease [94.3 kB]
Get:2 http://security.debian.org/debian-security stretch/updates/main amd64 Packages [481 kB]
Ign:3 http://deb.debian.org/debian stretch InRelease
Get:4 http://deb.debian.org/debian stretch-updates InRelease [91.0 kB]
Get:5 http://deb.debian.org/debian stretch Release [118 kB]
Get:6 http://deb.debian.org/debian stretch-updates/main amd64 Packages [11.1 kB]
Get:7 http://deb.debian.org/debian stretch Release.gpg [2434 B]
Get:8 http://deb.debian.org/debian stretch/main amd64 Packages [7084 kB]
Fetched 7881 kB in 8s (966 kB/s)
Reading package lists...
python -V
Removing intermediate container 2a879b85221f
---> 5ce22d4611e9
Step 3/5 : ADD . /code/
---> af87134918ca
Step 4/5 : WORKDIR /code
---> Running in 2a5c72670be3
Removing intermediate container 2a5c72670be3
---> 0324df92356d
Step 5/5 : RUN pip install -U pip && pip install -r requirement
---> Running in 439cc1586076
Collecting pip
Downloading https://files.pythonhosted.org/packages/d8/f3/413bab4ff08e1fc4828dfc59996d721917df8e8583ea85385d51125dceff/pip-19.0.3-py2.py3-none-any.whl (1.4MB)
Installing collected packages: pip
Found existing installation: pip 18.1
Uninstalling pip-18.1:
Successfully uninstalled pip-18.1
Successfully installed pip-19.0.3
Collecting Django==2.1.7 (from -r requirement (line 1))
Downloading https://files.pythonhosted.org/packages/c7/87/fbd666c4f87591ae25b7bb374298e8629816e87193c4099d3608ef11fab9/Django-2.1.7-py3-none-any.whl (7.3MB)
Collecting EasyProcess==0.2.5 (from -r requirement (line 2))
Downloading https://files.pythonhosted.org/packages/45/3a/4eecc0c7995a13a64739bbedc0d3691fc574245b7e79cff81905aa0c2b38/EasyProcess-0.2.5.tar.gz
Collecting PyMySQL==0.9.3 (from -r requirement (line 3))
Downloading https://files.pythonhosted.org/packages/ed/39/15045ae46f2a123019aa968dfcba0396c161c20f855f11dea6796bcaae95/PyMySQL-0.9.3-py2.py3-none-any.whl (47kB)
Collecting pytz==2018.9 (from -r requirement (line 4))
Downloading https://files.pythonhosted.org/packages/61/28/1d3920e4d1d50b19bc5d24398a7cd85cc7b9a75a490570d5a30c57622d34/pytz-2018.9-py2.py3-none-any.whl (510kB)
Collecting selenium==3.141.0 (from -r requirement (line 5))
Downloading https://files.pythonhosted.org/packages/80/d6/4294f0b4bce4de0abf13e17190289f9d0613b0a44e5dd6a7f5ca98459853/selenium-3.141.0-py2.py3-none-any.whl (904kB)
Collecting urllib3==1.24.1 (from -r requirement (line 6))
Downloading https://files.pythonhosted.org/packages/62/00/ee1d7de624db8ba7090d1226aebefab96a2c71cd5cfa7629d6ad3f61b79e/urllib3-1.24.1-py2.py3-none-any.whl (118kB)
Building wheels for collected packages: EasyProcess
Building wheel for EasyProcess (setup.py): started
Building wheel for EasyProcess (setup.py): finished with status 'done'
Stored in directory: /root/.cache/pip/wheels/41/22/19/af15ef6264c58b625a82641ed7483ad05e258fbd8925505227
Successfully built EasyProcess
Installing collected packages: pytz, Django, EasyProcess, PyMySQL, urllib3, selenium
Successfully installed Django-2.1.7 EasyProcess-0.2.5 PyMySQL-0.9.3 pytz-2018.9 selenium-3.141.0 urllib3-1.24.1
Removing intermediate container 439cc1586076
---> d767a8a2d9f2
Successfully built d767a8a2d9f2
Successfully tagged spidermax:test_02

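The image size is practically the same as before, but the build has far fewer steps, which shortens the build time to some extent. The build logs show this: test_01 needed 9 steps, while this version needs only 5, nearly halving the number of steps.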

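Optimization 2

Our base image is python:3.5.6-slim-stretch, and every build stacks on top of it. That alone makes our image large, since the base already weighs about 135 MB. We should not reinvent the wheel, but this wheel costs more than we can afford.
Let us reconsider what we actually use from the base image: the Python interpreter, pip, and apt. Django and the other packages are installed from the requirement file, and many of the dependencies bundled with the base image go mostly unused (this is not a rigorous statement). So why not set up the Python environment ourselves? With a package manager at hand, there is little we cannot do.
The recommended base here is Alpine, which is astonishingly small: only about 4 MB.
"Alpine" means "of high mountains", as in Alpine plants, Alpine skiing, or an alpine resort.
The Alpine Linux home page reads "Small. Simple. Secure. Alpine Linux is a security-oriented, lightweight Linux distribution based on musl libc and busybox." This sums up the following characteristics: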

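  1. Small: built on musl libc and busybox, it is about as small as busybox itself; the smallest Docker image is only about 4 MB.
  2. Secure: a security-oriented, lightweight distribution.
  3. Simple: the apk package manager makes searching for, installing, removing, and upgrading software easy.
  4. Well suited to containers: small yet fully functional, it makes a very good base image.

The Dockerfile for the test_03 version:
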
    FROM alpine:3.7
    RUN apk update && apk upgrade &&\
        apk add python3 &&\
        apk add python3-dev &&\
        python3 -m ensurepip &&\
        if [ ! -e /usr/bin/pip ]; then ln -s pip3 /usr/bin/pip ; fi &&\
        if [[ ! -e /usr/bin/python ]]; then ln -sf /usr/bin/python3 /usr/bin/python; fi &&\
        mkdir /code && mkdir /code/db
    ADD . /code/
    WORKDIR /code
    RUN pip install -U pip && pip install -r requirement

Installing python3 provides only pip3 by default, so for convenience we link pip to pip3 and python to python3, which also makes the image easier to start. The build log:

    docker build -t spidermax:test_03 .
    Sending build context to Docker daemon 290.3kB
    Step 1/5 : FROM alpine:3.7
    ---> 6d1ef012b567
    Step 2/5 : RUN apk update && apk upgrade && apk add python3 && apk add python3-dev && python3 -m ensurepip && if [ ! -e /usr/bin/pip ]; then ln -s pip3 /usr/bin/pip ; fi && if [[ ! -e /usr/bin/python ]]; then ln -sf /usr/bin/python3 /usr/bin/python; fi && mkdir /code && mkdir /code/db
    ---> Running in ef033831c1a9
    fetch http://dl-cdn.alpinelinux.org/alpine/v3.7/main/x86_64/APKINDEX.tar.gz
    fetch http://dl-cdn.alpinelinux.org/alpine/v3.7/community/x86_64/APKINDEX.tar.gz
    v3.7.3-6-g3eacd5f9a6 [http://dl-cdn.alpinelinux.org/alpine/v3.7/main]
    v3.7.3-4-g7296a289a6 [http://dl-cdn.alpinelinux.org/alpine/v3.7/community]
    OK: 9049 distinct packages available
    OK: 4 MiB in 13 packages
    (1/11) Installing libbz2 (1.0.6-r6)
    (2/11) Installing expat (2.2.5-r0)
    (3/11) Installing libffi (3.2.1-r4)
    (4/11) Installing gdbm (1.13-r1)
    (5/11) Installing xz-libs (5.2.3-r1)
    (6/11) Installing ncurses-terminfo-base (6.0_p20171125-r1)
    (7/11) Installing ncurses-terminfo (6.0_p20171125-r1)
    (8/11) Installing ncurses-libs (6.0_p20171125-r1)
    (9/11) Installing readline (7.0.003-r0)
    (10/11) Installing sqlite-libs (3.25.3-r0)
    (11/11) Installing python3 (3.6.5-r0)
    Executing busybox-1.27.2-r11.trigger
    OK: 66 MiB in 24 packages
    (1/2) Installing pkgconf (1.3.10-r0)
    (2/2) Installing python3-dev (3.6.5-r0)
    Executing busybox-1.27.2-r11.trigger
    OK: 79 MiB in 26 packages
    Requirement already satisfied: setuptools in /usr/lib/python3.6/site-packages
    Requirement already satisfied: pip in /usr/lib/python3.6/site-packages
    Removing intermediate container ef033831c1a9
    ---> 4250724684eb
    Step 3/5 : ADD . /code/
    ---> 8f1a939f8ac1
    Step 4/5 : WORKDIR /code
    ---> Running in 137e7b42decb
    Removing intermediate container 137e7b42decb
    ---> 29586daada18
    Step 5/5 : RUN pip install -U pip && pip install -r requirement
    ---> Running in 36baec208714
    Collecting pip
    Downloading https://files.pythonhosted.org/packages/d8/f3/413bab4ff08e1fc4828dfc59996d721917df8e8583ea85385d51125dceff/pip-19.0.3-py2.py3-none-any.whl (1.4MB)
    Installing collected packages: pip
    Found existing installation: pip 9.0.3
    Uninstalling pip-9.0.3:
    Successfully uninstalled pip-9.0.3
    Successfully installed pip-19.0.3
    Collecting Django==2.1.7 (from -r requirement (line 1))
    Downloading https://files.pythonhosted.org/packages/c7/87/fbd666c4f87591ae25b7bb374298e8629816e87193c4099d3608ef11fab9/Django-2.1.7-py3-none-any.whl (7.3MB)
    Collecting EasyProcess==0.2.5 (from -r requirement (line 2))
    Downloading https://files.pythonhosted.org/packages/45/3a/4eecc0c7995a13a64739bbedc0d3691fc574245b7e79cff81905aa0c2b38/EasyProcess-0.2.5.tar.gz
    Collecting PyMySQL==0.9.3 (from -r requirement (line 3))
    Downloading https://files.pythonhosted.org/packages/ed/39/15045ae46f2a123019aa968dfcba0396c161c20f855f11dea6796bcaae95/PyMySQL-0.9.3-py2.py3-none-any.whl (47kB)
    Collecting pytz==2018.9 (from -r requirement (line 4))
    Downloading https://files.pythonhosted.org/packages/61/28/1d3920e4d1d50b19bc5d24398a7cd85cc7b9a75a490570d5a30c57622d34/pytz-2018.9-py2.py3-none-any.whl (510kB)
    Collecting selenium==3.141.0 (from -r requirement (line 5))
    Downloading https://files.pythonhosted.org/packages/80/d6/4294f0b4bce4de0abf13e17190289f9d0613b0a44e5dd6a7f5ca98459853/selenium-3.141.0-py2.py3-none-any.whl (904kB)
    Collecting urllib3==1.24.1 (from -r requirement (line 6))
    Downloading https://files.pythonhosted.org/packages/62/00/ee1d7de624db8ba7090d1226aebefab96a2c71cd5cfa7629d6ad3f61b79e/urllib3-1.24.1-py2.py3-none-any.whl (118kB)
    Installing collected packages: pytz, Django, EasyProcess, PyMySQL, urllib3, selenium
    Running setup.py install for EasyProcess: started
    Running setup.py install for EasyProcess: finished with status 'done'
    Successfully installed Django-2.1.7 EasyProcess-0.2.5 PyMySQL-0.9.3 pytz-2018.9 selenium-3.141.0 urllib3-1.24.1
    Removing intermediate container 36baec208714
    ---> e59b7105697a
    Successfully built e59b7105697a
    Successfully tagged spidermax:test_03

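Checking the image size shows that it is now roughly half the size of the original.

Is that good enough? Not yet. For an engineer, optimization never stops.

Optimization 3

Every build downloads packages from remote repositories and installs them. What happens to the downloaded files afterwards? They remain in the image as cache. From the application's point of view this cache serves little purpose and only takes up storage, so it should be removed. With apk this is easy: pass --no-cache.
The modified Dockerfile: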

FROM alpine:3.7
RUN apk update --no-cache && apk upgrade --update-cache --available &&\
    apk add --no-cache python3 &&\
    apk add --no-cache python3-dev &&\
    python3 -m ensurepip &&\
    if [ ! -e /usr/bin/pip ]; then ln -s pip3 /usr/bin/pip ; fi &&\
    if [[ ! -e /usr/bin/python ]]; then ln -sf /usr/bin/python3 /usr/bin/python; fi &&\
    rm -rf /var/cache/apk/* &&\
    mkdir /code &&  mkdir /code/db
ADD . /code/
WORKDIR /code
RUN pip install -U pip && pip install -r requirement

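The downside is that the packages have to be downloaded again every time this layer is rebuilt. Image size and build efficiency pull against each other here, and the right balance is a trade-off.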
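
The same tension exists for pip downloads. One way to soften it is sketched below, under the assumption that the image is built with BuildKit, which the experiments in this article did not use. A cache mount keeps pip's download cache on the build host between builds without writing it into any image layer.

```dockerfile
# syntax=docker/dockerfile:1
# Assumes BuildKit; this variant was not measured in this article.
FROM alpine:3.7
RUN apk add --no-cache python3 && \
    python3 -m ensurepip
WORKDIR /code
COPY requirement /code/requirement
# /root/.cache/pip is backed by a BuildKit cache mount: it persists across
# builds on the build host but is not stored in the resulting image.
RUN --mount=type=cache,target=/root/.cache/pip \
    pip3 install -r requirement
```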
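Since we can remove apk's cache, we can remove pip's as well: delete pip's bundled installer (the ensurepip directory) once pip is installed, and delete pip's download cache after installing the requirements. The Dockerfile: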

FROM alpine:3.7
RUN apk update && apk upgrade --update-cache --available &&\
    apk add python3 &&\
    apk add python3-dev &&\
    python3 -m ensurepip &&\
    rm -r /usr/lib/python*/ensurepip &&\
    if [ ! -e /usr/bin/pip ]; then ln -s pip3 /usr/bin/pip ; fi &&\
    if [[ ! -e /usr/bin/python ]]; then ln -sf /usr/bin/python3 /usr/bin/python; fi &&\
    rm -rf /var/cache/apk/* &&\
    mkdir /code && mkdir /code/db
ADD . /code/
WORKDIR /code
RUN pip install -U pip && pip install -r requirement && rm -rf ~/.cache/pip
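
As an alternative to deleting ~/.cache/pip afterwards, pip's --no-cache-dir option avoids writing the cache in the first place. A minimal sketch, reusing the alpine:3.7 base and the requirement file from above; python3-dev is left out on the assumption that none of the listed packages compiles C extensions.

```dockerfile
FROM alpine:3.7
RUN apk add --no-cache python3 && \
    python3 -m ensurepip && \
    ln -sf /usr/bin/pip3 /usr/bin/pip
COPY . /code/
WORKDIR /code
# --no-cache-dir tells pip not to keep downloaded or built packages,
# so there is nothing left to clean up in this layer.
RUN pip install --no-cache-dir -U pip && \
    pip install --no-cache-dir -r requirement
```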

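After these rounds of optimization, the image that started at 203 MB has shrunk to roughly half its size, and the builds got faster along the way. That is what optimization is about: improving on the previous result, again and again.

Final version

Finally, we add maintainer information to the Dockerfile. It states that I provide this image and can be contacted if something goes wrong, and that I hold the copyright to it, even though the image has been open-sourced.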

FROM alpine:3.7
MAINTAINER yerikyu "yerik_shu@139.com"
RUN apk update && apk upgrade --update-cache --available &&\
    apk add python3 &&\
    apk add python3-dev &&\
    python3 -m ensurepip &&\
    rm -r /usr/lib/python*/ensurepip &&\
    if [ ! -e /usr/bin/pip ]; then ln -s pip3 /usr/bin/pip ; fi &&\
    if [[ ! -e /usr/bin/python ]]; then ln -sf /usr/bin/python3 /usr/bin/python; fi &&\
    rm -rf /var/cache/apk/* &&\
    mkdir /code && mkdir /code/db
ADD . /code/
WORKDIR /code
RUN pip install -U pip && pip install -r requirement && rm -rf ~/.cache/pip
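
Note that the MAINTAINER instruction is deprecated in current Docker releases in favor of LABEL. A minimal sketch of equivalent metadata; the key name maintainer is a common convention, not a requirement.

```dockerfile
FROM alpine:3.7
# LABEL attaches arbitrary key/value metadata to the image; it can be read
# back later with docker inspect.
LABEL maintainer="yerikyu <yerik_shu@139.com>"
```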


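Summary

  • Write a .dockerignore file (it keeps unneeded files such as caches out of the build context, much like .gitignore)
  • Run a single application per container (in the spirit of microservices)
  • Merge multiple RUN instructions into one
  • Do not use the latest tag for the base image
  • Delete unneeded files at the end of each RUN instruction
  • Choose a suitable base image (preferably an Alpine variant)
  • Set WORKDIR and CMD
  • Use ENTRYPOINT (optional)
  • Use exec in the entrypoint script (not covered in this experiment)
  • Prefer COPY over ADD
  • Order COPY and RUN sensibly
  • Set default environment variables, exposed ports, and volumes (I set these in docker-compose; to be discussed another time)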
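
Several of these points can be combined in one Dockerfile. The sketch below is an illustration rather than the image built in this article: the port, the environment variable, and the start command (a Django manage.py at the project root) are hypothetical placeholders, and python3-dev is omitted on the assumption that no dependency needs to compile C extensions.

```dockerfile
# Pinned base tag instead of latest.
FROM alpine:3.7

# One RUN for the system setup; --no-cache keeps the apk index out of the layer.
RUN apk add --no-cache python3 && \
    python3 -m ensurepip && \
    rm -r /usr/lib/python*/ensurepip && \
    ln -sf /usr/bin/python3 /usr/bin/python && \
    ln -sf /usr/bin/pip3 /usr/bin/pip

WORKDIR /code

# Dependencies before source code, so code edits keep the pip layer cached.
COPY requirement /code/requirement
RUN pip install --no-cache-dir -r requirement

# COPY rather than ADD: no archive extraction or URL fetching is needed.
COPY . /code/

# Hypothetical defaults: the variable, port, and start command are
# placeholders, not taken from the original project.
ENV PYTHONUNBUFFERED=1
EXPOSE 8000
CMD ["python", "manage.py", "runserver", "0.0.0.0:8000"]
```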

References

Docker image layers: http://blog.daocloud.io/principle-of-docker-image/
Alpine official site: https://alpinelinux.org/