Speeding up your local Docker builds
Every day I build containers locally. Many of us do: Docker is a crucial tool in the developer arsenal these days. Building and maintaining 1–2 containers is fine, but what about an app built from 10 containers? 15? 20? Each local build, even with the cache enabled, takes a significant amount of time. Multiply that by the number of daily rebuilds, and you'll see hours of your precious time leaking away!
Sure, there are CI/CD tools capable of parallel building, but what about local builds? I spent some time googling for a solution to this small problem and, to my surprise, wasn't able to find one.
So I thought I could make a useful tool for myself, plus a blog post for others :)
Step 0. Let’s define the problem and requirements.
I need an app, I’ll call it Krane, that is able to build multiple Docker images in parallel. There are a few special requirements for this app:
- Build configuration must be persistent. I'd hate describing 10–20–30 images via the CLI, so a JSON or YAML build configuration would be good.
- Krane should take care of internal dependencies: if image2 depends on image1, it will guarantee that image2 will be built only after image1 is built.
- Krane should be aware of build outcome: if one of the images fails the build — all remaining images should fail too.
- Krane should be 100% compatible with Minikube, which I use for local development. However, this requirement will be satisfied automatically if I just use Docker executable.
Since this app needs concurrent execution, Python wouldn't be my first choice of tool. Golang will fit way better, I believe.
Step 1. Build configuration in a file.
My new tool should be able to read the build configuration from a file passed as an argument to the app. Something like app -f configFile.yml.
Golang provides JSON support out of the box, without any external dependencies. However, since the Docker/Kubernetes ecosystem is YAML-centric, it makes sense to use YAML for the build configuration as well.
Golang has no issues with YAML either: there are multiple libraries providing YAML support. I prefer this one.
type Image struct {
	ContainerName string `yaml:"containerName"`
	Dockerpath    string `yaml:"dockerpath"`
	ForbidCache   bool   `yaml:"noCache"`
}

type BuildConfiguration struct {
	Images  []Image `yaml:"build"`
	Threads int     `yaml:"threads"`
}
/*
ParseBytes provides YAML deserialization of the given byte slice.
*/
func ParseBytes(conf []byte) (bc BuildConfiguration, err error) {
	err = yaml.Unmarshal(conf, &bc)
	if err == nil {
		SortImages(&bc)
	}
	return
}
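For reference, here's what a configuration file matching those struct tags could look like (the image names and paths below are invented for illustration):

```yaml
# build two images on two worker threads
threads: 2
build:
  - containerName: backend
    dockerpath: ./services/backend
    noCache: false
  - containerName: frontend
    dockerpath: ./services/frontend
    noCache: true
```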
Now that I'm able to read arbitrary YAML files, it's time to make sure I can pass the configuration file as a CLI argument.
Golang has a built-in package for that as well: flag. Defining all the needed input arguments is really trivial:
func main() {
	var configFile string
	var dryRun bool
	// parse configuration flags from the command line
	flag.StringVar(&configFile, "f", "", "Path to build configuration file")
	flag.BoolVar(&dryRun, "d", false, "Don't run docker, only build and print sorted map")
	flag.Parse()
	// exit if something is off
	_ = ValidatePath(configFile, true)
	// get the configuration
	buildConfiguration, err := ParseFile(configFile)
	if err != nil {
		fmt.Printf("%v\n", err.Error())
		os.Exit(1)
	}
Now, something like krane -f config.yml is definitely going to work :)
Step 2. Find dependencies within the build task.
When you're building a bunch of independent containers, dependency tracking is not a problem, by definition. But if some images depend on other images within the task, those dependencies must be built before the images that depend on them.
In other words: I have to search for internal dependencies before building anything. Thanks to the Docker developers, the Dockerfile format is pretty straightforward: there's a dedicated FROM keyword, so good old regular expressions will do the job.
/*
findDockerDependencies scans a Dockerfile, given as a string of commands,
and extracts the names of the images it depends on.
*/
func findDockerDependencies(dockerfile string) (deps []string, err error) {
	// match every FROM instruction at the start of a line
	// and capture the image reference that follows it
	re := regexp.MustCompile(`(?im)^FROM\s+(\S+)`)
	substrings := re.FindAllStringSubmatch(dockerfile, -1)
	for _, v := range substrings {
		// v[0] is the full match, v[1] is the captured image name
		dep := v[1]
		if !strings.Contains(dep, ":") {
			// if no tag is given, assume the latest tag
			dep += ":latest"
		}
		deps = append(deps, dep)
	}
	if len(deps) == 0 {
		err = fmt.Errorf("no docker dependencies found. wrong Dockerfile passed in?")
	}
	return
}
When applied to every Dockerfile in the job, this gets me a full map of dependencies, where the key is a container name and the value is a slice of the containers it depends on.
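To sanity-check the extraction, here's a small self-contained version of the same scan run against a two-stage Dockerfile (the helper name is mine, not from the Krane source):

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// extractBaseImages pulls every FROM image out of a Dockerfile
// and appends :latest where no tag was given.
func extractBaseImages(dockerfile string) []string {
	re := regexp.MustCompile(`(?im)^FROM\s+(\S+)`)
	var deps []string
	for _, m := range re.FindAllStringSubmatch(dockerfile, -1) {
		dep := m[1]
		if !strings.Contains(dep, ":") {
			dep += ":latest"
		}
		deps = append(deps, dep)
	}
	return deps
}

func main() {
	dockerfile := "FROM golang:1.16 AS builder\n" +
		"RUN go build -o /krane .\n" +
		"FROM alpine\n" +
		"COPY --from=builder /krane /krane"
	fmt.Println(extractBaseImages(dockerfile)) // prints [golang:1.16 alpine:latest]
}
```

Note that anchoring on the start of the line keeps the COPY --from=builder reference from being picked up as a dependency.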
Step 3. Organizing the build process.
The map of dependencies is good, but how can I use it to organize the build process? One of the simplest ways is to represent the build as a sequence of sequences of independent build steps. Basically, a topological sort of the graph, except the outcome isn't a 1D sequence but a 2D one, to allow parallelism.
It might sound tough, but it’s really trivial. Imagine the following algorithm:
- All independent containers are built first, in parallel. Let’s call it “Layer”.
- Containers that depend on the previous layer are built in parallel.
The last step is repeated until all containers are built.
With this approach, the executor dispatches the individual build jobs of each layer to separate goroutines. Once all jobs are dispatched, the executor waits until they are all finished before switching to the next layer.
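The algorithm above can be sketched as a repeated "peel off everything that's ready" pass over the dependency map (the function and variable names here are my own, not Krane's):

```go
package main

import "fmt"

// buildLayers splits images into layers: layer 0 holds images with no
// unbuilt internal dependencies, layer 1 holds images depending only on
// layer 0, and so on. Dependencies not present in the map (e.g. images
// pulled from a registry) are treated as already available.
func buildLayers(deps map[string][]string) ([][]string, error) {
	built := map[string]bool{}
	var layers [][]string
	for len(built) < len(deps) {
		var layer []string
		for img, reqs := range deps {
			if built[img] {
				continue
			}
			ready := true
			for _, r := range reqs {
				// only dependencies that are part of this build task matter
				if _, internal := deps[r]; internal && !built[r] {
					ready = false
					break
				}
			}
			if ready {
				layer = append(layer, img)
			}
		}
		if len(layer) == 0 {
			return nil, fmt.Errorf("circular dependency detected")
		}
		for _, img := range layer {
			built[img] = true
		}
		layers = append(layers, layer)
	}
	return layers, nil
}

func main() {
	deps := map[string][]string{
		"base:latest":    {"golang:1.16"}, // external dependency only
		"backend:latest": {"base:latest"},
		"worker:latest":  {"base:latest"},
	}
	layers, _ := buildLayers(deps)
	fmt.Println(len(layers)) // prints 2: [base], then [backend worker]
}
```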
Step 4. Handling the outcome.
The last requirement I have is build state handling and transfer: if one of the jobs fails, it shouldn't be silently swallowed. I must be aware of problems as soon as they arise: there's no sense waiting for the full build to finish if one of the jobs has already failed. So early stopping would be a really-nice-to-have feature.
Luckily, Golang has channels for communication between goroutines, so each worker gets one channel for reading build jobs and another for reporting. The reporting channel will be used to track the outcome of each build.
// storage for the reports
var failed []Report
var succeed []Report
// dispatch all jobs one by one
jobsCounter := 0
for i := 0; i < len(executableMap); i++ {
	// each layer is a slice of images
	layer := executableMap[i]
	dispatched := 0
	for _, image := range layer {
		// round-robin the jobs across the worker channels
		workers[jobsCounter%config.Threads] <- image
		jobsCounter++
		dispatched++
	}
	// now that all jobs on this layer are dispatched, wait for them to finish
	for j := 0; j < dispatched; j++ {
		report := <-requeue
		if report.Success {
			succeed = append(succeed, report)
		} else {
			failed = append(failed, report)
		}
	}
	// early stopping: don't start the next layer if anything failed
	if len(failed) > 0 {
		return fmt.Errorf("at least %v out of %v jobs failed", len(failed), len(config.Images))
	}
}
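For completeness, here's a minimal self-contained sketch of the worker side of this pattern: a fixed pool reads image names from a jobs channel and pushes a Report for each one. The actual docker build call is faked, and the names below are assumptions rather than the real Krane source:

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// Report mirrors the report struct used above (field names are assumptions).
type Report struct {
	Image   string
	Success bool
}

// runPool fans images out to a fixed pool of workers and counts failures.
// Instead of shelling out to `docker build`, any image whose name starts
// with "broken" is reported as failed.
func runPool(images []string, workerCount int) (failed int) {
	jobs := make(chan string)
	reports := make(chan Report)
	var wg sync.WaitGroup
	for w := 0; w < workerCount; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for img := range jobs {
				reports <- Report{Image: img, Success: !strings.HasPrefix(img, "broken")}
			}
		}()
	}
	go func() {
		// dispatch everything, then close the channels once workers drain
		for _, img := range images {
			jobs <- img
		}
		close(jobs)
		wg.Wait()
		close(reports)
	}()
	for r := range reports {
		if !r.Success {
			failed++
		}
	}
	return failed
}

func main() {
	fmt.Println(runPool([]string{"frontend", "backend", "broken-ml"}, 2)) // prints 1
}
```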
Final step. Comparing the apples.
It's time to see the numbers. For the performance test, I've put together a fairly realistic sample deployment: 4 containers building React apps (the frontend part), 2 containers building Go apps (the backend part), and an ML-deployment container (an almost static one). I'll compare build times twice: the first run with the no-cache option, and the second run without it.
no-cache sequential build time:
real 7m15,544s
user 0m5,941s
sys 0m7,696s
no-cache parallel build time:
real 2m46,595s
user 0m5,800s
sys 0m8,451s
partially cached sequential build time:
real 2m57,410s
user 0m6,304s
sys 0m7,320s
partially cached parallel build time:
real 0m17,323s
user 0m6,111s
sys 0m7,808s
So the relative speedup is somewhere between 2.5x and 10x, which is just great for me: my typical builds are partially cached. I'll save a lot of time using this small tool.
I hope you’ll find it useful too.
Feel free to contact me if you have any questions :) As usual, the source code for this app is available on GitHub.
Translated from: https://medium.com/the-innovation/speeding-up-your-local-docker-builds-9b670a114c4c