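This article shows how to control and monitor Docker containers with Python.
Control your Docker containers with Python! This is a getting-started guide, built around a benchmark that compares three popular packages for working with tabular data: pandas, Polars, and Spark.
[Git Repo]: https://github.com/martinkarlssonio/polars-pandas-spark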
1. Code Demo
The figure below shows the overall idea of the code: main.py builds the image and then starts, controls, and monitors 12 Docker containers.
2. Controlling Your Images and Containers
The core of this solution is the Docker SDK for Python. We list the docker package in requirements.txt (installed with pip):
```
docker==6.0.1
```
In main.py, we import docker and create a Docker client:
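```python
import docker

# Connects to the local Docker daemon through the platform's default socket
# (e.g. /var/run/docker.sock on Linux). docker.from_env() is an alternative
# that also honors DOCKER_HOST and related environment variables.
dockerClient = docker.DockerClient()

# The functions below also use module-level settings defined elsewhere in main.py:
# imageName, calcN, containerStatsName and cardinalities.
```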
3. Getting the List of Containers
Get a list of all containers, including stopped ones. The code also keeps only the containers created from the image named imageName.
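```python
def getContainers():
    containersReturn = []
    # all=True includes stopped containers, not only running ones
    containers = dockerClient.containers.list(all=True)
    for container in containers:
        # str(container.image) looks like "<Image: 'name:tag'>"
        if imageName in str(container.image):
            containersReturn.append(container)
    return containersReturn
```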
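Matching with `imageName in str(container.image)` is a substring test on the image's string form, so an image named `test` would also match `test-old`. A stricter variant is sketched below, assuming you compare against docker-py's `Image.tags` list (entries of the form `"repository:tag"`) and want an exact repository match. The helper name `matches_image` is hypothetical, not part of the original code.

```python
def matches_image(tags, image_name):
    """Return True if any "repository:tag" entry in `tags` has exactly `image_name` as its repository.

    `tags` is assumed to look like docker-py's Image.tags, e.g. ["myimage:latest"].
    """
    for tag in tags:
        # rpartition splits on the LAST ":" so a registry port such as
        # "localhost:5000/myimage:v1" stays part of the repository name
        repository, _, _ = tag.rpartition(":")
        if repository == image_name:
            return True
    return False
```

Inside getContainers, the condition would then become `if matches_image(container.image.tags, imageName):`.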
4. Removing Containers
Remove the containers in that list (it is always good practice to remove old containers before starting a new test run).
```python
def removeContainers():
    # getContainers() already keeps only containers built from imageName
    for container in getContainers():
        print("################################################### Deleting old container {}".format(container.name))
        try:
            container.stop()    # stop the container if it is still running
            container.remove()  # then delete it
        except Exception as e:
            print("################################################### Error deleting old container {}".format(container.name))
            print(e)
```
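An alternative, sketched below under the assumption that a hard kill is acceptable for leftover test containers, is docker-py's `Container.remove(force=True)`, which removes a running container in a single call (Docker kills it instead of stopping it gracefully, so you skip the graceful-stop timeout). The helper name `remove_containers` and its return value, the names of containers that could not be removed, are illustrative additions.

```python
def remove_containers(containers):
    """Force-remove each container; return the names of those that could not be removed.

    Each item is assumed to behave like a docker-py Container: it has a `name`
    attribute and a `remove(force=True)` method.
    """
    failed = []
    for container in containers:
        try:
            # force=True also removes a running container (it is killed first)
            container.remove(force=True)
        except Exception as e:  # e.g. docker.errors.APIError
            print("Error deleting old container {}: {}".format(container.name, e))
            failed.append(container.name)
    return failed
```

It would be called as `remove_containers(getContainers())`.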
5. Building the Image and Starting a Container
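Start a new container from the existing image, or build the image first if it does not exist yet. When starting the container, we also pass in a set of environment variables.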
```python
def runContainer(testType, dataframeN):
    # Settings read by test.py inside the container (calcN is a module-level setting)
    environment = {"TEST_TYPE": testType, "DATAFRAME_N": dataframeN, "CALC_N": calcN}
    images = dockerClient.images.list(all=True)
    if imageName in ' '.join(map(str, images)):
        print("################################################### Image exists, starting container..")
    else:
        print("################################################### Image doesn't exist, need to create it!")
        # Build from the Dockerfile in the current directory; the tag defaults to "latest"
        dockerClient.images.build(path="./", tag=imageName)
    # Without detach=True, run() blocks until the container exits and returns its logs
    dockerClient.containers.run(imageName + ":latest", environment=environment)
```
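Environment variables always arrive in the container as strings, so numeric settings such as `DATAFRAME_N` and `CALC_N` have to be converted back on the container side. The sketch below shows both ends of that round trip. The helper names `build_environment` and `read_settings` are hypothetical, and the sketch assumes test.py reads its settings from `os.environ`, which this article does not show.

```python
import os


def build_environment(testType, dataframeN, calcN):
    """Host side: the dict passed as environment= to containers.run(), with every value as a string."""
    return {"TEST_TYPE": str(testType), "DATAFRAME_N": str(dataframeN), "CALC_N": str(calcN)}


def read_settings(environ=os.environ):
    """Container side (hypothetical test.py): parse the variables back into Python types."""
    return {
        "testType": environ["TEST_TYPE"],
        "dataframeN": int(environ["DATAFRAME_N"]),
        "calcN": int(environ["CALC_N"]),
    }
```

Converting explicitly on the host side makes it obvious that only strings cross the container boundary.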
The Dockerfile used to build the image is shown below.
```dockerfile
# Use Ubuntu as the base image
FROM ubuntu:focal
## UPDATE
RUN apt-get update -y
RUN apt-get install -y nano
## PYTHON
RUN apt-get install -y python3-pip python3-dev
RUN pip3 install --upgrade pip
COPY requirements-container.txt /requirements.txt
RUN pip3 install -r requirements.txt
## JAVA (needed by Spark); tzdata is installed non-interactively so the build does not stop at a timezone prompt
RUN DEBIAN_FRONTEND=noninteractive TZ=Etc/UTC apt-get -y install tzdata
RUN apt-get install -y openjdk-8-jdk
## COPY FILES
COPY test.py /test.py
CMD python3 /test.py
```
6. Monitoring Container Metrics
Here is a function that monitors CPU, memory, and execution time using Docker's stats API (the same data the docker stats command shows).
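The CPU usage calculation needs a bit of extra explanation. It takes the change in the container's CPU usage since the previous reading, divides it by the change in the host's total (system) CPU usage over the same interval, and multiplies the result by the number of CPU cores.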
In pseudocode:
```text
cpuDelta    = Total CPU Usage (now)  - Total CPU Usage (last read)
systemDelta = System CPU Usage (now) - System CPU Usage (last read)
cpuPercent  = (cpuDelta / systemDelta) * CPU Cores * 100
```
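The formula can be written as a small pure function over one stats snapshot, which makes it testable without a running container. This is a sketch rather than part of main.py: the name `cpu_percent` is hypothetical. The zero-delta guard and the `percpu_usage` fallback for `online_cpus` follow how the docker stats CLI handles these cases; the check for a missing previous reading is an extra assumption.

```python
def cpu_percent(stats):
    """CPU % from one container.stats(stream=False) snapshot; 100 means one fully used core."""
    cpu = stats["cpu_stats"]
    pre = stats["precpu_stats"]
    if "system_cpu_usage" not in pre:  # no previous reading to compare against yet
        return 0.0
    # cpuDelta: container CPU time used since the last read
    cpuDelta = cpu["cpu_usage"]["total_usage"] - pre["cpu_usage"]["total_usage"]
    # systemDelta: host CPU time (all cores) elapsed since the last read
    systemDelta = cpu["system_cpu_usage"] - pre["system_cpu_usage"]
    if cpuDelta <= 0 or systemDelta <= 0:
        return 0.0
    # online_cpus may be missing on older APIs; percpu_usage may be missing too
    # (e.g. on cgroup v2 hosts), in which case this falls back to 1 core
    cores = cpu.get("online_cpus") or len(cpu["cpu_usage"].get("percpu_usage") or [None])
    return (cpuDelta / systemDelta) * cores * 100
```

With a 4-core host, a container that used 200 units of CPU time while the host as a whole used 1000 comes out at 80%.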
Below is the complete function that records the container's CPU usage, memory usage, and execution time.
```python
import json
import time


def dockerLog(event, testType, dataframeN, cardinality):
    cpuLog = []
    memLog = []
    print("################################################### dockerLog started")
    startFlag = True
    startEpoch = None
    while True:
        try:
            containers = getContainers()
            for container in containers:
                # stream=False returns one snapshot containing both the current
                # reading (cpu_stats) and the previous one (precpu_stats)
                status = container.stats(decode=None, stream=False)
                try:
                    # Change in the container's CPU usage between readings,
                    # scaled by the number of CPU cores
                    cpuDelta = status["cpu_stats"]["cpu_usage"]["total_usage"] - status["precpu_stats"]["cpu_usage"]["total_usage"]
                    systemDelta = status["cpu_stats"]["system_cpu_usage"] - status["precpu_stats"]["system_cpu_usage"]
                    cpuPercent = int((cpuDelta / systemDelta) * status["cpu_stats"]["online_cpus"] * 100)
                    # Memory consumption of the container, in MB
                    mem = int(status["memory_stats"]["usage"] / 1000000)
                    # Don't log the CPU increase while the container starts up;
                    # start logging at the first idle reading, before the test code runs
                    if startFlag and cpuPercent == 0:
                        startFlag = False
                        startEpoch = int(time.time() * 1000)
                        print("Startflag set to False - let's go!")
                    if not startFlag:
                        cpuLog.append(cpuPercent)
                        memLog.append(mem)
                except Exception:
                    # Incomplete stats, e.g. the container is starting or has exited
                    break
        except Exception as e:
            print("Error: " + str(e))
        if event.is_set():
            break
    # Write the logs to a file
    endEpoch = int(time.time() * 1000)
    if startEpoch is None:
        # Logging never started; record 0 s instead of failing with a NameError
        startEpoch = endEpoch
    # containerStatsName is a module-level setting; visMemCpu() looks for "containerStats.json" in file names
    with open("output/" + str(dataframeN) + "_" + str(cardinality) + "_" + testType + "_" + containerStatsName, "w") as f:
        json.dump({"mem": memLog, "cpu": cpuLog, "timeSpent": float((endEpoch - startEpoch) / 1000)}, f)
    print("################################################### dockerLog ended")
```
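The `startFlag` rule in dockerLog (ignore everything until the first reading at 0% CPU, then log that reading and all later ones) is easier to check as a pure function over samples that were already collected. The sketch below applies the same rule offline. `trim_startup` is a hypothetical helper, not part of the repository, and it assumes the CPU and memory lists were sampled in lockstep, as dockerLog does.

```python
def trim_startup(cpu_samples, mem_samples):
    """Drop samples taken before the first idle (0%) CPU reading, mirroring dockerLog's startFlag."""
    for i, cpu in enumerate(cpu_samples):
        if cpu == 0:
            # The idle reading itself is kept, just as dockerLog logs it
            return cpu_samples[i:], mem_samples[i:]
    # Never idle: dockerLog would not have logged anything either
    return [], []
```

Keeping the raw samples and trimming afterwards would let you experiment with a different start condition without rerunning the containers.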
7. Running the Logger in a Separate Thread
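We want to run the dockerLog function in a separate thread, so we start it with the threading package. That way the logger does not block the rest of the code, and the container can be started while the logger is already running.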
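```python
from threading import Thread, Event

# The event tells dockerLog when to stop polling
event = Event()
# dockerLog takes four positional arguments, so cardinality must be passed too
# (testType, dataframeN and cardinality are the current test parameters)
dockerLogThread = Thread(target=dockerLog, args=(event, testType, dataframeN, cardinality))
dockerLogThread.start()
```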
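Next, we run the container. When runContainer returns (the container has finished), we set the event and wait for the logging thread to write the statistics to a .json file, which completes the run.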
```python
runContainer(testType, dataframeN)
event.set()
dockerLogThread.join()
```
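The start, work, set, join sequence can be sketched without Docker, which makes the threading pattern easier to test. Here `work_fn` stands in for runContainer and `sample_fn` for one stats reading; those names, along with `poll_until` and `run_with_monitor`, are hypothetical. Two changes from the original are deliberate: `event.wait(interval)` paces the loop instead of spinning when there is nothing to sample, and `try/finally` guarantees the monitor thread stops even if the work raises (docker-py's `containers.run()` raises `ContainerError` when the container exits with a non-zero code and `detach` is False).

```python
import threading
import time


def poll_until(event, sample_fn, interval=0.1):
    """Call sample_fn() until event is set; like dockerLog, always takes at least one sample."""
    samples = []
    while True:
        samples.append(sample_fn())
        # Sleeps up to `interval`; returns True as soon as the event is set
        if event.wait(interval):
            break
    return samples


def run_with_monitor(work_fn, sample_fn, interval=0.1):
    """Run work_fn() in the calling thread while a background thread polls sample_fn()."""
    event = threading.Event()
    result = {}
    monitor = threading.Thread(
        target=lambda: result.update(samples=poll_until(event, sample_fn, interval))
    )
    monitor.start()
    try:
        work_fn()  # blocks, like containers.run() without detach=True
    finally:
        event.set()     # always stop the monitor, even if work_fn raised
        monitor.join()  # wait until the samples are complete
    return result["samples"]
```

For example, `run_with_monitor(lambda: time.sleep(0.3), time.time, 0.05)` collects timestamps while the main thread sleeps.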
8. Visualizing the Metrics
Next, we visualize the Docker logs with the matplotlib package!
```python
import json
from os import listdir
from os.path import isfile, join

import matplotlib.pyplot as plt


## Visualize memory and CPU
def visMemCpu():
    # Gather every stats file written by dockerLog into one nested dict:
    # combDict[rows][lib][cardinality] = {"mem": [...], "cpu": [...], "timeSpent": seconds}
    outputFiles = [f for f in listdir("output/") if isfile(join("output/", f))]
    combDict = {}
    for file in outputFiles:
        if "containerStats.json" in file:
            with open("output/" + file, "r") as fp:
                data = json.load(fp)
            # File names follow <rows>_<cardinality>_<lib>_<containerStatsName>
            fileSplit = file.split("_")
            rows = str(fileSplit[0])
            cardinality = str(fileSplit[1])
            lib = fileSplit[2]
            combDict.setdefault(rows, {}).setdefault(lib, {})[cardinality] = {
                "mem": data["mem"],
                "cpu": data["cpu"],
                "timeSpent": int(data["timeSpent"]),
            }
    with open("output/all.json", "w") as f:
        json.dump(combDict, f)
    # One color per library
    colorDict = {"pandas": "tab:blue", "polars": "tab:orange", "spark": "tab:green"}
    # cardinalities is a module-level setting (the list of cardinalities tested)
    for cardinality in cardinalities:
        cardinality = str(cardinality)
        for rows in combDict.keys():
            # Memory plot; the title lists each library's execution time
            textString = " | "
            plt.clf()
            for lib in combDict[rows].keys():
                plt.plot(combDict[rows][lib][cardinality]["mem"], label=lib, color=colorDict[lib])
                textString += lib + " " + str(combDict[rows][lib][cardinality]["timeSpent"]) + "s | "
            plt.xticks([], [])  # the x axis is the sample index, so its ticks are hidden
            plt.ylabel("Memory MB")
            plt.title(textString)
            plt.xlabel("Cardinality : " + cardinality + " Dataframe rows : " + rows)
            plt.legend(framealpha=1, frameon=True)
            plt.savefig("output/{}_{}_mem.png".format(rows, cardinality), dpi=300)
            # CPU plot with the same title
            plt.clf()
            for lib in combDict[rows].keys():
                plt.plot(combDict[rows][lib][cardinality]["cpu"], label=lib, color=colorDict[lib])
            plt.xticks([], [])
            plt.ylabel("CPU %")
            plt.xlabel("Cardinality : " + cardinality + " Dataframe rows : " + rows)
            plt.title(textString)
            plt.legend(framealpha=1, frameon=True)
            plt.savefig("output/{}_{}_cpu.png".format(rows, cardinality), dpi=300)
```
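As a numeric companion to the plots, a short helper can reduce one rows/cardinality combination to peak memory, mean CPU, and execution time per library. This is an illustrative sketch: `summarize` and its output keys are hypothetical, and it assumes the nested `combDict[rows][lib][cardinality]` layout that visMemCpu writes to output/all.json.

```python
def summarize(combDict, rows, cardinality):
    """Return {lib: {"peakMemMB", "meanCpu", "timeSpent"}} for one rows/cardinality pair.

    Libraries without data for `cardinality` (or with an empty memory log) are skipped.
    """
    summary = {}
    for lib, byCardinality in combDict[rows].items():
        entry = byCardinality.get(cardinality)
        if not entry or not entry["mem"]:
            continue
        cpu = entry["cpu"]
        summary[lib] = {
            "peakMemMB": max(entry["mem"]),
            "meanCpu": sum(cpu) / len(cpu) if cpu else 0.0,
            "timeSpent": entry["timeSpent"],
        }
    return summary
```

Printing `summarize(combDict, rows, cardinality)` next to each saved figure gives the exact numbers behind the curves.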
[Figures: results for dataframes of 2500, 25000, 250000 and 2500000 rows]
9. Conclusion
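In summary, you can now control Docker images and containers with Python and record their CPU, memory, and execution-time metrics with a stats logger.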