
[Tutorial] Creating an NVIDIA Docker Container That Shares the Host's GPUs

I put this together after running through it myself. Straight to the essentials: just copy and paste!

# First install Docker, then the NVIDIA Container Toolkit
sudo apt-get update
sudo apt-get install -y ca-certificates curl gnupg lsb-release

sudo mkdir -p /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg

echo "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null

sudo apt-get update
sudo apt-get install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
docker --version

distribution=$(. /etc/os-release; echo $ID$VERSION_ID) && \
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg && \
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | \
sed 's#deb #deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] #' | sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list

sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit

sudo mkdir -p /etc/docker
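# Optional registry mirror: complete the URL after "https://" with your mirror's address, or skip this step (an incomplete URL may prevent dockerd from starting)
sudo tee /etc/docker/daemon.json <<EOF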
{ "registry-mirrors": ["https://"] }
EOF
sudo systemctl daemon-reload
sudo systemctl restart docker
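Because the mirror address above is left incomplete, a small guard can help before writing the file. The sketch below is a hypothetical helper (`write_daemon_json` is not part of Docker) that writes the same single-key daemon.json but refuses a URL with nothing after the scheme; `mirror.example.com` in the usage line is a placeholder, not a real mirror.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Docker): write a daemon.json containing one
# registry mirror, rejecting URLs that have no host after the scheme.
write_daemon_json() {
  local mirror="$1" out="${2:-/etc/docker/daemon.json}"
  case "$mirror" in
    http://?*|https://?*) ;;  # at least one character after "://"
    *)
      echo "invalid mirror URL: '$mirror'" >&2
      return 1
      ;;
  esac
  printf '{ "registry-mirrors": ["%s"] }\n' "$mirror" > "$out"
}
```

For example, write to a temporary file and copy it into place with sudo: `write_daemon_json https://mirror.example.com /tmp/daemon.json && sudo cp /tmp/daemon.json /etc/docker/daemon.json`, then `sudo systemctl restart docker`. Do this before `nvidia-ctk runtime configure`, since that command adds its runtime entry to the existing file.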

sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
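After `nvidia-ctk runtime configure --runtime=docker`, /etc/docker/daemon.json should contain an "nvidia" entry under "runtimes". A rough check plus the usual throwaway test container might look like this; `has_nvidia_runtime` and `gpu_smoke_test` are hypothetical names, and the grep is only a heuristic, not a JSON parser.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for checking the NVIDIA runtime setup.

# Heuristic: succeed if the config file mentions an "nvidia" runtime entry.
has_nvidia_runtime() {
  local cfg="${1:-/etc/docker/daemon.json}"
  [ -r "$cfg" ] && grep -q '"nvidia"' "$cfg"
}

# Start a disposable container (--rm) with all GPUs and run nvidia-smi in it.
gpu_smoke_test() {
  local image="${1:-nvidia/cuda:12.1.0-base-ubuntu20.04}"
  docker run --rm --gpus all "$image" nvidia-smi
}
```

Run `has_nvidia_runtime && gpu_smoke_test`; if the toolkit works, nvidia-smi inside the container prints the same GPU table as on the host. `docker info` also lists the registered runtimes.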

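# Then start the container. A comment cannot follow a trailing "\", so the options are explained here:
#   --gpus all                 expose all host GPUs to the container
#   --shm-size=128g            size of the container's shared memory (/dev/shm)
#   -v /dev/shm:/dev/shm       optional: share the host's shared memory instead
#   --network host             optional: recommended for distributed setups
#   --name cu12_sxf            container name
#   -v host_dir:container_dir  mount a host directory into the container
docker run -itd \
           --gpus all \
           --shm-size=128g \
           --name cu12_sxf \
           -v /mnt/disk/:/mnt/disk/ \
           -v /home/user/Desktop:/Desktop \
           nvidia/cuda:12.1.0-base-ubuntu20.04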
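If you would rather keep a comment next to each option, one alternative is a bash array: comments are allowed between the elements of a multi-line array assignment. A sketch with the same options; `run_args`, `cu12_image` and `start_cu12_container` are illustrative names.

```shell
#!/usr/bin/env bash
# Options for "docker run", one per line, each with its own comment.
run_args=(
  -itd
  --gpus all                      # expose all host GPUs
  --shm-size=128g                 # size of /dev/shm inside the container
  # -v /dev/shm:/dev/shm          # optional: share the host's shared memory
  # --network host                # optional: recommended for distributed setups
  --name cu12_sxf                 # container name
  -v /mnt/disk/:/mnt/disk/        # mount a host directory
  -v /home/user/Desktop:/Desktop  # mount a host directory
)
cu12_image=nvidia/cuda:12.1.0-base-ubuntu20.04

# Quoting "${run_args[@]}" passes each element as a separate argument.
start_cu12_container() {
  docker run "${run_args[@]}" "$cu12_image"
}
```

Uncommenting an optional line is enough to enable it; no trailing backslashes need to be kept in sync.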

Enter the container:

docker exec -it cu12_sxf /bin/bash
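`docker exec` only works on a running container, and after a host reboot a container started without a restart policy stays stopped. A hypothetical wrapper (`enter_container` is an illustrative name) that starts the container first when needed, using the Go-template output of `docker inspect`:

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: start the container if it is stopped, then open a shell.
enter_container() {
  local name="${1:-cu12_sxf}"
  local running
  # .State.Running prints "true" or "false"; inspect fails if the container is missing.
  running="$(docker inspect -f '{{.State.Running}}' "$name")" || return 1
  if [ "$running" != "true" ]; then
    docker start "$name" > /dev/null || return 1
  fi
  docker exec -it "$name" /bin/bash
}
```

Alternatively, adding `--restart unless-stopped` to the original `docker run` command lets Docker bring the container back up after a reboot.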

Save the container as a new image:

# <container_id> is the ID shown by "docker ps -a"; the container name (cu12_sxf) also works
docker commit <container_id> cu12_sxf:latest

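Create a container from the new image (the name cu12_sxf is still used by the old container, so remove it first with "docker rm -f cu12_sxf" or pick a different --name):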

# Same options as above; add -v /dev/shm:/dev/shm or --network host if needed
docker run -itd \
           --gpus all \
           --shm-size=128g \
           --name cu12_sxf \
           -v /mnt/disk:/mnt/disk \
           -v /home/user/Desktop:/Desktop \
           cu12_sxf:latest
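The commit-then-recreate steps above are typically used to change the options of an existing container, for example to add a mount. A hypothetical helper chaining them; note that `docker rm -f` deletes the old container, and data in mounted host directories is not part of the committed image (it stays on the host).

```shell
#!/usr/bin/env bash
# Hypothetical helper: snapshot a container to an image, remove the container,
# and start a new one with the same name from that image plus new run options.
recreate_from_snapshot() {
  local name="$1" image="$2"
  shift 2  # remaining arguments are extra "docker run" options
  docker commit "$name" "$image" &&
    docker rm -f "$name" &&
    docker run -itd --name "$name" "$@" "$image"
}
```

For example: `recreate_from_snapshot cu12_sxf cu12_sxf:latest --gpus all --shm-size=128g -v /mnt/disk:/mnt/disk -v /home/user/Desktop:/Desktop`. The `&&` chain stops before removing anything if the commit fails.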

Save the image to a file:

# cu12_sxf:latest is the REPOSITORY:TAG shown by "docker images"
docker save -o cu12_sxf.tar cu12_sxf:latest

Load the image from a file:

docker load -i cu12_sxf.tar
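`docker save` writes an uncompressed tar, which for CUDA images can be large. Piping it through gzip usually shrinks the file, and `docker load` accepts gzip-compressed archives directly. A sketch with hypothetical function names:

```shell
#!/usr/bin/env bash
# Hypothetical helpers: save an image as a gzip-compressed tar and load it back.
save_image_gz() {
  local image="$1" file="$2"
  docker save "$image" | gzip > "$file"
  # Check both sides of the pipe so a failed "docker save" is not hidden.
  local status=("${PIPESTATUS[@]}")
  [ "${status[0]}" -eq 0 ] && [ "${status[1]}" -eq 0 ]
}

load_image_gz() {
  docker load -i "$1"  # docker load detects the gzip compression itself
}
```

Usage: `save_image_gz cu12_sxf:latest cu12_sxf.tar.gz` on one machine, then `load_image_gz cu12_sxf.tar.gz` on the other.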

Set a password for the user inside the container:

# After entering the container, set the password (for the current user, root by default):
passwd
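`passwd` is interactive. To set a password from the host without entering the container, `chpasswd` (shipped in the same passwd package and normally present in Ubuntu base images) reads `user:password` lines on standard input. A hypothetical wrapper; keep in mind that a password typed on the command line may end up in your shell history.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: set a user's password inside a container non-interactively.
set_container_password() {
  local name="$1" user="$2" password="$3"
  # chpasswd reads "user:password" lines from stdin; -i keeps stdin attached.
  printf '%s:%s\n' "$user" "$password" | docker exec -i "$name" chpasswd
}
```

For example, `read -rs pw && set_container_password cu12_sxf root "$pw"` reads the password without echoing it and keeps it out of the history.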

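Finally, check the GPU information from inside the container (for example by running nvidia-smi there); the host's GPUs should be listed.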

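Besides the full nvidia-smi table, its query mode prints a compact CSV that is easy to compare with the host. A hypothetical helper run from the host (`show_container_gpus` is an illustrative name):

```shell
#!/usr/bin/env bash
# Hypothetical helper: list the GPUs visible inside a running container as CSV.
show_container_gpus() {
  local name="${1:-cu12_sxf}"
  docker exec "$name" nvidia-smi --query-gpu=index,name,memory.total --format=csv
}
```

Running `nvidia-smi --query-gpu=index,name,memory.total --format=csv` directly on the host should print the same rows.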


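Tag the image for the private registry:

docker tag cu12_sxf:latest <server_IP>:5000/cu12_sxf:latest

Push the image to the private registry (if the registry is served over plain HTTP, every Docker host that pushes to or pulls from it must list <server_IP>:5000 under "insecure-registries" in /etc/docker/daemon.json):

docker push <server_IP>:5000/<image_name>:<tag>

Pull the image from the private registry:

docker pull <server_IP>:5000/<image_name>:<tag>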
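The commands above assume a registry is already listening on <server_IP>:5000. One common way to provide it is Docker's official `registry:2` image; the sketch below starts it and wraps the tag-and-push step. `registry_ref` and `push_to_registry` are hypothetical helpers, and 192.168.1.10 in the usage line is only an example address.

```shell
#!/usr/bin/env bash
# On the server: run Docker's official registry image on port 5000.
start_registry() {
  docker run -d -p 5000:5000 --restart always --name registry registry:2
}

# Hypothetical helper: prefix a local image reference with a registry address,
# adding ":latest" when the reference has no tag.
registry_ref() {
  local registry="$1" image="$2"
  case "${image##*/}" in
    *:*) ;;                        # tag already present
    *) image="${image}:latest" ;;
  esac
  printf '%s/%s\n' "$registry" "$image"
}

# Tag a local image for the registry and push it.
push_to_registry() {
  local ref
  ref="$(registry_ref "$1" "$2")"
  docker tag "$2" "$ref" && docker push "$ref"
}
```

Usage: `push_to_registry 192.168.1.10:5000 cu12_sxf:latest` on the machine that holds the image, then `docker pull 192.168.1.10:5000/cu12_sxf:latest` on the others.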