Deploying Docker on a GPU: Setting Up a GPU Environment

Reprinted

智能开发者 2023-12-18 14:19:18

Tags: deploying Docker on GPU, tensorflow gpu, deep learning, pytorch. Categories: Docker, cloud computing

Preface

This post is for anyone who needs to set up the GPU versions of TensorFlow and PyTorch.

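I wrote up an environment setup procedure before, but I still fell into plenty of version pitfalls myself. So this time I've put the CUDA files directly on Baidu Netdisk for you, and you don't have to pick versions on the official site yourself!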

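Let's get the environment up and running quickly first! There's no point spending too much time stuck on this; configuring environments is genuinely infuriating and can leave you close to tears.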

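I don't explain the underlying principles here, and some steps are only mentioned in passing. Beginners can find the detailed procedure in my earlier post 《深度学习TensorFlow开发环境搭建教程》 ("Deep Learning TensorFlow Development Environment Setup Tutorial"), or search for it online.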

1. Install CUDA

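First install Anaconda, then install CUDA 10.1, downloading it directly from Baidu Netdisk. (Just stick with this version, which is the one tensorflow-gpu 2.3.0 used below is built against. Don't get fancy; play it safe.)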

Link: https://pan.baidu.com/s/19Sr66HybqudCbJ6Ela750Q Extraction code: mzes

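Double-click the CUDA installer and accept the defaults all the way through; don't bother reading the options closely. Then copy the files in cuDNN's bin, include and lib folders into the folders of the same name in the CUDA installation directory, and configure the environment variables as shown in the figure below.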

[Figure: environment variable configuration]
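To confirm the environment variables actually took effect before moving on, the following Python sketch lists every PATH directory that contains a given DLL. It assumes the CUDA installer set the CUDA_PATH variable (it normally does on Windows), and the DLL names are taken from the TensorFlow error log further down; find_on_path is a hypothetical helper written for this check, not part of any library.

```python
import os


def find_on_path(filename, path=None):
    """Return every directory on the search path that contains `filename`."""
    if path is None:
        path = os.environ.get("PATH", "")
    hits = []
    for directory in path.split(os.pathsep):
        # Skip empty entries produced by stray separators such as ";;".
        if directory and os.path.isfile(os.path.join(directory, filename)):
            hits.append(directory)
    return hits


if __name__ == "__main__":
    # Assumption: the CUDA installer normally sets CUDA_PATH on Windows.
    print("CUDA_PATH =", os.environ.get("CUDA_PATH"))
    # DLL names copied from the TensorFlow 2.3 / CUDA 10.1 log in this article.
    for dll in ("cudart64_101.dll", "cublas64_10.dll", "cudnn64_7.dll"):
        found = find_on_path(dll)
        print(dll, "->", found if found else "NOT FOUND on PATH")
```

Run it from a freshly opened Anaconda Prompt: console windows that were already open keep the PATH they started with.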

2. GPU Version of TensorFlow

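Create the environment: conda create --name tf-gpu python=3.7.1
Activate the environment: activate tf-gpu (on newer conda versions: conda activate tf-gpu)
Install the GPU build of TensorFlow: pip install tensorflow-gpu==2.3.0 -i http://pypi.douban.com/simple --trusted-host pypi.douban.com
Type python to start the interpreter, then enter the following code to check whether the GPU is available: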

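# Check whether TensorFlow can use the GPU
# (tf.test.is_gpu_available() is deprecated in TF 2.x but still works in 2.3)
import tensorflow as tf
tf.test.is_gpu_available()

# Output
True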
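Because tf.test.is_gpu_available() is deprecated, here is a minimal sketch of the same check using tf.config.list_physical_devices("GPU") and tf.test.is_built_with_cuda(), both available in TensorFlow 2.3. tensorflow_gpu_report is a hypothetical helper name; it returns None when TensorFlow is not installed in the active environment, so the script can also be run outside tf-gpu without crashing.

```python
import importlib.util


def tensorflow_gpu_report():
    """Summarize TensorFlow's GPU visibility, or return None if TensorFlow is absent."""
    if importlib.util.find_spec("tensorflow") is None:
        return None
    import tensorflow as tf

    gpus = tf.config.list_physical_devices("GPU")
    return {
        "tf_version": tf.__version__,
        # False means a CPU-only TensorFlow build was installed.
        "built_with_cuda": tf.test.is_built_with_cuda(),
        # An empty list means the GPU was not registered (e.g. missing DLLs).
        "gpus": [gpu.name for gpu in gpus],
    }


if __name__ == "__main__":
    report = tensorflow_gpu_report()
    print("TensorFlow is not installed here" if report is None else report)
```

An empty "gpus" list together with "built_with_cuda": True usually points at the DLL loading problem described next.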

You may hit an error at this step; if you don't, skip the following part.

# So close: some DLLs can't be loaded, even though every file is present under the CUDA path
>>> tf.test.is_gpu_available()
...
2021-03-16 19:49:25.587973: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library nvcuda.dll
2021-03-16 19:49:25.638884: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: GeForce GTX 1060 3GB computeCapability: 6.1
coreClock: 1.7335GHz coreCount: 9 deviceMemorySize: 3.00GiB deviceMemoryBandwidth: 178.99GiB/s
2021-03-16 19:49:25.654811: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'cudart64_101.dll'; dlerror: cudart64_101.dll not found
2021-03-16 19:49:25.663812: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'cublas64_10.dll'; dlerror: cublas64_10.dll not found
2021-03-16 19:49:25.674031: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'cufft64_10.dll'; dlerror: cufft64_10.dll not found
2021-03-16 19:49:25.683725: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'curand64_10.dll'; dlerror: curand64_10.dll not found
2021-03-16 19:49:25.693968: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'cusolver64_10.dll'; dlerror: cusolver64_10.dll not found
2021-03-16 19:49:25.703211: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'cusparse64_10.dll'; dlerror: cusparse64_10.dll not found
2021-03-16 19:49:25.791687: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudnn64_7.dll
2021-03-16 19:49:25.799832: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1753] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
2021-03-16 19:49:25.915500: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1257] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-03-16 19:49:25.923543: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1263]      0
2021-03-16 19:49:25.927894: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1276] 0:   N
2021-03-16 19:49:25.934997: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x2544290c000 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2021-03-16 19:49:25.944446: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): GeForce GTX 1060 3GB, Compute Capability 6.1
False

Check the environment variables first; if it still fails after you correct them, read on.

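Copy the DLLs named in the error messages (they are in the CUDA installation directory) into a system directory and TensorFlow will be able to load them. I used C:/Windows/System, as shown in the screenshot below.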

[Screenshot: system directory for the DLLs]

# Still failing at first, but restarting Anaconda Prompt fixed it!
2021-03-16 20:26:27.550551: E tensorflow/stream_executor/cuda/cuda_driver.cc:1398] failed to query total available memory: CUDA_ERROR_UNKNOWN: unknown error
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\Luo\AppData\Local\conda\conda\envs\lww-tf-gpu\lib\site-packages\tensorflow\python\util\deprecation.py", line 324, in new_func
    return func(*args, **kwargs)
  File "C:\Users\Luo\AppData\Local\conda\conda\envs\lww-tf-gpu\lib\site-packages\tensorflow\python\framework\test_util.py", line 1563, in is_gpu_available
    for local_device in device_lib.list_local_devices():
  File "C:\Users\Luo\AppData\Local\conda\conda\envs\lww-tf-gpu\lib\site-packages\tensorflow\python\client\device_lib.py", line 43, in list_local_devices
    _convert(s) for s in _pywrap_device_lib.list_devices(serialized_config)
RuntimeError: cudaGetDevice() failed. Status: cudaGetErrorString symbol not found.
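When the log is long, picking out the missing DLL names by hand is error-prone. Below is a minimal sketch that extracts them from pasted log text and looks for each one under a directory tree such as the CUDA installation folder. missing_libraries and locate_in_tree are hypothetical helpers; the regular expression assumes the exact "Could not load dynamic library" wording in the log above, and the fallback path assumes the default CUDA 10.1 install location.

```python
import os
import re

# Matches TensorFlow warnings such as:
#   Could not load dynamic library 'cudart64_101.dll'; dlerror: cudart64_101.dll not found
_MISSING_LIB = re.compile(r"Could not load dynamic library '([^']+)'")


def missing_libraries(log_text):
    """Return the library names TensorFlow failed to load, in order of first appearance."""
    names = []
    for name in _MISSING_LIB.findall(log_text):
        if name not in names:
            names.append(name)
    return names


def locate_in_tree(names, root):
    """Map each file name to the first path under `root` where it exists (None if absent)."""
    found = {name: None for name in names}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in names:
            if found[name] is None and name in filenames:
                found[name] = os.path.join(dirpath, name)
    return found


if __name__ == "__main__":
    # Paste the full log here; these two lines are excerpted from the log above.
    log = (
        "dso_loader.cc:59] Could not load dynamic library 'cudart64_101.dll'; dlerror: cudart64_101.dll not found\n"
        "dso_loader.cc:59] Could not load dynamic library 'cublas64_10.dll'; dlerror: cublas64_10.dll not found\n"
    )
    # Assumption: default install location if CUDA_PATH is not set.
    default_root = r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.1"
    cuda_root = os.environ.get("CUDA_PATH", default_root)
    for name, path in locate_in_tree(missing_libraries(log), cuda_root).items():
        print(name, "->", path or "not found under " + cuda_root)
```

The script only reports where each DLL lives; copying into a system directory usually needs administrator rights, so that step is left to you.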

3. GPU Version of PyTorch

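Create the environment: conda create --name torch-gpu python=3.7.1
Activate the environment: activate torch-gpu (on newer conda versions: conda activate torch-gpu)
Install the GPU build of PyTorch (-i and --trusted-host are pip options that conda does not accept; with conda, a mirror is selected by passing the mirror's PyTorch channel URL to -c instead of pytorch):
conda install pytorch cudatoolkit=10.1 -c pytorch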

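Whatever you do, don't download from the official site: the roughly 700 MB download can take a whole day, and it may fail when the connection drops...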

If the installation succeeds, type python to start the interpreter and check whether the GPU is available:

# Check whether PyTorch can use the GPU
import torch
torch.cuda.is_available()

# Output
True
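Because the CUDA, cuDNN and PyTorch versions have to match, it can help to print what the installed PyTorch build was compiled against. A minimal sketch using torch.version.cuda, torch.backends.cudnn.version() and torch.cuda.get_device_name(); torch_cuda_report is a hypothetical helper that returns None when PyTorch is not installed in the active environment.

```python
import importlib.util


def torch_cuda_report():
    """Summarize PyTorch's CUDA build and GPU visibility, or return None if torch is absent."""
    if importlib.util.find_spec("torch") is None:
        return None
    import torch

    report = {
        "torch_version": torch.__version__,
        # None for CPU-only builds; otherwise e.g. the CUDA version string.
        "built_cuda_version": torch.version.cuda,
        "cuda_available": torch.cuda.is_available(),
    }
    if report["cuda_available"]:
        report["cudnn_version"] = torch.backends.cudnn.version()
        report["device_name"] = torch.cuda.get_device_name(0)
    return report


if __name__ == "__main__":
    report = torch_cuda_report()
    print("PyTorch is not installed here" if report is None else report)
```

If built_cuda_version is None, a CPU-only package was installed and no driver change will help; reinstall with cudatoolkit instead.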

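If it prints True, you're done! If it unfortunately prints False, that's trouble and you'll have to search online; it may be an NVIDIA driver version problem.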
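To check the driver side of a False result, the driver version can be read from nvidia-smi, which is installed with the NVIDIA driver. A minimal sketch, assuming nvidia-smi is on PATH (with some driver versions it lives in C:\Program Files\NVIDIA Corporation\NVSMI instead); compare the output with the minimum driver version NVIDIA lists for CUDA 10.1 in the CUDA Toolkit release notes. nvidia_driver_versions and parse_driver_versions are hypothetical helper names.

```python
import shutil
import subprocess


def parse_driver_versions(csv_text):
    """Turn nvidia-smi CSV output (no header) into a list, one entry per GPU."""
    return [line.strip() for line in csv_text.splitlines() if line.strip()]


def nvidia_driver_versions():
    """Return the driver version reported for each GPU, or None if nvidia-smi can't be run."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return None
    result = subprocess.run(
        [exe, "--query-gpu=driver_version", "--format=csv,noheader"],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        return None
    return parse_driver_versions(result.stdout)


if __name__ == "__main__":
    versions = nvidia_driver_versions()
    print("nvidia-smi not available" if versions is None else versions)
```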

(I can vouch that the CUDA, cuDNN and torch versions here match, because they're the same set I use myself. Good luck!)

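This article is a reprint, and we respect the original author's copyright. If there are errors or copyright issues, the original author is welcome to contact us to have the content corrected or the article removed.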
