Contents
- Preface
- 1. Install CUDA
- 2. GPU version of TensorFlow
- 3. GPU version of PyTorch
Preface
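This post is for anyone who needs to set up the GPU versions of TensorFlow and PyTorch.
I wrote a post on setting up the environment before, but I still fell into plenty of version-mismatch traps myself. So this time I've put the CUDA files straight onto Baidu Cloud for you, and you don't have to pick versions on the official site yourself!
Let's just get the environment up and running first! There's no point spending lots of time stuck on this kind of thing. Setting up environments is genuinely infuriating and can leave you close to tears.
I don't explain the underlying principles here, and some steps are only mentioned in passing. Beginners can find the detailed steps in my post "深度学习TensorFlow开发环境搭建教程" (Deep Learning TensorFlow Development Environment Setup Tutorial), or search on Baidu.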
1. Install CUDA
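Install Anaconda first, then install CUDA 10.1, which you can download directly from Baidu Cloud. (Just install this version. Don't get creative; play it safe.)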
Link: https://pan.baidu.com/s/19Sr66HybqudCbJ6Ela750Q Extraction code: mzes
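Double-click the CUDA installer and accept the defaults all the way through; don't study the options, they won't make much sense anyway. Then copy the cuDNN files into the corresponding CUDA folders (the bin, include and lib folders from the cuDNN archive go into the matching folders of the CUDA install directory) and configure the environment variables, following the figure below.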
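If you'd rather not eyeball the environment variables by hand, a short Python sketch like the one below can check them. It assumes the installer set CUDA_PATH to the CUDA 10.1 folder (the Windows installer normally does this) and only reports whether that folder's bin directory is on PATH. The function name cuda_env_report is made up for this example.

```python
import os


def _norm(p):
    # normcase lowercases and unifies slashes on Windows; normpath drops trailing separators
    return os.path.normcase(os.path.normpath(p))


def cuda_env_report(env=None):
    """Report CUDA_PATH and whether its bin folder is on PATH (assumption: CUDA_PATH is set)."""
    env = os.environ if env is None else env
    cuda_path = env.get("CUDA_PATH")
    entries = {_norm(p) for p in env.get("PATH", "").split(os.pathsep) if p}
    bin_on_path = bool(cuda_path) and _norm(os.path.join(cuda_path, "bin")) in entries
    return {"CUDA_PATH": cuda_path, "bin_on_PATH": bin_on_path}


if __name__ == "__main__":
    for key, value in cuda_env_report().items():
        print(f"{key}: {value}")
```

If bin_on_PATH comes out False, fix the PATH entry before moving on to TensorFlow.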
2. GPU version of TensorFlow
- Create the environment:
conda create --name tf-gpu python=3.7.1
- Activate the environment:
activate tf-gpu
- Install the GPU version of TensorFlow:
pip install tensorflow-gpu==2.3.0 -i http://pypi.douban.com/simple --trusted-host pypi.douban.com
- Type python to start the interpreter, then enter the following code to check whether the GPU is available:
# Test whether the GPU is available
import tensorflow as tf
tf.test.is_gpu_available()  # deprecated in TF 2.x, but still works in 2.3
# Output
True
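Since tf.test.is_gpu_available() is deprecated in TensorFlow 2.x, here is a sketch of the same check using tf.config.list_physical_devices("GPU") (available since TF 2.1, so it works with 2.3.0). It also prints which Python interpreter is running, so you can confirm you are really inside the tf-gpu environment. The helper names gpu_names and check_tf_gpu are made up for this example.

```python
import sys


def gpu_names(devices):
    """Return the name of each physical device, e.g. '/physical_device:GPU:0'."""
    return [d.name for d in devices]


def check_tf_gpu():
    """Print basic GPU info for TensorFlow 2.1+ and return the list of GPU devices."""
    print("Python interpreter:", sys.executable)  # should point inside the tf-gpu env
    try:
        import tensorflow as tf
    except ImportError:
        print("TensorFlow is not installed in this interpreter")
        return []
    print("TensorFlow version:", tf.__version__)
    print("Built with CUDA:", tf.test.is_built_with_cuda())
    gpus = tf.config.list_physical_devices("GPU")  # empty list if no usable GPU
    print("GPUs visible to TensorFlow:", gpu_names(gpus))
    return gpus


if __name__ == "__main__":
    check_tf_gpu()
```

If the GPU list comes back empty, TensorFlow is not using the card; the log it prints usually names the libraries it failed to load.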
An error may appear at this step; if there isn't one, skip the following part.
# So close: some DLLs can't be loaded, even though not a single file is missing from the CUDA directory
>>> tf.test.is_gpu_available()
...
2021-03-16 19:49:25.587973: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library nvcuda.dll
2021-03-16 19:49:25.638884: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: GeForce GTX 1060 3GB computeCapability: 6.1
coreClock: 1.7335GHz coreCount: 9 deviceMemorySize: 3.00GiB deviceMemoryBandwidth: 178.99GiB/s
2021-03-16 19:49:25.654811: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'cudart64_101.dll'; dlerror: cudart64_101.dll not found
2021-03-16 19:49:25.663812: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'cublas64_10.dll'; dlerror: cublas64_10.dll not found
2021-03-16 19:49:25.674031: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'cufft64_10.dll'; dlerror: cufft64_10.dll not found
2021-03-16 19:49:25.683725: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'curand64_10.dll'; dlerror: curand64_10.dll not found
2021-03-16 19:49:25.693968: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'cusolver64_10.dll'; dlerror: cusolver64_10.dll not found
2021-03-16 19:49:25.703211: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'cusparse64_10.dll'; dlerror: cusparse64_10.dll not found
2021-03-16 19:49:25.791687: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudnn64_7.dll
2021-03-16 19:49:25.799832: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1753] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
2021-03-16 19:49:25.915500: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1257] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-03-16 19:49:25.923543: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1263] 0
2021-03-16 19:49:25.927894: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1276] 0: N
2021-03-16 19:49:25.934997: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x2544290c000 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2021-03-16 19:49:25.944446: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): GeForce GTX 1060 3GB, Compute Capability 6.1
False
Check the environment variables first; if it still fails after correcting them, read on.
Copy the DLLs named in the error into the system directory and they will be found. The directory is C:/Windows/System, as shown in the screenshot.
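To see which of the DLLs from the log above can actually be found, a small check like this can help. It's a sketch: the DLL list is copied from the TensorFlow 2.3 log above (CUDA 10.1 + cuDNN 7), so other versions need other names, and it only searches PATH, while the Windows loader also looks in a few other places. Treat "NOT FOUND" as a hint, not proof. The names REQUIRED_DLLS and find_dlls are made up for this example.

```python
import os

# DLL names taken from the TensorFlow 2.3 log above (CUDA 10.1 + cuDNN 7)
REQUIRED_DLLS = (
    "cudart64_101.dll", "cublas64_10.dll", "cufft64_10.dll",
    "curand64_10.dll", "cusolver64_10.dll", "cusparse64_10.dll",
    "cudnn64_7.dll",
)


def find_dlls(names=REQUIRED_DLLS, search_path=None):
    """Map each DLL name to the first PATH directory that contains it, or None."""
    search_path = os.environ.get("PATH", "") if search_path is None else search_path
    dirs = [d for d in search_path.split(os.pathsep) if d]
    return {
        name: next((d for d in dirs if os.path.isfile(os.path.join(d, name))), None)
        for name in names
    }


if __name__ == "__main__":
    for name, where in find_dlls().items():
        print(f"{name:20} {where or 'NOT FOUND'}")
```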
# Another error followed; restarting the Prompt fixed it!
2021-03-16 20:26:27.550551: E tensorflow/stream_executor/cuda/cuda_driver.cc:1398] failed to query total available memory: CUDA_ERROR_UNKNOWN: unknown error
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Users\Luo\AppData\Local\conda\conda\envs\lww-tf-gpu\lib\site-packages\tensorflow\python\util\deprecation.py", line 324, in new_func
return func(*args, **kwargs)
File "C:\Users\Luo\AppData\Local\conda\conda\envs\lww-tf-gpu\lib\site-packages\tensorflow\python\framework\test_util.py", line 1563, in is_gpu_available
for local_device in device_lib.list_local_devices():
File "C:\Users\Luo\AppData\Local\conda\conda\envs\lww-tf-gpu\lib\site-packages\tensorflow\python\client\device_lib.py", line 43, in list_local_devices
_convert(s) for s in _pywrap_device_lib.list_devices(serialized_config)
RuntimeError: cudaGetDevice() failed. Status: cudaGetErrorString symbol not found.
3. GPU version of PyTorch
- Create the environment:
conda create --name torch-gpu python=3.7.1
- Activate the environment:
activate torch-gpu
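- Install the GPU version of PyTorch (-i and --trusted-host are pip options that conda install does not accept; with conda the package source is given as a channel with -c, and a mirror's conda channel URL can be passed to -c instead):
conda install pytorch cudatoolkit=10.1 -c pytorch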
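Whatever you do, don't download it from the official website: the roughly 700 MB package takes a whole day, and the connection drops and the download fails...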
If the installation succeeds, type python and check whether the GPU is available:
# Check whether CUDA is available
import torch
torch.cuda.is_available()
# Output
True
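If it prints True, you're done! If it unfortunately prints False, you're in for some trouble; go search on Baidu. It may be an NVIDIA driver version issue.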
(I can guarantee that the CUDA, cuDNN and torch versions here match, because they are the same set I use myself. Good luck!)
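To see the versions mentioned above for your own setup, a sketch like the following prints the PyTorch version, the CUDA version PyTorch was built against, the cuDNN version and the GPU name. It assumes PyTorch is installed in the active torch-gpu environment; the function name torch_gpu_report is made up for this example.

```python
def torch_gpu_report():
    """Collect PyTorch / CUDA / cuDNN version info; returns {'torch': None} if torch is missing."""
    try:
        import torch
    except ImportError:
        return {"torch": None}
    report = {
        "torch": torch.__version__,
        "cuda_build": torch.version.cuda,  # CUDA version PyTorch was built with; None for CPU-only builds
        "cudnn": torch.backends.cudnn.version(),  # integer version, or None if cuDNN is unavailable
        "cuda_available": torch.cuda.is_available(),
    }
    if report["cuda_available"]:
        report["device"] = torch.cuda.get_device_name(0)  # first visible GPU
    return report


if __name__ == "__main__":
    for key, value in torch_gpu_report().items():
        print(f"{key}: {value}")
```

With the setup in this post, cuda_build should read 10.1; a None there means a CPU-only PyTorch build was installed.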