Environment: Ubuntu 18.04, NVIDIA driver 410, CUDA 9.0, cuDNN 7.0
1. The PaddlePaddle official website
http://paddlepaddle.org/paddle
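In the middle of the page there is a Quick Install section. Select your operating system there and it recommends the latest package that matches.
At the time of writing, the latest PaddlePaddle version is 1.4.1: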
https://pypi.org/project/paddlepaddle-gpu/#history
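The release history above shows that version 1.4.1 comes with suffixes such as post87 and post97.
Note that the default install is post97, which corresponds to CUDA 9 and cuDNN 7. The install command is: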
pip3 install paddlepaddle-gpu
If your environment uses CUDA 8, pick the build that matches your cuDNN and pin the version explicitly, for example:
pip3 install paddlepaddle-gpu==1.4.1.post87
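The above is the pip install, which is the most convenient option. You can also use Docker or build from source. I recommend one tutorial that covers this in great detail.
In my case, installing with pip directly inside a virtual environment worked.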
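The suffix convention described above can be sketched as a small helper that builds the pip requirement string. This is only an illustration: the function name `paddle_gpu_requirement` is made up, and the rule "post + CUDA major version + cuDNN major version" is inferred from the two examples in the text (post87 and post97), so check the PyPI release history before relying on it.

```python
def paddle_gpu_requirement(version, cuda_major=None, cudnn_major=None):
    """Build a pip requirement string for paddlepaddle-gpu.

    Assumption: the suffix is "post" + CUDA major + cuDNN major, inferred
    only from the two examples in the text (post87, post97).
    """
    requirement = f"paddlepaddle-gpu=={version}"
    if cuda_major is not None and cudnn_major is not None:
        requirement += f".post{cuda_major}{cudnn_major}"
    return requirement


# CUDA 8 with cuDNN 7, as in the pinned command above.
print("pip3 install " + paddle_gpu_requirement("1.4.1", 8, 7))
```

Leaving out the CUDA and cuDNN arguments gives the plain version pin, which according to the text resolves to the post97 build.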
2. Searching for and reporting bugs
Bugs are reported on the issue tracker of the Paddle project on GitHub:
https://github.com/PaddlePaddle/Paddle
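The other branches receive fewer bug reports, and fewer of those get resolved.
One last point: GitHub is reachable from within China, but for the same problem, if Baidu finds 5 relevant GitHub answers, Google finds 10, so it is somewhat easier to find similar issues and their solutions with Google.
3. Code
The PaddlePaddle website provides API documentation.
The documentation includes some simple deep learning examples with explanations, and the explanations walk through the code section by section.
The official code for these examples is also on PaddlePaddle's GitHub: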
https://github.com/PaddlePaddle/book
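Chapters 1 to 9 can each be run on their own.
The blogger who wrote the detailed installation tutorial mentioned above also shares the code from his own learning process: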
https://github.com/yeyupiaoling/LearnPaddle2
In particular, the blogger shows how to process custom data.
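As a rough illustration of what custom data processing can look like, here is a minimal sketch of a reader in the style Paddle's examples use: a no-argument function that returns a generator of samples. This is not the blogger's code. The helper `make_reader` and the toy data are hypothetical, and a real reader would load and preprocess files instead.

```python
import random


def make_reader(samples, shuffle=False, seed=0):
    """Return a no-argument reader that yields (features, label) tuples."""
    def reader():
        order = list(range(len(samples)))
        if shuffle:
            # Shuffle a copy of the indices so the input list is untouched.
            random.Random(seed).shuffle(order)
        for index in order:
            features, label = samples[index]
            yield features, label
    return reader


# Hypothetical toy data: two 4-pixel "images" with integer labels.
toy_samples = [([0.0, 0.1, 0.2, 0.3], 0), ([0.9, 0.8, 0.7, 0.6], 1)]
train_reader = make_reader(toy_samples)
for features, label in train_reader():
    print(label, features)
```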
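All of the above are basic algorithms that run fine on a CPU or a single GPU. Beyond these, the official repositories also provide code for more complex models: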
https://github.com/PaddlePaddle/models
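4. Running
My machine is a Lenovo T470 laptop with a single GPU that has 2 GB of memory.
When I run on the CPU, my code starts printing epochs right away.
When I run on the GPU, it first prints information about the system: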
W0617 17:32:52.020673 2926 device_context.cc:261] Please NOTE: device: 0, CUDA Capability: 50, Driver API Version: 10.1, Runtime API Version: 9.0
W0617 17:32:52.020849 2926 dynamic_loader.cc:107] Can not find library: libcudnn.so. Please try to add the lib path to LD_LIBRARY_PATH.
If your cuDNN is not 7.3, you may also see:
W0617 17:32:52.020872 2926 dynamic_loader.cc:165] Failed to find dynamic library: libcudnn.so ( libcudnn.so: cannot open shared object file: No such file or directory )
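These messages are only informational and do not affect normal execution. If something does fail, look for the actual error further down in the output, as in the following traceback.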
File "/home/zz/program/MNIST-paddle/MNIST-test-paddle.py", line 364, in <module>
test()
File "/home/zz/program/MNIST-paddle/MNIST-test-paddle.py", line 281, in test
exe.run(fluid.default_startup_program())
File "/home/zz/env_python3/lib/python3.6/site-packages/paddle/fluid/executor.py", line 565, in run
use_program_cache=use_program_cache)
File "/home/zz/env_python3/lib/python3.6/site-packages/paddle/fluid/executor.py", line 642, in _run
exe.run(program.desc, scope, 0, True, True, fetch_var_name)
paddle.fluid.core.EnforceNotMet: Invoke operator fill_constant error.
Python Callstacks:
File "/home/zz/env_python3/lib/python3.6/site-packages/paddle/fluid/framework.py", line 1725, in _prepend_op
attrs=kwargs.get("attrs", None))
File "/home/zz/env_python3/lib/python3.6/site-packages/paddle/fluid/initializer.py", line 167, in __call__
stop_gradient=True)
File "/home/zz/env_python3/lib/python3.6/site-packages/paddle/fluid/framework.py", line 1517, in create_var
kwargs['initializer'](var, self)
File "/home/zz/env_python3/lib/python3.6/site-packages/paddle/fluid/layer_helper_base.py", line 382, in set_variable_initializer
initializer=initializer)
File "/home/zz/env_python3/lib/python3.6/site-packages/paddle/fluid/layers/tensor.py", line 152, in create_global_var
value=float(value), force_cpu=force_cpu))
File "/home/zz/env_python3/lib/python3.6/site-packages/paddle/fluid/optimizer.py", line 136, in _create_global_learning_rate
persistable=True)
File "/home/zz/env_python3/lib/python3.6/site-packages/paddle/fluid/optimizer.py", line 275, in _create_optimization_pass
self._create_global_learning_rate()
File "/home/zz/env_python3/lib/python3.6/site-packages/paddle/fluid/optimizer.py", line 441, in apply_gradients
optimize_ops = self._create_optimization_pass(params_grads)
File "/home/zz/env_python3/lib/python3.6/site-packages/paddle/fluid/optimizer.py", line 469, in apply_optimize
optimize_ops = self.apply_gradients(params_grads)
File "/home/zz/env_python3/lib/python3.6/site-packages/paddle/fluid/optimizer.py", line 500, in minimize
loss, startup_program=startup_program, params_grads=params_grads)
File "/home/zz/program/MNIST-paddle/MNIST-test-paddle.py", line 253, in test
optimizer.minimize(avg_loss)
File "/home/zz/program/MNIST-paddle/MNIST-test-paddle.py", line 364, in <module>
test()
C++ Callstacks:
Enforce failed. Expected allocating <= available, but received allocating:1837034932 > available:1395654400.
Insufficient GPU memory to allocation. at [/paddle/paddle/fluid/platform/gpu_info.cc:262]
PaddlePaddle Call Stacks:
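I got the error above when running this program on the laptop, and got the same error on a 4-core desktop machine that was already running some other code. The key lines are: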
exe.run(program.desc, scope, 0, True, True, fetch_var_name)
paddle.fluid.core.EnforceNotMet: Invoke operator fill_constant error.
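Insufficient GPU memory to allocation. at [/paddle/paddle/fluid/platform/gpu_info.cc:262]
In other words, invoking the fill_constant operator failed because there was not enough GPU memory to allocate: 1837034932 bytes were requested, but only 1395654400 bytes were available.
In the end I found the same problem in the PaddlePaddle issues: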
https://github.com/PaddlePaddle/Paddle/issues/18173
Below is the question I asked and then answered myself:
https://github.com/PaddlePaddle/Paddle/issues/18173
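To make the numbers in the error message easier to read, the sketch below parses them and converts them to GiB. It then shows one knob that may help in this situation, the `FLAGS_fraction_of_gpu_memory_to_use` environment variable, which controls how much GPU memory Paddle tries to reserve up front. Treat that part as an assumption: the value 0.5 is only illustrative, it is not necessarily the fix described in the linked issue, and the variable has to be set before `paddle` is imported.

```python
import os
import re

# Message copied from the C++ call stack in the log above.
MESSAGE = ("Enforce failed. Expected allocating <= available, but received "
           "allocating:1837034932 > available:1395654400.")


def parse_allocation(message):
    """Return (requested_bytes, available_bytes) from the enforce message."""
    match = re.search(r"allocating:(\d+) > available:(\d+)", message)
    if match is None:
        raise ValueError("message does not contain allocation sizes")
    return int(match.group(1)), int(match.group(2))


requested, available = parse_allocation(MESSAGE)
GIB = 1024 ** 3
print(f"requested {requested / GIB:.2f} GiB, "
      f"available {available / GIB:.2f} GiB")

# Assumption: ask Paddle to reserve a smaller share of GPU memory.
# 0.5 is an illustrative value; set it before importing paddle.
os.environ["FLAGS_fraction_of_gpu_memory_to_use"] = "0.5"
```

The request of roughly 1.71 GiB on a 2 GB card suggests that Paddle tries to reserve most of the GPU up front, so anything else already using the GPU (a desktop session, another process) can push it over the limit.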
5. Multi-GPU
Following the usage shown in the tutorial, I added the following to my code:
compiled_prog = fluid.compiler.CompiledProgram(
    fluid.default_main_program()).with_data_parallel(
        loss_name=avg_loss.name)
The idea was to replicate the model across multiple GPUs directly, but running it produced another error:
W0618 19:40:39.706670 10145 device_context.cc:261] Please NOTE: device: 1, CUDA Capability: 61, Driver API Version: 10.0, Runtime API Version: 9.0
W0618 19:40:39.710006 10145 device_context.cc:269] device: 1, cuDNN Version: 7.0.
W0618 19:40:41.227665 10145 graph.h:204] WARN: After a series of passes, the current graph can be quite different from OriginProgram. So, please avoid using the `OriginProgram()` method!
2019-06-18 19:40:41,227-WARNING:
You can try our memory optimize feature to save your memory usage:
# create a build_strategy variable to set memory optimize option
build_strategy = compiler.BuildStrategy()
build_strategy.enable_inplace = True
build_strategy.memory_optimize = True
# pass the build_strategy to with_data_parallel API
compiled_prog = compiler.CompiledProgram(main).with_data_parallel(
loss_name=loss.name, build_strategy=build_strategy)
!!! Memory optimize is our experimental feature !!!
some variables may be removed/reused internal to save memory usage,
in order to fetch the right value of the fetch_list, please set the
persistable property to true for each variable in fetch_list
# Sample
conv1 = fluid.layers.conv2d(data, 4, 5, 1, act=None)
# if you need to fetch conv1, then:
conv1.persistable = True
I0618 19:40:46.276876 10145 build_strategy.cc:285] SeqOnlyAllReduceOps:0, num_trainers:1
Traceback (most recent call last):
File "/home/cj1/zz/book/02.recognize_digits/train.py", line 267, in <module>
main(use_cuda=use_cuda, nn_type=predict)
File "/home/cj1/zz/book/02.recognize_digits/train.py", line 249, in main
params_filename=params_filename)
File "/home/cj1/zz/book/02.recognize_digits/train.py", line 165, in train
fetch_list=[avg_loss, acc])
File "/home/cj1/env-python3/lib/python3.6/site-packages/paddle/fluid/executor.py", line 580, in run
return_numpy=return_numpy)
File "/home/cj1/env-python3/lib/python3.6/site-packages/paddle/fluid/executor.py", line 446, in _run_parallel
exe.run(fetch_var_names, fetch_var_name)
paddle.fluid.core.EnforceNotMet: Invoke operator mul error.
Python Callstacks:
File "/home/cj1/env-python3/lib/python3.6/site-packages/paddle/fluid/framework.py", line 1654, in append_op
attrs=kwargs.get("attrs", None))
File "/home/cj1/env-python3/lib/python3.6/site-packages/paddle/fluid/layer_helper.py", line 43, in append_op
return self.main_program.current_block().append_op(*args, **kwargs)
File "/home/cj1/env-python3/lib/python3.6/site-packages/paddle/fluid/layers/nn.py", line 323, in fc
"y_num_col_dims": 1})
File "/home/cj1/zz/book/02.recognize_digits/train.py", line 43, in loss_net
prediction = fluid.layers.fc(input=hidden, size=10, act='softmax')
File "/home/cj1/zz/book/02.recognize_digits/train.py", line 79, in convolutional_neural_network
return loss_net(conv_pool_2, label)
File "/home/cj1/zz/book/02.recognize_digits/train.py", line 124, in train
prediction, avg_loss, acc = net_conf(img, label)
File "/home/cj1/zz/book/02.recognize_digits/train.py", line 249, in main
params_filename=params_filename)
File "/home/cj1/zz/book/02.recognize_digits/train.py", line 267, in <module>
main(use_cuda=use_cuda, nn_type=predict)
C++ Callstacks:
The places of matrices must be same at [/paddle/paddle/fluid/operators/math/blas_impl.h:392]
PaddlePaddle Call Stacks:
0 0x7f2ff70bed00p void paddle::platform::EnforceNotMet::Init<char const*>(char const*, char const*, int) + 352
1 0x7f2ff70bf079p paddle::platform::EnforceNotMet::EnforceNotMet(std::__exception_ptr::exception_ptr, char const*, int) + 137
2 0x7f2ff77a48f4p void paddle::operators::math::Blas<paddle::platform::CUDADeviceContext>::MatMul<float>(paddle::framework::Tensor const&, bool, paddle::framework::Tensor const&, bool, float, paddle::framework::Tensor*, float) const + 388
3 0x7f2ff77a4ef6p paddle::operators::MulKernel<paddle::platform::CUDADeviceContext, float>::Compute(paddle::framework::ExecutionContext const&) const + 662
4 0x7f2ff77a50e3p std::_Function_handler<void (paddle::framework::ExecutionContext const&), paddle::framework::OpKernelRegistrarFunctor<paddle::platform::CUDAPlace, false, 0ul, paddle::operators::MulKernel<paddle::platform::CUDADeviceContext, float>, paddle::operators::MulKernel<paddle::platform::CUDADeviceContext, double>, paddle::operators::MulKernel<paddle::platform::CUDADeviceContext, paddle::platform::float16> >::operator()(char const*, char const*, int) const::{lambda(paddle::framework::ExecutionContext const&)#1}>::_M_invoke(std::_Any_data const&, paddle::framework::ExecutionContext const&) + 35
5 0x7f2ff8d4e376p paddle::framework::OperatorWithKernel::RunImpl(paddle::framework::Scope const&, boost::variant<paddle::platform::CUDAPlace, paddle::platform::CPUPlace, paddle::platform::CUDAPinnedPlace, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_> const&, paddle::framework::RuntimeContext*) const + 662
6 0x7f2ff8d4eae4p paddle::framework::OperatorWithKernel::RunImpl(paddle::framework::Scope const&, boost::variant<paddle::platform::CUDAPlace, paddle::platform::CPUPlace, paddle::platform::CUDAPinnedPlace, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_> const&) const + 292
7 0x7f2ff8d4c40cp paddle::framework::OperatorBase::Run(paddle::framework::Scope const&, boost::variant<paddle::platform::CUDAPlace, paddle::platform::CPUPlace, paddle::platform::CUDAPinnedPlace, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_> const&) + 332
8 0x7f2ff8b5acaap paddle::framework::details::ComputationOpHandle::RunImpl() + 250
9 0x7f2ff8b4dd60p paddle::framework::details::OpHandleBase::Run(bool) + 160
10 0x7f2ff8ab542dp
11 0x7f2ff7e28a73p std::_Function_handler<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> (), std::__future_base::_Task_setter<std::unique_ptr<std::__future_base::_Result<void>, std::__future_base::_Result_base::_Deleter>, void> >::_M_invoke(std::_Any_data const&) + 35
12 0x7f2ff718b567p std::__future_base::_State_base::_M_do_set(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>&, bool&) + 39
13 0x7f30579da827p
14 0x7f2ff8ab4fc2p
15 0x7f2ff718c8a4p ThreadPool::ThreadPool(unsigned long)::{lambda()#1}::operator()() const + 404
16 0x7f30514799e0p
17 0x7f30579d26dbp
18 0x7f3057d0b88fp clone + 63
The official models repository uses fluid.ParallelExecutor for multi-GPU replication instead, and that runs without problems.
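For reference, a minimal sketch of the fluid.ParallelExecutor pattern might look like the following. It is not copied from the models repository. The helper name `build_parallel_executor` is hypothetical, and it assumes PaddlePaddle 1.x, a network and optimizer already built in the default main program, and a startup program that has already been run with a regular `fluid.Executor`.

```python
def build_parallel_executor(avg_loss, use_cuda=True):
    """Create a ParallelExecutor for the default main program.

    Assumes the startup program was already run with a normal
    fluid.Executor so that the parameters are initialized.
    """
    import paddle.fluid as fluid  # imported lazily; needs PaddlePaddle 1.x

    return fluid.ParallelExecutor(
        use_cuda=use_cuda,
        loss_name=avg_loss.name,
        main_program=fluid.default_main_program())


# Usage sketch inside a training loop (feeder and data are placeholders):
#   parallel_exe = build_parallel_executor(avg_loss)
#   loss_value, = parallel_exe.run(fetch_list=[avg_loss.name],
#                                  feed=feeder.feed(data))
```

Build the executor once, outside the training loop, and reuse it for every batch.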