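PipeCNN is an OpenCL-based FPGA accelerator for large-scale convolutional neural networks (CNNs). There is a growing trend in the FPGA community toward using high-level synthesis (HLS) tools to design and implement custom circuits on FPGAs. Compared with RTL-based design methodology, HLS tools shorten the hardware development cycle by automatically synthesizing algorithms written in high-level languages (such as C/C++) into RTL/hardware. OpenCL™ is an open, emerging cross-platform parallel programming language that can be used for both GPU and FPGA development. The main goal of this project is to provide a generic yet efficient OpenCL-based CNN accelerator design for FPGAs. PipeCNN uses pipelined CNN functional kernels to improve the throughput of inference computation. The design is scalable in both performance and hardware resources, so it can be deployed on a variety of FPGA platforms. PipeCNN supports both the Intel OpenCL SDK and Xilinx Vitis based FPGA design flows.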
./run.exe conv.aocx
• VGG-16
• ResNet-50
Currently, Intel's OpenCL SDK and the Xilinx Vitis toolkit are used to compile the OpenCL/HLS code and to implement the generated RTL on FPGAs.
• Intel OpenCL SDK Pro v20.1
• Xilinx Vitis 2020.1
Tested Boards
• Terasic's DE5a-net-ddr4 (Arria-10 GX1150 FPGA)
• Intel's Arria-10 Dev Kit (Arria-10 GX1150 FPGA)
• Xilinx's U50 Acceleration Card (VU35P FPGA)
• Xilinx's ZCU102 Dev Board (ZU9EG FPGA)
• Xilinx's ZC706 Dev Board (Zynq-7045 FPGA)
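PipeCNN can also run on other FPGA boards, including Terasic's DE10-Standard/DE10-nano, Intel's PAC card, and Xilinx's Ultra96-v2 board. However, owing to limited time and resources, these have not been verified.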

*Note: ResNet-50 was used as the benchmark. Image size is 227x227x3.
•	Python API
•	Command line API
•	C++ API
•	Q-learning
•	To build
•	Development
•	Changes
DeepCL: OpenCL library to train deep convolutional networks
•	C++
•	OpenCL
•	Deep convolutional
•	Python wrappers
•	Lua wrappers
•	Q-learning
APIs:
•	Python
•	C++
•	command-line
Layer types:
•	convolutional
•	max-pooling
•	normalization
•	activation
•	dropout
•	random translations
•	random patches
•	loss
Loss layer types:
•	softmax
•	cross-entropy (synonymous with multinomial logistic, etc)
•	square loss
Trainers:
•	Anneal
•	Nesterov
•	Adagrad
•	Rmsprop
•	Adadelta
Activations:
•	tanh
•	scaled tanh (1.7159 * tanh(2/3 * x))
•	linear
•	sigmoid
•	relu
•	elu (new!)
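As a rough illustration of the activation functions listed above, the following sketch implements them for scalar inputs using their textbook definitions. The function names and the ELU default of alpha = 1.0 are assumptions made for this example, not DeepCL identifiers; the scaled tanh uses the widely cited constants 1.7159 and 2/3.

```python
import math


def tanh(x):
    return math.tanh(x)


def scaled_tanh(x):
    # 1.7159 * tanh(2/3 * x): the commonly cited scaled tanh.
    return 1.7159 * math.tanh(2.0 / 3.0 * x)


def linear(x):
    return x


def sigmoid(x):
    # Branch on the sign so that exp() never receives a large positive argument.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def relu(x):
    return max(0.0, x)


def elu(x, alpha=1.0):
    # alpha = 1.0 is an assumed default, chosen only for this example.
    return x if x > 0 else alpha * (math.exp(x) - 1.0)


for f in (tanh, scaled_tanh, linear, sigmoid, relu, elu):
    print(f.__name__, [round(f(v), 4) for v in (-2.0, 0.0, 2.0)])
```

The scaled tanh constants are chosen so that scaled_tanh(1) is approximately 1.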
Loader formats:
•	jpegs
•	mnist
•	kgsv2
•	norb
Weight initializers:
•	original
•	uniform
•	more possible...



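Obtained 37.2% test accuracy on the next-move prediction task, using 33.6 million training examples from the kgsgo v2 dataset

Command line used: ./deepcl_train dataset=kgsgoall netdef=12*(32c5z-relu)-500n-tanh-361n numepochs=15 learningrate=0.0001

2 epochs, 2 days per epoch, on an Amazon GPU instance comprising half an NVidia GRID K520 GPU (about half as powerful as a GTX780)

Obtained 99.5% test accuracy on MNIST, using netdef=rt2-8c5z-relu-mp2-16c5z-relu-mp3-150n-tanh-10n numepochs=20 multinet=6 learningrate=0.002

Epoch time 99.8 seconds, using an Amazon GPU instance, i.e. half an NVidia GRID K520 GPU (6 networks are trained in parallel, so about 16.6 seconds per epoch per network)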
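
The netdef strings above are compact, hyphen-separated layer descriptions. The sketch below only expands the N*(...) repetition and splits the string into layer tokens; it does not interpret the tokens. The function name expand_netdef is made up for this example and is not part of DeepCL. The last lines reproduce the per-network epoch time quoted above.

```python
import re


def expand_netdef(netdef):
    """Expand 'N*(a-b)' groups and return the flat list of layer tokens."""
    pattern = re.compile(r"(\d+)\*\(([^()]*)\)")

    def repeat(match):
        count = int(match.group(1))
        body = match.group(2)
        return "-".join([body] * count)

    # Keep substituting until no repetition group is left (handles nesting).
    previous = None
    while previous != netdef:
        previous = netdef
        netdef = pattern.sub(repeat, netdef)
    return [token for token in netdef.split("-") if token]


go_layers = expand_netdef("12*(32c5z-relu)-500n-tanh-361n")
mnist_layers = expand_netdef("rt2-8c5z-relu-mp2-16c5z-relu-mp3-150n-tanh-10n")
print(len(go_layers), go_layers[-3:])
print(mnist_layers)

# 6 networks share one 99.8 second epoch, so each network costs about 1/6 of it.
seconds_per_net = 99.8 / 6
print(round(seconds_per_net, 1))
```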





Windows
•	OpenCL-enabled GPU or APU, along with appropriate OpenCL driver installed
•	Tested using Windows 2012 RC2, and (New!) Visual Studio 2015, this is how the CI builds run
•	Download latest binary zip file from (eg from v8.0.0rc8)
•	unzip it, which creates the dist folder
•	To test it:
o	open a cmd
o	run call dist\bin\activate.bat (adjusting the path appropriately for wherever you downloaded deepcl binaries to)
o	now, eg try deepcl_unittests
o	(New!), you can choose which gpu to run tests on now, eg: deepcl_unittests gpuindex=1
Linux
•	OpenCL-enabled GPU or APU, along with appropriate OpenCL driver installed (can check by running clinfo, which should show your desired GPU device)
•	Tested using Ubuntu 14.04 32-bit/64-bit
•	Download latest tar file from (eg from v8.0.0rc8)
•	untar it, which creates the dist sub-folder
•	in a bash prompt, run source dist/bin/ (adjusting the path appropriately for wherever you untarred the binaries tar file to)
•	test by doing, from the same bash prompt, eg deepcl_unittests
o	(New!), you can choose which gpu to run tests on now, eg: deepcl_unittests gpuindex=1
Note that you need to "activate" the installation each time you open a new bash prompt (or you can call from your .bashrc file)
Python wrappers
•	make sure you already installed the native library, and "activate"d it, by doing call dist\bin\activate.bat, or source dist/bin/
•	run pip install --pre DeepCL
•	test by doing python -c "import PyDeepCL; cl = PyDeepCL.DeepCL()"
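The import check above can also be scripted. The sketch below is a minimal helper, with the hypothetical name module_available, that reports whether a module can be located without importing it. It uses only the standard library and assumes the wrapper module is named PyDeepCL, as in the command above.

```python
import importlib.util


def module_available(name):
    """Return True if a top-level module called `name` can be located."""
    return importlib.util.find_spec(name) is not None


if module_available("PyDeepCL"):
    print("PyDeepCL found")
else:
    print("PyDeepCL not found: activate the native install, "
          "then run pip install --pre DeepCL")
```

A positive result only shows that the module is installed; a missing or non-activated native library would still surface when the module is actually imported.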
To build from source
Building from source is only needed if installing from binaries doesn't work for your configuration, or if you want to modify DeepCL.
What if it doesn't run?
•	Check if you have an OpenCL-enabled device on your system
o	ideally a GPU, or accelerator, since there is no attempt to optimize DeepCL for CPUs (at least, not currently, could change, feel free to submit a pull request :-) )
•	Try running gpuinfo (from EasyCL, but built as part of this project too, for ease of use )
o	it should output at least one OpenCL-enabled device
o	if it doesn't, then you need to make sure you have an OpenCL-enabled device, and that appropriate drivers are installed, and that the ICD is configured appropriately (registry in Windows, and /etc/OpenCL/vendors in linux)
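On Linux, the ICD check above amounts to looking for vendor .icd files. The following is a minimal sketch, assuming the /etc/OpenCL/vendors location mentioned above; the function name list_icd_files is illustrative, and an empty result only suggests that no vendor driver is registered there.

```python
from pathlib import Path


def list_icd_files(vendors_dir="/etc/OpenCL/vendors"):
    """Return {icd file name: library it names} for a vendors directory."""
    directory = Path(vendors_dir)
    if not directory.is_dir():
        return {}
    result = {}
    for icd in sorted(directory.glob("*.icd")):
        # Each .icd file is a small text file naming the vendor's OpenCL library.
        result[icd.name] = icd.read_text(errors="replace").strip()
    return result


for name, library in list_icd_files().items():
    print(name, "->", library)
```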
What if I need a new feature?
Please raise an issue, let me know you're interested.
•	If it's on my list of things I was going to do sooner or later anyway (see below), I might do it sooner rather than later.
•	If it's to do with usability, I will try to make that a priority
What if I want to contribute myself?
•	please feel free to fork this repository, tweak things, send a pull request. Or get in contact. Or both :-)
Third-party libraries
•	EasyCL
•	clew
•	libpng++
•	lua
•	cogapp
Hardware/driver specific issues
•	If you're using Clover, you might want to look at:
o	this thread #35
o	this branch
o	Note that Clover is NOT supported, these are just provided as "starting-points", in case someone wants to dabble in this :)
Mozilla Public License 2.0
Recent changes
•	2017 May 2nd:
o	branch update-easycl-mac updated to latest EasyCL, and unit-tests tested on Mac Sierra against:
▪	Intel HD Graphics 530 GPU
▪	Radeon Pro 450 GPU
o	This latest EasyCL lets you use environment variable CL_GPUOFFSET to select gpus, eg set to 1 for second GPU, or 2 for third
o	Thank you to my employer ASAPP for providing me use of said Mac Sierra :-)
•	7th August 2016:
o	"standard" version of windows compiler changed from msvc2010 to msvc2015 update 3 (no change to linux/mac)
o	"standard" version of python 3.x on windows changed from 3.4 to 3.5 (no change to linux/mac)
o	(note: python2.7 continues to work as before on all of Windows 32/64, linux, Mac)
o	standard c++ version on linux/mac changed from c++0x to c++11
•	29th July 2016:
o	python fixes:
▪	CHANGE: must use numpy tensors now, array.array no longer accepted
▪	New feature: can provide numpy tensors as 4d tensors now, no longer have to be 1d tensors
▪	Bug fix: q-learning working again now (hopefully)
•	26th July 2016:
o	fixed some bugs in manifest loader
o	no longer need to specify the number of images in the first line of the manifest file
o	added gpuindex= option to deepcl_unittests (quite beta for now...)
•	4th January 2016:
o	fixed a number of build warnings on Mac, both in OpenCL build, and C++ build
•	3rd January 2016:
o	created Mac OS X build on Travis, and fixed the build
•	27th November:
o	added ELU
•	Week of 26th October:
o	created branch clblas-2.8.0, which works with Visual Studio 2015. It uses the latest 2.8.x release of clBLAS. Thank you to jakakonda for helping to test this and get it working.
•	Aug 28th:
o	merged 8.x branch to master, will release first version of 8.x shortly
o	installation of 8.x from binaries on Windows works now, by doing, eg on 32-bit Windows 7, and assuming you already activated an appropriate python environment (assumes 7-zip is installed, in default location, otherwise do the unzip by hand):
powershell Set-ExecutionPolicy unrestricted
rem following command is like `wget` in linux:
powershell.exe -Command (new-object System.Net.WebClient).DownloadFile('', '')
rem following command is like `tar -xf` in linux:
"c:\program files\7-Zip\7z.exe" x
call dist\bin\activate.bat
pip install --pre DeepCL
python -c "import PyDeepCL; cl = PyDeepCL.DeepCL()"
# (last line is just to check works ok)
•	Aug 26th: installation of 8.x from binaries on linux works now, by doing, eg on 64-bit Ubuntu 14.04:
mkdir 8.0.0rc4
cd 8.0.0rc4
tar -xf deepcl-linux64-v8.0.0rc4.tar.bz2
virtualenv env
source env/bin/activate
source dist/bin/
pip install --pre DeepCL
python -c "import PyDeepCL; cl = PyDeepCL.DeepCL()"
(last line is just to check works ok)
•	Aug 21st-24th:
o	8.x finally builds again on all CI tested configurations!
▪	ubuntu 14.04 32-bit Python 2.7
▪	ubuntu 14.04 32-bit Python 3.4
▪	ubuntu 14.04 64-bit Python 2.7
▪	ubuntu 14.04 64-bit Python 3.4
▪	visual studio 2010 32-bit python 2.7
▪	visual studio 2010 32-bit python 3.4
▪	visual studio 2010 64-bit python 2.7
▪	visual studio 2010 64-bit python 3.4
•	Aug 19th-20th:
o	Python wrappers now built using a very thin layer, on top of the standard native DeepCL build
•	Aug 18th:
o	added BackwardIm2Col layer, which uses im2col for backward propagation
o	added BackpropWeightsIm2Col layer, which uses im2col for weight update
o	added BackwardAuto layer, which automatically selects fastest Backward layer
o	added BackpropWeightsAuto layer, which automatically selects fastest weight update layer
o	under the covers:
▪	created ClBlasHelper, to handle Gemm and Gemv
▪	factorized im2col into Im2Col class
•	week up to Aug 17th:
o	added forward and backward im2col layer
o	forward im2col automatically used during forward propagation, where appropriate
o	backwards has yet to be integrated
o	under the covers:
▪	added clBLAS
▪	migrated the Python build process to use cmake, rather than (whether this turns out to be good or bad is a bit up in the air for now)
•	June 22nd:
o	removed lua wrappers
o	if you want to use lua with OpenCL, please consider using cltorch and clnn
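
Several entries above mention im2col. As a rough illustration of the general technique (not DeepCL's OpenCL implementation), the sketch below unrolls each receptive field of a single-channel image into a row, so that convolution becomes a matrix-vector product of the kind a Gemm or Gemv routine handles. It assumes stride 1, no padding, and the cross-correlation convention common in CNN code; all names are made up for this example.

```python
def im2col(image, k):
    """Unroll every k x k patch of a 2D list into one row (stride 1, no padding)."""
    rows, cols = len(image), len(image[0])
    unrolled = []
    for r in range(rows - k + 1):
        for c in range(cols - k + 1):
            patch = [image[r + dr][c + dc] for dr in range(k) for dc in range(k)]
            unrolled.append(patch)
    return unrolled


def conv2d_via_im2col(image, kernel):
    """Convolve by multiplying the unrolled patch matrix with the flat kernel."""
    k = len(kernel)
    flat_kernel = [value for row in kernel for value in row]
    patches = im2col(image, k)
    # Matrix-vector product: one dot product per output pixel.
    flat_out = [sum(a * b for a, b in zip(patch, flat_kernel)) for patch in patches]
    out_width = len(image[0]) - k + 1
    return [flat_out[i:i + out_width] for i in range(0, len(flat_out), out_width)]


image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
kernel = [[1, 0],
          [0, 1]]
print(conv2d_via_im2col(image, kernel))  # [[6, 8], [12, 14]]
```

The unrolled matrix repeats overlapping pixels, so this approach uses extra memory in order to turn the convolution into a single dense product.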

