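OpenCL Neural Network FPGA Accelerator and DeepCL
Overview
PipeCNN is an OpenCL-based FPGA accelerator for large-scale convolutional neural networks (CNNs). In the FPGA community there is a growing trend toward using high-level synthesis (HLS) tools to design and implement custom circuits on FPGAs. Compared with RTL-based design methodology, HLS tools offer a faster hardware development cycle by automatically synthesizing an algorithm written in a high-level language (e.g. C/C++) into RTL/hardware. OpenCL™ is an open, emerging cross-platform parallel programming language that can be used for both GPU and FPGA development. The main goal of this project is to provide a generic yet efficient OpenCL-based CNN accelerator design on FPGAs. PipeCNN uses pipelined CNN functional kernels to improve the throughput of inference computation. The design is scalable in both performance and hardware resources, so it can be deployed on a variety of FPGA platforms. PipeCNN supports both the Intel OpenCL SDK and the Xilinx Vitis based FPGA design flows.
How to Use
First, download the pre-trained CNN models, input test vectors and golden reference files from PipeCNN's own ModelZoo (instructions are in the "data" folder inside each project folder). Put the data in the correct folder. Then compile the project using the provided Makefile. After compilation finishes, simply type the following command to run PipeCNN: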
./run.exe conv.aocx
ModelZoo now provides pre-quantized models for the following networks:
• VGG-16
• ResNet-50
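For detailed instructions, please see the User Instructions.
Supported Tools
Currently, Intel's OpenCL SDK and the Xilinx Vitis toolkit are used to compile the OpenCL/HLS code and to implement the generated RTL on FPGAs.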
• Intel OpenCL SDK Pro v20.1
• Xilinx Vitis 2020.1
Tested Boards
The following boards have been tested:
• Terasic's DE5a-net-ddr4 (Arria-10 GX1150 FPGA)
• Intel's Arria-10 Dev Kit (Arria-10 GX1150 FPGA)
• Xilinx's U50 Acceleration Card (VU35P FPGA)
• Xilinx's ZCU102 Dev Board (ZU9EG FPGA)
• Xilinx's ZC706 Dev Board (Zynq-7045 FPGA)
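PipeCNN may also run on other FPGA boards, including Terasic's DE10-Standard/DE10-nano, Intel's PAC card and the Xilinx Ultra96-v2 board. However, because of limited time and resources, this has not been verified.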
Demos
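PipeCNN can now be used to run classification on the ImageNet dataset and to measure the top-1/top-5 accuracy of different CNN models.
To run this demo, first set USE_OPENCV = 1 in the Makefile. Second, download the ImageNet validation dataset, extract it and put all the pictures in the "/data" folder. Edit the variable "picture_file_path_head" in the host file so that it points to the correct image dataset path. Finally, recompile the host program and run PipeCNN.
(Figure: the demo running on a local computer fitted with the DE5-net board.)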
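For readers unfamiliar with the metric, here is a minimal sketch of how top-1/top-5 accuracy is usually computed from per-class scores. It is not taken from the PipeCNN host code; the function name `top_k_accuracy` and the sample scores are illustrative assumptions.

```python
from typing import Sequence


def top_k_accuracy(scores: Sequence[Sequence[float]],
                   labels: Sequence[int], k: int) -> float:
    """Fraction of samples whose true label is among the k highest scores."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    hits = 0
    for row, label in zip(scores, labels):
        # Indices of the k largest scores for this image.
        top_k = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        if label in top_k:
            hits += 1
    return hits / len(labels)


if __name__ == "__main__":
    # Made-up scores for three images over four classes (illustrative only).
    scores = [
        [0.1, 0.6, 0.2, 0.1],
        [0.5, 0.1, 0.3, 0.1],
        [0.2, 0.3, 0.4, 0.1],
    ]
    labels = [1, 2, 0]
    print("top-1:", top_k_accuracy(scores, labels, 1))
    print("top-2:", top_k_accuracy(scores, labels, 2))
```

With k=1 only the single best guess counts; with a larger k, a prediction is counted as correct whenever the true class appears anywhere among the k best guesses.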
Performances
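This project has been released for four years. Deep learning accelerators (DLAs) have kept evolving, and many new techniques have been invented to improve their efficiency. PipeCNN's performance is no longer comparable to state-of-the-art designs. The current goal of this project is therefore to provide a complete design that can be used to learn about DLAs and to try out new ideas.
The table below lists performance and cost information for some boards used as reference. For each FPGA device, a design space exploration (over the hardware parameters VEC_SIZE, LANE_NUM and CONV_GP_SIZE_X) is needed to find the design that maximizes throughput or minimizes execution time. The recommended hardware parameters for the boards above are summarized here. Because the design is continually optimized and the code updated, the performance data in the table may be out of date; please use the latest version to obtain exact figures. Up-to-date performance and cost information for other FPGA platforms/boards from other vendors or researchers is welcome.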

*Note: ResNet-50 is used as the benchmark. Image size is 227x227x3.
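The design space exploration mentioned above can be pictured as a plain grid search over the three parameters. The sketch below is only an illustration: the parameter names come from the text, but the candidate values, the `explore` helper and especially `toy_model` (which stands in for a real compile-and-measure step) are hypothetical and say nothing about actual PipeCNN resource usage or timing.

```python
from itertools import product
from typing import Callable, Dict, Iterable, Optional, Tuple

Params = Dict[str, int]


def explore(vec_sizes: Iterable[int],
            lane_nums: Iterable[int],
            conv_gp_sizes: Iterable[int],
            evaluate: Callable[[Params], Optional[float]]
            ) -> Tuple[Optional[Params], float]:
    """Try every combination and return the fastest feasible one.

    `evaluate` returns an execution time (lower is better), or None when
    the configuration does not fit on the device.
    """
    best, best_time = None, float("inf")
    for vec, lane, gp in product(vec_sizes, lane_nums, conv_gp_sizes):
        params = {"VEC_SIZE": vec, "LANE_NUM": lane, "CONV_GP_SIZE_X": gp}
        time = evaluate(params)
        if time is not None and time < best_time:
            best, best_time = params, time
    return best, best_time


def toy_model(params: Params, dsp_budget: int = 256) -> Optional[float]:
    """HYPOTHETICAL cost model, not real PipeCNN data."""
    macs_per_cycle = params["VEC_SIZE"] * params["LANE_NUM"]
    if macs_per_cycle > dsp_budget:   # pretend one DSP block per MAC
        return None                   # design does not fit
    return 1.0 / macs_per_cycle       # pretend time scales inversely


if __name__ == "__main__":
    best, t = explore([4, 8, 16], [8, 16, 32], [7], toy_model)
    print(best, t)
```

In practice `evaluate` would wrap a full FPGA compile and an on-board run, which is slow, so the candidate lists are normally kept short.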
Citation
Please cite the PipeCNN work if it helps your research:
Dong Wang, Ke Xu and Diankun Jiang, “PipeCNN: An OpenCL-Based Open-Source FPGA Accelerator for Convolution Neural Networks”, FPT 2017.
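Architecture-level and algorithm-level optimizations can further improve PipeCNN's performance. Some recent research results based on PipeCNN are listed here for reference:
Improving throughput by introducing a new OpenCL-friendly sparse convolution algorithm: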
Dong Wang, Ke Xu, Qun Jia and Soheil Ghiasi, “ABM-SpConv: A Novel Approach to FPGA-Based Acceleration of Convolutional Neural Network Inference”, DAC 2019.
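Research Opportunities
The research lab is also looking for outstanding students who are interested in designing hardware accelerators for deep learning algorithms on FPGAs.
Related Works
Other FPGA accelerators also adopt an HLS-based design scheme. Some notable works are listed below. Note that PipeCNN is the first, and the only, open-source one among them.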
• U. Aydonat, S. O'Connell, D. Capalija, A. C. Ling, and G. R. Chiu. "An OpenCL™ Deep Learning Accelerator on Arria 10," in Proc. FPGA 2017.
• N. Suda, V. Chandra, G. Dasika, A. Mohanty, Y. F. Ma, S. Vrudhula, J. S. Seo, and Y. Cao, "Throughput-Optimized OpenCL-based FPGA accelerator for large-scale convolutional neural networks," in Proc. FPGA 2016.
• C. Zhang, P. Li, G. Sun, Y. Guan, B. J. Xiao, and J. Cong, "Optimizing FPGA-based accelerator design for deep convolutional neural networks," in Proc. FPGA 2015.

DeepCL
OpenCL library to train deep convolutional networks
•	C++
•	OpenCL
•	Deep convolutional
•	Python wrappers
•	Lua wrappers
•	Q-learning
APIs:
•	Python
•	c++
•	command-line
Layer types:
•	convolutional
•	max-pooling
•	normalization
•	activation
•	dropout
•	random translations
•	random patches
•	loss
Loss layer types:
•	softmax
•	cross-entropy (synonymous with multinomial logistic, etc)
•	square loss
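As a reminder of what these loss types compute, here is a textbook sketch in plain Python. It is not DeepCL's OpenCL implementation; in particular the 1/2 factor in `square_loss` is a common convention assumed here, not something the text confirms.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Convert raw scores to probabilities; subtract the max for stability."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs: Sequence[float], label: int) -> float:
    """Negative log-probability assigned to the correct class."""
    return -math.log(probs[label])


def square_loss(outputs: Sequence[float], targets: Sequence[float]) -> float:
    """Half the sum of squared differences (the 1/2 is an assumed convention)."""
    return 0.5 * sum((o - t) ** 2 for o, t in zip(outputs, targets))


if __name__ == "__main__":
    probs = softmax([2.0, 1.0, 0.1])
    print(probs, cross_entropy(probs, 0))
```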
Trainers:
•	SGD
•	Anneal
•	Nesterov
•	Adagrad
•	Rmsprop
•	Adadelta
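To make the difference between some of these trainers concrete, the following sketch applies the textbook update rules for momentum SGD, Nesterov momentum and Adagrad to a one-dimensional problem. The hyperparameter defaults are arbitrary choices for the demo, and the code is not derived from DeepCL's trainer classes.

```python
import math
from typing import Callable

Grad = Callable[[float], float]


def sgd_momentum(grad: Grad, w: float, lr: float = 0.1,
                 momentum: float = 0.9, steps: int = 200) -> float:
    """Classical momentum: v <- mu*v - lr*grad(w); w <- w + v."""
    v = 0.0
    for _ in range(steps):
        v = momentum * v - lr * grad(w)
        w += v
    return w


def nesterov(grad: Grad, w: float, lr: float = 0.1,
             momentum: float = 0.9, steps: int = 200) -> float:
    """Nesterov momentum: the gradient is taken at the look-ahead point."""
    v = 0.0
    for _ in range(steps):
        v = momentum * v - lr * grad(w + momentum * v)
        w += v
    return w


def adagrad(grad: Grad, w: float, lr: float = 0.5,
            eps: float = 1e-8, steps: int = 200) -> float:
    """Adagrad: the step shrinks as squared gradients accumulate."""
    g2 = 0.0
    for _ in range(steps):
        g = grad(w)
        g2 += g * g
        w -= lr * g / (math.sqrt(g2) + eps)
    return w


if __name__ == "__main__":
    # Minimise f(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
    grad = lambda w: 2.0 * (w - 3.0)
    for trainer in (sgd_momentum, nesterov, adagrad):
        print(trainer.__name__, trainer(grad, 0.0))
```

All three should approach the minimum at w = 3 on this toy problem; they differ in how the step is scaled and where the gradient is evaluated.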
Activations:
•	tanh
•	scaled tanh (1.7159 * tanh(2/3 * x))
•	linear
•	sigmoid
•	relu
•	elu (new!)
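The activations in this list can be written in a few lines each. This is a plain-Python sketch rather than DeepCL's OpenCL kernels. The scaled tanh uses the constant 1.7159 from LeCun's "Efficient BackProp", which is chosen so that the function maps 1 to approximately 1; the ELU default `alpha=1.0` is an assumption.

```python
import math


def linear(x: float) -> float:
    return x


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def relu(x: float) -> float:
    return x if x > 0.0 else 0.0


def elu(x: float, alpha: float = 1.0) -> float:
    """Identity for positive inputs, alpha * (exp(x) - 1) otherwise."""
    return x if x > 0.0 else alpha * (math.exp(x) - 1.0)


def scaled_tanh(x: float) -> float:
    """1.7159 * tanh(2/3 * x); scaled so that scaled_tanh(1) is close to 1."""
    return 1.7159 * math.tanh(2.0 / 3.0 * x)


if __name__ == "__main__":
    for f in (math.tanh, scaled_tanh, linear, sigmoid, relu, elu):
        print(f.__name__, [round(f(x), 4) for x in (-1.0, 0.0, 1.0)])
```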
Loader formats:
•	jpegs
•	mnist
•	kgsv2
•	norb
Weight initializers:
•	original
•	uniform
•	more possible...


Multi-column networks, as in McDnn, are also possible.

Example usage

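Obtained 37.2% test accuracy on the next-move prediction task, using 33.6 million training examples from the kgsgo v2 dataset

Command line used: ./deepcl_train dataset=kgsgoall netdef=12*(32c5z-relu)-500n-tanh-361n numepochs=15 learningrate=0.0001

2 epochs, 2 days per epoch, on an Amazon GPU instance comprising half an NVidia GRID K520 GPU (roughly half as powerful as a GTX780)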

Obtained 99.5% test accuracy on MNIST, using netdef=rt2-8c5z-relu-mp2-16c5z-relu-mp3-150n-tanh-10n numepochs=20 multinet=6 learningrate=0.002
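The netdef strings quoted above pack a whole network into one argument: layers are separated by "-" and a group can be repeated with the N*(...) form. The sketch below only expands that syntax into a flat list of layer tokens so the structure is easier to read. The helper name `expand_netdef` is made up for this example, it is not DeepCL's parser, and the meaning of each token is left to the DeepCL documentation.

```python
import re
from typing import List

# Matches "N*(...)" where the parentheses contain no nested parentheses.
_REPEAT = re.compile(r"(\d+)\*\(([^()]*)\)")


def expand_netdef(netdef: str) -> List[str]:
    """Expand N*(...) groups and split the result into layer tokens."""
    def repeat(match: "re.Match[str]") -> str:
        count, body = int(match.group(1)), match.group(2)
        return "-".join([body] * count)

    # Re-apply until nothing changes, which also handles nested groups.
    previous = None
    while previous != netdef:
        previous, netdef = netdef, _REPEAT.sub(repeat, netdef)
    return netdef.split("-")


if __name__ == "__main__":
    layers = expand_netdef("12*(32c5z-relu)-500n-tanh-361n")
    print(len(layers), "tokens; last three:", layers[-3:])
    print(expand_netdef("rt2-8c5z-relu-mp2-16c5z-relu-mp3-150n-tanh-10n"))
```

Expanding the repetition makes it clear that the first network stacks twelve identical pairs of tokens before its final three tokens.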

Epoch time is 99.8 seconds, using an Amazon GPU instance, i.e. half an NVidia GRID K520 GPU (since 6 nets are being trained in parallel, that is 16.6 seconds per epoch per net)
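The per-net figure is simply the wall-clock epoch time divided by the number of nets trained in parallel. A tiny sketch of that arithmetic, with a hypothetical helper name:

```python
def per_net_epoch_seconds(wall_clock_seconds: float, parallel_nets: int) -> float:
    """Wall-clock epoch time divided evenly across nets trained in parallel."""
    if parallel_nets <= 0:
        raise ValueError("parallel_nets must be positive")
    return wall_clock_seconds / parallel_nets


print(round(per_net_epoch_seconds(99.8, 6), 1))  # prints 16.6
```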

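Installation

Native library installation

This section installs the native libraries and the command-line tools. You always need to complete this part, even if you are going to use the Python wrappers.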
Windows

Pre-requisites:

•	OpenCL-enabled GPU or APU, along with appropriate OpenCL driver installed
•	Tested using Windows 2012 RC2, and (New!) Visual Studio 2015, this is how the CI builds run
Procedure:
•	Download latest binary zip file from http://deepcl.hughperkins.com/Downloads/ (eg from v8.0.0rc8)
•	unzip it, which creates the dist folder
•	To test it:
o	open a cmd
o	run call dist\bin\activate.bat (adjusting the path appropriately for wherever you downloaded deepcl binaries to)
o	now, eg try deepcl_unittests
o	(New!), you can choose which gpu to run tests on now, eg: deepcl_unittests gpuindex=1
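Note that you need to "activate" the installation each time you open a new cmd prompt (or you can add the appropriate environment variables permanently, using Control Panel | System | Advanced System Settings | Environment Variables)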
Linux
Pre-requisites:
•	OpenCL-enabled GPU or APU, along with appropriate OpenCL driver installed (can check by running clinfo, which should show your desired GPU device)
•	Tested using Ubuntu 14.04 32-bit/64-bit
Procedure:
•	Download latest tar file from http://deepcl.hughperkins.com/Downloads/ (eg from v8.0.0rc8)
•	untar it, which creates the dist sub-folder
•	in a bash prompt, run source dist/bin/activate.sh (adjust the path appropriately for wherever you untarred the binaries tar file to)
•	test by doing, from the same bash prompt, eg deepcl_unittests
o	(New!), you can choose which gpu to run tests on now, eg: deepcl_unittests gpuindex=1
Note that you need to "activate" the installation each time you open a new bash prompt (or you can call activate.sh from your .bashrc file)
Python wrappers
•	make sure you already installed the native library, and "activate"d it, by doing call dist\bin\activate.bat, or source dist/bin/activate.sh
•	run pip install --pre DeepCL
•	test by doing python -c "import PyDeepCL; cl = PyDeepCL.DeepCL()"
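The one-line smoke test above can be turned into a small script that gives a clearer message when the wrapper is missing. This is only a sketch: `PyDeepCL` and `PyDeepCL.DeepCL()` are the names used in the text, while the helper `pydeepcl_available` is made up for this example.

```python
import importlib.util


def pydeepcl_available() -> bool:
    """True if a module named PyDeepCL can be found on the import path."""
    return importlib.util.find_spec("PyDeepCL") is not None


if __name__ == "__main__":
    if pydeepcl_available():
        import PyDeepCL
        cl = PyDeepCL.DeepCL()  # same smoke test as the one-liner
        print("PyDeepCL imported and DeepCL() constructed")
    else:
        print("PyDeepCL not found: activate the native install, "
              "then run pip install --pre DeepCL")
```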
To build from source
Building from source is only needed if installing from binaries doesn't work for your configuration, or if you want to modify DeepCL.
See Build.md
What if it doesn't run?
•	Check if you have an OpenCL-enabled device on your system
o	ideally a GPU, or accelerator, since there is no attempt to optimize DeepCL for CPUs (at least, not currently, could change, feel free to submit a pull request :-) )
•	Try running gpuinfo (from EasyCL, but built as part of this project too, for ease of use )
o	it should output at least one OpenCL-enabled device
o	if it doesn't, then you need to make sure you have an OpenCL-enabled device, and that appropriate drivers are installed, and that the ICD is configured appropriately (registry in Windows, and /etc/OpenCL/vendors in linux)
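On Linux, a quick way to see which vendor drivers the ICD loader can find is to list the files in /etc/OpenCL/vendors. Each .icd file is a small text file naming the vendor's OpenCL library. The helper below is a hypothetical convenience script, not part of DeepCL or EasyCL.

```python
from pathlib import Path
from typing import Dict


def list_icd_files(vendors_dir: str = "/etc/OpenCL/vendors") -> Dict[str, str]:
    """Map each *.icd file name to the library it names (its first line)."""
    directory = Path(vendors_dir)
    if not directory.is_dir():
        return {}
    result = {}
    for icd in sorted(directory.glob("*.icd")):
        lines = icd.read_text(errors="replace").splitlines()
        result[icd.name] = lines[0].strip() if lines else ""
    return result


if __name__ == "__main__":
    found = list_icd_files()
    if not found:
        print("No ICD files found; check that an OpenCL driver is installed")
    for name, library in found.items():
        print(name, "->", library)
```

An empty result suggests that no vendor driver has registered itself, which matches the situation where gpuinfo reports no devices.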
What if I need a new feature?
Please raise an issue, let me know you're interested.
•	If it's on my list of things I was going to do sooner or later anyway (see below), I might do it sooner rather than later.
•	If it's to do with usability, I will try to make that a priority
What if I want to contribute myself?
•	please feel free to fork this repository, tweak things, send a pull request. Or get in contact. Or both :-)
Third-party libraries
•	EasyCL
•	clew
•	libpng++
•	lua
•	cogapp
Hardware/driver specific issues
•	If you're using Clover, you might want to look at:
o	this thread #35
o	this branch https://github.com/hughperkins/DeepCL/tree/clover-compatibility
o	Note that Clover is NOT supported, these are just provided as "starting-points", in case someone wants to dabble in this :)
Related projects
•	kgsgo-dataset-preprocessor Dataset based on kgsgo games; 33 million data points
•	cltorch
•	clnn
License
Mozilla Public License 2.0
Recent changes
•	2017 May 2nd:
o	branch update-easycl-mac updated to latest EasyCL, and unit-tests tested on Mac Sierra against:
	Intel HD Graphics 530 GPU
	Radeon Pro 450 GPU
o	This latest EasyCL lets you use environment variable CL_GPUOFFSET to select gpus, eg set to 1 for second GPU, or 2 for third
o	Thank you to my employer ASAPP for providing me use of said Mac Sierra :-)
•	7th August 2016:
o	"standard" version of windows compiler changed from msvc2010 to msvc2015 update 3 (no change to linux/mac)
o	"standard" version of python 3.x on windows changed from 3.4 to 3.5 (no change to linux/mac)
o	(note: python2.7 continues to work as before on all of Windows 32/64, linux, Mac)
o	standard c++ version on linux/mac changed from c++0x to c++11
•	29th July 2016:
o	python fixes:
	CHANGE: must use numpy tensors now, array.array no longer accepted
	New feature: can provide numpy tensors as 4d tensors now, no longer have to be 1d tensors
	Bug fix: q-learning working again now (hopefully)
•	26th July 2016:
o	fixed some bugs in manifest loader
o	no longer need to specify the number of images in the first line of the manifest file
o	added gpuindex= option to deepcl_unittests (quite beta for now...)
•	4th January 2016:
o	fixed a number of build warnings on Mac, both in OpenCL build, and C++ build
•	3rd January 2016:
o	create Mac OS X build on Travis, and fix the build, https://travis-ci.org/hughperkins/DeepCL
•	27th November:
o	added ELU
•	Week of 26th October:
o	created branch clblas-2.8.0, which works with Visual Studio 2015. It uses the latest 2.8.x release of clBLAS. Thank you to jakakonda for helping to test this and get it working.
•	Aug 28th:
o	merged 8.x branch to master, will release first version of 8.x shortly
o	installation of 8.x from binaries on Windows works now, by doing, eg on 32-bit Windows 7, and assuming you already activated an appropriate python environment (assumes 7-zip is installed, in default location, otherwise do the unzip by hand):
powershell Set-ExecutionPolicy unrestricted
rem following command is like `wget` in linux:
powershell.exe -Command (new-object System.Net.WebClient).DownloadFile('http://deepcl.hughperkins.com/Downloads/deepcl-win32-v8.0.0rc8.zip', 'deepcl-win32-v8.0.0rc8.zip')
rem following command is like `tar -xf` in linux:
"c:\program files\7-Zip\7z.exe" x deepcl-win32-v8.0.0rc8.zip
call dist\bin\activate.bat
pip install --pre DeepCL
python -c "import PyDeepCL; cl = PyDeepCL.DeepCL()"
# (last line is just to check works ok)
•	Aug 26th: installation of 8.x from binaries on linux works now, by doing, eg on 64-bit Ubuntu 14.04:
mkdir 8.0.0rc4
cd 8.0.0rc4
wget http://deepcl.hughperkins.com/Downloads/deepcl-linux64-v8.0.0rc4.tar.bz2
tar -xf deepcl-linux64-v8.0.0rc4.tar.bz2
virtualenv env
source env/bin/activate
source dist/bin/activate.sh
pip install --pre DeepCL
python -c "import PyDeepCL; cl = PyDeepCL.DeepCL()"
(last line is just to check works ok)
•	Aug 21st-24th:
o	8.x finally builds again on all CI tested configurations!
	ubuntu 14.04 32-bit Python 2.7
	ubuntu 14.04 32-bit Python 3.4
	ubuntu 14.04 64-bit Python 2.7
	ubuntu 14.04 64-bit Python 3.4
	visual studio 2010 32-bit python 2.7
	visual studio 2010 32-bit python 3.4
	visual studio 2010 64-bit python 2.7
	visual studio 2010 64-bit python 3.4
•	Aug 19th-20th:
o	Python wrappers now built using a very thin setup.py layer, on top of the standard native DeepCL build
•	Aug 18th:
o	added BackwardIm2Col layer, which uses im2col for backward propagation
o	added BackpropWeightsIm2Col layer, which uses im2col for weight update
o	added BackwardAuto layer, which automatically selects fastest Backward layer
o	added BackpropWeightsAuto layer, which automatically selects faster weight update layer
o	under the covers:
	created ClBlasHelper, to handle Gemm and Gemv
	factorized im2col into Im2Col class
•	week up to Aug 17th:
o	added forward and backward im2col layer
o	forward im2col automatically used during forward propagation, where appropriate
o	backwards has yet to be integrated
o	under the covers:
	added clBLAS
	migrated the Python build process to use cmake, rather than setup.py (whether this turns out to be good or bad is a bit up in the air for now)
•	June 22nd:
o	removed lua wrappers
o	if you want to use lua with OpenCL, please consider using cltorch and clnn

Reference links
https://github.com/hughperkins/DeepCL
https://github.com/doonny/PipeCNN
