1. Environment Setup

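My work required Sophgo's BM1684 chip, so this post records a hands-on walkthrough based on the official manual.

Official reference: BMNNSDK2 Getting Started Manual

Link: https://sophgo-doc.gitbook.io/bmnnsdk2-bm1684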

1.1 Server Environment

Reproducing the SDK examples requires a development environment. Here I use a shared company server accessed over SSH. The steps are:

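  1. Apply for an account on the designated server.
  2. Log in to the server with an SSH client. Since I could not install software on my PC, I used the cmd terminal built into Windows 10 (intranet access is required); common tools such as MobaXterm or SecureCRT also work.
  3. Download the required artifacts, namely the latest SDK package and the Docker image, for example with wget: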
# Docker image
wget https://sophon-file.sophon.cn/sophon-prod-s3/drive/22/03/19/13/bmnnsdk2-bm1684-ubuntu-docker-py37.zip
#SDK
wget https://sophon-file.sophon.cn/sophon-prod-s3/drive/22/05/31/11/bmnnsdk2_bm1684_v2.7.0_20220531patched.zip

After the download, both zip archives are in the current directory.

1.2 SDK Environment

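With the artifacts downloaded, first unzip the SDK bundle. After unpacking, compare the MD5 checksums against the shipped manifest to make sure the files were not corrupted or tampered with, which avoids unnecessary trouble later. The commands are: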

(base) xxx@bitmain-SYS-4028GR-TR2:~$unzip bmnnsdk2_bm1684_v2.7.0_20220531patched.zip
Archive:  bmnnsdk2_bm1684_v2.7.0_20220531patched.zip
   creating: bmnnsdk2_bm1684_v2.7.0_20220531patched/
  inflating: bmnnsdk2_bm1684_v2.7.0_20220531patched/bmnnsdk2.MD5
  inflating: __MACOSX/bmnnsdk2_bm1684_v2.7.0_20220531patched/._bmnnsdk2.MD5
  inflating: bmnnsdk2_bm1684_v2.7.0_20220531patched/release_version.txt
  inflating: __MACOSX/bmnnsdk2_bm1684_v2.7.0_20220531patched/._release_version.txt
  inflating: bmnnsdk2_bm1684_v2.7.0_20220531patched/bmnnsdk2-bm1684_v2.7.0.tar.gz
  inflating: __MACOSX/bmnnsdk2_bm1684_v2.7.0_20220531patched/._bmnnsdk2-bm1684_v2.7.0.tar.gz
(base) xxx@bitmain-SYS-4028GR-TR2:~/bmnnsdk2_bm1684_v2.7.0_20220531patched$cat bmnnsdk2.MD5
6ae7d9b5a8564eb66f4f820319c2d39f  ./bmnnsdk2-bm1684_v2.7.0.tar.gz
bf2c860701575909e43b964011694c8f  ./release_version.txt
(base) xxx@bitmain-SYS-4028GR-TR2:~/bmnnsdk2_bm1684_v2.7.0_20220531patched$md5sum ./*
6ae7d9b5a8564eb66f4f820319c2d39f  ./bmnnsdk2-bm1684_v2.7.0.tar.gz
7719bf8cd5d5de8388ebcddda6f2c4be  ./bmnnsdk2.MD5
bf2c860701575909e43b964011694c8f  ./release_version.txt
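
The manual comparison above can be automated. The following is a minimal Python sketch, assuming the manifest uses the md5sum text format seen in bmnnsdk2.MD5 (hash, whitespace, relative path). The names file_md5 and verify_manifest are illustrative and not part of the SDK.

```python
import hashlib
from pathlib import Path


def file_md5(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_manifest(manifest_path):
    """Check every 'hash  path' line of an md5sum-style manifest.

    Paths are resolved relative to the manifest's own directory.
    Returns a dict mapping each listed path to True (match) or False.
    """
    manifest_path = Path(manifest_path)
    results = {}
    for line in manifest_path.read_text().splitlines():
        if not line.strip():
            continue  # skip blank lines
        expected, name = line.split(maxsplit=1)
        name = name.lstrip("*").strip()  # md5sum marks binary mode with '*'
        target = manifest_path.parent / name
        results[name] = target.is_file() and file_md5(target) == expected.lower()
    return results
```

Calling verify_manifest("bmnnsdk2.MD5") inside the extracted directory would return one entry per listed file. From the shell, md5sum -c bmnnsdk2.MD5 performs the same check.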

Next, extract the actual SDK tarball:

(base) xxx@bitmain-SYS-4028GR-TR2:~/bmnnsdk2_bm1684_v2.7.0_20220531patched$tar -zxvf bmnnsdk2-bm1684_v2.7.0.tar.gz
bmnnsdk2-bm1684_v2.7.0/
bmnnsdk2-bm1684_v2.7.0/release_version.txt
......

The SDK package is now ready.

1.3 Docker Environment

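At this point we are logged in to the server and have downloaded the required artifacts. To reproduce the SDK examples conveniently, I use the official Docker image directly rather than building my own.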

The image used is ubuntu-docker-py37. First unzip the Docker archive, then verify the MD5 checksums as before to rule out corrupted or tampered files:

(base) xxx@bitmain-SYS-4028GR-TR2:~$unzip bmnnsdk2-bm1684-ubuntu-docker-py37.zip
Archive:  bmnnsdk2-bm1684-ubuntu-docker-py37.zip
   creating: bmnnsdk2-bm1684-ubuntu-docker-py37/
 extracting: bmnnsdk2-bm1684-ubuntu-docker-py37/bmnnsdk2-bm1684-ubuntu.docker

 extracting: bmnnsdk2-bm1684-ubuntu-docker-py37/bmnnsdk2.MD5
 extracting: bmnnsdk2-bm1684-ubuntu-docker-py37/Dockerfile.bm1684
 extracting: bmnnsdk2-bm1684-ubuntu-docker-py37/release_version.txt
(base) xxx@bitmain-SYS-4028GR-TR2:~/bmnnsdk2-bm1684-ubuntu-docker-py37$cat bmnnsdk2.MD5
cf91eb0ff60f28e368bba1c357d2e7e5  ./Dockerfile.bm1684
c181ce60245b4fe07596d8a360944903  ./release_version.txt
105a4d5d13a41d97353fd2dab88b4802  ./bmnnsdk2-bm1684-ubuntu.docker
(base) xxx@bitmain-SYS-4028GR-TR2:~/bmnnsdk2-bm1684-ubuntu-docker-py37$md5sum ./*
105a4d5d13a41d97353fd2dab88b4802  ./bmnnsdk2-bm1684-ubuntu.docker
7b1fdecee114e6d2d82c21286e9b1a39  ./bmnnsdk2.MD5
cf91eb0ff60f28e368bba1c357d2e7e5  ./Dockerfile.bm1684
c181ce60245b4fe07596d8a360944903  ./release_version.txt

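According to the official instructions, the SDK package ships a script for starting Docker, docker_run_bmnnsdk.sh. On a shared server, however, this script has very likely been run many times already, and containers created from it already exist. To make my container easy to identify, I edited the script to give the container its own name. The modified part is: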

if [ -c "/dev/bm-sophon0" ]; then
  for dev in $(ls /dev/bm-sophon*);
  do
    mount_options+="--device="$dev:$dev" "
  done
  CMD="docker run \
      --name ubuntu16.0-py37-wnb \
      --network=host \
      --workdir=/workspace \
      --privileged=true \
      ${mount_options} \
      --device=/dev/bmdev-ctl:/dev/bmdev-ctl \
      -v /dev/shm --tmpfs /dev/shm:exec \
      -v $WORKSPACE:/workspace \
      -v /dev:/dev \
      -v /etc/localtime:/etc/localtime \
      -e LOCAL_USER_ID=`id -u` \
      -it $REPO/$IMAGE:$TAG \
      bash
  "
else
  CMD="docker run \
      --name ubuntu16.0-py37-wnb \
      --network=host \
      --workdir=/workspace \
      --privileged=true \
      -v $WORKSPACE:/workspace \
      -v /dev/shm --tmpfs /dev/shm:exec \
      -v /etc/localtime:/etc/localtime \
      -e LOCAL_USER_ID=`id -u` \
      -it $REPO/$IMAGE:$TAG \
      bash
  "
fi
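
The logic of the modified script is: if TPU device nodes are present, pass each one to the container; in both branches, give the container a recognizable name. Below is a Python sketch of the same argument assembly, for illustration only. The function build_docker_cmd and its parameters are hypothetical; the flags are copied from the script above. One simplification: the script tests for /dev/bm-sophon0 specifically, while this sketch treats any matching device as "card present".

```python
import glob
import os


def build_docker_cmd(name, workspace, image, devices=None):
    """Assemble the `docker run` argument list used by the script above.

    devices: list of /dev/bm-sophon* paths; when None, they are discovered
    with glob, mirroring the `ls /dev/bm-sophon*` loop in the shell script.
    """
    if devices is None:
        devices = sorted(glob.glob("/dev/bm-sophon*"))
    cmd = ["docker", "run", "--name", name, "--network=host",
           "--workdir=/workspace", "--privileged=true"]
    if devices:
        # PCIe card present: expose each TPU device plus the control node.
        cmd += [f"--device={d}:{d}" for d in devices]
        cmd += ["--device=/dev/bmdev-ctl:/dev/bmdev-ctl", "-v", "/dev:/dev"]
    cmd += ["-v", "/dev/shm", "--tmpfs", "/dev/shm:exec",
            "-v", f"{workspace}:/workspace",
            "-v", "/etc/localtime:/etc/localtime",
            "-e", f"LOCAL_USER_ID={os.getuid()}",
            "-it", image, "bash"]
    return cmd
```

The returned list can be handed to subprocess.run. Before picking a name on a shared machine, docker ps -a shows which container names are already taken.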

Now create the container with the official script. Once the container is created, the script drops you into its shell by default:

(base) xxx@bitmain-SYS-4028GR-TR2:~/bmnnsdk2_bm1684_v2.7.0_20220531patched/bmnnsdk2-bm1684_v2.7.0$./docker_run_bmnnsdk.sh
/mnt/sdb2/xxx/bmnnsdk2_bm1684_v2.7.0_20220531patched/bmnnsdk2-bm1684_v2.7.0
/mnt/sdb2/xxx/bmnnsdk2_bm1684_v2.7.0_20220531patched/bmnnsdk2-bm1684_v2.7.0
bmnnsdk2-bm1684/dev:ubuntu16.04
docker run --name ubuntu16.0-py37-wnb --network=host --workdir=/workspace --privileged=true --device=/dev/bm-sophon0:/dev/bm-sophon0 --device=/dev/bm-sophon1:/dev/bm-sophon1 --device=/dev/bm-sophon2:/dev/bm-sophon2 --device=/dev/bm-sophon3:/dev/bm-sophon3 --device=/dev/bm-sophon4:/dev/bm-sophon4 --device=/dev/bm-sophon5:/dev/bm-sophon5 --device=/dev/bm-sophon6:/dev/bm-sophon6 --device=/dev/bm-sophon7:/dev/bm-sophon7 --device=/dev/bm-sophon8:/dev/bm-sophon8 --device=/dev/bmdev-ctl:/dev/bmdev-ctl -v /dev/shm --tmpfs /dev/shm:exec -v /mnt/sdb2/xxx/bmnnsdk2_bm1684_v2.7.0_20220531patched/bmnnsdk2-bm1684_v2.7.0:/workspace -v /dev:/dev -v /etc/localtime:/etc/localtime -e LOCAL_USER_ID=1032 -it bmnnsdk2-bm1684/dev:ubuntu16.04 bash
root@bitmain-SYS-4028GR-TR2:/workspace#

Note:

A container started this way stops when you exit its shell. To reuse it, restart it and attach a new shell with the following commands:

(base) xxx@bitmain-SYS-4028GR-TR2:~/bmnnsdk2_bm1684_v2.7.0_20220531patched/bmnnsdk2-bm1684_v2.7.0$docker start ubuntu16.0-py37-wnb
ubuntu16.0-py37-wnb
(base) xxx@bitmain-SYS-4028GR-TR2:~/bmnnsdk2_bm1684_v2.7.0_20220531patched/bmnnsdk2-bm1684_v2.7.0$docker exec -it ubuntu16.0-py37-wnb bash
root@bitmain-SYS-4028GR-TR2:/workspace#

The basic environment is now set up.

2. Reproducing the Examples

With the environment above in place, we reproduce the examples shipped in the SDK. The directory layout is:

# layout of the examples directory
.
|-- Resnet_classify
|-- RetinaFace
|-- SSD_object
|-- YOLOX_object
|-- YOLOv3_object
|-- YOLOv5_object
|-- calibration
|-- centernet
|-- multimedia
|-- nntc
|-- okkernel
`-- sail

Before reproducing the examples, install the libraries the SDK requires inside the container and set the environment variables:

root@bitmain-SYS-4028GR-TR2:/workspace/scripts# ./install_lib.sh nntc
linux is Ubuntu16.04.5LTS\n\l
bmnetc and bmlang USING_CXX11_ABI=1
Install lib done !
root@bitmain-SYS-4028GR-TR2:/workspace/scripts# source envsetup_pcie.sh
/workspace/scripts /workspace/scripts
......
Successfully installed Flask-2.1.2 brotli-1.0.9 click-8.1.3 dash-2.5.1 dash-bootstrap-components-1.2.0 dash-core-components-2.0.0 dash-cytoscape-0.3.0 dash-draggable-0.1.2 dash-html-components-2.0.0 dash-split-pane-1.0.0 dash-table-5.0.0 flask-compress-1.12 ipykernel-5.3.4 itsdangerous-2.1.2 jsonschema-3.2.0 ufw-1.0.0 ufwio-0.9.0
root@bitmain-SYS-4028GR-TR2:/workspace/scripts# source envsetup_cmodel.sh
/workspace/scripts /workspace/scripts
......
Installing collected packages: ufw
Successfully installed ufw-1.0.0

2.1 SSD_object (Caffe)

2.1.1 Model Migration

First download the original Caffe model and create the symbolic links, using the script in the model directory:

root@bitmain-SYS-4028GR-TR2:/workspace/examples/SSD_object/model# ./download_ssd_model.sh
Downloading models_VGGNet_VOC0712_SSD_300x300.tar.gz...
......
All done!
  • Generating the fp32 bmodel: convert the original model into an fp32 bmodel suitable for the Sophgo TPU:
root@bitmain-SYS-4028GR-TR2:/workspace/examples/SSD_object/model# ./gen_bmodel.sh
/workspace/examples/SSD_object/model
......
Success: combined to [out/fp32_ssd300.bmodel].
# generated model files
./out/
|-- fp32_ssd300.bmodel
|-- ssd300
`-- ssd300_4batch
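  • Generating the int8 bmodel: convert the original model into an int8 bmodel. The script first converts the model to an fp32 umodel (the model format of UFramework), then uses that intermediate model to produce an int8 umodel, and finally generates the int8 bmodel: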
root@bitmain-SYS-4028GR-TR2:/workspace/examples/SSD_object/model# ./gen_umodel_int8bmodel.sh
/workspace/examples/SSD_object/model
/workspace/examples/SSD_object/model /workspace/examples/SSD_object/model
......
Success: combined to [out/int8_ssd300.bmodel].
combine bmodel ok
/workspace/examples/SSD_object/model

A new directory named out now exists under this directory, with the following structure:

.
|-- fp32_ssd300.bmodel
|-- int8_ssd300.bmodel
|-- ssd300
`-- ssd300_4batch

2.1.2 Accuracy Regression

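In the previous section we compiled the original Caffe model into fp32 and int8 bmodels. Here we run an accuracy regression with the bundled verification tool.

The regression uses the input and output reference data generated during model migration: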

root@bitmain-SYS-4028GR-TR2:/workspace/examples/SSD_object/model# bmrt_test --context_dir=./out/ssd300
[BMRT][deal_with_options:1412] INFO:Loop num: 1
......
[BMRT][bmrt_test:1043] INFO:+++ The network[VGG_VOC0712_SSD_300x300_deploy] stage[0] cmp success +++
[BMRT][bmrt_test:1063] INFO:load input time(s): 0.031876
[BMRT][bmrt_test:1064] INFO:calculate  time(s): 0.037262
[BMRT][bmrt_test:1065] INFO:get output time(s): 0.000046
[BMRT][bmrt_test:1066] INFO:compare    time(s): 0.006667
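
When several models are regressed in a row, it helps to check the log automatically instead of reading it by eye. The sketch below assumes the log lines have the shape shown in the excerpt above; the regular expressions and the function name parse_bmrt_log are my own, not part of bmrt_test.

```python
import re

# Matches: "The network[NAME] stage[N] cmp success"
CMP_RE = re.compile(
    r"The network\[(?P<net>[^\]]+)\] stage\[(?P<stage>\d+)\] cmp success")
# Matches: "INFO:load input time(s): 0.031876" and similar timing lines
TIME_RE = re.compile(
    r"INFO:(?P<label>[a-z ]+?)\s+time\(s\):\s*(?P<sec>[0-9.]+)")


def parse_bmrt_log(text):
    """Extract (network, stage) pairs that passed comparison, plus timings.

    Returns (passed, timings) where timings maps a label such as
    'calculate' to its duration in seconds.
    """
    passed = [(m["net"], int(m["stage"])) for m in CMP_RE.finditer(text)]
    timings = {m["label"].strip(): float(m["sec"])
               for m in TIME_RE.finditer(text)}
    return passed, timings
```

An empty passed list for a given run is then a simple signal that the comparison did not succeed or that the log format differs from the one assumed here.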

2.1.3 Algorithm Migration

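The main work here is implementing the model's pre- and post-processing with the software interfaces provided by the SDK. I compile and test the C++ code in the example, where this replacement has already been done. An excerpt of the source follows; it mainly shows format conversion and resizing replaced by their SDK implementations: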

// resize && split by bmcv
  for (size_t i = 0; i < input.size(); i++) {
    LOG_TS(ts_, "ssd pre-process-vpp")
    bmcv_image_vpp_convert (bm_handle_, 1, input[i], &resize_bmcv_[i], &crop_rect_);
    LOG_TS(ts_, "ssd pre-process-vpp")
  }

  // do linear transform
  LOG_TS(ts_, "ssd pre-process-linear_tranform")
  bmcv_image_convert_to (bm_handle_, input.size(), linear_trans_param_, resize_bmcv_, linear_trans_bmcv_);
  LOG_TS(ts_, "ssd pre-process-linear_tranform")

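Next, build the source. The dependencies and toolchain needed for compilation were already configured in the Environment Setup section, so we can build directly. The build produces PCIe and ARM executables in the current directory: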

root@bitmain-SYS-4028GR-TR2:/workspace/examples/SSD_object/cpp_cv_bmcv_bmrt# make -f Makefile.pcie

root@bitmain-SYS-4028GR-TR2:/workspace/examples/SSD_object/cpp_cv_bmcv_bmrt# make -f Makefile.arm

# build outputs
|-- ssd300_cv_bmcv_bmrt.arm
`-- ssd300_cv_bmcv_bmrt.pcie

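Since the BM1684 is attached via PCIe in this Docker environment (which can be confirmed with lspci), we can run ssd300_cv_bmcv_bmrt.pcie directly. Doing so produced the following error: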

root@bitmain-SYS-4028GR-TR2:/workspace/examples/SSD_object/cpp_cv_bmcv_bmrt# ./ssd300_cv_bmcv_bmrt.pcie image /workspace/res/image/vehicle_1.jpg ../model/out/fp32_ssd300.bmodel 1 0
./ssd300_cv_bmcv_bmrt.pcie: error while loading shared libraries: libavcodec.so.58: cannot open shared object file: No such file or directory
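
A missing-library error like this usually means the directory holding the library is not on the dynamic loader's search path. A quick partial check, sketched in Python, is to look for the file in each LD_LIBRARY_PATH entry. The helper find_shared_lib is hypothetical, and the loader also consults other locations (such as the ld.so cache), so an empty result is a hint rather than proof.

```python
import os


def find_shared_lib(lib_name, search_path=None):
    """Return the directories on an LD_LIBRARY_PATH-style string
    that contain a file named lib_name.

    search_path defaults to the current LD_LIBRARY_PATH value.
    """
    if search_path is None:
        search_path = os.environ.get("LD_LIBRARY_PATH", "")
    hits = []
    for directory in search_path.split(os.pathsep):
        # Skip empty entries produced by leading/trailing separators.
        if directory and os.path.isfile(os.path.join(directory, lib_name)):
            hits.append(directory)
    return hits
```

For example, find_shared_lib("libavcodec.so.58") returning an empty list is consistent with the error above. Running ldd on the executable gives the loader's own view of which libraries resolve.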

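Investigation showed that the environment setup step must be configured for either PCIe or SoC mode, depending on the platform (envsetup_pcie.sh, shown earlier, is the PCIe variant). After reconfiguring for PCIe mode and rerunning, the demo executes normally: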

root@bitmain-SYS-4028GR-TR2:/workspace/examples/SSD_object/cpp_cv_bmcv_bmrt# ./ssd300_cv_bmcv_bmrt.pcie image /workspace/res/image/vehicle_1.jpg ../model/out/fp32_ssd300.bmodel 1 0
[/home/jenkins/workspace/all_in_one_sa5/daily_build/bmetc/sa5/middleware-soc/bm_opencv/modules/core/src/cv_bmcpu.cpp:49->InternalBMCpuRegister]total 9 devices need to enable on-chip CPU. It may need serveral minutes                     for loading, please be patient....
......
[         ssd overall]  loops:    1 avg: 679449 us
[          read image]  loops:    1 avg: 391943 us
[        attach input]  loops:    1 avg: 2291 us
[           detection]  loops:    1 avg: 86327 us
[     ssd pre-process]  loops:    1 avg: 48232 us
[ ssd pre-process-vpp]  loops:    1 avg: 1300 us
[ssd pre-process-linear_tranform]  loops:    1 avg: 46928 us
[       ssd inference]  loops:    1 avg: 37930 us
[    ssd post-process]  loops:    1 avg: 161 us

[/home/jenkins/workspace/all_in_one_sa5/daily_build/bmetc/sa5/middleware-soc/bm_opencv/modules/core/src/cv_bmcpu.cpp:113->~InternalBMCpuRegister]deconstructor function is called
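
To see where the time goes, the timing table can be parsed and the stages compared. This is a small sketch that assumes the "[ name]  loops: N avg: T us" layout shown above; parse_profile and share are illustrative names.

```python
import re

# Matches lines such as "[  ssd overall]  loops:    1 avg: 679449 us"
PROFILE_RE = re.compile(
    r"\[\s*(?P<name>[^\]]+?)\s*\]\s+loops:\s*(?P<loops>\d+)"
    r"\s+avg:\s*(?P<avg>\d+)\s*us")


def parse_profile(text):
    """Map each timed stage to its average duration in microseconds."""
    return {m["name"]: int(m["avg"]) for m in PROFILE_RE.finditer(text)}


def share(profile, part, whole):
    """Fraction of the `whole` stage's time spent in the `part` stage."""
    return profile[part] / profile[whole]
```

Applied to the figures above, reading the image accounts for about 58% of the overall time (391943 / 679449), and the linear transform for about 97% of pre-processing (46928 / 48232). These come from a single loop, so they are indicative only.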

2.2 VQ-VAE (TensorFlow)

Following the examples/nntc/bmnett example in the official SDK, run the model conversion script directly:

root@bitmain-SYS-4028GR-TR2:/workspace/examples/nntc/bmnett# ./bmnett_build_bmodel.sh
Namespace(check_ops=True, cmp=True, const_names=None, descs=None, dyn=False, enable_profile=False, input_folder='', input_names=('P
......
BMLIB Send Quit Message
Compiling succeeded.

# output directory
./output/
`-- vqvae
    |-- compilation.bmodel
    |-- input_ref_data.dat
    |-- io_info.dat
    `-- output_ref_data.dat

2.3 LeNet(MXNet)

Following the examples/nntc/bmnetm example in the official SDK, run the model conversion script directly:

root@bitmain-SYS-4028GR-TR2:/workspace/examples/nntc/bmnetm# ./bmnetm_build_bmodel.sh
args: Namespace(cmp=None, debug=0, dyn=False, enable_profile=False, input_data='', input_names='data', list_ops=False, log_dir='',
......
I0712 11:56:00.312815  1480 bmcompiler_bmodel.cpp:154] [BMCompiler:I] save_tensor output name [softmax_output]
BMLIB Send Quit Message

# output directory
./output/
`-- lenet
    |-- compilation.bmodel
    |-- input_ref_data.dat
    |-- io_info.dat
    `-- output_ref_data.dat

2.4 Anchors (PyTorch)

Following the examples/nntc/bmnetp example in the official SDK, run the model conversion script directly:

root@bitmain-SYS-4028GR-TR2:/workspace/examples/nntc/bmnetp# ./bmnetp_build_bmodel.sh
Namespace(cmp=True, desc=None, descs=None, dyn=False, enable_profile=False, input_structure=None, log_dir
......
BMLIB Send Quit Message
Compiling succeeded.

# output directory
./output/
`-- anchors
    |-- compilation.bmodel
    |-- input_ref_data.dat
    |-- io_info.dat
    `-- output_ref_data.dat

2.5 Yolov3-tiny(Darknet)

Following the examples/nntc/bmnetd example in the official SDK, run the model conversion script directly:

root@bitmain-SYS-4028GR-TR2:/workspace/examples/nntc/bmnetd# ./bmnetd_build_bmodel.sh
......
*** Store bmodel of BMCompiler...
============================================================
BMLIB Send Quit Message
# output directory
./output/
`-- anchors
    |-- compilation.bmodel
    |-- input_ref_data.dat
    |-- io_info.dat
    `-- output_ref_data.dat

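2.6 ONNX & Paddle

Models from other deep learning frameworks can generally be converted to ONNX format. The official examples do not include a concrete demonstration of this.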

3. Hands-on Practice

3.1 Model Migration

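To reduce computation and improve inference performance, a model is usually converted to INT8. As in section 2.1.1, the flow converts the original model to an fp32 umodel, calibrates it into an int8 umodel, and then generates the int8 bmodel.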
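
To make the INT8 trade-off concrete, here is a generic illustration of symmetric linear quantization in Python. It is not the algorithm used by Sophgo's calibration tools, which this post does not describe; the function names and the choice of a max-magnitude scale are assumptions made for the example.

```python
def symmetric_scale(values):
    """Choose the scale so the largest magnitude maps to 127."""
    return max(abs(v) for v in values) / 127.0


def quantize_int8(values, scale):
    """Map floats to int8: q = clamp(round(x / scale), -128, 127)."""
    return [max(-128, min(127, round(v / scale))) for v in values]


def dequantize(qvalues, scale):
    """Recover approximate floats from the int8 values."""
    return [q * scale for q in qvalues]
```

Each value is stored in one byte instead of four, at the cost of a rounding error of at most half a scale step per value. Calibration on representative data is what picks a scale that keeps this error small for real activations.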

 

3.1.1 Preparing the Quantization Dataset

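Following examples/calibration/create_lmdb_demo in the official SDK, first download the dataset, which is coco128 here. If the script will not run, add execute permission with chmod, since the official package ships it without that permission: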

root@bitmain-SYS-4028GR-TR2:/workspace/examples/calibration/create_lmdb_demo# chmod +x download_coco128.sh 
root@bitmain-SYS-4028GR-TR2:/workspace/examples/calibration/create_lmdb_demo# ./download_coco128.sh
......
inflating: coco128/README.txt

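Then build the LMDB database, which the later calibration step consumes in this format. Set the paths according to where the images actually are; the path arguments given in the official example are wrong: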

root@bitmain-SYS-4028GR-TR2:/workspace/examples/calibration/create_lmdb_demo# python3 convert_imageset.py --imageset_rootfolder=./coco128/images/train2017 --imageset_lmdbfolder=./coco128 --resize_height=256 --resize_width=256 --shuffle=True --bgr2rgb=False --gray=False

reading image /workspace/examples/calibration/create_lmdb_demo/coco128/images/train2017/000000000634.jpg
......
reading image /workspace/examples/calibration/create_lmdb_demo/coco128/images/train2017/000000000359.jpg
original shape: (332, 500, 3)
cv_imge after resize (256, 256, 3)

# directory layout
coco128/
|-- LICENSE
|-- README.txt
|-- data.mdb        // the generated database file
|-- images
`-- labels
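
Because the image root folder in the official example did not match the actual layout, it is worth confirming that the folder exists and contains images before running convert_imageset.py. A small Python sketch follows; the helper list_images and the suffix set are assumptions for illustration and are not taken from the SDK.

```python
from pathlib import Path

# Suffixes treated as images in this sketch (an assumption).
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp"}


def list_images(root):
    """Return the sorted image files directly under root.

    Raises FileNotFoundError when root is not an existing directory,
    which catches a wrong --imageset_rootfolder early.
    """
    root = Path(root)
    if not root.is_dir():
        raise FileNotFoundError(f"image folder not found: {root}")
    return sorted(p for p in root.iterdir()
                  if p.suffix.lower() in IMAGE_SUFFIXES)
```

For instance, len(list_images("./coco128/images/train2017")) gives the number of images that the folder passed on the command line above actually holds.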