[Embedded AI] Tutorial: cross-compiling with TVM and deploying over RPC to Raspberry Pi and RK3399

极智视界, 2022-04-19 16:29:19 (category: Embedded AI)

© Copyright belongs to the author. This is an original work by 极智视界 on the 51CTO blog; contact the author for permission before reposting.

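With TVM's RPC module you can deploy to a remote device using only its IP address and port. The heavy, memory-hungry compilation and tuning work stays on the server, while the edge device only has to build the TVM runtime library, which is far cheaper to compile.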

Raspberry Pi RPC deployment

1. Build the TVM runtime on the Raspberry Pi (device side)

git clone --recursive https://github.com/apache/tvm
cd tvm
make runtime -j2

Add the TVM Python interface to the system environment:

vim ~/.bashrc

Append the following line to the end of the file:

export PYTHONPATH=$PYTHONPATH:~/tvm/python

Apply the change:

source ~/.bashrc
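
Before starting the RPC server it can help to confirm that the PYTHONPATH change took effect. The following is a minimal sketch using only the Python standard library; `tvm_python_status` is a hypothetical helper name, not part of TVM.

```python
import importlib.util
import os


def tvm_python_status():
    """Report whether `import tvm` would resolve, without importing it."""
    spec = importlib.util.find_spec("tvm")
    entries = os.environ.get("PYTHONPATH", "").split(os.pathsep)
    return {
        "importable": spec is not None,
        # File the package would be loaded from, or None if not found.
        "location": spec.origin if spec is not None else None,
        # Non-empty PYTHONPATH entries as seen by this process.
        "pythonpath": [p for p in entries if p],
    }


if __name__ == "__main__":
    status = tvm_python_status()
    print("tvm importable:", status["importable"])
    print("resolved from:", status["location"])
    print("PYTHONPATH entries:", status["pythonpath"])
```

If `importable` is False in a fresh shell, the export line is most likely missing from `~/.bashrc` or points at a different checkout path.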

2. Start the RPC server on the device

Start the RPC server on the Raspberry Pi with the following command:

python -m tvm.exec.rpc_server --host ip --port=9090

Replace ip in the command above with the Raspberry Pi's actual IP address.
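
Once the server is running, a quick reachability check from the local server can rule out network or firewall problems before any TVM code is involved. This is a sketch under the assumption that plain TCP reachability is a useful first test; `rpc_port_open` is a hypothetical helper, and it does not confirm that the listener is actually a TVM RPC server.

```python
import socket


def rpc_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False
```

For example, `rpc_port_open("10.77.1.162", 9090)` would be the check for the Raspberry Pi address and port used later in this article.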

3. Cross-compile the kernel on the local server

The example below is taken from the official TVM tutorial: https://tvm.apache.org/docs/tutorials/get_started/cross_compilation_and_rpc.html#tutorial-cross-compilation-and-rpc

import numpy as np

import tvm
from tvm import te
from tvm import rpc
from tvm.contrib import utils

n = tvm.runtime.convert(1024)
A = te.placeholder((n,), name="A")
B = te.compute((n,), lambda i: A[i] + 1.0, name="B")
s = te.create_schedule(B.op)

# cross compile kernel
target = "llvm -mtriple=armv7l-linux-gnueabihf"
func = tvm.build(s, [A, B], target=target, name="add_one")
# save the lib at a local temp folder
temp = utils.tempdir()
path = temp.relpath("lib.tar")
func.export_library(path)
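
The target string has to match the device, not the machine doing the compiling. As an illustration, the sketch below maps the machine name a board reports (for example from `uname -m`) to the two target triples used in this article. The table and the `target_for` helper are hypothetical, and the mapping assumes a 32-bit OS on the Raspberry Pi and a 64-bit OS on the RK3399.

```python
# Hypothetical lookup table. Keys are `uname -m` style machine names,
# values are the LLVM target strings that appear in this article.
TARGETS = {
    "armv7l": "llvm -mtriple=armv7l-linux-gnueabihf",  # Raspberry Pi, 32-bit OS
    "aarch64": "llvm -mtriple=aarch64-linux-gnu",      # RK3399, 64-bit OS
}


def target_for(machine):
    """Return the target string for a machine name, or raise ValueError."""
    try:
        return TARGETS[machine]
    except KeyError:
        raise ValueError("no target configured for machine %r" % machine) from None


if __name__ == "__main__":
    print(target_for("armv7l"))
```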

4. Run the kernel on the remote CPU over RPC

host = "10.77.1.162"  # set this to your Raspberry Pi's IP
port = 9090           # RPC server port
remote = rpc.connect(host, port)

# upload the lib to remote device, invoke a device local compiler to relink them
remote.upload(path)
func = remote.load_module("lib.tar")

# create arrays on the remote device
dev = remote.cpu()
a = tvm.nd.array(np.random.uniform(size=1024).astype(A.dtype), dev)
b = tvm.nd.array(np.zeros(1024, dtype=A.dtype), dev)
# the function will run on the remote device
func(a, b)
np.testing.assert_equal(b.asnumpy(), a.asnumpy() + 1)

time_f = func.time_evaluator(func.entry_name, dev, number=10)
cost = time_f(a, b).mean
print("%g secs/op" % cost)
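
The `number=10` argument asks the evaluator to average over ten runs, and `.mean` is the resulting seconds per run. The plain-Python sketch below shows the same averaging idea on a local callable. It is only an analogy: `mean_seconds_per_call` is a hypothetical helper, it does not run anything on the remote device, and it is not how TVM implements `time_evaluator`.

```python
import time


def mean_seconds_per_call(fn, *args, number=10):
    """Call fn(*args) `number` times and return the average wall time per call."""
    start = time.perf_counter()
    for _ in range(number):
        fn(*args)
    elapsed = time.perf_counter() - start
    return elapsed / number


if __name__ == "__main__":
    # Time a small local workload, printed in the same format as above.
    cost = mean_seconds_per_call(sum, range(1024), number=10)
    print("%g secs/op" % cost)
```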

5. Output

Output on the local server:
[Figure: local server output]

Output on the Raspberry Pi device:
[Figure: Raspberry Pi device output]

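RK3399 RPC deployment

1. Build the TVM runtime on the RK3399 (device side)

Unlike the Raspberry Pi build above, the RK3399 build needs OpenCL support enabled. The device I use here is an EAIDK610, which is built around the RK3399 SoC.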

cp cmake/config.cmake .
sed -i "s/USE_OPENCL OFF/USE_OPENCL ON/" config.cmake
make runtime -j16
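
Since a sed substitution that matches nothing fails silently, it is worth confirming that the flag was flipped before starting a long build. The sketch below mirrors the substitution in Python and checks the result. Both helper names are hypothetical, and the sample text stands in for the real `config.cmake`, assuming it contains a `set(USE_OPENCL OFF)` line.

```python
def enable_opencl(config_text):
    """Mirror the sed command: switch USE_OPENCL from OFF to ON."""
    return config_text.replace("USE_OPENCL OFF", "USE_OPENCL ON")


def opencl_enabled(config_text):
    """Return True if some line sets USE_OPENCL to ON."""
    return any("USE_OPENCL ON" in line for line in config_text.splitlines())


if __name__ == "__main__":
    # Illustrative stand-in for the contents of config.cmake.
    sample = "set(USE_CUDA OFF)\nset(USE_OPENCL OFF)\n"
    patched = enable_opencl(sample)
    print(opencl_enabled(sample), opencl_enabled(patched))
```

On the device, the same check can be applied to the text read from `config.cmake` after running the sed command.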

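For the rest of the environment setup, follow the Raspberry Pi configuration above.

2. Run on the RK3399 through TVM RPC

opencl_device_host = "10.77.1.145"   # set this to your RK3399 board's IP
opencl_device_port = 9090            # RPC server port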
target = tvm.target.Target("opencl", host="llvm -mtriple=aarch64-linux-gnu")

# create schedule for the above "add one" compute declaration
s = te.create_schedule(B.op)
xo, xi = s[B].split(B.op.axis[0], factor=32)
s[B].bind(xo, te.thread_axis("blockIdx.x"))
s[B].bind(xi, te.thread_axis("threadIdx.x"))
func = tvm.build(s, [A, B], target=target)

remote = rpc.connect(opencl_device_host, opencl_device_port)

# export and upload
path = temp.relpath("lib_cl.tar")
func.export_library(path)
remote.upload(path)
func = remote.load_module("lib_cl.tar")

# run
dev = remote.cl()
a = tvm.nd.array(np.random.uniform(size=1024).astype(A.dtype), dev)
b = tvm.nd.array(np.zeros(1024, dtype=A.dtype), dev)
func(a, b)
np.testing.assert_equal(b.asnumpy(), a.asnumpy() + 1)
print("OpenCL test passed!")
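
The schedule above splits the 1024-element axis by a factor of 32 and binds the outer part to `blockIdx.x` and the inner part to `threadIdx.x`. The sketch below reproduces that index arithmetic in plain Python to show that 32 blocks of 32 threads cover every element exactly once. It is an illustration of the mapping only; the function names are hypothetical and nothing here calls TVM or OpenCL.

```python
N = 1024      # vector length used in the article
FACTOR = 32   # split factor used in the article


def launch_shape(n, factor):
    """Return (number of blocks, threads per block) after splitting by factor."""
    num_blocks = (n + factor - 1) // factor  # ceiling division
    return num_blocks, factor


def covered_indices(n, factor):
    """Enumerate the element index each (block, thread) pair works on."""
    num_blocks, threads = launch_shape(n, factor)
    indices = []
    for xo in range(num_blocks):      # plays the role of blockIdx.x
        for xi in range(threads):     # plays the role of threadIdx.x
            i = xo * factor + xi
            if i < n:                 # guard for lengths not divisible by factor
                indices.append(i)
    return indices


if __name__ == "__main__":
    print(launch_shape(N, FACTOR))
    print(covered_indices(N, FACTOR) == list(range(N)))
```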