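With the TVM RPC module, you can deploy and run compiled code on a remote device addressed by IP and port. The heavy, memory-hungry compilation and tuning work is done on the server, while the edge device only needs to build the TVM runtime library, which is far cheaper to compile.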

Raspberry Pi RPC deployment
1. Build the TVM runtime on the Raspberry Pi (device side)
git clone --recursive https://github.com/apache/tvm
cd tvm
make runtime -j2

  Set up the Python interface in the system environment:

vim ~/.bashrc

  Append the following line to the end of the file:

export PYTHONPATH=$PYTHONPATH:~/tvm/python

  Apply the change:

source ~/.bashrc
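
  To confirm that the PYTHONPATH change took effect, you can check whether Python can locate the tvm package. The following is a minimal sketch, assuming the clone lives at ~/tvm as above; module_available is a hypothetical helper name, not part of TVM. It only checks that the package is found on the path; an actual import tvm also needs the runtime library built by make runtime.

```python
import importlib.util


def module_available(name: str) -> bool:
    """Return True if Python can locate a top-level module with this name."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    # Expected to report True in a new shell once ~/tvm/python is on PYTHONPATH.
    print("tvm importable:", module_available("tvm"))
```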

2. Start the RPC server on the device

  Start the RPC server on the Raspberry Pi with the following command:

python -m tvm.exec.rpc_server --host ip --port=9090

  Replace ip in the command above with the Raspberry Pi's actual IP address.
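
  Before connecting from the server, it can help to confirm that the RPC port on the device is reachable. The following is a minimal sketch using only the Python standard library; is_port_open is a hypothetical helper name. It only checks that something accepts TCP connections on that port, not that it is a TVM RPC server.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

  For example, is_port_open("10.77.1.162", 9090) uses the address and port from the steps below; substitute your own device's values.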

3. Cross-compile the kernel on the local server

  The following example comes from the official TVM tutorial: https://tvm.apache.org/docs/tutorials/get_started/cross_compilation_and_rpc.html#tutorial-cross-compilation-and-rpc

import numpy as np

import tvm
from tvm import te
from tvm import rpc
from tvm.contrib import utils

n = tvm.runtime.convert(1024)
A = te.placeholder((n,), name="A")
B = te.compute((n,), lambda i: A[i] + 1.0, name="B")
s = te.create_schedule(B.op)

# cross compile kernel
target = "llvm -mtriple=armv7l-linux-gnueabihf"
func = tvm.build(s, [A, B], target=target, name="add_one")
# save the lib at a local temp folder
temp = utils.tempdir()
path = temp.relpath("lib.tar")
func.export_library(path)
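
  The -mtriple value has to match the CPU architecture of the device. As a hedged sketch, the hypothetical helper below maps the architecture string reported on the device (for example by uname -m or Python's platform.machine()) to the two target strings used in this tutorial. The mapping covers only these two boards and is an assumption for illustration; other devices or OS builds may need a different triple.

```python
import platform

# Architecture string -> TVM target string, taken from the two examples in this tutorial.
KNOWN_TARGETS = {
    "armv7l": "llvm -mtriple=armv7l-linux-gnueabihf",  # used for the Raspberry Pi here
    "aarch64": "llvm -mtriple=aarch64-linux-gnu",      # used as the RK3399 host target here
}


def target_for_machine(machine: str) -> str:
    """Look up the target string for a machine name; raise if it is unknown."""
    try:
        return KNOWN_TARGETS[machine]
    except KeyError:
        raise ValueError(f"no known target for architecture {machine!r}") from None


if __name__ == "__main__":
    # Run this on the device to see which architecture it reports.
    print(platform.machine())
```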

4. Run the kernel on the device CPU over RPC
host = "10.77.1.162"  # set to your Raspberry Pi's IP address
port = 9090           # set to the port the RPC server listens on
remote = rpc.connect(host, port)

# upload the lib to remote device, invoke a device local compiler to relink them
remote.upload(path)
func = remote.load_module("lib.tar")

# create arrays on the remote device
dev = remote.cpu()
a = tvm.nd.array(np.random.uniform(size=1024).astype(A.dtype), dev)
b = tvm.nd.array(np.zeros(1024, dtype=A.dtype), dev)
# the function will run on the remote device
func(a, b)
np.testing.assert_equal(b.asnumpy(), a.asnumpy() + 1)

time_f = func.time_evaluator(func.entry_name, dev, number=10)
cost = time_f(a, b).mean
print("%g secs/op" % cost)
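
  The time_evaluator call with number=10 runs the function repeatedly on the device and reports the mean time per call. As a rough local analogy only (this is not how TVM implements it, and mean_seconds_per_call and add_one are hypothetical names), the same idea in plain Python looks like this:

```python
import time


def mean_seconds_per_call(fn, *args, number: int = 10) -> float:
    """Call fn(*args) `number` times and return the average wall-clock seconds per call."""
    start = time.perf_counter()
    for _ in range(number):
        fn(*args)
    return (time.perf_counter() - start) / number


def add_one(values):
    # Pure-Python stand-in for the "add_one" kernel.
    return [v + 1.0 for v in values]


if __name__ == "__main__":
    cost = mean_seconds_per_call(add_one, [0.0] * 1024, number=10)
    print("%g secs/op" % cost)
```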
5. Output

  Output on the local server:
[Figure: local server output]

  Output on the Raspberry Pi device:
[Figure: Raspberry Pi device output]



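RK3399 RPC deployment
1. Build the TVM runtime on the RK3399 (device side)

  Unlike the Raspberry Pi build above, the RK3399 build needs OpenCL support enabled. The device I use here is an EAIDK610, which uses the RK3399 as its main chip.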

cp cmake/config.cmake .
sed -i "s/USE_OPENCL OFF/USE_OPENCL ON/" config.cmake
make runtime -j16
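
  The sed command simply flips USE_OPENCL from OFF to ON in config.cmake. If you prefer to script this step in Python, a minimal sketch could look like the following; enable_option is a hypothetical helper, and it assumes the option appears in the form set(USE_OPENCL OFF).

```python
import re


def enable_option(config_text: str, option: str) -> str:
    """Return config_text with `set(<option> OFF)` switched to `set(<option> ON)`."""
    pattern = re.compile(r"set\(\s*" + re.escape(option) + r"\s+OFF\s*\)")
    return pattern.sub(f"set({option} ON)", config_text)


if __name__ == "__main__":
    # Small in-memory sample; a real script would read and rewrite config.cmake.
    sample = "set(USE_CUDA OFF)\nset(USE_OPENCL OFF)\n"
    print(enable_option(sample, "USE_OPENCL"))
```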

  For the rest of the environment setup, follow the Raspberry Pi steps above.


2. Run on the RK3399 over TVM RPC
opencl_device_host = "10.77.1.145"   # set to your RK3399's IP address
opencl_device_port = 9090            # set to the port the RPC server listens on
target = tvm.target.Target("opencl", host="llvm -mtriple=aarch64-linux-gnu")

# create schedule for the "add one" compute declaration above
# (reuses A, B, temp, and the imports from the Raspberry Pi example)
s = te.create_schedule(B.op)
xo, xi = s[B].split(B.op.axis[0], factor=32)
s[B].bind(xo, te.thread_axis("blockIdx.x"))
s[B].bind(xi, te.thread_axis("threadIdx.x"))
func = tvm.build(s, [A, B], target=target)

remote = rpc.connect(opencl_device_host, opencl_device_port)

# export and upload
path = temp.relpath("lib_cl.tar")
func.export_library(path)
remote.upload(path)
func = remote.load_module("lib_cl.tar")

# run
dev = remote.cl()
a = tvm.nd.array(np.random.uniform(size=1024).astype(A.dtype), dev)
b = tvm.nd.array(np.zeros(1024, dtype=A.dtype), dev)
func(a, b)
np.testing.assert_equal(b.asnumpy(), a.asnumpy() + 1)
print("OpenCL test passed!")
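
  The schedule above splits the 1024-element axis by a factor of 32 and binds the outer part to blockIdx.x and the inner part to threadIdx.x, so element i is handled by block i // 32 and thread i % 32. With 1024 elements this gives 32 blocks of 32 threads each. The plain-Python sketch below emulates that index mapping sequentially to make it concrete; it illustrates the indexing only, not the OpenCL code TVM generates, and add_one_split is a hypothetical name.

```python
def add_one_split(a, factor=32):
    """Compute b[i] = a[i] + 1.0 using the (block, thread) indexing of a split axis."""
    n = len(a)
    assert n % factor == 0, "this sketch assumes the length divides evenly"
    b = [0.0] * n
    for block_idx in range(n // factor):      # plays the role of blockIdx.x
        for thread_idx in range(factor):      # plays the role of threadIdx.x
            i = block_idx * factor + thread_idx
            b[i] = a[i] + 1.0
    return b


if __name__ == "__main__":
    data = [float(x) for x in range(1024)]
    result = add_one_split(data)
    print(result[:3], len(result))
```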

3. Output

  Output on the local server:
[Figure: local server output]

  Output on the RK3399 device:
[Figure: RK3399 device output]



