Building the YOLOv5 Network with the TensorRT Python API

  • Network overview
  • Creating the network definition object
  • Backbone
  • Focus
  • CBL
  • CSP
  • Neck
  • PANet
  • Head
  • Appendix
  • References


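Figure 1: YOLOv5s network

Note: This article is based on the yolov5s-v5.0 network. The figure above shows the overall structure of yolov5s and is for reference only; the code is authoritative, and there are a few small differences between the two.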




import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(TRT_LOGGER)
network = builder.create_network()


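# Alternatively, create both objects as context managers:
with trt.Builder(TRT_LOGGER) as builder, builder.create_network() as network:
    pass  # populate `network` layer by layer here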

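The builder searches its catalog of CUDA kernels for the fastest available implementation, so the GPU used at build time must be the same as the one used at run time. An engine produced by the builder is not portable across platforms or TensorRT versions. The code above uses the builder to create an empty network; from here on, the TensorRT Python API is used to fill that network in layer by layer until the complete yolov5s-v5.0 network is built.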




Figure 2: Focus structure



The point of the Focus structure is to downsample while losing as little information as possible. It relies on one important TensorRT API, add_slice, which creates a Slice layer.

def focus(network, weights, inp, inch, outch, ksize, lname):
    shape = trt.Dims3(inch, Yolo.INPUT_H//2, Yolo.INPUT_W//2)
    stride = trt.Dims3(1,2,2)
    s1 = network.add_slice(inp, trt.Dims3(0,0,0), shape, stride)
    s2 = network.add_slice(inp, trt.Dims3(0,1,0), shape, stride)
    s3 = network.add_slice(inp, trt.Dims3(0,0,1), shape, stride)
    s4 = network.add_slice(inp, trt.Dims3(0,1,1), shape, stride)
    input_tensors = [s1.get_output(0), s2.get_output(0), s3.get_output(0), s4.get_output(0)]
    cat = network.add_concatenation(input_tensors)  # concatenate along the channel dimension
    conv = convBlock(network, weights, cat.get_output(0), outch, ksize, 1, 1, lname + ".conv")

    return conv
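
To make the four Slice layers above more concrete, here is a minimal NumPy sketch of the same rearrangement on a CHW array. It is only an illustration of the indexing, not TensorRT code; the function name focus_slices and the toy input are made up for this example.

```python
import numpy as np

def focus_slices(x: np.ndarray) -> np.ndarray:
    """Rearrange a CHW array the way the four slices in focus() do.

    Each slice keeps every channel and takes every second row and column,
    starting from a different (row, col) offset.
    """
    s1 = x[:, 0::2, 0::2]  # start (0, 0, 0), stride (1, 2, 2)
    s2 = x[:, 1::2, 0::2]  # start (0, 1, 0)
    s3 = x[:, 0::2, 1::2]  # start (0, 0, 1)
    s4 = x[:, 1::2, 1::2]  # start (0, 1, 1)
    # Concatenate along the channel axis, as add_concatenation does by default.
    return np.concatenate([s1, s2, s3, s4], axis=0)

# Toy input: 3 channels, 4x4 pixels (hypothetical values).
x = np.arange(3 * 4 * 4, dtype=np.float32).reshape(3, 4, 4)
y = focus_slices(x)
print(x.shape, "->", y.shape)
```

The output has four times the channels and half the height and width, and every input value appears exactly once, which is why the downsampling loses no information before the following convolution.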



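CBL consists of Conv + BN + SiLU. Note that although the overall network diagram above shows LeakyReLU as the activation inside CBL, from v4.0 onward the activation is SiLU (Sigmoid-weighted Linear Unit), a smoother activation function.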

def convBlock(network, weights, inp, outch, ksize, s, g, lname):
    conv1_w = weights[lname + ".conv.weight"].numpy()
    conv1_b = trt.Weights(trt.float32)
    p = ksize//2
    conv1 = network.add_convolution_nd(inp, num_output_maps=outch, kernel_shape=trt.DimsHW(ksize, ksize), kernel=conv1_w, bias=conv1_b)
    assert conv1, "Add convolution_nd layer failed"
    conv1.stride_nd = trt.DimsHW(s, s)
    conv1.padding_nd = trt.DimsHW(p, p)
    conv1.num_groups = g
    bn1 = addBatchNorm2d(network, weights, conv1.get_output(0), lname+".bn", 1e-3)

    # silu = x * sigmoid
    sig = network.add_activation(bn1.get_output(0), trt.ActivationType.SIGMOID)
    assert sig, "Add activation layer failed"
    silu = network.add_elementwise(bn1.get_output(0), sig.get_output(0), trt.ElementWiseOperation.PROD)
    assert silu, "Add PROD layer failed"

    return silu


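  1. Convolution layer
    Call add_convolution_nd to create a new convolution layer. Because there is no bias term, the bias is defined as an empty Weights object, which can be initialized with the first constructor overload, __init__(self: tensorrt.tensorrt.Weights, type: tensorrt.tensorrt.DataType = DataType.FLOAT) -> None. Parameters such as stride, padding, and num_groups are set through attributes of the IConvolutionLayer.
  2. BN layer
    TensorRT does not provide a BatchNorm layer, but it does provide the more general Scale layer, which can be used to implement BN. For the details, see the author's other article, "Implementing TRT versions of BN/hswish/SiLU operators with the Python API"; they are not repeated here.
  3. Activation layer
    Likewise, TensorRT has no direct SiLU API, but combining add_activation with the multiply operation of add_elementwise builds SiLU easily. See method 2 in the same article.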
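
Since addBatchNorm2d is defined in the other article rather than here, the following NumPy sketch shows the arithmetic such a helper presumably relies on: folding the BN parameters into per-channel scale, shift, and power arrays for a Scale layer, plus SiLU as x * sigmoid(x). The function names fold_batchnorm and silu and the random test data are assumptions made for illustration; this is not the author's implementation.

```python
import numpy as np

def fold_batchnorm(gamma, beta, mean, var, eps=1e-3):
    """Turn BN parameters into the (shift, scale, power) of a per-channel Scale layer.

    BN:    y = gamma * (x - mean) / sqrt(var + eps) + beta
    Scale: y = (x * scale + shift) ** power
    """
    scale = gamma / np.sqrt(var + eps)
    shift = beta - mean * scale
    power = np.ones_like(scale)
    return shift, scale, power

def silu(x):
    """SiLU(x) = x * sigmoid(x), the product built with add_activation + add_elementwise."""
    return x * (1.0 / (1.0 + np.exp(-x)))

# Hypothetical parameters for a 4-channel feature map.
rng = np.random.default_rng(0)
c = 4
gamma, beta = rng.normal(size=c), rng.normal(size=c)
mean, var = rng.normal(size=c), rng.uniform(0.5, 2.0, size=c)
x = rng.normal(size=(c, 3, 3))

shift, scale, power = fold_batchnorm(gamma, beta, mean, var)

# Reference BN versus the Scale-layer formulation, broadcast per channel.
bc = (slice(None), None, None)
bn_ref = gamma[bc] * (x - mean[bc]) / np.sqrt(var[bc] + 1e-3) + beta[bc]
bn_scale = (x * scale[bc] + shift[bc]) ** power[bc]
print(np.allclose(bn_ref, bn_scale))
```

In TensorRT, arrays like these (as float32) would be passed to network.add_scale with trt.ScaleMode.CHANNEL; in PyTorch the four BN inputs correspond to weight, bias, running_mean, and running_var.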




Figure 4: CSP

Note: The figure above is for reference only. The actual yolov5s-v5.0 structure follows the code, and the two differ.


def C3(network, weights, inp, c1, c2, n, shortcut, g, e, lname):
    c_ = int(float(c2)*e)  # e:expand param
    conv1 = convBlock(network, weights, inp, c_, 1, 1,1, lname+".cv1")
    conv2 = convBlock(network, weights, inp, c_, 1, 1,1, lname+".cv2")
    y1 = conv1.get_output(0)
    for i in range(n):
        b = bottleneck(network, weights, y1, c_, c_, shortcut, g, 1.0, lname + ".m." + str(i))
        y1 = b.get_output(0)

    input_tensors = [y1, conv2.get_output(0)]
    cat = network.add_concatenation(input_tensors)

    conv3 = convBlock(network, weights, cat.get_output(0), c2, 1,1,1, lname+".cv3")
    return conv3

width_128 = get_width(128, GW)  # =64
depth_3 = get_depth(3, GD)  # =1
# CSP:bottleneckCSP
c3_2 = C3(network, weights, conv1.get_output(0), width_128, width_128, depth_3, True, 1, 0.5, "model.2")


def get_width(x: int, gw: float, divisor: int = 8):
    """
    Using gw to control the number of kernels that must be multiples of 8.
    return math.ceil(x / divisor) * divisor
    """
    if x*gw % divisor == 0:
        return int(x*gw)
    return (int(x*gw/divisor)+1)*divisor

def get_depth(x: int, gd: float):
    # A block that occurs once is never scaled down.
    if x == 1:
        return 1
    # Otherwise scale by gd, but keep at least one repetition.
    return round(x*gd) if round(x*gd) > 1 else 1
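
As a quick check of the two helpers, the sketch below evaluates them for the channel counts and repetition counts used in this article. It assumes GW = 0.50 and GD = 0.33, the width and depth multipliers of YOLOv5s; the article itself does not show where GW and GD are set, so treat those two values as an assumption.

```python
def get_width(x: int, gw: float, divisor: int = 8) -> int:
    """Scale a channel count by gw and round up to a multiple of divisor."""
    if x * gw % divisor == 0:
        return int(x * gw)
    return (int(x * gw / divisor) + 1) * divisor

def get_depth(x: int, gd: float) -> int:
    """Scale a repetition count by gd, never going below 1."""
    if x == 1:
        return 1
    return round(x * gd) if round(x * gd) > 1 else 1

GW, GD = 0.50, 0.33  # assumed YOLOv5s multipliers

widths = {c: get_width(c, GW) for c in (64, 128, 256, 512, 1024)}
depths = {n: get_depth(n, GD) for n in (3, 9)}
print(widths)  # {64: 32, 128: 64, 256: 128, 512: 256, 1024: 512}
print(depths)  # {3: 1, 9: 3}
```

These values agree with the "# =64" and "# =1" comments next to width_128 and depth_3 in the code.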


def bottleneck(network, weights, inp, c1: int, c2: int, shortcut: bool, g: int, e: float, lname: str):
    """Res Unit: a 1x1 conv followed by a 3x3 conv, with an optional residual add."""
    conv1 = convBlock(network, weights, inp, int(float(c2)*e), 1,1,1, lname+".cv1")
    conv2 = convBlock(network, weights, conv1.get_output(0), c2, 3,1,g, lname+".cv2")
    if shortcut and c1 == c2:
        ew = network.add_elementwise(inp, conv2.get_output(0), op=trt.ElementWiseOperation.SUM)
        return ew
    return conv2



Figure 5: Bottleneck



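Figure 6: SPP

SPP (Spatial Pyramid Pooling) works as shown in the figure above: the feature maps are pooled through three pooling windows (the blue, teal, and silver-gray windows), and the results are concatenated along the channel dimension. SPP enlarges the receptive field and helps with the alignment between anchors and feature maps. In essence, the structure extracts features at different scales by pooling with different kernel sizes and then stacks them for feature fusion.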

def SPP(network, weights, inp, c1, c2, k1, k2,k3, lname):
    c_ = c1//2
    conv1 = convBlock(network, weights, inp, c_, 1,1,1, lname+".cv1")
    pool1 = network.add_pooling_nd(conv1.get_output(0), trt.PoolingType.MAX, trt.DimsHW(k1,k1))
    pool1.padding_nd = trt.DimsHW(k1//2, k1//2)
    pool1.stride_nd = trt.DimsHW(1,1)
    pool2 = network.add_pooling_nd(conv1.get_output(0), trt.PoolingType.MAX, trt.DimsHW(k2, k2))
    pool2.padding_nd = trt.DimsHW(k2 // 2, k2 // 2)
    pool2.stride_nd = trt.DimsHW(1, 1)
    pool3 = network.add_pooling_nd(conv1.get_output(0), trt.PoolingType.MAX, trt.DimsHW(k3, k3))
    pool3.padding_nd = trt.DimsHW(k3 // 2, k3 // 2)
    pool3.stride_nd = trt.DimsHW(1, 1)

    input_tensors = [conv1.get_output(0), pool1.get_output(0), pool2.get_output(0), pool3.get_output(0)]
    cat = network.add_concatenation(input_tensors)

    conv2 = convBlock(network, weights, cat.get_output(0), c2, 1,1,1, lname+".cv2")
    return conv2
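
The SPP function above only works if every pooled tensor keeps the spatial size of its input, otherwise the concatenation would fail. The NumPy sketch below checks that property for stride-1 max pooling with padding k // 2 and the kernel sizes 5, 9, and 13. It is an illustration, not TensorRT code; max_pool_same is a made-up helper, and it pads with -inf so that padded cells never win the max.

```python
import numpy as np

def max_pool_same(x: np.ndarray, k: int) -> np.ndarray:
    """Stride-1 max pooling over a CHW array with k // 2 padding (k odd)."""
    p = k // 2
    padded = np.pad(x, ((0, 0), (p, p), (p, p)), constant_values=-np.inf)
    # windows has shape (C, H, W, k, k): one k x k window per output pixel.
    windows = np.lib.stride_tricks.sliding_window_view(padded, (k, k), axis=(1, 2))
    return windows.max(axis=(-2, -1))

# Hypothetical feature map: 2 channels at 20x20 (the role of conv1's output).
rng = np.random.default_rng(0)
feat = rng.normal(size=(2, 20, 20))

pooled = [max_pool_same(feat, k) for k in (5, 9, 13)]
spp_cat = np.concatenate([feat] + pooled, axis=0)
print([p.shape for p in pooled], spp_cat.shape)
```

Because the output size is (H + 2 * (k // 2) - k) + 1 = H for odd k, the four tensors line up and the concatenation has four times the channels of the 1x1 convolution output, which the final CBL then maps to c2.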

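In YOLOv5 the pooling kernel sizes are 1x1, 5x5, 9x9, and 13x13, where the 1x1 case is simply the unpooled branch. SPP first halves the number of channels with a 1x1 convolution, pools the result at the different scales, concatenates the pooled results with the channel-halved result, and finally passes the concatenated feature map through one more CBL. The core code of the YOLOv5s-v5.0 backbone is as follows: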

focus0 = focus(network, weights, data, 3, get_width(64, GW), 3, "model.0")
width_128 = get_width(128, GW)  # =64
depth_3 = get_depth(3, GD)  # =1
conv1 = convBlock(network, weights, focus0.get_output(0), width_128, 3, 2, 1,"model.1")
# CSP1_1
c3_2 = C3(network, weights, conv1.get_output(0), width_128, width_128, depth_3, True, 1, 0.5, "model.2")
width_256 = get_width(256, GW)
depth_9 = get_depth(9, GD)
conv3 = convBlock(network, weights, c3_2.get_output(0), width_256, 3, 2, 1,"model.3")
# CSP1_3
c3_4 = C3(network, weights, conv3.get_output(0), width_256, width_256, depth_9, True, 1, 0.5, "model.4")
width_512 = get_width(512, GW)
conv5 = convBlock(network, weights, c3_4.get_output(0), width_512, 3, 2, 1, "model.5")
# CSP1_3
c3_6 = C3(network, weights, conv5.get_output(0), width_512, width_512, depth_9, True, 1, 0.5, "model.6")
width_1024 = get_width(1024, GW)
conv7 = convBlock(network, weights, c3_6.get_output(0), width_1024, 3, 2, 1, "model.7")
spp8 = SPP(network, weights, conv7.get_output(0), width_1024, width_1024, 5, 9, 13, "model.8")
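
To keep track of what each backbone layer produces, the sketch below traces channel counts and spatial sizes through model.0 to model.8. It assumes a 640x640 input and GW = 0.50, neither of which is stated in the code above (Yolo.INPUT_H, Yolo.INPUT_W, and GW are defined elsewhere), and it assumes the input size is divisible by 32. The function trace_backbone is hypothetical bookkeeping, not part of the network-building code.

```python
def get_width(x: int, gw: float, divisor: int = 8) -> int:
    """Scale a channel count by gw and round up to a multiple of divisor."""
    if x * gw % divisor == 0:
        return int(x * gw)
    return (int(x * gw / divisor) + 1) * divisor

def trace_backbone(h: int, w: int, gw: float):
    """Return (layer name, out channels, out height, out width) per backbone layer."""
    # (name, base channels before width scaling, spatial stride of the layer)
    layers = [
        ("model.0 Focus", 64, 2),
        ("model.1 Conv", 128, 2),
        ("model.2 C3", 128, 1),
        ("model.3 Conv", 256, 2),
        ("model.4 C3", 256, 1),
        ("model.5 Conv", 512, 2),
        ("model.6 C3", 512, 1),
        ("model.7 Conv", 1024, 2),
        ("model.8 SPP", 1024, 1),
    ]
    trace = []
    for name, base_c, stride in layers:
        h, w = h // stride, w // stride  # exact when the input is divisible by 32
        trace.append((name, get_width(base_c, gw), h, w))
    return trace

trace = trace_backbone(640, 640, gw=0.50)  # assumed input size and multiplier
for name, c, h, w in trace:
    print(f"{name:<14} {c:>4} x {h} x {w}")
```

Under these assumptions the outputs of model.4, model.6, and model.8 sit at 8x, 16x, and 32x downsampling (80x80, 40x40, and 20x20).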




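PANet adds Bottom-up Path Augmentation on top of FPN. The main motivation is that shallow layers of the network carry a large amount of edge and shape information, which is crucial for pixel-level classification tasks such as instance segmentation.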


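Figure 7: PANet

The red arrow in the figure above shows that in FPN, because of the bottom-up pass, shallow features have to travel through dozens or even more than a hundred layers to reach the top, so much of the shallow information is lost. The green arrow shows the Bottom-up Path Augmentation structure added by the authors, which itself has fewer than ten layers. Shallow features therefore reach P2 through the lateral connections of the original FPN and then travel from P2 along the Bottom-up Path Augmentation to the top. They pass through very few layers on the way, so the shallow features are preserved well.
Note: N2 and P2 here denote the same feature map, whereas N3, N4, N5 are not the same as P3, P4, P5: N3, N4, N5 are the results of fusion with P3, P4, P5.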


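  1. The gray region marks the first difference: YOLOv5 not only replaces some of the CBL modules with the CSP2_1 structure but also removes the CBL module below it;
  2. The green region marks the second difference: YOLOv5 not only replaces the CBL module after the Concat operation with a CSP2_1 module but also moves another CBL module to a different position;
  3. The blue region marks the third difference: YOLOv5 replaces the original CBL module with a CSP2_1 module.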


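Figure 8: Neck

YOLOv5 has three detection branches, on the 8x, 16x, and 32x downsampled feature maps. We first use the TensorRT API to build the Neck part of the first branch.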


Figure 9: First branch of the Neck

c3_9 = C3(network, weights, spp8.get_output(0), width_1024, width_1024, depth_3, False, 1, 0.5, "model.9")
conv10 = convBlock(network, weights, c3_9.get_output(0), width_512, 1,1,1, "model.10")

upsample11 = network.add_resize(conv10.get_output(0))
assert upsample11, "Add upsample11 failed"
upsample11.resize_mode = trt.ResizeMode.NEAREST
upsample11.shape = c3_6.get_output(0).shape

input_tensors12 = [upsample11.get_output(0), c3_6.get_output(0)]
cat12 = network.add_concatenation(input_tensors12)
c3_13 = C3(network, weights, cat12.get_output(0), width_1024, width_512, depth_3, False, 1, 0.5, "model.13")
conv14 = convBlock(network, weights, c3_13.get_output(0), width_256, 1, 1, 1, "model.14")

upsample15 = network.add_resize(conv14.get_output(0))
assert upsample15, "Add upsample15 failed"
upsample15.resize_mode = trt.ResizeMode.NEAREST
upsample15.shape = c3_4.get_output(0).shape

input_tensors16 = [upsample15.get_output(0), c3_4.get_output(0)]
cat16 = network.add_concatenation(input_tensors16)
c3_17 = C3(network, weights, cat16.get_output(0), width_512, width_256, depth_3, False, 1, 0.5, "model.17")
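
The resize-then-concatenate pattern used twice above can be pictured with a small NumPy sketch. It mimics 2x nearest-neighbor upsampling in the style of PyTorch's nn.Upsample(scale_factor=2, mode='nearest') followed by a channel concatenation with the skip feature map. The arrays deep and skip are toy stand-ins for conv10 and c3_6, and upsample_nearest_2x is a made-up helper.

```python
import numpy as np

def upsample_nearest_2x(x: np.ndarray) -> np.ndarray:
    """2x nearest-neighbor upsampling of a CHW array: every pixel becomes a 2x2 block."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

# Toy stand-ins: a deep 2x2 map and a shallower 4x4 skip connection, 2 channels each.
deep = np.arange(2 * 2 * 2, dtype=np.float32).reshape(2, 2, 2)
skip = np.zeros((2, 4, 4), dtype=np.float32)

up = upsample_nearest_2x(deep)
cat = np.concatenate([up, skip], axis=0)  # channel-wise, like add_concatenation
print(up.shape, cat.shape)
```

Setting the resize layer's shape to the shape of the skip tensor, as the code above does, only matches this picture because the two tensors have the same number of channels (width_512 for conv10 and c3_6, width_256 for conv14 and c3_4).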



Figure 10: Upsample



Figure 11: Second and third branches of the Neck

#The second branch
conv18 = convBlock(network, weights, c3_17.get_output(0), width_256, 3, 2, 1, "model.18")
input_tensors19 = [conv18.get_output(0), conv14.get_output(0)]
cat19 = network.add_concatenation(input_tensors19)
c3_20 = C3(network, weights, cat19.get_output(0), width_512, width_512, depth_3, False, 1, 0.5, "model.20")

#The third branch
conv21 = convBlock(network, weights, c3_20.get_output(0), width_512, 3, 2, 1, "model.21")
input_tensors22 = [conv21.get_output(0), conv10.get_output(0)]
cat22 = network.add_concatenation(input_tensors22)
c3_23 = C3(network, weights, cat22.get_output(0), width_1024, width_1024, depth_3, False, 1, 0.5, "model.23")


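Here BS is the batch size, and 255 is computed as na * (nc + 1 + 4), where:

  • na (number of anchors) is the number of anchor scales per group (YOLOv5 has 3 groups of anchors, each with 3 scales);
  • nc is the number of classes (COCO has 80 classes);
  • 1 is the foreground/background confidence score;
  • 4 is the center coordinates plus the width and height;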
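
The channel formula can be spelled out in a few lines of Python. The sketch below computes na * (nc + 1 + 4) and the resulting output shapes of the three detection branches. The 640x640 input size and the function name head_output_shapes are assumptions for illustration; the strides 8, 16, and 32 are the downsampling factors of the three branches.

```python
def head_output_shapes(input_h, input_w, nc=80, na=3, strides=(8, 16, 32), batch_size=1):
    """Return the (BS, channels, H, W) shape of each detection branch."""
    channels = na * (nc + 1 + 4)  # per anchor: nc class scores + 1 objectness + 4 box values
    return [(batch_size, channels, input_h // s, input_w // s) for s in strides]

shapes = head_output_shapes(640, 640)  # assumed input size
print(shapes)  # [(1, 255, 80, 80), (1, 255, 40, 40), (1, 255, 20, 20)]
```

For a custom dataset only nc changes: with a single class, for example, each branch has 3 * (1 + 5) = 18 output channels.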



Figure 12: Head

# One 1x1 convolution per detection branch; weights are converted to NumPy arrays as in convBlock.
det0 = network.add_convolution_nd(c3_17.get_output(0), 3 * (CLASS_NUM + 5), trt.DimsHW(1, 1), weights["model.24.m.0.weight"].numpy(), weights["model.24.m.0.bias"].numpy())
det1 = network.add_convolution_nd(c3_20.get_output(0), 3 * (CLASS_NUM + 5), trt.DimsHW(1, 1), weights["model.24.m.1.weight"].numpy(), weights["model.24.m.1.bias"].numpy())
det2 = network.add_convolution_nd(c3_23.get_output(0), 3 * (CLASS_NUM + 5), trt.DimsHW(1, 1), weights["model.24.m.2.weight"].numpy(), weights["model.24.m.2.bias"].numpy())



Note: The layer listing below differs slightly from the YOLOv5-v5.0 network used in this article.

idx              from  n    params  module                                  arguments                       layer            cin  cout
  0                -1  1      3520  models.common.Focus                     [3, 32, 3]                      Focus              3    32
  1                -1  1     18560  models.common.Conv                      [32, 64, 3, 2]                  Conv              32    64
  2                -1  1     19904  models.common.BottleneckCSP             [64, 64, 1]                     BottleneckCSP     64    64
  3                -1  1     73984  models.common.Conv                      [64, 128, 3, 2]                 Conv              64   128
  4                -1  1    161152  models.common.BottleneckCSP             [128, 128, 3]                   BottleneckCSP    128   128
  5                -1  1    295424  models.common.Conv                      [128, 256, 3, 2]                Conv             128   256
  6                -1  1    641792  models.common.BottleneckCSP             [256, 256, 3]                   BottleneckCSP    256   256
  7                -1  1   1180672  models.common.Conv                      [256, 512, 3, 2]                Conv             256   512
  8                -1  1    656896  models.common.SPP                       [512, 512, [5, 9, 13]]          SPP              512   512
  9                -1  1   1248768  models.common.BottleneckCSP             [512, 512, 1, False]            BottleneckCSP    512   512
 10                -1  1    131584  models.common.Conv                      [512, 256, 1, 1]                Conv             512   256
 11                -1  1         0  torch.nn.modules.upsampling.Upsample    [None, 2, 'nearest']            Upsample         512   256
 12           [-1, 6]  1         0  models.common.Concat                    [1]                             Concat           512   512
 13                -1  1    378624  models.common.BottleneckCSP             [512, 256, 1, False]            BottleneckCSP    512   256
 14                -1  1     33024  models.common.Conv                      [256, 128, 1, 1]                Conv             256   128
 15                -1  1         0  torch.nn.modules.upsampling.Upsample    [None, 2, 'nearest']            Upsample         256   128
 16           [-1, 4]  1         0  models.common.Concat                    [1]                             Concat           256   256
 17                -1  1     95104  models.common.BottleneckCSP             [256, 128, 1, False]            BottleneckCSP    256   128
 18                -1  1      2322  torch.nn.modules.conv.Conv2d            [128, 18, 1, 1]                 Conv2d           128   255
 19                -2  1    147712  models.common.Conv                      [128, 128, 3, 2]                Conv             128   128
 20          [-1, 14]  1         0  models.common.Concat                    [1]                             Concat           128   256
 21                -1  1    313088  models.common.BottleneckCSP             [256, 256, 1, False]            BottleneckCSP    256   256
 22                -1  1      4626  torch.nn.modules.conv.Conv2d            [256, 18, 1, 1]                 Conv2d           256   255
 23                -2  1    590336  models.common.Conv                      [256, 256, 3, 2]                Conv             256   256
 24          [-1, 10]  1         0  models.common.Concat                    [1]                             Concat           256   512
 25                -1  1   1248768  models.common.BottleneckCSP             [512, 512, 1, False]            BottleneckCSP    512   512
 26                -1  1      9234  torch.nn.modules.conv.Conv2d            [512, 18, 1, 1]                 Conv2d           512   255
 27      [-1, 22, 18]  1         0  Detect                                  [1, anchors                     Detect           512   255


Appendix 1: YOLOv5s visualization


Appendix 2: YOLOv5s summary