Preface

I recently took on a project that uses a phone camera to capture images and runs a trained model to detect objects in them. The main tasks were:

  1. Find a lightweight detection model that can easily be integrated into an Android app
  2. Train the model on my own detection dataset
  3. Work out how to integrate the model into the Android app

Since a lightweight model was needed, the recently popular yolov5s model was the natural choice. 兔丁哥 is fairly new to deep learning and prefers the simplicity of PyTorch, and yolov5 happens to have a PyTorch implementation. It is well known, however, that PyTorch is not as convenient or efficient to deploy as TensorFlow, which is one reason PyTorch lags behind TensorFlow in industry, and this project really drove that point home for me. I will walk through the project from data collection and processing, model training, model export, to Android integration. Since there is a lot of material, it is split into three articles:

  1. [Yolov5] Training a yolov5 model and integrating it into an Android app (Part 1): Model training
  2. [Yolov5] Training a yolov5 model and integrating it into an Android app (Part 2): Model conversion
  3. [Yolov5] Training a yolov5 model and integrating it into an Android app (Part 3): Model integration

This is the second article in the series. It uses TorchScript to convert the trained model into a pt file that can be called from Java.

A brief introduction to TorchScript

To be honest, 兔丁哥 has only recently started with deep learning and is still very much in the exploration phase, so TorchScript is not something I know well either. The common explanation found online is:

TorchScript is an intermediate representation of a PyTorch model (a subclass of nn.Module) that can then be run in a high-performance environment such as C++.

In short, TorchScript lets you call a PyTorch model from C/C++/Java.
As far as I can tell, TorchScript offers two conversion approaches: torch.jit.trace and torch.jit.script. The example in the official documentation uses torch.jit.trace, and this is exactly where I ran into quite a few small problems. Here is the official model-conversion example:

import torch
import torchvision

model = torchvision.models.resnet18(pretrained=True)
model.eval()
example = torch.rand(1, 3, 224, 224)
traced_script_module = torch.jit.trace(model, example)
traced_script_module.save("app/src/main/assets/model.pt")
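
For comparison, the same model can also be compiled with torch.jit.script, which analyzes the Python source directly instead of tracing an example input. This is only a minimal sketch of the counterpart to the example above, not from the official docs; the output file name is mine:

# Script-compile the model from its Python source; no example input is needed
scripted_module = torch.jit.script(model)
scripted_module.save("model_scripted.pt")  # hypothetical output path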

The official trace example looks simple, and it really is that simple to use. At first I had no idea what role example plays in the program; only when I tried to compile an ordinary PyTorch function into a pt file did I discover that example is exactly where the pitfalls lie. Overall, jit has the following limitations:

Limitations of jit

  1. The default type is Tensor. Consider this example:
@torch.jit.script
def test(example):
    temp = []
    return temp.append(example)
    
test.code

test.code shows the compiled code, which looks like this:

def test(example: Tensor) -> List[Tensor]:
    temp = annotate(List[Tensor], [])
    return torch.append(temp, example)

As you can see, the input is typed as Tensor and the list elements also default to Tensor, so the function only accepts Tensor inputs:

test(torch.tensor([1,2,3]))  # runs fine
test(1) # raises an error
# error: test() Expected a value of type 'Tensor (inferred)' for argument 'example' but instead found type 'int'.

Every variable compiled by jit must have a fixed type; it is not as flexible as plain Python. Defaulting everything to Tensor is a real headache, though. What if I just want to operate on an int? There is a way: simply annotate the types, for example:

from typing import List

@torch.jit.script
def test(example:int):
    temp:List[int] = []
    return temp.append(example)
    
test.code

The compiled code becomes:

def test(example: int) -> List[int]:
    temp = annotate(List[int], [])
    return torch.append(temp, example)

As you can see, the types have changed, but now only int inputs are accepted.

  2. Lists must keep a consistent element type. For example:
@torch.jit.script
def test(example):
    temp = [1]
    return temp.append(example)

Compilation fails with:
Could not match type Tensor (inferred) to t in argument 'el': Type variable 't' previously matched to type int is matched to type Tensor (inferred).

  3. Functions from third-party libraries are not supported (see the sketch after this list).
  4. There are certainly other limitations that I simply have not run into yet; this article describes more of them.
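
To illustrate the third point, here is a minimal sketch of what happens with a third-party library: scripting a function that calls into numpy fails to compile, because the TorchScript compiler only understands calls it knows about (the exact error message depends on the PyTorch version):

import numpy as np

@torch.jit.script   # raises a compilation error: calls into numpy cannot be compiled
def add_ones(x):
    return x + torch.from_numpy(np.ones(3))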

Limitations of jit.trace

On top of the limitations above, torch.jit.trace is restricted even further. The most important restriction is that the conversion depends heavily on the example input.

  1. The input can only be Tensor-related types. For example:
def test(example):
    return example + 2
jit = torch.jit.trace(test, 1)

This raises: TypeError: 'int' object is not iterable

def test(example):
    return example + 2
jit = torch.jit.trace(test, [1])

This raises: RuntimeError: Type 'Tuple[int]' cannot be traced. Only Tensors and (possibly nested) Lists, Dicts, and Tuples of Tensors can be traced

  2. Conditionals and loops are not supported. The code does compile, but inspecting the compiled source shows that branches and loops are baked in by the example: a loop that ran 5 times becomes 5 unrolled copies of the loop body, and only the branch that evaluated to True is kept. Look at the following example:
example = torch.tensor([1, 2, 3, 4, 5])

def test(example):
    result = 0
    if 10 > example.shape[0] > 0:
        for i in range(example.shape[0]):
            result += example[i]
    elif example.shape[0] >= 10:
        for i in range(example.shape[0]):
            result += example[i] * 2
    return result

jit = torch.jit.trace(test, example)
jit.code

The program is simple: given an array, it sums the elements if there are fewer than 10 of them, and sums twice each element if there are 10 or more. The compiled code looks like this:

def test(example: Tensor) -> Tensor:
    result = torch.add(torch.select(example, 0, 0), CONSTANTS.c0, alpha=1)
    result0 = torch.add_(result, torch.select(example, 0, 1), alpha=1)
    result1 = torch.add_(result0, torch.select(example, 0, 2), alpha=1)
    result2 = torch.add_(result1, torch.select(example, 0, 3), alpha=1)
    _0 = torch.add_(result2, torch.select(example, 0, 4), alpha=1)
    return _0

You can see that the compiled program has lost the conditional and the loop; all that remains is the accumulation of the first five elements. So if your code contains conditionals or loops, avoid converting it with torch.jit.trace.
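
By contrast, @torch.jit.script compiles the Python source itself and keeps both the branch and the loop. Here is a minimal sketch of the same logic (my own rewrite, not from the repo); note that result must be initialized as a tensor, because TorchScript requires every variable to keep a single fixed type:

@torch.jit.script
def test_scripted(example):
    # initialize the accumulator as a tensor so its type stays consistent
    result = torch.zeros([1], dtype=example.dtype)
    n = example.shape[0]
    if 0 < n and n < 10:
        for i in range(n):
            result += example[i]
    elif n >= 10:
        for i in range(n):
            result += example[i] * 2
    return result

# test_scripted.code still contains the if/elif and the for loops,
# so the function works for inputs of any length.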

  3. When doing in-place computation via assignment, be aware that sizes may get hard-coded. Look at the following example:
example = torch.tensor([[1, 2, 3, 4, 5], [1, 2, 3, 4, 5]])

def test(example):
	y = example
	y[0] =  y[0] + 4
	return y
	
jit = torch.jit.trace(test, example)
jit.code

This program is also simple: it just adds 4 to the first row of example, nothing more. The converted code is:

def test(example: Tensor) -> Tensor:
    _0 = torch.add(torch.select(example, 0, 0), CONSTANTS.c0, alpha=1)
    _1 = torch.copy_(torch.select(example, 0, 0), torch.view(_0, [5]), False)
    return example

The problem is on the third line: the [5] in torch.view(_0, [5]) is hard-coded, which means the input can only be a tensor with 5 columns. For example:

jit(torch.tensor([[1, 1, 1, 1, 1], [2, 2, 2, 2, 2]])) # runs fine
jit(torch.tensor([[1, 2, 3, 4, 5, 6, 7], [1, 2, 3, 4, 5, 6, 7]])) # raises an error
# error: RuntimeError: shape '[5]' is invalid for input of size 7

There are ways around this: besides using @torch.jit.script, you can rewrite the code with PyTorch's built-in in-place functions, like this:

example = torch.tensor([[1, 2, 3, 4, 5], [1, 2, 3, 4, 5]])

def test(example):
	y = example
	y[0].add_(4)
	return y
	
jit = torch.jit.trace(test, example)
jit.code

Here torch.add_() performs the in-place modification, which neatly avoids the problem at some cost in readability. The compiled output is:

def test(example: Tensor) -> Tensor:
    _0 = torch.add_(torch.select(example, 0, 0), CONSTANTS.c0, alpha=1)
    return example

Everything above is really only the tip of the iceberg. I am still exploring what other limitations exist and will keep adding to this list. The model conversion below, for instance, threw plenty of unfamiliar errors at me, and at the time I was too busy fixing them to write them all down.

Model conversion

This is the most troublesome part. Converting the model only converts the network itself, but a whole pile of data-processing code matters too: we need to know not only how the model runs, but also what kind of input it expects and how its raw output is post-processed to arrive at the final result.

Extracting the data pre-processing code

This part has to be dug out of detect.py and datasets.py. I will not walk through the code here and only give the extracted result; essentially it resizes the image while keeping the aspect ratio and then pads it with a border:

import cv2
import numpy as np

def imageProcessing(filename, new_shape = (320, 320), color = (114, 114, 114), gray = False):
    img = cv2.imread(filename)

    shape = img.shape[:2]  # current shape [height, width]
    if isinstance(new_shape, int):
        new_shape = (new_shape, new_shape)

    # Scale ratio (new / old)
    r = min(new_shape[0] / shape[0], new_shape[1] / shape[1])

    new_unpad = int(round(shape[1] * r)), int(round(shape[0] * r))
    dw, dh = new_shape[1] - new_unpad[0], new_shape[0] - new_unpad[1]  # wh padding

    dw, dh = np.mod(dw, 64), np.mod(dh, 64)  # wh padding

    dw /= 2  # divide padding into 2 sides
    dh /= 2

    if shape[::-1] != new_unpad:  # resize
        img = cv2.resize(img, new_unpad, interpolation=cv2.INTER_LINEAR)
    top, bottom = int(round(dh - 0.1)), int(round(dh + 0.1))
    left, right = int(round(dw - 0.1)), int(round(dw + 0.1))
    img = cv2.copyMakeBorder(img, top, bottom, left, right, cv2.BORDER_CONSTANT, value=color)  # add border

    # Convert
    if gray:
        img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    else:
        img = img[:, :, ::-1].transpose(2, 0, 1)  # BGR to RGB, to 3x416x416

    img = img.astype(np.float32)  # float32 (np.float is deprecated in newer NumPy)

    img /= 255.0  # 0 - 255 to 0.0 - 1.0

    if img.ndim == 3:
        img = img[np.newaxis,:,:,:]
    elif img.ndim == 2:
        img = img[np.newaxis,np.newaxis,:,:]
    
    img = np.ascontiguousarray(img)

    return img

Extracting and converting the model

If you do not understand how the yolov5 code works and simply follow the official tutorial to convert the best.pt generated earlier, you will run into all sorts of problems. Fortunately yolov5 provides an export script, export.py, which we will now trim down.
To make sure the model outputs stay consistent before and after conversion, we test the model on the same image both before and after converting it.

# Initialize variables
new_shape = (320, 320)
color = (114, 114, 114)
imgPath = 'testImage/1.jpg'   # prepare a test image
gray = False

# Load the image
img = imageProcessing(imgPath, new_shape = new_shape, color = color, gray = gray)
img = torch.from_numpy(img)
img = img.float()

# Load the model
modeldata = torch.load("best.pt", map_location=torch.device('cpu'))
model = modeldata['model'].float().eval()
model = model.float()

# Run the image through the model before conversion
pred = model.forward(img)
pred  # print pred

The output is as follows:

(tensor([[[3.94224e+01, 1.83147e+01, 6.17124e+01, 5.91857e+01, 3.17744e-07, 9.69862e-01],
          [5.21751e+01, 2.77050e+01, 6.34605e+01, 7.80965e+01, 1.48335e-06, 9.69346e-01],
          [7.57904e+01, 2.28931e+01, 7.32974e+01, 6.84164e+01, 1.17378e-06, 9.70913e-01],
          ...,
          [2.39012e+02, 3.14549e+02, 2.71791e+01, 2.62056e+01, 9.96026e-06, 9.87485e-01],
          [2.45330e+02, 3.11950e+02, 1.63809e+01, 2.46385e+01, 5.99965e-06, 9.84169e-01],
          [2.51161e+02, 3.11525e+02, 9.34076e+00, 1.88540e+01, 9.93162e-07, 9.83997e-01]]]),
 [tensor([[[[[ 1.86583e+00,  1.44920e-01, -5.55053e-01, -3.82730e-01, -1.49620e+01,  3.47138e+00],
             [ 2.62442e-01,  7.67090e-01, -5.32980e-01, -1.37165e-01, -1.34212e+01,  3.45387e+00],
             [-2.64633e-01,  4.37677e-01, -4.16091e-01, -2.57647e-01, -1.36553e+01,  3.50796e+00],
             ...,
             [ 3.86080e-01,  2.91431e-01, -3.47019e-01, -4.21045e-01, -1.32108e+01,  3.30868e+00],
             [-8.95322e-02,  5.37793e-01, -4.54257e-01, -2.35619e-01, -1.39945e+01,  3.30029e+00],
             [-1.01065e+00,  6.59138e-01, -6.30642e-01, -2.95933e-01, -1.54304e+01,  3.41460e+00]],
             ...... (a large remainder omitted)

Now start converting the model:

# Trace the model
model.model[-1].export = True
traced_script_module = torch.jit.trace(model, torch.rand(1, 3, new_shape[0], new_shape[1]))
# Save the model as a pt file
traced_script_module.save('best_torchscript.pt')
# Load the pt file back
jitModel = torch.jit.load('best_torchscript.pt')
# Run the image through the converted model
jitPre = jitModel.forward(img)
jitPre  # print the result

The output is as follows:

[tensor([[[[[ 1.86583e+00,  1.44920e-01, -5.55053e-01, -3.82730e-01, -1.49620e+01,  3.47138e+00],
             [ 2.62442e-01,  7.67090e-01, -5.32980e-01, -1.37165e-01, -1.34212e+01,  3.45387e+00],
             [-2.64633e-01,  4.37677e-01, -4.16091e-01, -2.57647e-01, -1.36553e+01,  3.50796e+00],
             ...,
             [ 3.86080e-01,  2.91431e-01, -3.47019e-01, -4.21045e-01, -1.32108e+01,  3.30868e+00],
             [-8.95322e-02,  5.37793e-01, -4.54257e-01, -2.35619e-01, -1.39945e+01,  3.30029e+00],
             [-1.01065e+00,  6.59138e-01, -6.30642e-01, -2.95933e-01, -1.54304e+01,  3.41460e+00]],
             ...... (a large remainder omitted)

Comparing the two results, you will notice that jitPre is missing the first part of the output. Looking at the source code, that first part is actually the most important piece, the one that most directly reflects the detection result. Why does the converted model produce a different output? Notice the line model.model[-1].export = True: it is precisely this flag that causes it. If you set it to False you do get the desired output, but then the model can no longer be compiled to TorchScript. Digging into yolo.py shows that the reason is some result-processing code embedded inside the model, shown below:

class Detect(nn.Module):
    def __init__(self, nc=80, anchors=(), ch=()):  # detection layer
        super(Detect, self).__init__()
        self.stride = None  # strides computed during build
        self.nc = nc  # number of classes
        self.no = nc + 5  # number of outputs per anchor
        self.nl = len(anchors)  # number of detection layers
        self.na = len(anchors[0]) // 2  # number of anchors
        self.grid = [torch.zeros(1)] * self.nl  # init grid
        a = torch.tensor(anchors).float().view(self.nl, -1, 2)
        self.register_buffer('anchors', a)  # shape(nl,na,2)
        self.register_buffer('anchor_grid', a.clone().view(self.nl, 1, -1, 1, 1, 2))  # shape(nl,1,na,1,1,2)
        self.m = nn.ModuleList(nn.Conv2d(x, self.no * self.na, 1) for x in ch)  # output conv
        self.export = False  # onnx export

    def forward(self, x):
        # x = x.copy()  # for profiling
        z = []  # inference output
        self.training |= self.export
        for i in range(self.nl):
            x[i] = self.m[i](x[i])  # conv
            bs, _, ny, nx = x[i].shape  # x(bs,255,20,20) to x(bs,3,20,20,85)
            x[i] = x[i].view(bs, self.na, self.no, ny, nx).permute(0, 1, 3, 4, 2).contiguous()
            # the following lines are the ones that cannot be compiled
            if not self.training:  # inference
                if self.grid[i].shape[2:4] != x[i].shape[2:4]:
                    self.grid[i] = self._make_grid(nx, ny).to(x[i].device)

                y = x[i].sigmoid()
                y[..., 0:2] = (y[..., 0:2] * 2. - 0.5 + self.grid[i].to(x[i].device)) * self.stride[i]  # xy
                y[..., 2:4] = (y[..., 2:4] * 2) ** 2 * self.anchor_grid[i]  # wh
                z.append(y.view(bs, -1, self.no))

        return x if self.training else (torch.cat(z, 1), x)

As for exactly why it fails to compile, I have not found the reason yet; my workaround is to split this part off from the model and handle it as a separate result-processing step.

Extracting and converting the result pre-processing

I split the result processing into two modules: first, result pre-processing, which mainly filters the candidate boxes; second, the final result processing, which mainly uses OpenCV to draw the boxes according to the results. This section extracts the result pre-processing part.
The result pre-processing itself also has two parts: the code mentioned in the previous section that lives inside the model and filters out useless data, and the non-maximum suppression step that keeps only the main boxes.

Filtering out useless data

The following code is extracted from the source of yolo.py:

# Extract some variables from the model
stride, anchor_grid, grid, no, _make_grid = model.model[-1].stride, model.model[-1].anchor_grid, model.model[-1].grid, model.model[-1].no, model.model[-1]._make_grid

# The functions to be converted
def _processing(x, stride, anchor_grid, grid):
    bs, na, ny, nx, _ = x.shape
    if grid.shape[2:4] != x.shape[2:4]:
        grid = _make_grid(nx, ny)
    y = x.sigmoid()
    y[..., 0:2].mul_(2.).sub_(0.5).add_(grid).mul_(stride)
    # y[..., 0:2] = (y[..., 0:2] * 2. - 0.5 + grid) * stride
    y[..., 2:4].mul_(2.).pow_(2).mul_(anchor_grid)
    # y[..., 2:4] = (y[..., 2:4] * 2) ** 2 * anchor_grid
    return y.view(bs, -1, no)

def resultsProcessing(x, y, z):
    result = []
    result.append(_processing(x, stride[0], anchor_grid[0], grid[0]))
    result.append(_processing(y, stride[1], anchor_grid[1], grid[1]))
    result.append(_processing(z, stride[2], anchor_grid[2], grid[2]))
    return torch.cat(result, 1)

# Convert with trace
example = torch.rand(1, 3, new_shape[0], new_shape[1], 6)
traced_script_method = torch.jit.trace(resultsProcessing, (example, example, example))
traced_script_method.save('best_resultsProcessing.pt')

# Verify the result
jitResultsProcessing = torch.jit.load('best_resultsProcessing.pt')
jitRes = jitResultsProcessing.forward(jitPre[0],jitPre[1],jitPre[2])
jitRes

I will not go deep into the original source here (honestly, 兔丁哥 has not studied it that deeply either) and only extract the essential code. At this point you will find that you finally get the result you wanted:

tensor([[[3.94224e+01, 1.83147e+01, 6.17124e+01, 5.91857e+01, 3.17744e-07, 9.69862e-01],
          [5.21751e+01, 2.77050e+01, 6.34605e+01, 7.80965e+01, 1.48335e-06, 9.69346e-01],
          [7.57904e+01, 2.28931e+01, 7.32974e+01, 6.84164e+01, 1.17378e-06, 9.70913e-01],
          ...,
          [2.39012e+02, 3.14549e+02, 2.71791e+01, 2.62056e+01, 9.96026e-06, 9.87485e-01],
          [2.45330e+02, 3.11950e+02, 1.63809e+01, 2.46385e+01, 5.99965e-06, 9.84169e-01],
          [2.51161e+02, 3.11525e+02, 9.34076e+00, 1.88540e+01, 9.93162e-07, 9.83997e-01]]])

Selecting the main bounding boxes

The following code is extracted from detect.py and general.py:

# Extract the functions and convert them to TorchScript
def xywh2xyxy(x):
    y = torch.zeros_like(x)
    y[:, 0].copy_( x[:, 0] - x[:, 2] / 2 )  # top left x
    y[:, 1].copy_( x[:, 1] - x[:, 3] / 2 )  # top left y
    y[:, 2].copy_( x[:, 0] + x[:, 2] / 2 )  # bottom right x
    y[:, 3].copy_( x[:, 1] + x[:, 3] / 2 )  # bottom right y
    return y

@torch.jit.script
def non_max_suppression(prediction):
    conf_thres = 0.4
    iou_thres = 0.5
    max_det = 300

    if prediction.dtype is torch.float16:
        prediction = prediction.float()  # to FP32

    nc = prediction[0].shape[1] - 5  # number of classes
    xc = prediction[..., 4] > conf_thres  # candidates

    output = None
    for xi, x in enumerate(prediction):  # image index, image inference

        x = x[xc[xi]]  # confidence

        # If none remain process next image
        if not x.shape[0]:
            continue

        # Compute conf
        x[:, 5:].mul_( x[:, 4:5] )  # conf = obj_conf * cls_conf

        # Box (center x, center y, width, height) to (x1, y1, x2, y2)
        box = xywh2xyxy(x[:, :4])

        conf, j = x[:, 5:].max(1, keepdim=True)
        x = torch.cat((box, conf, j.float()), 1)[conf.view(-1) > conf_thres]

        # If none remain process next image
        if not x.shape[0]:  # number of boxes
            continue

        # Picked bounding boxes
        indexs = torch.zeros(x.shape[0])

        boxes, scores = x[:, :4], x[:, 4]  # boxes (offset by class), scores
        # i = torchvision.ops.boxes.nms(boxes, scores, iou_thres)

        # Sort by confidence score of bounding boxes
        order = scores.argsort()


        start_x, start_y, end_x, end_y = boxes[:,0], boxes[:,1], boxes[:,2], boxes[:,3]
        # Compute areas of bounding boxes
        areas = (end_x - start_x + 1) * (end_y - start_y + 1)

        # print(order)
        # print(order.shape)

        # Iterate bounding boxes
        while order.shape[0] > 0:
            # The index of largest confidence score
            index = order[-1]

            # Pick the bounding box with largest confidence score
            indexs[index] = 1

            x1 = torch.max(start_x[index], start_x[order[:-1]])
            x2 = torch.min(end_x[index], end_x[order[:-1]])
            y1 = torch.max(start_y[index], start_y[order[:-1]])
            y2 = torch.min(end_y[index], end_y[order[:-1]])

            # Compute areas of intersection-over-union
            w = torch.max(torch.tensor(0.0), x2 - x1 + 1)
            h = torch.max(torch.tensor(0.0), y2 - y1 + 1)
            intersection = w * h

            # Compute the ratio between intersection and union
            ratio = intersection / (areas[index] + areas[order[:-1]] - intersection)

            left = torch.where(ratio < iou_thres)
            # print(left, ratio, iou_thres)
            order = order[left[0]]

        # if len(i) > max_det:  # limit detections
        #     i = i[:max_det]

        return x[indexs == 1]

# Save as a pt file
non_max_suppression.save('filter.pt')

# Verify the model
non_max_suppression(jitRes)

The output is:

tensor([[ 31.0141, 159.4193, 361.3859, 515.5379,   0.8935,   0.0000]])

The first four values are the box coordinates, the fifth is the confidence score, and the sixth is the class.
One thing worth noting: torchvision provides a ready-made non-maximum suppression function, torchvision.ops.boxes.nms, and it compiles to TorchScript without issue, but when the model is integrated into the Android app it complains that the function is unsupported. My understanding is that the PyTorch Android runtime simply does not ship this operator yet, so there was no choice but to hand-write the non-maximum suppression.
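
For reference, each row of the tensor returned above can be unpacked like this. This is only a minimal sketch (the loop and the print format are mine, not code from the repo); note that the coordinates refer to the 320x320 letterboxed input, not the original photo:

detections = non_max_suppression(jitRes)
for det in detections:
    x1, y1, x2, y2, score, cls = det.tolist()   # [x1, y1, x2, y2, score, class_id]
    print(f"box=({x1:.1f}, {y1:.1f}, {x2:.1f}, {y2:.1f})  score={score:.3f}  class={int(cls)}")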

Result post-processing

Strictly speaking, this article could end with the model conversion above. The final result processing, like the data pre-processing at the beginning, involves OpenCV operations and therefore cannot be converted with TorchScript, so it will be covered in the next article. Stay tuned.

Merging the models

After all of the above, you end up with three files:

  1. best_torchscript.pt: the network model
  2. best_resultsProcessing.pt: filters out useless data
  3. filter.pt: selects the main bounding boxes

If you can put up with loading three files in the Android app, and replacing all three whenever the model changes, feel free to skip the rest of this section. But for someone as obsessive as me, there is no way to tolerate hassle caused purely by a lack of knowledge, so the digging continued.
The code below wraps all of the functions and the model into a single class, so that the saved file merges the three files above into one:

class Yolov5Model(torch.jit.ScriptModule):
    def __init__(self, model, stride, anchor_grid, grid, no):
        super(Yolov5Model, self).__init__()
        self.model = model
        self.stride, self.anchor_grid, self.grid, self.no = stride, anchor_grid, grid, no

    @torch.jit.script_method
    def forward(self, img):
        preResult = self.model.forward(img)
        boxs = self.resultsProcessing(preResult[0],preResult[1],preResult[2])
        filterResult = self.non_max_suppression(boxs)
        return filterResult

    @torch.jit.script_method
    def _make_grid(self, nx:int=20, ny:int=20):
        yv, xv = torch.meshgrid([torch.arange(ny), torch.arange(nx)])
        return torch.stack((xv, yv), 2).view((1, 1, ny, nx, 2)).float()

    @torch.jit.script_method
    def _processing(self, x, stride, anchor_grid, grid):
        bs, na, ny, nx, _ = x.shape
        if grid.shape[2:4] != x.shape[2:4]:
            grid = self._make_grid(nx, ny)
        y = x.sigmoid()
        y[..., 0:2].mul_(2.).sub_(0.5).add_(grid).mul_(stride)
        # y[..., 0:2] = (y[..., 0:2] * 2. - 0.5 + grid) * stride
        y[..., 2:4].mul_(2.).pow_(2).mul_(anchor_grid)
        # y[..., 2:4] = (y[..., 2:4] * 2) ** 2 * anchor_grid
        return y.view(bs, -1, self.no)

    @torch.jit.script_method
    def resultsProcessing(self, x, y, z):
        result = []
        result.append(self._processing(x, self.stride[0], self.anchor_grid[0], self.grid[0]))
        result.append(self._processing(y, self.stride[1], self.anchor_grid[1], self.grid[1]))
        result.append(self._processing(z, self.stride[2], self.anchor_grid[2], self.grid[2]))
        return torch.cat(result, 1)

    @torch.jit.script_method
    def xywh2xyxy(self, x):
        y = torch.zeros_like(x)
        y[:, 0].copy_( x[:, 0] - x[:, 2] / 2 )  # top left x
        y[:, 1].copy_( x[:, 1] - x[:, 3] / 2 )  # top left y
        y[:, 2].copy_( x[:, 0] + x[:, 2] / 2 )  # bottom right x
        y[:, 3].copy_( x[:, 1] + x[:, 3] / 2 )  # bottom right y
        return y

    @torch.jit.script_method
    def non_max_suppression(self, prediction):
        conf_thres = 0.4
        iou_thres = 0.5
        max_det = 300

        if prediction.dtype is torch.float16:
            prediction = prediction.float()  # to FP32

        nc = prediction[0].shape[1] - 5  # number of classes
        xc = prediction[..., 4] > conf_thres  # candidates

        output = None
        for xi, x in enumerate(prediction):  # image index, image inference

            x = x[xc[xi]]  # confidence

            # If none remain process next image
            if not x.shape[0]:
                continue

            # Compute conf
            x[:, 5:].mul_( x[:, 4:5] )  # conf = obj_conf * cls_conf

            # Box (center x, center y, width, height) to (x1, y1, x2, y2)
            box = self.xywh2xyxy(x[:, :4])

            conf, j = x[:, 5:].max(1, keepdim=True)
            x = torch.cat((box, conf, j.float()), 1)[conf.view(-1) > conf_thres]

            # If none remain process next image
            if not x.shape[0]:  # number of boxes
                continue

            # Picked bounding boxes
            indexs = torch.zeros(x.shape[0])

            boxes, scores = x[:, :4], x[:, 4]  # boxes (offset by class), scores
            # i = torchvision.ops.boxes.nms(boxes, scores, iou_thres)

            # Sort by confidence score of bounding boxes
            order = scores.argsort()


            start_x, start_y, end_x, end_y = boxes[:,0], boxes[:,1], boxes[:,2], boxes[:,3]
            # Compute areas of bounding boxes
            areas = (end_x - start_x + 1) * (end_y - start_y + 1)

            # print(order)
            # print(order.shape)

            # Iterate bounding boxes
            while order.shape[0] > 0:
                # The index of largest confidence score
                index = order[-1]

                # Pick the bounding box with largest confidence score
                indexs[index] = 1

                x1 = torch.max(start_x[index], start_x[order[:-1]])
                x2 = torch.min(end_x[index], end_x[order[:-1]])
                y1 = torch.max(start_y[index], start_y[order[:-1]])
                y2 = torch.min(end_y[index], end_y[order[:-1]])

                # Compute areas of intersection-over-union
                w = torch.max(torch.tensor(0.0), x2 - x1 + 1)
                h = torch.max(torch.tensor(0.0), y2 - y1 + 1)
                intersection = w * h

                # Compute the ratio between intersection and union
                ratio = intersection / (areas[index] + areas[order[:-1]] - intersection)

                left = torch.where(ratio < iou_thres)
                # print(left, ratio, iou_thres)
                order = order[left[0]]

            # if len(i) > max_det:  # limit detections
            #     i = i[:max_det]

            return x[indexs == 1]

Then convert and verify it:

# Load the traced model
jitModel = torch.jit.load('best_torchscript.pt')

# Load the relevant parameters
stride, anchor_grid, grid, no, _make_grid = model.model[-1].stride, model.model[-1].anchor_grid, model.model[-1].grid, model.model[-1].no, model.model[-1]._make_grid

# Initialize the model (this also compiles it)
completeModel = Yolov5Model(jitModel,stride, anchor_grid, grid, no)

# Save the model
completeModel.save('best_Complete.pt')

# Verify the result
loadModel = torch.jit.load('best_Complete.pt')
loadModel.forward(img)

Barring surprises, you will get the same result as before, and a single best_Complete.pt file now does the work of the previous three.
Note that the network still has to be compiled with torch.jit.trace first and then loaded back in; this is probably down to gaps in my knowledge, and I would be glad to hear from anyone who understands this better.

Summary

Nearly 20,000 characters later, we have finally reached the summary. This article writes down pretty much everything I learned about TorchScript over the past few weeks, even though I know it is only the tip of the iceberg.
Although the main goal of this article is to integrate the yolov5s model into an Android app, it can also serve as an introduction to TorchScript. The article is a bit long-winded and some parts could be more concise, mainly because I wanted the problems I hit during exploration to leave an impression, and because those problems genuinely involve knowledge that cannot be skipped. I hope you still found it rewarding.