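This project builds a YOLOv3 object detection model with the MindSpore framework and trains it on the person detection data extracted from the PASCAL VOC 2012 dataset, producing a person detection model. I hope it makes a small contribution to the MindSpore ecosystem.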

1. Environment setup

Use MindSpore 1.5 or 1.6 with a GPU as the hardware. Follow https://www.mindspore.cn/install to install it for your local environment.

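I used Huawei Cloud -> ModelArts -> Development Environment -> Notebook. The advantage of this product is that the environment, including the MindSpore framework, comes preinstalled. I chose the flavor GPU: 1*V100(32GB)|CPU: 8 cores 64GB. Running the whole project on this flavor costs roughly 100 to 200 yuan (I don't remember the exact figure; it may be less), and it is fairly fast. To upgrade MindSpore, again see https://www.mindspore.cn/install .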

2. Dataset processing

The PASCAL VOC 2012 dataset contains 5,717 training images and 5,823 validation images, covering 20 object classes.

2.1 Extracting the "person" detection data and clustering the boxes

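We first extract the images that contain a "person" object from the dataset. We then go through all "person" boxes and, treating each box's width and height as a 2-D point, pick 9 cluster centers. The coordinates of these 9 centers serve as the YOLOv3 anchor (prior) box sizes.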
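The clustering step can be sketched as below. This is a minimal illustration, not the cluster.py from the attachment: it assumes k-means with 1 - IoU as the distance (the choice made in the YOLO papers), and the names iou_wh, kmeans_anchors and the toy data are made up for the example.

```python
import random


def iou_wh(box, center):
    """IoU of two boxes given as (w, h), both placed at the same corner."""
    inter = min(box[0], center[0]) * min(box[1], center[1])
    union = box[0] * box[1] + center[0] * center[1] - inter
    return inter / union


def kmeans_anchors(boxes, k=9, iters=100, seed=0):
    """Cluster (w, h) pairs into k anchors, using 1 - IoU as the distance."""
    rng = random.Random(seed)
    centers = rng.sample(boxes, k)
    for _ in range(iters):
        # Assign every box to the center it overlaps most.
        groups = [[] for _ in range(k)]
        for box in boxes:
            best = max(range(k), key=lambda i: iou_wh(box, centers[i]))
            groups[best].append(box)
        # Move each center to the mean of its group (keep it if the group is empty).
        new_centers = []
        for center, group in zip(centers, groups):
            if group:
                w = sum(b[0] for b in group) / len(group)
                h = sum(b[1] for b in group) / len(group)
                new_centers.append((w, h))
            else:
                new_centers.append(center)
        if new_centers == centers:
            break
        centers = new_centers
    # YOLOv3 conventionally lists anchors from small to large area.
    return sorted(centers, key=lambda c: c[0] * c[1])


# Toy data: hypothetical (width, height) pairs in pixels, not real VOC boxes.
toy_rng = random.Random(1)
toy_boxes = [(toy_rng.uniform(10, 400), toy_rng.uniform(20, 400)) for _ in range(500)]
anchors = kmeans_anchors(toy_boxes)
```

With real data, toy_boxes would be replaced by the widths and heights of all "person" boxes, usually after scaling them to the network input size.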

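The code for this part is in the 附件\choose_person folder of the attachment. Put the Python files from that folder into the VOCtrainval_11-May-2012\VOCdevkit directory of the VOC2012 dataset (the directory that contains a VOC2012 folder with ImageSets and other subdirectories), then run cluster.py and rand_choose.py one after the other (see readme.txt in the attachment). Running both scripts produces txt files similar to train.txt and val.txt under VOCtrainval_11-May-2012\VOCdevkit\VOC2012\ImageSets\Main\ ; place them in that Main\ directory.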
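For readers without the attachment, the extraction of "person" boxes from one VOC annotation file could look roughly like this. It is a sketch under assumptions: it relies only on the standard VOC XML layout (object / name / bndbox), and person_boxes and the sample annotation are invented for illustration; the attachment's scripts may work differently.

```python
import xml.etree.ElementTree as ET


def person_boxes(xml_text):
    """Return [(xmin, ymin, xmax, ymax), ...] for every "person" object."""
    root = ET.fromstring(xml_text)
    boxes = []
    for obj in root.iter("object"):
        if obj.findtext("name") != "person":
            continue
        bb = obj.find("bndbox")
        boxes.append(tuple(int(float(bb.findtext(tag)))
                           for tag in ("xmin", "ymin", "xmax", "ymax")))
    return boxes


# A made-up annotation in the VOC layout, with one person and one dog.
SAMPLE = """<annotation>
  <filename>example.jpg</filename>
  <object><name>person</name>
    <bndbox><xmin>48</xmin><ymin>30</ymin><xmax>195</xmax><ymax>371</ymax></bndbox>
  </object>
  <object><name>dog</name>
    <bndbox><xmin>8</xmin><ymin>12</ymin><xmax>352</xmax><ymax>498</ymax></bndbox>
  </object>
</annotation>"""

boxes = person_boxes(SAMPLE)
# Width and height of each person box: the 2-D points that get clustered.
sizes = [(x2 - x1, y2 - y1) for x1, y1, x2, y2 in boxes]
```

An image would be kept in the new list file only if person_boxes returns at least one box for its annotation.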

2.2 Loading the dataset

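MindSpore provides an interface for loading the PASCAL VOC dataset, VOCDataset; see the documentation on the MindSpore website for how to use it. Here we add some supporting processing and wrapping around it (see 附件\dataset\voc2012_dataset.py in the attachment):

"""Read the VOC2012 dataset."""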

import mindspore.dataset as ds
import mindspore.dataset.vision.c_transforms as CV

import cv2
import numpy as np

# reshape_fn and MultiScaleTrans come from 附件\dataset\transforms.py
from transforms import reshape_fn, MultiScaleTrans

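def create_voc2012_dataset(config, cv_num):
    """Create the VOC2012 dataset; return (dataset, number of images)."""

    # Read the images and detection labels listed in ImageSets/Main/<data_usage>.txt.
    voc2012_dat = ds.VOCDataset(dataset_dir=config.data_path, task="Detection", usage=config.data_usage,
                                shuffle=config.data_training, num_parallel_workers=8)
    dataset_size = voc2012_dat.get_dataset_size()
    config.class_to_idx = voc2012_dat.get_class_indexing()

    # Disable OpenCV's own threading; parallelism comes from the dataset workers.
    cv2.setNumThreads(0)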
    if config.data_training:

        # Per-batch multi-scale augmentation; it also builds the YOLOv3 training targets.
        multi_scale_trans = MultiScaleTrans(config, cv_num)

        dataset_input_column_names = ["image", "bbox", "label", "truncate", "difficult"]
        dataset_output_column_names = ["image", "annotation", "bbox1", "bbox2", "bbox3", "gt_box1", "gt_box2", "gt_box3"]
        voc2012_dat = voc2012_dat.map(operations=CV.Decode(), input_columns=["image"])
        voc2012_dat = voc2012_dat.batch(config.batch_size, per_batch_map=multi_scale_trans,
                                        input_columns=dataset_input_column_names,
                                        output_columns=dataset_output_column_names,
                                        num_parallel_workers=8, drop_remainder=True)

        # Repeat for the epochs that remain after the pretrained ones.
        voc2012_dat = voc2012_dat.repeat(config.max_epoch - config.pretrained_epoch_num)
        
    else:

        # Attach a running index to every image so results can be traced back to the list file.
        img_id = np.arange(dataset_size).reshape((-1, 1))
        img_id = ds.GeneratorDataset(img_id, ['img_id'], shuffle=False)
        voc2012_dat = voc2012_dat.zip(img_id)

        # Resize each decoded image and keep its original shape; other columns are dropped.
        compose_map_func = (lambda image, img_id: reshape_fn(image, img_id, config))
        voc2012_dat = voc2012_dat.map(operations=CV.Decode(), input_columns=["image"], num_parallel_workers=8)
        voc2012_dat = voc2012_dat.map(operations=compose_map_func, input_columns=["image", "img_id"],
                                      output_columns=["image", "image_shape", "img_id"],
                                      column_order=["image", "image_shape", "img_id"],
                                      num_parallel_workers=8)

        # The network expects channel-first images.
        hwc_to_chw = CV.HWC2CHW()
        voc2012_dat = voc2012_dat.map(operations=hwc_to_chw, input_columns=["image"], num_parallel_workers=8)
        voc2012_dat = voc2012_dat.batch(config.batch_size, drop_remainder=True)
        voc2012_dat = voc2012_dat.repeat(1)
        
    return voc2012_dat, dataset_size

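Roughly speaking, this function turns the dataset read by VOCDataset into the form we need, which is the returned voc2012_dat. dataset_size is the size of the dataset that was read (the number of images).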

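For training (config.data_training==True), the dataset must first be grouped into batches that can be returned as MindSpore Tensors, which requires resizing all images in a batch to the same size. Each input image corresponds to output feature maps at three scales, namely "bbox1", "bbox2" and "bbox3" in the function above. What create_voc2012_dataset does is map the labeled boxes in the dataset onto the feature map of the matching scale according to their position and size: each labeled box is compared by IoU against the chosen anchor boxes and assigned to the position of the anchor with the highest IoU (see the _preprocess_true_boxes function in 附件\dataset\transforms.py). Finally, the "gt_box" columns ("gt_box1", "gt_box2", "gt_box3") store all labeled boxes as Tensors for later computation.

Training images also need augmentation, including random scaling, flipping, rotation, cropping, translation and random color changes, and the labeled boxes must be transformed accordingly. All of this lives in 附件\dataset\transforms.py and is ultimately invoked through MultiScaleTrans. That file comes from https://gitee.com/mindspore/models/blob/r1.5/official/cv/yolov3_darknet53/src/transforms.py ; I changed only a little, for example adding code that converts the box format read by VOCDataset, [x y w h], into [xmin ymin xmax ymax].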
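The box-format conversion and the IoU-based anchor matching described above can be sketched in a few lines. This is a simplified illustration and not the code of _preprocess_true_boxes: it assumes (x, y) in [x y w h] is the top-left corner, uses the default anchors from the YOLOv3 paper instead of the ones clustered in 2.1, and assumes three anchors per scale with strides 8, 16 and 32. The function names are made up.

```python
# Default YOLOv3 anchors (width, height), from small to large; an assumption here.
ANCHORS = [(10, 13), (16, 30), (33, 23),
           (30, 61), (62, 45), (59, 119),
           (116, 90), (156, 198), (373, 326)]


def xywh_to_corners(box):
    """[x, y, w, h] with (x, y) the top-left corner -> [xmin, ymin, xmax, ymax]."""
    x, y, w, h = box
    return [x, y, x + w, y + h]


def shape_iou(wh_a, wh_b):
    """IoU of two boxes compared by shape only (both placed at the same corner)."""
    inter = min(wh_a[0], wh_b[0]) * min(wh_a[1], wh_b[1])
    return inter / (wh_a[0] * wh_a[1] + wh_b[0] * wh_b[1] - inter)


def assign(corners, anchors=ANCHORS):
    """Return (anchor index, stride, grid column, grid row) for one labeled box."""
    xmin, ymin, xmax, ymax = corners
    w, h = xmax - xmin, ymax - ymin
    # The anchor whose shape overlaps the box most wins.
    best = max(range(len(anchors)), key=lambda i: shape_iou((w, h), anchors[i]))
    # Anchors 0-2 go to the stride-8 map, 3-5 to stride 16, 6-8 to stride 32.
    stride = 8 * 2 ** (best // 3)
    cx, cy = (xmin + xmax) / 2, (ymin + ymax) / 2
    return best, stride, int(cx // stride), int(cy // stride)


corners = xywh_to_corners([100, 120, 60, 120])
target = assign(corners)
```

Here a 60 x 120 box is closest in shape to the (59, 119) anchor, so it lands on the stride-16 feature map, in the grid cell that contains the box center.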

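For testing, images can likewise be batched and must also be resized to a common size. In addition, the original image shape is kept (the "image_shape" column in the function above) so that coordinates predicted at test time can be mapped back onto the original image. "img_id" is a number from 0 to n-1 (where n is the dataset size, dataset_size). It gives the line on which the image is listed in the file that config.data_usage points to (the txt file generated in 2.1), which in turn identifies the image.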
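How these two test-time columns might be used is sketched below. Several things are assumptions rather than facts about the attachment: predictions are taken to be normalized to [0, 1] relative to the resized input, the resize is a plain stretch without padding, and the original width and height are passed explicitly because the ordering inside "image_shape" should be checked against transforms.py. The image IDs are placeholders.

```python
def to_original(box_norm, orig_w, orig_h):
    """Scale a normalized [xmin, ymin, xmax, ymax] box to pixels of the original image."""
    xmin, ymin, xmax, ymax = box_norm
    return [xmin * orig_w, ymin * orig_h, xmax * orig_w, ymax * orig_h]


def image_name(img_id, list_lines):
    """Look up the image ID on line img_id (0-based) of the list file."""
    # Take the first field so that lines carrying an extra flag column also work.
    return list_lines[img_id].split()[0]


# Placeholder list-file content; the real IDs come from the txt file generated in 2.1.
lines = ["2008_000008", "2008_000015", "2008_000019"]
name = image_name(1, lines)
box = to_original([0.25, 0.5, 0.75, 1.0], orig_w=500, orig_h=375)
```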

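The meaning and settings of the config parameter, and how the dataset is actually used, are covered in the later sections on model training and model testing.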
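Until then, the fields that create_voc2012_dataset itself reads from config can be collected into a small stand-in object. This is only a sketch: the field names are taken from the function above, but every value is a placeholder, and MultiScaleTrans and reshape_fn read further fields that are not shown here.

```python
from types import SimpleNamespace

# All values are placeholders for illustration, not the settings used in this project.
train_config = SimpleNamespace(
    data_path="/path/to/VOCdevkit/VOC2012",  # root folder passed to VOCDataset
    data_usage="train",                      # name of the list file in ImageSets/Main, without .txt
    data_training=True,                      # True: training branch, False: test branch
    batch_size=32,
    max_epoch=100,
    pretrained_epoch_num=20,
)

# The training branch repeats the dataset for the epochs that are still to run.
repeat_count = train_config.max_epoch - train_config.pretrained_epoch_num
```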

3. Building the model

This project uses Darknet53 as the backbone of YOLOv3. The structures of Darknet53 and of the YOLOv3 model are shown in the figures below:

 

[Figure: structure diagrams of Darknet53 and the YOLOv3 model]

 

 

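Now let's build the model. This project is based on https://gitee.com/mindspore/models/tree/r1.5/official/cv/yolov3_darknet53 ; going up the directory tree from there you can find many common models implemented in MindSpore. In most cases you can simply convert your data to their format and use their models directly.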

3.1 Darknet53

"""YOLOv3 backbone: darknet53"""

import mindspore.nn as nn
from mindspore.ops import operations as P

def conv_block(in_channels,
               out_channels,
               kernel_size,
               stride,
               dilation=1):
    """Get a conv2d batchnorm and relu layer"""
    pad_mode = 'same'
    padding = 0

    return nn.SequentialCell(
        [nn.Conv2d(in_channels,
                   out_channels,
                   kernel_size=kernel_size,
                   stride=stride,
                   padding=padding,
                   dilation=dilation,
                   pad_mode=pad_mode),
         nn.BatchNorm2d(out_channels, momentum=0.1),
         nn.ReLU()]
    )
    
class ResidualBlock(nn.Cell):
    """
    DarkNet V1 residual block definition.

    Args:
        in_channels: Integer. Input channel.
        out_channels: Integer. Output channel. Must equal in_channels so the skip connection can be added.

    Returns:
        Tensor, output tensor.
    Examples:
        ResidualBlock(64, 64)
    """
    expansion = 4

    def __init__(self,
                 in_channels,
                 out_channels):

        super(ResidualBlock, self).__init__()
        out_chls = out_channels//2
        self.conv1 = conv_block(in_channels, out_chls, kernel_size=1, stride=1)
        self.conv2 = conv_block(out_chls, out_channels, kernel_size=3, stride=1)
        self.add = P.Add()

    def construct(self, x):
        identity = x
        out = self.conv1(x)
        out = self.conv2(out)
        out = self.add(out, identity)

        return out

class DarkNet(nn.Cell):
    """
    DarkNet V1 network.

    Args:
        block: Cell. Block for network.
        layer_nums: List. Numbers of different layers.
        in_channels: List. Input channels of the five stages.
        out_channels: List. Output channels of the five stages.
        detect: Bool. Whether detect or not. Default:False.

    Returns:
        Tuple of three feature maps (c7, c9, c11) if detect is True, otherwise the final feature map c11.

    Examples:
        DarkNet(ResidualBlock,
               [1, 2, 8, 8, 4],
               [32, 64, 128, 256, 512],
               [64, 128, 256, 512, 1024],
               detect=True)
    """
    def __init__(self,
                 block,
                 layer_nums,
                 in_channels,
                 out_channels,
                 detect=False):
        super(DarkNet, self).__init__()

        self.outchannel = out_channels[-1]
        self.detect = detect

        if not len(layer_nums) == len(in_channels) == len(out_channels) == 5:
            raise ValueError("the length of layer_num, inchannel, outchannel list must be 5!")
        self.conv0 = conv_block(3,
                                in_channels[0],
                                kernel_size=3,
                                stride=1)
        self.conv1 = conv_block(in_channels[0],
                                out_channels[0],
                                kernel_size=3,
                                stride=2)
        self.layer1 = self._make_layer(block,
                                       layer_nums[0],
                                       in_channel=out_channels[0],
                                       out_channel=out_channels[0])
        self.conv2 = conv_block(in_channels[1],
                                out_channels[1],
                                kernel_size=3,
                                stride=2)
        self.layer2 = self._make_layer(block,
                                       layer_nums[1],
                                       in_channel=out_channels[1],
                                       out_channel=out_channels[1])
        self.conv3 = conv_block(in_channels[2],
                                out_channels[2],
                                kernel_size=3,
                                stride=2)
        self.layer3 = self._make_layer(block,
                                       layer_nums[2],
                                       in_channel=out_channels[2],
                                       out_channel=out_channels[2])
        self.conv4 = conv_block(in_channels[3],
                                out_channels[3],
                                kernel_size=3,
                                stride=2)
        self.layer4 = self._make_layer(block,
                                       layer_nums[3],
                                       in_channel=out_channels[3],
                                       out_channel=out_channels[3])
        self.conv5 = conv_block(in_channels[4],
                                out_channels[4],
                                kernel_size=3,
                                stride=2)
        self.layer5 = self._make_layer(block,
                                       layer_nums[4],
                                       in_channel=out_channels[4],
                                       out_channel=out_channels[4])

    def _make_layer(self, block, layer_num, in_channel, out_channel):
        """
        Make Layer for DarkNet.

        :param block: Cell. DarkNet block.
        :param layer_num: Integer. Layer number.
        :param in_channel: Integer. Input channel.
        :param out_channel: Integer. Output channel.

        Examples:
            _make_layer(ResidualBlock, 1, 128, 128)
        """
        layers = []
        darkblk = block(in_channel, out_channel)
        layers.append(darkblk)

        for _ in range(1, layer_num):
            darkblk = block(out_channel, out_channel)
            layers.append(darkblk)

        return nn.SequentialCell(layers)

    def construct(self, x):
        c1 = self.conv0(x)
        c2 = self.conv1(c1)
        c3 = self.layer1(c2)
        c4 = self.conv2(c3)
        c5 = self.layer2(c4)
        c6 = self.conv3(c5)
        c7 = self.layer3(c6)
        c8 = self.conv4(c7)
        c9 = self.layer4(c8)
        c10 = self.conv5(c9)
        c11 = self.layer5(c10)
        if self.detect:
            return c7, c9, c11

        return c11

    def get_out_channels(self):
        return self.outchannel

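def get_darknet53(detect=False):
    """
    Get DarkNet53 neural network.

    Args:
        detect: Bool. If True, the network returns the three feature maps used by YOLOv3.

    Returns:
        Cell, cell instance of DarkNet53 neural network.

    Examples:
        get_darknet53(detect=True)
    """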
    return DarkNet(ResidualBlock, [1, 2, 8, 8, 4],
                   [32, 64, 128, 256, 512],
                   [64, 128, 256, 512, 1024], detect)
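As a quick sanity check of the listing above, the arithmetic behind the layer count and the feature map sizes can be worked out without running MindSpore. The helper feature_sizes is made up for this check; it relies only on the fact that a stride-2 convolution with 'same' padding produces an output of size ceil(input / 2).

```python
layer_nums = [1, 2, 8, 8, 4]

# conv0, plus one stride-2 conv per stage, plus two convs in every residual block.
conv_layers = 1 + len(layer_nums) + 2 * sum(layer_nums)


def feature_sizes(input_size):
    """Spatial sizes of (c7, c9, c11) for a square input of the given size."""
    sizes = []
    size = input_size
    for stage in range(1, 6):
        size = -(-size // 2)  # stride-2 conv with 'same' padding: ceil(size / 2)
        if stage >= 3:
            sizes.append(size)  # outputs of layer3, layer4 and layer5
    return tuple(sizes)


sizes_416 = feature_sizes(416)
```

conv_layers comes out as 52; the "53" in the name counts the fully connected layer of the original classification network, which is not needed for detection. For a 416 x 416 input, c7, c9 and c11 are 52 x 52, 26 x 26 and 13 x 13, with 256, 512 and 1024 channels respectively.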

(To be continued; see "YOLOv3 Person Detection Model Implementation (Part 2)".)