Preface

In the previous installment, we briefly covered the principles of the YOLO v3 object detection algorithm and built the feature extraction network together. This time we'll walk through the feature transformation, non-maximum suppression, and other utility functions. Let's learn together!

Oh, one more thing I'd like to mention: going forward, I plan to occasionally mix in some experiences and reflections from my vehicle engineering major. I hope you enjoy them.

Good at Nothing, First in Line to Fail

Today let me tell you about the course with the highest failure rate in our vehicle engineering program: Automotive Testing Technology. Based on my many days of painful experience, this course has two difficulties:

  • You can study hard and still not understand it
  • You can understand it and still not know how to apply it

Honestly, everything about this course is great: the textbook is beautifully printed, the lectures are brilliant, and even the exam proctors provide attentive service. That's why so many people top up to VVVIP and buy a whole extra semester of the service.

Many of you may not be familiar with Automotive Testing Technology, so here is a high-level summary of the course:

It examines the relationships among input signals, output signals, and systems, along with their respective characteristics, from both the time-domain and the frequency-domain perspectives.



Figure 1: mind map of the course content

Looking at the mind map, the knowledge structure is actually quite clear, right?

Some say the best way to learn is to explain what you've learned to someone else.

Inspired by that, after class I explained Automotive Testing Technology to an empty classroom, over and over again.

After all, who else would have the patience to listen to me?

util.ipynb

From the Automotive Testing Technology point of view, everything can be framed as: input --> system --> output. What we cover today is the "system" part of the YOLO v3 object detection algorithm.

Haha, don't worry everyone, this one is a time-invariant system. Let's get started!

Importing the third-party libraries


from __future__ import division
import torch
from torchvision import transforms
import cv2
import numpy as np


Building the unique-value function


def unique(tensor): # unique values, sorted in ascending order
    tensor_np = tensor.numpy() # Tensor --> ndarray
    unique_np = np.unique(tensor_np) # np.unique both deduplicates and sorts
    unique_tensor = torch.from_numpy(unique_np) # ndarray --> Tensor
    tensor_res = tensor.new(unique_tensor.shape) # dtype from the input, shape from unique_tensor
    tensor_res.copy_(unique_tensor)
    return tensor_res


This block extracts the unique values of the input tensor and sorts them in ascending order; it is mainly used later to extract the set of classes detected in an image.
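
A quick sanity check (the values here are made up for illustration): the class-index column of a detection tensor usually contains duplicates, and we only want each class to appear once.

classes = torch.Tensor([16.,1.,16.,7.,1.])
print(unique(classes)) # tensor([ 1.,  7., 16.])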

Building the IoU evaluation function


def bbox_iou(box1,box2): # boxes in corner format (x1,y1,x2,y2)
    # corner coordinates of both sets of boxes
    b1_x1,b1_y1,b1_x2,b1_y2 = box1[:,0],box1[:,1],box1[:,2],box1[:,3]
    b2_x1,b2_y1,b2_x2,b2_y2 = box2[:,0],box2[:,1],box2[:,2],box2[:,3]
    # corners of the intersection rectangle
    inter_rect_x1 = torch.max(b1_x1,b2_x1)
    inter_rect_y1 = torch.max(b1_y1,b2_y1)
    inter_rect_x2 = torch.min(b1_x2,b2_x2)
    inter_rect_y2 = torch.min(b1_y2,b2_y2)
    # clamp at 0 so disjoint boxes contribute zero intersection;
    # the +1 treats coordinates as inclusive pixel indices
    inter_area = torch.clamp(inter_rect_x2-inter_rect_x1+1,min=0) * torch.clamp(inter_rect_y2-inter_rect_y1+1,min=0)
    b1_area = (b1_x2 - b1_x1 + 1) * (b1_y2 - b1_y1 + 1)
    b2_area = (b2_x2 - b2_x1 + 1) * (b2_y2 - b2_y1 + 1)
    iou = inter_area / (b1_area + b2_area - inter_area)
    return iou


The IoU function measures how closely a predicted box fits the ground-truth box. The formula is:

$$\mathrm{IoU} = \frac{\mathrm{Area}(B_{gt} \cap B_{pred})}{\mathrm{Area}(B_{gt} \cup B_{pred})}$$

That is, the ratio of the intersection of the ground-truth and predicted box areas to their union.
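
A tiny worked example (the boxes are made up): two 101x101 boxes offset by 50 pixels along x overlap in a 51x101 strip, so the IoU is 5151 / (10201 + 10201 - 5151) ≈ 0.34.

a = torch.Tensor([[0.,0.,100.,100.]])
b = torch.Tensor([[50.,0.,150.,100.]])
print(bbox_iou(a,b)) # tensor([0.3378])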

Building the feature transformation function (Key Point 1)

Recall from last time: convolutional layer --> yolo layer

This is the feature-map output layer; it applies the feature transformation to the feature map passed down from the preceding convolutional layer.


def predict_transform(prediction,inp_dim,anchors,num_classes): # feature map --> image
    
    # prediction = (batch_n,bbox_attrs*num_anchors,grid_size,grid_size)
    # bbox_attrs = (tx,ty,tw,th,objectness score,p1...p80)
    # (grid_size,grid_size) = (13,13) / (26,26) / (52,52)
    # inp_dim = 416
    # anchors = [(x1,y1),(x2,y2),(x3,y3)]
    # num_classes = 80
    batch_size = prediction.shape[0]
    stride = inp_dim // prediction.shape[2] # 416 // 13 == 32
    grid_size = inp_dim // stride # 416 // 32 == 13
    bbox_attrs = num_classes + 5
    num_anchors = len(anchors)
    
    # prediction --> (batch_n,anchors,bbox_attrs)
    prediction = prediction.reshape(batch_size,bbox_attrs*num_anchors,grid_size*grid_size)
    prediction = prediction.transpose(1,2)
    prediction = prediction.reshape(batch_size,grid_size*grid_size*num_anchors,bbox_attrs)
    
    # sigmoid
    prediction[:,:,0] = torch.sigmoid(prediction[:,:,0]) # center_x
    prediction[:,:,1] = torch.sigmoid(prediction[:,:,1]) # center_y
    prediction[:,:,4] = torch.sigmoid(prediction[:,:,4]) # score
    prediction[:,:,5:] = torch.sigmoid(prediction[:,:,5:]) # p_1 --> p_80

    
    # x_y_offset
    grid = np.arange(grid_size)
    a,b = np.meshgrid(grid,grid)
    x_offset = torch.Tensor(a).reshape(-1,1)
    y_offset = torch.Tensor(b).reshape(-1,1)
    x_y_offset = torch.cat((x_offset,y_offset),1).repeat(1,num_anchors).reshape(-1,2).unsqueeze(0)
    
    # bx = sigmoid(tx) + cx, by = sigmoid(ty) + cy
    prediction[:,:,:2] += x_y_offset
    
    # scale anchors to feature-map units, anchors = [(w1,h1),(w2,h2),(w3,h3)]
    anchors = [(x[0] / stride,x[1] / stride) for x in anchors]
    anchors = torch.Tensor(anchors)
    anchors = anchors.repeat(grid_size*grid_size,1).unsqueeze(0)
    
    # bw = e^tw * pw, bh = e^th * ph
    prediction[:,:,2:4] = torch.exp(prediction[:,:,2:4]) * anchors
    
    # feature map shape --> image shape
    prediction[:,:,:4] *= stride
    
    return prediction


We must understand what bbox_attrs means!


Figure 2: the layout of bbox_attrs


Parameter description (see the slicing sketch after this list):

  • The first four attributes are the anchor box center coordinates plus its width and height
  • The fifth attribute is the confidence that the box contains an object (foreground)
  • The last eighty attributes are per-class confidences, i.e. given that an object is present, which class it belongs to
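
Here is a minimal slicing sketch (shapes assumed for the 13x13 scale with 3 anchors and COCO's 80 classes), showing how those three groups of attributes are pulled out of a transformed prediction:

pred = torch.rand(1,13*13*3,85) # (batch_n,anchors,bbox_attrs)
boxes = pred[:,:,0:4] # center x, center y, width, height
score = pred[:,:,4] # objectness score
probs = pred[:,:,5:] # p1 ... p80, the class confidences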

Now that the meaning of bbox_attrs is clear, let's walk through what this block does:

Step 1: build the (batch_n, anchors, bbox_attrs) coordinate system:


Figure 3: reshaping the feature map into (batch_n, anchors, bbox_attrs)

Take a moment to really understand this figure:
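
If the figure is hard to picture, here is the same reshaping traced with concrete shapes (batch 1, 13x13 grid, 3 anchors, 80 classes, so 255 = 3*85 channels), mirroring the three reshape/transpose calls in predict_transform:

x = torch.rand(1,255,13,13) # (batch_n,bbox_attrs*num_anchors,grid,grid)
x = x.reshape(1,255,169) # flatten the grid: (1,255,13*13)
x = x.transpose(1,2) # (1,169,255)
x = x.reshape(1,507,85) # (1,13*13*3,85): one row per anchor box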


Step 2: the feature transformation:

$$b_x = \sigma(t_x) + c_x \qquad b_y = \sigma(t_y) + c_y$$

$$b_w = p_w e^{t_w} \qquad b_h = p_h e^{t_h}$$

Note: $(c_x, c_y)$ is the coordinate of the top-left corner of the grid cell, and $(p_w, p_h)$ are the anchor's width and height.
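
To see how the cell offsets $(c_x, c_y)$ are generated, here is the same meshgrid construction on an assumed tiny 2x2 grid with 3 anchors; every cell contributes its top-left corner once per anchor:

grid = np.arange(2)
a,b = np.meshgrid(grid,grid)
x_offset = torch.Tensor(a).reshape(-1,1) # column of cx values
y_offset = torch.Tensor(b).reshape(-1,1) # column of cy values
offset = torch.cat((x_offset,y_offset),1).repeat(1,3).reshape(-1,2).unsqueeze(0)
print(offset.shape) # torch.Size([1, 12, 2]) -- 2*2 cells x 3 anchors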


Step 3: map prediction back to the image scale

One thing worth noting: the anchors must first be scaled down to feature-map units before they can be combined with prediction.
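
For example, the standard YOLO v3 anchors for the 13x13 scale are given in input-image pixels, so they are divided by stride = 32 before the exponential step, and prediction[:,:,:4] is multiplied by 32 at the end to return to the 416x416 image:

anchors = [(116,90),(156,198),(373,326)] # input-image pixels
stride = 32
print([(w/stride,h/stride) for w,h in anchors])
# [(3.625, 2.8125), (4.875, 6.1875), (11.65625, 10.1875)]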

Building the non-maximum suppression function (Key Point 2)


# nms: convert (x,y,w,h) --> (x1,y1,x2,y2), then run per-class non-maximum suppression
def write_results(prediction,confidence,num_classes,nms_conf=0.4):
    
    # prediction -- image -- [batch_n,(anchor_1 + anchor_2 + anchor_3),bbox_attrs]
    # prediction = [batch_n,10647,85]
    # bbox_attrs = [bx,by,bw,bh,objectness score,p1..p80]
    # confidence = object confidence
    # num_classes = 80
    # nms_conf = 0.4
    
    # zero out predictions whose objectness score is below confidence
    conf_mask = (prediction[:,:,4] > confidence).float().unsqueeze(2)
    prediction = prediction * conf_mask
    
    # (x,y,w,h) --> (x1,y1,x2,y2)
    box_corner = prediction.new(prediction.shape)
    box_corner[:,:,0] = (prediction[:,:,0] - prediction[:,:,2] / 2)
    box_corner[:,:,1] = (prediction[:,:,1] - prediction[:,:,3] / 2)
    box_corner[:,:,2] = (prediction[:,:,0] + prediction[:,:,2] / 2)
    box_corner[:,:,3] = (prediction[:,:,1] + prediction[:,:,3] / 2)
    prediction[:,:,:4] = box_corner[:,:,:4]
    
    batch_size = prediction.shape[0]
    write = False
    for ind in range(batch_size):
        
        # image_pred = (anchors,bbox_attrs)
        image_pred = prediction[ind]
        
        # (x1,y1,x2,y2,score,p1-p80) --> (anchors,information)
        # (anchors,information) = (10647,7)
        # information = x1 + y1 + x2 + y2 + score + max_conf + max_conf_index
        max_conf,max_conf_index = torch.max(image_pred[:,5:],1)
        max_conf = max_conf.float().unsqueeze(1)
        max_conf_index = max_conf_index.float().unsqueeze(1)
        seq = (image_pred[:,:5],max_conf,max_conf_index)
        image_pred = torch.cat(seq,1)
        
        # get non_zero_image
        # image_pred_ = (anchors,information)
        non_zero_ind = (torch.nonzero(image_pred[:,4])) # non_zero_index
        try:
            image_pred_ = image_pred[non_zero_ind.squeeze(),:].reshape(-1,7)
        except:
            continue
        if image_pred_.shape[0] == 0:
            continue
            
        # get all object classes detected in this image (unique + sorted)
        img_classes = unique(image_pred_[:,-1])
        
        for cls in img_classes:
            
            # get image_pred_(anchors,information) == cls
            cls_mask = image_pred_ * (image_pred_[:,-1] == cls).float().unsqueeze(1)
            class_mask_ind = torch.nonzero(cls_mask[:,-2]).squeeze()
            image_pred_class = image_pred_[class_mask_ind].reshape(-1,7)
            
            # image_pred_classes = (anchors,information)
            # sort: big --> small
            conf_sort,conf_sort_index = torch.sort(image_pred_class[:,4],descending=True)
            image_pred_class = image_pred_class[conf_sort_index]
            idx = image_pred_class.shape[0]
            
            for i in range(idx):
                
                try:
                    ious = bbox_iou(image_pred_class[i].unsqueeze(0),image_pred_class[i+1:])
                except ValueError:
                    break
                except IndexError:
                    break
                
                # nms
                iou_mask = (ious < nms_conf).float().unsqueeze(1)
                image_pred_class[i+1:] *= iou_mask
                non_zero_ind = torch.nonzero(image_pred_class[:,4]).squeeze()
                image_pred_class = image_pred_class[non_zero_ind].reshape(-1,7)
                
            # image_pred_class = (anchors,information) <-- cls
            # ind in batch_n
            batch_ind = image_pred_class.new(image_pred_class.shape[0],1).fill_(ind)
            seq = (batch_ind,image_pred_class)
            if not write:
                output = torch.cat(seq,1)
                write = True
            else:
                out = torch.cat(seq,1)
                output = torch.cat((output,out))
            
    try:
        return output
    except NameError: # nothing was detected in any image of this batch
        return 0


Let's walk through what this block does:

Step 1: zero out the anchors that do not contain an object

Step 2: apply the following coordinate transformation:

$$x_1 = x - \frac{w}{2} \qquad y_1 = y - \frac{h}{2} \qquad x_2 = x + \frac{w}{2} \qquad y_2 = y + \frac{h}{2}$$
Step 3: the outer loop iterates over every image, the inner loop over every object class appearing in that image, and then the NMS computation runs (see the walkthrough after this list):

  1. Sort the anchors by objectness score from high to low
  2. Compute the IoU between the first anchor and all remaining anchors, and remove the anchors whose IoU is greater than nms_conf
  3. Compute the IoU between the (now) second anchor and all remaining anchors......
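
Here is a miniature walkthrough of one NMS round, using three made-up boxes of the same class, already sorted by score and in corner format:

boxes = torch.Tensor([[0.,0.,100.,100.,0.9], # kept: highest score
                      [10.,10.,110.,110.,0.8], # IoU with box 0 is ~0.68
                      [200.,200.,300.,300.,0.7]]) # no overlap with box 0
ious = bbox_iou(boxes[0:1,:4],boxes[1:,:4])
print(ious) # tensor([0.6832, 0.0000])
print(boxes[1:][ious < 0.4]) # only the far-away 0.7 box survives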

Step 4: assemble the output information, i.e.

$$D = (\text{ind},\ x_1,\ y_1,\ x_2,\ y_2,\ s,\ p_{cls},\ c_{cls})$$

Note: ind is the index of the image within the batch, $s$ is the objectness score, $p_{cls}$ is the highest class confidence, and $c_{cls}$ is the corresponding class index.


Building the image letterboxing function (Key Point 3)


def letterbox_image(img,inp_dim):
    # inp_dim = (416,416)
    img_w,img_h = img.shape[1],img.shape[0]
    w,h = inp_dim
    # scale by the limiting side so the aspect ratio is preserved
    new_w = int(img_w * min(w/img_w,h/img_h))
    new_h = int(img_h * min(w/img_w,h/img_h))
    resized_image = cv2.resize(img,(new_w,new_h),interpolation=cv2.INTER_CUBIC)
    # gray canvas; uint8 so downstream OpenCV calls accept it
    canvas = np.full((inp_dim[1],inp_dim[0],3),128,dtype=np.uint8)
    # paste the resized image into the center of the canvas
    canvas[(h-new_h)//2:(h-new_h)//2 + new_h,(w-new_w)//2:(w-new_w)//2 + new_w,:] = resized_image
    return canvas


An example shows why this function matters:


Figure 4: naive resize (left) vs. letterbox resize (right)

On the left, an image of width 452 and height 602 has been forcibly resized to (416,416); this distortion clearly hurts the detector. On the right is the output of the letterbox function. So this block does two things (see the quick check after the list):

  1. Scale the image proportionally so that it fits inside a (416,416) frame
  2. Fill the remaining area with the pixel value 128
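
A quick check, using the article's example size (width 452, height 602; remember that OpenCV images are (height, width, channels)):

img = np.random.randint(0,256,(602,452,3),dtype=np.uint8)
out = letterbox_image(img,(416,416))
print(out.shape) # (416, 416, 3)
# scale = min(416/452, 416/602) ~ 0.691, so the content shrinks to
# roughly 312 x 416 and the gray padding fills the left and right margins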

Building the image normalization function


def prep_image(img,inp_dim):
    img = letterbox_image(img,(inp_dim,inp_dim))
    b,g,r = cv2.split(img)
    img = cv2.merge([r,g,b]) # BGR --> RGB
    img = img.transpose(2,0,1).copy() # (height,width,channel) --> (channel,height,width)
    img = torch.from_numpy(img).float().div(255.0).unsqueeze(0) # --> (batch_n,channel,h,w)
    return img
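
A quick shape check (the file name here is hypothetical):

frame = cv2.imread('dog.jpg') # BGR, (height,width,3), uint8
batch = prep_image(frame,416)
print(batch.shape) # torch.Size([1, 3, 416, 416])
print(batch.min().item(),batch.max().item()) # values scaled into [0, 1]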


Building the class-name loading function


def load_classes(namesfile): # returns a list of class-name strings
    fp = open(namesfile,'r')
    names = fp.read().split('\n')[:-1] # one class name per line; drop the trailing empty entry
    fp.close()
    return names
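
Hypothetical usage, assuming the standard COCO name file is saved as data/coco.names (80 classes, one per line):

classes = load_classes('data/coco.names')
print(len(classes)) # 80
print(classes[0]) # 'person'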