Preface
Last time I gave a brief overview of how the YOLO v3 object detection algorithm works, and we built the feature extraction network together. This time we will go through the helper functions: the feature transform, non-maximum suppression, and more. Let's get started!
Oh, and one more thing. From now on I plan to occasionally mix in some experiences and reflections from my vehicle engineering major. I hope you enjoy them.
Good at Nothing, First Place in Failing
Today let me tell you about the course with the highest failure rate in our vehicle engineering program: Automotive Measurement Technology. Based on my many days of painful experience, this course has two difficulties:
- You can study it without understanding it
- You can understand it without being able to apply it
Otherwise the course is great: the textbook is beautifully printed, the lectures are brilliant, and even the exam proctors provide excellent service. That is why so many people top up to VVVIP and enjoy the whole service for another semester.
Many readers may not know what Automotive Measurement Technology is, so here is the course summarized in one sentence:
Study, in both the time domain and the frequency domain, the relationship between the input signal, the system, and the output signal, together with the characteristics of each.
Figure 1
Judging from the mind map, the knowledge structure is actually quite clear, right?
They say the best way to learn something is to teach it to someone else.
Inspired by that, after class I explained Automotive Measurement Technology to the empty classroom, over and over again.
After all, who else would have the patience to listen?
util.ipynb
From the viewpoint of Automotive Measurement Technology, everything can be framed as: input --> system --> output. Today's topic is the system part of the YOLO v3 object detection algorithm.
And don't worry: it is a time-invariant system. Let's dive in!
Import the third-party libraries
from __future__ import division
import torch
from torchvision import transforms
import cv2
import numpy as np
Building the unique-value function
def unique(tensor):  # unique values, sorted in ascending order
    tensor_np = tensor.numpy()                    # Tensor --> ndarray
    unique_np = np.unique(tensor_np)              # deduplicate and sort
    unique_tensor = torch.from_numpy(unique_np)   # ndarray --> Tensor
    tensor_res = tensor.new(unique_tensor.shape)  # dtype of the input, shape of unique_tensor
    tensor_res.copy_(unique_tensor)
    return tensor_res
This block takes the unique values of the input tensor, sorted in ascending order. It is used later to extract the set of object classes detected in an image.
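A quick usage check (a minimal sketch; the input values here are made up):

t = torch.tensor([3., 1., 3., 7., 1.])
print(unique(t))  # tensor([1., 3., 7.])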
Building the IoU evaluation function
def bbox_iou(box1, box2):  # tensor = (x1, y1, x2, y2)
    b1_x1, b1_y1, b1_x2, b1_y2 = box1[:,0], box1[:,1], box1[:,2], box1[:,3]
    b2_x1, b2_y1, b2_x2, b2_y2 = box2[:,0], box2[:,1], box2[:,2], box2[:,3]
    # coordinates of the intersection rectangle
    inter_rect_x1 = torch.max(b1_x1, b2_x1)
    inter_rect_y1 = torch.max(b1_y1, b2_y1)
    inter_rect_x2 = torch.min(b1_x2, b2_x2)
    inter_rect_y2 = torch.min(b1_y2, b2_y2)
    # clamp to zero when the boxes do not overlap
    inter_area = torch.clamp(inter_rect_x2 - inter_rect_x1 + 1, min=0) * torch.clamp(inter_rect_y2 - inter_rect_y1 + 1, min=0)
    b1_area = (b1_x2 - b1_x1 + 1) * (b1_y2 - b1_y1 + 1)
    b2_area = (b2_x2 - b2_x1 + 1) * (b2_y2 - b2_y1 + 1)
    iou = inter_area / (b1_area + b2_area - inter_area)
    return iou
The IoU function measures how well the predicted box fits the ground-truth box. The formula is:

$$\mathrm{IoU} = \frac{\operatorname{area}(B_{pred} \cap B_{gt})}{\operatorname{area}(B_{pred} \cup B_{gt})}$$

That is, the ratio of the intersection to the union of the ground-truth and predicted box areas.
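A quick sanity check (a minimal sketch; the +1 terms in the code treat coordinates as inclusive pixel indices, so a box from 0 to 9 is 10 pixels wide):

box1 = torch.tensor([[0., 0., 9., 9.]])    # 10 x 10 box
box2 = torch.tensor([[4., 4., 13., 13.]])  # 10 x 10 box, shifted by 4
print(bbox_iou(box1, box2))                # 36 / 164, i.e. about 0.22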
Building the feature transform function (Key Point 1)
Recall from last time: convolutional layer --> yolo layer
This is the feature-map output layer; it applies the feature transform to the feature map handed down by the preceding convolutional layer.
def predict_transform(prediction, inp_dim, anchors, num_classes):  # feature map --> image
    # prediction = (batch_n, bbox_attrs*num_anchors, grid_size, grid_size)
    # bbox_attrs = (tx, ty, tw, th, objectness score, p1...p80)
    # (grid_size, grid_size) = (13,13) / (26,26) / (52,52)
    # inp_dim = 416
    # anchors = [(w1,h1), (w2,h2), (w3,h3)]
    # num_classes = 80
    batch_size = prediction.shape[0]
    stride = inp_dim // prediction.shape[2]  # 416 // 13 == 32
    grid_size = inp_dim // stride            # 416 // 32 == 13
    bbox_attrs = num_classes + 5
    num_anchors = len(anchors)
    # prediction --> (batch_n, anchors, bbox_attrs)
    prediction = prediction.reshape(batch_size, bbox_attrs*num_anchors, grid_size*grid_size)
    prediction = prediction.transpose(1, 2)
    prediction = prediction.reshape(batch_size, grid_size*grid_size*num_anchors, bbox_attrs)
    # sigmoid
    prediction[:,:,0] = torch.sigmoid(prediction[:,:,0])    # center_x
    prediction[:,:,1] = torch.sigmoid(prediction[:,:,1])    # center_y
    prediction[:,:,4] = torch.sigmoid(prediction[:,:,4])    # objectness score
    prediction[:,:,5:] = torch.sigmoid(prediction[:,:,5:])  # p_1 --> p_80
    # x_y_offset: (cx, cy) of every grid cell, repeated once per anchor
    grid = np.arange(grid_size)
    a, b = np.meshgrid(grid, grid)
    x_offset = torch.Tensor(a).reshape(-1, 1)
    y_offset = torch.Tensor(b).reshape(-1, 1)
    x_y_offset = torch.cat((x_offset, y_offset), 1).repeat(1, num_anchors).reshape(-1, 2).unsqueeze(0)
    # bx = sigmoid(tx) + cx, by = sigmoid(ty) + cy
    prediction[:,:,:2] += x_y_offset
    # rescale anchors from input-image pixels to feature-map units
    anchors = [(x[0] / stride, x[1] / stride) for x in anchors]
    anchors = torch.Tensor(anchors)
    anchors = anchors.repeat(grid_size*grid_size, 1).unsqueeze(0)
    # bw = e^tw * pw, bh = e^th * ph
    prediction[:,:,2:4] = torch.exp(prediction[:,:,2:4]) * anchors
    # feature map scale --> input image scale
    prediction[:,:,:4] *= stride
    return prediction
We absolutely have to understand what bbox_attrs means!

Figure 2
Parameter breakdown:
- The first four parameters are the center coordinates of the box plus its width and height
- The fifth parameter is the confidence that the box contains an object (foreground)
- The last eighty parameters are the class confidences, i.e. if an object is present, which class it belongs to (see the sketch below)
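As a minimal sketch of that layout (the variable `row` is mine, standing in for one prediction of length 85):

row = torch.rand(85)     # one hypothetical bbox_attrs vector
box = row[:4]            # tx, ty, tw, th
objectness = row[4]      # objectness score
class_probs = row[5:]    # p1 ... p80, one confidence per COCO class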
Now that the meaning of bbox_attrs is clear, let's walk through the code block.
Step 1: build the (batch_n, anchors, bbox_attrs) coordinate system:

Figure 3
Take a moment to really understand this figure.
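To complement the figure, here is a minimal sketch tracing the tensor shapes through the reshape/transpose steps above, assuming batch_size = 1, grid_size = 13, num_anchors = 3, and num_classes = 80:

x = torch.rand(1, 255, 13, 13)  # (batch_n, bbox_attrs*num_anchors, grid, grid)
x = x.reshape(1, 255, 169)      # flatten the 13 x 13 grid
x = x.transpose(1, 2)           # (1, 169, 255): one row per grid cell
x = x.reshape(1, 169 * 3, 85)   # (1, 507, 85): one row per anchor box
print(x.shape)                  # torch.Size([1, 507, 85])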
Step 2: the feature transform:

$$b_x = \sigma(t_x) + c_x, \qquad b_y = \sigma(t_y) + c_y$$
$$b_w = p_w e^{t_w}, \qquad b_h = p_h e^{t_h}$$

Note: $(c_x, c_y)$ are the coordinates of the grid cell's top-left corner, and $(p_w, p_h)$ are the anchor's width and height.
Step 3: map the prediction from the feature-map scale back to the image scale.
One point deserves attention: the anchors must first be scaled down (divided by the stride) before they can take part in the computation with prediction; a small sketch follows.
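A minimal sketch of that rescaling for the 13x13 head (stride 32); the anchor sizes are the standard YOLO v3 COCO anchors for this scale, quoted here from the config rather than from the code above:

anchors = [(116, 90), (156, 198), (373, 326)]   # (w, h) in input-image pixels
stride = 416 // 13                              # 32
anchors_fm = [(w / stride, h / stride) for (w, h) in anchors]
print(anchors_fm)  # [(3.625, 2.8125), (4.875, 6.1875), (11.65625, 10.1875)]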
Building the non-maximum suppression function (Key Point 2)
# (x, y, w, h) --> (x1, y1, x2, y2)
# nms
def write_results(prediction, confidence, num_classes, nms_conf=0.4):  # nms
    # prediction -- image scale -- (batch_n, anchor_1 + anchor_2 + anchor_3, bbox_attrs)
    # prediction = (batch_n, 10647, 85)
    # bbox_attrs = (bx, by, bw, bh, objectness score, p1...p80)
    # confidence = objectness confidence threshold
    # num_classes = 80
    # nms_conf = NMS IoU threshold, 0.4
    # zero out every row with score < confidence
    conf_mask = (prediction[:,:,4] > confidence).float().unsqueeze(2)
    prediction = prediction * conf_mask
    # (x, y, w, h) --> (x1, y1, x2, y2)
    box_corner = prediction.new(prediction.shape)
    box_corner[:,:,0] = (prediction[:,:,0] - prediction[:,:,2] / 2)
    box_corner[:,:,1] = (prediction[:,:,1] - prediction[:,:,3] / 2)
    box_corner[:,:,2] = (prediction[:,:,0] + prediction[:,:,2] / 2)
    box_corner[:,:,3] = (prediction[:,:,1] + prediction[:,:,3] / 2)
    prediction[:,:,:4] = box_corner[:,:,:4]
    batch_size = prediction.shape[0]
    write = False
    for ind in range(batch_size):
        # image_pred = (anchors, bbox_attrs)
        image_pred = prediction[ind]
        # (x1, y1, x2, y2, score, p1...p80) --> (anchors, information)
        # (anchors, information) = (10647, 7)
        # information = x1 + y1 + x2 + y2 + score + max_conf + max_conf_index
        max_conf, max_conf_index = torch.max(image_pred[:,5:], 1)
        max_conf = max_conf.float().unsqueeze(1)
        max_conf_index = max_conf_index.float().unsqueeze(1)
        seq = (image_pred[:,:5], max_conf, max_conf_index)
        image_pred = torch.cat(seq, 1)
        # keep only the rows whose score survived the threshold
        # image_pred_ = (anchors, information)
        non_zero_ind = torch.nonzero(image_pred[:,4])
        try:
            image_pred_ = image_pred[non_zero_ind.squeeze(),:].reshape(-1, 7)
        except:
            continue
        if image_pred_.shape[0] == 0:
            continue
        # get the object classes detected in this image, sorted
        img_classes = unique(image_pred_[:,-1])
        for cls in img_classes:
            # keep the rows of image_pred_ whose class == cls
            cls_mask = image_pred_ * (image_pred_[:,-1] == cls).float().unsqueeze(1)
            class_mask_ind = torch.nonzero(cls_mask[:,-2]).squeeze()
            image_pred_class = image_pred_[class_mask_ind].reshape(-1, 7)
            # image_pred_class = (anchors, information)
            # sort by objectness score: big --> small
            conf_sort, conf_sort_index = torch.sort(image_pred_class[:,4], descending=True)
            image_pred_class = image_pred_class[conf_sort_index]
            idx = image_pred_class.shape[0]
            for i in range(idx):
                try:
                    ious = bbox_iou(image_pred_class[i].unsqueeze(0), image_pred_class[i+1:])
                except ValueError:
                    break
                except IndexError:
                    break
                # nms: zero out the boxes that overlap box i too much
                iou_mask = (ious < nms_conf).float().unsqueeze(1)
                image_pred_class[i+1:] *= iou_mask
                non_zero_ind = torch.nonzero(image_pred_class[:,4]).squeeze()
                image_pred_class = image_pred_class[non_zero_ind].reshape(-1, 7)
            # image_pred_class = (anchors, information) for this cls
            # prepend the index of the image within the batch
            batch_ind = image_pred_class.new(image_pred_class.shape[0], 1).fill_(ind)
            seq = (batch_ind, image_pred_class)
            if not write:
                output = torch.cat(seq, 1)
                write = True
            else:
                out = torch.cat(seq, 1)
                output = torch.cat((output, out))
    try:
        return output
    except:
        return 0
Let's walk through this code block.
Step 1: discard every anchor that does not contain an object (score below the confidence threshold).
Step 2: apply the following transform:

$$x_1 = b_x - \frac{b_w}{2}, \quad y_1 = b_y - \frac{b_h}{2}, \quad x_2 = b_x + \frac{b_w}{2}, \quad y_2 = b_y + \frac{b_h}{2}$$
Step 3: the outer loop walks over every image in the batch, the inner loop over every object class that appears in that image. Then run the NMS computation:
- Sort the anchors by objectness score, from largest to smallest
- Compute the IoU of the first anchor against all remaining anchors, and drop the anchors whose IoU exceeds nms_conf
- Then compute the IoU of the (surviving) second anchor against all remaining anchors, and so on...
Step 4: assemble the output, where each row is

(batch_ind, x1, y1, x2, y2, score, max_conf, max_conf_index)

Note: batch_ind is the index of the image within the batch.
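Here is a minimal end-to-end sketch of how predict_transform and write_results chain together; the random tensor stands in for a real detection-head output, so the boxes themselves are meaningless and only the shapes matter:

raw = torch.rand(1, 255, 13, 13)                 # hypothetical yolo-layer input, 13x13 head
anchors = [(116, 90), (156, 198), (373, 326)]
pred = predict_transform(raw, 416, anchors, 80)  # (1, 507, 85), image scale
dets = write_results(pred, 0.5, 80, nms_conf=0.4)
# dets is an (n, 8) tensor of rows (batch_ind, x1, y1, x2, y2, score, max_conf, max_conf_index),
# or the integer 0 when nothing survives the confidence threshold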
Building the image scaling function (Key Point 3)
def letterbox_image(img, inp_dim):
    # inp_dim = (416, 416)
    img_w, img_h = img.shape[1], img.shape[0]
    w, h = inp_dim
    new_w = int(img_w * min(w/img_w, h/img_h))  # scale by the tighter of the two ratios
    new_h = int(img_h * min(w/img_w, h/img_h))
    resized_image = cv2.resize(img, (new_w, new_h), interpolation=cv2.INTER_CUBIC)
    # uint8 keeps the canvas compatible with the OpenCV calls downstream (np.full would default to int64)
    canvas = np.full((inp_dim[1], inp_dim[0], 3), 128, dtype=np.uint8)
    canvas[(h-new_h)//2:(h-new_h)//2 + new_h, (w-new_w)//2:(w-new_w)//2 + new_w, :] = resized_image
    return canvas
An example shows why this function matters:

Figure 4
On the left, an image with width and height (452, 602) has been forcibly resized straight to (416, 416); this distortion clearly hurts the detector. On the right is the output of the letterbox function. So this code block (a usage sketch follows the list):
- Scales the image into a (416, 416) frame while preserving its aspect ratio
- Fills the remaining area with the pixel value 128
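A quick usage sketch ('dog.jpg' is a placeholder path, not a file shipped with this series):

img = cv2.imread('dog.jpg')             # any BGR image, e.g. 452 x 602
boxed = letterbox_image(img, (416, 416))
print(boxed.shape)                      # (416, 416, 3), with gray (128) borders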
Building the image normalization function
def prep_image(img, inp_dim):
    img = letterbox_image(img, (inp_dim, inp_dim))
    b, g, r = cv2.split(img)
    img = cv2.merge([r, g, b])           # BGR --> RGB
    img = img.transpose(2, 0, 1).copy()  # (height, width, channel) --> (channel, height, width)
    img = torch.from_numpy(img).float().div(255.0).unsqueeze(0)  # --> (batch_n, channel, h, w)
    return img
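This function chains letterboxing, BGR-to-RGB conversion, channel reordering, and scaling to [0, 1]. A quick usage sketch ('dog.jpg' is again a placeholder path):

batch = prep_image(cv2.imread('dog.jpg'), 416)
print(batch.shape)  # torch.Size([1, 3, 416, 416]), values in [0, 1]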
Building the class-name loading function
def load_classes(namesfile):  # returns a list of class-name strings
    fp = open(namesfile, 'r')
    names = fp.read().split('\n')[:-1]  # one name per line; drop the empty entry after the trailing newline
    return names
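A quick usage sketch, assuming the standard 80-class coco.names file sits next to the notebook:

classes = load_classes('coco.names')
print(len(classes), classes[0])  # 80 person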