Preparation
In util.py, create a function called write_results that filters the raw predictions down to the true detections.
def write_results(prediction, confidence, num_classes, nms_conf = 0.4):
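The function takes prediction, confidence (the objectness score threshold), num_classes (80, in our case) and nms_conf (the NMS IoU threshold) as inputs.
Object confidence thresholding
The prediction tensor holds information about B×10647 bounding boxes, where B is the batch size. For every bounding box whose objectness score is below the threshold, we set all of its attributes (the whole row) to zero.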
conf_mask = (prediction[:,:,4] > confidence).float().unsqueeze(2)
prediction = prediction*conf_mask
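To make the masking step concrete, here is a minimal sketch on a toy tensor. The shapes and values are made up for illustration (3 boxes with 6 attributes instead of 10647 boxes with 85 attributes); only the two masking lines come from the tutorial.

```python
import torch

# Toy prediction tensor: 1 image, 3 boxes, 6 attributes per box
# (x, y, w, h, objectness, one class score). Values are illustrative only.
prediction = torch.tensor([[[10., 10., 4., 4., 0.9, 0.8],
                            [20., 20., 6., 6., 0.3, 0.7],
                            [30., 30., 8., 8., 0.6, 0.2]]])
confidence = 0.5

# Shape (1, 3, 1): 1.0 where the objectness score exceeds the threshold, else 0.0
conf_mask = (prediction[:, :, 4] > confidence).float().unsqueeze(2)

# Broadcasting over the last dimension zeroes every attribute of a rejected box
masked = prediction * conf_mask
```

The unsqueeze(2) call is what lets the mask broadcast across all attributes of a row, so the second box (objectness 0.3) becomes a row of zeros while the other two are left unchanged.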
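Performing non-maximum suppression
Each bounding box is described by its centre coordinates together with its width and height. However, the IoU (intersection over union) of two boxes is easier to compute from the coordinates of a pair of diagonally opposite corners of each box. So we convert (centre x, centre y, width, height) into (top-left x, top-left y, bottom-right x, bottom-right y).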
box_corner = prediction.new(prediction.shape)
box_corner[:,:,0] = (prediction[:,:,0] - prediction[:,:,2]/2)
box_corner[:,:,1] = (prediction[:,:,1] - prediction[:,:,3]/2)
box_corner[:,:,2] = (prediction[:,:,0] + prediction[:,:,2]/2)
box_corner[:,:,3] = (prediction[:,:,1] + prediction[:,:,3]/2)
prediction[:,:,:4] = box_corner[:,:,:4]
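As a quick check of the conversion, the sketch below applies the same arithmetic to a single made-up box. It uses torch.empty_like in place of prediction.new(prediction.shape); that substitution is an assumption of this example, not something the tutorial prescribes.

```python
import torch

# One box with centre (50, 40), width 20 and height 10 (illustrative values)
prediction = torch.tensor([[[50., 40., 20., 10., 0.9]]])

box_corner = torch.empty_like(prediction)
box_corner[:, :, 0] = prediction[:, :, 0] - prediction[:, :, 2] / 2  # top-left x
box_corner[:, :, 1] = prediction[:, :, 1] - prediction[:, :, 3] / 2  # top-left y
box_corner[:, :, 2] = prediction[:, :, 0] + prediction[:, :, 2] / 2  # bottom-right x
box_corner[:, :, 3] = prediction[:, :, 1] + prediction[:, :, 3] / 2  # bottom-right y

# Overwrite only the first four attributes; objectness is left untouched
prediction[:, :, :4] = box_corner[:, :, :4]
corners = prediction[0, 0, :4].tolist()
```

Here corners holds the top-left corner (40, 35) followed by the bottom-right corner (60, 45).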
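The number of true detections can differ from image to image. For example, in a batch of size 3, images 1, 2 and 3 may have 5, 2 and 4 true detections respectively. Confidence thresholding and NMS therefore have to be done one image at a time. This means we cannot apply these operations to the whole batch in one vectorised step, and must instead loop over the first dimension of prediction, which indexes the images in the batch.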
batch_size = prediction.size(0)
write = False
for ind in range(batch_size):
    image_pred = prediction[ind]          #image Tensor
    #confidence thresholding
    #NMS
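As described previously, the write flag indicates whether we have initialised output, the tensor we use to collect the true detections across the entire batch.
Inside the loop, each bounding box row has 85 attributes, 80 of which are class scores. At this point we only care about the class with the highest score. So we remove the 80 class scores from each row and append, in their place, the highest class score and the index of the class it belongs to.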
max_conf, max_conf_score = torch.max(image_pred[:,5:5+ num_classes], 1)
max_conf = max_conf.float().unsqueeze(1)
max_conf_score = max_conf_score.float().unsqueeze(1)
seq = (image_pred[:,:5], max_conf, max_conf_score)
image_pred = torch.cat(seq, 1)
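The following sketch runs the same five lines on a toy tensor with 3 classes instead of 80 (an assumption made to keep the example readable). Note that, despite its name, max_conf_score holds the class index, while max_conf holds the score itself.

```python
import torch

num_classes = 3  # toy value; the tutorial uses 80

# Two boxes: 5 box attributes followed by 3 class scores (illustrative values)
image_pred = torch.tensor([[0., 0., 1., 1., 0.9, 0.1, 0.7, 0.2],
                           [0., 0., 2., 2., 0.8, 0.6, 0.3, 0.1]])

# torch.max along dim 1 returns (values, indices)
max_conf, max_conf_score = torch.max(image_pred[:, 5:5 + num_classes], 1)
max_conf = max_conf.float().unsqueeze(1)              # best class score, shape (2, 1)
max_conf_score = max_conf_score.float().unsqueeze(1)  # best class index, shape (2, 1)

# Each row shrinks from 5 + num_classes columns to 7 columns
seq = (image_pred[:, :5], max_conf, max_conf_score)
image_pred = torch.cat(seq, 1)
```

After this step every row has 7 columns: four box coordinates, the objectness score, the best class score and the best class index.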
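Earlier we set the rows of all bounding boxes with an objectness score below the threshold to zero. Now we remove those rows.
non_zero_ind = (torch.nonzero(image_pred[:,4]))
try:
    image_pred_ = image_pred[non_zero_ind.squeeze(),:].view(-1,7)
except:
    continue
#For PyTorch 0.4 compatibility
#Since the above code will not raise an exception for no detection
#as scalars are supported in PyTorch 0.4
if image_pred_.shape[0] == 0:
    continue
The try-except block handles the case where the image has no detections. In that case we use continue to skip the rest of the loop body for this image.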
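A small sketch of the row-removal step on made-up data may help. It only exercises the case where at least one detection survives; the 7-column layout matches the rows built in the previous step.

```python
import torch

# Three rows of 7 attributes; the middle row was zeroed by the confidence mask
image_pred = torch.tensor([[1., 1., 2., 2., 0.9, 0.7, 1.],
                           [0., 0., 0., 0., 0.0, 0.0, 0.],
                           [3., 3., 5., 5., 0.6, 0.8, 0.]])

# Indices of rows whose objectness (column 4) is non-zero, shape (2, 1)
non_zero_ind = torch.nonzero(image_pred[:, 4])

# squeeze() turns the indices into a 1-D tensor; view(-1, 7) keeps the result
# two-dimensional even when only a single row survives
image_pred_ = image_pred[non_zero_ind.squeeze(), :].view(-1, 7)
```

Only the first and third rows remain in image_pred_.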
Next, we get the classes detected in the image.
#Get the various classes detected in the image
img_classes = unique(image_pred_[:,-1]) # -1 index holds the class index
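Since there can be multiple true detections of the same class, we use a function called unique to get the set of classes present in a given image.
#requires numpy imported as np
def unique(tensor):
    tensor_np = tensor.cpu().numpy()
    unique_np = np.unique(tensor_np)
    unique_tensor = torch.from_numpy(unique_np)
    #allocate the result with the same dtype and device as the input
    tensor_res = tensor.new(unique_tensor.shape)
    tensor_res.copy_(unique_tensor)
    return tensor_res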
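The round trip through NumPy above is what the tutorial uses. As a hedged alternative, PyTorch versions that provide torch.unique can do the same thing without leaving the tensor's device; the sketch below assumes such a version is installed.

```python
import torch

# Class indices of the detections in one image (illustrative values)
classes = torch.tensor([2., 0., 2., 5., 0.])

# Returns the distinct values, sorted in ascending order by default
unique_classes = torch.unique(classes)
```

For this input, unique_classes contains 0, 2 and 5.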
Then we perform NMS class by class.
for cls in img_classes:
    #perform NMS
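The NMS loop relies on two names, image_pred_class and idx, that the snippets here do not define. The sketch below shows one plausible way to build them, assuming image_pred_class holds the detections of class cls sorted by objectness (highest first) and idx is the number of such detections. The boolean-mask indexing is a choice made for this example.

```python
import torch

# Toy detections for one image:
# (x1, y1, x2, y2, objectness, class score, class index)
image_pred_ = torch.tensor([[0., 0., 10., 10., 0.6, 0.9, 1.],
                            [5., 5., 20., 20., 0.8, 0.7, 0.],
                            [1., 1., 11., 11., 0.9, 0.8, 1.]])
cls = 1.0  # the class currently being processed

# Keep only the rows whose class index (last column) equals cls
image_pred_class = image_pred_[image_pred_[:, -1] == cls].view(-1, 7)

# Sort so that the detection with the highest objectness comes first
conf_sort_index = torch.sort(image_pred_class[:, 4], descending=True)[1]
image_pred_class = image_pred_class[conf_sort_index]

idx = image_pred_class.size(0)  # number of detections of this class
```

Sorting first matters because the NMS loop treats the box at index i as the one to keep and suppresses overlapping boxes that come after it.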
Performing NMS
for i in range(idx):
    #Get the IOUs of all boxes that come after the one we are looking at
    #in the loop
    #image_pred_class shrinks as boxes are removed, so stop once i runs out of rows
    try:
        ious = bbox_iou(image_pred_class[i].unsqueeze(0), image_pred_class[i+1:])
    except ValueError:
        break
    except IndexError:
        break
    #Zero out all the detections that have IoU > threshold
    iou_mask = (ious < nms_conf).float().unsqueeze(1)
    image_pred_class[i+1:] *= iou_mask
    #Keep only the non-zero entries, dropping the detections zeroed out above
    non_zero_ind = torch.nonzero(image_pred_class[:,4]).squeeze()
    image_pred_class = image_pred_class[non_zero_ind].view(-1,7)
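Here we use the function bbox_iou. Its first input is the bounding box row indexed by the loop variable i.
Its second input is a tensor containing multiple rows of bounding boxes. The output of bbox_iou is a tensor holding the IoU of the box given by the first input with each of the boxes in the second input.
If two bounding boxes of the same class have an IoU above the threshold, the one with the lower confidence is eliminated and the one with the higher confidence is kept. This relies on the rows of image_pred_class being sorted by confidence in descending order, so that the box at index i always has the higher confidence.
In the body of the loop, the following line gives the IoU of the box at index i with every bounding box whose index is greater than i.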
ious = bbox_iou(image_pred_class[i].unsqueeze(0), image_pred_class[i+1:])
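In every iteration, if any bounding box with an index greater than i has an IoU with box i that is larger than the threshold nms_conf, that box is eliminated.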
iou_mask = (ious < nms_conf).float().unsqueeze(1)
image_pred_class[i+1:] *= iou_mask
#Keep only the non-zero entries, dropping the detections zeroed out above
non_zero_ind = torch.nonzero(image_pred_class[:,4]).squeeze()
image_pred_class = image_pred_class[non_zero_ind].view(-1,7)
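The whole suppression step can also be sketched as a self-contained function. This is not the tutorial's code: greedy_nms and iou_one_to_many are hypothetical helpers written for this example, the IoU uses plain areas without the +1 offset, and a while loop replaces the try-except so that the shrinking tensor is handled explicitly.

```python
import torch

def iou_one_to_many(box, boxes):
    """IoU of one corner-format box (4,) with each row of boxes (N, 4)."""
    x1 = torch.max(box[0], boxes[:, 0])
    y1 = torch.max(box[1], boxes[:, 1])
    x2 = torch.min(box[2], boxes[:, 2])
    y2 = torch.min(box[3], boxes[:, 3])
    inter = torch.clamp(x2 - x1, min=0) * torch.clamp(y2 - y1, min=0)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)

def greedy_nms(dets, nms_conf=0.4):
    """dets: (N, 7) rows of one class, sorted by confidence, highest first."""
    i = 0
    # Stop when box i has no boxes after it
    while i < dets.size(0) - 1:
        ious = iou_one_to_many(dets[i, :4], dets[i + 1:, :4])
        keep = ious < nms_conf  # later boxes that do not overlap box i too much
        dets = torch.cat((dets[:i + 1], dets[i + 1:][keep]))
        i += 1
    return dets

# Boxes 0 and 1 overlap heavily; box 2 is far away (illustrative values)
dets = torch.tensor([[0., 0., 10., 10., 0.9, 0.8, 1.],
                     [1., 1., 11., 11., 0.8, 0.7, 1.],
                     [50., 50., 60., 60., 0.7, 0.9, 1.]])
kept = greedy_nms(dets)
```

On this input the second box overlaps the first with an IoU of 81/119, which is above 0.4, so it is dropped, while the distant third box is kept.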
Calculating the IoU
Here is the function bbox_iou.
def bbox_iou(box1, box2):
    """
    Returns the IoU of two bounding boxes
    """
    #Get the coordinates of bounding boxes
    b1_x1, b1_y1, b1_x2, b1_y2 = box1[:,0], box1[:,1], box1[:,2], box1[:,3]
    b2_x1, b2_y1, b2_x2, b2_y2 = box2[:,0], box2[:,1], box2[:,2], box2[:,3]
    #get the coordinates of the intersection rectangle
    inter_rect_x1 = torch.max(b1_x1, b2_x1)
    inter_rect_y1 = torch.max(b1_y1, b2_y1)
    inter_rect_x2 = torch.min(b1_x2, b2_x2)
    inter_rect_y2 = torch.min(b1_y2, b2_y2)
    #Intersection area
    inter_area = torch.clamp(inter_rect_x2 - inter_rect_x1 + 1, min=0) * torch.clamp(inter_rect_y2 - inter_rect_y1 + 1, min=0)
    #Union Area
    b1_area = (b1_x2 - b1_x1 + 1)*(b1_y2 - b1_y1 + 1)
    b2_area = (b2_x2 - b2_x1 + 1)*(b2_y2 - b2_y1 + 1)
    iou = inter_area / (b1_area + b2_area - inter_area)
    return iou
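To see the arithmetic at work, here is a worked example in plain Python that follows the same formula, including the +1 terms, which count both end coordinates as part of the box (a box from x = 0 to x = 9 is 10 units wide). The helper iou_inclusive and the box values are made up for this illustration.

```python
def iou_inclusive(box1, box2):
    """IoU of two corner-format boxes (x1, y1, x2, y2) using the +1 convention."""
    # Corners of the intersection rectangle
    ix1, iy1 = max(box1[0], box2[0]), max(box1[1], box2[1])
    ix2, iy2 = min(box1[2], box2[2]), min(box1[3], box2[3])
    # Clamp at zero so that disjoint boxes get an empty intersection
    inter = max(ix2 - ix1 + 1, 0) * max(iy2 - iy1 + 1, 0)
    area1 = (box1[2] - box1[0] + 1) * (box1[3] - box1[1] + 1)
    area2 = (box2[2] - box2[0] + 1) * (box2[3] - box2[1] + 1)
    return inter / (area1 + area2 - inter)

# Intersection 5 * 5 = 25, union 100 + 100 - 25 = 175
iou = iou_inclusive((0, 0, 9, 9), (5, 5, 14, 14))

# Boxes that do not touch have IoU 0
disjoint = iou_inclusive((0, 0, 9, 9), (20, 20, 29, 29))
```

The first pair gives 25/175, roughly 0.14, which is below the default nms_conf of 0.4, so neither box would suppress the other.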
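Writing the predictions
The write_results function outputs a tensor of shape D×8, where D is the number of true detections across all images, each represented by one row. Each detection has 8 attributes: the index of the image in the batch, the 4 corner coordinates, the objectness score, the score of the class with the highest confidence, and the index of that class.
As before, we do not initialise the output tensor until we have a detection to assign to it. Once it has been initialised, we concatenate subsequent detections to it. The write flag indicates whether the tensor has been initialised. At the end of each iteration of the loop over classes, we add the resulting detections to the tensor output.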
#Repeat the batch_id for as many detections of the class cls in the image
batch_ind = image_pred_class.new(image_pred_class.size(0), 1).fill_(ind)
seq = batch_ind, image_pred_class
if not write:
    output = torch.cat(seq,1)
    write = True
else:
    out = torch.cat(seq,1)
    output = torch.cat((output,out))
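The accumulation pattern can be sketched on toy data as follows. Instead of the write flag, this example collects the pieces in a list and concatenates once at the end, which is a design choice of the sketch rather than the tutorial's approach. The dictionary per_image is a hypothetical stand-in for the detections that survive NMS.

```python
import torch

# Hypothetical surviving detections, keyed by image index in the batch
# (7 columns each, illustrative values)
per_image = {
    0: torch.tensor([[0., 0., 10., 10., 0.9, 0.8, 1.]]),
    1: torch.tensor([[1., 1., 5., 5., 0.7, 0.6, 3.],
                     [2., 2., 8., 8., 0.6, 0.9, 3.]]),
}

chunks = []
for ind, image_pred_class in per_image.items():
    # One batch-index entry per detection, shape (n, 1)
    batch_ind = torch.full((image_pred_class.size(0), 1), float(ind))
    # Prepend the batch index, giving 8 columns per detection
    chunks.append(torch.cat((batch_ind, image_pred_class), 1))

# Mirror the tutorial's convention of returning 0 when nothing was detected
output = torch.cat(chunks) if chunks else 0
```

The result has one row per detection and 8 columns, with the first column identifying the image each detection came from.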
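At the end of the function, we check whether output has been initialised at all. If it has not, there was not a single detection in any image of the batch, and we return 0.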
try:
    return output
except NameError:
    #output was never assigned, so nothing was detected in the whole batch
    return 0