mAP的计算
直接放YoLoV3的网络结构图,让我们稍微看一下YoLov3的网络结构。
YoLoV3网络主要分成两大部分:
- 1.主干网络 DarkNet53
- 2.多尺度预测
1.主干网络DarkNet53
首先是主干网络DarkNet53,结合网络图我们看到它主要是使用了残差块Residual block,这里残差块就是进行一次3X3、步长为2的卷积,然后保存该卷积layer,再进行一次1X1的卷积和一次3X3的卷积,并把这个结果加上layer作为最后的结果.
此外,主干网络DarkNet53每一个卷积使用了特有的DarkNetConv2D结构,这里的DarkNetConv2D是指每一次卷积的时候进行l2正则化,完成卷积后进行BatchNormalization标准化与LeakyReLU。
2.多尺度预测
在多尺度预测部分,可以从网络结构图中看到yolov3提取了3个特征层,这3个特征层分别位于中间层、中下层和底层。
这3个特征层会进行5次卷积,处理完一部分用于输出该特征层对应的预测结果,一部分用于进行反卷积UmSampling2d后与其它特征层进行结合。
输出层的shape分别为(13,13,75),(26,26,75),(52,52,75),最后一个维度为75是因为该图是基于voc数据集的,它的类为20种,即25=(四个坐标+1个置信度+20个类别)。 yolo3针对每一个特征层存在3个先验框,所以最后维度为3x25=75;
# 单次卷积 #
#--------------------------------------------------#
@wraps(Conv2D)
def DarknetConv2D(*args, **kwargs):
darknet_conv_kwargs = {'kernel_regularizer': l2(5e-4)} #L2正则化帮助提高性能#
darknet_conv_kwargs['padding'] = 'valid' if kwargs.get('strides')==(2,2) else 'same'
darknet_conv_kwargs.update(kwargs)
return Conv2D(*args, **darknet_conv_kwargs)
#---------------------------------------------------#
# 卷积块
# DarknetConv2D + BatchNormalization(标准化) + LeakyReLU(激活函数)
#---------------------------------------------------#
def DarknetConv2D_BN_Leaky(*args, **kwargs):
no_bias_kwargs = {'use_bias': False}
no_bias_kwargs.update(kwargs)
return compose(
DarknetConv2D(*args, **no_bias_kwargs),
BatchNormalization(),
LeakyReLU(alpha=0.1))
#---------------------------------------------------#
# 卷积块
# DarknetConv2D + BatchNormalization (标准化)+ LeakyReLU(激活函数)
#---------------------------------------------------#
def resblock_body(x, num_filters, num_blocks): #残差结构
x = ZeroPadding2D(((1,0),(1,0)))(x)
x = DarknetConv2D_BN_Leaky(num_filters, (3,3), strides=(2,2))(x) #步长是2,所以长宽变成原来的二分之一
for i in range(num_blocks): #重复次数
y = DarknetConv2D_BN_Leaky(num_filters//2, (1,1))(x) #主干部分第一次1*1卷积,把通道数压缩二分之一
y = DarknetConv2D_BN_Leaky(num_filters, (3,3))(y) #主干部分第二次3*3卷积,把通道数扩张回去
x = Add()([x,y]) #把残差线和主体相连,进行次数就是重复次数次
return x
#---------------------------------------------------#
# darknet53 的主体部分
#---------------------------------------------------#
def darknet_body(x):
#416,416,3
x = DarknetConv2D_BN_Leaky(32, (3,3))(x) #长宽不变,通道数变成32,对应图示Step1#
#208, 208, 64
x = resblock_body(x, 64, 1) #(输入特征层,输出通道数,重复次数) 具体的r , b 解释看上面定义部分
#104 ,104, 128
x = resblock_body(x, 128, 2)
#52, 52, 256
x = resblock_body(x, 256, 8)
feat1 = x
#26, 26, 512
x = resblock_body(x, 512, 8)
feat2 = x
#13 , 13, 1024
x = resblock_body(x, 1024, 4)
feat3 = x
return feat1,feat2,feat3
#---------------------------------------------------#
import numpy as np
class YOLO_Kmeans:
def __init__(self, cluster_number, filename):
self.cluster_number = cluster_number
self.filename = "2012_train.txt"
def iou(self, boxes, clusters): # 1 box -> k clusters
n = boxes.shape[0]
k = self.cluster_number
box_area = boxes[:, 0] * boxes[:, 1]
box_area = box_area.repeat(k)
box_area = np.reshape(box_area, (n, k))
cluster_area = clusters[:, 0] * clusters[:, 1]
cluster_area = np.tile(cluster_area, [1, n])
cluster_area = np.reshape(cluster_area, (n, k))
box_w_matrix = np.reshape(boxes[:, 0].repeat(k), (n, k))
cluster_w_matrix = np.reshape(np.tile(clusters[:, 0], (1, n)), (n, k))
min_w_matrix = np.minimum(cluster_w_matrix, box_w_matrix)
box_h_matrix = np.reshape(boxes[:, 1].repeat(k), (n, k))
cluster_h_matrix = np.reshape(np.tile(clusters[:, 1], (1, n)), (n, k))
min_h_matrix = np.minimum(cluster_h_matrix, box_h_matrix)
inter_area = np.multiply(min_w_matrix, min_h_matrix)
# 计算IOU值
result = inter_area / (box_area + cluster_area - inter_area)
return result
def avg_iou(self, boxes, clusters):
accuracy = np.mean([np.max(self.iou(boxes, clusters), axis=1)])
return accuracy
def kmeans(self, boxes, k, dist=np.median):
#聚类问题
box_number = boxes.shape[0]
distances = np.empty((box_number, k))
last_nearest = np.zeros((box_number,))
np.random.seed()
clusters = boxes[np.random.choice(
box_number, k, replace=False)] # init k clusters
while True:
#此处没有使用欧氏距离,较大的box会比较小的box产生更多的错误。自定义的距离度量公式为:
#d(box,centroid)=1-IOU(box,centroid)。到聚类中心的距离越小越好,但IOU值是越大越好,所以用#1 - IOU,这样就保证距离越小,IOU值越大。
distances = 1 - self.iou(boxes, clusters)
current_nearest = np.argmin(distances, axis=1)
if (last_nearest == current_nearest).all():
break # clusters won't change
for cluster in range(k):
clusters[cluster] = dist( # update clusters
boxes[current_nearest == cluster], axis=0)
last_nearest = current_nearest
return clusters
def result2txt(self, data):
f = open("yolo_anchors.txt", 'w')
row = np.shape(data)[0]
for i in range(row):
if i == 0:
x_y = "%d,%d" % (data[i][0], data[i][1])
else:
x_y = ", %d,%d" % (data[i][0], data[i][1])
f.write(x_y)
f.close()
def txt2boxes(self):
f = open(self.filename, 'r')
dataSet = []
for line in f:
infos = line.split(" ")
length = len(infos)
for i in range(1, length):
width = int(infos[i].split(",")[2]) - \
int(infos[i].split(",")[0])
height = int(infos[i].split(",")[3]) - \
int(infos[i].split(",")[1])
dataSet.append([width, height])
result = np.array(dataSet)
f.close()
return result
def txt2clusters(self):
all_boxes = self.txt2boxes()
result = self.kmeans(all_boxes, k=self.cluster_number)
result = result[np.lexsort(result.T[0, None])]
self.result2txt(result)
print("K anchors:\n {}".format(result))
print("Accuracy: {:.2f}%".format(
self.avg_iou(all_boxes, result) * 100))
if __name__ == "__main__":
cluster_number = 9
filename = "2012_train.txt"
kmeans = YOLO_Kmeans(cluster_number, filename)
kmeans.txt2clusters()
kmeans在YOLOV3上面如何实现的
那么有了量化的anchor box后,怎么在实际的模型中加入anchor box的先验经验呢?我们在前面中简单提到过最终负责预测grid cell中对象的box的最小单元是bounding box,那我们可以让一个grid cell输出(预测)多个bounding box,然后每个bounding box负责预测不同的形状不就行了?比如前面例子中的3个不同形状的anchor box,我们的一个grid cell会输出3个参数相同的bounding box,第一个bounding box负责预测的形状与anchor box 1类似的box,其他两个bounding box依次类推。作者在YOLOv3中取消了v2之前每个grid cell只负责预测一个对象的限制,也就是说grid cell中的三个bounding box都可以预测对象,当然他们应该对应不同的ground truth。那么如何在训练中确定哪个bounding box负责某个ground truth呢?方法是求出每个grid cell中每个anchor box与ground truth box的IOU(交并比),IOU最大的anchor box对应的bounding box就负责预测该ground truth,也就是对应的对象。
*********************************************************************************************************************************************************************************************************************************************************************************************************