Drawbacks of NMS in Object Detection

Reposted

mob64ca140f9cec 2024-09-13 20:39:07


These are my personal study notes and interpretations; if you spot any mistakes, please point them out.

Introduction

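Before computing mAP you need two things: the predicted boxes (the model's output) and the ground-truth boxes (the annotated labels). A predicted box has the format ["Class_name",conf,x1,y1,x2,y2]; a ground-truth box has the format ["Class_name",x1,y1,x2,y2].

Class_name is the class of the box, conf is the confidence, and x1,y1,x2,y2 are the coordinates of the box's top-left and bottom-right corners in the image.

Once these inputs are ready, sort the predicted boxes by conf in descending order, as in the table below (only 2 rows are shown; in practice there are usually far more predicted boxes than ground-truth boxes):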

Class_name	conf	x1	y1	x2	y2
aeroplane	0.9996	103	79	368	183
aeroplane	0.8657	126	89	206	118

Predicted boxes

Class_name	x1	y1	x2	y2
aeroplane	104	78	375	183
aeroplane	133	88	197	123

Ground-truth boxes

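Next, take the predicted boxes one at a time and compute the IoU with the ground-truth boxes of the same class in the same image. If the IoU exceeds the chosen threshold, the predicted box counts as a TP (true positive); otherwise it is an FP (false positive). Note: IoU is computed one image at a time, with each predicted box compared against all ground-truth boxes of its class in that image. For example, the box with confidence 0.9996 is compared with every aeroplane ground-truth box; if at least one of them has an IoU above the threshold, the predicted box is marked as a TP. (In the code walked through later, the best-matching ground-truth box must also not have been claimed already by a higher-confidence prediction.)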
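The IoU test described above can be sketched as follows. This is a minimal illustration, not the repository's code: the helper names `iou` and `best_iou` are made up here, and the inclusive-pixel (+1) convention is assumed because the matching code later in this post uses it. The boxes are the first predicted row and the two ground-truth rows from the tables above.

```python
def iou(box_a, box_b):
    """IoU of two (x1, y1, x2, y2) boxes with the inclusive-pixel (+1) convention."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    iw, ih = ix2 - ix1 + 1, iy2 - iy1 + 1
    if iw <= 0 or ih <= 0:
        return 0.0  # the boxes do not overlap
    inter = iw * ih
    area_a = (box_a[2] - box_a[0] + 1) * (box_a[3] - box_a[1] + 1)
    area_b = (box_b[2] - box_b[0] + 1) * (box_b[3] - box_b[1] + 1)
    return inter / (area_a + area_b - inter)


def best_iou(pred_box, gt_boxes):
    """Highest IoU between one predicted box and the same-class ground-truth boxes."""
    return max((iou(pred_box, gt) for gt in gt_boxes), default=0.0)


pred = (103, 79, 368, 183)                           # the 0.9996 aeroplane prediction
gts = [(104, 78, 375, 183), (133, 88, 197, 123)]     # aeroplane ground-truth boxes
is_tp = best_iou(pred, gts) > 0.5                    # threshold 0.5, as in mAP0.5
print(round(best_iou(pred, gts), 4), is_tp)
```

The best IoU here is roughly 0.96, so at a threshold of 0.5 this prediction is a TP.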

After all boxes have been processed, the result has the following format.

Class_name	conf	x1	y1	x2	y2	TP
aeroplane	0.9996	103	79	368	183	1
aeroplane	0.8657	126	89	206	118	0

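This result is then used to compute Precision and Recall (the walkthrough here uses a single image; the code later accumulates over all images for each class):

Precision = TP / (TP + FP)

Recall = TP / (TP + FN)

The denominator of Recall, TP + FN, is the total number of ground-truth boxes of that class in the annotations, so it can be read directly from the ground-truth file. The TP and FP in the denominator of Precision come from the result computed above.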
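As a small worked example of these two ratios, the sketch below takes the TP column of the table above (1 and 0) and the number of ground-truth boxes (2). The function name `precision_recall` is only illustrative.

```python
def precision_recall(tp_flags, num_gt):
    """Precision and recall from a list of 1/0 TP flags and the ground-truth count."""
    tp = sum(tp_flags)
    fp = len(tp_flags) - tp            # every prediction that is not a TP is an FP
    precision = tp / (tp + fp) if tp_flags else 0.0
    recall = tp / num_gt if num_gt else 0.0
    return precision, recall


# TP column from the table above: one TP, one FP, two aeroplane ground-truth boxes
p, r = precision_recall([1, 0], num_gt=2)
print(p, r)
```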

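AP can be understood as the area enclosed by the precision-recall curve, with Recall and Precision as the two axes.

[Figure: precision-recall curve and the area it encloses]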
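One common way to approximate that area is to make precision monotonically non-increasing and then sum rectangles wherever recall changes. The sketch below follows that idea; `area_under_pr` is a hypothetical name, and this is not claimed to be identical to the `voc_ap` helper called later in this post.

```python
def area_under_pr(recall, precision):
    """Approximate the area under a precision-recall curve.

    recall and precision are lists of equal length, ordered by descending confidence.
    """
    mrec = [0.0] + list(recall) + [1.0]
    mpre = [0.0] + list(precision) + [0.0]
    # Sweep right to left so each precision is at least as large as those after it.
    for i in range(len(mpre) - 2, -1, -1):
        mpre[i] = max(mpre[i], mpre[i + 1])
    # Sum rectangle areas over the recall steps.
    ap = 0.0
    for i in range(1, len(mrec)):
        ap += (mrec[i] - mrec[i - 1]) * mpre[i]
    return ap


print(area_under_pr([0.5, 0.5], [1.0, 0.5]))   # one TP then one FP, two ground-truth boxes
print(area_under_pr([0.5, 1.0], [1.0, 1.0]))   # both predictions correct
```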

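The m in mAP stands for mean: compute the AP of each class, then average them to get mAP.

A word on mAP0.5 and mAP0.5:0.95. Recall that when building the table with the TP column, a predicted box counts as a positive when its IoU with a ground-truth box exceeds the threshold. The 0.5 in mAP0.5 is that threshold. mAP0.5:0.95 means computing mAP at each of the thresholds 0.5, 0.55, 0.60, 0.65, ..., 0.95 and averaging the results, which is why mAP0.5:0.95 is generally lower than mAP0.5.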
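The averaging over thresholds can be sketched like this. The per-threshold mAP values below are invented purely to show the arithmetic; in practice each one would come from a full mAP run at that IoU threshold.

```python
# The ten IoU thresholds 0.50, 0.55, ..., 0.95
iou_thresholds = [round(0.5 + 0.05 * i, 2) for i in range(10)]


def mean_over_thresholds(map_at_threshold):
    """Average a {threshold: mAP} mapping over the ten thresholds."""
    return sum(map_at_threshold[t] for t in iou_thresholds) / len(iou_thresholds)


# Hypothetical mAP values that fall as the threshold gets stricter
hypothetical_map = {t: 1.0 - t for t in iou_thresholds}
map_50 = hypothetical_map[0.5]
map_50_95 = mean_over_thresholds(hypothetical_map)
print(map_50, round(map_50_95, 3))
```

Because the stricter thresholds contribute lower values, the average ends up below the value at 0.5.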

Initial parameter settings

Taking the VOC dataset and the SSD model as an example, the preset parameters in SSD's get_map.py are as follows:

#------------------------------------------------------------------------------------------------------------------#
    #   map_mode selects what this script computes when it runs.
    #   map_mode = 0: the full mAP pipeline (get predictions, get ground-truth boxes, compute VOC_map).
    #   map_mode = 1: only get the predictions.
    #   map_mode = 2: only get the ground-truth boxes.
    #   map_mode = 3: only compute VOC_map.
    #   map_mode = 4: compute the 0.50:0.95 mAP of the current dataset with the COCO toolbox.
    #                 Requires the predictions and ground-truth boxes, plus an installed pycocotools.
    #-------------------------------------------------------------------------------------------------------------------#
    map_mode        = 0
    #--------------------------------------------------------------------------------------#
    #   classes_path specifies the classes for which VOC_map is measured.
    #   Normally it is the same classes_path used for training and prediction.
    #--------------------------------------------------------------------------------------#
    classes_path    = 'model_data/voc_classes.txt'
    #--------------------------------------------------------------------------------------#
    #   MINOVERLAP selects which mAP0.x to compute.
    #   For example, set MINOVERLAP = 0.75 to compute mAP0.75.
    #
    #   A predicted box whose overlap with a ground-truth box exceeds MINOVERLAP counts as a
    #   positive sample, otherwise as a negative sample.
    #   The larger MINOVERLAP is, the more accurate a box must be to count as positive,
    #   and the lower the resulting mAP.
    #--------------------------------------------------------------------------------------#
    MINOVERLAP      = 0.5
    #--------------------------------------------------------------------------------------#
    #   Because of how mAP is computed, nearly all predicted boxes are needed.
    #   confidence should therefore be set as low as possible to keep every candidate box.
    #   
    #   This value is normally left unchanged, for the same reason.
    #   To get Recall and Precision at a different cut-off, change score_threhold below.
    #--------------------------------------------------------------------------------------#
    confidence      = 0.02
    #--------------------------------------------------------------------------------------#
    #   IoU threshold used by non-maximum suppression at prediction time;
    #   a larger value means less strict suppression.
    #   
    #   This value is normally left unchanged.
    #--------------------------------------------------------------------------------------#
    nms_iou         = 0.5
    #---------------------------------------------------------------------------------------------------------------#
    #   Unlike AP, Recall and Precision are not areas, so their values change with the cut-off.
    #   
    #   By default, the Recall and Precision reported by this code are the values at a cut-off of 0.5
    #   (defined here as score_threhold).
    #   Because mAP needs nearly all predicted boxes, the confidence defined above must not be changed casually.
    #   score_threhold is defined separately as the cut-off, so that the Recall and Precision at that
    #   cut-off can be looked up while mAP is being computed.
    #---------------------------------------------------------------------------------------------------------------#
    score_threhold  = 0.5
    #-------------------------------------------------------#
    #   map_vis turns visualization of the VOC_map computation on or off
    #-------------------------------------------------------#
    map_vis         = False
    #-------------------------------------------------------#
    #   Folder containing the VOC dataset
    #   Defaults to the VOC dataset in the root directory
    #-------------------------------------------------------#
    VOCdevkit_path  = 'VOCdevkit'
    #-------------------------------------------------------#
    #   Output folder for the results, map_out by default
    #-------------------------------------------------------#
    map_out_path    = 'map_out'

Getting the VOC image IDs

test.txt in the VOC dataset lists the IDs of the test images.

First read the image IDs from it so that each image can be fed to the detection model. The code is:

image_ids = open(os.path.join(VOCdevkit_path, "VOC2007/ImageSets/Main/test.txt")).read().strip().split()
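The same read can be written with a context manager so the file handle is closed promptly. The sketch below is self-contained: it writes a small stand-in list to a temporary folder instead of touching a real VOC directory, and the IDs 000001 and 000002 are made up for the example.

```python
import os
import tempfile


def read_image_ids(list_path):
    """Return the whitespace-separated image IDs stored in list_path."""
    with open(list_path) as f:
        return f.read().strip().split()


with tempfile.TemporaryDirectory() as tmp:
    list_path = os.path.join(tmp, "test.txt")
    with open(list_path, "w") as f:
        f.write("000001\n000002\n000032\n")   # stand-in for VOC2007/ImageSets/Main/test.txt
    image_ids = read_image_ids(list_path)

print(image_ids)
```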


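Getting the predicted boxes

Run prediction on each image and save the results for later use.

Iterate over the images in image_ids and feed them to the SSD model one at a time: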

if map_mode == 0 or map_mode == 1:
        ssd = SSD(confidence = confidence, nms_iou = nms_iou)
        for image_id in tqdm(image_ids):
            image_path  = os.path.join(VOCdevkit_path, "VOC2007/JPEGImages/"+image_id+".jpg")
            image       = Image.open(image_path)
            # map_vis controls whether visualization of the VOC_map computation is enabled
            if map_vis:
                image.save(os.path.join(map_out_path, "images-optional/" + image_id + ".jpg"))
            # save the prediction results under map_out_path
            ssd.get_map_txt(image_id, image, class_names, map_out_path)
        print("Get predict result done.")

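Each image, its ID, the list of class names and the output path are passed to ssd.get_map_txt. This writes a txt file with the predictions for that image into the detection-results folder under the output path.

The function ssd.get_map_txt(image_id, image, class_names, map_out_path) works as follows:

Preprocess the input image (convert to RGB, resize). After resizing, the PIL image has shape [300,300,3]; it must become [1,3,300,300]. np.transpose and np.expand_dims do this and produce a NumPy array, which is converted to a torch tensor (torch.from_numpy) before being fed to the model.
Feed the image to the model. outputs contains two arrays: the box coordinates (1,8732,4) and the box confidences (1,8732,21).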

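Then decode the predictions. The prediction and decoding steps were covered in the previous article.

Decoding yields an array of shape [n,6], where n is the number of predicted boxes and the 6 columns are [x1,y1,x2,y2,label,conf] (in the code below the four box columns are unpacked as top, left, bottom, right). This differs from the required format only in column order, so the columns are rearranged to [label,conf,x1,y1,x2,y2]. That completes the prediction for one image, and the result is saved as a txt file.
Each image produces one txt file of predictions.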
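The shape change in the preprocessing step, from [300,300,3] to [1,3,300,300], can be checked in isolation. The sketch below uses a dummy all-zero image and leaves out the normalization that preprocess_input performs; `to_batch_chw` is a hypothetical helper name.

```python
import numpy as np


def to_batch_chw(image_hwc):
    """Convert an HxWxC image array to a 1xCxHxW float32 batch."""
    chw = np.transpose(np.asarray(image_hwc, dtype="float32"), (2, 0, 1))  # HWC -> CHW
    return np.expand_dims(chw, 0)                                          # add the batch axis


dummy = np.zeros((300, 300, 3), dtype="uint8")   # stand-in for a resized RGB image
batch = to_batch_chw(dummy)
print(batch.shape, batch.dtype)
```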

The code is as follows:

def get_map_txt(self, image_id, image, class_names, map_out_path):
        f = open(os.path.join(map_out_path, "detection-results/"+image_id+".txt"),"w") 
        #---------------------------------------------------#
        #   Height and width of the input image
        #---------------------------------------------------#
        image_shape = np.array(np.shape(image)[0:2])
        #---------------------------------------------------------#
        #   Convert the image to RGB so grayscale images do not fail at prediction time.
        #   The code only supports RGB prediction; every other image type is converted to RGB.
        #---------------------------------------------------------#
        image       = cvtColor(image)
        #---------------------------------------------------------#
        #   Pad the image with gray bars to resize without distortion.
        #   A plain resize also works.
        #---------------------------------------------------------#
        image_data  = resize_image(image, (self.input_shape[1], self.input_shape[0]), self.letterbox_image)
        #---------------------------------------------------------#
        #   Preprocess (normalize) the image and add the batch_size dimension.
        #---------------------------------------------------------#
        image_data = np.expand_dims(np.transpose(preprocess_input(np.array(image_data, dtype='float32')), (2, 0, 1)), 0)

        with torch.no_grad():
            #---------------------------------------------------#
            #   Convert to a torch tensor
            #---------------------------------------------------#
            images = torch.from_numpy(image_data).type(torch.FloatTensor)
            if self.cuda:
                images = images.cuda()
            #---------------------------------------------------------#
            #   Feed the image to the network for prediction
            #---------------------------------------------------------#
            outputs     = self.net(images)
            #-----------------------------------------------------------#
            #   Decode the predictions
            #-----------------------------------------------------------#
            results     = self.bbox_util.decode_box(outputs, self.anchors, image_shape, self.input_shape, self.letterbox_image, 
                                                    nms_iou = self.nms_iou, confidence = self.confidence)
            #--------------------------------------#
            #   No object detected: close the (empty) result file and return
            #--------------------------------------#
            if len(results[0]) <= 0:
                f.close()
                return 

            top_label   = np.array(results[0][:, 4], dtype = 'int32')
            top_conf    = results[0][:, 5]
            top_boxes   = results[0][:, :4]
        
        for i, c in list(enumerate(top_label)):
            predicted_class = self.class_names[int(c)]
            box             = top_boxes[i]
            score           = str(top_conf[i])

            top, left, bottom, right = box
            # skip classes that are not being evaluated
            if predicted_class not in class_names:
                continue

            # one line per box: class conf left top right bottom
            f.write("%s %s %s %s %s %s\n" % (predicted_class, score[:6], str(int(left)), str(int(top)), str(int(right)),str(int(bottom))))

        f.close()
        return

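Getting the ground-truth boxes

The ground-truth boxes come from the annotation files. We only need to read each annotation file, lay it out in the format described in the introduction, and write it to the output folder: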

Class_name	x1	y1	x2	y2
aeroplane	104	78	375	183
aeroplane	133	88	197	123

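Required format for saved ground-truth boxes

VOC stores its annotation files in the Annotations folder.

We need the <name> and <bndbox> elements. Take each image ID from image_ids, join it with the VOC Annotations path via os.path.join to locate that image's annotation file, and, since the file is XML, read it with the xml.etree.ElementTree library. The extracted fields are then saved as a txt file in the ground-truth folder under the output path. The details are in the code below.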

if map_mode == 0 or map_mode == 2:
        print("Get ground truth result.")
        for image_id in tqdm(image_ids):
            with open(os.path.join(map_out_path, "ground-truth/"+image_id+".txt"), "w") as new_f:
                root = ET.parse(os.path.join(VOCdevkit_path, "VOC2007/Annotations/"+image_id+".xml")).getroot()
                for obj in root.findall('object'):
                    difficult_flag = False
                    if obj.find('difficult')!=None:
                        difficult = obj.find('difficult').text
                        if int(difficult)==1:
                            difficult_flag = True
                    obj_name = obj.find('name').text
                    if obj_name not in class_names:
                        continue
                    bndbox  = obj.find('bndbox')
                    left    = bndbox.find('xmin').text
                    top     = bndbox.find('ymin').text
                    right   = bndbox.find('xmax').text
                    bottom  = bndbox.find('ymax').text

                    if difficult_flag:
                        new_f.write("%s %s %s %s %s difficult\n" % (obj_name, left, top, right, bottom))
                    else:
                        new_f.write("%s %s %s %s %s\n" % (obj_name, left, top, right, bottom))
        print("Get ground truth result done.")
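To try the XML parsing without a VOC download, the same extraction can be run on an in-memory annotation. The XML below is a hand-written stand-in that reuses the two aeroplane boxes from this post; marking the second object as difficult is an assumption made only to exercise that branch, and `ground_truth_lines` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

SAMPLE_XML = """<annotation>
  <object>
    <name>aeroplane</name>
    <difficult>0</difficult>
    <bndbox><xmin>104</xmin><ymin>78</ymin><xmax>375</xmax><ymax>183</ymax></bndbox>
  </object>
  <object>
    <name>aeroplane</name>
    <difficult>1</difficult>
    <bndbox><xmin>133</xmin><ymin>88</ymin><xmax>197</xmax><ymax>123</ymax></bndbox>
  </object>
</annotation>"""


def ground_truth_lines(xml_text, class_names):
    """Return one 'name left top right bottom [difficult]' line per annotated object."""
    root = ET.fromstring(xml_text)
    lines = []
    for obj in root.findall("object"):
        name = obj.find("name").text
        if name not in class_names:
            continue                      # class is not being evaluated
        box = obj.find("bndbox")
        coords = [box.find(tag).text for tag in ("xmin", "ymin", "xmax", "ymax")]
        difficult = obj.find("difficult")
        is_difficult = difficult is not None and int(difficult.text) == 1
        lines.append(" ".join([name] + coords + (["difficult"] if is_difficult else [])))
    return lines


gt_lines = ground_truth_lines(SAMPLE_XML, class_names=["aeroplane"])
print("\n".join(gt_lines))
```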

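mAP computation

The idea is as described in the introduction: use the prediction txt files in the detection-results folder and the ground-truth txt files in the ground-truth folder to build the table with the TP column.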

Class_name	conf	x1	y1	x2	y2
aeroplane	0.9996	103	79	368	183
aeroplane	0.8657	126	89	206	118

Predicted boxes

Class_name	x1	y1	x2	y2
aeroplane	104	78	375	183
aeroplane	133	88	197	123

Ground-truth boxes

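After sorting the predicted boxes by conf, compute their IoU with the ground-truth boxes. A box whose IoU exceeds the threshold is a TP, otherwise an FP. In the TP column, a TP is recorded as 1 and an FP as 0.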

Class_name	conf	x1	y1	x2	y2	TP
aeroplane	0.9996	103	79	368	183	1
aeroplane	0.8657	126	89	206	118	0

Result

Precision, Recall and mAP are then computed from this result.

if map_mode == 0 or map_mode == 3:
        print("Get map.")
        get_map(MINOVERLAP, True, score_threhold = score_threhold, path = map_out_path)
        print("Get map done.")

Looking first at the code in get_map.py, the call used is:

get_map(MINOVERLAP, True, score_threhold = score_threhold, path = map_out_path)

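Even before reading the implementation, the arguments are mostly self-explanatory.

MINOVERLAP is the IoU threshold between predicted and ground-truth boxes discussed earlier: a match that reaches it counts as a TP. This is where the 0.5 in mAP0.5 comes from.

What about score_threhold? The comments in the initial parameters explain it. The model emits a great many boxes, and mAP needs nearly all of them, including the low-confidence ones, so none are discarded here. Recall and Precision, however, depend on where the confidence cut-off is placed. score_threhold is that cut-off: as the name suggests, it is a score (confidence) threshold, and the reported Recall and Precision are the values at the point where conf drops to it.

map_out_path is the folder where the prediction and ground-truth files produced in the previous steps were saved.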
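To make the role of score_threhold concrete, the sketch below finds the position in a confidence-sorted list that corresponds to the cut-off and reads Precision and Recall there. The first two scores come from the tables in this post; the last two scores, the TP flags and the ground-truth count are invented for illustration, and the helper name is hypothetical.

```python
def index_at_threshold(sorted_scores, score_threshold):
    """Index of the last score that is still >= score_threshold (0 if none is)."""
    idx = 0
    for i, score in enumerate(sorted_scores):
        if score >= score_threshold:
            idx = i
    return idx


scores = [0.9996, 0.8657, 0.30, 0.05]   # sorted by confidence, descending
tp_flags = [1, 0, 1, 0]                 # hypothetical TP column
num_gt = 2                              # hypothetical ground-truth count

# Running precision and recall after each prediction
cum_tp, prec, rec = 0, [], []
for n, flag in enumerate(tp_flags, start=1):
    cum_tp += flag
    prec.append(cum_tp / n)
    rec.append(cum_tp / num_gt)

idx = index_at_threshold(scores, 0.5)
print(idx, prec[idx], rec[idx])
```

All four boxes still contribute to the precision-recall curve used for AP; the cut-off only selects which point on that curve is reported.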

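Using get_map directly

At this point get_map can be used directly: extract the predicted and ground-truth boxes, convert them to the required format, save them as txt files under map_out_path as shown above, and pass that path to get_map() to get the results.

Creating the folders needed for computation and output

Expanding get_map(MINOVERLAP, True, score_threhold = score_threhold, path = map_out_path):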

def get_map(MINOVERLAP, draw_plot, score_threhold=0.5, path = './map_out'):
    GT_PATH             = os.path.join(path, 'ground-truth')
    DR_PATH             = os.path.join(path, 'detection-results')
    IMG_PATH           = os.path.join(path, 'images-optional')
    TEMP_FILES_PATH     = os.path.join(path, '.temp_files')
    RESULTS_FILES_PATH  = os.path.join(path, 'results')

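IMG_PATH is only used when visualization is on (map_vis == True); it holds the images that the visualization opens. Because map_vis is off here, IMG_PATH contains no files and show_animation is set to False. In short, this snippet is the visualization switch. Visualization is not the focus, so the visualization-related code is omitted below.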

show_animation = True
    if os.path.exists(IMG_PATH): 
        for dirpath, dirnames, files in os.walk(IMG_PATH):  # os.walk traverses the folder and all its subfolders
            if not files:
                show_animation = False
    else:
        show_animation = False

TEMP_FILES_PATH is a temporary folder; intermediate files are stored there during processing.

RESULTS_FILES_PATH is the output folder for the results we want. It is emptied before mAP is computed.

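if not os.path.exists(TEMP_FILES_PATH):
        os.makedirs(TEMP_FILES_PATH)
        
    if os.path.exists(RESULTS_FILES_PATH):
        shutil.rmtree(RESULTS_FILES_PATH)  # shutil.rmtree() deletes the folder recursively
    # recreate the folder in both cases so results.txt can be written even when draw_plot is False
    os.makedirs(RESULTS_FILES_PATH)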
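The "empty the results folder" step can be tried safely inside a temporary directory. The sketch below assumes the intent is to end up with an existing, empty folder in every case; `reset_dir` and the file name stale.txt are made up for the example.

```python
import os
import shutil
import tempfile


def reset_dir(path):
    """Delete path if it exists, then create it again so it is empty."""
    if os.path.exists(path):
        shutil.rmtree(path)
    os.makedirs(path)


with tempfile.TemporaryDirectory() as tmp:
    results = os.path.join(tmp, "results")
    os.makedirs(results)
    open(os.path.join(results, "stale.txt"), "w").close()   # leftover from an earlier run
    reset_dir(results)
    leftover = os.listdir(results)
    still_exists = os.path.isdir(results)

print(leftover, still_exists)
```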

Create the subfolders for the outputs ("AP", "F1", "Recall", "Precision") inside results:

if draw_plot:
        try:
            matplotlib.use('TkAgg')  # show figures in a separate window
        except Exception:
            pass
        os.makedirs(os.path.join(RESULTS_FILES_PATH, "AP"))
        os.makedirs(os.path.join(RESULTS_FILES_PATH, "F1"))
        os.makedirs(os.path.join(RESULTS_FILES_PATH, "Recall"))
        os.makedirs(os.path.join(RESULTS_FILES_PATH, "Precision"))

Full code

def get_map(MINOVERLAP, draw_plot, score_threhold=0.5, path = './map_out'):
    GT_PATH             = os.path.join(path, 'ground-truth')
    DR_PATH             = os.path.join(path, 'detection-results')
    IMG_PATH            = os.path.join(path, 'images-optional')
    TEMP_FILES_PATH     = os.path.join(path, '.temp_files')
    RESULTS_FILES_PATH  = os.path.join(path, 'results')

    show_animation = True
    if os.path.exists(IMG_PATH): 
        for dirpath, dirnames, files in os.walk(IMG_PATH):  # os.walk traverses the folder and all its subfolders
            if not files:
                show_animation = False
    else:
        show_animation = False

    if not os.path.exists(TEMP_FILES_PATH):
        os.makedirs(TEMP_FILES_PATH)
        
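    if os.path.exists(RESULTS_FILES_PATH):
        shutil.rmtree(RESULTS_FILES_PATH)  # shutil.rmtree() deletes the folder recursively
    # recreate the folder in both cases so results.txt can be written even when draw_plot is False
    os.makedirs(RESULTS_FILES_PATH)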
    if draw_plot:
        try:
            matplotlib.use('TkAgg')  # show figures in a separate window
        except Exception:
            pass
        os.makedirs(os.path.join(RESULTS_FILES_PATH, "AP"))
        os.makedirs(os.path.join(RESULTS_FILES_PATH, "F1"))
        os.makedirs(os.path.join(RESULTS_FILES_PATH, "Recall"))
        os.makedirs(os.path.join(RESULTS_FILES_PATH, "Precision"))
    if show_animation:
        os.makedirs(os.path.join(RESULTS_FILES_PATH, "images", "detections_one_by_one"))

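Converting the ground-truth files

Each ground-truth box is saved to a json file as {"class_name": "aeroplane", "bbox": "104 78 375 183", "used": false}. The extra used field supports the later IoU matching: once a ground-truth box has been matched, its used is set to true.

File conversion: each txt file in the ground-truth folder becomes a {file_id}_ground_truth.json file in the temporary folder.

Content conversion: each line such as aeroplane 104 78 375 183 becomes one record in the format above.

Steps:

Collect the paths of all per-image ground-truth txt files into a list, then iterate over it to visit each file.
Process one image at a time and save it in json format.
This stage also records:

the object classes present in the whole dataset, e.g. 20 classes across all images;
the number of objects of each class in the whole dataset, e.g. 300 cats across all images;
the number of images containing each class, e.g. 10 images contain a cat.

Code walkthrough: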

ground_truth_files_list = glob.glob(GT_PATH + '/*.txt')  # collect the txt file of every image under GT_PATH
    if len(ground_truth_files_list) == 0:
        error("Error: No ground-truth files found!")
    ground_truth_files_list.sort()
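    gt_counter_per_class     = {}  # number of objects per class over the whole dataset, e.g. 300 cats in total
    counter_images_per_class = {}  # number of images containing each class, e.g. 10 images contain a cat
    # take the ground-truth files one at a time
    # and convert each of them to json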
    for txt_file in ground_truth_files_list:
        file_id     = txt_file.split(".txt", 1)[0]
        file_id     = os.path.basename(os.path.normpath(file_id))  # normpath normalizes the path (removes double slashes, etc.)
        # check that a matching detection file exists; abort with an error if it does not
        temp_path   = os.path.join(DR_PATH, (file_id + ".txt"))
        if not os.path.exists(temp_path):
            error_msg = "Error. File not found: {}\n".format(temp_path)
            error(error_msg)
        # read all labels in this ground-truth file
        lines_list      = file_lines_to_list(txt_file)
        bounding_boxes  = []
        is_difficult    = False
        already_seen_classes = []
        # take the labels one at a time
        for line in lines_list:
            try:
                if "difficult" in line:
                    class_name, left, top, right, bottom, _difficult = line.split()
                    is_difficult = True
                else:
                    class_name, left, top, right, bottom = line.split()
            except ValueError:
                # Unpacking raises ValueError when the class name contains spaces.
                # In that case, read the fields from the end of the line and join the rest into class_name.
                if "difficult" in line:
                    line_split  = line.split()
                    _difficult  = line_split[-1]
                    bottom      = line_split[-2]
                    right       = line_split[-3]
                    top         = line_split[-4]
                    left        = line_split[-5]
                    class_name  = ""
                    for name in line_split[:-5]:
                        class_name += name + " "
                    class_name  = class_name[:-1]
                    is_difficult = True
                else:
                    line_split  = line.split()
                    bottom      = line_split[-1]
                    right       = line_split[-2]
                    top         = line_split[-3]
                    left        = line_split[-4]
                    class_name  = ""
                    for name in line_split[:-4]:
                        class_name += name + " "
                    class_name = class_name[:-1]

            bbox = left + " " + top + " " + right + " " + bottom
            if is_difficult:
                bounding_boxes.append({"class_name":class_name, "bbox":bbox, "used":False, "difficult":True})
                is_difficult = False
            else:
                bounding_boxes.append({"class_name":class_name, "bbox":bbox, "used":False})
                if class_name in gt_counter_per_class:
                    gt_counter_per_class[class_name] += 1
                else:
                    gt_counter_per_class[class_name] = 1
                # count each class at most once per image:
                # skip it if it was already seen in this image, otherwise record it
                if class_name not in already_seen_classes:
                    if class_name in counter_images_per_class:
                        counter_images_per_class[class_name] += 1
                    else:
                        counter_images_per_class[class_name] = 1
                    already_seen_classes.append(class_name)

        with open(TEMP_FILES_PATH + "/" + file_id + "_ground_truth.json", 'w') as outfile:
            json.dump(bounding_boxes, outfile)

    gt_classes  = list(gt_counter_per_class.keys())
    gt_classes  = sorted(gt_classes)
    n_classes   = len(gt_classes)
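The line-to-record conversion above, including the fallback for class names that contain spaces, can be condensed by always reading the coordinates from the end of the line. This is a sketch rather than the repository's code: `parse_gt_line` is a hypothetical name, "traffic light" is a made-up multi-word class, and the difficult marker is detected as the last token instead of as a substring of the line.

```python
import json


def parse_gt_line(line):
    """Turn 'name left top right bottom [difficult]' into a ground-truth record."""
    parts = line.split()
    difficult = parts[-1] == "difficult"
    if difficult:
        parts = parts[:-1]
    class_name = " ".join(parts[:-4])        # everything before the four coordinates
    record = {"class_name": class_name, "bbox": " ".join(parts[-4:]), "used": False}
    if difficult:
        record["difficult"] = True
    return record


records = [
    parse_gt_line("aeroplane 104 78 375 183"),
    parse_gt_line("traffic light 1 2 3 4 difficult"),
]
print(json.dumps(records))
```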

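Converting the prediction files

First, the format after conversion: the temporary folder ends up with one {class_name}_dr.json file per class.

As you can see, the predicted boxes are converted to json per class. From the introduction we know why: grouping by class makes it easy to compute the AP of each class, and averaging the APs of all classes gives mAP.

Content conversion: before conversion there is one file per image (e.g. image 000032), and a single image may contain several classes; after conversion there is one file per class (e.g. aeroplane), and a single class covers all images.

Taking the aeroplane class as an example, its boxes sorted by conf in descending order look like this: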

Class_name	conf	x1	y1	x2	y2
aeroplane	0.9996	103	79	368	183
aeroplane	0.8657	126	89	206	118

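Class_name is the name of the class.

Then compute the IoU of each predicted box against the ground-truth boxes of its image; the file_id in the json record identifies that image's ground-truth file. After the computation: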

Class_name	conf	x1	y1	x2	y2	TP
aeroplane	0.9996	103	79	368	183	1
aeroplane	0.8657	126	89	206	118	0

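mAP can be computed from the table above.

Steps for converting the prediction files:

Collect the paths of all per-image prediction txt files into a list, then iterate over it to visit each file.
The ground-truth conversion above produced the list of classes in the dataset; iterate over that list so the files are stored per class.
For each txt path, take the image's ID, which will be stored in file_id.
Open the image's txt file, read it line by line, and convert each line of the current class to the format:

{"confidence":confidence, "file_id":file_id, "bbox":bbox}, appended to a bounding_boxes list. Then move on to the next image.

The results of every image go into the same bounding_boxes list. Once all images have been processed for the class, sort the list by confidence in descending order and write it to that class's json file.
Move on to the next class until all classes are done.

The code is as follows: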

# read the detection result files
    dr_files_list = glob.glob(DR_PATH + '/*.txt')
    dr_files_list.sort()
    # process the detections class by class
    for class_index, class_name in enumerate(gt_classes):
        bounding_boxes = []
        # visit the images one at a time
        for txt_file in dr_files_list:
            file_id = txt_file.split(".txt",1)[0]
            file_id = os.path.basename(os.path.normpath(file_id))
            temp_path = os.path.join(GT_PATH, (file_id + ".txt"))
            # on the first pass, check that the matching ground-truth file exists
            if class_index == 0:
                if not os.path.exists(temp_path):
                    error_msg = "Error. File not found: {}\n".format(temp_path)
                    error(error_msg)
            # read every line of txt_file, i.e. every predicted box
            lines = file_lines_to_list(txt_file)
            for line in lines:
                try:
                    tmp_class_name, confidence, left, top, right, bottom = line.split()
                except ValueError:
                    line_split      = line.split()
                    bottom          = line_split[-1]
                    right           = line_split[-2]
                    top             = line_split[-3]
                    left            = line_split[-4]
                    confidence      = line_split[-5]
                    tmp_class_name  = ""
                    for name in line_split[:-5]:
                        tmp_class_name += name + " "
                    tmp_class_name  = tmp_class_name[:-1]

                if tmp_class_name == class_name:
                    bbox = left + " " + top + " " + right + " " +bottom
                    bounding_boxes.append({"confidence":confidence, "file_id":file_id, "bbox":bbox})

        bounding_boxes.sort(key=lambda x:float(x['confidence']), reverse=True)
        with open(TEMP_FILES_PATH + "/" + class_name + "_dr.json", 'w') as outfile:
            json.dump(bounding_boxes, outfile)
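The per-class regrouping can be exercised on in-memory data. In the sketch below, the two aeroplane rows for image 000032 come from the tables in this post, while the person row and image 000033 are invented to show the filtering and the cross-image sort; `detections_for_class` is a hypothetical helper.

```python
def detections_for_class(lines_by_image, class_name):
    """Collect one class's detections from all images, sorted by descending confidence."""
    boxes = []
    for file_id, lines in lines_by_image.items():
        for line in lines:
            parts = line.split()
            name = " ".join(parts[:-5])      # class names may contain spaces
            if name == class_name:
                boxes.append({"confidence": parts[-5], "file_id": file_id,
                              "bbox": " ".join(parts[-4:])})
    boxes.sort(key=lambda b: float(b["confidence"]), reverse=True)
    return boxes


sample = {
    "000032": ["aeroplane 0.8657 126 89 206 118",
               "person 0.75 10 20 30 40",
               "aeroplane 0.9996 103 79 368 183"],
    "000033": ["aeroplane 0.91 5 5 50 50"],
}
aeroplanes = detections_for_class(sample, "aeroplane")
for box in aeroplanes:
    print(box)
```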

Computing mAP

Computing mAP first requires the following table:

Class_name	conf	x1	y1	x2	y2	TP
aeroplane	0.9996	103	79	368	183	1
aeroplane	0.8657	126	89	206	118	0

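that is, the TP and FP flags.

Steps:

Open the json file of predicted boxes for the aeroplane class.

Each record has the format {"confidence":confidence, "file_id":file_id, "bbox":bbox}.

Count the predicted boxes and create TP and FP lists of that length; the results are written into these lists.
Visit the predicted boxes one at a time. For each one, use its "file_id" to open the ground-truth json file of the corresponding image.
Go through the ground-truth boxes in that json file. For each box of the class being evaluated (aeroplane), compute its IoU with the predicted box, and keep the ground-truth box with the highest IoU as the match. If that IoU reaches the threshold (MINOVERLAP) and the matched box has not been used yet, record a TP, set the box's used to True, and write the json file back. Otherwise record an FP.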

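(Note: updating used is not redundant. Each ground-truth box can be claimed by only one prediction. If a second, lower-confidence prediction overlaps the same ground-truth box, the code counts it as an FP (the REPEATED MATCH branch); without the flag, duplicate boxes on one object would all count as TP and inflate Precision.)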
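The effect of the used flag shows up clearly when two predictions overlap the same ground-truth box. The sketch below takes precomputed IoU values as input to stay short, covers one image and one class, and omits the difficult handling; the IoU numbers and the name `assign_tp_fp` are invented for illustration.

```python
def assign_tp_fp(iou_rows, min_overlap=0.5):
    """Greedy matching: iou_rows[i][j] is the IoU of prediction i with ground truth j.

    Predictions must already be sorted by descending confidence.
    """
    num_gt = len(iou_rows[0]) if iou_rows else 0
    used = [False] * num_gt
    tp, fp = [], []
    for row in iou_rows:
        best_j = max(range(num_gt), key=lambda j: row[j]) if num_gt else None
        matched = (best_j is not None
                   and row[best_j] >= min_overlap
                   and not used[best_j])
        if matched:
            used[best_j] = True          # this ground-truth box is now claimed
        tp.append(1 if matched else 0)
        fp.append(0 if matched else 1)
    return tp, fp


# Three predictions against a single ground-truth box
tp, fp = assign_tp_fp([[0.96], [0.80], [0.10]])
print(tp, fp)
```

The second prediction overlaps the object well (0.80) but arrives after the box has been claimed, so it is an FP, just like the third one with insufficient overlap.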

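This gives the TP and FP lists, which are then accumulated into running sums.

The recall rec at each position is:

cumulative TP / total number of ground-truth boxes of the class

The precision prec at each position is:

cumulative TP / (cumulative TP + cumulative FP)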
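With NumPy the running sums and both ratios take a few lines. The sketch assumes every prediction is either a TP or an FP (it ignores the case where a match to a difficult box counts as neither); the flags and the ground-truth count are made up, and `pr_curves` is a hypothetical name.

```python
import numpy as np


def pr_curves(tp_flags, num_gt):
    """Running recall and precision for predictions sorted by descending confidence."""
    tp_flags = np.asarray(tp_flags)
    tp = np.cumsum(tp_flags)             # cumulative true positives
    fp = np.cumsum(1 - tp_flags)         # cumulative false positives
    rec = tp / max(num_gt, 1)
    prec = tp / np.maximum(tp + fp, 1)
    return rec, prec


rec, prec = pr_curves([1, 0, 1], num_gt=2)
print(rec, prec)
```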

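AP is the area enclosed by the prec-rec curve; passing prec and rec to the VOC formula (voc_ap) gives its approximate value. F1 can likewise be computed from prec and rec with the F1 formula, F1 = 2 * prec * rec / (prec + rec).
What remains is plotting and reading and writing files.

In short, once the TP and FP lists are available, everything else follows from them.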
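For a single precision/recall pair, the F1 formula looks like this. The zero-denominator guard plays the same role as the np.where call in the code below; `f1_score` is just an illustrative name.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    denom = precision + recall
    return 2 * precision * recall / denom if denom else 0.0


print(f1_score(0.5, 0.5), f1_score(1.0, 0.5), f1_score(0.0, 0.0))
```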

This section only gives a rough outline; if anything is unclear, see the linked reference.

The code is as follows:

sum_AP = 0.0
    ap_dictionary = {}
    lamr_dictionary = {}
    with open(RESULTS_FILES_PATH + "/results.txt", 'w') as results_file:
        results_file.write("# AP and precision/recall per class\n")
        count_true_positives = {}

        for class_index, class_name in enumerate(gt_classes):
            count_true_positives[class_name] = 0
            dr_file = TEMP_FILES_PATH + "/" + class_name + "_dr.json"
            dr_data = json.load(open(dr_file))

            nd          = len(dr_data)
            tp          = [0] * nd
            fp          = [0] * nd
            score       = [0] * nd
            score_threhold_idx = 0
            for idx, detection in enumerate(dr_data):
                file_id     = detection["file_id"]
                score[idx]  = float(detection["confidence"])
                if score[idx] >= score_threhold:
                    score_threhold_idx = idx

                gt_file             = TEMP_FILES_PATH + "/" + file_id + "_ground_truth.json"
                ground_truth_data   = json.load(open(gt_file))
                ovmax       = -1
                gt_match    = -1
                bb          = [float(x) for x in detection["bbox"].split()]
                for obj in ground_truth_data:
                    if obj["class_name"] == class_name:
                        bbgt    = [ float(x) for x in obj["bbox"].split() ]
                        bi      = [max(bb[0],bbgt[0]), max(bb[1],bbgt[1]), min(bb[2],bbgt[2]), min(bb[3],bbgt[3])]
                        iw      = bi[2] - bi[0] + 1
                        ih      = bi[3] - bi[1] + 1
                        if iw > 0 and ih > 0:
                            ua = (bb[2] - bb[0] + 1) * (bb[3] - bb[1] + 1) + (bbgt[2] - bbgt[0]
                                            + 1) * (bbgt[3] - bbgt[1] + 1) - iw * ih
                            ov = iw * ih / ua
                            if ov > ovmax:
                                ovmax = ov
                                gt_match = obj

                if show_animation:
                    status = "NO MATCH FOUND!" 
                    
                min_overlap = MINOVERLAP
                if ovmax >= min_overlap:
                    if "difficult" not in gt_match:
                        if not bool(gt_match["used"]):
                            tp[idx] = 1
                            gt_match["used"] = True
                            count_true_positives[class_name] += 1
                            with open(gt_file, 'w') as f:
                                    f.write(json.dumps(ground_truth_data))
                            if show_animation:
                                status = "MATCH!"
                        else:
                            fp[idx] = 1
                            if show_animation:
                                status = "REPEATED MATCH!"
                else:
                    fp[idx] = 1
                    if ovmax > 0:
                        status = "INSUFFICIENT OVERLAP"
            cumsum = 0
            for idx, val in enumerate(fp):
                fp[idx] += cumsum
                cumsum += val
                
            cumsum = 0
            for idx, val in enumerate(tp):
                tp[idx] += cumsum
                cumsum += val

            rec = tp[:]
            for idx, val in enumerate(tp):
                rec[idx] = float(tp[idx]) / np.maximum(gt_counter_per_class[class_name], 1)

            prec = tp[:]
            for idx, val in enumerate(tp):
                prec[idx] = float(tp[idx]) / np.maximum((fp[idx] + tp[idx]), 1)

            ap, mrec, mprec = voc_ap(rec[:], prec[:])
            F1  = np.array(rec)*np.array(prec)*2 / np.where((np.array(prec)+np.array(rec))==0, 1, (np.array(prec)+np.array(rec)))

            sum_AP  += ap
            text    = "{0:.2f}%".format(ap*100) + " = " + class_name + " AP " #class_name + " AP = {0:.2f}%".format(ap*100)

Reference:

Pytorch 搭建自己的SSD目标检测平台 (Building your own SSD object detection platform with PyTorch)
