Object Detection in Practice

Reposted

mob64ca1410eb61 2024-05-09 19:52:27


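After setting up the object detection environment (following a reference article), I started building a small application of my own along the same lines. The goal is to train on a set of pictures so that the model can detect whether an image contains the speed skater Wu Dajing.

1. Collect photos of Wu Dajing from the web, 100 in total, named 1.jpg through 100.jpg.

Under models\research\object_detection, create an images folder to hold the pictures. Inside it, create two subfolders, train and test, for the training-set and test-set images respectively.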
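The text above does not say how the 100 pictures were divided between train and test. As a minimal sketch, assuming an 80/20 split and a fixed random seed (both are illustrative assumptions, and split_images is a hypothetical helper name), the files could be distributed like this before annotation:

```python
import os
import random
import shutil


def split_images(image_dir, test_ratio=0.2, seed=0):
    """Move the .jpg files in image_dir into train/ and test/ subfolders.

    test_ratio and seed are illustrative assumptions, not values from the article.
    Returns a dict mapping 'train' and 'test' to the file names moved there.
    """
    files = sorted(f for f in os.listdir(image_dir) if f.lower().endswith('.jpg'))
    random.Random(seed).shuffle(files)
    n_test = int(len(files) * test_ratio)
    assignment = {'test': files[:n_test], 'train': files[n_test:]}
    for folder, names in assignment.items():
        target = os.path.join(image_dir, folder)
        os.makedirs(target, exist_ok=True)
        for name in names:
            shutil.move(os.path.join(image_dir, name), os.path.join(target, name))
    return assignment
```

Running it before labeling means the .xml files that LabelImg saves next to each image end up in the right subfolder without further moves.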

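2. Download LabelImg and use it to annotate the images:

[Screenshot: annotating an image in LabelImg]

Define a custom label, wdj. Each time an image is saved, LabelImg generates an .xml file that stores the label and the position of its bounding box.

3. Convert the XML files to CSV files: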

# -*- coding: utf-8 -*-
"""
Created on Fri May  3 08:24:30 2019

@author: jack
"""

import os
import glob
import pandas as pd
import xml.etree.ElementTree as ET


def xml_to_csv(path):
    xml_list = []
    for xml_file in glob.glob(path + '\\*.xml'):
        print(xml_file)
        tree = ET.parse(xml_file)
        root = tree.getroot()
        for member in root.findall('object'):
            value = (root.find('filename').text,
                     int(root.find('size')[0].text),
                     int(root.find('size')[1].text),
                     member[0].text,
                     int(member[4][0].text),
                     int(member[4][1].text),
                     int(member[4][2].text),
                     int(member[4][3].text)
                     )
            xml_list.append(value)
    column_name = ['filename', 'width', 'height', 'class', 'xmin', 'ymin', 'xmax', 'ymax']
    xml_df = pd.DataFrame(xml_list, columns=column_name)
    return xml_df


def main():
    print(os.getcwd())
    for folder in ['train','test']:
        image_path = os.path.join('C:\\Users\\jack\\models\\research\\object_detection\\', ('images\\' + folder))  
        print(image_path)
        # image_path is the directory that holds the .xml files to read
        xml_df = xml_to_csv(image_path)                              
        #object_detection/images/train or test
        xml_df.to_csv(('C:\\Users\\jack\\models\\research\\object_detection\\images\\' + folder + '_labels.csv'), index=None)
        print('Successfully converted xml to csv.')

main()
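The script above reads fields by position (for example member[4][0]), which breaks silently if the element order in the XML differs. A minimal sketch of the same extraction by tag name, assuming the usual Pascal VOC layout (filename, size/width, size/height, object/name, object/bndbox); parse_voc_xml is a hypothetical helper name:

```python
import xml.etree.ElementTree as ET


def parse_voc_xml(xml_text):
    """Return one row per <object>, looked up by tag name instead of position."""
    root = ET.fromstring(xml_text)
    size = root.find('size')
    rows = []
    for obj in root.findall('object'):
        box = obj.find('bndbox')
        rows.append((
            root.find('filename').text,
            int(size.find('width').text),
            int(size.find('height').text),
            obj.find('name').text,
            int(box.find('xmin').text),
            int(box.find('ymin').text),
            int(box.find('xmax').text),
            int(box.find('ymax').text),
        ))
    return rows
```

The rows have the same column order as column_name in the script, so they could be passed to pd.DataFrame in the same way.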

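image_path is easy to get wrong here, so I simply switched to absolute paths. After the script runs successfully, two new CSV files appear in the images folder. Their contents look like this:

[Screenshot: contents of the generated CSV file]

4. Generate TFRecord files from the CSV files. This step tormented me; it kept throwing errors.

The script is as follows: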

"""
Usage:
  # From tensorflow/models/
  # Create train data:
  python generate_tfrecord.py --csv_input=images/train_labels.csv --image_dir=images/train --output_path=train.record


  python generate_tfrecord.py --csv_input=images/test_labels.csv  --image_dir=images/test --output_path=test.record
"""
from __future__ import division
from __future__ import print_function
from __future__ import absolute_import

import os
import io
import pandas as pd
import tensorflow as tf

from PIL import Image
from object_detection.utils import dataset_util
from collections import namedtuple, OrderedDict

flags = tf.app.flags
flags.DEFINE_string('csv_input', '', 'Path to the CSV input')
flags.DEFINE_string('image_dir', '', 'Path to the image directory')
flags.DEFINE_string('output_path', '', 'Path to output TFRecord')
FLAGS = flags.FLAGS

def class_text_to_int(row_label):
    if row_label == 'wdj':  # Edit this for your own labels: add one branch per class
        return 1
    else:
        return None


def split(df, group):
    data = namedtuple('data', ['filename', 'object'])
    gb = df.groupby(group)
    return [data(filename, gb.get_group(x)) for filename, x in zip(gb.groups.keys(), gb.groups)]


def create_tf_example(group, path):
    with tf.gfile.GFile(os.path.join(path, '{}'.format(group.filename)), 'rb') as fid:
        encoded_jpg = fid.read()
    encoded_jpg_io = io.BytesIO(encoded_jpg)
    image = Image.open(encoded_jpg_io)
    width, height = image.size

    filename = group.filename.encode('utf8')
    image_format = b'jpg'
    xmins = []
    xmaxs = []
    ymins = []
    ymaxs = []
    classes_text = []
    classes = []

    for index, row in group.object.iterrows():
        xmins.append(row['xmin'] / width)
        xmaxs.append(row['xmax'] / width)
        ymins.append(row['ymin'] / height)
        ymaxs.append(row['ymax'] / height)
        classes_text.append(row['class'].encode('utf8'))
        classes.append(class_text_to_int(row['class']))

    tf_example = tf.train.Example(features=tf.train.Features(feature={
        'image/height': dataset_util.int64_feature(height),
        'image/width': dataset_util.int64_feature(width),
        'image/filename': dataset_util.bytes_feature(filename),
        'image/source_id': dataset_util.bytes_feature(filename),
        'image/encoded': dataset_util.bytes_feature(encoded_jpg),
        'image/format': dataset_util.bytes_feature(image_format),
        'image/object/bbox/xmin': dataset_util.float_list_feature(xmins),
        'image/object/bbox/xmax': dataset_util.float_list_feature(xmaxs),
        'image/object/bbox/ymin': dataset_util.float_list_feature(ymins),
        'image/object/bbox/ymax': dataset_util.float_list_feature(ymaxs),
        'image/object/class/text': dataset_util.bytes_list_feature(classes_text),
        'image/object/class/label': dataset_util.int64_list_feature(classes),
    }))
    return tf_example
def main(_):
    writer = tf.python_io.TFRecordWriter(FLAGS.output_path)
    path = os.path.join(os.getcwd(), FLAGS.image_dir)
    print(path)
    examples = pd.read_csv(FLAGS.csv_input)
    grouped = split(examples, 'filename')
    for group in grouped:
        tf_example = create_tf_example(group, path)
        writer.write(tf_example.SerializeToString())
    writer.close()
    output_path = os.path.join(os.getcwd(), FLAGS.output_path)
    print('Successfully created the TFRecords: {}'.format(output_path))


if __name__ == '__main__':
    tf.app.run()

Put this script in the object_detection directory and save it as generate_tfrecord.py. In cmd, change to the object_detection directory and run the following command:

python generate_tfrecord.py --csv_input=images/train_labels.csv --image_dir=images/train --output_path=train.record

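The first error I got:

UnicodeEncodeError: 'utf-8' codec can't encode character '\udcd5' in position 84: surrogates not allowed

According to a reference document, this was probably a forward-slash versus backslash problem, so I changed all the relative paths to absolute paths as shown above. It still failed:

[Screenshot: file-not-found error]

The error said the file could not be found, but I checked again and again that the file existed and the path was correct! After countless breakdowns, it occurred to me that my file name was 90; did it need the extension, as in 90.jpg? So I added the extension to every entry in the filename column of the CSV file, and it actually worked! When LabelImg generated the XML, the filename tag did not include the extension, and that cost me dearly! Since LabelImg would not add it, I tweaked the XML-to-CSV conversion instead. I changed this line: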

value = (root.find('filename').text,

to:

value = (root.find('filename').text+'.jpg',
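Appending '.jpg' unconditionally would produce names such as 90.jpg.jpg if an XML file ever does contain the extension. A slightly more defensive sketch, with two hypothetical helpers: ensure_extension adds the suffix only when it is missing, and missing_images lists the CSV names that have no matching file, which makes the "file not found" cause visible before the TFRecord step:

```python
import os


def ensure_extension(filename, ext='.jpg'):
    """Append ext only when the name has no extension yet.

    Note: any dot-suffix counts as an extension, so 'photo.v2' is left as-is.
    """
    _, current = os.path.splitext(filename)
    return filename if current else filename + ext


def missing_images(filenames, image_dir):
    """Return the names that do not exist as files in image_dir."""
    return [name for name in filenames
            if not os.path.isfile(os.path.join(image_dir, name))]
```

In the conversion script this would replace the changed line with ensure_extension(root.find('filename').text), and missing_images could be called on the filename column of each CSV with the matching images\train or images\test directory.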

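After regenerating the CSV files, I ran the TFRecord conversion command again, and it succeeded! Another important point: the class definition in the script must be edited to match your own labels, otherwise it will also fail. In this example there is only one label, wdj.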

def class_text_to_int(row_label):
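With more than one class, a chain of if branches gets hard to maintain. A minimal sketch of a table-driven variant; LABEL_TO_ID is a hypothetical name, and its ids are assumed to match the ids used in the label map file (wdj is 1 in this project):

```python
# Hypothetical label table; extend it with one entry per class in your dataset.
LABEL_TO_ID = {'wdj': 1}


def class_text_to_int(row_label):
    """Map a label string to its integer id, failing loudly on unknown labels."""
    try:
        return LABEL_TO_ID[row_label]
    except KeyError:
        raise ValueError('Unknown label: {!r}'.format(row_label))
```

Raising an error here points directly at a mislabeled row, instead of letting a None value fail later inside the TFRecord feature helpers.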

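One tiny problem can be enough to drive a person to a breakdown...

The long march had covered one kilometer, and it felt like a century.

With that ordeal over, on to training.

5. Download the model we need from here: https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/detection_model_zoo.md

I chose the first one, ssd_mobilenet_v1_coco. The download was quick, about a minute with Xunlei. I put the downloaded model in the object_detection directory as ssd_mobilenet_v1_coco_2018_01_28.

6. Create a training folder under object_detection, then copy the config file for the model above, ssd_mobilenet_v1_coco.config, from C:\Users\jack\models\research\object_detection\samples\configs into the training folder and edit it as needed. The file contents are as follows: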

# SSD with Mobilenet v1 configuration for MSCOCO Dataset.
# Users should configure the fine_tune_checkpoint field in the train config as
# well as the label_map_path and input_path fields in the train_input_reader and
# eval_input_reader. Search for "PATH_TO_BE_CONFIGURED" to find the fields that
# should be configured.

model {
  ssd {
    num_classes: 1  # Required change 1: number of classes; only one here
    box_coder {
      faster_rcnn_box_coder {
        y_scale: 10.0
        x_scale: 10.0
        height_scale: 5.0
        width_scale: 5.0
      }
    }
    matcher {
      argmax_matcher {
        matched_threshold: 0.5
        unmatched_threshold: 0.5
        ignore_thresholds: false
        negatives_lower_than_unmatched: true
        force_match_for_each_row: true
      }
    }
    similarity_calculator {
      iou_similarity {
      }
    }
    anchor_generator {
      ssd_anchor_generator {
        num_layers: 6
        min_scale: 0.2
        max_scale: 0.95
        aspect_ratios: 1.0
        aspect_ratios: 2.0
        aspect_ratios: 0.5
        aspect_ratios: 3.0
        aspect_ratios: 0.3333
      }
    }
    image_resizer {
      fixed_shape_resizer {
        height: 300
        width: 300
      }
    }
    box_predictor {
      convolutional_box_predictor {
        min_depth: 0
        max_depth: 0
        num_layers_before_predictor: 0
        use_dropout: false
        dropout_keep_probability: 0.8
        kernel_size: 1
        box_code_size: 4
        apply_sigmoid_to_scores: false
        conv_hyperparams {
          activation: RELU_6,
          regularizer {
            l2_regularizer {
              weight: 0.00004
            }
          }
          initializer {
            truncated_normal_initializer {
              stddev: 0.03
              mean: 0.0
            }
          }
          batch_norm {
            train: true,
            scale: true,
            center: true,
            decay: 0.9997,
            epsilon: 0.001,
          }
        }
      }
    }
    feature_extractor {
      type: 'ssd_mobilenet_v1'
      min_depth: 16
      depth_multiplier: 1.0
      conv_hyperparams {
        activation: RELU_6,
        regularizer {
          l2_regularizer {
            weight: 0.00004
          }
        }
        initializer {
          truncated_normal_initializer {
            stddev: 0.03
            mean: 0.0
          }
        }
        batch_norm {
          train: true,
          scale: true,
          center: true,
          decay: 0.9997,
          epsilon: 0.001,
        }
      }
    }
    loss {
      classification_loss {
        weighted_sigmoid {
        }
      }
      localization_loss {
        weighted_smooth_l1 {
        }
      }
      hard_example_miner {
        num_hard_examples: 3000
        iou_threshold: 0.99
        loss_type: CLASSIFICATION
        max_negatives_per_positive: 3
        min_negatives_per_image: 0
      }
      classification_weight: 1.0
      localization_weight: 1.0
    }
    normalize_loss_by_num_matches: true
    post_processing {
      batch_non_max_suppression {
        score_threshold: 1e-8
        iou_threshold: 0.6
        max_detections_per_class: 100
        max_total_detections: 100
      }
      score_converter: SIGMOID
    }
  }
}

train_config: {
  batch_size: 1  # Optional change 1: batch size; with the CPU build of TensorFlow, use a smaller value
  optimizer {
    rms_prop_optimizer: {
      learning_rate: {
        exponential_decay_learning_rate {
          initial_learning_rate: 0.004
          decay_steps: 800720
          decay_factor: 0.95
        }
      }
      momentum_optimizer_value: 0.9
      decay: 0.9
      epsilon: 1.0
    }
  }
 
  fine_tune_checkpoint: "C:\Users\jack\models\research\object_detection\ssd_mobilenet_v1_coco_2018_01_28\model.ckpt"  # Optional change: whether to fine-tune; points at the downloaded model checkpoint
  from_detection_checkpoint: true  # Optional change: whether to fine-tune from a detection checkpoint

  # Note: The below line limits the training process to 200K steps, which we
  # empirically found to be sufficient enough to train the pets dataset. This
  # effectively bypasses the learning rate schedule (the learning rate will
  # never decay). Remove the below line to train indefinitely.
  num_steps: 20000  # Optional change: number of training steps; lower it if time is short, at the cost of accuracy
  data_augmentation_options {
    random_horizontal_flip {
    }
  }
  data_augmentation_options {
    ssd_random_crop {
    }
  }
}

train_input_reader: {
  tf_record_input_reader {
    input_path: "C:\Users\jack\models\research\object_detection\train.record"
    # Required change: path to the training set
  }
  label_map_path: "C:\Users\jack\models\research\object_detection\trainning\wdj.pbtxt"
  # Required change: path to the label map file; how to generate it is covered later
}

eval_config: {
  num_examples: 8000  # Required change: number of samples in the test set
  # Note: The below line limits the evaluation process to 10 evaluations.
  # Remove the below line to evaluate indefinitely.
  max_evals: 10
}

eval_input_reader: {
  tf_record_input_reader {
    input_path: "C:\Users\jack\models\research\object_detection\test.record"  # Required change: path to the test set
  }
  label_map_path: "C:\Users\jack\models\research\object_detection\trainning\wdj.pbtxt"
  # Required change: path to the label map file; how to generate it is covered later
  shuffle: false
  num_readers: 1
}
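The comment on num_examples asks for the number of samples in the test set. A minimal sketch for deriving that number from the test_labels.csv file produced in step 3, counting distinct file names so that images with several boxes are not counted twice (count_images is a hypothetical helper name):

```python
import csv


def count_images(csv_path):
    """Count distinct values in the 'filename' column of a labels CSV."""
    with open(csv_path, newline='') as f:
        return len({row['filename'] for row in csv.DictReader(f)})
```

The result for images\test_labels.csv is the value that would go into num_examples.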

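There are several required and optional changes in total, all marked in the comments.

7. Generate the label map file, wdj.pbtxt. Create any text file and paste in: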

item {
  id: 1
  name: 'wdj'
}

Save it at the location specified in the config file, here C:\Users\jack\models\research\object_detection\trainning\wdj.pbtxt
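For a single class, typing the file by hand is enough. With several classes, the text could be generated instead; a minimal sketch, assuming ids start at 1 and follow the order of the list (label_map_text is a hypothetical helper name, and the ids must agree with class_text_to_int in generate_tfrecord.py):

```python
def label_map_text(labels):
    """Build label-map text with ids starting at 1, in the order given."""
    items = []
    for index, name in enumerate(labels, start=1):
        items.append("item {\n  id: %d\n  name: '%s'\n}\n" % (index, name))
    return '\n'.join(items)
```

Writing label_map_text(['wdj']) to wdj.pbtxt reproduces the single item shown above.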

8. Finally, time to start training! But I knew it would not be that simple.

C:\Users\jack\models\research\object_detection>python model_main.py --logtostderr --model_dir=trainning/ --pipeline_config_path=trainning/ssd_mobilenet_v1_coco.config

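Sure enough, it failed: (1) "error: No module named pycocotools". The COCO API originally had no Windows version, but thanks to the community there is an open-source fork on GitHub that can be built on Windows. To install it, open a new command window, cd to the models folder, and run: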

pip install git+https://github.com/philferriere/cocoapi.git#subdirectory=PythonAPI

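Then copy the compiled pycocotools folder into the research folder of the TensorFlow Object Detection API, and you are done. For the build to succeed, however, the machine must have Visual C++ 14.0; if it does not, install the Visual C++ 2015 Build Tools. This part follows https://github.com/philferriere/cocoapi/blob/master/README.md

That method still failed for me. In the end, following another article, I used a second method in addition to the command above: download the source from https://github.com/philferriere/cocoapi and unzip it. Open a CMD terminal as administrator and change to the *\cocoapi-master\PythonAPI directory. Run the following command: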

python setup.py build_ext install

If the command produces one of the following errors:

error: Microsoft Visual C++ 14.0 is required.

// or

error: Unable to find vcvarsall.bat

Solution: this installation method compiles the COCO source with Microsoft Visual C++ 14.0. If Visual C++ 14.0 is not available locally, or the installed version is older than 14.0, install Microsoft Visual Studio 2015 or later. Download: https://visualstudio.microsoft.com/zh-hans/visual-cpp-build-tools/ Select only the C++ option; nothing else is needed. This method finally worked.

9. Start training again:

C:\Users\jack\models\research\object_detection>python model_main.py --logtostderr --model_dir=training/ --pipeline_config_path=training/ssd_mobilenet_v1_coco.config

It complained that a file could not be opened... On closer inspection, my folder was actually named trainning...

C:\Users\jack\models\research\object_detection>python model_main.py --logtostderr --model_dir=trainning/ --pipeline_config_path=trainning/ssd_mobilenet_v1_coco.config

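This is meticulous work; there is no room for carelessness.

And yet, it still failed...

[Screenshot: error message]

I searched Baidu, and it did not look like a common error. Reading the message carefully, it seemed that something in the path names was wrong.

Following a reference article, I changed every path in the config file to use double backslashes, for example: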

C:\\Users\\jack\\models\\research\\object_detection\\train.record
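A likely explanation is that the config file is parsed as protobuf text format, where a backslash inside a quoted string starts an escape sequence (so the \t in \train.record would be read as a tab). As a minimal sketch, two hypothetical helpers that produce config-safe forms of a Windows path; the forward-slash form is an untested alternative here, not something the article tried:

```python
from pathlib import PureWindowsPath


def escape_backslashes(path):
    """Double every backslash, matching the manual edit described above."""
    return path.replace('\\', '\\\\')


def to_forward_slashes(path):
    """Alternative form: the same Windows path written with forward slashes."""
    return PureWindowsPath(path).as_posix()
```

Either function can be applied to the raw path before pasting it into input_path, label_map_path, or fine_tune_checkpoint.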

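Ha, that error really did go away. But another one followed, which did not surprise me at all.

InvalidArgumentError: image_size must contain 3 elements[4]

I looked it up: the cause is that some of my images are not RGB. Finally, a common problem... Aren't color images always three-channel RGB? I give up! I found a script online that scans all the images; fortunately, only two of them were bad.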

from PIL import Image
import os

# Directory of images to check
path = 'C:\\Users\\jack\\models\\research\\object_detection\\images\\test\\'
for file in os.listdir(path):
    extension = file.split('.')[-1]
    if extension == 'jpg':
        fileLoc = path + file
        img = Image.open(fileLoc)
        # Report every image that is not three-channel RGB
        if img.mode != 'RGB':
            print(file + ', ' + img.mode)
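The scan above only reports the offending files. A minimal sketch of converting them in place with Pillow's Image.convert, assuming it is acceptable to overwrite the originals (keep a backup if not); convert_to_rgb is a hypothetical helper name:

```python
import os
from PIL import Image


def convert_to_rgb(image_dir):
    """Re-save every .jpg in image_dir that is not in RGB mode; return their names."""
    converted = []
    for name in sorted(os.listdir(image_dir)):
        if not name.lower().endswith('.jpg'):
            continue
        file_path = os.path.join(image_dir, name)
        with Image.open(file_path) as img:
            if img.mode == 'RGB':
                continue
            # convert() loads the pixel data, so the source file can be closed afterwards
            rgb = img.convert('RGB')
        rgb.save(file_path, 'JPEG')
        converted.append(name)
    return converted
```

It would be run once for images\train and once for images\test.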

After replacing the non-RGB images, or converting them to RGB, regenerate the CSV and TFRecord files and run the command again:

C:\Users\jack\models\research\object_detection>python model_main.py --logtostderr --model_dir=trainning/ --pipeline_config_path=trainning/ssd_mobilenet_v1_coco.config

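A few warnings popped up, but apparently no errors. It was finally running! GOOD JOB~

The joy did not last long; another error appeared:

[Screenshot: error message]

Fortunately, this error was covered in my reference article:

TypeError: can't pickle dict_values objects

The fix: go to D:\tensorflow1\models\research\object_detection, open model_lib.py, find the location marked in the figure below, and change category_index.values() to list(category_index.values()).

[Screenshot: the marked location in model_lib.py]

Training finally started running, though slowly, on my poor CPU-only machine.

[Screenshot: training log output]

10. At this point, TensorBoard can be used to monitor the run. Open another cmd window and change to the object_detection directory: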

tensorboard --logdir=trainning

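As prompted, enter the address in a browser:

http://jack-PC:6006

The TensorBoard interface appears:

[Screenshot: TensorBoard interface]

Why does my precision appear to be dropping? Oh well. My main goal is to get the pipeline running end to end; precision can wait.

Training can be interrupted at any time to check how the model is doing so far.

The trainning directory now contains many new files, which are written at each checkpoint.

11. Export the model:

The object_detection directory contains a file named export_inference_graph.py. Create a new folder named wdj_detection to hold the exported model.

Enter the following command in the cmd window: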

models\research\object_detection>python export_inference_graph.py --input_type image_tensor --pipeline_config_path trainning/ssd_mobilenet_v1_coco.config --trained_checkpoint_prefix trainning/model.ckpt-3582 --output_directory wdj_detection

Here trainning/ssd_mobilenet_v1_coco.config is the config file used for training, trainning/model.ckpt-3582 is a checkpoint chosen from the trainning directory, and wdj_detection is the folder the model is exported to.
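The step number in the checkpoint prefix (3582 here) changes from run to run. A minimal sketch for finding the most recent one, assuming each checkpoint is saved with a companion file named model.ckpt-N.index; latest_checkpoint_prefix is a hypothetical helper name:

```python
import os
import re


def latest_checkpoint_prefix(train_dir):
    """Return 'model.ckpt-N' for the largest step N found in train_dir, or None."""
    pattern = re.compile(r'^model\.ckpt-(\d+)\.index$')
    steps = []
    for name in os.listdir(train_dir):
        match = pattern.match(name)
        if match:
            steps.append(int(match.group(1)))
    if not steps:
        return None
    return 'model.ckpt-%d' % max(steps)
```

The returned name, prefixed with trainning/, is what would be passed to --trained_checkpoint_prefix.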

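This time the model exported smoothly, with no errors. Hooray~~~

12. Check how well the model generalizes. The model is ready, so how does it handle new images? Download a few pictures into the test_images directory; some of them can include Wu Dajing. Open object_detection_tutorial.ipynb in the object_detection directory with Jupyter and modify it as follows: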

MODEL_NAME = 'ssd_mobilenet_v1_coco_2017_11_17'

Change this to the name of your own model:

MODEL_NAME = 'wdj_detection'

The next two lines can be deleted, since the model no longer needs to be downloaded:

MODEL_FILE = MODEL_NAME + '.tar.gz'
DOWNLOAD_BASE = 'http://download.tensorflow.org/models/object_detection/'

No change is needed here:

PATH_TO_FROZEN_GRAPH = MODEL_NAME + '/frozen_inference_graph.pb'

# List of the strings that is used to add correct label for each box.

Change this to your own pbtxt file:

PATH_TO_LABELS = os.path.join('trainning', 'wdj.pbtxt')

Delete the entire download block:

opener = urllib.request.URLopener()
opener.retrieve(DOWNLOAD_BASE + MODEL_FILE, MODEL_FILE)
tar_file = tarfile.open(MODEL_FILE)
for file in tar_file.getmembers():
  file_name = os.path.basename(file.name)
  if 'frozen_inference_graph.pb' in file_name:
    tar_file.extract(file, os.getcwd())

The detection block needs changes:

PATH_TO_TEST_IMAGES_DIR = 'test_images'
#TEST_IMAGE_PATHS = [ os.path.join(PATH_TO_TEST_IMAGES_DIR, 'image{}.jpg'.format(i)) for i in range(1, 3) ]
TEST_IMAGE_PATHS = os.listdir('C:\\Users\\jack\\models\\research\\object_detection\\test_images')
IMAGE_SIZE = (12, 8)

Finally, adjust how each image is opened:

for image_path in TEST_IMAGE_PATHS:
  print(image_path)
  image = Image.open('C:\\Users\\jack\\models\\research\\object_detection\\test_images\\'+image_path)
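Because os.listdir returns bare file names, the directory has to be prepended again when each image is opened, and any non-image file in the folder would make Image.open fail. A minimal sketch that builds full paths once and keeps only image files (list_test_images is a hypothetical helper, and the extension list is an assumption):

```python
import os


def list_test_images(image_dir, extensions=('.jpg', '.jpeg', '.png')):
    """Return sorted full paths of the image files in image_dir."""
    return [os.path.join(image_dir, name)
            for name in sorted(os.listdir(image_dir))
            if name.lower().endswith(extensions)]
```

With TEST_IMAGE_PATHS = list_test_images(PATH_TO_TEST_IMAGES_DIR), the loop could call Image.open(image_path) directly.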

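Then click Cell at the top and choose Run All to run the whole notebook. That completes the entire process!

[Screenshot: detection result]

Because training was short and the sample set was small, the results are poor. I hope to improve them later. Still, it all worked out in the end.

Follow-up: to raise the detection success rate, I made two optimizations:
1. Increased the number of samples from 50 to 100.
2. Enabled fine-tuning in the config file. These two lines were previously commented out; I have now uncommented them: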

fine_tune_checkpoint: "C:\Users\jack\models\research\object_detection\ssd_mobilenet_v1_coco_2018_01_28\model.ckpt"          
from_detection_checkpoint: true

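Retrain for 20,000 steps; before doing so, delete the files in the trainning directory.

The results are below, a clear improvement:

[Screenshot: detection result after the improvements]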