CUHK Multimedia Lab | Open-Source Video Object Detection & Tracking Platform (with source code download)

Original post by wx62d966d625404, 2022-10-18 19:30:33


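© Copyright belongs to the author. This is an original work by 51CTO blog author wx62d966d625404. Please contact the author for permission to repost; unauthorized reposting may result in legal action.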


Computer Vision Research Institute column

Author: Edison_G


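Since last year (2020), most people who talk about object detection have probably heard of the MMDetection framework. Today's framework is also a contribution from the CUHK lab. We first cover MMDetection and then introduce the all-in-one video perception platform MMTracking in detail.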


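MMDetection V1.0 has been popular with users since its release. Users have offered many valuable suggestions, and many developers have contributed code. MMDetection V2.0 was released on May 6, 2020.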


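By refactoring and optimizing each component of the models, V2.0 improves the speed and accuracy of MMDetection across the board, reaching the best level among existing detection frameworks. A finer-grained modular design makes MMDetection much easier to extend to new tasks, so it has become a base platform for detection-related projects. The documentation and tutorials were also improved for a better user experience.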


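MMDetection implements networks and frameworks such as RPN, Fast R-CNN, Faster R-CNN, and Mask R-CNN. First, a brief comparison with Detectron:

Slightly higher performance
Slightly faster training
Slightly lower GPU memory usage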

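More importantly, PyTorch-based code is a generation ahead of Caffe2-based code in ease of use. In the time it takes to install Detectron successfully, you could probably install mmdetection a dozen times.

Of course, Detectron has clear advantages too: it was the first comprehensive detection codebase, it carries the FAIR brand, and its released models are fairly complete. The researchers are working to expand the model zoo, but there is still a large gap in manpower and compute, so this will take time.

Now for the three points above in detail. First, performance. The ResNet in the official PyTorch model zoo differs slightly from the ResNet used by Detectron (in mmdetection this can be selected with the backbone's style parameter), which leads to different convergence speeds, so experiments were run with both structures. In general, the Detectron structure scores higher under the 1x lr schedule, while the PyTorch structure scores higher under the 2x schedule.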
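As a rough illustration of the style switch mentioned above, the backbone section of an MMDetection config is a plain Python dict. The sketch below assumes typical ResNet-50 settings; the field values other than `style` are illustrative, and `with_style` is a hypothetical helper written for this example, not part of MMDetection.

```python
import copy

# Illustrative backbone config in the dict style used by MMDetection configs.
# Field values other than `style` are assumptions for this sketch.
backbone = dict(
    type='ResNet',
    depth=50,
    num_stages=4,
    out_indices=(0, 1, 2, 3),
    frozen_stages=1,
    style='pytorch',  # stride-2 sits on the 3x3 conv of each bottleneck
)


def with_style(cfg, style):
    """Hypothetical helper: return a copy of `cfg` with a different ResNet style."""
    if style not in ('pytorch', 'caffe'):
        raise ValueError(f'unknown style: {style}')
    new_cfg = copy.deepcopy(cfg)
    new_cfg['style'] = style
    return new_cfg


# 'caffe' moves the stride-2 to the first 1x1 conv of each bottleneck.
caffe_backbone = with_style(backbone, 'caffe')
print(caffe_backbone['style'])
```

Keeping the two variants as separate config copies makes it easy to run the 1x and 2x schedules on both structures and compare them side by side.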


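In terms of speed, the gap is large for Mask R-CNN and small for the other models. With the same settings, Detectron takes 0.89 s per iteration, while mmdetection takes only 0.69 s. Fast R-CNN is the exception: it is slightly slower than in Detectron. In addition, Detectron ran about 20% slower on the researchers' own servers than the officially reported speed, presumably because FB's Big Basin servers outperform the researchers' hardware.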
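To put the two Mask R-CNN timings in perspective, the short calculation below converts them into a speedup factor and a relative time saving. It uses only the 0.89 s and 0.69 s figures quoted above; the variable names are chosen for this sketch.

```python
# Per-iteration times for Mask R-CNN quoted in the text (seconds).
detectron_s = 0.89
mmdet_s = 0.69

speedup = detectron_s / mmdet_s                     # how many times faster
time_saved = (detectron_s - mmdet_s) / detectron_s  # fraction of time saved

print(f'speedup: {speedup:.2f}x, time saved per iteration: {time_saved:.1%}')
```

So the quoted numbers correspond to roughly a 1.29x speedup, or about 22% less time per iteration.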


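The GPU memory advantage is more pronounced: usage is about 30% lower. This is partly due to the framework, though, and is not entirely the result of codebase optimization. One result that surprised the researchers is that the current codebase, running Mask R-CNN with ResNet-50, fits 4 images on each 12 GB GPU, which is considerably less memory than it needed when the researchers used it in competition.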


MMTracking

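MMDetection is an open-source, PyTorch-based deep learning toolbox for object detection from SenseTime (winner of the 2018 COCO object detection challenge) and CUHK.

In the new year, 2021, OpenMMLab from the CUHK Multimedia Lab (MMLab) contributed another platform: MMTracking, an all-in-one video object perception platform. The framework is written in PyTorch, supports single object tracking, multiple object tracking, and video object detection, and is now open source. Let's look at it in detail.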


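Key features:

The first unified video perception platform

MMTracking is the first open-source toolbox that unifies versatile video perception tasks, including video object detection, single object tracking, and multiple object tracking.

Modular design

MMTracking decomposes the video perception framework into different components, so a customized method can be built easily by combining different modules.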
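The idea of building a method by combining modules can be sketched with a tiny registry that constructs components from nested config dicts. This is a simplified illustration of the pattern only: every name in it (`REGISTRY`, `register`, `build`, `ToyDetector`, `ToyTracker`) is hypothetical and is not MMTracking's actual API.

```python
# A minimal registry sketch. All names here are hypothetical.
REGISTRY = {}


def register(cls):
    """Class decorator that makes a component available by its class name."""
    REGISTRY[cls.__name__] = cls
    return cls


def build(cfg):
    """Build a component from a config dict whose 'type' names a registered class."""
    args = dict(cfg)
    cls = REGISTRY[args.pop('type')]
    return cls(**args)


@register
class ToyDetector:
    def __init__(self, score_thr=0.5):
        self.score_thr = score_thr


@register
class ToyTracker:
    def __init__(self, detector, max_age=30):
        # The nested detector config is built here, so components stay swappable.
        self.detector = build(detector)
        self.max_age = max_age


tracker = build(dict(type='ToyTracker',
                     detector=dict(type='ToyDetector', score_thr=0.3)))
print(type(tracker.detector).__name__, tracker.detector.score_thr)
```

With this layout, swapping the detector means changing one `type` entry in the config rather than editing the tracker code.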

Simple, Fast and Strong

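Simple: MMTracking interoperates with other OpenMMLab projects. It is built on MMDetection, so any detector can be used just by modifying the config file.

Fast: All operations run on the GPU. Training and inference are reported to be faster than other implementations.

Strong: State-of-the-art models are reproduced, and some of them even outperform the official implementations.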

How to use:

1. Create a conda virtual environment and activate it.

conda create -n open-mmlab python=3.7
conda activate open-mmlab

2. Install PyTorch and torchvision following the official instructions, e.g.,

conda install pytorch torchvision -c pytorch

Note: Make sure that your compilation CUDA version and runtime CUDA version match. You can check the supported CUDA version for precompiled packages on the PyTorch website.

E.g. 1: If you have CUDA 10.1 installed under /usr/local/cuda and would like to install PyTorch 1.5, you need to install the prebuilt PyTorch with CUDA 10.1.

conda install pytorch cudatoolkit=10.1

E.g. 2: If you have CUDA 9.2 installed under /usr/local/cuda and would like to install PyTorch 1.3.1, you need to install the prebuilt PyTorch with CUDA 9.2.

conda install pytorch=1.3.1 cudatoolkit=9.2 torchvision=0.4.2

If you build PyTorch from source instead of installing the prebuilt package, you can use more CUDA versions, such as 9.0.
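One quick way to see which CUDA version an installed PyTorch was built with is to ask PyTorch itself. The sketch below is a small helper for that; `cuda_report` is a name made up for this example. It reports only the PyTorch side, so the locally installed toolkit (for example under /usr/local/cuda) still has to be checked separately.

```python
def cuda_report():
    """Return the CUDA version PyTorch was built with and whether a GPU is usable."""
    try:
        import torch
    except ImportError:
        # PyTorch is not installed in this environment.
        return {'torch': None, 'built_with_cuda': None, 'cuda_available': False}
    return {
        'torch': torch.__version__,
        'built_with_cuda': torch.version.cuda,  # None for CPU-only builds
        'cuda_available': torch.cuda.is_available(),
    }


print(cuda_report())
```

If `built_with_cuda` differs from the toolkit version used to compile extensions such as mmcv-full, that is the kind of mismatch the note above warns about.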

3. Install mmcv-full. We recommend installing the pre-built package as below.

pip install mmcv-full -f https://download.openmmlab.com/mmcv/dist/cu101/torch1.6.0/index.html

See the MMCV documentation for the MMCV versions compatible with different PyTorch and CUDA versions. Optionally, you can compile mmcv from source with the following commands:

git clone https://github.com/open-mmlab/mmcv.git
cd mmcv
MMCV_WITH_OPS=1 pip install -e .  # package mmcv-full will be installed after this step
cd ..

Or directly run

pip install mmcv-full
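The pre-built package index used in step 3 encodes the CUDA and PyTorch versions in its path (cu101 and torch1.6.0 in the command shown). The helper below is a sketch that rebuilds that URL from version strings; `mmcv_index_url` is a hypothetical name, and not every version combination necessarily has a published index.

```python
def mmcv_index_url(cuda_version, torch_version):
    """Build the find-links URL, e.g. ('10.1', '1.6.0') -> .../cu101/torch1.6.0/index.html."""
    cu_tag = 'cu' + cuda_version.replace('.', '')  # '10.1' -> 'cu101'
    return ('https://download.openmmlab.com/mmcv/dist/'
            f'{cu_tag}/torch{torch_version}/index.html')


url = mmcv_index_url('10.1', '1.6.0')
print(f'pip install mmcv-full -f {url}')
```

Deriving the URL from the versions you actually installed helps avoid picking a wheel built for a different CUDA or PyTorch release.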

4. Install MMDetection.

pip install mmdet

Optionally, you can also build MMDetection from source in case you want to modify the code:

git clone https://github.com/open-mmlab/mmdetection.git
cd mmdetection
pip install -r requirements/build.txt
pip install -v -e .  # or "python setup.py develop"

5. Clone the MMTracking repository.

git clone https://github.com/open-mmlab/mmtracking.git
cd mmtracking

6. Install build requirements and then install MMTracking.

pip install -r requirements/build.txt
pip install -v -e .  # or "python setup.py develop"

Testing with the platform:

This section will show how to test existing models on supported datasets. The following testing environments are supported:

single GPU
single node multiple GPU
multiple nodes

During testing, different tasks share the same API and we only support samples_per_gpu = 1.

You can use the following commands for testing:

# single-gpu testing
python tools/test.py ${CONFIG_FILE} [--checkpoint ${CHECKPOINT_FILE}] [--out ${RESULT_FILE}] [--eval ${EVAL_METRICS}]

# multi-gpu testing
./tools/dist_test.sh ${CONFIG_FILE} ${GPU_NUM} [--checkpoint ${CHECKPOINT_FILE}] [--out ${RESULT_FILE}] [--eval ${EVAL_METRICS}]

Optional arguments:

CHECKPOINT_FILE: Filename of the checkpoint. You do not need to pass it for some MOT methods; specify the checkpoints in the config instead.
RESULT_FILE: Filename of the output results in pickle format. If not specified, the results will not be saved to a file.
EVAL_METRICS: Items to be evaluated on the results. Allowed values depend on the dataset, e.g., bbox is available for ImageNet VID, track is available for LaSOT, and both bbox and track are suitable for MOT17.
--cfg-options: If specified, the given key-value pairs will be merged into the config file.
--eval-options: If specified, the given key-value pairs will be passed as kwargs to the dataset.evaluate() function. It is only used for evaluation.
--format-only: If specified, the results will be formatted to the official format.
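When tests are launched from a script, the command lines above can be assembled programmatically. The sketch below builds the argument list for either the single-GPU or the multi-GPU form. `build_test_command` and the file names are hypothetical, and passing several metrics as separate space-separated values after `--eval` is an assumption of this sketch.

```python
import shlex


def build_test_command(config, checkpoint=None, out=None, eval_metrics=None, gpus=1):
    """Assemble the tools/test.py (or dist_test.sh) command as an argument list."""
    if gpus > 1:
        cmd = ['./tools/dist_test.sh', config, str(gpus)]
    else:
        cmd = ['python', 'tools/test.py', config]
    if checkpoint:
        cmd += ['--checkpoint', checkpoint]
    if out:
        cmd += ['--out', out]
    if eval_metrics:
        cmd += ['--eval', *eval_metrics]
    return cmd


# Placeholder file names; replace them with a real config and checkpoint.
cmd = build_test_command('my_config.py', checkpoint='my_checkpoint.pth',
                         eval_metrics=['bbox'])
print(' '.join(shlex.quote(part) for part in cmd))
```

The returned list can be handed to `subprocess.run(cmd, check=True)` from the MMTracking repository root, which avoids shell-quoting problems with unusual file names.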


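The Computer Vision Research Institute works mainly in deep learning, focusing on face detection, face recognition, multi-object detection, object tracking, image segmentation, and related directions. The institute will keep sharing the latest papers, algorithms, and frameworks. What is different in this round of changes is that we will put the emphasis on "research". Going forward, we will share hands-on practice in each area, so that readers can experience real scenarios beyond theory and build the habit of coding and thinking for themselves!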
