CenterPoint object detection: FPN

Reposted

mob64ca14092155 2024-02-26 08:40:11

Tags: CenterPoint object detection, object detection, computer vision, deep learning, convolution. Categories: computer vision, artificial intelligence

1. FPN network structure

There are four approaches to detecting objects at different scales with a feature pyramid:

[Figure: the four feature-pyramid approaches, panels (a) to (d)]

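(a) Build a feature pyramid from an image pyramid; features are computed independently at each image scale.

(b) Use only single-scale features.

(c) Reuse the pyramidal feature hierarchy computed by the convolutional network as if it were a featurized image pyramid.

(d) The proposed Feature Pyramid Network (FPN), which is as fast as (b) and (c) but more accurate.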

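FPN structure: starting from the pyramid hierarchy inherent in a CNN, FPN builds a top-down pathway through skip connections. This produces a feature pyramid at little extra cost, and every scale of the pyramid carries high-level semantic features. Object detection is then performed on each level of the feature pyramid.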

[Figure: FPN structure]

FPN consists of two parts: (1) the bottom-up pathway and (2) the top-down pathway with lateral connections.

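Bottom-up pathway:

The backbone is divided into several stages, and each stage is defined as one pyramid level.

Output: within a stage, all layers produce feature maps of the same size. The output of the last layer is taken as the output of the stage, since the deepest layer of each stage should have the strongest features.

Downsampling: adjacent stages differ by a downsampling factor of 2.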

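Top-down pathway:

Motivation: high-level semantic information helps recognize objects but is poor for localizing them, whereas low-level spatial information is poor for recognition but helps localization.

Construction: the top-down pathway is built through skip (lateral) connections.

Note: before the top-down pathway starts, a 1×1 convolution is applied to the top level of the bottom-up pathway to produce the coarsest-resolution feature map.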

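Skip connection:

1. Upsample the coarser-resolution feature map coming from the top-down pathway by a factor of 2. For simplicity, nearest-neighbor upsampling is used.

2. Apply a 1×1 convolution to the corresponding feature map from the bottom-up pathway to reduce its number of channels.

3. Merge the two feature maps from the previous steps (which now have the same size and channel count) by element-wise addition.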
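The three steps above can be sketched in plain Python on nested lists. This is only an illustrative sketch, not the reference implementation: the function names (`conv1x1`, `upsample2x_nearest`, `merge`), the `[channel][row][column]` layout, and the bias-free 1×1 convolution are assumptions made for the example.

```python
def conv1x1(x, weight):
    """Bias-free 1x1 convolution. x is [C_in][H][W], weight is [C_out][C_in]."""
    c_in, h, w = len(x), len(x[0]), len(x[0][0])
    return [
        [[sum(wk[c] * x[c][i][j] for c in range(c_in)) for j in range(w)]
         for i in range(h)]
        for wk in weight
    ]


def upsample2x_nearest(x):
    """Nearest-neighbor 2x upsampling of a [C][H][W] feature map."""
    return [
        [[row[j // 2] for j in range(2 * len(row))] for row in ch for _ in range(2)]
        for ch in x
    ]


def merge(top_down, bottom_up, lateral_weight):
    """One top-down merge step: upsample, 1x1 lateral conv, element-wise add."""
    up = upsample2x_nearest(top_down)             # step 1
    lateral = conv1x1(bottom_up, lateral_weight)  # step 2
    return [                                      # step 3
        [[a + b for a, b in zip(row_up, row_lat)]
         for row_up, row_lat in zip(ch_up, ch_lat)]
        for ch_up, ch_lat in zip(up, lateral)
    ]


# Toy example: a 1-channel 1x1 top-down map and a 2-channel 2x2 bottom-up map.
top_down = [[[10]]]
bottom_up = [[[1, 2], [3, 4]],
             [[10, 20], [30, 40]]]
lateral_weight = [[1, 1]]  # reduces 2 channels to 1 by summing them
merged = merge(top_down, bottom_up, lateral_weight)
print(merged)  # [[[21, 32], [43, 54]]]
```

In a real network the lateral weights are learned and the same step is repeated from the top level down to the finest level.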

Nearest-neighbor interpolation:

[Figure: nearest-neighbor interpolation]
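As a small worked example, nearest-neighbor upsampling copies each source value into a block of `factor × factor` output pixels. The helper name `nearest_upsample` and the list-of-rows representation are assumptions chosen for illustration.

```python
def nearest_upsample(grid, factor=2):
    """Upsample a 2D grid (list of rows) by repeating every value factor times
    along both axes."""
    return [
        [row[j // factor] for j in range(len(row) * factor)]
        for row in grid
        for _ in range(factor)
    ]


grid = [[1, 2],
        [3, 4]]
for row in nearest_upsample(grid):
    print(row)
# [1, 1, 2, 2]
# [1, 1, 2, 2]
# [3, 3, 4, 4]
# [3, 3, 4, 4]
```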

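2. FPN-ResNet structure

The paper defines the outputs of the last four ResNet stages [C2, C3, C4, C5] (downsampled by 4, 8, 16 and 32 relative to the input) as four pyramid levels. The output of the first stage is not included in the FPN because of its large memory footprint.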
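To make the strides concrete, the sketch below computes the spatial size of each level for a given input size. The names `STRIDES` and `pyramid_sizes` are made up for this example, and rounding up with `ceil` is an assumption about how padded convolutions treat sizes that are not divisible by the stride.

```python
import math

# Downsampling factor of each pyramid level relative to the input image.
STRIDES = {"C2": 4, "C3": 8, "C4": 16, "C5": 32}


def pyramid_sizes(height, width):
    """Return the (height, width) of each level's feature map.

    Assumption: sizes are rounded up when not divisible by the stride.
    """
    return {
        name: (math.ceil(height / stride), math.ceil(width / stride))
        for name, stride in STRIDES.items()
    }


print(pyramid_sizes(224, 224))
# {'C2': (56, 56), 'C3': (28, 28), 'C4': (14, 14), 'C5': (7, 7)}
```

Each level halves the resolution of the previous one, which is why a single 2× upsampling step aligns neighboring levels in the top-down pathway.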

[Figure: FPN-ResNet structure]

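3. Application in Faster R-CNN

FPN applied to the RPN:

FPN outputs: [P2, P3, P4, P5, P6], where P6 is simply a stride-2 subsampling, introduced to cover the larger anchor scale 512×512.

RPN structure: one 3×3 convolution followed by two parallel 1×1 convolutions.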

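RPN input: the same RPN is run on each of the 5 pyramid levels.

Anchors: the 5 levels have 5 × 3 = 15 kinds of anchors in total.

Anchor scale: with FPN, the anchors on a given pyramid level no longer need to be multi-scale. Each level has anchors of a single scale; the anchor scales on [P2, P3, P4, P5, P6] are 32×32, 64×64, 128×128, 256×256 and 512×512 respectively.

Aspect ratio: every level has anchors with 3 aspect ratios (1:2, 1:1, 2:1).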
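The anchor layout can be enumerated with a few lines of Python. This is a sketch under stated assumptions: the per-level areas (32² to 512²) are the ones given in the FPN paper, the ratio is interpreted here as height / width, and the names `ANCHOR_AREAS`, `ASPECT_RATIOS` and `anchor_shapes` are illustrative only.

```python
import math

# One anchor scale (area in pixels) per pyramid level, as in the FPN paper.
ANCHOR_AREAS = {"P2": 32 ** 2, "P3": 64 ** 2, "P4": 128 ** 2,
                "P5": 256 ** 2, "P6": 512 ** 2}
# Assumption: each ratio means height / width.
ASPECT_RATIOS = (0.5, 1.0, 2.0)


def anchor_shapes(area, ratios=ASPECT_RATIOS):
    """Return (width, height) pairs that keep the area fixed for each ratio."""
    shapes = []
    for ratio in ratios:
        width = math.sqrt(area / ratio)
        height = width * ratio
        shapes.append((width, height))
    return shapes


ALL_ANCHORS = {level: anchor_shapes(area) for level, area in ANCHOR_AREAS.items()}
total = sum(len(shapes) for shapes in ALL_ANCHORS.values())
print(total)  # 15
```

Because the scale is fixed per level, the only variation within a level comes from the three aspect ratios.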


FPN for Fast R-CNN


An RoI of width w and height h on the input image should be assigned to level Pk of the feature pyramid:

$$ k = \left\lfloor k_0 + \log_2\left(\frac{\sqrt{wh}}{224}\right) \right\rfloor $$

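Here 224 is the canonical ImageNet pre-training size, and k0 is the target pyramid level to which an RoI of size 224×224 should be mapped. The Faster R-CNN in the original ResNet paper uses C4 as the input to the RPN, so this paper sets k0 to 4.

If the RoI is smaller than 224×224 (for example 112×112, exactly half of 224 on each side), it is mapped to a finer-resolution level with more pixels (here level 3).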
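The assignment rule can be checked with a short function. This is a minimal sketch: the name `roi_level` is made up for the example, and clamping the result to the range [2, 5] is an assumption (the rule itself does not bound k, but only levels P2 to P5 are available to the detection head here).

```python
import math


def roi_level(w, h, k0=4, canonical=224, k_min=2, k_max=5):
    """Pyramid level for an RoI of width w and height h (in input pixels).

    Implements k = floor(k0 + log2(sqrt(w * h) / canonical)).
    Assumption: the result is clamped to [k_min, k_max].
    """
    k = math.floor(k0 + math.log2(math.sqrt(w * h) / canonical))
    return max(k_min, min(k_max, k))


print(roi_level(224, 224))  # 4
print(roi_level(112, 112))  # 3
print(roi_level(448, 448))  # 5
```

Halving the side length of the RoI lowers the level by one, so smaller objects are pooled from higher-resolution feature maps.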

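In ResNet, conv5 serves as the head on top of the conv4 feature map, but in this paper conv5 has already been used to build the FPN. The paper therefore uses RoI pooling to extract 7×7 features, followed by two 1024-dimensional FC layers (each with ReLU), before the final classification layer and bounding-box regression layer. Compared with the standard conv5 head, this head has fewer parameters and is faster.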

4. Faster R-CNN + FPN detail diagram

[Figure: Faster R-CNN + FPN detail diagram]