cylinder object detection: object detection with CenterNet

Reposted

mob6454cc762e37 2024-04-18 19:22:28


CenterNet

Paper: Objects as Points
Link: https://paperswithcode.com/paper/objects-as-points

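Basic idea

Object detection usually represents each object in an image as an axis-aligned bounding box. Most current detectors enumerate nearly every possible object location in the image and classify each one, which is wasteful and requires additional post-processing. CenterNet, the detector proposed in the paper, instead models an object as a single point: the center of its bounding box. It uses keypoint estimation to find center points and then regresses all other object properties, such as size, 3D location, orientation, and even pose. CenterNet is end-to-end, fully differentiable, efficient, simple, and accurate. It performs well on 2D object detection and also extends to 3D detection and human pose estimation.

CenterNet turns object detection into a standard keypoint estimation problem. The image is first fed to a fully convolutional network that produces a heatmap, whose peaks correspond to object centers. The image features at each peak are then used to predict the object's width and height. The model is trained with standard dense supervised learning, and inference is a single forward pass with no NMS post-processing.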

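CenterNet resembles anchor-based one-stage detectors: a center point can be viewed as a single shape-agnostic anchor. The differences are:

CenterNet assigns its "anchor" based on location alone, not on box overlap;
Each object has only one positive "anchor", so NMS is not needed;
CenterNet uses a larger output resolution (output stride OS = 4).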

Formulation

Let $I \in \mathbb{R}^{W \times H \times 3}$ be an input image of width $W$ and height $H$. The goal is to predict a keypoint heatmap $\hat{Y} \in [0,1]^{\frac{W}{R} \times \frac{H}{R} \times C}$, where $R$ is the output stride that determines the size of the output feature map and $C$ is the number of keypoint types (or the number of object categories in object detection). The paper uses $R = 4$. In the heatmap, $\hat{Y}_{x,y,c} = 1$ corresponds to a detected keypoint and $\hat{Y}_{x,y,c} = 0$ to background.
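As a quick sanity check on the shapes in this formulation, the following minimal sketch computes the heatmap size from the input size. The function name `heatmap_shape` and the example values (a 512 x 512 input with 80 classes) are illustrative assumptions, not taken from the text above.

```python
def heatmap_shape(width, height, num_classes, stride=4):
    """Return (W/R, H/R, C), the shape of the keypoint heatmap."""
    # Assumption of this sketch: the input size is a multiple of the stride.
    if width % stride or height % stride:
        raise ValueError("width and height must be multiples of the stride")
    return (width // stride, height // stride, num_classes)


# Illustrative values only: a 512 x 512 image and 80 object classes.
shape = heatmap_shape(512, 512, 80)
```

With a stride of 4, each heatmap cell covers a 4 x 4 patch of input pixels.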

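Training objective

2D object detection

For each ground-truth keypoint $p \in \mathbb{R}^2$ of class $c$, the model computes a low-resolution equivalent $\tilde{p} = \lfloor \frac{p}{R} \rfloor$. All ground-truth keypoints are then splatted onto a heatmap $Y \in [0,1]^{\frac{W}{R} \times \frac{H}{R} \times C}$ using a Gaussian kernel $Y_{xyc} = \exp\left(-\frac{(x - \tilde{p}_x)^2 + (y - \tilde{p}_y)^2}{2\sigma_p^2}\right)$, where $\sigma_p$ is a standard deviation that adapts to the object size. If two Gaussians of the same class overlap, the element-wise maximum is taken. The training objective is the following focal loss:

$L_k = \frac{-1}{N} \sum_{xyc} \begin{cases} (1 - \hat{Y}_{xyc})^{\alpha} \log(\hat{Y}_{xyc}) & \text{if } Y_{xyc} = 1 \\ (1 - Y_{xyc})^{\beta} (\hat{Y}_{xyc})^{\alpha} \log(1 - \hat{Y}_{xyc}) & \text{otherwise} \end{cases}$

$\alpha$ and $\beta$ are hyperparameters of the focal loss; the paper uses $\alpha = 2$ and $\beta = 4$. $N$ is the number of keypoints in image $I$, and dividing by $N$ normalizes all positive loss terms.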
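The ground-truth heatmap and the focal loss described above can be sketched in plain Python for a single class. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, a fixed `sigma` replaces the object-size-adaptive standard deviation, and the `eps` clamp and the guard against zero keypoints are added assumptions.

```python
import math


def make_heatmap(keypoints, width, height, stride=4, sigma=1.0):
    """Splat full-resolution keypoints of one class onto a low-resolution heatmap."""
    out_w, out_h = width // stride, height // stride
    heatmap = [[0.0] * out_w for _ in range(out_h)]
    for px, py in keypoints:
        cx, cy = int(px // stride), int(py // stride)  # floor(p / R)
        for y in range(out_h):
            for x in range(out_w):
                value = math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * sigma ** 2))
                # Overlapping Gaussians keep the element-wise maximum.
                heatmap[y][x] = max(heatmap[y][x], value)
    return heatmap


def keypoint_focal_loss(pred, target, alpha=2.0, beta=4.0, eps=1e-12):
    """Focal loss over 2D heatmaps, normalized by the number of keypoints."""
    num_pos, total = 0, 0.0
    for pred_row, target_row in zip(pred, target):
        for y_hat, y in zip(pred_row, target_row):
            y_hat = min(max(y_hat, eps), 1.0 - eps)  # assumed clamp, keeps log() finite
            if y == 1.0:
                num_pos += 1
                total += (1.0 - y_hat) ** alpha * math.log(y_hat)
            else:
                # (1 - y)^beta down-weights negatives that lie close to a keypoint.
                total += (1.0 - y) ** beta * y_hat ** alpha * math.log(1.0 - y_hat)
    return -total / max(num_pos, 1)  # assumed guard for images without keypoints


# Two keypoints of the same class in a 32 x 32 image (made-up coordinates).
target = make_heatmap([(17.0, 9.0), (22.0, 9.0)], width=32, height=32)
```

The two keypoints land in cells (4, 2) and (5, 2) of the 8 x 8 heatmap. Each of those cells has value 1, and the cells around them hold the larger of the two Gaussian values.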

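To recover the discretization error introduced by the output stride, the model additionally predicts a local offset $\hat{O} \in \mathbb{R}^{\frac{W}{R} \times \frac{H}{R} \times 2}$ for each center point. All classes $c$ share the same offset prediction, which is trained with the following L1 loss:

$L_{off} = \frac{1}{N} \sum_{p} \left| \hat{O}_{\tilde{p}} - \left( \frac{p}{R} - \tilde{p} \right) \right|$

Here $\tilde{p} = \lfloor \frac{p}{R} \rfloor$ is the low-resolution keypoint. The supervision acts only at the keypoint locations $\tilde{p}$; all other locations are ignored.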
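A small worked example may make the offset target concrete. In the sketch below, a keypoint at pixel (17, 10) with stride 4 falls into cell (4, 2), and the offset the network should predict there is (0.25, 0.5). The function names and the dictionary standing in for the offset map are assumptions made for illustration.

```python
import math


def offset_target(p, stride=4):
    """Return the cell floor(p / R) and the sub-cell offset p / R - floor(p / R)."""
    cell = (math.floor(p[0] / stride), math.floor(p[1] / stride))
    offset = (p[0] / stride - cell[0], p[1] / stride - cell[1])
    return cell, offset


def offset_l1_loss(pred_offsets, keypoints, stride=4):
    """Mean L1 error of the offsets, evaluated only at keypoint cells.

    pred_offsets maps a cell (x, y) to the predicted (dx, dy); this dict is a
    stand-in for the dense offset map and is an assumption of this sketch.
    """
    total = 0.0
    for p in keypoints:
        cell, (ox, oy) = offset_target(p, stride)
        dx, dy = pred_offsets[cell]
        total += abs(dx - ox) + abs(dy - oy)
    return total / max(len(keypoints), 1)


cell, offset = offset_target((17.0, 10.0))
loss = offset_l1_loss({(4, 2): (0.25, 0.0)}, [(17.0, 10.0)])
```

Only cells that contain a keypoint are looked up, which mirrors the statement that the supervision applies at keypoint locations alone.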

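Let $(x_1^{(k)}, y_1^{(k)}, x_2^{(k)}, y_2^{(k)})$ be the bounding box of object $k$ with class $c_k$. Its center point is $p_k = \left( \frac{x_1^{(k)} + x_2^{(k)}}{2}, \frac{y_1^{(k)} + y_2^{(k)}}{2} \right)$. The keypoint estimator $\hat{Y}$ predicts all center points. In addition, the size $s_k = (x_2^{(k)} - x_1^{(k)}, y_2^{(k)} - y_1^{(k)})$ of each object $k$ is regressed with a single size prediction $\hat{S} \in \mathbb{R}^{\frac{W}{R} \times \frac{H}{R} \times 2}$ shared by all classes, trained with an L1 loss:

$L_{size} = \frac{1}{N} \sum_{k=1}^{N} \left| \hat{S}_{p_k} - s_k \right|$

The overall training objective is a weighted sum of the three losses (the paper sets $\lambda_{size} = 0.1$ and $\lambda_{off} = 1$):

$L_{det} = L_k + \lambda_{size} L_{size} + \lambda_{off} L_{off}$

A single network predicts the keypoints $\hat{Y}$, the offset $\hat{O}$, and the size $\hat{S}$, for a total of $C + 4$ outputs at each location.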
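The size loss and the weighted sum of the three losses can be sketched as follows. The function names are hypothetical, and the default weights (0.1 for size, 1 for offset) are the values reported in the Objects as Points paper; treat them as assumptions if your setup differs.

```python
def size_l1_loss(pred_sizes, boxes):
    """Mean L1 error between predicted (w, h) and the box size (x2 - x1, y2 - y1)."""
    total = 0.0
    for (pw, ph), (x1, y1, x2, y2) in zip(pred_sizes, boxes):
        total += abs(pw - (x2 - x1)) + abs(ph - (y2 - y1))
    return total / max(len(boxes), 1)


def detection_loss(l_k, l_size, l_off, lambda_size=0.1, lambda_off=1.0):
    """Weighted sum L_det = L_k + lambda_size * L_size + lambda_off * L_off."""
    return l_k + lambda_size * l_size + lambda_off * l_off


# One box of size 12 x 20 whose width is predicted as 10 (made-up numbers).
l_size = size_l1_loss([(10.0, 20.0)], [(0.0, 0.0, 12.0, 20.0)])
l_det = detection_loss(l_k=1.0, l_size=l_size, l_off=0.5)
```

Because the size is regressed in raw pixel units, its loss tends to be numerically larger than the other two terms, which is one plausible reason for giving it the smaller weight.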

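At inference time, let $\hat{\mathcal{P}}_c$ be the set of $n$ detected center points $\hat{\mathcal{P}} = \{(\hat{x}_i, \hat{y}_i)\}_{i=1}^{n}$ of class $c$. Each keypoint location is given by integer coordinates $(\hat{x}_i, \hat{y}_i)$. The keypoint value $\hat{Y}_{\hat{x}_i \hat{y}_i c}$ serves as the detection confidence, and a bounding box is produced at that location:

$\left( \hat{x}_i + \delta\hat{x}_i - \frac{\hat{w}_i}{2},\ \hat{y}_i + \delta\hat{y}_i - \frac{\hat{h}_i}{2},\ \hat{x}_i + \delta\hat{x}_i + \frac{\hat{w}_i}{2},\ \hat{y}_i + \delta\hat{y}_i + \frac{\hat{h}_i}{2} \right)$

where $(\delta\hat{x}_i, \delta\hat{y}_i) = \hat{O}_{\hat{x}_i, \hat{y}_i}$ is the offset prediction and $(\hat{w}_i, \hat{h}_i) = \hat{S}_{\hat{x}_i, \hat{y}_i}$ is the size prediction. Peaks can be extracted with a $3 \times 3$ max pooling, so NMS post-processing is no longer needed.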
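The decoding step can be sketched in plain Python: a cell counts as a peak when it equals the maximum of its 3 x 3 neighbourhood, which is the same test as comparing the heatmap with its max-pooled copy. The function names and the score `threshold` are assumptions of this sketch, and the boxes come out in heatmap coordinates.

```python
def find_peaks(heatmap, threshold=0.3):
    """Return (score, x, y) for cells that are the maximum of their 3x3 neighbourhood."""
    h, w = len(heatmap), len(heatmap[0])
    peaks = []
    for y in range(h):
        for x in range(w):
            score = heatmap[y][x]
            if score < threshold:  # assumed cut-off, not part of the text above
                continue
            neighbourhood = [
                heatmap[ny][nx]
                for ny in range(max(y - 1, 0), min(y + 2, h))
                for nx in range(max(x - 1, 0), min(x + 2, w))
            ]
            if score >= max(neighbourhood):
                peaks.append((score, x, y))
    return sorted(peaks, reverse=True)  # highest confidence first


def decode_box(x, y, offset, size):
    """Combine a peak (x, y) with its offset and size into (x1, y1, x2, y2)."""
    cx, cy = x + offset[0], y + offset[1]
    w, h = size
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)


heatmap = [
    [0.1, 0.2, 0.1],
    [0.2, 0.9, 0.2],
    [0.1, 0.2, 0.1],
]
peaks = find_peaks(heatmap)
box = decode_box(1, 1, offset=(0.5, 0.25), size=(2.0, 1.0))
```

Since each object contributes a single local maximum, keeping only the peaks plays the role that NMS plays in anchor-based detectors.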

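Monocular 3D object detection

Depth $d$ is a single scalar per center point, but it is hard to regress directly. The paper instead computes depth as an additional output channel $\hat{D} \in [0,1]^{\frac{W}{R} \times \frac{H}{R}}$, using the transformation $d = 1/\sigma(\hat{d}) - 1$, where $\sigma$ is the sigmoid function. This is also trained with an L1 loss, where $d_k$ is the ground-truth absolute depth:

$L_{dep} = \frac{1}{N} \sum_{k=1}^{N} \left| \frac{1}{\sigma(\hat{d}_k)} - 1 - d_k \right|$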
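The depth transformation in this loss can be checked numerically. The sketch below assumes the transform d = 1 / sigmoid(raw) - 1 from the Objects as Points paper; the function names are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def decode_depth(raw):
    """Map an unbounded network output to a positive depth: d = 1 / sigmoid(raw) - 1."""
    return 1.0 / sigmoid(raw) - 1.0


def depth_l1_loss(raw_outputs, gt_depths):
    """Mean L1 error between decoded depths and ground-truth depths."""
    total = sum(abs(decode_depth(r) - d) for r, d in zip(raw_outputs, gt_depths))
    return total / max(len(gt_depths), 1)


# A raw output of 0 decodes to depth 1, so the error against a depth of 3 is 2.
loss = depth_l1_loss([0.0], [3.0])
```

Algebraically, 1 / sigmoid(raw) - 1 equals exp(-raw), so the decoded depth is always positive and grows quickly as the raw output becomes more negative.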

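The 3D dimensions of an object are three scalars. They are regressed directly with a separate head $\hat{\Gamma} \in \mathbb{R}^{\frac{W}{R} \times \frac{H}{R} \times 3}$ and an L1 loss, where $\gamma_k$ denotes the ground-truth length, width, and height of the object:

$L_{dim} = \frac{1}{N} \sum_{k=1}^{N} \left| \hat{\gamma}_k - \gamma_k \right|$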

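Orientation is by default a single scalar, but it is also hard to regress directly. The paper represents it with two bins and an in-bin regression, using 8 scalars in total, 4 per bin. One bin covers angles in $B_1 = \left[ -\frac{7\pi}{6}, \frac{\pi}{6} \right]$ and the other covers angles in $B_2 = \left[ -\frac{\pi}{6}, \frac{7\pi}{6} \right]$. For each bin, two scalars $b_i \in \mathbb{R}^2$ are used for softmax classification (whether the orientation falls into this bin), and the other two scalars $a_i \in \mathbb{R}^2$ predict the sine and cosine of the in-bin offset (relative to the bin center $m_i$). With ground-truth orientation $\theta$, the loss, which combines the softmax classification term with an L1 term, is:

$L_{ori} = \frac{1}{N} \sum_{k=1}^{N} \sum_{i=1}^{2} \left( \mathrm{softmax}(\hat{b}_i, c_i) + c_i \left| \hat{a}_i - a_i \right| \right)$

Here $c_i = 1$ when $\theta \in B_i$ and $c_i = 0$ otherwise, and $a_i = (\sin(\theta - m_i), \cos(\theta - m_i))$. The predicted orientation $\hat{\theta}$ is decoded from the 8 scalars, where $j$ is the index of the bin with the larger classification score:

$\hat{\theta} = \operatorname{arctan2}(\hat{a}_{j1}, \hat{a}_{j2}) + m_j$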
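The two-bin encoding and its decoding can be sketched as a round trip. Several points here are assumptions: the bin centers are taken as the interval midpoints (-pi/2 and pi/2), each bin is summarized by a single score instead of the two softmax scalars, and the function names are hypothetical.

```python
import math

# The two overlapping bins [-7*pi/6, pi/6] and [-pi/6, 7*pi/6].
BINS = [(-7 * math.pi / 6, math.pi / 6), (-math.pi / 6, 7 * math.pi / 6)]
# Assumed bin centers: the midpoints of the intervals, i.e. -pi/2 and pi/2.
CENTERS = [(lo + hi) / 2 for lo, hi in BINS]


def encode_orientation(theta):
    """Return per-bin targets (c_i, (sin(theta - m_i), cos(theta - m_i)))."""
    targets = []
    for (lo, hi), m in zip(BINS, CENTERS):
        c = 1 if lo <= theta <= hi else 0
        targets.append((c, (math.sin(theta - m), math.cos(theta - m))))
    return targets


def decode_orientation(bin_scores, bin_offsets):
    """Pick the bin j with the larger score, then theta = atan2(sin, cos) + m_j."""
    j = max(range(len(bin_scores)), key=lambda i: bin_scores[i])
    sin_part, cos_part = bin_offsets[j]
    return math.atan2(sin_part, cos_part) + CENTERS[j]


# Round trip for an angle of 1 radian, which lies only in the second bin.
targets = encode_orientation(1.0)
flags = [c for c, _ in targets]
decoded = decode_orientation([0.1, 0.9], [a for _, a in targets])
```

Because the bins overlap, an angle near zero belongs to both, so either bin can be used to decode it.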

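Human pose estimation

Human pose estimation aims to estimate $k$ 2D human joint locations for every person in the image. The pose is treated as a $k \times 2$-dimensional property of the center point, and each joint is parametrized as an offset from the center point, so the joint offsets $\hat{J} \in \mathbb{R}^{\frac{W}{R} \times \frac{H}{R} \times k \times 2}$ can be regressed directly with an L1 loss. To refine the keypoints, the paper additionally predicts $k$ human joint heatmaps $\hat{\Phi} \in \mathbb{R}^{\frac{W}{R} \times \frac{H}{R} \times k}$ using standard bottom-up multi-person pose estimation.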
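The offset parametrization can be illustrated with a short sketch: each joint is recovered by adding its regressed offset to the person's center point. The second function shows one simple way the joint heatmaps could refine the result, by snapping each regressed joint to the closest heatmap peak of the same joint type. Both function names are hypothetical, and the snapping rule is a simplified assumption.

```python
def joints_from_center(center, joint_offsets):
    """Recover joint locations as center + offset, one (dx, dy) per joint."""
    cx, cy = center
    return [(cx + dx, cy + dy) for dx, dy in joint_offsets]


def snap_to_peaks(joints, peaks_per_joint):
    """Replace each regressed joint with the closest heatmap peak of its joint type.

    peaks_per_joint[j] lists the (x, y) peaks found in the heatmap of joint j.
    A joint without any peak keeps its regressed location.
    """
    snapped = []
    for (jx, jy), peaks in zip(joints, peaks_per_joint):
        if peaks:
            snapped.append(min(peaks, key=lambda p: (p[0] - jx) ** 2 + (p[1] - jy) ** 2))
        else:
            snapped.append((jx, jy))
    return snapped


# Two joints for one person centered at (10, 20); all numbers are made up.
joints = joints_from_center((10.0, 20.0), [(1.0, -2.0), (0.0, 3.0)])
refined = snap_to_peaks(joints, [[(11.5, 18.0), (30.0, 30.0)], []])
```

The regression ties every joint to a specific person through the center point, while the heatmap peaks supply more precise locations.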