1. Data fusion

The input data, i.e., the data to be fused, can be raw sensor data or data that has only been preprocessed [30, 63, 67], or it can be the feature vectors, feature maps, or computation results produced by the sub-models of an integrated model [55-56].

Reference [67] processes the RGB image and the depth image provided by a depth camera separately and then fuses them, and the paper discusses several possible fusion positions. When the two inputs are concatenated as vectors before the model extracts any features, the fused result serves as the input to the complete downstream model, which constitutes data fusion.
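As a rough illustration of this data-level fusion, the sketch below concatenates an RGB image and an aligned depth map along the channel dimension before any feature extraction. The tensor shapes, layer choices, and variable names are assumptions for illustration only and are not taken from reference [67].

```python
# Minimal sketch of data-level RGB-D fusion: concatenate the two inputs
# before any feature extraction, then feed the result to a single network.
# Shapes and layers are illustrative placeholders.
import torch
import torch.nn as nn

rgb   = torch.rand(1, 3, 480, 640)   # RGB image,  (B, 3, H, W)
depth = torch.rand(1, 1, 480, 640)   # depth map,  (B, 1, H, W), assumed aligned

fused_input = torch.cat([rgb, depth], dim=1)   # (B, 4, H, W): data fusion

# The downstream model simply accepts 4 input channels.
backbone = nn.Sequential(
    nn.Conv2d(4, 32, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1),
)
features = backbone(fused_input)
```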

In the paper by Zhou et al. [53], the LiDAR point cloud is projected onto the pixel plane of the RGB image and appended to the image as an additional fourth channel, illustrating a common way of fusing point clouds with RGB images.
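The following sketch illustrates the general projection step, assuming placeholder camera intrinsics K and LiDAR-to-camera extrinsics T; it is not the implementation from [53], only a minimal example of rasterizing projected LiDAR depths into a fourth image channel.

```python
# Illustrative sketch: project LiDAR points onto the image pixel plane and
# append the resulting sparse depth image as a fourth channel.
# K (intrinsics) and T (extrinsics) below are placeholder values.
import numpy as np

def lidar_to_depth_channel(points, K, T, h, w):
    """points: (N, 3) LiDAR XYZ; returns an (h, w) sparse depth image."""
    pts_h = np.hstack([points, np.ones((points.shape[0], 1))])  # homogeneous coords
    cam = (T @ pts_h.T).T[:, :3]                                 # to camera frame
    cam = cam[cam[:, 2] > 0]                                     # keep points in front
    uv = (K @ cam.T).T
    u, v, z = uv[:, 0] / uv[:, 2], uv[:, 1] / uv[:, 2], cam[:, 2]
    depth = np.zeros((h, w), dtype=np.float32)
    valid = (u >= 0) & (u < w) & (v >= 0) & (v < h)
    depth[v[valid].astype(int), u[valid].astype(int)] = z[valid]
    return depth

# rgb: (H, W, 3) image; stack the projected depth as an extra channel.
rgb = np.zeros((375, 1242, 3), dtype=np.float32)
K = np.array([[720., 0., 620.], [0., 720., 187.], [0., 0., 1.]])
T = np.eye(4)
points = np.random.rand(1000, 3) * 50
rgbd = np.dstack([rgb, lidar_to_depth_channel(points, K, T, 375, 1242)])  # (H, W, 4)
```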

2. Feature fusion

Specifically, feature fusion usually refers to fusion operations between the feature maps produced from the model inputs during the model's computation, with the fused result continuing to participate in the computation as a feature map of the current model [25-26]; in some models, however, preprocessed raw data may also be feature-fused with feature maps [56-57].
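A minimal sketch of this pattern is given below: each modality has its own convolutional branch, the intermediate feature maps are fused (here by concatenation followed by a 1×1 convolution, one common choice), and the fused map continues through the remainder of the network. The module, channel sizes, and layer choices are illustrative assumptions, not taken from any cited paper.

```python
# Minimal sketch of feature-level fusion between two modality branches.
import torch
import torch.nn as nn

class TwoBranchFeatureFusion(nn.Module):
    def __init__(self):
        super().__init__()
        self.img_branch   = nn.Sequential(nn.Conv2d(3, 64, 3, padding=1), nn.ReLU())
        self.lidar_branch = nn.Sequential(nn.Conv2d(1, 64, 3, padding=1), nn.ReLU())
        self.fuse = nn.Conv2d(128, 64, kernel_size=1)   # fuse the concatenated maps
        self.head = nn.Sequential(nn.Conv2d(64, 64, 3, padding=1), nn.ReLU())

    def forward(self, img, lidar):
        f_img = self.img_branch(img)       # (B, 64, H, W)
        f_lid = self.lidar_branch(lidar)   # (B, 64, H, W), assumed spatially aligned
        fused = self.fuse(torch.cat([f_img, f_lid], dim=1))
        return self.head(fused)            # fused result keeps flowing as a feature map

model = TwoBranchFeatureFusion()
out = model(torch.rand(1, 3, 128, 128), torch.rand(1, 1, 128, 128))
```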

A 2019 paper published on DeepAI described a robust fusion method in which the point cloud and the image are processed by separate SSD branches and their feature maps are cross-fused; the same approach also appears in a paper on lane detection [71]. Liang et al. [56] proposed a sophisticated fusion model whose dense fusion module fuses data from four branches and feeds the result back as feature maps of the two networks processing the point cloud and the image, demonstrating how well feature-level fusion accommodates different data sources. Researchers at UC Berkeley presented a novel feature fusion method [59] that uses a designed sparse matrix to realize the transformations from the image to an image bird's-eye view and from the point-cloud bird's-eye view to a point-cloud front view, and fuses the features of the two modalities by vector concatenation. Two typical fusion methods are demonstrated in reference [30]: data fusion is realized by stacking the point cloud onto the image as extended channels, and feature fusion is realized by concatenating feature maps.
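The cross-fusion idea mentioned above can be sketched as follows, assuming a simple weighted exchange of feature maps between the two branches at a single depth; the learnable scalar weights and the placement of the block are illustrative assumptions rather than the design of any specific cited paper.

```python
# Hedged sketch of "cross fusion": at a given depth, each branch mixes in a
# weighted copy of the other branch's feature map before continuing.
import torch
import torch.nn as nn

class CrossFusionBlock(nn.Module):
    def __init__(self):
        super().__init__()
        # learnable scalars controlling how much each branch mixes in the other
        self.alpha = nn.Parameter(torch.tensor(0.5))
        self.beta  = nn.Parameter(torch.tensor(0.5))

    def forward(self, feat_a, feat_b):
        fused_a = feat_a + self.alpha * feat_b   # image branch receives point-cloud features
        fused_b = feat_b + self.beta  * feat_a   # point-cloud branch receives image features
        return fused_a, fused_b

block = CrossFusionBlock()
fa, fb = block(torch.rand(1, 64, 32, 32), torch.rand(1, 64, 32, 32))
```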

3. Result fusion

By definition, result fusion mainly covers fusion operations over the computation results of parallel model branches, as well as, in serial models, the refinement that a subsequent sub-model performs on existing results toward the same target output [56]. In reference [69], the model fuses the region proposals extracted by the point-cloud-based and image-based branches to obtain the full set of candidate detection boxes; since the candidate boxes of an object detection model are regressed from region proposals, this can be regarded as result fusion over candidate boxes. Similar fusion also appears in reference [29]. Reference [48] can be regarded as a study involving only result fusion: its fusion model fuses the bounding boxes from three model branches, and although feature extraction still follows, the computation after fusion is aimed solely at refining the bounding boxes (the network input), so the entire subsequent sub-network that processes the bounding boxes is treated as the result fusion module. A similar design appears in reference [68], whose proposed model likewise focuses on fusing and adjusting existing detection boxes; because this fusion is limited to improving existing results, it belongs to result fusion. Beyond these, result fusion is also widely applied in references [51, 62-66].
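As a minimal illustration of result-level fusion, the sketch below pools the detection boxes and scores from two branches and deduplicates them with non-maximum suppression; the box values and the NMS threshold are placeholder assumptions, and the snippet mirrors the general idea rather than the method of any cited model.

```python
# Minimal sketch of result-level fusion: two branches each output detection
# boxes and scores; the pooled results are deduplicated with NMS.
import torch
from torchvision.ops import nms

# boxes in (x1, y1, x2, y2) format, plus confidence scores, from each branch
boxes_img    = torch.tensor([[10., 10., 50., 50.], [12., 11., 52., 49.]])
scores_img   = torch.tensor([0.90, 0.60])
boxes_lidar  = torch.tensor([[11., 9., 51., 50.], [200., 200., 240., 260.]])
scores_lidar = torch.tensor([0.80, 0.70])

all_boxes  = torch.cat([boxes_img, boxes_lidar], dim=0)
all_scores = torch.cat([scores_img, scores_lidar], dim=0)

keep = nms(all_boxes, all_scores, iou_threshold=0.5)   # result fusion via NMS
fused_boxes, fused_scores = all_boxes[keep], all_scores[keep]
```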

References

1. Data fusion

[30] FISCHER V, HERMAN M, BEHNKE S. Multispectral pedestrian detection using deep fusion convolutional neural networks[C]//European Symposium on Artificial Neural Networks. Bruges, Belgium, 2016: 27−29.

[53] ZHOU T, JIANG K, XIAO Z, et al. Object detection using multi-sensor fusion based on deep learning[C]//COTA International Conference of Transportation. Nanjing, China, 2019: 5770−5782.

[55] DOU J, XUE J, FANG J. SEG-VoxelNet for 3D vehicle detection from RGB and lidar data[C]//International Conference on Robotics and Automation. Montreal, Canada, 2019: 4362−4368.

[56] LIANG M, YANG B, CHEN Y, et al. Multi-task multi-sensor fusion for 3D object detection[C]//IEEE Conference on Computer Vision and Pattern Recognition. Long Beach, USA, 2019: 7345−7353.

[63] PFEUFFER A, DIETMAYER K. Optimal sensor data fusion architecture for object detection in adverse weather conditions[C]//International Conference on Information Fusion. Cambridge, UK, 2018: 1−8.

[67] SCHNEIDER L, JASCH M. Multimodal neural networks: RGB-D for semantic segmentation and object detection[C]//Scandinavian Conference on Image Analysis. Norrköping, Sweden, 2017: 98−109.

2. Feature fusion

[25] CHADWICK S, MADDERN W, NEWMAN P. Distant vehicle detection using radar and vision[C]//International Conference on Robotics and Automation. Montreal, Canada, 2019: 8311−8317.

[26] BIJELIC M, GRUBER T. Seeing through fog without seeing fog: deep sensor fusion in the absence of labeled training data[C]//IEEE Conference on Computer Vision and Pattern Recognition. Virtual, 2020: 11621−11631.

[30] FISCHER V, HERMAN M, BEHNKE S. Multispectral pedestrian detection using deep fusion convolutional neural networks[C]//European Symposium on Artificial Neural Networks. Bruges, Belgium, 2016: 27−29.

[56] LIANG M, YANG B, CHEN Y, et al. Multi-task multi-sensor fusion for 3D object detection[C]//IEEE Conference on Computer Vision and Pattern Recognition. Long Beach, USA, 2019: 7345−7353.

[57] SINDAGI V A, ZHOU Y, TUZEL O. MVX-Net: multimodal VoxelNet for 3D object detection[C]//International Conference on Robotics and Automation. Montreal, Canada, 2019: 7276−7282.

[59] WANG Z, ZHAN W, TOMIZUKA M. Fusing bird’s eye view LIDAR point cloud and front view camera image for deep object detection[C]//IEEE Intelligent Vehicles Symposium. Changshu, China, 2018: 1−6.

3. Result fusion

[56] LIANG M, YANG B, CHEN Y, et al. Multi-task multi-sensor fusion for 3D object detection[C]//IEEE Conference on Computer Vision and Pattern Recognition. Long Beach, USA, 2019: 7345−7353.

[69] KIM T, GHOSH J. Robust detection of nonmotorized road users using deep learning on optical and lidar data[C]//IEEE International Conference on Intelligent Transportation Systems. Rio de Janeiro, Brazil, 2016: 271−276.

[29] LIU J, ZHANG S, WANG S, et al. Multispectral deep neural networks for pedestrian detection[C]//British Machine Vision Conference. York, UK, 2016: 1−13.

[48] ASVADI A, GARROTE L, PREMEBIDA C, et al. Multimodal vehicle detection: fusing 3D LIDAR and color camera data[J]. Pattern recognition letters, 2018, 115: 20−29.

[51] GUAN D, CAO Y, YANG J, et al. Fusion of multispectral data through illumination-aware deep neural networks for pedestrian detection[J]. Information fusion, 2018, 50: 148−157.

[68] OH S, KANG H. Object detection and classification by decision-level fusion for intelligent vehicle systems[J]. Sensors (Basel), 2017, 17(1): 207−214.

[62] KU J, MOZIFIAN M, LEE J, et al. Joint 3D proposal generation and object detection from view aggregation[C]//IEEE/RSJ International Conference on Intelligent Robots and Systems. Madrid, Spain, 2018: 5750−5757.

[63] PFEUFFER A, DIETMAYER K. Optimal sensor data fusion architecture for object detection in adverse weather conditions[C]//International Conference on Information Fusion. Cambridge, UK, 2018: 1−8.

[64] XU D, ANGUELOV D, JAIN A. PointFusion: deep sensor fusion for 3D bounding box estimation[C]//IEEE Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA, 2018: 244−253.

[65] DU X, ANG M H, RUS D. Car detection for autonomous vehicle: lidar and vision fusion approach through deep learning framework[C]//IEEE/RSJ International Conference on Intelligent Robots and Systems. Vancouver, Canada, 2017: 749−754.

[66] MATTI D, EKENEL H K, THIRAN J. Combining LiDAR space clustering and convolutional neural networks for pedestrian detection[C]//IEEE International Conference on Advanced Video and Signal Based Surveillance. Lecce, Italy, 2017: 1−6.