Over the past few days I have organized the literature on multi-modal learning based on my earlier reading. The attachment contains the papers listed below; you can carry out a systematic survey along the following lines:

I. I suggest starting the survey from review papers and then digging deeper into the references they cite. The following are some good surveys on multi-modal learning:

(1) ACL 2020 Tutorial: Multi-modal Information Extraction from Text, Semi-structured, and Tabular Data on the Web

(2) KDD 2020 Tutorial: Multi-modal Network Representation Learning, URL: https://chuxuzhang.github.io/KDD20_Tutorial.html

(3) 多模态视觉语言表征学习研究综述 (A Survey of Multi-modal Vision-Language Representation Learning; Journal of Software 2021)

(4) Survey on Deep Multi-modal Data Analytics: Collaboration, Rivalry, and Fusion (ACM Trans. Multimedia Comput. Commun. Appl. 2021)

(5) A Survey on Deep Learning for Multimodal Data Fusion (Neural Computation 2020)

(6) Multimodal Machine Learning: A Survey and Taxonomy (IEEE Transactions on Pattern Analysis and Machine Intelligence 2019)

(7) A comprehensive survey on multimodal medical signals fusion for smart healthcare systems (Information Fusion 2021; focuses on sensor-data fusion in the healthcare domain)

(8) A review of multimodal image matching: Methods and applications (Information Fusion 2021; focuses on image matching)

(9) Multi-source knowledge fusion: a survey (World Wide Web Journal 2020; focuses on knowledge-graph fusion, only partly related to multi-modal learning)

(10) Heterogeneous Network Representation Learning: A Unified Framework with Survey and Benchmark (IEEE Transactions on Knowledge and Data Engineering 2020; a heterogeneous network itself carries multi-modal information, and in many applications the graph is one of the modalities)

II. A core problem in multi-modal learning is information fusion, and much of the research focuses on it. Below are some recent works on fusion (a minimal fusion sketch follows this list):

(1) Deep Multimodal Fusion by Channel Exchanging (NeurIPS 2020)

(2) Memory based fusion for multi-modal deep learning (Information Fusion 2021)

(3) Modality to Modality Translation: An Adversarial Representation Learning and Graph Fusion Network for Multimodal Fusion (AAAI 2020)
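
To make the fusion problem concrete, here is a minimal sketch of two common fusion baselines in PyTorch: plain concatenation followed by a projection, and a gated weighted sum of per-modality features. This is not the method of any paper listed above; all class and variable names are illustrative, and the sketch assumes each modality has already been encoded into a fixed-size vector.

```python
# Minimal multi-modal fusion sketch (illustrative only, not from the cited papers).
import torch
import torch.nn as nn

class LateFusion(nn.Module):
    """Concatenate per-modality embeddings, then project to a joint space."""
    def __init__(self, dims, joint_dim):
        super().__init__()
        self.proj = nn.Linear(sum(dims), joint_dim)

    def forward(self, feats):  # feats: list of (batch, dim_i) tensors
        return self.proj(torch.cat(feats, dim=-1))

class GatedFusion(nn.Module):
    """Weight each modality by a learned sigmoid gate before summing,
    a common alternative to plain concatenation."""
    def __init__(self, dims, joint_dim):
        super().__init__()
        self.proj = nn.ModuleList([nn.Linear(d, joint_dim) for d in dims])
        self.gate = nn.ModuleList([nn.Linear(d, joint_dim) for d in dims])

    def forward(self, feats):
        out = 0
        for f, p, g in zip(feats, self.proj, self.gate):
            out = out + torch.sigmoid(g(f)) * p(f)
        return out

# Usage: fuse a 512-d image feature with a 768-d text feature.
image_feat = torch.randn(4, 512)
text_feat = torch.randn(4, 768)
fused = GatedFusion([512, 768], 256)([image_feat, text_feat])
print(fused.shape)  # torch.Size([4, 256])
```

The papers above go well beyond such static schemes (e.g. exchanging channels across modality-specific networks, or maintaining a shared memory across time steps), but they can all be read as learned replacements for the fixed concatenation/gating step shown here.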

III. Transformer-based models over multi-modal data are likely to be a major future direction, so we should follow them closely; many of the multi-modal problems we currently need to solve could be tackled by designing methods from a multi-modal transformer perspective. Representative papers (a simplified multi-modal transformer sketch follows this list):

(1) Dynamic Graph Representation Learning for Video Dialog via Multi-Modal Shuffled Transformers (AAAI 2021)

(2) InterBERT: Vision-and-Language Interaction for Multi-modal Pretraining
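
As a concrete reference point for what "multi-modal transformer" means here, below is a minimal single-stream sketch in PyTorch, loosely in the spirit of models like InterBERT but heavily simplified, with all names illustrative: image-region features and text tokens are projected into a shared space, tagged with a modality-type embedding, and processed by one shared transformer encoder. Positional embeddings and the pretraining objectives are omitted.

```python
# Minimal single-stream multi-modal transformer sketch (illustrative only).
import torch
import torch.nn as nn

class TinyMultiModalTransformer(nn.Module):
    def __init__(self, img_dim=2048, txt_vocab=30522, d_model=256,
                 nhead=4, num_layers=2):
        super().__init__()
        self.img_proj = nn.Linear(img_dim, d_model)      # region features -> d_model
        self.txt_emb = nn.Embedding(txt_vocab, d_model)  # token ids -> d_model
        self.type_emb = nn.Embedding(2, d_model)         # 0 = image, 1 = text
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)

    def forward(self, img_regions, txt_ids):
        # img_regions: (batch, n_regions, img_dim); txt_ids: (batch, n_tokens)
        img = self.img_proj(img_regions) + self.type_emb.weight[0]
        txt = self.txt_emb(txt_ids) + self.type_emb.weight[1]
        tokens = torch.cat([img, txt], dim=1)  # single stream: concatenate the sequences
        return self.encoder(tokens)            # (batch, n_regions + n_tokens, d_model)

# Usage: 4 samples, 10 image regions, 16 text tokens.
model = TinyMultiModalTransformer()
out = model(torch.randn(4, 10, 2048), torch.randint(0, 30522, (4, 16)))
print(out.shape)  # torch.Size([4, 26, 256])
```

A two-stream variant would instead keep a separate encoder per modality and exchange information through cross-attention rather than concatenating the sequences; both patterns appear in the papers listed above.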