CVPR 2024 | Need an Idea? Here It Is! 100+ Diffusion Model Papers Across 40+ Research Directions (List Edition)...

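Original

WeChat account 机器学习与AI生成创作 (Machine Learning and AI Generative Creation), 2024-05-16 15:33:45

© Copyright belongs to the author. This is an original work by the 51CTO blog author 公号机器学习与AI生成创作. Contact the author for permission to repost; unauthorized reposting may be pursued legally.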

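CVPR 2024, the latest top computer vision conference, features a large number of CV papers built on generative AI (AIGC), with diffusion models the most prominent. Beyond direct generation, diffusion models are also widely applied to low-level and high-level vision tasks. This article collects and organizes 100+ AIGC and diffusion model papers from CVPR 2024 across 40+ directions, all sorted and packaged by category.

Follow the WeChat official account 【机器学习与AI生成创作】 and reply CVPR2024 in its chat to get the paper collection, sorted by category into folders.

This is the list edition. The detailed edition is much longer (CVPR 2024 | 绝了！！最新 diffusion 扩散模型梳理！100+篇论文、40+研究方向！). Compiling it took real effort, and the list gets more interesting the further you read. Please share and like it to show your support.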

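Contents: diffusion model application directions

1. Diffusion model improvements
2. Controllable text-to-image generation
3. Style transfer
4. Human image generation
5. Image super-resolution
6. Image restoration
7. Object tracking
8. Object detection
9. Keypoint detection
10. Deepfake detection
11. Anomaly detection
12. Matting/segmentation
13. Image compression
14. Video understanding
15. Video generation
16. Listening head generation
17. Digital human generation
18. Novel view synthesis
19. 3D
20. Image inpainting
21. Sketch
22. Copyright and privacy
23. Data augmentation
24. Medical imaging
25. Traffic and driving
26. Speech and audio
27. Pose estimation
28. Graphs
29. Action detection/generation
30. Robot planning/intelligent decision-making
31. Visual storytelling/story generation
32. Causal attribution
33. Privacy protection: adversarial estimation
34. Diffusion model improvements (supplement)
35. Interactive controllable generation
36. Image restoration (supplement)
37. Domain adaptation/transfer learning
38. Hand interaction
39. Camouflage detection
40. Multi-task learning
41. Trajectory prediction
42. Scene generation
43. Flow estimation/3D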

I. Diffusion model improvements

1、Accelerating Diffusion Sampling with Optimized Time Steps

2、DistriFusion: Distributed Parallel Inference for High-Resolution Diffusion Models

https://github.com/mit-han-lab/distrifuser

3、Balancing Act: Distribution-Guided Debiasing in Diffusion Models

4、Few-shot Learner Parameterization by Diffusion Time-steps

5、Structure-Guided Adversarial Training of Diffusion Models

6、Tackling the Singularities at the Endpoints of Time Intervals in Diffusion Models

https://github.com/PangzeCheung/SingDiffusion

7、Boosting Diffusion Models with Moving Average Sampling in Frequency Domain

8、Towards Memorization-Free Diffusion Models

9、SD-DiT: Unleashing the Power of Self-supervised Discrimination in Diffusion Transformer

II. Controllable text-to-image generation

10、ViewDiff: 3D-Consistent Image Generation with Text-to-Image Models

https://lukashoel.github.io/ViewDiff/

11、NoiseCollage: A Layout-Aware Text-to-Image Diffusion Model Based on Noise Cropping and Merging

https://github.com/univ-esuty/noisecollage

12、Discriminative Probing and Tuning for Text-to-Image Generation

https://github.com/LgQu/DPT-T2I

13、Dysen-VDM: Empowering Dynamics-aware Text-to-Video Diffusion with LLMs

14、Face2Diffusion for Fast and Editable Face Personalization

https://github.com/mapooon/Face2Diffusion

15、LeftRefill: Filling Right Canvas based on Left Reference through Generalized Text-to-Image Diffusion Model

https://github.com/ewrfcas/LeftRefill

16、InteractDiffusion: Interaction Control in Text-to-Image Diffusion Models

17、MACE: Mass Concept Erasure in Diffusion Models

18、MIGC: Multi-Instance Generation Controller for Text-to-Image Synthesis

19、One-dimensional Adapter to Rule Them All: Concepts, Diffusion Models and Erasing Applications

https://lyumengyao.github.io/projects/spm

20、FlashEval: Towards Fast and Accurate Evaluation of Text-to-image Diffusion Generative Models

III. Style transfer

21、DEADiff: An Efficient Stylization Diffusion Model with Disentangled Representations

https://tianhao-qi.github.io/DEADiff/

22、Deformable One-shot Face Stylization via DINO Semantic Guidance

https://github.com/zichongc/DoesFS

23、One-Shot Structure-Aware Stylized Image Synthesis

IV. Human image generation

24、Coarse-to-Fine Latent Diffusion for Pose-Guided Person Image Synthesis

https://github.com/YanzuoLu/CFLD

25、High-fidelity Person-centric Subject-to-Image Synthesis

https://github.com/CodeGoat24/Face-diffuser

26、Towards Effective Usage of Human-Centric Priors in Diffusion Models for Text-based Human Image Generation

https://hcplayercvpr2024.github.io/

27、A Unified and Interpretable Emotion Representation and Expression Generation

https://emotion-diffusion.github.io/

28、CosmicMan: A Text-to-Image Foundation Model for Humans

https://cosmicman-cvpr2024.github.io/

29、DiffHuman: Probabilistic Photorealistic 3D Reconstruction of Humans

30、Texture-Preserving Diffusion Models for High-Fidelity Virtual Try-On

V. Image super-resolution

31、Arbitrary-Scale Image Generation and Upsampling using Latent Diffusion Model and Implicit Neural Decoder

32、Diffusion-based Blind Text Image Super-Resolution

33、Text-guided Explorable Image Super-resolution

34、Building Bridges across Spatial and Temporal Resolutions: Reference-Based Super-Resolution via Change Priors and Conditional Diffusion Model

https://github.com/dongrunmin/RefDiff

VI. Image restoration

35、Boosting Image Restoration via Priors from Pre-trained Models

36、Image Restoration by Denoising Diffusion Models with Iteratively Preconditioned Guidance

37、Diff-Plugin: Revitalizing Details for Diffusion-based Low-level Tasks

https://yuhaoliu7456.github.io/Diff-Plugin/

38、Selective Hourglass Mapping for Universal Image Restoration Based on Diffusion Model

https://github.com/iSEE-Laboratory/DiffUIR

39、Shadow Generation for Composite Image Using Diffusion Model

https://github.com/bcmi/Object-Shadow-Generation-Dataset-DESOBAv2

VII. Object tracking

40、Delving into the Trajectory Long-tail Distribution for Muti-object Tracking

https://github.com/chen-si-jia/Trajectory-Long-tail-Distribution-for-MOT

VIII. Object detection

41、SAFDNet: A Simple and Effective Network for Fully Sparse 3D Object Detection

https://github.com/zhanggang001/HEDNet

42、DetDiffusion: Synergizing Generative and Perceptive Models for Enhanced Data Generation and Perception

43、SDDGR: Stable Diffusion-based Deep Generative Replay for Class Incremental Object Detection

IX. Keypoint detection

44、Pose-Guided Self-Training with Two-Stage Clustering for Unsupervised Landmark Discovery

X. Deepfake detection

45、Latent Reconstruction Error Based Method for Diffusion-Generated Image Detection

XI. Anomaly detection

46、RealNet: A Feature Selection Network with Realistic Synthetic Anomaly for Anomaly Detection

https://github.com/cnulab/RealNet

XII. Matting/segmentation

47、In-Context Matting

https://github.com/tiny-smart/in-context-matting/tree/master

XIII. Image compression

48、Laplacian-guided Entropy Model in Neural Codec with Blur-dissipated Synthesis

XIV. Video understanding

49、Abductive Ego-View Accident Video Understanding for Safe Driving Perception

http://www.lotvsmmau.net/

XV. Video generation

50、FRESCO: Spatial-Temporal Correspondence for Zero-Shot Video Translation

51、Grid Diffusion Models for Text-to-Video Generation

52、TRIP: Temporal Residual Learning with Image Noise Prior for Image-to-Video Diffusion Models

https://trip-i2v.github.io/TRIP/

53、Co-Speech Gesture Video Generation via Motion-Decoupled Diffusion Model

https://github.com/thuhcsi/S2G-MDDiffusion

54、Video Interpolation With Diffusion Models

https://vidim-interpolation.github.io/

XVI. Listening head generation

55、CustomListener: Text-guided Responsive Interaction for User-friendly Listening Head Generation

https://customlistener.github.io/

XVII. Digital human generation

56、Make-Your-Anchor: A Diffusion-based 2D Avatar Generation Framework

https://github.com/ICTMCG/Make-Your-Anchor

XVIII. Novel view synthesis

57、EscherNet: A Generative Model for Scalable View Synthesis

XIX. 3D

58、Bayesian Diffusion Models for 3D Shape Reconstruction

59、DreamControl: Control-Based Text-to-3D Generation with 3D Self-Prior

https://github.com/tyhuang0428/DreamControl

60、DanceCamera3D: 3D Camera Movement Synthesis with Music and Dance

https://github.com/Carmenw1203/DanceCamera3D-Official

61、DiffuScene: Denoising Diffusion Models for Generative Indoor Scene Synthesis

https://tangjiapeng.github.io/projects/DiffuScene/

62、IPoD: Implicit Field Learning with Point Diffusion for Generalizable 3D Object Reconstruction from Single RGB-D Images

https://yushuang-wu.github.io/IPoD/

63、Move as You Say, Interact as You Can: Language-guided Human Motion Generation with Scene Affordance

https://afford-motion.github.io/

64、MicroDiffusion: Implicit Representation-Guided Diffusion for 3D Reconstruction from Limited 2D Microscopy Projections

https://github.com/UCSC-VLAA/MicroDiffusion

65、Sculpt3D: Multi-View Consistent Text-to-3D Generation with Sparse 3D Prior

https://stellarcheng.github.io/Sculpt3D/

66、Score-Guided Diffusion for 3D Human Recovery

https://statho.github.io/ScoreHMR/

67、Towards Realistic Scene Generation with LiDAR Diffusion Models

https://github.com/hancyran/LiDAR-Diffusion

68、VP3D: Unleashing 2D Visual Prompt for Text-to-3D Generation

https://vp3d-cvpr24.github.io/

XX. Image inpainting

69、Structure Matters: Tackling the Semantic Discrepancy in Diffusion Models for Image Inpainting

https://github.com/htyjers/StrDiffusion

XXI. Sketch

70、It’s All About Your Sketch: Democratising Sketch Control in Diffusion Models

71、Text-to-Image Diffusion Models are Great Sketch-Photo Matchmakers

XXII. Copyright and privacy

72、CGI-DM: Digital Copyright Authentication for Diffusion Models via Contrasting Gradient Inversion

https://github.com/Nicholas0228/Revelio

73、CPR: Retrieval Augmented Generation for Copyright Protection

XXIII. Data augmentation

74、SatSynth: Augmenting Image-Mask Pairs through Diffusion Models for Aerial Semantic Segmentation

75、ImageNet-D: Benchmarking Neural Network Robustness on Diffusion Synthetic Object

https://github.com/chenshuang-zhang/imagenet_d

XXIV. Medical imaging

76、MedM2G: Unifying Medical Multi-Modal Generation via Cross-Guided Diffusion with Visual Invariant

XXV. Traffic and driving

77、Controllable Safety-Critical Closed-loop Traffic Simulation via Guided Diffusion

https://safe-sim.github.io/

78、Generalized Predictive Model for Autonomous Driving

XXVI. Speech and audio

79、FaceTalk: Audio-Driven Motion Diffusion for Neural Parametric Head Models

https://shivangi-aneja.github.io/projects/facetalk/

80、ConvoFusion: Multi-Modal Conversational Diffusion for Co-Speech Gesture Synthesis

https://vcai.mpi-inf.mpg.de/projects/ConvoFusion/

81、Seeing and Hearing: Open-domain Visual-Audio Generation with Diffusion Latent Aligners

https://yzxing87.github.io/Seeing-and-Hearing/

XXVII. Pose estimation

82、Object Pose Estimation via the Aggregation of Diffusion Features

https://github.com/Tianfu18/diff-feats-pose

XXVIII. Graphs

83、DiffAssemble: A Unified Graph-Diffusion Model for 2D and 3D Reassembly

https://github.com/IIT-PAVIS/DiffAssemble

XXIX. Action detection/generation

84、Action Detection via an Image Diffusion Process

85、Lodge: A Coarse to Fine Diffusion Network for Long Dance Generation Guided by the Characteristic Dance Primitives

https://li-ronghui.github.io/lodge

86、OMG: Towards Open-vocabulary Motion Generation via Mixture of Controllers

https://tr3e.github.io/omg-page/

XXX. Robot planning/intelligent decision-making

87、SkillDiffuser: Interpretable Hierarchical Planning via Skill Abstractions in Diffusion-Based Task Execution

https://skilldiffuser.github.io/

XXXI. Visual storytelling/story generation

88、Intelligent Grimm - Open-ended Visual Storytelling via Latent Diffusion Models

https://haoningwu3639.github.io/StoryGen_Webpage/

XXXII. Causal attribution

89、ProMark: Proactive Diffusion Watermarking for Causal Attribution

XXXIII. Privacy protection: adversarial estimation

90、Robust Imperceptible Perturbation against Diffusion Models

https://github.com/liuyixin-louis/MetaCloak

XXXIV. Diffusion model improvements (supplement)

91、Condition-Aware Neural Network for Controlled Image Generation

https://github.com/mit-han-lab/efficientvit

XXXV. Interactive controllable generation

92、Drag Your Noise: Interactive Point-based Editing via Diffusion Semantic Propagation

https://github.com/haofengl/DragNoise

XXXVI. Image restoration (supplement)

93、Generating Content for HDR Deghosting from Frequency View

XXXVII. Domain adaptation/transfer learning

94、Unknown Prompt, the only Lacuna: Unveiling CLIP’s Potential for Open Domain Generalization

https://github.com/mainaksingha01/ODG-CLIP

XXXVIII. Hand interaction

95、Text2HOI: Text-guided 3D Motion Generation for Hand-Object Interaction

https://github.com/JunukCha/Text2HOI

96、InterHandGen: Two-Hand Interaction Generation via Cascaded Reverse Diffusion

https://jyunlee.github.io/projects/interhandgen/

XXXIX. Camouflage detection

97、LAKE-RED: Camouflaged Images Generation by Latent Background Knowledge Retrieval-Augmented Diffusion

https://github.com/PanchengZhao/LAKE-RED

XL. Multi-task learning

98、DiffusionMTL: Learning Multi-Task Denoising Diffusion Model from Partially Annotated Data

https://prismformore.github.io/diffusionmtl/

XLI. Trajectory prediction

99、SingularTrajectory: Universal Trajectory Predictor Using Diffusion Model

https://github.com/inhwanbae/SingularTrajectory

XLII. Scene generation

100、SemCity: Semantic Scene Generation with Triplane Diffusion

https://github.com/zoomin-lee/SemCity

XLIII. 3D/flow estimation

101、DifFlow3D: Toward Robust Uncertainty-Aware Scene Flow Estimation with Iterative Diffusion-Based Refinement

Follow the WeChat official account 【机器学习与AI生成创作】
