Arxiv 2207 | LightViT: Towards Light-Weight Convolution-Free Vision Transformers

開心的猫 2022-12-14 12:34:23 Category: Deep Learning

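© Copyright belongs to the author. This is an original work by 51CTO blog author 開心的猫. Contact the author for permission before reprinting; unauthorized reprinting may lead to legal action.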

LightViT: Interaction and Enhancement of Global and Local Information

Paper: https://arxiv.org/abs/2207.05557
Code: https://github.com/hunto/LightViT

This paper aims to improve the design of light-weight vision Transformer models.

Improvements to the Transformer Block

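For self-attention, on top of the local dependencies computed by local attention, the design additionally introduces an interaction between global tokens and image tokens. This process first aggregates the information in the image tokens and uses it to update the global tokens. The information in the global tokens is then propagated back to the image tokens to obtain global dependencies. The global and local dependencies are merged to update the image tokens, and the module finally outputs the updated image tokens and global tokens. The authors call this global-token-based update an information squeeze-and-expand scheme, which is in essence quite similar in form to the SE block.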
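
The squeeze-and-expand flow described here can be sketched in a few lines of NumPy. This is only a rough illustration under several assumptions that do not come from the post: a single head, no learned query/key/value projections, 1-D windows standing in for 2-D local windows, and a plain sum to merge local and global features. The names `attend` and `squeeze_expand_attention` are made up for this sketch and are not taken from the official repository.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attend(q, k, v):
    """Scaled dot-product attention; q is (..., Nq, C), k and v are (..., Nk, C)."""
    scale = q.shape[-1] ** -0.5
    weights = softmax(q @ np.swapaxes(k, -1, -2) * scale)
    return weights @ v

def squeeze_expand_attention(image_tokens, global_tokens, window=4):
    """image_tokens: (N, C), global_tokens: (T, C); N must be divisible by window."""
    n, c = image_tokens.shape
    # Local dependencies: attention inside non-overlapping windows
    # (assumption: 1-D windows as a stand-in for 2-D image windows).
    windows = image_tokens.reshape(n // window, window, c)
    local = attend(windows, windows, windows).reshape(n, c)
    # Squeeze: global tokens query all image tokens and are updated.
    new_global = attend(global_tokens, image_tokens, image_tokens)
    # Expand: image tokens query the updated global tokens.
    broadcast = attend(image_tokens, new_global, new_global)
    # Merge local and global dependencies (assumption: simple addition).
    return local + broadcast, new_global

rng = np.random.default_rng(0)
img = rng.standard_normal((16, 8))   # 16 image tokens, 8 channels
glb = rng.standard_normal((2, 8))    # 2 global tokens
new_img, new_glb = squeeze_expand_attention(img, glb)
print(new_img.shape, new_glb.shape)  # (16, 8) (2, 8)
```

Because the number of global tokens T is small and fixed, the two global steps cost on the order of N·T rather than N², which is the appeal of routing global information through a few tokens.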

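For the FFN, a bi-dimensional attention module is cascaded after the original point-wise transformation, enhancing the features along both the spatial and channel dimensions.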
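
As a rough illustration of what gating along two dimensions after an FFN might look like, the sketch below applies an SE-style channel gate and a per-token spatial gate to the FFN output. The layer layout, the reduction ratio, the ReLU activation, and all parameter names (`w1`, `wc_down`, `ws_out`, and so on) are assumptions made for this example; the paper and the official code may arrange these pieces differently.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def init_params(c, hidden, reduction, rng):
    """Hypothetical random weights; r is the reduced channel width."""
    r = c // reduction
    shapes = {"w1": (c, hidden), "w2": (hidden, c),
              "wc_down": (c, r), "wc_up": (r, c),
              "ws_down": (c, r), "ws_out": (2 * r, 1)}
    return {name: 0.1 * rng.standard_normal(shape) for name, shape in shapes.items()}

def ffn_with_bi_attention(x, p):
    """x: (N, C) image tokens. Returns the gated output and both gates."""
    # Point-wise FFN (ReLU used here for brevity).
    y = np.maximum(x @ p["w1"], 0.0) @ p["w2"]            # (N, C)
    # Channel attention: pool over tokens, bottleneck, one gate per channel.
    pooled = y.mean(axis=0, keepdims=True)                # (1, C)
    squeezed = np.maximum(pooled @ p["wc_down"], 0.0)     # (1, r)
    channel_gate = sigmoid(squeezed @ p["wc_up"])         # (1, C)
    # Spatial attention: one gate per token, computed from the token's own
    # reduced features concatenated with the pooled global context.
    local = np.maximum(y @ p["ws_down"], 0.0)             # (N, r)
    context = np.concatenate(
        [local, np.broadcast_to(squeezed, local.shape)], axis=1)  # (N, 2r)
    spatial_gate = sigmoid(context @ p["ws_out"])         # (N, 1)
    return y * channel_gate * spatial_gate, channel_gate, spatial_gate

rng = np.random.default_rng(0)
params = init_params(c=8, hidden=32, reduction=4, rng=rng)
tokens = rng.standard_normal((16, 8))
out, ch_gate, sp_gate = ffn_with_bi_attention(tokens, params)
print(out.shape, ch_gate.shape, sp_gate.shape)  # (16, 8) (1, 8) (16, 1)
```

The channel gate rescales every token identically per channel, while the spatial gate rescales all channels identically per token, so together they cover the two dimensions mentioned in the text.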