Rotate to Attend: Convolutional Triplet Attention Module
PDF: https://arxiv.org/pdf/2010.03045.pdf  PyTorch: https://github.com/shanglianlm0525/PyTorch-Networks

1 Overview

(Figure: structural comparison of Triplet Attention with SE, CBAM, and GC)


Compared with other attention modules, Triplet Attention uses far fewer parameters (note, however, that the parameter count drops while the FLOPs do not).


2 Triplet Attention

Given an input tensor, Triplet Attention builds inter-dimensional dependencies through rotation operations followed by residual transformations, encoding cross-channel and spatial information with negligible computational overhead.

(Figure: Triplet Attention network architecture)


The detailed network structure, shown in the figure above, consists of three branches:

  • Branch 1 (spatial attention): the input feature passes through Z-Pool (max-pooling and average-pooling across the channel dimension, concatenated), then a 7 x 7 convolution, and finally a Sigmoid to produce the attention weights.
  • Branch 2 (interaction between channel C and spatial width W): the input is first permuted to H x C x W, Z-Pool is applied over the H dimension, and the subsequent steps are the same as in branch 1; the output is permuted back to C x H x W so the branches can be combined element-wise (see the shape walk-through after this list).
  • Branch 3 (interaction between channel C and spatial height H): the input is first permuted to W x H x C, Z-Pool is applied over the W dimension, and the rest is identical; the output is likewise permuted back to C x H x W.
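
For concreteness, here is a shape walk-through of branch 2 as a minimal sketch (my own addition, not from the original post), assuming a 64-channel 32 x 32 input:

import torch

x = torch.randn(1, 64, 32, 32)                # (B, C, H, W)
x_perm = x.permute(0, 2, 1, 3)                # (B, H, C, W): H now occupies the channel slot
z_pool = torch.cat((x_perm.max(1, keepdim=True)[0],
                    x_perm.mean(1, keepdim=True)), dim=1)
print(z_pool.shape)                           # torch.Size([1, 2, 64, 32]), i.e. (B, 2, C, W)

The 7 x 7 convolution then reduces this (B, 2, C, W) tensor to a single-channel map, which after the Sigmoid acts as attention weights over the C-W plane.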

Finally, the outputs of the three branches are summed and averaged (i.e. multiplied by 1/3). The full PyTorch implementation is given below.


import torch
import torch.nn as nn


class ChannelPool(nn.Module):
    """Z-Pool: concatenates max-pooling and average-pooling along the channel
    dimension, reducing a (B, C, H, W) tensor to (B, 2, H, W)."""

    def forward(self, x):
        return torch.cat(
            (torch.max(x, 1)[0].unsqueeze(1), torch.mean(x, 1).unsqueeze(1)), dim=1
        )


class SpatialGate(nn.Module):
    """Z-Pool -> 7x7 conv -> BatchNorm -> Sigmoid, then scales the input."""

    def __init__(self):
        super(SpatialGate, self).__init__()
        self.channel_pool = ChannelPool()
        self.conv = nn.Sequential(
            nn.Conv2d(in_channels=2, out_channels=1, kernel_size=7, stride=1, padding=3),
            nn.BatchNorm2d(1)
        )
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        out = self.conv(self.channel_pool(x))
        # Scale the input by the attention map (broadcast over the pooled
        # dimension), so the output keeps the same shape as the input.
        return x * self.sigmoid(out)


class TripletAttention(nn.Module):
    def __init__(self, spatial=True):
        super(TripletAttention, self).__init__()
        self.spatial = spatial
        self.height_gate = SpatialGate()  # branch 2: pools over H, captures C-W interaction
        self.width_gate = SpatialGate()   # branch 3: pools over W, captures C-H interaction
        if self.spatial:
            self.spatial_gate = SpatialGate()  # branch 1: plain spatial attention

    def forward(self, x):
        # Branch 2: rotate so H occupies the channel slot, (B, C, H, W) -> (B, H, C, W)
        x_perm1 = x.permute(0, 2, 1, 3).contiguous()
        x_out1 = self.height_gate(x_perm1)
        x_out1 = x_out1.permute(0, 2, 1, 3).contiguous()

        # Branch 3: rotate so W occupies the channel slot, (B, C, H, W) -> (B, W, H, C)
        x_perm2 = x.permute(0, 3, 2, 1).contiguous()
        x_out2 = self.width_gate(x_perm2)
        x_out2 = x_out2.permute(0, 3, 2, 1).contiguous()

        if self.spatial:
            x_out3 = self.spatial_gate(x)
            return (1 / 3) * (x_out1 + x_out2 + x_out3)
        else:
            return (1 / 2) * (x_out1 + x_out2)
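
As a quick sanity check (my own addition, not part of the original post), the module preserves the input shape, and the parameter count confirms how lightweight it is: each SpatialGate contains only one 7 x 7 two-to-one-channel convolution plus a BatchNorm.

x = torch.randn(2, 64, 32, 32)
attn = TripletAttention()
print(attn(x).shape)                              # torch.Size([2, 64, 32, 32])
print(sum(p.numel() for p in attn.parameters()))  # 303 = 3 gates x (98 + 1 + 2)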

3 Experiments

3-1 ImageNet

(Figure: classification results on ImageNet)

3-2 COCO

(Figure: results on MS COCO)