Fully Convolutional Networks for Semantic Segmentation
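The paper adapts contemporary classification networks (AlexNet, VGGNet and GoogLeNet) into fully convolutional networks and transfers their learned representations to segmentation by fine-tuning. It then defines a new architecture that combines semantic information from deep, coarse layers with appearance information from shallow, fine layers to produce accurate and detailed segmentations. Concretely, at each upsampling stage the prediction is refined by fusing it (by simple addition) with a lower-level feature map from the downsampling path, which is semantically coarser but has higher spatial resolution.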
https://arxiv.org/abs/1411.4038
Overall, FCN can be regarded as the founding work of deep-learning image segmentation. Its main innovations are:
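- Convolutionalization: the fully connected layers are replaced with convolutions. The fully connected weights are reused as convolution kernels, so the parameter count does not drop, but the network now accepts inputs of arbitrary size and outputs a spatial score map instead of a single class vector. This is how classifier networks such as VGG16 are transferred to semantic segmentation.
- Upsampling: low-resolution score maps are upsampled with learned interpolation rather than fixed bilinear interpolation, implemented as deconvolution (transposed convolution) layers initialized with bilinear interpolation filters.
- Skip architecture: features from different levels of the encoder are fused; these levels carry semantic information of different coarseness.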
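The convolutionalization bullet can be checked numerically. Below is a minimal NumPy sketch, assuming a channels-last layout and tiny illustrative sizes (C=4 input channels and K=3 outputs instead of VGG16's 512 and 4096); `dense_as_conv` is a hypothetical helper, not part of any library. It shows that a dense layer over a flattened 7x7xC patch is exactly a 7x7 "valid" convolution with the same weights, and that the convolutional form also runs on larger inputs.

```python
import numpy as np


def dense_as_conv(x, w_dense, kh, kw):
    """Apply dense-layer weights as a 'valid' convolution.

    x: (H, W, C) channels-last feature map.
    w_dense: (kh * kw * C, K) weights of a dense layer that expects a
    row-major flattened kh x kw x C patch.
    Returns a (H - kh + 1, W - kw + 1, K) output map.
    """
    c = x.shape[-1]
    k = w_dense.shape[1]
    # Same numbers, new shape: the dense matrix becomes a (kh, kw, C, K) kernel
    kernel = w_dense.reshape(kh, kw, c, k)
    # All kh x kw windows: shape (H - kh + 1, W - kw + 1, C, kh, kw)
    windows = np.lib.stride_tricks.sliding_window_view(x, (kh, kw), axis=(0, 1))
    # Multiply each window by the kernel and sum over rows, columns and channels
    return np.einsum("ijcab,abck->ijk", windows, kernel)


rng = np.random.default_rng(0)
C, K = 4, 3                                    # tiny stand-ins for 512 and 4096
w = rng.standard_normal((7 * 7 * C, K))

# On a 7x7 input the convolutionalized layer reproduces the dense layer exactly
x7 = rng.standard_normal((7, 7, C))
assert np.allclose(x7.reshape(-1) @ w, dense_as_conv(x7, w, 7, 7)[0, 0])

# On a larger input it slides over positions and returns a spatial map
print(dense_as_conv(rng.standard_normal((10, 12, C)), w, 7, 7).shape)  # (4, 6, 3)
```

The weight count is 7 * 7 * C * K in both forms; what changes is only that the layer is no longer tied to one input size.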
First, the VGG16 backbone, used to extract features:
```python
from tensorflow.keras.layers import Input, Conv2D, MaxPooling2D


def get_vgg_encoder(input_height=224, input_width=224, pretrained='imagenet'):
    # Five 2x2 poolings downsample by 32 in total, so both sides must be multiples of 32
    assert input_height % 32 == 0
    assert input_width % 32 == 0
    # Note: `pretrained` is not used below; this function does not load ImageNet weights
    img_input = Input(shape=(input_height, input_width, 3))

    # Block 1 -> f1 at 1/2 resolution
    x = Conv2D(64, (3, 3), activation='relu', padding='same', name='block1_conv1')(img_input)
    x = Conv2D(64, (3, 3), activation='relu', padding='same', name='block1_conv2')(x)
    x = MaxPooling2D((2, 2), strides=(2, 2), name='block1_pool')(x)
    f1 = x

    # Block 2 -> f2 at 1/4 resolution
    x = Conv2D(128, (3, 3), activation='relu', padding='same', name='block2_conv1')(x)
    x = Conv2D(128, (3, 3), activation='relu', padding='same', name='block2_conv2')(x)
    x = MaxPooling2D((2, 2), strides=(2, 2), name='block2_pool')(x)
    f2 = x

    # Block 3 -> f3 (pool3) at 1/8 resolution
    x = Conv2D(256, (3, 3), activation='relu', padding='same', name='block3_conv1')(x)
    x = Conv2D(256, (3, 3), activation='relu', padding='same', name='block3_conv2')(x)
    x = Conv2D(256, (3, 3), activation='relu', padding='same', name='block3_conv3')(x)
    x = MaxPooling2D((2, 2), strides=(2, 2), name='block3_pool')(x)
    f3 = x

    # Block 4 -> f4 (pool4) at 1/16 resolution
    x = Conv2D(512, (3, 3), activation='relu', padding='same', name='block4_conv1')(x)
    x = Conv2D(512, (3, 3), activation='relu', padding='same', name='block4_conv2')(x)
    x = Conv2D(512, (3, 3), activation='relu', padding='same', name='block4_conv3')(x)
    x = MaxPooling2D((2, 2), strides=(2, 2), name='block4_pool')(x)
    f4 = x

    # Block 5 -> f5 (pool5) at 1/32 resolution
    x = Conv2D(512, (3, 3), activation='relu', padding='same', name='block5_conv1')(x)
    x = Conv2D(512, (3, 3), activation='relu', padding='same', name='block5_conv2')(x)
    x = Conv2D(512, (3, 3), activation='relu', padding='same', name='block5_conv3')(x)
    x = MaxPooling2D((2, 2), strides=(2, 2), name='block5_pool')(x)
    f5 = x

    return img_input, [f1, f2, f3, f4, f5]
```
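The divisibility asserts in `get_vgg_encoder` follow from the five stride-2 poolings. A small pure-Python sketch (the helper names `pooled_sizes` and `skip_sizes_match` are hypothetical) tracks the spatial size along one axis after each pool, assuming Keras's default `'valid'` pooling, which floors odd sizes, and checks whether the decoder's 2x upsampling steps line up with f4 and f3 so that `Add()` can work:

```python
def pooled_sizes(n, levels=5):
    """Spatial size after each 2x2, stride-2 'valid' max pool (floor division)."""
    sizes = []
    for _ in range(levels):
        n //= 2
        sizes.append(n)
    return sizes


def skip_sizes_match(n):
    """True if 2x-upsampled f5 matches f4, 2x-upsampled f4 matches f3,
    and the total 32x upsampling returns to the input size n."""
    f1, f2, f3, f4, f5 = pooled_sizes(n)
    return 2 * f5 == f4 and 2 * f4 == f3 and 32 * f5 == n


print(pooled_sizes(416))       # [208, 104, 52, 26, 13]
print(pooled_sizes(200))       # [100, 50, 25, 12, 6]: 2 * 12 = 24 != 25 at pool3
print(skip_sizes_match(416))   # True
print(skip_sizes_match(200))   # False
```

A size that is only a multiple of 16 (for example 208) also fails, because the last pooling floors 13 down to 6.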
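As the figure above shows, FCN uses a "fusion layer" strategy (simple addition) that combines high-level and low-level features to improve segmentation, in three different variants: 32x upsampled, 16x upsampled and 8x upsampled (FCN-32s, FCN-16s and FCN-8s).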
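The three variants differ only in how many skip connections they fuse before the final upsampling. Below is a minimal NumPy sketch of that data flow, assuming channels-last score maps that already have `n_classes` channels, and using nearest-neighbour repetition as a stand-in for the learned transposed convolutions (so it shows shapes and fusion order, not learned interpolation). `upsample_nearest` and `fcn_fuse` are hypothetical names.

```python
import numpy as np


def upsample_nearest(x, factor):
    """Stand-in for a learned transposed convolution: repeat each pixel."""
    return x.repeat(factor, axis=0).repeat(factor, axis=1)


def fcn_fuse(s5, s4, s3, variant):
    """s5, s4, s3: class-score maps at strides 32, 16 and 8, shape (H, W, n_classes).
    Returns a full-resolution score map for FCN-32s, FCN-16s or FCN-8s."""
    if variant not in ("32s", "16s", "8s"):
        raise ValueError(f"unknown variant: {variant}")
    if variant == "32s":
        return upsample_nearest(s5, 32)          # no skip connections
    x = upsample_nearest(s5, 2) + s4             # fuse with pool4 scores (stride 16)
    if variant == "16s":
        return upsample_nearest(x, 16)
    x = upsample_nearest(x, 2) + s3              # fuse with pool3 scores (stride 8)
    return upsample_nearest(x, 8)


rng = np.random.default_rng(0)
n_classes = 5                                    # arbitrary, for illustration
s5 = rng.standard_normal((13, 19, n_classes))    # 416 x 608 input / 32
s4 = rng.standard_normal((26, 38, n_classes))    # / 16
s3 = rng.standard_normal((52, 76, n_classes))    # / 8
for v in ("32s", "16s", "8s"):
    print(v, fcn_fuse(s5, s4, s3, v).shape)      # (416, 608, 5) for every variant
```

All three variants return the same output shape; FCN-16s and FCN-8s simply add finer-grained score maps on the way up.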
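Now consider the FCN-8s decoder. It fuses features of different coarseness from pool3, pool4 and fc7, using spatial information at the different resolutions of the encoder stages to refine the segmentation. Concretely: the class scores computed from pool5 (through the convolutionalized fc6/fc7 layers) are upsampled 2x and added to pool4 (after a 1x1 convolution maps pool4 to n_classes channels); the fused result is upsampled 2x again and added to pool3 (mapped the same way); finally, the result is upsampled 8x back to the input resolution. (Here "fusion" means adding the values at corresponding positions; the later U-Net instead concatenates along the channel axis.)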
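Addition requires both operands to have the same number of channels, which is why pool4 and pool3 are first passed through a 1x1 convolution with `n_classes` filters. A NumPy sketch, assuming a 416 x 608 input (so pool4 is 26 x 38) and an illustrative `n_classes = 21`, with random arrays standing in for real features and weights: a 1x1 convolution is a per-pixel matrix multiply over the channel axis; summation keeps `n_classes` channels, whereas U-Net-style concatenation adds channel counts and needs no prior projection.

```python
import numpy as np

rng = np.random.default_rng(0)
n_classes = 21                                        # assumption, for illustration only
h, w = 26, 38                                         # stride-16 grid of a 416 x 608 input

pool4 = rng.standard_normal((h, w, 512))              # stand-in for encoder feature f4
up_scores = rng.standard_normal((h, w, n_classes))    # stand-in for 2x-upsampled fc7 scores

# 1x1 convolution = the same (512 -> n_classes) linear map applied at every pixel
w_score = rng.standard_normal((512, n_classes))
b_score = np.zeros(n_classes)
pool4_scores = pool4 @ w_score + b_score              # (26, 38, 21)

# FCN-style fusion: element-wise sum, channel count stays n_classes
fcn_fused = up_scores + pool4_scores                  # (26, 38, 21)

# U-Net-style fusion: concatenate along channels, channel counts add up
unet_fused = np.concatenate([up_scores, pool4], axis=-1)   # (26, 38, 533)

print(fcn_fused.shape, unet_fused.shape)
```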
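```python
from tensorflow.keras.layers import Add, Conv2D, Conv2DTranspose, Dropout


def f8_decoder(n_classes, levels):
    # f1 and f2 are not used by FCN-8s; f3, f4, f5 are at 1/8, 1/16 and 1/32 resolution
    [f1, f2, f3, f4, f5] = levels
    x = f5
    # Convolutionalized fc6 and fc7
    x = Conv2D(4096, (7, 7), activation="relu", padding="same")(x)
    x = Dropout(0.5)(x)
    x = Conv2D(4096, (1, 1), activation="relu", padding="same")(x)
    x = Dropout(0.5)(x)
    # Class scores at 1/32 resolution
    x = Conv2D(n_classes, (1, 1), kernel_initializer="he_normal")(x)
    # 2x upsampling: 1/32 -> 1/16
    x = Conv2DTranspose(n_classes, kernel_size=(4, 4), strides=(2, 2), padding="same",
                        kernel_initializer="he_normal")(x)
    # Score pool4 with a 1x1 conv so its channels match, then fuse by addition
    f4 = Conv2D(n_classes, (1, 1), kernel_initializer="he_normal")(f4)
    x = Add()([f4, x])
    # 2x upsampling: 1/16 -> 1/8
    x = Conv2DTranspose(n_classes, kernel_size=(4, 4), strides=(2, 2), padding="same",
                        kernel_initializer="he_normal")(x)
    # Score pool3 the same way and fuse
    f3 = Conv2D(n_classes, (1, 1), kernel_initializer="he_normal")(f3)
    x = Add()([f3, x])
    # 8x upsampling: 1/8 -> full input resolution
    x = Conv2DTranspose(n_classes, kernel_size=(16, 16), strides=(8, 8), padding="same",
                        kernel_initializer="he_normal")(x)
    output_f8 = x
    return output_f8
```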
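The decoder above initializes its transposed convolutions with `he_normal`, while the upsampling bullet at the top describes initializing them with bilinear interpolation filters. Below is a NumPy sketch of a widely used construction of such a filter; `bilinear_1d`, `bilinear_kernel` and `transposed_conv1d` are hypothetical helpers, and the kernel layout (kh, kw, out_channels, in_channels) is the one Keras `Conv2DTranspose` uses. Each class is upsampled independently, so only the diagonal channel pairs are non-zero. A naive 1-D transposed convolution shows that the kernel performs linear interpolation.

```python
import numpy as np


def bilinear_1d(size):
    """1-D bilinear interpolation weights for a transposed-conv kernel of this size."""
    factor = (size + 1) // 2
    center = factor - 1 if size % 2 == 1 else factor - 0.5
    return 1 - np.abs(np.arange(size) - center) / factor


def bilinear_kernel(size, n_classes):
    """(size, size, n_classes, n_classes) kernel; class c maps only to class c."""
    filt = np.outer(bilinear_1d(size), bilinear_1d(size))
    kernel = np.zeros((size, size, n_classes, n_classes))
    for c in range(n_classes):
        kernel[:, :, c, c] = filt
    return kernel


def transposed_conv1d(x, k, stride):
    """Naive 1-D transposed convolution with the full output (no cropping)."""
    out = np.zeros((len(x) - 1) * stride + len(k))
    for i, v in enumerate(x):
        out[i * stride : i * stride + len(k)] += v * k
    return out


print(bilinear_1d(4))                                   # weights 0.25, 0.75, 0.75, 0.25
up = transposed_conv1d(np.array([0.0, 1.0, 2.0, 3.0]), bilinear_1d(4), stride=2)
print(up[3:8])                                          # interior values rise evenly by 0.5
```

For the 2x layers above, `bilinear_kernel(4, n_classes)` has the matching shape; one way to apply it would be `layer.set_weights([kernel, bias])` after the layer is built.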
The complete FCN model:
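```python
from tensorflow.keras.layers import Softmax
from tensorflow.keras.models import Model


def fcn(n_classes, encoder, input_height=416, input_width=608):
    # Run the input through the backbone (encoder)
    img_input, levels = encoder(input_height=input_height, input_width=input_width)
    # Pass the multi-level features into the FCN-8s decoder
    x = f8_decoder(n_classes, levels)
    # Per-pixel softmax over the class axis
    x = Softmax()(x)
    model = Model(img_input, x)
    return model
```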
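The model's output has shape (batch, H, W, n_classes), and Keras's `Softmax()` normalizes over the last axis by default, that is, over the classes at every pixel. Below is a NumPy sketch of what that layer computes and of turning the output into a label map by taking the per-pixel argmax; `softmax_last_axis` and `probs_to_mask` are illustrative names, not Keras functions.

```python
import numpy as np


def softmax_last_axis(logits):
    """Per-pixel softmax over the class axis, as Keras Softmax() does with axis=-1."""
    z = logits - logits.max(axis=-1, keepdims=True)   # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def probs_to_mask(probs):
    """Turn a (..., H, W, n_classes) probability map into a (..., H, W) label map."""
    return probs.argmax(axis=-1)


# Toy output: batch of 1, a 2 x 3 image, 4 classes; one pixel strongly prefers class 3
logits = np.zeros((1, 2, 3, 4))
logits[0, 1, 2, 3] = 5.0
probs = softmax_last_axis(logits)
mask = probs_to_mask(probs)
print(probs.sum(axis=-1))   # every pixel's class probabilities sum to 1
print(mask)                 # class 3 at pixel (1, 2); ties elsewhere resolve to class 0
```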