Google’s Inception architecture has seen great success in the image classification world, and much of that success is owed to a clever trick known as the 1×1 convolution, which is central to the model’s design.


One notices immediately that the 1×1 convolution is an essential part of the Inception module. It precedes every other convolution (the 3×3 and the 5×5) and is used four times in a single module, more than any other element.





Inception paper.

What a 1×1 convolution even does, however, can be confusing. If convolutions are meant to apply a sliding window over several consecutive pixels, what exactly does a sliding window the size of a single pixel accomplish? To answer this question, let’s begin with a definition of what a convolution is.


An a×a convolution refers to the size of the ‘filter’, or the sliding window, which multiplies and sums elements to form a convolved output. The ‘stride’ of the convolution refers to how many pixels the window moves before calculating the next value. The number of pixels in the output convolved feature is a simple function of the filter size and the stride: for an n×n input, an f×f filter, and a stride of s, each side of the output has ⌊(n − f)/s⌋ + 1 pixels. In this case, the convolution has a 3×3 filter and a stride of 1.
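The sliding-window arithmetic above can be sketched in a few lines of NumPy; this is a minimal, unoptimized implementation for illustration only:

```python
import numpy as np

def convolve2d(image, kernel, stride=1):
    """Valid (no-padding) 2-D convolution via an explicit sliding window."""
    h, w = image.shape
    k = kernel.shape[0]
    out_h = (h - k) // stride + 1   # the output-size formula from the text
    out_w = (w - k) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # multiply-and-sum the window against the filter
            window = image[i*stride:i*stride + k, j*stride:j*stride + k]
            out[i, j] = np.sum(window * kernel)
    return out

image = np.arange(25, dtype=float).reshape(5, 5)
print(convolve2d(image, np.ones((3, 3))).shape)  # (3, 3): a 3×3 filter, stride 1
print(convolve2d(image, np.ones((1, 1))).shape)  # (5, 5): a 1×1 filter keeps the size
```

Note that with a 1×1 kernel of weight 1 the output is simply the input, which previews the point made below about 1×1 filters preserving spatial size.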



Stanford Deep Learning Tutorial. Image free to share.

It appears obvious, then, that a convolution filter of size 1×1 (at stride 1) results in a convolved feature that is the same size as the input image.


However, Inception was designed to perform well on colored images, which are three-dimensional. Whereas a grayscale image can be represented as a two-dimensional array of values on a scale from black to white, images with color are usually three-dimensional, in the form (h, w, 3), with h and w corresponding to the height and width.


This is because each pixel has three values, corresponding to the amounts of red, green, and blue that combine to form a specific color. This number, 3, is often used to refer to the number of ‘channels’ in an image. Convolutions performed on three-dimensional inputs apply a cube-shaped filter in three dimensions: vertically, horizontally, and along the depth.


However, since the number of channels is not considered a ‘spatial’ dimension, convolutions on multiple channels simply run vertically and horizontally, incorporating but not moving along the depth, like such:



Nadeem Qazi. Image free to share.
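To make the “incorporating but not moving along the depth” point concrete, here is a minimal NumPy sketch of a 3×3 filter applied to a 3-channel image (illustrative sizes, random values):

```python
import numpy as np

rng = np.random.default_rng(0)
img = rng.random((5, 5, 3))     # (h, w, channels) colored image
filt = rng.random((3, 3, 3))    # a 3×3 filter that spans all 3 channels

out_h = out_w = 5 - 3 + 1       # valid convolution: (5 - 3)/1 + 1 = 3
out = np.zeros((out_h, out_w))
for i in range(out_h):
    for j in range(out_w):
        # the window covers the full depth, but only slides in h and w
        out[i, j] = np.sum(img[i:i + 3, j:j + 3, :] * filt)

print(out.shape)                # (3, 3): the depth is consumed, not traversed
```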

This means that in a colored image, each pixel’s value is treated as a weighted sum of its R, G, and B values: a×R + b×G + c×B. The weights a, b, and c are learned by the convolutional layer. Hence, using two-dimensional filters in the context of colored images actually collapses the three-dimensional image into a two-dimensional one. Then, with the large filter in the image above, these pixel values are aggregated and output.
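The weighted sum a×R + b×G + c×B is exactly what a single 1×1 filter computes at every pixel. A short sketch, with hypothetical weights (not taken from any trained model):

```python
import numpy as np

# Hypothetical learned channel weights a, b, c.
a, b, c = 0.5, 0.3, 0.2
weights = np.array([a, b, c])

rng = np.random.default_rng(0)
rgb = rng.random((4, 4, 3))                    # (h, w, 3) colored image

# Per-pixel weighted sum a×R + b×G + c×B ...
collapsed = a * rgb[..., 0] + b * rgb[..., 1] + c * rgb[..., 2]

# ... which is the same as a dot product along the channel axis,
# i.e. what one 1×1 filter does at every pixel.
assert np.allclose(collapsed, rgb @ weights)
print(collapsed.shape)                         # (4, 4): three channels -> one
```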


However, since a 1×1 convolution outputs an image with the same width and height as the input, it serves as a channel collapser: it allows subsequent operations to run on a two-dimensional image instead of an expensive three-dimensional one, drastically reducing the number of operations needed without compromising the recognition of color.


Through the weights a, b, and c, the network can ‘choose’ which colors it ‘pays attention to’. For instance, if the red value of a particular input is important, its weight a may be blown up toward 3, while the weight for a less significant channel shrinks to a fraction. Hence, the 1×1 convolution serves not only the basic purpose of a channel collapser but also acts as an important messenger of color to the layers that follow.


There’s another reason why 1×1 convolutions are so effective: convolutional layers don’t learn only one filter. In fact, a single layer usually learns anywhere from a dozen to hundreds of filters, and applies each of them to its input to produce a different convolved image. A loose analogy with standard feedforward networks is to think of each filter as a neuron. (This is an oversimplification.)


Suppose convolutional layer A outputs an (N, F, H, W)-shaped tensor, where N is the batch size, F is the number of filters, and H and W are the dimensions of the images. If A is connected to a second layer, B, which has f filters and a 1×1 convolution, the output will be (N, f, H, W): only the number of filters changes, since the spatial dimensions remain the same.
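The shape bookkeeping above can be checked in NumPy, where a 1×1 convolution reduces to a per-pixel matrix multiply over channels (a sketch with illustrative sizes, not Inception’s actual implementation):

```python
import numpy as np

N, F, H, W = 2, 64, 28, 28        # shape of layer A's output (illustrative)
f = 16                            # number of 1×1 filters in layer B

rng = np.random.default_rng(0)
x = rng.random((N, F, H, W))
filters = rng.random((f, F))      # each 1×1 filter is just F channel weights

# A 1×1 convolution is a per-pixel linear map across the channel axis:
y = np.einsum('oc,nchw->nohw', filters, x)
print(y.shape)                    # (2, 16, 28, 28): only the filter count changes
```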


In this sense, the 1×1 convolution is like a purposely placed bottleneck in the network, forcing the network to squeeze information through a limited number of filters. Consider, for instance, two small networks, one with a bottleneck and one without: the bottlenecked one has two-thirds the number of operations/linkages, and the potential savings are even larger at the massive scale of modern convolutional neural networks.
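The two-thirds figure is easy to verify by counting linkages in a toy fully connected example (the layer sizes here are illustrative, chosen only to reproduce the ratio):

```python
# Illustrative layer sizes, not the figure's exact neuron counts.
n = 6           # neurons in each of the two outer fully connected layers
bottleneck = 2  # neurons in the squeezed middle layer

direct = n * n                                # every-to-every linkage count
squeezed = n * bottleneck + bottleneck * n    # linkages through the bottleneck

print(squeezed / direct)   # 0.666...: two-thirds the linkages
```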



Left: with bottleneck. Right: without bottleneck. Image created by author.


In the case of regular deep neural networks, the placement of such bottlenecks is not necessary, but in the expensive and data-rich world of images, using 1×1 convolutions is like placing an intermediate layer that reduces the number of filters while maintaining the dimensions of the image.


One can think of filters as being designed to capture specific attributes of images, like edges or color-filled regions. Most images in the ImageNet dataset, which Inception was built for, can be identified with just a few filters; granular identification is only needed for tasks like classifying plants that look very similar to one another. Forcing the compression of so many filters therefore cuts down on the unnecessary computation of classifying images with far more filters than are actually needed.


Additionally, these 1×1 convolutions do a lot of heavy lifting by computing reductions of the filters, then passing on the reduced images to be convolved by larger-filter layers. This is a very efficient way to process images: in the example below, the number of operations was reduced roughly tenfold in one single connection. Imagine the savings multiple 1×1 convolutions would yield!
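A back-of-the-envelope cost comparison shows how the reduction works. The channel counts below are illustrative assumptions, not the exact numbers from the paper’s figure:

```python
# Multiplication count of a k×k convolution ~ H*W * k*k * C_in * C_out.
H, W = 28, 28
C_in, C_out = 192, 128            # illustrative channel counts
k = 5

direct = H * W * k * k * C_in * C_out

# The same 5×5 convolution, preceded by a 1×1 reduction to 32 channels:
C_mid = 32
reduced = H * W * 1 * 1 * C_in * C_mid + H * W * k * k * C_mid * C_out

print(direct / reduced)           # several times fewer multiplications here
```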



Inception v3 paper, image free to share.

In the Inception paper, the authors dedicate a lengthy passage to justifying their use of the 1×1 convolution:


One big problem…[with not using 1×1 convolutions] is that even a modest number of 5×5 convolutions can be prohibitively expensive on top of a convolutional layer with a large number of filters.


This leads to the second idea of the proposed architecture: judiciously applying dimension reductions and projections wherever the computational requirements would increase too much otherwise. This is based on the success of embeddings: even low dimensional embeddings might contain a lot of information about a relatively large image patch…1×1 convolutions are used to compute reductions before the expensive 3×3 and 5×5 convolutions. Besides being used as reductions, they also include the use of rectified linear activation which makes them dual-purpose.


- Source: “Going deeper with convolutions”


In summary,


  • 1×1 convolutions are an essential part of the Inception module.
  • A 1×1 convolution returns an output image with the same dimensions as the input image.
  • Colored images have three dimensions, or channels. 1×1 convolutions compress these channels at little cost, leaving a two-dimensional image to perform expensive 3×3 and 5×5 convolutions on.
  • Convolutional layers learn many filters to identify attributes of images. 1×1 convolutions can be placed as ‘bottlenecks’ to help compress a high number of filters into just the amount of information that is necessary for a classification.

1×1 convolutions are a clever and natural implementation of dimensionality reduction in the context of convolutional neural networks. Their placement throughout Inception helps keep computational costs feasible; without them, it is doubtful whether Inception would have been so successful.


Thanks for reading!




Translated from: https://towardsdatascience.com/the-clever-trick-behind-googles-inception-the-1-1-convolution-58815b20113
