Table of Contents
- Preface
- 1. CNN Components
- 2. Building a Three-Channel CNN in Code
- 1. Zero padding
- 2. Single-step convolution
- 3. Convolution with the conv_forward function
- 3. Building a 2D CNN in Code
- Core code
- 2D convolution in C++
- Max pooling
- Softmax implementation
Preface
First, a quick review of CNN fundamentals:
"What an organism sees is not the world as it really is, but a mode of perception evolved over a long time to suit its own living environment. Image recognition is essentially about finding (learning) the way human vision associates things, and applying it again."
In a computer, an image is stored as numbers from 0 to 255, where 0 is darkest and 255 is brightest. A color image has three channels, RGB (red, green, blue); the three primary colors combine into images of different colors, and in a computer the image is represented as a 3D cube of numbers.
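For instance, in NumPy such an image is just a 3D array (a minimal sketch; the sizes here are made up):

import numpy as np

# A made-up 4x4 RGB image: shape (height, width, channels), values in [0, 255]
img = np.random.randint(0, 256, size=(4, 4, 3), dtype=np.uint8)
print(img.shape)    # (4, 4, 3)
print(img[..., 0])  # the red channel, a 4x4 plane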
Key properties of CNNs:
1) Position invariance: an object is the same object no matter where it appears in an image, and a convolutional network is invariant to this. Why can't a feedforward network achieve it? A feedforward network first flattens the 3D data into a 1D vector, feeds it through a series of hidden layers, and outputs the final nodes; every element of that 1D vector gets its own weight at every node, so changing the object's position necessarily changes the result. A CNN, however, shares weights: different positions use the same weights, so its output is position invariant.
2) Local connectivity: a local region (the convolution kernel, or filter) scans the 3D tensor, and all the nodes inside the scanned local region connect together to a single node in the next layer.
This reduces the number of parameters, because instead of assigning one weight per node, one set of weights is shared per local region.
3) Spatial sharing: unlike a fully connected layer, each output node of a CNN layer is connected to only part of the input rather than all of it, and the kernel advances by the stride at each step.
Convolution computation:
1) Weights are not shared along the channel dimension C: a color image is 3D; the connectivity is local along the height and width but full along the channel dimension. During a 2D convolution, a W×H×C tensor is mapped to a single point on the output plane. Note that weights are not shared along the channel dimension C, so the weights are expanded into C groups.
2) Zero padding: it preserves the edge information of the image and keeps the feature map from shrinking with every convolution. Why are 3×3 and 5×5 kernels so common? A 3×3 kernel padded with 1 zero produces an output the same size as the input feature map, and a 5×5 kernel padded with 2 zeros does the same (see the sketch after this list).
Output size formula: (input_size + 2*padding - kernel_size)/stride + 1
Weight count formula: kernel_size * kernel_size * C * kernel_numbers (C is the number of input channels)
3) Stacking multiple kernels: a specific kernel captures a specific pattern. For example, different kernels extract different feature maps, which are stacked in order into a 3D tensor and fed into the next convolution.
4) Nonlinear mapping: as in a feedforward network, a nonlinear transformation is added to increase the network's fitting capacity.
5) Pooling: downsampling, since the extracted feature maps contain redundancy. Max pooling takes the maximum value in each window, preserving texture; average pooling takes the mean of each window, preserving average intensity; global pooling takes the mean over each whole channel and is often used in place of a fully connected layer, preventing the overfitting caused by too many parameters.
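To make the size and weight-count formulas above concrete, here is a minimal sketch (the helper conv_output_size and all the numbers are made-up examples, not from the original):

def conv_output_size(input_size, kernel_size, padding, stride):
    # (input_size + 2*padding - kernel_size) / stride + 1
    return (input_size + 2 * padding - kernel_size) // stride + 1

print(conv_output_size(32, 3, 1, 1))  # 32: a 3x3 kernel with padding 1 keeps the size
print(conv_output_size(32, 5, 2, 1))  # 32: a 5x5 kernel with padding 2 keeps the size
# Weight count for 8 filters of size 3x3 over C=3 input channels:
print(3 * 3 * 3 * 8)                  # 216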
Summary
How CNNs achieve invariance: 1. translation invariance: spatial and parameter sharing; 2. rotation invariance: large amounts of data; 3. scale invariance: Inception modules, which apply kernels of several sizes and concatenate their outputs.
Residual connections: information exchange between different layers.
1. CNN Components
First, the building blocks need to be clear:
1) Zero Padding
2) Kernel (convolution kernel)
3) Pooling
4) Convolution forward pass
5) Convolution backward pass
2. Building a Three-Channel CNN in Code
1. Zero padding
import numpy as np

def zero_pad(X, pad):
    """
    Pad all images in X with zeros along the height and width axes.
    np.pad in 'constant' mode fills with a constant value on each axis;
    with constant_values=(x, y) it pads with x before and y after (default 0).
    X has shape (m, n_H, n_W, n_C): (number of images, height, width, channels)
    pad -- number of zeros to add on each side of n_H and n_W
    Returns X_pad -- padded array of shape (m, n_H + 2*pad, n_W + 2*pad, n_C)
    """
    X_pad = np.pad(X, ((0, 0), (pad, pad), (pad, pad), (0, 0)), 'constant', constant_values=0)
    return X_pad
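A quick shape check (a minimal sketch; the test array is made up):

x = np.random.randn(2, 3, 3, 1)  # 2 images, 3x3, 1 channel
x_pad = zero_pad(x, 1)
print(x.shape, x_pad.shape)      # (2, 3, 3, 1) (2, 5, 5, 1)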
2. Single-step convolution
def conv_single_step(a_slice_prev, W, b):
    """
    Apply one filter defined by parameters W on a single slice (a_slice_prev) of the output activation of the previous layer.
    Arguments:
    a_slice_prev -- slice of input data of shape (f, f, n_C_prev)
    W -- Weight parameters contained in a window - matrix of shape (f, f, n_C_prev)
    b -- Bias parameters contained in a window - matrix of shape (1, 1, 1)
    Returns:
    Z -- a scalar value, result of convolving the sliding window (W, b) on a slice x of the input data
    """
    s = np.multiply(a_slice_prev, W)
    Z = np.sum(s)
    Z = Z + float(b)  # cast b to a float so that Z ends up a scalar
    return Z
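One filter applied to one window (a minimal sketch with made-up values):

a_slice = np.random.randn(3, 3, 3)  # a 3x3 window across 3 channels
W = np.random.randn(3, 3, 3)
b = np.random.randn(1, 1, 1)
print(conv_single_step(a_slice, W, b))  # a single scalar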
3. Convolution with the conv_forward function
def conv_forward(A_prev, W, b, hparameters):
    """
    Implements the forward propagation for a convolution function
    Arguments:
    A_prev -- output activations of the previous layer, numpy array of shape (m, n_H_prev, n_W_prev, n_C_prev)
    W -- Weights, numpy array of shape (f, f, n_C_prev, n_C)
    b -- Biases, numpy array of shape (1, 1, 1, n_C)
    hparameters -- python dictionary containing "stride" and "pad"
    Returns:
    Z -- conv output, numpy array of shape (m, n_H, n_W, n_C)
    cache -- cache of values needed for the conv_backward() function
    """
    # Retrieve dimensions from A_prev's shape
    (m, n_H_prev, n_W_prev, n_C_prev) = A_prev.shape
    # Retrieve dimensions from W's shape
    (f, f, n_C_prev, n_C) = W.shape
    # Retrieve information from "hparameters"
    stride = hparameters['stride']
    pad = hparameters['pad']
    # Compute the dimensions of the CONV output volume using the formula given above
    n_H = int((n_H_prev - f + 2 * pad) / stride) + 1
    n_W = int((n_W_prev - f + 2 * pad) / stride) + 1
    # Initialize the output volume Z with zeros
    Z = np.zeros((m, n_H, n_W, n_C))
    # Create A_prev_pad by padding A_prev
    A_prev_pad = zero_pad(A_prev, pad)
    for i in range(m):                  # loop over the batch of training examples
        a_prev_pad = A_prev_pad[i]      # select ith training example's padded activation
        for h in range(n_H):            # loop over vertical axis of the output volume
            for w in range(n_W):        # loop over horizontal axis of the output volume
                for c in range(n_C):    # loop over channels (= #filters) of the output volume
                    # Find the corners of the current "slice"
                    vert_start = h * stride
                    vert_end = vert_start + f
                    horiz_start = w * stride
                    horiz_end = horiz_start + f
                    # Use the corners to define the (3D) slice of a_prev_pad
                    a_slice_prev = a_prev_pad[vert_start:vert_end, horiz_start:horiz_end, :]
                    # Convolve the (3D) slice with the correct filter W and bias b, to get back one output neuron
                    Z[i, h, w, c] = conv_single_step(a_slice_prev, W[..., c], b[..., c])
    # Making sure your output shape is correct
    assert(Z.shape == (m, n_H, n_W, n_C))
    # Save information in "cache" for the backprop
    cache = (A_prev, W, b, hparameters)
    return Z, cache
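A shape check for the full forward pass (a minimal sketch; all dimensions are made up):

A_prev = np.random.randn(2, 4, 4, 3)  # a batch of 2 RGB images of size 4x4
W = np.random.randn(3, 3, 3, 8)       # 8 filters of size 3x3 over 3 channels
b = np.random.randn(1, 1, 1, 8)
Z, cache = conv_forward(A_prev, W, b, {"stride": 1, "pad": 1})
print(Z.shape)                        # (2, 4, 4, 8): padding 1 keeps the 4x4 size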
3. Building a 2D CNN in Code
Core code
import numpy as np

def convolution(k, data):
    # Slide kernel k over data with no padding and stride 1
    kh, kw = k.shape
    n, m = data.shape
    img_new = []
    for i in range(n - kh + 1):
        line = []
        for j in range(m - kw + 1):
            a = data[i:i + kh, j:j + kw]
            line.append(np.sum(np.multiply(k, a)))
        img_new.append(line)
    return np.array(img_new)
## Kernel 1: vertical edge detection
k1 = np.array([
    [1, 0, -1],
    [1, 0, -1],
    [1, 0, -1]
])
## Kernel 2: horizontal edge detection
k2 = np.array([
    [1, 1, 1],
    [0, 0, 0],
    [-1, -1, -1]
])
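Applying the vertical-edge kernel to a synthetic image (a minimal sketch; the test image is made up):

img = np.zeros((5, 5))
img[:, 2:] = 1               # left half dark, right half bright
print(convolution(k1, img))  # each row is [-3, -3, 0]: the response marks the vertical edge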
2D convolution in C++
// C++: `array` holds the original image, `filter` the convolution kernel;
// the final output would be cast to unsigned char (see the commented-out `res`).
#include <iostream>
#include <algorithm>
#include <vector>
#include <cstdlib>
using namespace std;
int main() {
    int M, N;
    int tmparr;
    cin >> M >> N;
    vector<vector<int> > array(N, vector<int>(M, 0));
    for (int i = 0; i < N; ++i) {
        for (int j = 0; j < M; ++j) {
            cin >> tmparr;
            array[i][j] = tmparr;
        }
    }
    cout << "array end" << endl;
    int W, H;
    double tmpfilter;
    cin >> W >> H;
    vector<vector<double> > filter(H, vector<double>(W, 0));
    for (int i = 0; i < H; ++i) {
        for (int j = 0; j < W; ++j) {
            cin >> tmpfilter;
            filter[i][j] = tmpfilter;
        }
    }
    cout << "filter end" << endl;
    //vector<vector<unsigned char> > res(N, vector<unsigned char>(M, 0));
    double tmp;
    int top = -(H - 1) / 2;   // offset from the output pixel to the window's top row
    int left = -(W - 1) / 2;  // offset to the window's leftmost column
    for (int i = 0; i < N; ++i) {
        for (int j = 0; j < M; ++j) {
            tmp = 0;
            int boxtop = i + top;
            int boxleft = j + left;
            for (int k = 0; k < H; ++k) {
                for (int l = 0; l < W; ++l) {
                    int tmpboxtop = boxtop + k;
                    int tmpboxleft = boxleft + l;
                    // Mirror-reflect indices that fall outside the image borders
                    if (tmpboxtop < 0) tmpboxtop = -tmpboxtop;
                    if (tmpboxtop >= N) tmpboxtop = 2 * N - 2 - tmpboxtop;
                    if (tmpboxleft < 0) tmpboxleft = -tmpboxleft;
                    if (tmpboxleft >= M) tmpboxleft = 2 * M - 2 - tmpboxleft;
                    tmp += array[tmpboxtop][tmpboxleft] * filter[k][l];
                }
            }
            //res[i][j] = (unsigned char)tmp;
            cout << tmp << " ";
        }
        cout << endl;
    }
    system("pause");
}
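The border handling above mirror-reflects out-of-range indices (index -1 maps back to 1, and index N maps to 2N-2-N = N-2) without repeating the edge pixel. For comparison, this matches NumPy's 'reflect' padding mode, so for odd-sized kernels an equivalent same-size convolution can be sketched in Python as follows (conv2d_reflect is a hypothetical helper, not part of the original):

import numpy as np

def conv2d_reflect(img, k):
    # Mirror-pad (edge pixel not repeated), then run a valid convolution
    kh, kw = k.shape
    padded = np.pad(img, ((kh // 2, kh // 2), (kw // 2, kw // 2)), mode='reflect')
    out = np.zeros(img.shape)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            out[i, j] = np.sum(padded[i:i + kh, j:j + kw] * k)
    return out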
Max pooling
import numpy as np

def max_pooling(feature_map, size=2, stride=2):
    # feature_map: 2D array of shape (h, w)
    height = feature_map.shape[0]
    width = feature_map.shape[1]
    # Determine the output shape
    out_height = (height - size) // stride + 1
    out_width = (width - size) // stride + 1
    out_pooling = np.zeros((out_height, out_width), dtype=feature_map.dtype)
    # Slide the window; x, y index the output, m, n index the input,
    # and the ranges stop before the window would run off the feature map
    for x, m in enumerate(range(0, height - size + 1, stride)):
        for y, n in enumerate(range(0, width - size + 1, stride)):
            out_pooling[x][y] = np.max(feature_map[m:m + size, n:n + size])
    return out_pooling
if __name__ == "__main__":
    input = np.arange(9).reshape((3, 3))
    output = max_pooling(input, 2, 1)
    print(output)  # [[4 5] [7 8]]
Softmax implementation
import numpy as np

def softmax(a):
    exp_a = np.exp(a - np.max(a))  # subtract the max for numerical stability; the result is unchanged
    sum_exp_a = np.sum(exp_a)
    y = exp_a / sum_exp_a
    return y
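A quick sanity check (a minimal sketch; the logits are made up):

a = np.array([0.3, 2.9, 4.0])
print(softmax(a))          # approximately [0.018 0.245 0.737]
print(np.sum(softmax(a)))  # 1.0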