In recent years, with the development of artificial intelligence, neural-network-based algorithms have been widely applied in computer vision, natural language processing, speech recognition, and other fields. Network architectures have grown ever more complex, from the early 8-layer models to networks with more than 100 layers, yet the underlying optimization has stayed essentially the same: most models are still trained with gradient descent and its variants. Much of this progress has been driven by the growth in computing power. A typical modern network contains millions of parameters, a large fraction of which are redundant. Simplifying neural networks so that they can run on small devices is therefore well worth doing, and that is the topic here. Common model-simplification methods fall into three categories: pruning, designing lightweight network architectures, and sparsifying the weight matrices.
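
Of these three, pruning and weight sparsification both amount to zeroing out low-magnitude weights. As a minimal sketch (the 4*4 matrix and the 0.5 threshold are made-up illustration values, not taken from any real model):

import numpy as np

W = np.random.randn(4, 4)       # a toy weight matrix
threshold = 0.5                 # assumed pruning threshold
mask = np.abs(W) >= threshold   # keep only the large-magnitude weights
W_pruned = W * mask             # small weights become exact zeros
print(1 - mask.mean())          # fraction of weights pruned away

The rest of this article focuses on the second approach: designing lightweight architectures.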

As networks grow, so does the processing power they demand. In experiments we can pick a fast processor and a GPU with plenty of compute, but in practice end users often have no high-performance GPU or CPU at all. This article therefore introduces several lightweight convolutional neural networks, all trained on the cifar10 dataset.

This article introduces mobilenet, a network built mainly from depthwise separable convolutions and 1*1 convolutions, which greatly reduces the number of parameters.

Standard convolution:

[Figure: standard convolution]

In a standard convolution over a 3-channel input, each kernel contains 3 filters, one per channel; their outputs are summed to produce a single feature map.

The depthwise separable convolution used in Mobilenet:

[Figure: depthwise separable convolution]

Here each kernel contains only a single filter, which extracts features from one channel to produce one feature map; a 1*1 convolution then fuses the features across channels.
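
To make the saving concrete, here is a minimal sketch in plain Python (the 3*3 kernel with 32 input and 64 output channels is an assumed example, not a layer of the network trained below):

# Assumed sizes: k*k kernel, M input channels, N output channels
k, M, N = 3, 32, 64

# Standard convolution: N kernels, each spanning all M channels
standard = k * k * M * N        # 18432 parameters

# Depthwise separable: one k*k filter per channel, then a 1*1 conv to fuse channels
separable = k * k * M + M * N   # 288 + 2048 = 2336 parameters

print(standard / separable)     # roughly 7.9x fewer parameters

In general the ratio is 1/N + 1/k^2, so the saving grows with the number of output channels.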

Its basic building block is shown below:

[Figure: standard convolution block (left) vs. mobilenet basic unit (right)]

In the figure above, the left side is the standard convolution block and the right side is the basic unit used in mobilenet.
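
The same unit can be sketched directly with TensorFlow 1.x primitives. The helper below is illustrative only: the name mobilenet_unit and its arguments are my own, and batch normalization is omitted for brevity (the full training script later in this article defines its own dwconv and _conv helpers):

import tensorflow as tf

def mobilenet_unit(x, in_channels, out_channels, stride):
    # Depthwise step: one 3*3 filter per input channel, no cross-channel mixing
    dw_filter = tf.Variable(tf.truncated_normal([3, 3, in_channels, 1], stddev=0.1))
    x = tf.nn.depthwise_conv2d(x, dw_filter, [1, stride, stride, 1], padding="SAME")
    x = tf.nn.relu(x)  # BN would normally sit before this ReLU
    # Pointwise step: a 1*1 convolution fuses the channels
    pw_filter = tf.Variable(tf.truncated_normal([1, 1, in_channels, out_channels], stddev=0.1))
    x = tf.nn.conv2d(x, pw_filter, [1, 1, 1, 1], padding="SAME")
    return tf.nn.relu(x)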

The overall mobilenet architecture is shown below:

[Figure: mobilenet v1 architecture]

The figure above shows mobilenetv1 as applied to imagenet1000. In practice the structure should be adapted: using it unchanged on a small dataset easily causes overfitting, so the model should be simplified. The code below trains the network on cifar10; for your own dataset only the input and output parts change, and the middle of the network can keep the same structure. The network applies BN, L2 regularization, and a dropout hook as measures against overfitting.

The code is as follows:

import tensorflow as tf
import pickle
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# Batch Normalization layer
def bn_layer(x,name1,name2,is_training,name='BatchNorm',moving_decay=0.9,eps=1e-5):
    # Check that the input comes from a conv layer (rank 4) or a fully connected layer (rank 2)
    shape = x.shape
    assert len(shape) in [2,4]

    param_shape = shape[-1]
    with tf.variable_scope(name):
        # The only two learnable parameters in BN: y = gamma * x + beta
        gamma = tf.get_variable(name1,param_shape,initializer=tf.constant_initializer(1))
        beta  = tf.get_variable(name2, param_shape,initializer=tf.constant_initializer(0))

        # Compute the mean and variance of the current batch
        axes = list(range(len(shape)-1))     # reduce over all axes except the channel axis
        batch_mean, batch_var = tf.nn.moments(x,axes,name='moments')

        # Track the population statistics with an exponential moving average
        ema = tf.train.ExponentialMovingAverage(moving_decay)

        def mean_var_with_update():
            ema_apply_op = ema.apply([batch_mean,batch_var])
            with tf.control_dependencies([ema_apply_op]):
                return tf.identity(batch_mean), tf.identity(batch_var)

        # During training update the moving averages; at test time use the stored values
        mean, var = tf.cond(tf.equal(is_training,True),mean_var_with_update,
                lambda:(ema.average(batch_mean),ema.average(batch_var)))

        # Apply batch normalization
        return tf.nn.batch_normalization(x,mean,var,beta,gamma,eps)

train_data = {b'data':[], b'labels':[]}  # both entries are lists
# CIFAR-10 ships as 5 x 10000 training samples plus 1 x 10000 test samples, stored as dicts.
# train_data[b'data'] is a 10000 x 3072 numpy array; in each 3072-dim row the first 1024
# values are the red channel, the next 1024 the green channel, and the last 1024 the blue channel.
# train_data[b'labels'] is a list of 10000 labels, one per 3072-dim row above.

# Load the training data
for i in range(5):
    with open("H:/data/cifar-10-batches-py/data_batch_" + str(i + 1), mode='rb') as file:
        data = pickle.load(file, encoding='bytes')
        train_data[b'data'] += list(data[b'data'])
        train_data[b'labels'] += data[b'labels']

# Load the test data
with open("H:/data/cifar-10-batches-py/test_batch", mode='rb') as file:
    test_data = pickle.load(file, encoding='bytes')

# Define some constants
NUM_LABLES = 10  # number of classes
FC_SIZE = 512    # hidden units for a fully connected layer (defined but unused in this script)
BATCH_SIZE = 100 # samples per training batch
lamda = 0.004    # L2 regularization coefficient


# Weight initialization
def weight_variable(shape):
    initial = tf.truncated_normal(shape, stddev=0.1)  # truncated normal distribution
    return tf.Variable(initial)


# Bias initialization
def bias_variable(shape):
    initial = tf.constant(0.1, shape=shape)
    return tf.Variable(initial)



# Depthwise convolution: one filter_size*filter_size filter per input channel
# (channel multiplier mutil_filters); no cross-channel mixing happens here
def dwconv(input, filter_size, in_filters, mutil_filters, strides):
    f2 = tf.Variable(tf.truncated_normal([filter_size, filter_size, in_filters, mutil_filters], stddev=0.1))

    return tf.nn.depthwise_conv2d(input,f2,strides,padding="SAME",rate=None,name=None,data_format=None)

# Max pooling layer
def max_pool_2x2(x):
    # ksize [1,x,y,1]
    return tf.nn.max_pool(x, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')
# 2D convolution
def _conv( name, x, filter_size, in_fliters, out_filters, strides):
    with tf.variable_scope(name):
        # Create the bias and the kernel, initialized from a truncated normal
        b=bias_variable([out_filters])
        kernel = tf.Variable( tf.truncated_normal([filter_size, filter_size, in_fliters, out_filters], stddev=0.1))

        # Apply the convolution
        return tf.nn.conv2d(x, kernel, strides, padding='SAME')+b
# Convert a scalar stride into the stride array expected by tf.nn.conv2d
def stride_arr(stride):
    return [1, stride, stride, 1]
# Placeholders: input is a BATCH*3072 vector, output is a BATCH*10 one-hot vector
x = tf.placeholder(tf.float32, [None, 3072])
y_ = tf.placeholder(tf.float32, [None, NUM_LABLES])
# Reshape the input into 3*32*32 format
x_image = tf.reshape(x, [-1, 3, 32, 32])
# Transpose into 32*32*3 (NHWC), the layout the 2D conv ops expect
x_image = tf.transpose(x_image, [0, 2, 3, 1])
is_training =tf.placeholder(tf.bool)

# Network body: a truncated mobilenet. Each block is a depthwise convolution
# followed by a 1*1 pointwise convolution; stride-2 layers halve the spatial size.
conv1=_conv("conv1",x_image,3,3,32,stride_arr(2))  # 16*16
# conv1=bn_layer(conv1,"gamme1","beat1",is_training)
conv1=tf.nn.relu(conv1)

conv2=dwconv(conv1,3,32,1,stride_arr(1))
# conv2=bn_layer(conv2,"gamme2","beat2",is_training)
conv2=tf.nn.relu(conv2)
conv2=_conv("conv2",conv2,1,32,64,stride_arr(1))
# conv2=bn_layer(conv2,"gamme22","beat22",is_training)
conv2=tf.nn.relu(conv2)

conv3=dwconv(conv2,3,64,1,stride_arr(2))     # 8*8
# conv3=bn_layer(conv3,"gamme3","beat3",is_training)
conv3=tf.nn.relu(conv3)
conv3=_conv("conv3",conv3,1,64,128,stride_arr(1))
# conv3=bn_layer(conv3,"gamme33","beat33",is_training)
conv3=tf.nn.relu(conv3)

conv4=dwconv(conv3,3,128,1,stride_arr(1))
conv4=bn_layer(conv4,"gamme4","beat4",is_training)
conv4=tf.nn.relu(conv4)
conv4=_conv("conv4",conv4,1,128,128,stride_arr(1))
conv4=bn_layer(conv4,"gamme44","beat44",is_training)
conv4=tf.nn.relu(conv4)

conv5=dwconv(conv4,3,128,1,stride_arr(2))     # 4*4
conv5=bn_layer(conv5,"gamme5","beat5",is_training)
conv5=tf.nn.relu(conv5)

conv5=_conv("conv5",conv5,1,128,256,stride_arr(1))
conv5=bn_layer(conv5,"gamme55","beat55",is_training)
conv5=tf.nn.relu(conv5)

# Layers conv6-conv10 of the full mobilenet (dw/1*1 pairs growing 256->512->1024
# channels, with stride-2 downsampling at conv7, conv9, and conv10) are omitted
# here to keep the model small enough for CIFAR-10 and avoid overfitting.
h_pool10=max_pool_2x2(conv5)   # 4*4 -> 2*2

# Flatten the 2 * 2 * 256 feature volume into a row vector
h_pool2_flat = tf.reshape(h_pool10, [-1,2*2*256])

# Fully connected output layer
W_fc1 = weight_variable([2*2*256,10])
b_fc1 = bias_variable([10])

learnrate = tf.placeholder(tf.float32)
keep_prob = tf.placeholder(tf.float32)  # dropout keep probability (no dropout layer is active in this version)

# An extra hidden fully connected layer with dropout was tried here and removed;
# the flattened features feed the 10-way output layer directly.
log=tf.matmul(h_pool2_flat, W_fc1) + b_fc1
prediction=tf.nn.softmax(log)
w1_loss = lamda * tf.nn.l2_loss(W_fc1)  # L2 regularization on W_fc1
# Cross-entropy loss
cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=log))
# Total loss: cross-entropy plus the L2 penalty
zloss=cross_entropy+w1_loss
# Train with the Adam optimizer
train_step = tf.train.AdamOptimizer(learnrate).minimize(zloss)

# Compute accuracy
correct_prediction = tf.equal(tf.argmax(prediction, 1), tf.argmax(y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))  # tf.cast converts to the given dtype

# (Variables are initialized inside the training session below.)

# Scale the 0-255 training data to 0-1 and convert the lists to numpy arrays
x_train = np.array(train_data[b'data']) / 255
# One-hot encode the training labels and convert to a numpy array
y_train = np.array(pd.get_dummies(train_data[b'labels']))

# Scale the 0-255 test data to 0-1
x_test = test_data[b'data'] / 255
# One-hot encode the test labels and convert to a numpy array
y_test = np.array(pd.get_dummies(test_data[b'labels']))
saver = tf.train.Saver()
acc=[]    # test accuracy, recorded every 500 steps
loss=[]   # test loss
losse=[]  # training loss, recorded every 500 steps
accye=[]  # training accuracy
with tf.Session() as sess:
    initop = tf.group(tf.global_variables_initializer(), tf.local_variables_initializer())
    sess.run(initop)
    rate=0.001
    # Training loop
    for i in range(20000):
        # if(i==6000):      # optional learning-rate decay hook
        #     rate=rate/10
        # Cycle through the 50000 training samples in batches of BATCH_SIZE
        start = i * BATCH_SIZE % 50000
        _,accy,loss_=sess.run([train_step,accuracy,cross_entropy],feed_dict = {is_training:True,x: x_train[start: start + BATCH_SIZE],
                                        y_: y_train[start: start + BATCH_SIZE],learnrate:rate,keep_prob:0.3})
        # Print training stats every 100 steps; every 500 steps also record them for plotting
        if(i%100==0):
            if(i%500==0 and i!=0):
                accye.append(accy)
                losse.append(loss_)
            print("step %d, train accuracy %g, loss %g"%(i,accy,loss_))
        if i % 500 == 0 and i!=0:
            # Evaluate on the first 100 test samples; is_training=False so BN uses its moving averages
            loss_value, train_accuracy = sess.run([cross_entropy, accuracy],
                               feed_dict={is_training: False, x: x_test[0: 100],
                                          y_: y_test[0: 100], keep_prob: 1.0})
            acc.append(train_accuracy)
            loss.append(loss_value)
            print("step %d, test accuracy %g, loss %g" % (i, train_accuracy, loss_value))

    # Final test on the first 500 test samples
    test_accuracy = accuracy.eval(feed_dict = {is_training:False,x:  x_test[0: 500], y_: y_test[0: 500],keep_prob:1.0})
    print("test accuracy %g" % test_accuracy)
    # Save the model
    saver.save(sess, 'net/my_net.ckpt')
fig = plt.figure()

ax1 = fig.add_subplot(111)
ax1.plot(acc, 'r')    # test accuracy (red)
ax1.plot(accye, 'b')  # training accuracy (blue)
ax1.set_ylabel('accuracy')

ax2 = ax1.twinx()  # second y-axis on the same plot
ax2.plot(loss, 'r')    # test loss
ax2.plot(losse, 'b')   # training loss
ax2.set_ylabel('loss')

plt.show()