BP Neural Networks: Going Further

Preface

In the earlier post "Exploring BP Neural Network Fundamentals", I only covered simple regression and gave some basic regression code.

The MNIST Dataset

Most of the examples use the MNIST dataset of handwritten digits. The dataset contains 60,000 training examples and 10,000 test examples. The digits have been size-normalized and centered in fixed-size images (28x28 pixels) whose values range from 0 to 1. For simplicity, each image is flattened and converted into a one-dimensional numpy array of 784 (28 * 28) features.

[Figure: a 28x28 MNIST digit flattened into a one-dimensional array]

Note: it really is a one-dimensional array!
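
Here is a minimal sketch of that flattening step, using random pixel values as a stand-in for a real digit image:

import numpy as np

# Stand-in for one 28x28 digit image (values 0-255)
image = np.random.randint(0, 256, size=(28, 28))
# Flatten to a one-dimensional array of 784 features, scaled to [0, 1]
flat = image.reshape(-1) / 255.0
print(flat.shape)   # (784,)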

How BP Classification Works

[Figure: the network structure used in the derivation below]


Note that in some conventions, only the raw input data is called the input layer, and every other layer is called a hidden layer.

In the network structure above, the input data is $X \in \mathbb{R}^{3 \times 2}$: it contains 2 samples with 3 features each, i.e. the number of rows of $X$ equals the number of features and the number of columns equals the number of samples. We write an activation as $a^{[l](i)}_j$, where the parentheses give the sample index, the square brackets give the layer index, and the subscript gives the feature index.
In actual code, though, the input usually has rows for samples and columns for features, so keep that in mind (see the small sketch below).
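
To make the two conventions concrete, here is a tiny numpy sketch (the values are made up):

import numpy as np

# 2 samples with 3 features each, in the rows-as-features convention
# used by the derivation above: shape (3, 2)
X_math = np.array([[1.0, 4.0],
                   [2.0, 5.0],
                   [3.0, 6.0]])

# The same data in the rows-as-samples convention most code uses,
# including the code later in this post (it is just the transpose): shape (2, 3)
X_code = X_math.T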

Input Layer

The weights and biases are $W^{[l]}$ and $b^{[l]}$.
The number of rows of $W^{[l]}$ equals the number of neurons in the current layer; the number of columns equals the number of features the layer receives.
The number of rows of $b^{[l]}$ equals the number of neurons in the current layer.
The layer's linear computation:
$$Z^{[1]} = W^{[1]} X + b^{[1]}$$
Activation output:
$$A^{[1]} = \sigma\left(Z^{[1]}\right)$$
Let every neuron use the most common activation function, the Sigmoid:
$$\sigma(x) = \frac{1}{1 + e^{-x}}$$
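
As a small numpy sketch of this layer computation (shapes follow the rows-as-features convention above; all values are random placeholders):

import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

n_features, n_samples, n_neurons = 3, 2, 4
X = np.random.randn(n_features, n_samples)    # input, shape (3, 2)
W1 = np.random.randn(n_neurons, n_features)   # rows = neurons, cols = input features
b1 = np.random.randn(n_neurons, 1)            # one bias per neuron
Z1 = np.dot(W1, X) + b1                       # linear step, shape (4, 2)
A1 = sigmoid(Z1)                              # activation output, shape (4, 2)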

Hidden Layer

Works just like the input layer, so I won't repeat it.

Output Layer

The output of the network is $A^{[L]} = \hat{Y}$; the corresponding ground-truth labels are $Y$.
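
In the multi-class case each label is a one-hot vector. A minimal sketch of building them (the MNIST demo later in this post does the same thing with index assignment):

import numpy as np

labels = np.array([2, 0, 9, 4])    # class indices for 4 samples
one_hot = np.eye(10)[labels]       # shape (4, 10): one 1.0 per row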

Loss Function

In classification problems, the BP objective (loss) function is the cross-entropy:

$$J = -\frac{1}{m} \sum_{i=1}^{m} \sum_{j} y^{(i)}_j \log a^{[L](i)}_j$$

which can be written compactly as $J = -\frac{1}{m} \sum Y \odot \log A^{[L]}$.
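
A small numpy sketch of this loss, mirroring the lossfunction method in the code below, including the tiny epsilon that guards against log(0):

import numpy as np

def cross_entropy(pred, target):
    # mean over samples of the per-sample cross-entropy
    return np.mean(np.sum(-target * np.log(pred + 1e-14), axis=1))

pred = np.array([[0.7, 0.2, 0.1]])      # softmax output for one sample
target = np.array([[1.0, 0.0, 0.0]])    # one-hot label
print(cross_entropy(pred, target))      # -log(0.7), about 0.357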

By the simple chain rule we have:

$$\frac{\partial J}{\partial Z^{[L]}} = \frac{\partial J}{\partial A^{[L]}} \odot \frac{\partial A^{[L]}}{\partial Z^{[L]}}$$

Substituting the derivative of the loss with respect to $A^{[L]}$ and the derivative of the output activation, this simplifies to:

$$dZ^{[L]} = A^{[L]} - Y$$

Here $\odot$ denotes element-wise multiplication (like A.*B in MATLAB, or A*B on numpy arrays in Python). I won't rehash the individual function derivatives involved; from here the following result is easy to compute:

$$\frac{\partial J}{\partial W^{[L]}} = \frac{\partial J}{\partial Z^{[L]}} \cdot \frac{\partial Z^{[L]}}{\partial W^{[L]}}$$


After working through the expression above, the result is a sum of per-sample contributions:

$$\sum_{i=1}^{m} dZ^{[L](i)} \left(a^{[L-1](i)}\right)^{T}$$


From this expression we can see that each weight's gradient is the sum of the gradients contributed by the individual samples, so we divide by the number of samples to get the average gradient. Tidying up, we get:

$$dW^{[L]} = \frac{1}{m}\, dZ^{[L]} \left(A^{[L-1]}\right)^{T}$$


Similarly we can derive:

$$db^{[L]} = \frac{1}{m} \sum_{i=1}^{m} dZ^{[L](i)}$$


This is just $dZ^{[L]}$ summed along its rows (the total of the first row, the total of the second row, and so on), so it can be abbreviated using numpy's sum function as:

$$db^{[L]} = \frac{1}{m}\, \mathrm{np.sum}\!\left(dZ^{[L]},\ \mathrm{axis=1},\ \mathrm{keepdims=True}\right)$$

(axis=1 under the rows-as-features convention used here; in the code below, where rows are samples, the sum runs over axis=0).
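
Putting the three output-layer formulas together in numpy (a sketch with made-up shapes, still in the rows-as-features convention):

import numpy as np

m = 2                                         # number of samples
AL = np.array([[0.7, 0.1],                    # network outputs, rows = classes
               [0.2, 0.8],
               [0.1, 0.1]])
Y = np.array([[1.0, 0.0],                     # one-hot labels, same shape
              [0.0, 1.0],
              [0.0, 0.0]])
A_prev = np.random.randn(4, m)                # previous layer's activations

dZ = AL - Y                                   # dZ = A - Y
dW = np.dot(dZ, A_prev.T) / m                 # average over the batch
db = np.sum(dZ, axis=1, keepdims=True) / m    # row sums = sum over samples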


Now that the output layer's gradients are all in hand, we step back one layer and compute the hidden layer's gradients, so the chain rule now has to pass through $A^{[L-1]}$:

$$dA^{[L-1]} = \left(W^{[L]}\right)^{T} dZ^{[L]}$$

$$dZ^{[L-1]} = dA^{[L-1]} \odot \sigma'\!\left(Z^{[L-1]}\right)$$


Then we can compute:

$$dW^{[L-1]} = \frac{1}{m}\, dZ^{[L-1]} \left(A^{[L-2]}\right)^{T}$$

$$db^{[L-1]} = \frac{1}{m} \sum_{i=1}^{m} dZ^{[L-1](i)}$$
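
And the same step in numpy, continuing the sketch style above (shapes are illustrative only):

import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

m = 2
W_L = np.random.randn(3, 4)      # output layer weights: 3 classes, 4 hidden neurons
dZ_L = np.random.randn(3, m)     # gradient arriving from the output layer
Z_hid = np.random.randn(4, m)    # hidden layer pre-activation
A_in = np.random.randn(5, m)     # activations feeding the hidden layer

dA = np.dot(W_L.T, dZ_L)         # chain back through A
s = sigmoid(Z_hid)
dZ = dA * s * (1 - s)            # elementwise product with sigmoid'
dW = np.dot(dZ, A_in.T) / m
db = np.sum(dZ, axis=1, keepdims=True) / m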


If this loses you, I suggest first reading the earlier post on the fundamentals; the backpropagation method here hasn't really changed, it's still just partial derivatives all the way down. The only addition is that besides the partial derivative of the loss with respect to the weights W, we also take the partial derivative of the loss with respect to the biases b. Putting it all together:

$$\begin{aligned} dZ^{[L]} &= A^{[L]} - Y \\ dW^{[l]} &= \tfrac{1}{m}\, dZ^{[l]} \left(A^{[l-1]}\right)^{T} \\ db^{[l]} &= \tfrac{1}{m} \sum_{i=1}^{m} dZ^{[l](i)} \\ dZ^{[l-1]} &= \left(\left(W^{[l]}\right)^{T} dZ^{[l]}\right) \odot \sigma'\!\left(Z^{[l-1]}\right) \end{aligned}$$
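
The key simplification dZ = A - Y is worth a numerical sanity check. This sketch compares the formula against a finite-difference estimate of the softmax cross-entropy gradient for a single sample:

import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def loss(z, y):
    return -np.sum(y * np.log(softmax(z) + 1e-14))

z = np.random.randn(5)          # logits for one sample
y = np.eye(5)[2]                # one-hot label, class 2

analytic = softmax(z) - y       # the dZ = A - Y formula
numeric = np.zeros_like(z)
eps = 1e-6
for i in range(5):
    zp, zm = z.copy(), z.copy()
    zp[i] += eps
    zm[i] -= eps
    numeric[i] = (loss(zp, y) - loss(zm, y)) / (2 * eps)

print(np.max(np.abs(analytic - numeric)))   # tiny, around 1e-9 or smaller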


The flow chart looks like this:

[Figure: flow chart of the forward pass and backpropagation]

Code

NeuralNetwork.py is the core code: it implements the network itself, including forward propagation, backpropagation, the training loop, and so on.

Three demo files, demo_curve_fitting.py, demo_logistic.py, and demo_mnist.py, correspond to three different tasks: curve fitting, binary classification, and multi-class classification.

Let's start with the key part, the tricks pulled on MNIST:

demo_mnist.py
import NeuralNetwork as NN
import numpy as np
import matplotlib.pyplot as plt
import tools

def train(path_to_datas, save_model_path):
    # Load the MNIST training set
    train_datas, labels = tools.load_mnist(path_to_datas, 'train')
    print("The total numbers of datas : ", len(train_datas))
    # One-hot encode the labels into an [N, 10] target matrix
    train_labels = np.zeros((labels.shape[0], 10))
    train_labels[np.arange(labels.shape[0]), labels.astype('int').reshape(-1)-1] = 1.0

    # Hyperparameters for training
    batch_size = 100
    # Number of training epochs
    train_epochs = 10
    # Learning rate
    lr = 0.01
    decay = False
    regularization = False
    input_features_numbers = train_datas.shape[1]
    layer_structure = [input_features_numbers, 512, 256, 128, 10]
    display = True
    net_name = 'nn'
    # Define our neural network classifier
    net = NN.MLP(name=net_name, layer_structure=layer_structure, task_model='multi', batch_size=batch_size)
    # 开始训练
    print("---------开始训练---------")
    net.train(train_datas=train_datas, train_targets=train_labels, train_epoch=train_epochs, lr=lr, lr_decay=decay, loss='BE', regularization=regularization, display=display)
    # Save the model
    net.save_model(path=save_model_path)
    # Plot the network's training loss and accuracy
    total_net_loss = [net.total_loss]
    total_net_accuracy = [net.total_accuracy]
    tools.drawDataCurve(total_net_loss, total_net_accuracy)

def test(path_to_datas, save_model_path):
    # Load the MNIST test set
    test_datas, all_label = tools.load_mnist(path_to_datas, 'test')
    print("The total numbers of datas : ", len(test_datas))
    # One-hot encode the test labels the same way as in training
    test_labels = np.zeros((all_label.shape[0], 10))
    test_labels[np.arange(all_label.shape[0]), all_label.astype('int').reshape(-1)-1] = 1.0

    # Hyperparameters (these must match the trained model)
    batch_size = 100
    input_features_numbers = test_datas.shape[1]
    layer_structure = [input_features_numbers, 512, 256, 128, 10]
    net_name = 'nn'

    # Test code
    print("--------- Testing ---------")
    # Load the trained model
    net = NN.MLP(name=net_name, layer_structure=layer_structure, task_model='multi', batch_size=batch_size, load_model=save_model_path)

    # Run the network's predictions
    test_steps = test_datas.shape[0] // batch_size
    accuracy = 0
    for i in range(test_steps):
        input_data = test_datas[batch_size*i : batch_size*(i+1), :].reshape(batch_size, test_datas.shape[1])
        targets = test_labels[batch_size*i : batch_size*(i+1), :].reshape(batch_size, test_labels.shape[1])

        pred = net(input_data)
        # Compute the accuracy
        accuracy += np.sum(np.argmax(pred,1) == np.argmax(targets,1)) / targets.shape[0]
    print("网络识别的准确率 : ", accuracy / test_steps)

if __name__ == "__main__":
    path_to_datas = 'mnist/'
    save_model_path = 'model/'
    train(path_to_datas, save_model_path)
    test(path_to_datas, save_model_path)
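
The tools module the demo imports isn't shown in the post. For anyone who wants to run it, here is a hypothetical sketch of what tools.load_mnist could look like, assuming the standard raw IDX files (train-images-idx3-ubyte and friends) sit in the mnist/ folder; the author's actual helper may differ (the demo's label-minus-one indexing still yields a consistent one-hot encoding either way):

import os
import struct
import numpy as np

def load_mnist(path, kind='train'):
    # Standard MNIST filenames: 'train-*' for training, 't10k-*' for testing
    prefix = 'train' if kind == 'train' else 't10k'
    labels_path = os.path.join(path, prefix + '-labels-idx1-ubyte')
    images_path = os.path.join(path, prefix + '-images-idx3-ubyte')
    with open(labels_path, 'rb') as f:
        struct.unpack('>II', f.read(8))         # skip magic number and item count
        labels = np.frombuffer(f.read(), dtype=np.uint8)
    with open(images_path, 'rb') as f:
        struct.unpack('>IIII', f.read(16))      # skip magic, count, rows, cols
        images = np.frombuffer(f.read(), dtype=np.uint8)
        # flatten each 28x28 image to 784 features and scale to [0, 1]
        images = images.reshape(len(labels), 784) / 255.0
    return images, labels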

Implementation of the core network

NeuralNetwork.py:
import numpy as np
import matplotlib.pyplot as plt

class MLP():
    def __init__(self, name='nn', layer_structure=[], task_model=None, batch_size=1, load_model=None):
        """layer_number : 神经网络的层数
           layer_structure = [输入的特征个数,第1层神经元个数,第2层神经元个数,...,最后一层神经元个数输出层特征个数],
           如网络层数设为layer_number=3, layer_structure=[20,10,5,1]:输入特征是20个,第一层有10个神经元,第二层5个,第三层1个.
           output_model = 'regression'/'logistic'
        """
        self.name = name
        self.layer_number = len(layer_structure) - 1
        self.layer_structure = layer_structure
        self.task_model = task_model
        self.W = []
        self.B = []
        self.batch_size = batch_size
        self.total_loss = []
        if self.task_model == 'logistic' or self.task_model == 'multi':
            self.total_accuracy = []
        
        if load_model == None:
            print("Initializing the network from scratch ...")
            for index in range(self.layer_number):
                self.W.append(np.random.randn(self.layer_structure[index], self.layer_structure[index+1]))
                self.B.append(np.random.randn(1, self.layer_structure[index+1]))
        else:
            print("Initializing the network from trained model ...")
            for index in range(self.layer_number):
                self.W.append(np.loadtxt(load_model + self.name + "_layer_" + str(index) + "_W.txt").reshape(self.layer_structure[index], self.layer_structure[index+1]))
                self.B.append(np.loadtxt(load_model + self.name + "_layer_" + str(index) + "_B.txt").reshape(1, self.layer_structure[index+1]))

    def normal_parameters(self, means, sigmas):
        self.means = means
        self.sigmas = sigmas

    def sigmoid(self, x):
        return 1/(1+np.exp(-x))

    def sigmoid_gradient(self, x):
        return self.sigmoid(x)*(1-self.sigmoid(x))
	
    def softmax(self, x):
        # Subtract the row-wise max before exponentiating: mathematically
        # equivalent, but avoids overflow in np.exp for large inputs
        exps = np.exp(x - np.max(x, axis = 1, keepdims = True))
        return exps/np.sum(exps, axis = 1, keepdims = True)
    
    def forward(self, x):
        """
            intput : x = [batch_size, features]
        """
        self.before_activation = []
        self.activations = [x]
        for index in range(self.layer_number):
            if index < self.layer_number - 1:
                Z = np.dot(self.activations[index], self.W[index]) + self.B[index]
                self.before_activation.append(Z)
                self.activations.append(self.sigmoid(Z))
            else:
                if self.task_model == 'logistic':
                    Z = np.dot(self.activations[index], self.W[index]) + self.B[index]
                    self.before_activation.append(Z)
                    self.activations.append(self.sigmoid(Z))
                elif self.task_model == 'regression':
                    Z = np.dot(self.activations[index], self.W[index]) + self.B[index]
                    self.before_activation.append(Z)
                    self.activations.append(Z)
                elif self.task_model == 'multi':
                    Z = np.dot(self.activations[index], self.W[index]) + self.B[index]
                    self.before_activation.append(Z)
                    self.activations.append(self.softmax(Z))

        return self.activations[-1]

    def __call__(self, x):
        return self.forward(x)

    def lossfunction(self, inputs, target):
        if self.task_model == 'regression':
            return(np.mean(np.sum((inputs - target)**2, 1)))
        elif self.task_model == 'logistic':
            return np.mean(np.sum(-target*np.log(inputs+1e-14) - (1-target)*np.log(1-inputs+1e-14), 1))
        elif self.task_model == 'multi':
            return np.mean(np.sum(-target*np.log(inputs+1e-14), 1))

    def back_forward(self, targets=None, loss=None, regularization=False):
        self.dWs = []
        self.dBs = []
        self.dAs = []
        W_reverse = self.W[::-1]
        activations_reverse = self.activations[::-1]
        before_activation_reverse = self.before_activation[::-1]
        # Back-propagate starting from the last layer
        for k in range(self.layer_number):
            if(k == 0):
                if loss == 'MSE' or loss == 'CE' or loss == 'BE':
                    dZ = activations_reverse[k] - targets
                    dW = 1/self.batch_size*np.dot(activations_reverse[k+1].T, dZ)
                    dB = 1/self.batch_size*np.sum(dZ, axis = 0, keepdims = True)
                    dA_before = np.dot(dZ, W_reverse[k].T)
                    self.dWs.append(dW)
                    self.dBs.append(dB)
                    self.dAs.append(dA_before)
            else:
                dZ = self.dAs[k-1]*self.sigmoid_gradient(before_activation_reverse[k])
                dW = 1/self.batch_size*np.dot(activations_reverse[k+1].T,dZ)
                dB = 1/self.batch_size*np.sum(dZ, axis = 0, keepdims = True)
                dA_before = np.dot(dZ, W_reverse[k].T)
                self.dWs.append(dW)
                self.dBs.append(dB)
                self.dAs.append(dA_before)
        self.dWs = self.dWs[::-1]
        self.dBs = self.dBs[::-1]
        
    def steps(self, lr=0.001, lr_decay=False):
        for index in range(len(self.dWs)):
            self.W[index] -= lr*self.dWs[index]
            self.B[index] -= lr*self.dBs[index]

    def train(self, train_datas=None, train_targets=None, train_epoch=1, lr=0.001, lr_decay=False, loss='MSE', regularization=False, display=False):
        train_counts = 0
        for epoch in range(train_epoch):
            if epoch == int(train_epoch * 0.7) and lr_decay == True:
                lr *= 0.1
            train_steps = train_datas.shape[0] // self.batch_size
            for i in range(train_steps):
                input_data = train_datas[self.batch_size*i : self.batch_size*(i+1), :].reshape(self.batch_size, train_datas.shape[1])
                targets = train_targets[self.batch_size*i : self.batch_size*(i+1), :].reshape(self.batch_size, train_targets.shape[1])
                prediction = self.forward(input_data)
                forward_loss = self.lossfunction(prediction, targets)
                if self.task_model=='logistic':
                    accuracy = np.sum((prediction>0.6) == targets) / targets.shape[0]
                    self.total_accuracy.append(accuracy)
                elif self.task_model=='multi':
                    accuracy = np.sum(np.argmax(prediction,1) == np.argmax(targets,1)) / targets.shape[0]
                    self.total_accuracy.append(accuracy)                    
                self.total_loss.append(forward_loss)
                if display:
                    if train_counts % 10 == 0:
                        if self.task_model == 'logistic' or self.task_model == 'multi':
                            print("After " + str(train_counts) + ", loss is ", forward_loss,
                            ", accuracy is ", accuracy)
                        else:
                            print("After " + str(train_counts) + ", loss is ", forward_loss)
                self.back_forward(targets=targets, loss=loss, regularization=regularization)
                self.steps(lr=lr, lr_decay=lr_decay)
                train_counts += 1

    def save_model(self, path):
        print("Saving the " + self.name + " model ...")
        for i in range(self.layer_number):
            np.savetxt(path  + self.name + "_layer_" + str(i) + "_W.txt", self.W[i])
            np.savetxt(path  + self.name + "_layer_" + str(i) + "_B.txt", self.B[i])
        print("Model saved !!!")

Analyzing the code, we can see that MNIST's multi-class classification problem corresponds to self.task_model='multi'.

Network Parameters

lr=0.01
batch_size=100
train_epochs=10

Number of layers and neurons

[784, 512, 256, 128, 10]

Drawn as a diagram, it should look like this:

[Figure: a 784-512-256-128-10 fully connected network]
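
As a quick sanity check on the model size this structure implies (weights plus biases for each layer):

layers = [784, 512, 256, 128, 10]
params = sum(i * o + o for i, o in zip(layers[:-1], layers[1:]))
print(params)   # 567434 trainable parameters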

Experiments

Screenshot:

[Figure: training log screenshot]

You can see the accuracy is a little on the low side. So what can we do about it?

1. Increase the number of training epochs from 10 to 50:

[Figures: training log and loss/accuracy curves after 50 epochs]


To get Spyder to display the loss and accuracy plots, it needs to be set up like this:

[Screenshot: Spyder graphics settings]


2. ?? I'm out of tricks for now; I'll come back and add more when inspiration strikes.

Takeaways

Honestly, Spyder is a joy to use; it feels like VS Code, so comfortable. As Python editors go, Anaconda's Spyder has completely won me over…