Recurrent Neural Network Backpropagation: Principles and Steps of Recurrent Neural Networks

Reposted

mob6454cc67bcfb 2024-04-08 20:30:30

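Tags: recurrent neural backpropagation, python, deep learning, artificial intelligence, neural networks; Category: deep learning, artificial intelligence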

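Contents

I. The recurrent kernel
II. Unrolling the recurrent kernel over time steps
III. Recurrent computation layers
IV. Describing a recurrent computation layer in TF
V. The recurrent computation process

1. Predicting a single letter with an RNN

(1) Process
(2) Complete code

2. Inputting multiple letters to predict one letter with an RNN

(1) Process
(2) Complete code

VI. Addendum

1. AttributeError: 'tuple' object has no attribute 'shape'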

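I. The recurrent kernel

First, a quick review of convolutional neural networks:

Convolution kernel: parameters are shared across space, and convolutional layers extract spatial information.
Convolutional neural network: convolution kernels extract spatial features, which are then fed into a fully connected network.

Next, we introduce the recurrent kernel:

Recurrent kernel: parameters are shared across time, and recurrent layers extract temporal information. The recurrent kernel has memory. Because it shares parameters across time steps, it can extract information from time series.

The recurrent kernel has the structure shown below. The cylinder in the middle is the memory. You can set the number of memory units to change its memory capacity. Once the number of memory units and the dimensions of the input xt and output yt are specified, the dimensions of the trainable parameters around it (Why, Whh, Wxh) are determined as well. Wxh is [dim(xt), memory units], Whh is [memory units, memory units], and Why is [memory units, dim(yt)].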

[Figure: structure of the recurrent kernel]

The memory stores the state ht for each time step. The state ht stored at the current time step is:

$$h_t = \tanh\left(x_t W_{xh} + h_{t-1} W_{hh} + b_h\right)$$

The output feature yt of the recurrent kernel at the current time step is:

$$y_t = \mathrm{softmax}\left(h_t W_{hy} + b_y\right)$$
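The two formulas can be made concrete with a minimal NumPy sketch of the forward pass over a sequence. The sketch makes four assumptions:

- It uses the row-vector convention (x_t W_xh).
- The memory starts at zero.
- The biases are b_h and b_y.
- W_xh, W_hh and W_hy stand for the article's Wxh, Whh and Why.

The weights are random, so the numbers carry no meaning. The point is that the same three matrices are reused at every time step.

```python
import numpy as np

def rnn_forward(xs, W_xh, W_hh, b_h, W_hy, b_y):
    """Run a simple RNN over xs (shape: time_steps x input_dim).

    The same W_xh, W_hh, W_hy are reused at every step (parameter sharing in time).
    Returns all states h_t (time_steps x units) and outputs y_t (time_steps x output_dim).
    """
    h = np.zeros(W_hh.shape[0])  # assumed: memory starts at 0
    hs, ys = [], []
    for x_t in xs:
        # h_t = tanh(x_t W_xh + h_{t-1} W_hh + b_h): refresh the memory
        h = np.tanh(x_t @ W_xh + h @ W_hh + b_h)
        # y_t = softmax(h_t W_hy + b_y): fully connected output
        z = h @ W_hy + b_y
        e = np.exp(z - z.max())  # subtract max for numerical stability
        ys.append(e / e.sum())
        hs.append(h)
    return np.array(hs), np.array(ys)

rng = np.random.default_rng(0)
input_dim, units, output_dim = 5, 3, 5
W_xh = rng.normal(size=(input_dim, units))
W_hh = rng.normal(size=(units, units))
W_hy = rng.normal(size=(units, output_dim))
b_h, b_y = np.zeros(units), np.zeros(output_dim)

xs = np.eye(5)[[1, 2, 3, 4]]  # one-hot codes of b, c, d, e
hs, ys = rnn_forward(xs, W_xh, W_hh, b_h, W_hy, b_y)
print(hs.shape, ys.shape)  # (4, 3) (4, 5)
```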

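During forward propagation, the state ht stored in memory is refreshed at every time step, while the three parameter matrices Wxh, Whh and Why stay fixed throughout.
During backpropagation, the three parameter matrices Wxh, Whh and Why are updated by gradient descent.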
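The sketch below illustrates backpropagation through time (BPTT) for this cell in NumPy. It assumes a single sequence whose loss is the cross-entropy of the final output only. This matches the Keras models later in the article, where SimpleRNN returns only the last ht.

Because Wxh and Whh are shared, their gradients are sums of contributions from every time step. The sketch includes a finite-difference check and one plain gradient-descent step. It is an illustration, not how Keras implements training; the Keras code in this article uses the Adam optimizer.

```python
import numpy as np

def loss_and_grads(xs, label, params):
    """Cross-entropy of the last output and its gradients w.r.t. all parameters."""
    W_xh, W_hh, b_h, W_hy, b_y = params
    # Forward pass: keep every h_t, because BPTT needs them on the way back
    hs = [np.zeros(W_hh.shape[0])]  # h_0 = 0
    for x_t in xs:
        hs.append(np.tanh(x_t @ W_xh + hs[-1] @ W_hh + b_h))
    z = hs[-1] @ W_hy + b_y
    p = np.exp(z - z.max())
    p /= p.sum()  # softmax
    loss = -np.log(p[label])

    # Backward pass
    dz = p.copy()
    dz[label] -= 1.0  # gradient of softmax + cross-entropy w.r.t. z
    dW_hy, db_y = np.outer(hs[-1], dz), dz
    dW_xh, dW_hh, db_h = np.zeros_like(W_xh), np.zeros_like(W_hh), np.zeros_like(b_h)
    dh = dz @ W_hy.T  # gradient w.r.t. the last h_t
    for t in range(len(xs), 0, -1):
        da = dh * (1.0 - hs[t] ** 2)       # back through tanh
        dW_xh += np.outer(xs[t - 1], da)   # shared matrices accumulate
        dW_hh += np.outer(hs[t - 1], da)   # a contribution from every step
        db_h += da
        dh = da @ W_hh.T                   # pass the gradient on to h_{t-1}
    return loss, [dW_xh, dW_hh, db_h, dW_hy, db_y]

rng = np.random.default_rng(1)
shapes = [(5, 3), (3, 3), (3,), (3, 5), (5,)]  # W_xh, W_hh, b_h, W_hy, b_y
params = [rng.normal(scale=0.5, size=s) for s in shapes]
xs = np.eye(5)[[1, 2, 3, 4]]  # one-hot b, c, d, e
label = 0                     # target letter a

loss, grads = loss_and_grads(xs, label, params)

# Finite-difference check of one entry of the shared matrix W_hh
eps = 1e-6
plus = [q.copy() for q in params]
plus[1][0, 1] += eps
minus = [q.copy() for q in params]
minus[1][0, 1] -= eps
numeric = (loss_and_grads(xs, label, plus)[0] - loss_and_grads(xs, label, minus)[0]) / (2 * eps)
print(numeric, grads[1][0, 1])  # the two values should agree closely

# One plain gradient-descent step on every parameter
lr = 0.01
new_params = [q - lr * g for q, g in zip(params, grads)]
new_loss, _ = loss_and_grads(xs, label, new_params)
print(loss, new_loss)  # new_loss should be slightly smaller
```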

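II. Unrolling the recurrent kernel over time steps

Unrolling over time steps means laying the recurrent kernel out along the time axis, as shown below: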

[Figure: the recurrent kernel unrolled over time steps]

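At every time step the memory state ht is refreshed, while the parameter matrices Wxh, Whh and Why around the memory stay fixed. These parameter matrices are what we train and optimize. After training, the best-performing parameter matrices are used to run forward propagation and output predictions.

Recurrent neural network: the recurrent kernel extracts temporal features, which are then fed into a fully connected network to predict continuous data.

yt is the last layer of the whole recurrent network. Its formula shows that it is simply a fully connected layer, which performs the prediction on continuous data.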

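III. Recurrent computation layers

Each recurrent kernel forms one recurrent computation layer. Layers are added in the direction of the output, i.e., recurrent kernels are stacked vertically, so the number of recurrent kernels equals the number of recurrent computation layers. The number of memory units in each kernel can be chosen freely as needed.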

[Figure: stacked recurrent computation layers]

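IV. Describing a recurrent computation layer in TF

The TF function for describing a recurrent computation layer:

tf.keras.layers.SimpleRNN(number of memory units, activation='activation function', return_sequences=whether to pass ht to the next layer at every time step)

activation defaults to 'tanh', and return_sequences defaults to False.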

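Generally, the intermediate recurrent layers use return_sequences=True and pass ht to the next layer at every time step. The last recurrent layer uses False and outputs ht only at the last time step.

return_sequences=True: pass ht to the next layer at every time step
return_sequences=False: output ht only at the last time step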
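A NumPy sketch can show what return_sequences changes and how two recurrent layers are stacked. The intermediate layer returns every ht, with shape [samples, time steps, memory units]. The last layer returns only the final ht.

simple_rnn_layer is a hypothetical stand-in written for illustration, not the Keras implementation. The layer sizes 8 and 3 and the random weights are arbitrary.

```python
import numpy as np

def simple_rnn_layer(x, W_xh, W_hh, b_h, return_sequences=False):
    """x has shape (samples, time_steps, features)."""
    samples, steps, _ = x.shape
    h = np.zeros((samples, W_hh.shape[0]))  # memory starts at 0
    outputs = []
    for t in range(steps):
        h = np.tanh(x[:, t, :] @ W_xh + h @ W_hh + b_h)
        outputs.append(h)
    if return_sequences:
        return np.stack(outputs, axis=1)  # every h_t: (samples, time_steps, units)
    return h                              # only the last h_t: (samples, units)

rng = np.random.default_rng(0)
x = rng.normal(size=(2, 4, 5))  # 2 samples, 4 time steps, 5 features per step

# Intermediate layer: 8 memory units, passes h_t to the next layer at every step
W1_xh, W1_hh, b1 = rng.normal(size=(5, 8)), rng.normal(size=(8, 8)), np.zeros(8)
h1 = simple_rnn_layer(x, W1_xh, W1_hh, b1, return_sequences=True)

# Last layer: 3 memory units, outputs h_t only at the last step
W2_xh, W2_hh, b2 = rng.normal(size=(8, 3)), rng.normal(size=(3, 3)), np.zeros(3)
h2 = simple_rnn_layer(h1, W2_xh, W2_hh, b2)

print(h1.shape, h2.shape)  # (2, 4, 8) (2, 3)
```

In Keras, the same stack would be written as tf.keras.Sequential([SimpleRNN(8, return_sequences=True), SimpleRNN(3)]).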

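The API has requirements on the data fed into a recurrent layer. The data must be three-dimensional, in the format [number of samples, number of unrolled time steps, number of input features per time step].

For example:
Two samples are fed into the RNN layer. Each sample produces its output after a single time step, and three values are fed in at each time step, so the input to the recurrent layer has shape [2,1,3].

[Figure: example of RNN input data dimensions]

In another example, one sample is fed into the recurrent layer over four time steps, with two values per time step, so the input to the recurrent layer has shape [1,4,2].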
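Both shape examples can be reproduced with np.reshape, the same call the complete code uses. The numeric values are arbitrary placeholders; only the shapes matter.

```python
import numpy as np

# Example 1: 2 samples, each fed in 1 time step, 3 values per step -> [2, 1, 3]
data1 = np.array([[0.4, 1.7, 0.6],
                  [0.7, 0.9, 1.6]])
x1 = np.reshape(data1, (2, 1, 3))

# Example 2: 1 sample fed over 4 time steps, 2 values per step -> [1, 4, 2]
data2 = np.array([[0.4, 1.7], [0.7, 0.9], [1.6, 0.8], [0.1, 0.2]])
x2 = np.reshape(data2, (1, 4, 2))

print(x1.shape, x2.shape)  # (2, 1, 3) (1, 4, 2)
```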

V. The recurrent computation process

1. Predicting a single letter with an RNN

(1) Process

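We use letter prediction to illustrate the recurrent computation. The rule is:

[Figure: the prediction rule: a → b, b → c, c → d, d → e, e → a]

1) First, since the network only accepts numbers, the letters are one-hot encoded:

[Figure: one-hot codes: a = [1,0,0,0,0], b = [0,1,0,0,0], c = [0,0,1,0,0], d = [0,0,0,1,0], e = [0,0,0,0,1]]

2) Randomly generate the three parameter matrices Wxh, Whh and Why, and choose 3 memory units. The structure is:

[Figure: recurrent kernel with 3 memory units]

3) ht is computed as follows:

[Figure: worked computation of ht]

4) After the tanh activation we obtain the state ht for the current time step, so the state stored in memory is refreshed, i.e., ht replaces ht-1:

[Figure: memory refreshed with the new ht]

5) Then the output yt is computed. Here a fully connected layer recognizes the extracted temporal information and turns it into a prediction:

[Figure: worked computation of yt]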
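Steps 1) to 5) for a single input letter can be traced in a few lines of NumPy. This sketch uses randomly generated matrices; the seed and the zero biases are assumptions. The predicted letter is therefore arbitrary until the weights are trained. The sketch only mirrors the order of operations.

```python
import numpy as np

input_word = "abcde"
w_to_id = {'a': 0, 'b': 1, 'c': 2, 'd': 3, 'e': 4}
onehot = np.eye(5)  # row i is the one-hot code of letter i

# Step 2): random parameter matrices, 3 memory units (biases assumed zero)
rng = np.random.default_rng(7)
W_xh = rng.normal(size=(5, 3))  # input (5 features) -> memory (3 units)
W_hh = rng.normal(size=(3, 3))  # memory -> memory
W_hy = rng.normal(size=(3, 5))  # memory -> output (5 letters)
b_h, b_y = np.zeros(3), np.zeros(5)

h_prev = np.zeros(3)            # memory before any input
x_t = onehot[w_to_id['a']]      # step 1): one-hot code of the input letter 'a'

h_t = np.tanh(x_t @ W_xh + h_prev @ W_hh + b_h)  # steps 3)-4): weighted sum, then tanh
h_prev = h_t                    # step 4): the new h_t replaces h_{t-1} in memory

z = h_t @ W_hy + b_y            # step 5): fully connected output layer
e = np.exp(z - z.max())
y_t = e / e.sum()               # softmax probabilities over a..e
pred_letter = input_word[int(np.argmax(y_t))]
print(y_t.round(2), pred_letter)  # arbitrary until the weights are trained
```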

(2) Complete code


# Import the required modules
import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import Dense, SimpleRNN
import matplotlib.pyplot as plt
import os

# The letters used are abcde
input_word = "abcde"
# To feed the letters into the network, represent them as 0-4
w_to_id = {'a': 0, 'b': 1, 'c': 2, 'd': 3, 'e': 4}  # dictionary mapping letters to numeric ids
# Encode 0-4 as one-hot vectors
id_to_onehot = {0: [1., 0., 0., 0., 0.], 1: [0., 1., 0., 0., 0.], 2: [0., 0., 1., 0., 0.], 3: [0., 0., 0., 1., 0.],
                4: [0., 0., 0., 0., 1.]}  # id -> one-hot

# Build the training features x_train and labels y_train (each letter is labeled with the next one)
x_train = [id_to_onehot[w_to_id['a']], id_to_onehot[w_to_id['b']], id_to_onehot[w_to_id['c']],
           id_to_onehot[w_to_id['d']], id_to_onehot[w_to_id['e']]]
y_train = [w_to_id['b'], w_to_id['c'], w_to_id['d'], w_to_id['e'], w_to_id['a']]

# Shuffle the training set; reseeding before each shuffle keeps x_train and y_train aligned
np.random.seed(7)
np.random.shuffle(x_train)
np.random.seed(7)
np.random.shuffle(y_train)
tf.random.set_seed(7)

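# Reshape the input features to the shape the RNN layer expects.
# SimpleRNN requires x_train in the form [number of samples, number of unrolled time steps, number of input features per time step].
# The whole dataset is fed in, so the number of samples is len(x_train); one letter produces one result, so the kernel is unrolled for 1 time step; each one-hot code has 5 features, so there are 5 input features per time step
x_train = np.reshape(x_train, (len(x_train), 1, 5))
# Convert y_train to a numpy array
y_train = np.array(y_train)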

# Recurrent layer with 3 memory units (3 is an arbitrary choice)
model = tf.keras.Sequential([
    SimpleRNN(3),
    # Fully connected layer that computes the output yt
    Dense(5, activation='softmax')
])

# Configure the training parameters
model.compile(optimizer=tf.keras.optimizers.Adam(0.01),
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False),
              metrics=['sparse_categorical_accuracy'])

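# Note: in TF 2.16+ (Keras 3), ModelCheckpoint with save_weights_only=True requires a path ending in '.weights.h5'; this '.ckpt' path targets TF 2.x with Keras 2
checkpoint_save_path = "./checkpoint/rnn_onehot_1pre1.ckpt"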

if os.path.exists(checkpoint_save_path + '.index'):
    print('-------------load the model-----------------')
    model.load_weights(checkpoint_save_path)

cp_callback = tf.keras.callbacks.ModelCheckpoint(filepath=checkpoint_save_path,
                                                 save_weights_only=True,
                                                 save_best_only=True,
                                                 monitor='loss')  # fit gets no validation data, so the best model is selected by training loss

# Run training
history = model.fit(x_train, y_train, batch_size=32, epochs=100, callbacks=[cp_callback])
# Print the network structure and parameters
model.summary()

# print(model.trainable_variables)
# Dump the trained parameters (name, shape, values) to a text file
with open('./weights.txt', 'w') as file:
    for v in model.trainable_variables:
        file.write(str(v.name) + '\n')
        file.write(str(v.shape) + '\n')
        file.write(str(v.numpy()) + '\n')

###############################################    show   ###############################################

# Plot the training accuracy and loss curves (no validation set is used here)
acc = history.history['sparse_categorical_accuracy']
loss = history.history['loss']

plt.subplot(1, 2, 1)
plt.plot(acc, label='Training Accuracy')
plt.title('Training Accuracy')
plt.legend()

plt.subplot(1, 2, 2)
plt.plot(loss, label='Training Loss')
plt.title('Training Loss')
plt.legend()
plt.show()

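############### predict #############
# Interactive demo of the trained model
# First enter how many predictions to run
preNum = int(input("input the number of test alphabet:"))
for i in range(preNum):
    # Read one letter and convert it to a one-hot vector
    alphabet1 = input("input test alphabet:")
    alphabet = [id_to_onehot[w_to_id[alphabet1]]]
    # Make alphabet match the SimpleRNN input format:
    # [number of samples, number of unrolled time steps, number of input features per time step]. One sample is fed in, so the number of samples is 1; one letter produces one result, so there is 1 time step; each one-hot code has 5 features, so there are 5 input features per time step
    alphabet = np.reshape(alphabet, (1, 1, 5))
    # predict returns the class probabilities
    result = model.predict(alphabet)
    # Pick the class with the highest probability (index 0 of the batch of one)
    pred = int(tf.argmax(result, axis=1)[0])
    tf.print(alphabet1 + '->' + input_word[pred])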

2. Inputting multiple letters to predict one letter with an RNN

(1) Process

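We use the example of inputting four consecutive letters and predicting the next one to show the recurrent computation once the recurrent kernel is unrolled over time.

1) Three memory units are used, and at the initial time step the memory holds 0. Throughout this process the parameter matrices stay fixed at every time step, while the memory is updated at every time step.

2) At the first time step, the one-hot code of b, [0,1,0,0,0], is input, and the update formula refreshes the memory to [-0.9,0.2,0.2].

3) At the second time step, the one-hot code of c, [0,0,1,0,0], is input, and the update formula refreshes the memory to [0.8,1.0,0.8].

4) At the third time step, the one-hot code of d, [0,0,0,1,0], is input, and the update formula refreshes the memory to [0.6,0.5,-0.1].

[Figure: memory states at each time step]

5) At the fourth time step, the one-hot code of e, [0,0,0,0,1], is input, and the update formula refreshes the memory to [-1.0,-1.0,0.8].

6) The fully connected layer produces the output prediction. Substituting into the yt formula gives [0.71,0.14,0.10,0.05,0.00], so the most likely next letter is a.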
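The four-step process above can be sketched as a loop in NumPy. The weights here are random (an assumption), so the memory states and the output will not match the numbers quoted in steps 2) to 6), which depend on the article's particular parameter values.

What carries over is the structure. The memory starts at 0 and is refreshed once per letter with the same matrices. The output layer runs only after the last letter.

```python
import numpy as np

w_to_id = {'a': 0, 'b': 1, 'c': 2, 'd': 3, 'e': 4}
rng = np.random.default_rng(0)
W_xh = rng.normal(size=(5, 3))  # assumed random weights, not trained ones
W_hh = rng.normal(size=(3, 3))
W_hy = rng.normal(size=(3, 5))
b_h, b_y = np.zeros(3), np.zeros(5)

h = np.zeros(3)  # step 1): the memory holds 0 before the first letter
states = []
for letter in "bcde":  # steps 2)-5): one letter per time step
    x_t = np.eye(5)[w_to_id[letter]]
    h = np.tanh(x_t @ W_xh + h @ W_hh + b_h)  # same matrices at every step
    states.append(h)
    print(letter, h.round(1))

z = h @ W_hy + b_y  # step 6): fully connected output after the last letter
e = np.exp(z - z.max())
y = e / e.sum()
print(y.round(2))
```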

(2) Complete code

import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import Dense, SimpleRNN
import matplotlib.pyplot as plt
import os

# The letters used are abcde
input_word = "abcde"
# To feed the letters into the network, represent them as 0-4
w_to_id = {'a': 0, 'b': 1, 'c': 2, 'd': 3, 'e': 4}  # dictionary mapping letters to numeric ids
# Encode 0-4 as one-hot vectors
id_to_onehot = {0: [1., 0., 0., 0., 0.], 1: [0., 1., 0., 0., 0.], 2: [0., 0., 1., 0., 0.], 3: [0., 0., 0., 1., 0.],
                4: [0., 0., 0., 0., 1.]}  # id -> one-hot

# Build the training features x_train and labels y_train (each 4-letter window is labeled with the letter that follows it)
x_train = [
    [id_to_onehot[w_to_id['a']], id_to_onehot[w_to_id['b']], id_to_onehot[w_to_id['c']], id_to_onehot[w_to_id['d']]],
    [id_to_onehot[w_to_id['b']], id_to_onehot[w_to_id['c']], id_to_onehot[w_to_id['d']], id_to_onehot[w_to_id['e']]],
    [id_to_onehot[w_to_id['c']], id_to_onehot[w_to_id['d']], id_to_onehot[w_to_id['e']], id_to_onehot[w_to_id['a']]],
    [id_to_onehot[w_to_id['d']], id_to_onehot[w_to_id['e']], id_to_onehot[w_to_id['a']], id_to_onehot[w_to_id['b']]],
    [id_to_onehot[w_to_id['e']], id_to_onehot[w_to_id['a']], id_to_onehot[w_to_id['b']], id_to_onehot[w_to_id['c']]],
]
y_train = [w_to_id['e'], w_to_id['a'], w_to_id['b'], w_to_id['c'], w_to_id['d']]

# Shuffle the training set; reseeding before each shuffle keeps x_train and y_train aligned
np.random.seed(7)
np.random.shuffle(x_train)
np.random.seed(7)
np.random.shuffle(y_train)
tf.random.set_seed(7)

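# Reshape the input features to the shape the RNN layer expects.
# SimpleRNN requires x_train in the form [number of samples, number of unrolled time steps, number of input features per time step].
# The whole dataset is fed in, so the number of samples is len(x_train); 4 letters produce one result, so the kernel is unrolled for 4 time steps; each one-hot code has 5 features, so there are 5 input features per time step
x_train = np.reshape(x_train, (len(x_train), 4, 5))
# Convert y_train to a numpy array
y_train = np.array(y_train)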

# Recurrent layer with 3 memory units (3 is an arbitrary choice)
model = tf.keras.Sequential([
    SimpleRNN(3),
    # Fully connected layer that computes the output yt
    Dense(5, activation='softmax')
])

# Configure the training parameters
model.compile(optimizer=tf.keras.optimizers.Adam(0.01),
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False),
              metrics=['sparse_categorical_accuracy'])

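# Note: in TF 2.16+ (Keras 3), ModelCheckpoint with save_weights_only=True requires a path ending in '.weights.h5'; this '.ckpt' path targets TF 2.x with Keras 2
checkpoint_save_path = "./checkpoint/rnn_onehot_4pre1.ckpt"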

if os.path.exists(checkpoint_save_path + '.index'):
    print('-------------load the model-----------------')
    model.load_weights(checkpoint_save_path)

cp_callback = tf.keras.callbacks.ModelCheckpoint(filepath=checkpoint_save_path,
                                                 save_weights_only=True,
                                                 save_best_only=True,
                                                 monitor='loss')  # fit gets no validation data, so the best model is selected by training loss

# Run training
history = model.fit(x_train, y_train, batch_size=32, epochs=100, callbacks=[cp_callback])

model.summary()

# print(model.trainable_variables)
# Dump the trained parameters (name, shape, values) to a text file
with open('./weights.txt', 'w') as file:
    for v in model.trainable_variables:
        file.write(str(v.name) + '\n')
        file.write(str(v.shape) + '\n')
        file.write(str(v.numpy()) + '\n')

###############################################    show   ###############################################

# Plot the training accuracy and loss curves (no validation set is used here)
acc = history.history['sparse_categorical_accuracy']
loss = history.history['loss']

plt.subplot(1, 2, 1)
plt.plot(acc, label='Training Accuracy')
plt.title('Training Accuracy')
plt.legend()

plt.subplot(1, 2, 2)
plt.plot(loss, label='Training Loss')
plt.title('Training Loss')
plt.legend()
plt.show()

############### predict #############

preNum = int(input("input the number of test alphabet:"))
for i in range(preNum):
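    # Read exactly 4 letters from abcde (e.g. "bcde"), otherwise the reshape below fails
    alphabet1 = input("input test alphabet:")
    alphabet = [id_to_onehot[w_to_id[a]] for a in alphabet1]
    # Make alphabet match the SimpleRNN input format: [number of samples, number of unrolled time steps, number of input features per time step].
    # One sample is fed in, so the number of samples is 1; 4 letters produce one result, so there are 4 time steps; each one-hot code has 5 features, so there are 5 input features per time step
    alphabet = np.reshape(alphabet, (1, 4, 5))
    result = model.predict(alphabet)
    # Pick the class with the highest probability (index 0 of the batch of one)
    pred = int(tf.argmax(result, axis=1)[0])
    tf.print(alphabet1 + '->' + input_word[pred])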
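The prediction loop above assumes the user types exactly four letters from abcde. Any other input fails at w_to_id[a] or np.reshape with a less obvious error. A small validation helper could be used instead. encode_word is a hypothetical name, not part of the original code.

```python
import numpy as np

# Same mappings as in the complete code above
w_to_id = {'a': 0, 'b': 1, 'c': 2, 'd': 3, 'e': 4}
id_to_onehot = {i: np.eye(5)[i].tolist() for i in range(5)}

def encode_word(word, steps=4):
    """Hypothetical helper: turn `steps` letters into a (1, steps, 5) SimpleRNN input."""
    if len(word) != steps:
        raise ValueError(f"expected {steps} letters, got {len(word)}: {word!r}")
    unknown = [c for c in word if c not in w_to_id]
    if unknown:
        raise ValueError(f"letters outside 'abcde': {unknown}")
    return np.reshape([id_to_onehot[w_to_id[c]] for c in word], (1, steps, 5))

x = encode_word("bcde")
print(x.shape)  # (1, 4, 5)
```

With this helper, the loop body would call result = model.predict(encode_word(alphabet1)) and catch ValueError to re-prompt.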

VI. Addendum

1. AttributeError: 'tuple' object has no attribute 'shape'

The original prediction code looked like this:

result = model.predict([alphabet])

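Training ran fine, but prediction always failed at the line above with AttributeError: 'tuple' object has no attribute 'shape'. I didn't really understand the message. While searching, I found that you can pass the samples to predict directly, so I tried removing the [] and that fixed it, haha. Presumably Keras treats a Python list passed to predict as a structure of several inputs rather than as one array, so .shape ends up being called on a tuple. Passing the NumPy array itself avoids this.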
After the change:

result = model.predict(alphabet)