Other than Steps 3 and 4, in which the output delta propagates backward to obtain the hidden node delta, this process is basically the same as that of the delta rule, which was previously discussed.
Although this example has only one hidden layer, the back-propagation algorithm is applicable for training many hidden layers.
Just repeat Step 3 of the previous algorithm for each hidden layer.
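As an illustration, here is a minimal MATLAB sketch of that repetition for a network with two hidden layers; the layer sizes, weights, and input values are placeholders chosen for this example, not taken from the text:

Sigmoid = @(v) 1 ./ (1 + exp(-v));   % logistic sigmoid, element-wise
x  = [1; 0; 1];                      % placeholder input
W1 = 2*rand(4, 3) - 1;               % input -> first hidden layer (4 nodes)
W2 = 2*rand(3, 4) - 1;               % first -> second hidden layer (3 nodes)
W3 = 2*rand(1, 3) - 1;               % second hidden layer -> output node
y1 = Sigmoid(W1 * x);                % forward pass
y2 = Sigmoid(W2 * y1);
y  = Sigmoid(W3 * y2);
d  = 1;                              % placeholder correct output
delta  = y  .* (1 - y)  .* (d - y);  % output-node delta
e2     = W3' * delta;                % propagate error to second hidden layer
delta2 = y2 .* (1 - y2) .* e2;       % Step 3 for the second hidden layer
e1     = W2' * delta2;               % propagate error to first hidden layer
delta1 = y1 .* (1 - y1) .* e1;       % Step 3 repeated for the first hidden layer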
Example: Back-Propagation
In this section, we implement the back-propagation algorithm.
The training data contains four elements, as shown in the following table.
Of course, as this is about supervised learning, the data includes input and correct output pairs.
The bolded rightmost number of the data is the correct output.
As you may have noticed, this data is the same one that we used in Chapter 2 for the training of the single-layer neural network; the one that the single-layer neural network had failed to learn.
Ignoring the third value, the Z-axis, of the input, this dataset actually provides the XOR logic operation (that is, the output is 0 when x and y are equal and 1 when they differ).
Therefore, if we train the neural network with this dataset, we would get the XOR operation model.
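For reference, the four training elements can be written directly in MATLAB. The XOR relation between the first two inputs and the output follows the text; the constant third value of 1 is an assumption here, since the text says only that this value is ignored:

X = [ 0 0 1;                         % input data: one element per row
      0 1 1;                         % (third column is the ignored Z-axis
      1 0 1;                         %  value; a constant 1 is assumed here)
      1 1 1 ];
D = [ 0;                             % correct outputs: XOR of the first
      1;                             % two inputs of each row
      1;
      0 ];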
Consider a neural network that consists of three input nodes and a single output node, as shown in Figure 3-8.
Figure 3-8. Neural network that consists of three input nodes and a single output node
It has one hidden layer of four nodes.
The sigmoid function is used as the activation function for the hidden nodes and the output node.
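In MATLAB this activation can be kept in a small helper function, which the sketches below assume (the name Sigmoid is our choice, not prescribed by the text):

function y = Sigmoid(x)
  % Logistic sigmoid, applied element-wise.
  y = 1 ./ (1 + exp(-x));
end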
This section employs SGD for the implementation of the back-propagation algorithm.
Of course, the batch method will work as well.
What we have to do is use the average of the weight updates, as shown in the example in the "Example: Delta Rule" section of Chapter 2.
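A minimal sketch of that batch variant, applied to the single-layer delta rule of Chapter 2, might look as follows; it assumes X, D, and the Sigmoid helper from above, and the learning rate of 0.9 is an illustrative value:

alpha = 0.9;                          % learning rate (illustrative)
W     = 2 * rand(1, 3) - 1;           % single-layer weights: 1 output, 3 inputs
N     = size(X, 1);
dWsum = zeros(size(W));               % accumulator for the weight updates
for k = 1:N
  x     = X(k, :)';                   % one training input as a column vector
  d     = D(k);                       % its correct output
  y     = Sigmoid(W * x);             % single-layer output
  delta = y .* (1 - y) .* (d - y);    % delta rule
  dWsum = dWsum + alpha * delta * x'; % accumulate this sample's update
end
W = W + dWsum / N;                    % apply the averaged update once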
Since the primary objective of this section is to understand the back-propagation algorithm, we will stick to a simpler and more intuitive method: the SGD.
XOR Problem
The function BackpropXOR, which implements the back-propagation algorithm using the SGD method, takes the network's weights and training data and returns the adjusted weights.
[W1 W2] = BackpropXOR(W1, W2, X, D)
where W1 and W2 carry the weight matrices of the respective layers.
W1 is the weight matrix between the input layer and the hidden layer, and W2 is the weight matrix between the hidden layer and the output layer.
X and D are the input and correct output of the training data, respectively.
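A minimal sketch of such a function, consistent with the interface above, is shown below. It uses the Sigmoid helper defined earlier; the learning rate of 0.9 is an assumed value, not taken from the text:

function [W1, W2] = BackpropXOR(W1, W2, X, D)
  alpha = 0.9;                       % learning rate (assumed value)
  N = size(X, 1);                    % number of training elements
  for k = 1:N                        % SGD: adjust weights after each sample
    x  = X(k, :)';                   % one input as a column vector
    d  = D(k);                       % its correct output
    y1 = Sigmoid(W1 * x);            % hidden-layer output
    y  = Sigmoid(W2 * y1);           % output-node output
    e      = d - y;                  % output error
    delta  = y .* (1 - y) .* e;      % output-node delta
    e1     = W2' * delta;            % delta propagated backward (Steps 3-4)
    delta1 = y1 .* (1 - y1) .* e1;   % hidden-node delta
    W1 = W1 + alpha * delta1 * x';   % adjust input-to-hidden weights
    W2 = W2 + alpha * delta * y1';   % adjust hidden-to-output weights
  end
end

Training could then proceed by initializing the weights randomly and calling the function repeatedly; the epoch count below is arbitrary:

W1 = 2 * rand(4, 3) - 1;             % 4 hidden nodes, 3 input nodes
W2 = 2 * rand(1, 4) - 1;             % 1 output node, 4 hidden nodes
for epoch = 1:10000
  [W1, W2] = BackpropXOR(W1, W2, X, D);
end
y = Sigmoid(W2 * Sigmoid(W1 * X'))   % trained outputs should approach D'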
(Translated from "Matlab Deep Learning" by Phil Kim.)