Generalisation in Neural Networks


This post summarises a thesis from Delft University of Technology (TU Delft), the Netherlands, by Aarnoud Hoekstra; 136 pages in total.

In pattern recognition practice, the use of neural networks, also referred to as neural classifiers, has become commonplace. Compared with classical algorithms such as the nearest-neighbour method, neural networks are considered very powerful classifiers. The algorithms used in neural network applications are able to find a good classifier from a limited, and in general small, number of training examples. From a pattern recognition point of view this capability, also known as generalisation, is of interest because a large set of parameters is estimated from a relatively small data set.

This thesis studies the generalisation behaviour of neural networks. In particular, it answers the questions of how this behaviour can be detected and which factors influence it. Answering these questions requires a proper understanding of the concept of generalisation, so that the results obtained with the introduced techniques can be compared. An operational definition of generalisation is therefore introduced: the expected number of errors a classifier makes on a set of test samples (a minimal sketch of this estimate follows below).
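This operational definition translates directly into a held-out error count. Below is a minimal sketch, assuming nothing from the thesis beyond the definition itself; the toy nearest-mean classifier is included only so the example runs end to end.

```python
import numpy as np

class NearestMeanClassifier:
    """Toy classical classifier, included only to make the sketch runnable."""
    def fit(self, X, y):
        self.labels = np.unique(y)
        self.means = np.array([X[y == c].mean(axis=0) for c in self.labels])
        return self

    def predict(self, X):
        # Assign each sample to the class with the nearest class mean.
        d = ((X[:, None, :] - self.means[None, :, :]) ** 2).sum(axis=2)
        return self.labels[np.argmin(d, axis=1)]

def generalisation_error(classifier, X_test, y_test):
    """The operational definition used in the thesis: the number of
    errors on a test set, reported here as an error fraction."""
    return float(np.mean(classifier.predict(X_test) != y_test))

# Usage: clf = NearestMeanClassifier().fit(X_train, y_train)
#        err = generalisation_error(clf, X_test, y_test)
```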

The set of neural classifiers studied in this thesis is restricted to backpropagation-trained classifiers, alongside classical algorithms such as the k-nearest-neighbour method and linear and quadratic classifiers. The first objective is to gain more insight into the behaviour of a neural network, using a measure that applies to both neural and classical classifiers. A nonlinearity measure provides this insight: it shows to what extent the network has adapted to the data set. The better it adapts to the data, the smaller its generalisation error; too much adaptation, however, implies high nonlinearity and results in a non-generalising classifier. This can be avoided by monitoring the value of the nonlinearity measure during training. The measure is defined so that it also applies to non-neural classifiers, which makes it possible to compare classical classifiers and neural networks. The results show that neural networks start from a linear solution and gradually adapt in a nonlinear fashion. This adaptation is even stronger for a neural classifier than for a classical one, resulting in larger nonlinearity; neural networks therefore appear to have a larger effective capacity than classical classifiers (an illustrative proxy for such a measure is sketched below). Another factor that influences generalisation behaviour is the architecture of the network.
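The abstract does not give the exact form of the nonlinearity measure, so the sketch below is only an illustrative stand-in built on one assumption: that nonlinearity can be read off as the fraction of samples on which a classifier's decisions deviate from the best linear solution on the same data. Logged once per training epoch, such a value would start near zero (the linear regime the networks are observed to start from) and grow as the network adapts nonlinearly.

```python
import numpy as np

def linear_reference(X, y):
    """Least-squares linear classifier on +/-1 targets (binary labels in {0, 1});
    the bias term is handled by augmenting the inputs with a constant 1."""
    Xa = np.hstack([X, np.ones((len(X), 1))])
    t = np.where(y == 1, 1.0, -1.0)
    w, *_ = np.linalg.lstsq(Xa, t, rcond=None)
    return lambda Z: (np.hstack([Z, np.ones((len(Z), 1))]) @ w > 0).astype(int)

def nonlinearity(classifier, X, y):
    """Illustrative proxy, not the thesis's exact measure: the fraction of
    samples where the classifier disagrees with the best linear fit."""
    lin = linear_reference(X, y)
    return float(np.mean(classifier.predict(X) != lin(X)))
```

Because the proxy only needs a `predict` method, it applies to the k-nearest-neighbour, linear, quadratic and neural classifiers alike, which is the property the thesis exploits for its comparison.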

In this thesis the architecture is determined by the number of hidden units. If the number of hidden units is larger than needed, but not too large, the network learns faster. This is due to the redundancy present in the network: redundancy causes the neurons to cluster. At the start of a learning cycle the neurons fulfil approximately the same function; as training progresses, they start to specialise. This specialisation, known as symmetry breaking, can be visualised with a projection technique that maps the high-dimensional space in which the hidden units move onto a two-dimensional plane. In this plane the trajectories of the hidden units during training can be depicted in order to understand the training behaviour (see the projection sketch below).

Finally, the reliability of a neural network's classifications is studied. After the training cycle, one is interested in how reliable the classifications made by the network are. This is assessed by estimating the a posteriori probabilities of the classifications. These probabilities can be used to improve reliability, either by building better networks or by rejecting samples. The probabilities can be estimated with confidence-value estimators, determined using three techniques: the network outputs, the nearest-neighbour method and the logistic estimator. With these estimators the reliability of a network classification is checked. A problem, however, is that the estimators require an independent test set, and such a set can only be used once, to prevent obtaining biased classifiers. This problem is circumvented by introducing a k-nearest-neighbour data generation method: a new set is generated from the learning set. This new set, referred to as the validation set, can act as a substitute for the test set (a data-generation sketch follows below).

After applying the techniques presented in this thesis, it can be concluded that we have gained more insight into the generalisation behaviour of neural classifiers. The nonlinearity measure is of particular interest, since it enables an objective comparison of neural and non-neural classifiers.
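The projection technique amounts to dimensionality reduction applied to the hidden-unit weight vectors logged during training. The sketch below assumes PCA via an SVD as the mapping, which is my assumption rather than the thesis's stated choice; plotting each unit's 2-D trajectory then shows initially clustered units drifting apart as symmetry breaks.

```python
import numpy as np

def project_trajectories(weight_history):
    """weight_history: shape (epochs, hidden_units, input_dim), the incoming
    weight vector of every hidden unit, logged once per epoch.
    Returns shape (epochs, hidden_units, 2): all logged vectors projected
    onto the two principal axes (PCA assumed as the projection here)."""
    E, H, D = weight_history.shape
    flat = weight_history.reshape(E * H, D)
    centred = flat - flat.mean(axis=0)                  # centre before PCA
    _, _, Vt = np.linalg.svd(centred, full_matrices=False)
    return (centred @ Vt[:2].T).reshape(E, H, 2)

# Plotting trajectories[:, h, 0] against trajectories[:, h, 1] for each
# hidden unit h traces that unit's path in the plane across training.
```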

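The abstract names the k-nearest-neighbour data generation method but not its generation rule, so the following is a hypothetical sketch under an assumed rule: each validation sample is an interpolation between a learning sample and a randomly chosen one of its k nearest same-class neighbours. The names `knn_generate`, `k` and the interpolation factor are mine, not the thesis's.

```python
import numpy as np

def knn_generate(X, y, k=3, seed=None):
    """Hypothetical sketch of k-NN data generation: for every learning sample,
    interpolate towards one of its k nearest same-class neighbours, producing
    a validation set that stands in for an independent test set.
    Assumes every class has more than k samples."""
    rng = np.random.default_rng(seed)
    X_new = np.empty_like(X)
    for i, x in enumerate(X):
        same = np.where(y == y[i])[0]
        same = same[same != i]                      # exclude the sample itself
        d = np.linalg.norm(X[same] - x, axis=1)     # distances to same-class points
        nn = same[np.argsort(d)[:k]]                # the k nearest of them
        j = rng.choice(nn)                          # pick one neighbour at random
        X_new[i] = x + rng.random() * (X[j] - x)    # interpolate between the pair
    return X_new, y.copy()
```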

  1. Introduction
  2. Generalisation
  3. Classifier nonlinearity
  4. Classifier redundancy
  5. Classification reliability
  6. Digit recognition
  7. Conclusions