Results 1 
4 of
4
Information Geometry of the EM and em Algorithms for Neural Networks
 Neural Networks
, 1995
"... In order to realize an inputoutput relation given by noisecontaminated examples, it is effective to use a stochastic model of neural networks. A model network includes hidden units whose activation values are not specified nor observed. It is useful to estimate the hidden variables from the obs ..."
Abstract

Cited by 110 (9 self)
 Add to MetaCart
(Show Context)
In order to realize an inputoutput relation given by noisecontaminated examples, it is effective to use a stochastic model of neural networks. A model network includes hidden units whose activation values are not specified nor observed. It is useful to estimate the hidden variables from the observed or specified inputoutput data based on the stochastic model. Two algorithms, the EM  and emalgorithms, have so far been proposed for this purpose. The EMalgorithm is an iterative statistical technique of using the conditional expectation, and the emalgorithm is a geometrical one given by information geometry. The emalgorithm minimizes iteratively the KullbackLeibler divergence in the manifold of neural networks. These two algorithms are equivalent in most cases. The present paper gives a unified information geometrical framework for studying stochastic models of neural networks, by forcussing on the EM and em algorithms, and proves a condition which guarantees their equ...
Regression Modeling in BackPropagation and Projection Pursuit Learning
, 1994
"... We studied and compared two types of connectionist learning methods for modelfree regression problems in this paper. One is the popular backpropagation learning (BPL) well known in the artificial neural networks literature; the other is the projection pursuit learning (PPL) emerged in recent years ..."
Abstract

Cited by 70 (1 self)
 Add to MetaCart
We studied and compared two types of connectionist learning methods for modelfree regression problems in this paper. One is the popular backpropagation learning (BPL) well known in the artificial neural networks literature; the other is the projection pursuit learning (PPL) emerged in recent years in the statistical estimation literature. Both the BPL and the PPL are based on projections of the data in directions determined from interconnection weights. However, unlike the use of fixed nonlinear activations (usually sigmoidal) for the hidden neurons in BPL, the PPL systematically approximates the unknown nonlinear activations. Moreover, the BPL estimates all the weights simultaneously at each iteration, while the PPL estimates the weights cyclically (neuronbyneuron and layerbylayer) at each iteration. Although the BPL and the PPL have comparable training speed when based on a GaussNewton optimization algorithm, the PPL proves more parsimonious in that the PPL requires a fewer hi...
What's Wrong with A Cascaded Correlation Learning Network: A Projection Pursuit Learning Perspective
"... Cascaded correlation is a popular supervised learning architecture that dynamically grows layers of hidden neurons of fixed nonlinear activations (e.g., sigmoids), so that the network topology (size, depth) can be efficiently determined. Similar to a cascaded correlation learning network (CCLN), a p ..."
Abstract

Cited by 7 (0 self)
 Add to MetaCart
(Show Context)
Cascaded correlation is a popular supervised learning architecture that dynamically grows layers of hidden neurons of fixed nonlinear activations (e.g., sigmoids), so that the network topology (size, depth) can be efficiently determined. Similar to a cascaded correlation learning network (CCLN), a projection pursuit learning network (PPLN) also dynamically grows the hidden neurons. Unlike a CCLN where cascaded connections from the existing hidden units to the new candidate hidden unit are required to establish highorder nonlinearity in approximating the residual error, a PPLN approximates the highorder nonlinearity by using (more flexible) trainable nonlinear nodal activation functions. Moreover, the maximum correlation training criterion used in a CCLN results in a poorer estimate of hidden weights when compared with the minimum mean squared error criterion used in a PPLN. The CCLN is thus excluded for most regression applications where smooth interpolation of functional values are ...