The Evidence Framework applied to Classification Networks
Neural Computation, 1992
Cited by 152 (10 self)
Three Bayesian ideas are presented for supervised adaptive classifiers. First, it is argued that the output of a classifier should be obtained by marginalising over the posterior distribution of the parameters; a simple approximation to this integral is proposed and demonstrated. This involves a 'moderation' of the most probable classifier's outputs, and yields improved performance. Second, it is demonstrated that the Bayesian framework for model comparison described for regression models in (MacKay, 1992a, 1992b) can also be applied to classification problems. This framework successfully chooses the magnitude of weight decay terms, and ranks solutions found using different numbers of hidden units. Third, an information-based data selection criterion is derived and demonstrated within this framework. 1 Introduction A quantitative Bayesian framework has been described for learning of mappings in feedforward networks (MacKay, 1992a, 1992b). It was demonstrated that this 'evidence' fram...
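The 'moderation' idea in this abstract can be illustrated with a small sketch. It assumes a single sigmoid output whose activation has a Gaussian posterior with mean a_mp and variance s2, and uses the widely quoted closed-form approximation sigmoid(kappa * a_mp) with kappa = 1 / sqrt(1 + pi * s2 / 8). The function names and the Monte Carlo check are illustrative choices, not code from the paper.

```python
import math
import random


def sigmoid(a):
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-a))


def moderated_output(a_mp, s2):
    """Approximate the posterior-averaged sigmoid output.

    a_mp: activation at the most probable weights (assumed given).
    s2:   variance of the activation under a Gaussian posterior (assumed given).
    """
    kappa = 1.0 / math.sqrt(1.0 + math.pi * s2 / 8.0)
    return sigmoid(kappa * a_mp)


def monte_carlo_output(a_mp, s2, draws=20000, seed=0):
    """Brute-force estimate of the same average, for comparison only."""
    rng = random.Random(seed)
    sd = math.sqrt(s2)
    return sum(sigmoid(rng.gauss(a_mp, sd)) for _ in range(draws)) / draws


if __name__ == "__main__":
    a_mp, s2 = 2.0, 4.0
    print("unmoderated:", sigmoid(a_mp))
    print("moderated:  ", moderated_output(a_mp, s2))
    print("monte carlo:", monte_carlo_output(a_mp, s2))
```

With zero posterior variance the moderated output equals the plain sigmoid; as the variance grows, the prediction is pulled towards 0.5.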
Efficient BackProp
1998
Cited by 125 (24 self)
The convergence of backpropagation learning is analyzed so as to explain common phenomena observed by practitioners. Many undesirable behaviors of backprop can be avoided with tricks that are rarely exposed in serious technical publications. This paper gives some of those tricks, and offers explanations of why they work. Many authors have suggested that second-order optimization methods are advantageous for neural net training. It is shown that most "classical" second-order methods are impractical for large neural networks. A few methods are proposed that do not have these limitations. 1 Introduction Backpropagation is a very popular neural network learning algorithm because it is conceptually simple, computationally efficient, and because it often works. However, getting it to work well, and sometimes to work at all, can seem more of an art than a science. Designing and training a network using backprop requires making many seemingly arbitrary choices such as the number ...
Training with Noise is Equivalent to Tikhonov Regularization
Neural Computation, 1994
Cited by 113 (0 self)
It is well known that the addition of noise to the input data of a neural network during training can, in some circumstances, lead to significant improvements in generalization performance. Previous work has shown that such training with noise is equivalent to a form of regularization in which an extra term is added to the error function. However, the regularization term, which involves second derivatives of the error function, is not bounded below, and so can lead to difficulties if used directly in a learning algorithm based on error minimization. In this paper we show that, for the purposes of network training, the regularization term can be reduced to a positive definite form which involves only first derivatives of the network mapping. For a sum-of-squares error function, the regularization term belongs to the class of generalized Tikhonov regularizers. Direct minimization of the regularized error function provides a practical alternative to training with noise. 1 Regularization...
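The equivalence described in this abstract can be checked numerically in the simplest possible setting. The sketch below assumes a one-weight linear model y = w * x with a sum-of-squares error, for which the first-derivative penalty is (noise_var / 2) * (dy/dx)^2 = (noise_var / 2) * w^2 and the match is exact rather than a small-noise approximation. The data, function names, and noise level are made up for illustration.

```python
import random


def clean_error(w, xs, ts):
    """Mean sum-of-squares error (with the usual factor 1/2) on noise-free inputs."""
    return sum((w * x - t) ** 2 for x, t in zip(xs, ts)) / (2 * len(xs))


def tikhonov_error(w, xs, ts, noise_var):
    """Clean error plus a first-derivative penalty; here dy/dx = w."""
    return clean_error(w, xs, ts) + 0.5 * noise_var * w * w


def noisy_error(w, xs, ts, noise_var, draws=2000, seed=0):
    """Average error when Gaussian noise is added to every input (Monte Carlo)."""
    rng = random.Random(seed)
    sd = noise_var ** 0.5
    total = 0.0
    for _ in range(draws):
        total += sum((w * (x + rng.gauss(0.0, sd)) - t) ** 2
                     for x, t in zip(xs, ts)) / (2 * len(xs))
    return total / draws


if __name__ == "__main__":
    xs = [-1.0, 0.0, 1.0, 2.0]      # hypothetical inputs
    ts = [-2.1, 0.1, 1.9, 4.2]      # hypothetical targets
    w, noise_var = 1.5, 0.25
    print("training with noise:", noisy_error(w, xs, ts, noise_var))
    print("regularized error:  ", tikhonov_error(w, xs, ts, noise_var))
```

For a nonlinear network the penalty would involve the derivatives of the outputs with respect to the inputs at each training point, and the agreement would hold only to leading order in the noise variance.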
Fast Exact Multiplication by the Hessian
Neural Computation, 1994
Cited by 70 (4 self)
Just storing the Hessian $H$ (the matrix of second derivatives $\partial^2 E / \partial w_i \partial w_j$ of the error $E$ with respect to each pair of weights) of a large neural network is difficult. Since a common use of a large matrix like $H$ is to compute its product with various vectors, we derive a technique that directly calculates $Hv$, where $v$ is an arbitrary vector. This allows $H$ to be treated as a generalized sparse matrix. To calculate $Hv$, we first define a differential operator $\mathcal{R}\{f(w)\} = (\partial/\partial r) f(w + rv)|_{r=0}$, note that $\mathcal{R}\{\nabla_w\} = Hv$ and $\mathcal{R}\{w\} = v$, and then apply $\mathcal{R}\{\cdot\}$ to the equations used to compute $\nabla_w$. The result is an exact and numerically stable procedure for computing $Hv$, which takes about as much computation, and is about as local, as a gradient evaluation. We then apply the technique to backpropagation networks, recurrent backpropagation, and stochastic Boltzmann Machines. Finally, we show that this technique can be used at the heart of many iterative techniques for computing various properties of $H$, obviating the need for direct methods.
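As a minimal sketch of the R-operator recipe in this abstract, the code below applies it by hand to a deliberately tiny model: a single logistic unit with squared error on one training pair. The model, the variable names, and the finite-difference check are assumptions made for illustration; the paper itself treats full backpropagation networks.

```python
import math


def forward(w, x):
    """Output y = sigmoid(w . x) of a single logistic unit."""
    a = sum(wi * xi for wi, xi in zip(w, x))
    return 1.0 / (1.0 + math.exp(-a))


def gradient(w, x, t):
    """Gradient of E = 0.5 * (y - t)^2 with respect to w."""
    y = forward(w, x)
    delta = (y - t) * y * (1.0 - y)
    return [delta * xi for xi in x]


def hessian_vector(w, v, x, t):
    """Exact H v, obtained by applying R{} to each line of the gradient code."""
    y = forward(w, x)
    r_a = sum(vi * xi for vi, xi in zip(v, x))   # R{a} = v . x, because R{w} = v
    r_y = y * (1.0 - y) * r_a                    # R{y} = sigmoid'(a) * R{a}
    # delta = (y - t) * y * (1 - y); differentiate it with respect to y
    r_delta = (y * (1.0 - y) + (y - t) * (1.0 - 2.0 * y)) * r_y
    return [r_delta * xi for xi in x]            # R{gradient} = H v


def hessian_vector_fd(w, v, x, t, r=1e-5):
    """Central-difference approximation of H v, used only as a check."""
    w_plus = [wi + r * vi for wi, vi in zip(w, v)]
    w_minus = [wi - r * vi for wi, vi in zip(w, v)]
    g_plus, g_minus = gradient(w_plus, x, t), gradient(w_minus, x, t)
    return [(gp - gm) / (2.0 * r) for gp, gm in zip(g_plus, g_minus)]


if __name__ == "__main__":
    w, v = [0.3, -0.8, 0.5], [1.0, 2.0, -1.0]
    x, t = [0.5, -1.2, 2.0], 1.0
    print(hessian_vector(w, v, x, t))
    print(hessian_vector_fd(w, v, x, t))
```

The exact product costs about the same as one extra gradient pass and never forms the full matrix, which is the point the abstract makes.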
Curvature-driven smoothing: a learning algorithm for feedforward networks
IEEE Transactions on Neural Networks, 1993
Cited by 39 (0 self)
The performance of feedforward neural networks in real applications can often be improved significantly if use is made of a priori information. For interpolation problems this prior knowledge frequently includes smoothness requirements on the network mapping, and can be imposed by the addition to the error function of suitable regularization terms. The new error function, however, now depends on the derivatives of the network mapping, and so the standard backpropagation algorithm cannot be applied. In this letter, we derive a computationally efficient learning algorithm, for a feedforward network of arbitrary topology, which can be used to minimize such error functions. Networks having a single hidden layer, for which the learning algorithm simplifies, are treated as a special case. ... a cross-validation data set, or by a variety of techniques based on the statistical properties of the training data [2]. The formalism of (2) and (3) has also been used in the training of radial basis function ...
A Review of Bayesian Neural Networks with an Application to Near Infrared Spectroscopy
IEEE Transactions on Neural Networks, 1995
Cited by 36 (0 self)
MacKay's Bayesian framework for backpropagation is a practical and powerful means to improve the generalisation ability of neural networks. It is based on a Gaussian approximation to the posterior weight distribution. The framework is extended, reviewed and demonstrated in a pedagogical way. The notation is simplified using the ordinary weight decay parameter, and a detailed and explicit procedure for adjusting several weight decay parameters is given. Bayesian backprop is applied in the prediction of fat content in minced meat from near infrared spectra. It outperforms "early stopping" as well as quadratic regression. The evidence of a committee of differently trained networks is computed, and the corresponding improved generalisation is verified. The error bars on the predictions of the fat content are computed. There are three contributors: The random noise, the uncertainty in the weights, and the deviation among the committee members. The Bayesian framework is compare...
Computing Second Derivatives in Feed-Forward Networks: a Review
IEEE Transactions on Neural Networks, 1994
Cited by 27 (4 self)
The calculation of second derivatives is required by recent training and analysis techniques of connectionist networks, such as the elimination of superfluous weights, and the estimation of confidence intervals both for weights and network outputs. We here review and develop exact and approximate algorithms for calculating second derivatives. For networks with |w| weights, simply writing the full matrix of second derivatives requires O(|w|^2) operations. For networks of radial basis units or sigmoid units, exact calculation of the necessary intermediate terms requires of the order of 2h + 2 backward/forward-propagation passes where h is the number of hidden units in the network. We also review and compare three approximations (ignoring some components of the second derivative, numerical differentiation, and scoring). Our algorithms apply to arbitrary activation functions, networks, and error functions (for instance, with connections that skip layers, or radial basis functions, or ...
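Of the three approximations this abstract lists, numerical differentiation is the easiest to sketch. The example below builds a full matrix of second derivatives by central differences of a gradient function, one column per weight, which makes the O(|w|^2) storage cost visible. The two-parameter test function and all names are hypothetical stand-ins for a network error function, chosen because its exact Hessian is known.

```python
import math


def gradient(w):
    """Gradient of the toy function E(w) = w0^2 * w1 + sin(w1)."""
    w0, w1 = w
    return [2.0 * w0 * w1, w0 * w0 + math.cos(w1)]


def exact_hessian(w):
    """Analytic second derivatives of the same toy function."""
    w0, w1 = w
    return [[2.0 * w1, 2.0 * w0],
            [2.0 * w0, -math.sin(w1)]]


def numerical_hessian(grad_fn, w, step=1e-5):
    """Approximate the Hessian column by column.

    Column j is the central difference of the gradient along weight j,
    so the whole matrix needs 2 * len(w) gradient evaluations.
    """
    n = len(w)
    hessian = [[0.0] * n for _ in range(n)]
    for j in range(n):
        w_plus, w_minus = list(w), list(w)
        w_plus[j] += step
        w_minus[j] -= step
        g_plus, g_minus = grad_fn(w_plus), grad_fn(w_minus)
        for i in range(n):
            hessian[i][j] = (g_plus[i] - g_minus[i]) / (2.0 * step)
    return hessian


if __name__ == "__main__":
    w = [0.7, -1.3]
    print(numerical_hessian(gradient, w))
    print(exact_hessian(w))
```

The step size trades truncation error against round-off, so in practice it would need tuning to the scale of the weights.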
Location-Aware Computing: A Neural Network Model For Determining Location In Wireless LANs
2002
Cited by 22 (1 self)
The strengths of the RF signals arriving from several access points in a wireless LAN are related to the position of the mobile terminal and can be used to derive the location of the user. In a ...
Improved Learning Algorithms for Mixture of Experts in Multiclass Classification
1999
Cited by 18 (3 self)
Mixture of experts (ME) is a modular neural network architecture for supervised learning. A double-loop Expectation-Maximization (EM) algorithm has been introduced to the ME architecture for adjusting the parameters, and the iteratively reweighted least squares (IRLS) algorithm is used to perform maximization in the inner loop [Jordan, M.I., Jacobs, R.A. (1994). Hierarchical mixture of experts and the EM algorithm, Neural Computation, 6(2), 181-214]. However, it is reported in the literature that the IRLS algorithm is unstable and that the ME architecture trained by the EM algorithm, where the IRLS algorithm is used in the inner loop, often produces poor performance in multiclass classification. In this paper, the reason for this instability is explored. We find that, due to an implicitly imposed incorrect assumption on parameter independence in multiclass classification, an incomplete Hessian matrix is used in that IRLS algorithm. Based on this finding, we apply the Newton-Raphson met...
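One way to see why parameter independence fails in the multiclass case is to look at the softmax function itself. The sketch below assumes a softmax output layer (an assumption about the setting, since the abstract is truncated) and computes the Jacobian dy_i/da_j = y_i * (delta_ij - y_j). Its off-diagonal entries are nonzero, so the outputs for different classes are coupled, and a Hessian that keeps only the per-class diagonal blocks drops those cross terms. The function names are illustrative.

```python
import math


def softmax(a):
    """Numerically stable softmax of a list of activations."""
    m = max(a)
    exps = [math.exp(ai - m) for ai in a]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_jacobian(a):
    """Matrix of dy_i/da_j = y_i * (delta_ij - y_j)."""
    y = softmax(a)
    k = len(y)
    return [[y[i] * ((1.0 if i == j else 0.0) - y[j]) for j in range(k)]
            for i in range(k)]


if __name__ == "__main__":
    for row in softmax_jacobian([1.0, 0.2, -0.5]):
        print(["%.4f" % entry for entry in row])
```

Every row sums to zero because the outputs are constrained to sum to one, which is another way of stating that the class activations cannot be treated independently.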
Ace of Bayes: Application of Neural Networks with Pruning
The Danish Meat Research Institute, Maglegaardsvej 2, DK-4000, 1993
Cited by 14 (0 self)
MacKay's Bayesian framework for backpropagation is a practical and powerful means of improving the generalisation ability of neural networks. The framework is reviewed and extended in a pedagogical way. The notation is simplified using the ordinary weight decay parameter, and the noise parameter β is shown to be nothing more than an overall scale. A detailed and explicit procedure for adjusting several weight decay parameters is given. Pruning is incorporated into the Bayesian framework. Appropriate symmetry factors on sparse architectures are deduced. Bayesian weight decay is demonstrated using artificial data generated by a sparsely connected network. Pruning yields computational advantages: by removing unimportant weights the posterior weight distribution becomes Gaussian, and pruning removes zero-modes of the Hessian and redundant hidden units. In addition, pruning improves generalisation. The Bayesian evidence is used as a stop criterion for pruning. Bayesian backprop is applied ...