Results 1 - 10 of 20
Comparison of Approximate Methods for Handling Hyperparameters
 Neural Computation
Abstract

Cited by 67 (1 self)
I examine two approximate methods for computational implementation of Bayesian hierarchical models, that is, models which include unknown hyperparameters such as regularization constants and noise levels. In the 'evidence framework' the model parameters are integrated over, and the resulting evidence is maximized over the hyperparameters. The optimized
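The evidence framework this abstract describes can be made concrete for the simplest hierarchical model, Bayesian ridge regression, where the integral over the parameters is available in closed form. A minimal sketch, assuming a linear-Gaussian model with the standard hyperparameter names alpha (prior precision) and beta (noise precision), which are not taken from the paper itself:

```python
# Hedged sketch: evidence maximization for Bayesian ridge regression,
# a linear-Gaussian instance of the "evidence framework". The weights
# are integrated out exactly; the resulting log evidence is then
# maximized over the hyperparameters alpha (prior) and beta (noise).
import numpy as np

def log_evidence(X, y, alpha, beta):
    """Exact log marginal likelihood after integrating out the weights."""
    N, M = X.shape
    A = alpha * np.eye(M) + beta * X.T @ X        # posterior precision
    m = beta * np.linalg.solve(A, X.T @ y)        # posterior mean
    E = beta / 2 * np.sum((y - X @ m) ** 2) + alpha / 2 * m @ m
    sign, logdetA = np.linalg.slogdet(A)
    return (M / 2 * np.log(alpha) + N / 2 * np.log(beta)
            - E - logdetA / 2 - N / 2 * np.log(2 * np.pi))

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=50)

# Maximize the evidence over the hyperparameters by a simple grid search.
grid = [10.0 ** k for k in range(-3, 4)]
alpha, beta = max(((a, b) for a in grid for b in grid),
                  key=lambda ab: log_evidence(X, y, *ab))
print(alpha, beta)
```

With noise of standard deviation 0.1 in the synthetic data, the evidence-optimal noise precision lands at beta = 1/0.1^2 = 100, illustrating how the hyperparameters are recovered without a validation set.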
Bayesian and Regularization Methods for Hyperparameter Estimation in Image Restoration
 IEEE Trans. Image Processing
, 1999
Abstract

Cited by 65 (26 self)
In this paper, we propose the application of the hierarchical Bayesian paradigm to the image restoration problem. We derive expressions for the iterative evaluation of the two hyperparameters applying the evidence and maximum a posteriori (MAP) analysis within the hierarchical Bayesian paradigm. We show analytically that the analysis provided by the evidence approach is more realistic and appropriate than the MAP approach for the image restoration problem. We furthermore study the relationship between the evidence and an iterative approach resulting from the set theoretic regularization approach for estimating the two hyperparameters, or their ratio, defined as the regularization parameter. Finally, the proposed algorithms are tested experimentally.
The Relationship between PAC, the Statistical Physics framework, the Bayesian framework, and the VC framework
Abstract

Cited by 40 (7 self)
This paper discusses the intimate relationships between the supervised learning frameworks mentioned in the title. In particular, it shows how all those frameworks can be viewed as particular instances of a single overarching formalism. In doing this, many commonly misunderstood aspects of those frameworks are explored. In addition, the strengths and weaknesses of those frameworks are compared, and some novel frameworks are suggested (resulting, for example, in a "correction" to the familiar bias-plus-variance formula).
A Review of Bayesian Neural Networks with an Application to Near Infrared Spectroscopy
 IEEE Transactions on Neural Networks
, 1995
Abstract

Cited by 36 (0 self)
MacKay's Bayesian framework for backpropagation is a practical and powerful means to improve the generalisation ability of neural networks. It is based on a Gaussian approximation to the posterior weight distribution. The framework is extended, reviewed and demonstrated in a pedagogical way. The notation is simplified using the ordinary weight decay parameter, and a detailed and explicit procedure for adjusting several weight decay parameters is given. Bayesian backprop is applied in the prediction of fat content in minced meat from near infrared spectra. It outperforms "early stopping" as well as quadratic regression. The evidence of a committee of differently trained networks is computed, and the corresponding improved generalisation is verified. The error bars on the predictions of the fat content are computed. There are three contributors: the random noise, the uncertainty in the weights, and the deviation among the committee members. The Bayesian framework is compare...
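The three-way error-bar decomposition this abstract describes can be sketched numerically. A minimal illustration with made-up numbers (the figures and variable names are purely hypothetical, not the paper's data): for a committee of networks, the predictive variance combines the noise level, each member's weight-uncertainty term, and the spread among the members' predictions.

```python
# Hedged sketch of the abstract's error-bar decomposition for a
# committee prediction. All numbers are illustrative only.
import numpy as np

preds      = np.array([3.1, 2.9, 3.0])    # committee members' predictions
weight_var = np.array([0.04, 0.05, 0.03]) # per-member weight-uncertainty variance
noise_var  = 0.09                         # estimated output noise level

mean = preds.mean()                       # committee prediction
total_var = noise_var + weight_var.mean() + preds.var()
print(mean, total_var)                    # prediction and its error bar (variance)
```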
Bayesian Regularisation and Pruning using a Laplace Prior
 Neural Computation
, 1994
Abstract

Cited by 17 (0 self)
Standard techniques for improved generalisation from neural networks include weight decay and pruning. Weight decay has a Bayesian interpretation, with the decay function corresponding to a prior over weights. The method of transformation groups and maximum entropy indicates a Laplace rather than a Gaussian prior. After training, the weights then arrange themselves into two classes: (1) those with a common sensitivity to the data error, and (2) those failing to achieve this sensitivity and which therefore vanish. Since the critical value is determined adaptively during training, pruning, in the sense of setting weights to exact zeros, becomes a consequence of regularisation alone. The count of free parameters is also reduced automatically as weights are pruned. A comparison is made with results of MacKay using the evidence framework and a Gaussian regulariser.
1 Introduction
Neural networks designed for regression or classification need to be trained using some form of stabilisation or re...
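The mechanism this abstract describes, a Laplace prior whose MAP penalty sets weights to exact zero, can be illustrated with a generic soft-thresholding step. This is a lasso-style sketch of the effect, not the paper's actual training procedure; the step size and decay strength are arbitrary:

```python
# Hedged sketch: under a Laplace prior the MAP penalty is L1, and a
# proximal (soft-thresholding) update drives small weights to exactly
# zero, so pruning falls out of regularisation alone.
import numpy as np

def soft_threshold(w, t):
    """Proximal operator of t*||w||_1: zeroes any weight with |w| <= t."""
    return np.sign(w) * np.maximum(np.abs(w) - t, 0.0)

w = np.array([0.8, -0.03, 0.5, 0.01, -1.2])
alpha, lr = 2.0, 0.05                  # Laplace decay strength, step size
w_new = soft_threshold(w, lr * alpha)  # one proximal step (data term omitted)
print(w_new)                           # small weights become exact zeros
```

Weights below the threshold vanish outright, while the surviving weights are all shrunk by the same amount, mirroring the two classes of weights the abstract describes.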
Ace of Bayes: Application of Neural Networks with Pruning
 The Danish Meat Research Institute, Maglegaardsvej 2, DK-4000
, 1993
Abstract

Cited by 14 (0 self)
MacKay's Bayesian framework for backpropagation is a practical and powerful means of improving the generalisation ability of neural networks. The framework is reviewed and extended in a pedagogical way. The notation is simplified using the ordinary weight decay parameter, and the noise parameter β is shown to be nothing more than an overall scale. A detailed and explicit procedure for adjusting several weight decay parameters is given. Pruning is incorporated into the Bayesian framework. Appropriate symmetry factors on sparse architectures are deduced. Bayesian weight decay is demonstrated using artificial data generated by a sparsely connected network. Pruning yields computational advantages: by removing unimportant weights the posterior weight distribution becomes Gaussian, and pruning removes zero-modes of the Hessian and redundant hidden units. In addition, pruning improves generalisation. The Bayesian evidence is used as a stop criterion for pruning. Bayesian backprop is applied ...
Bayesian Methods for Neural Networks: Theory and Applications
, 1995
Abstract

Cited by 13 (0 self)
this document. Before these are discussed however, perhaps we should have a tutorial on Bayesian probability theory and its application to model comparison problems. 2 Probability theory and Occam's razor
Bayesian Backpropagation Over I-O Functions Rather Than Weights
 Advances in Neural Information Processing Systems 6
, 1994
Abstract

Cited by 13 (5 self)
1 INTRODUCTION
In the conventional Bayesian view of backpropagation (BP) (Buntine and Weigend, 1991; Nowlan and Hinton, 1994; MacKay, 1992; Wolpert, 1993), one starts with the "likelihood" conditional distribution P(training set = t | weight vector w) and the "prior" distribution P(w). As an example, in regression one might have a "Gaussian likelihood", P(t | w) ∝ exp[−χ²(w, t)] ≡ ∏_i exp[−{net(w, t_X(i)) − t_Y(i)}² / 2σ²] for some constant σ. (t_X(i) and t_Y(i) are the successive input and output values in the training set respectively, and net(w, ·) is the function, induced by w, taking input neuron values to output neuron values.) As another example, the "weight decay" (Gaussian) prior is P(w) ∝ exp(−α w²) for some constant α. Bayes' theorem tells us that P(w | t) ∝ P(t | w) P(w). Accordingly, the most probable weight given the data, the "maximum a posteriori" (MAP) w, is the mode over w of P(t | w) P(w), which equals the mode over w of the "cost function" ...
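The MAP view sketched in this abstract, Gaussian likelihood times Gaussian weight-decay prior, yields the familiar cost χ²(w, t) + α|w|², which has a closed-form minimiser when net(w, x) is linear in w. A minimal sketch under that linear assumption (the names are illustrative, not the paper's):

```python
# Hedged sketch of the abstract's MAP derivation: with a Gaussian
# likelihood and a Gaussian ("weight decay") prior, the most probable
# weights minimise chi^2(w, t) + alpha * |w|^2. Here net(w, x) = w.x,
# so the MAP weights solve a regularised normal equation.
import numpy as np

def cost(w, X, y, alpha, sigma=1.0):
    chi2 = np.sum((X @ w - y) ** 2) / (2 * sigma ** 2)  # -log likelihood
    return chi2 + alpha * w @ w                          # -log prior (up to consts)

rng = np.random.default_rng(1)
X = rng.normal(size=(30, 2))
y = X @ np.array([2.0, -1.0])

# Setting the gradient to zero: (X^T X + 2*alpha*I) w = X^T y.
alpha = 0.1
w_map = np.linalg.solve(X.T @ X + 2 * alpha * np.eye(2), X.T @ y)
print(w_map, cost(w_map, X, y, alpha))
```

The recovered weights sit close to the generating values, pulled slightly toward zero by the prior, which is exactly the mode-of-the-posterior behaviour the abstract describes.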
What Bayes Has To Say About The Evidence Procedure
 Maximum Entropy and Bayesian Methods Conference
, 1994
Abstract

Cited by 7 (1 self)
The "evidence" procedure for setting hyperparameters is essentially the same as the techniques of ML-II and generalized maximum likelihood. Unlike those older techniques however, the evidence procedure has been justified (and used) as an approximation to the hierarchical Bayesian calculation. We use several examples to explore the validity of this justification. Then we derive upper and (often large) lower bounds on the difference between the evidence procedure's answer and the hierarchical Bayesian answer, for many different quantities. We also touch on subjects like the close relationship between the evidence procedure and maximum likelihood, and the self-consistency of deriving priors by "first-principles" arguments that don't set the values of hyperparameters. "... any inference must be based on strict adherence to the laws of probability theory, because any deviation automatically leads to inconsistency." (S. Gull, in [5]) "(Some have) estimated alpha from the data and then procee...
Efficient Covariance Matrix Methods for Bayesian Gaussian Processes and Hopfield Neural Networks
, 1999
Abstract

Cited by 4 (0 self)
Covariance matrices are important in many areas of neural modelling. In Hopfield networks they are used to form the weight matrix which controls the autoassociative properties of the network. In Gaussian processes, which have been shown to be the infinite neuron limit of many regularised feedforward neural networks, covariance matrices control the form of the Bayesian prior distribution over function space. This thesis examines interesting modifications to the standard covariance matrix methods to increase functionality or efficiency of these neural techniques. Firstly, the problem of adapting Gaussian process priors to perform regression on switching regimes is tackled. This involves the use of block covariance matrices and Gibbs sampling methods. Then the use of Toeplitz methods is proposed for Gaussian process regression where sampling positions can be chosen. A comparison is made between Hopfield weight matrices, and sample covariances. This allows work on sample covariances to be used ...
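The Toeplitz idea this abstract mentions can be illustrated generically: when the sampling positions of a Gaussian process are equally spaced, any stationary kernel yields a Toeplitz covariance matrix, which can be solved via Levinson recursion in O(n²) instead of the O(n³) of a dense factorisation. A sketch under those assumptions (an RBF kernel and a jitter term chosen for illustration, not the thesis code):

```python
# Hedged sketch: equally spaced GP inputs + stationary kernel => the
# covariance matrix is Toeplitz and fully described by its first
# column, so K^{-1} y can be computed without ever forming K densely.
import numpy as np
from scipy.linalg import solve_toeplitz, toeplitz

x = np.linspace(0, 1, 200)                       # equally spaced inputs
c = np.exp(-0.5 * (x - x[0]) ** 2 / 0.1 ** 2)    # first column of RBF kernel matrix
c[0] += 0.01                                     # noise/jitter on the diagonal
y = np.sin(2 * np.pi * x)                        # targets
alpha = solve_toeplitz(c, y)                     # solves K alpha = y in O(n^2)
print(np.max(np.abs(toeplitz(c) @ alpha - y)))   # residual vs the dense matrix
```

The residual against the explicitly formed dense matrix confirms the structured solve, while only a single column of the covariance ever needs to be stored.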