Results 1–10 of 21
Comparison of Approximate Methods for Handling Hyperparameters
 Neural Computation
Abstract

Cited by 87 (1 self)
I examine two approximate methods for computational implementation of Bayesian hierarchical models, that is, models which include unknown hyperparameters such as regularization constants and noise levels. In the 'evidence framework' the model parameters are integrated over, and the resulting evidence is maximized over the hyperparameters. The optimized ...
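The evidence framework described in this abstract can be sketched in its simplest setting, Bayesian linear regression, where the integral over the weights is Gaussian and MacKay-style fixed-point re-estimation formulae for the two hyperparameters exist in closed form. The data, dimensions, and update schedule below are illustrative assumptions, not taken from the paper:

```python
import numpy as np

# Illustrative sketch of the evidence framework for Bayesian linear
# regression: integrate the weights out analytically, then maximize the
# resulting evidence over the regularization constant alpha and the
# inverse noise level beta via fixed-point updates (the synthetic data
# and the number of iterations are assumptions for this demo).

rng = np.random.default_rng(0)
N, D = 50, 5
Phi = rng.normal(size=(N, D))            # design matrix
w_true = rng.normal(size=D)
t = Phi @ w_true + 0.1 * rng.normal(size=N)   # noise std 0.1

alpha, beta = 1.0, 1.0                   # initial hyperparameter guesses
eigvals = np.linalg.eigvalsh(Phi.T @ Phi)
for _ in range(100):
    # Gaussian posterior over weights given the current (alpha, beta)
    A = alpha * np.eye(D) + beta * Phi.T @ Phi
    m = beta * np.linalg.solve(A, Phi.T @ t)
    # gamma = effective number of well-determined parameters
    lam = beta * eigvals
    gamma = np.sum(lam / (lam + alpha))
    # Fixed-point re-estimation of the hyperparameters
    alpha = gamma / (m @ m)
    beta = (N - gamma) / np.sum((t - Phi @ m) ** 2)

print(alpha, beta)   # beta should land near 1/0.1**2 = 100
```

With enough data the recovered inverse noise level approaches the true value, which is the practical appeal of the framework: the hyperparameters are estimated from the data alone, with no validation set.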
Bayesian and Regularization Methods for Hyperparameter Estimation in Image Restoration
 IEEE Trans. Image Processing
, 1999
Abstract

Cited by 77 (28 self)
In this paper, we propose the application of the hierarchical Bayesian paradigm to the image restoration problem. We derive expressions for the iterative evaluation of the two hyperparameters by applying the evidence and maximum a posteriori (MAP) analyses within the hierarchical Bayesian paradigm. We show analytically that the analysis provided by the evidence approach is more realistic and appropriate than the MAP approach for the image restoration problem. We furthermore study the relationship between the evidence approach and an iterative approach resulting from the set-theoretic regularization approach for estimating the two hyperparameters, or their ratio, defined as the regularization parameter. Finally, the proposed algorithms are tested experimentally.
The Relationship between PAC, the Statistical Physics framework, the Bayesian framework, and the VC framework
Abstract

Cited by 45 (8 self)
This paper discusses the intimate relationships between the supervised learning frameworks mentioned in the title. In particular, it shows how all those frameworks can be viewed as particular instances of a single overarching formalism. In doing this, many commonly misunderstood aspects of those frameworks are explored. In addition, the strengths and weaknesses of those frameworks are compared, and some novel frameworks are suggested (resulting, for example, in a "correction" to the familiar bias-plus-variance formula).
A Review of Bayesian Neural Networks with an Application to Near Infrared Spectroscopy
 IEEE Transactions on Neural Networks
, 1995
Abstract

Cited by 43 (0 self)
MacKay's Bayesian framework for backpropagation is a practical and powerful means to improve the generalisation ability of neural networks. It is based on a Gaussian approximation to the posterior weight distribution. The framework is extended, reviewed and demonstrated in a pedagogical way. The notation is simplified using the ordinary weight decay parameter, and a detailed and explicit procedure for adjusting several weight decay parameters is given. Bayesian backprop is applied in the prediction of fat content in minced meat from near-infrared spectra. It outperforms "early stopping" as well as quadratic regression. The evidence of a committee of differently trained networks is computed, and the corresponding improved generalisation is verified. The error bars on the predictions of the fat content are computed. There are three contributors: the random noise, the uncertainty in the weights, and the deviation among the committee members. The Bayesian framework is compare...
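The three error-bar contributors listed in this abstract combine additively in variance. A minimal sketch, with entirely hypothetical numbers for the noise level, the per-member weight uncertainty, and the committee spread:

```python
import numpy as np

# Hypothetical illustration of combining the three error-bar
# contributions for a committee of Bayesian networks: predictive
# variance = noise variance + mean weight-uncertainty variance
# + variance of the members' point predictions (all numbers invented).

member_means = np.array([2.10, 2.05, 2.20])        # each network's prediction
member_weight_var = np.array([0.02, 0.03, 0.02])   # per-member weight uncertainty
noise_var = 0.05                                   # estimated random noise

committee_mean = member_means.mean()
committee_var = member_means.var()                 # deviation among members
total_var = noise_var + member_weight_var.mean() + committee_var
error_bar = np.sqrt(total_var)
print(committee_mean, error_bar)
```

The point of the decomposition is diagnostic: a large committee term suggests the networks disagree (more members or more data needed), while a large noise term is irreducible.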
Bayesian Regularisation and Pruning using a Laplace Prior
 Neural Computation
, 1994
Abstract

Cited by 26 (0 self)
Standard techniques for improved generalisation from neural networks include weight decay and pruning. Weight decay has a Bayesian interpretation, with the decay function corresponding to a prior over weights. The method of transformation groups and maximum entropy indicates a Laplace rather than a Gaussian prior. After training, the weights then arrange themselves into two classes: (1) those with a common sensitivity to the data error, and (2) those failing to achieve this sensitivity, which therefore vanish. Since the critical value is determined adaptively during training, pruning, in the sense of setting weights to exactly zero, becomes a consequence of regularisation alone. The count of free parameters is also reduced automatically as weights are pruned. A comparison is made with results of MacKay using the evidence framework and a Gaussian regulariser.
1 Introduction
Neural networks designed for regression or classification need to be trained using some form of stabilisation or re...
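The pruning-by-regularisation effect described here can be illustrated in the simplest possible setting: for a quadratic data error in a single weight, the MAP solution under a Laplace prior is soft thresholding, so any weight whose least-squares value falls below a critical level lands at exactly zero. The one-dimensional error model and the numbers below are illustrative assumptions, not the paper's training procedure:

```python
import numpy as np

# Illustrative sketch of why a Laplace prior prunes: for a quadratic
# data error E_D(w) = (w - w_ls)**2 / (2*s2), the minimizer of
# E_D(w) + alpha*|w| is the soft-threshold operator, which sets
# weights with |w_ls| < alpha*s2 to exactly zero (all values invented).

def soft_threshold(w_ls, alpha, s2=1.0):
    return np.sign(w_ls) * np.maximum(np.abs(w_ls) - alpha * s2, 0.0)

w_ls = np.array([2.5, -0.3, 0.05, -1.2])   # hypothetical least-squares weights
pruned = soft_threshold(w_ls, alpha=0.5)
print(pruned)   # small weights vanish; large ones shrink by alpha*s2
```

This is the one-dimensional caricature of the two classes in the abstract: surviving weights share a common shift in magnitude, while the rest fail to clear the threshold and vanish.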
Ace of Bayes: Application of Neural Networks with Pruning
 The Danish Meat Research Institute, Maglegaardsvej 2, DK-4000
, 1993
Abstract

Cited by 17 (0 self)
MacKay's Bayesian framework for backpropagation is a practical and powerful means of improving the generalisation ability of neural networks. The framework is reviewed and extended in a pedagogical way. The notation is simplified using the ordinary weight decay parameter, and the noise parameter β is shown to be nothing more than an overall scale. A detailed and explicit procedure for adjusting several weight decay parameters is given. Pruning is incorporated into the Bayesian framework. Appropriate symmetry factors on sparse architectures are deduced. Bayesian weight decay is demonstrated using artificial data generated by a sparsely connected network. Pruning yields computational advantages: by removing unimportant weights the posterior weight distribution becomes Gaussian, and pruning removes zero modes of the Hessian and redundant hidden units. In addition, pruning improves generalisation. The Bayesian evidence is used as a stop criterion for pruning. Bayesian backprop is applied ...
Bayesian Backpropagation Over I-O Functions Rather Than Weights
 Advances in Neural Information Processing Systems 6
, 1994
Abstract

Cited by 15 (5 self)
1 INTRODUCTION
In the conventional Bayesian view of backpropagation (BP) (Buntine and Weigend, 1991; Nowlan and Hinton, 1994; MacKay, 1992; Wolpert, 1993), one starts with the "likelihood" conditional distribution P(training set = t | weight vector w) and the "prior" distribution P(w). As an example, in regression one might have a "Gaussian likelihood", P(t | w) ∝ exp[−χ²(w, t)] ≡ ∏_i exp[−{net(w, t_X(i)) − t_Y(i)}² / 2σ²] for some constant σ. (t_X(i) and t_Y(i) are the successive input and output values in the training set respectively, and net(w, ·) is the function, induced by w, taking input neuron values to output neuron values.) As another example, the "weight decay" (Gaussian) prior is P(w) ∝ exp(−α(w²)) for some constant α. Bayes' theorem tells us that P(w | t) ∝ P(t | w) P(w). Accordingly, the most probable weight given the data, the "maximum a posteriori" (MAP) w, is the mode over w of P(t | w) P(w), which equals the mode over w of the "cost function" ...
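The MAP statement at the end of this snippet, that maximizing P(t | w) P(w) is the same as minimizing a cost combining the data misfit and the weight-decay term, can be checked numerically on a toy one-parameter "network" net(w, x) = w·x; the data and hyperparameter values below are illustrative:

```python
import numpy as np

# Toy check that the MAP weight (mode of P(t|w)P(w)) coincides with the
# minimizer of the cost chi2(w, t) + alpha*w**2, for a one-parameter
# linear "network" net(w, x) = w*x (invented data and hyperparameters).

x = np.array([0.0, 1.0, 2.0, 3.0])
t = np.array([0.1, 0.9, 2.1, 2.9])
sigma2, alpha = 0.1, 0.5

def chi2(w):
    return np.sum((w * x - t) ** 2) / (2 * sigma2)

ws = np.linspace(-2, 3, 5001)
cost = np.array([chi2(w) + alpha * w**2 for w in ws])
log_post = -cost                       # log P(t|w)P(w) up to a constant
assert np.argmin(cost) == np.argmax(log_post)
w_map = ws[np.argmin(cost)]
print(w_map)                           # analytic MAP: sum(x*t)/(sum(x**2) + 2*alpha*sigma2)
```

Because the log posterior is exactly the negative cost plus a constant, the two optima agree by construction; the grid search just makes the equivalence concrete.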
Bayesian Methods for Neural Networks: Theory and Applications
, 1995
Abstract

Cited by 14 (0 self)
this document. Before these are discussed however, perhaps we should have a tutorial on Bayesian probability theory and its application to model comparison problems. 2 Probability theory and Occam's razor
What Bayes Has To Say About The Evidence Procedure
 Maximum Entropy and Bayesian Methods Conference
, 1994
Abstract

Cited by 7 (1 self)
The "evidence" procedure for setting hyperparameters is essentially the same as the techniques of ML-II and generalized maximum likelihood. Unlike those older techniques, however, the evidence procedure has been justified (and used) as an approximation to the hierarchical Bayesian calculation. We use several examples to explore the validity of this justification. Then we derive upper and (often large) lower bounds on the difference between the evidence procedure's answer and the hierarchical Bayesian answer, for many different quantities. We also touch on subjects like the close relationship between the evidence procedure and maximum likelihood, and the self-consistency of deriving priors by "first-principles" arguments that don't set the values of hyperparameters. "... any inference must be based on strict adherence to the laws of probability theory, because any deviation automatically leads to inconsistency." -- S. Gull, in [5] "(Some have) estimated alpha from the data and then procee...