## A Review of Bayesian Neural Networks with an Application to Near Infrared Spectroscopy (1995)

Venue: IEEE Transactions on Neural Networks

Citations: 36 (0 self)

### BibTeX

@ARTICLE{Thodberg95areview,
  author  = {Hans Henrik Thodberg},
  title   = {A Review of Bayesian Neural Networks with an Application to Near Infrared Spectroscopy},
  journal = {IEEE Transactions on Neural Networks},
  year    = {1995},
  volume  = {7},
  pages   = {56--72}
}

### Abstract

MacKay's Bayesian framework for backpropagation is a practical and powerful means to improve the generalisation ability of neural networks. It is based on a Gaussian approximation to the posterior weight distribution. The framework is extended, reviewed and demonstrated in a pedagogical way. The notation is simplified using the ordinary weight decay parameter, and a detailed and explicit procedure for adjusting several weight decay parameters is given. Bayesian backprop is applied in the prediction of fat content in minced meat from near infrared spectra. It outperforms "early stopping" as well as quadratic regression. The evidence of a committee of differently trained networks is computed, and the corresponding improved generalisation is verified. The error bars on the predictions of the fat content are computed. There are three contributors: The random noise, the uncertainty in the weights, and the deviation among the committee members. The Bayesian framework is compare...
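The abstract's three-way decomposition of the prediction error bars can be sketched numerically. A minimal illustration, assuming the three contributions are independent and add in quadrature; the function name and all numbers are hypothetical, not from the paper:

```python
import numpy as np

# Hypothetical per-committee-member predictions and variance terms;
# illustrative only, not the paper's data or exact formulas.
def committee_error_bar(predictions, sigma_noise, sigma_weights):
    """Combine the three error-bar contributions named in the abstract:
    random noise, weight uncertainty, and committee disagreement."""
    predictions = np.asarray(predictions, dtype=float)
    mean = predictions.mean()
    var_committee = predictions.var()  # spread among committee members
    var_total = (sigma_noise ** 2
                 + np.mean(np.square(sigma_weights))  # average weight-uncertainty variance
                 + var_committee)
    return mean, np.sqrt(var_total)

mean, err = committee_error_bar([12.1, 12.4, 11.9], sigma_noise=0.5,
                                sigma_weights=[0.2, 0.3, 0.25])
```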

### Citations

1236 | Statistical decision theory and Bayesian analysis (2nd ed.)
- Berger
- 1980

Citation Context: ..., and the approach is both theoretical and applied. However, the Maximum Entropy principle plays hardly any role in Bayesian backprop. Bayesian backprop is based on the Bayesian school of statistics [7, 8]. This is distinct from mainstream "sampling theory" (or "frequentist") statistics, where the concept of probability must be attached to frequencies of samples drawn from a distribution. In contrast, ...

699 | Numerical Recipes
- Press, Flannery, et al.
- 1986

Citation Context: ...Hessian is symmetrised by replacing B by (1/2)(B + B^T) (the asymmetry of the Hessian is an efficient way to check the precision of the computation). Finally B is diagonalised using the Jacobi method [18], which requires approximately 24k^3 floating-point operations. This renders the recipe impractical for networks considerably larger than 1000 connections (footnote 8). 9. Compute the number of well-determined ...
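The symmetrisation and diagonalisation step quoted above can be sketched as follows; `np.linalg.eigh` stands in for the Jacobi routine of Numerical Recipes, and the matrix is a toy example, not the paper's Hessian:

```python
import numpy as np

def symmetrise_and_diagonalise(B):
    """Symmetrise an approximate Hessian and diagonalise it, as in the
    recipe quoted above. Variable names are illustrative."""
    asymmetry = np.abs(B - B.T).max()   # cheap check on numerical precision
    B_sym = 0.5 * (B + B.T)
    eigenvalues, eigenvectors = np.linalg.eigh(B_sym)  # ascending eigenvalues
    return B_sym, eigenvalues, asymmetry

B = np.array([[2.0, 1.0],
              [0.9, 2.0]])
B_sym, eigvals, asym = symmetrise_and_diagonalise(B)
```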

608 | Bayesian Learning for Neural Networks
- Neal
- 1996

Citation Context: ...l number of hyperparameters. This controversy has perhaps distracted from consideration of other problems with Gaussian approximation methods that are in my opinion more significant." (Pages 23-24 in [12]). Neal replaces the Gaussian approximation with a Monte Carlo method. In this paper, on the other hand, we pursue MacKay's method but, in accordance with Neal, we keep an open eye on the Gaussian appr...

520 | Bayesian interpolation
- MacKay
- 1992

Citation Context: ... validation set is used. Hence all available data can be used for training, which gives better models. 1.2 The Proper Treatment of Bayesian Backprop Bayesian backprop was introduced by MacKay in 1991 [2, 3, 4, 5] as a radically different approach to the problem of overfitting and model comparison. The Bayesian framework for backprop originated in the field of Maximum Entropy [6], which develops better models f...

399 | A practical Bayesian framework for backpropagation networks
- MacKay
- 1992

Citation Context: ... validation set is used. Hence all available data can be used for training, which gives better models. 1.2 The Proper Treatment of Bayesian Backprop Bayesian backprop was introduced by MacKay in 1991 [2, 3, 4, 5] as a radically different approach to the problem of overfitting and model comparison. The Bayesian framework for backprop originated in the field of Maximum Entropy [6], which develops better models f...

324 | Information-based objective functions for active data selection
- MacKay
- 1992

Citation Context: ...ring training. Generality. The Bayesian framework is more general than the GPE. It can be used to determine error bars on each prediction, and it applies to classification problems and active learning [4, 5]. 6 Practical Comments This section discusses details pertaining to the practical application of Bayesian backprop. 6.1 Checking the Model It is important to check the distribution of the weights and ...

169 | The Effective Number of Parameters: An Analysis of Generalization and Regularization in Nonlinear Learning Systems
- Moody
- 1992

Citation Context: ...ed that the size of the residuals is independent of the predicted fat value on C, M and T [19]. The residuals are consistent with the error bars, as shown in figure 3 and table 3. 5 Moody's GPE Moody [21] proposed an estimator of the generalisation error for neural networks. Being a generalisation of Akaike's Final Prediction Error, it is called the Generalised Prediction Error. GPE predicts the test e...
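The snippet truncates before the formula, but Moody's GPE is commonly quoted in the simplified form below: the training error plus a complexity penalty driven by the effective number of parameters. A sketch under that assumption, with hypothetical numbers:

```python
def gpe(train_mse, sigma_hat_sq, p_eff, n):
    """Generalised Prediction Error (Moody, simplified form):
    training error + 2 * estimated noise variance * effective
    number of parameters / number of training cases."""
    return train_mse + 2.0 * sigma_hat_sq * p_eff / n

# Toy example: 0.5 training MSE, noise variance 0.25,
# 10 effective parameters, 100 cases.
estimate = gpe(train_mse=0.5, sigma_hat_sq=0.25, p_eff=10, n=100)
```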

153 | The evidence framework applied to classification networks
- MacKay
- 1992

Citation Context: ... validation set is used. Hence all available data can be used for training, which gives better models. 1.2 The Proper Treatment of Bayesian Backprop Bayesian backprop was introduced by MacKay in 1991 [2, 3, 4, 5] as a radically different approach to the problem of overfitting and model comparison. The Bayesian framework for backprop originated in the field of Maximum Entropy [6], which develops better models f...

149 | Bayesian Methods for Adaptive Models
- MacKay
- 1991

Citation Context: ...pends on the output or input. A possible solution is to transform the output variable or to introduce a noise level which depends on the output. A treatment of input-dependent noise level is given in [23]. 6.2 Testing for Non-linearities Neural networks are well suited to test whether a data set defines a linear or a nonlinear regression. Linear models and neural networks are trained and the evidences...

47 | Exact calculation of the Hessian matrix for the multilayer perceptron
- Bishop
- 1992

Citation Context: ...h log 2 for each hidden layer. Negative eigenvalues are left out. 11. Steps 7 to 10 are repeated 5 times. Footnote 7: The Hessian can be evaluated analytically using an extension to backprop developed by Bishop [17]. This involves some elaborate programming, but then the Hessian is evaluated in just h epochs. Footnote 8: This is according to the following argument: Assume that the number of cases is proportional to the num...

36 | Bayesian nonlinear modeling for the energy prediction competition
- MacKay
- 1993

Citation Context: ... the component, and in this section we improve the model by incorporating this knowledge into the model prior. This is done using Automatic Relevance Determination (ARD), introduced by MacKay and Neal [24, 12]. ARD is simply a special case of the Bayesian framework, where each input corresponds to a weight group containing the weights from this input to the hidden layer. The model automatically adjusts the...

21 | On the use of evidence in neural networks
- Wolpert
- 1993

Citation Context: ...an approximation to the true answer, which would be obtained by integrating over the hyperparameters as well as the parameters, but experience has shown that it is often a good approximation. Wolpert [10] criticizes the use of this procedure for neural networks on the grounds that by analytically integrating over the hyperparameters, in the manner of Buntine and Weigend, one can obtain the relative po...

21 | Bayesian model comparison and backprop nets
- DHAENE, MacKay

Citation Context: ...n. Properties. GPE has a simpler structure than the evidence. It does not involve the sometimes ill-defined determinant of the Hessian. The GPE scales differently from the evidence with γ and N (see [22]). Use in Regularisation. We used GPE for networks which were already regularised with the Bayesian method. It has not yet been demonstrated that ... [figure: square root of GPE vs. number of hidden units]

17 | Improving generalization on neural networks through pruning
- Thodberg
- 1991

Citation Context: ...etwork for T/6 epochs, i.e. perform a gradient descent towards a minimum of C. An epoch is one traversal of the training set; we typically use T = 10,000. The learning rate was set dynamically as in [16]. 6. Initialise the number of well-determined parameters: γ_g = (9/10) k_g for g = 1, ..., G. 7. Train the network for T/6 epochs. The weight decay parameters are re-estimated according to (19) aft...
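The re-estimation step in the recipe above follows MacKay's evidence framework. Equation (19) itself is cut off in the snippet, so the single-weight-group sketch below is a reconstruction of the standard form (γ counts well-determined parameters; α is the weight decay), with toy numbers:

```python
import numpy as np

def reestimate_weight_decay(lambdas, alpha, weights):
    """Standard evidence-framework update (our reconstruction, single group):
    gamma = sum_i lambda_i / (lambda_i + alpha), where lambdas are the
    eigenvalues of the data-error Hessian; then alpha = gamma / (2 E_W)
    with E_W = 0.5 * sum(w^2)."""
    gamma = float(np.sum(lambdas / (lambdas + alpha)))
    alpha_new = gamma / float(np.sum(np.square(weights)))
    return gamma, alpha_new

gamma, alpha_new = reestimate_weight_decay(
    lambdas=np.array([10.0, 10.0, 0.1]),  # two well-determined directions
    alpha=1.0,
    weights=np.array([1.0, 1.0]))
```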

5 | Predicting the Future: A Connectionist Approach
- Weigend, Huberman, Rumelhart
- 1990

Citation Context: ...work with the best error on M was selected. This training technique, denoted "early stopping", is rather successful for many applications, as it is an easy way to cope with the problem of overfitting [20]. The optimal architecture ... Table 1: The spectroscopic data sets and their use when training networks with early stopping:

| Data set | Use | Number of cases |
|---|---|---|
| C | Training | 129 |
| M | Monitoring | 43 |
| T | Testing | 43 |
| E1 | ... | ... |

3 | Neural Network Ensembles
- Hansen, Salamon
- 1990

Citation Context: ...ons (footnote 6). The committee gives two advantages. • The predictions of the committee, i.e. the average prediction of the committee, gives a better generalisation than the average network in the committee [14]. (Footnote 6: The uncertainty on the log evidence due to zero-modes mentioned in footnote 5 is another way to estimate σ.) • The degree of dissent within the committee contributes to the uncertainty of the...
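The committee advantage cited above ([14]) can be checked numerically: by Jensen's inequality, the committee average never has a larger mean squared error than the average of its members. A toy demonstration with synthetic predictions, not the paper's data:

```python
import numpy as np

rng = np.random.default_rng(0)
target = np.zeros(200)                 # toy target values
# Five "committee members": the target plus independent noise.
members = [target + rng.normal(0.0, 1.0, size=200) for _ in range(5)]

# Average MSE of the individual members.
member_mse = np.mean([np.mean((m - target) ** 2) for m in members])
# MSE of the committee (average) prediction.
committee = np.mean(members, axis=0)
committee_mse = np.mean((committee - target) ** 2)
# committee_mse <= member_mse holds for any members and target.
```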

2 | Bayesian Learning via Stochastic Dynamics, Neural Information Processing Systems, Vol. 5, ed. C. L. Giles, S. J. Hanson and J. D. Cowan
- Neal
- 1993

Citation Context: ... predicted quantity at the mode (the maximum) is not always a good approximation to the integral over the posterior distribution. To overcome these problems Neal has developed a Monte Carlo technique [15]. • The evidence as a quality measure could reflect a mixture of virtues, of which the generalisation error is just one. Other virtues could be correct architecture, explanatory power or correct est...

2 | Optimal Minimal Neural Interpretation of Spectra, Analytical Chemistry 64
- Borggaard, Thodberg
- 1992

Citation Context: ...at industry. The data were recorded by a Tecator near-infrared spectrometer (the Infratec Food and Feed Analyzer) which measured the spectrum of light transmitted through a sample of minced pork meat [19]. The spectrum consists of the absorbances at 100 wavelengths in the region 850-1050 nm. We want to calibrate the spectrometer to determine the fat content from the spectrum. The target values of the...

1 | A Bayesian Approach to Pruning of Neural Networks, submitted to IEEE Trans. on Neural Networks
- Thodberg
- 1993

Citation Context: ...nodes, layers and inputs can be compared and ranked according to the evidence. The evidence can also be used as a stop criterion for network growing or pruning. Pruning is treated in a separate paper [1]. • The instrument of the Bayesian analysis described in the above two items is the training data and the network itself. No separate validation set is used. Hence all available data can be used for...

1 | Hyperparameters: Optimise or integrate out?, Maximum Entropy and Bayesian
- MacKay
- 1993

Citation Context: ...of the task. The posterior probability densities for different parameter values are, in themselves, of no interest - all that matters is how well the predictive distribution is approximated. MacKay [11] shows that in approximating this predictive distribution, it is more important to integrate over the large number of parameters in the network than over the typically small number of hyperparameters....