Bayesian Regularisation and Pruning using a Laplace Prior (1994)
| Venue: | Neural Computation |
| Citations: | 12 - 0 self |
BibTeX
@ARTICLE{Williams94bayesianregularisation,
author = {Peter M. Williams},
title = {Bayesian Regularisation and Pruning using a Laplace Prior},
journal = {Neural Computation},
year = {1994},
volume = {7},
pages = {117--143}
}
Abstract
Standard techniques for improved generalisation from neural networks include weight decay and pruning. Weight decay has a Bayesian interpretation, with the decay function corresponding to a prior over weights. The method of transformation groups and maximum entropy indicates a Laplace rather than a Gaussian prior. After training, the weights then arrange themselves into two classes: (1) those with a common sensitivity to the data error, and (2) those failing to achieve this sensitivity, which therefore vanish. Since the critical value is determined adaptively during training, pruning, in the sense of setting weights to exact zeros, becomes a consequence of regularisation alone. The count of free parameters is also reduced automatically as weights are pruned. A comparison is made with results of MacKay using the evidence framework and a Gaussian regulariser.

1 Introduction
Neural networks designed for regression or classification need to be trained using some form of stabilisation or re...
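
The two classes of weights described in the abstract can be illustrated with a minimal sketch. It is not the paper's method: it assumes a linear model in place of a neural network, a fixed regularisation constant `lam` rather than one determined adaptively during training, and a swapped-in optimiser (proximal gradient descent with soft-thresholding). The names `soft_threshold`, `fit_laplace_map` and `lam` are hypothetical. Under a Laplace prior the penalty is `lam * sum(|w|)`, so at the minimum every surviving weight has a data-error gradient of magnitude `lam`, while any weight whose gradient magnitude stays below `lam` is held at exactly zero.

```python
import numpy as np

def soft_threshold(v, t):
    # Proximal operator of t * |v|: shrinks towards zero and clips to exact zero.
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def fit_laplace_map(X, y, lam, n_iter=500):
    # Minimise 0.5 * ||X w - y||^2 + lam * sum(|w|) by proximal gradient descent.
    # Assumption: linear model and fixed lam, chosen for illustration only.
    step = 1.0 / np.linalg.norm(X, 2) ** 2  # 1 / (largest eigenvalue of X^T X)
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        grad = X.T @ (X @ w - y)            # gradient of the data error
        w = soft_threshold(w - step * grad, step * lam)
    return w

# Synthetic data: only the first three of ten inputs carry signal.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
true_w = np.zeros(10)
true_w[:3] = [2.0, -1.5, 1.0]
y = X @ true_w + 0.1 * rng.normal(size=200)

lam = 20.0
w = fit_laplace_map(X, y, lam)
grad = X.T @ (X @ w - y)   # sensitivity of the data error to each weight
active = w != 0.0          # weights that survived pruning

print("weights:          ", np.round(w, 3))
print("|grad| (active):  ", np.round(np.abs(grad[active]), 3))
print("|grad| (pruned):  ", np.round(np.abs(grad[~active]), 3))
```

Comparing the two printed gradient rows against `lam` shows the split: the active weights share a common sensitivity, and the pruned ones fall short of it. A Gaussian prior (squared penalty) would shrink weights but would not in general set any of them to exact zero.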







