Bayesian neural networks for classification: How useful is the evidence framework (1998)

by S J Roberts, W D Penny

Results 1 - 10 of 14

Bayesian Approach for Neural Networks - Review and Case Studies

by Jouko Lampinen, Aki Vehtari - Neural Networks , 2001
Cited by 16 (9 self)
We give a short review on the Bayesian approach for neural network learning and demonstrate the advantages of the approach in three real applications. We discuss the Bayesian approach with emphasis on the role of prior knowledge in Bayesian models and in classical error minimization approaches. The generalization capability of a statistical model, classical or Bayesian, is ultimately based on the prior assumptions. The Bayesian approach permits propagation of uncertainty in quantities which are unknown to other assumptions in the model, which may be more generally valid or easier to guess in the problem. The case problems studied in this paper include a regression, a classification, and an inverse problem. In the most thoroughly analyzed regression problem, the best models were those with less restrictive priors. This emphasizes the major advantage of the Bayesian approach, that we are not forced to guess attributes that are unknown, such as the number of degrees of freedom in the model, non-linearity of the model with respect to each input variable, or the exact form for the distribution of the model residuals.

Robust Full Bayesian Learning for Neural Networks

by Christophe Andrieu, Nando de Freitas, Arnaud Doucet , 1999
Cited by 11 (8 self)
In this paper, we propose a hierarchical full Bayesian model for neural networks. This model treats the model dimension (number of neurons), model parameters, regularisation parameters and noise parameters as random variables that need to be estimated. We develop a reversible jump Markov chain Monte Carlo (MCMC) method to perform the necessary computations. We find that the results obtained using this method are not only better than the ones reported previously, but also appear to be robust with respect to the prior specification. In addition, we propose a novel and computationally efficient reversible jump MCMC simulated annealing algorithm to optimise neural networks. This algorithm enables us to maximise the joint posterior distribution of the network parameters and the number of basis functions. It performs a global search in the joint space of the parameters and number of parameters, thereby surmounting the problem of local minima. We show that by calibrating the full hierarchical ...

An Empirical Evaluation of Bayesian Sampling with Hybrid Monte Carlo for Training Neural Network Classifiers

by Dirk Husmeier, William D. Penny, Stephen J. Roberts - Neural Networks , 1998
Cited by 10 (4 self)
This article gives a concise overview of Bayesian sampling for neural networks, and then presents an extensive evaluation on a set of various benchmark classification problems. The main objective is to study the sensitivity of this scheme to changes in the prior distribution of the parameters and hyperparameters, and to evaluate the efficiency of the so-called automatic relevance determination (ARD) method. The paper concludes with a comparison of the achieved classification results with those obtained with (i) the evidence scheme and (ii) with non-Bayesian methods. Keywords Bayesian statistics, prior and posterior distribution, parameters and hyperparameters, Gibbs sampling, hybrid Monte Carlo, automatic relevance determination (ARD), evidence approximation, classification problems, benchmarking. 1 Theory: Sampling of network weights and hyperparameters from the posterior distribution The objective of this section is to give a concise yet self-contained overview of the Bayesian app...

The EM Algorithm and Neural Networks for Nonlinear State Space Estimation

by J F G de Freitas, M Niranjan, A H Gee , 1999
Cited by 9 (5 self)
In this paper, we derive an EM algorithm for nonlinear state space models. We use it to estimate jointly the neural network weights, the model uncertainty and the noise in the data. In the E-step we apply a forward-backward Rauch-Tung-Striebel smoother to compute the network weights. For the M-step, we derive expressions to compute the model uncertainty and the measurement noise. We find that the method is intrinsically very powerful, simple, elegant and stable.

A COMPARISON OF STATE-OF-THE-ART CLASSIFICATION TECHNIQUES FOR EXPERT AUTOMOBILE INSURANCE CLAIM FRAUD DETECTION

by Stijn Viaene, Richard A. Derrig, Bart Baesens, Guido Dedene , 2002
Cited by 7 (2 self)
Several state-of-the-art binary classification techniques are experimentally evaluated in the context of expert automobile insurance claim fraud detection. The predictive power of logistic regression, C4.5 decision tree, k-nearest neighbor, Bayesian learning multilayer perceptron neural network, least-squares support vector machine, naive Bayes, and tree-augmented naive Bayes classification is contrasted. For most of these algorithm types, we report on several operationalizations using alternative hyperparameter or design choices. We compare these in terms of mean percentage correctly classified (PCC) and mean area under the receiver operating characteristic (AUROC) curve using a stratified, blocked, ten-fold cross-validation experiment. We also contrast algorithm type performance visually by means of the convex hull of the receiver operating characteristic (ROC) curves associated with the alternative operationalizations per algorithm type. The study is based on a data set of 1,399 personal injury protection claims from 1993 accidents collected by the Automobile Insurers Bureau of Massachusetts. To stay as close to real-life operating conditions as possible, we consider only predictors that are known relatively early in the life of a claim. Furthermore, based on the qualification of each available claim by both a verbal expert assessment of suspicion of fraud and a ten-point-scale expert suspicion score, we can

Classification With Sparse Grids Using Simplicial Basis Functions

by Jochen Garcke, Michael Griebel , 2002
Cited by 6 (6 self)
Recently we presented a new approach [20] to the classification problem arising in data mining. It is based on the regularization network approach but, in contrast to other methods, which employ ansatz functions associated with data points, we use a grid in the usually high-dimensional feature space for the minimization process. To cope with the curse of dimensionality, we employ sparse grids [52]. Thus, only $O(h_n^{-1} n^{d-1})$ instead of $O(h_n^{-d})$ grid points and unknowns are involved. Here d denotes the dimension of the feature space and $h_n = 2^{-n}$ gives the mesh size. We use the sparse grid combination technique [30] where the classification problem is discretized and solved on a sequence of conventional grids with uniform mesh sizes in each dimension. The sparse grid solution is then obtained by linear combination. The method computes a nonlinear classifier but scales only linearly with the number of data points and is well suited for data mining applications where the amount of data is very large, but where the dimension of the feature space is moderately high. In contrast to our former work, where d-linear functions were used, we now apply linear basis functions based on a simplicial discretization. This allows more dimensions to be handled, and the algorithm needs fewer operations per data point. We further extend the method to so-called anisotropic sparse grids, where different a-priori chosen mesh sizes can be used for the discretization of each attribute. This can improve the run time of the method and the approximation results in the case of data sets with different importance of the attributes. We describe the sparse grid combination technique for the classification problem, give implementational details and discuss the complexity of the algorithm. It turns out that...
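
The grid-point counts quoted in the abstract above can be illustrated with a small sketch. It is not the authors' combination-technique code: it assumes the standard hierarchical sparse grid without boundary points, in which the sparse grid of level n keeps every hierarchical subspace whose level indices sum to at most n + d - 1, and a subspace with level indices l_1, ..., l_d contributes 2^(l_1 + ... + l_d - d) points. The function names are hypothetical.

```python
from itertools import product


def full_grid_points(n, d):
    """Interior points of a full grid with mesh size 2**-n in d dimensions."""
    return (2 ** n - 1) ** d


def sparse_grid_points(n, d):
    """Interior points of a level-n sparse grid in d dimensions.

    Assumption: no boundary points; a hierarchical subspace with level
    multi-index l (each l_i >= 1) is kept when sum(l) <= n + d - 1 and
    holds 2**(sum(l) - d) points.
    """
    total = 0
    for levels in product(range(1, n + 1), repeat=d):
        if sum(levels) <= n + d - 1:
            total += 2 ** (sum(levels) - d)
    return total


if __name__ == "__main__":
    # Level 3 in two dimensions: 17 sparse-grid points versus 49 full-grid points.
    print(sparse_grid_points(3, 2), full_grid_points(3, 2))
```

In one dimension the two counts coincide, and the gap widens quickly as d grows, which is the saving the abstract refers to.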

Nonlinear State Space Estimation With Neural Networks And The EM Algorithm

by Nando de Freitas, Mahesan Niranjan, Andrew Gee , 1999
Cited by 5 (0 self)
In this paper, we derive an EM algorithm for nonlinear state space models. We use it to estimate jointly the neural network weights, the model uncertainty and the noise in the data. In the E-step we apply a forward-backward Rauch-Tung-Striebel smoother to compute the network weights. For the M-step, we derive expressions to compute the model uncertainty and the measurement noise. We find that the method is intrinsically very powerful, simple, elegant and stable.

Empirical Evaluation of Bayesian Sampling for Neural Classifiers

by Dirk Husmeier, William D. Penny, Stephen J. Roberts - ICANN'98: Proceedings of the 8th International Conference on Artificial Neural Networks , 1998
Cited by 3 (0 self)
Adopting a Bayesian approach and sampling the network parameters from their posterior distribution is a rather novel and promising method for improving the generalisation performance of neural network predictors. The present empirical study applies this scheme to a set of different synthetic and real-world classification problems. The paper focuses on the dependence of the prediction results on the prior distribution of the network parameters and hyperparameters, and provides a critical evaluation of the automatic relevance determination (ARD) scheme for detecting irrelevant inputs. 1 Introduction Consider a K-fold classification problem, where an m-dimensional feature vector $x_t$ is assigned to one of K classes $\{C_1, \ldots, C_K\}$ indicated by a label vector $y_t = (y_t^1, \ldots, y_t^K)$, with $y_t^k = 1$ if $x_t \in C_k$, $y_t^k = 0$ if $x_t \notin C_k$, and $\|y_t\| = 1$. For a neural network (NN) with K softmax units in the final layer, the network outputs $f_k(x_t; w) \in [0, 1]$ can be interp...
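
As a rough illustration of the setup in the abstract above, the sketch below builds a one-of-K label vector and maps K final-layer activations through a softmax so that each output lies in [0, 1] and the outputs sum to one. The function names and the example activations are hypothetical and are not taken from the paper.

```python
import math


def one_of_k(class_index, num_classes):
    """Label vector with a 1 at the true class and 0 elsewhere."""
    label = [0] * num_classes
    label[class_index] = 1
    return label


def softmax(activations):
    """Map K real-valued activations to K outputs in [0, 1] that sum to 1."""
    shift = max(activations)  # subtracting the maximum avoids overflow in exp
    exps = [math.exp(a - shift) for a in activations]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    outputs = softmax([1.0, 2.0, 3.0])  # made-up activations for K = 3
    print(outputs, sum(outputs))
    print(one_of_k(2, 3))
```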

Gauss-Markov-Potts Priors for Images in Computer Tomography Resulting to Joint Reconstruction and segmentation

by Ali Mohammad-Djafari , 2007
Cited by 2 (2 self)
In many applications of Computed Tomography (CT), we may know that the object under test is composed of a finite number of materials, meaning that the images to be reconstructed are composed of a finite number of homogeneous areas. To account for this prior knowledge, we propose a family of Gauss-Markov fields with hidden Potts label fields. Then, using these models in a Bayesian inference framework, we are able to jointly reconstruct the images and segment them in an optimal way. In this paper, we first present these prior models, then propose appropriate MCMC or variational methods to compute the mean posterior estimators. We finally show a few results showing the efficiency of the proposed methods for CT with limited angle and number of projections. Keywords: Computed Tomography; Gauss-Markov-Potts Priors; Bayesian computation; MCMC; Joint Segmentation and Reconstruction. This discretized presentation of CT gives the possibility to analyse the most classical methods of image reconstruction [3, 4]. For example, it is very easy to see that the solution $\hat{f} = H^t g = \sum_l H_l^t g_l$ (5) corresponds to the classical Backprojection (BP), and the minimum norm solution of $Hf = g$, $\hat{f} = H^t (H H^t)^{-1} g = \sum_l H_l^t (H_l H_l^t)^{-1} g_l$ (6), can be identified with the classical Filtered Backprojection (FBP), and the least squares (LS) solution $\hat{f} = (H^t H)^{-1} H^t g$ (7) can be identified with the Backprojection and Filtering (BPF). Also, defining the LS criterion

Information fusion for subband-HMM speaker recognition

by J. E. Higgins, R. I. Damper, T. J. Dodd - In: Proc
Cited by 1 (0 self)
Previous work has demonstrated the performance gains that can be obtained in speaker recognition by applying subband processing, together with hidden Markov modelling and multiple classifier recombination. Two recombination rules have been investigated: the sum of log likelihoods, which corresponds to the optimal Bayes' rule under certain constraints, and multilayer perceptrons (MLP), which are not subject to these constraints. It was found that for two spoken digits in the presence of a single case of narrowband noise the sum of log likelihoods and MLP achieved comparable performance. In this paper, the previous work is extended in the direction of investigating the robustness of the recognition system to different narrowband noise. Two approaches are taken towards this aim. Firstly, narrowband noise is added at different centre frequencies. Secondly, a Bayesian MLP approach is investigated using automatic relevance determination (ARD) on the subband inputs to the MLP. From this it is possible to assess the relative importance of the subbands to recognition performance. Results for the new noise conditions show that the sum of log likelihoods generally does better than the (average) MLP fusion.
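
The sum-of-log-likelihoods recombination rule mentioned in the abstract above can be sketched as follows. This is an illustration only, not the authors' system: it assumes each subband classifier reports one log-likelihood per candidate speaker, and the function name, speaker labels and scores are hypothetical.

```python
def fuse_sum_log_likelihoods(subband_scores):
    """Sum per-speaker log-likelihoods over subbands and pick the best speaker.

    subband_scores: list with one dict per subband, mapping each speaker
    label to that subband classifier's log-likelihood for the speaker.
    """
    speakers = subband_scores[0].keys()
    totals = {s: sum(scores[s] for scores in subband_scores) for s in speakers}
    best = max(totals, key=totals.get)
    return best, totals


if __name__ == "__main__":
    # Made-up scores for two subbands and two speakers.
    scores = [
        {"speaker_a": -1.0, "speaker_b": -2.0},
        {"speaker_a": -3.0, "speaker_b": -1.5},
    ]
    print(fuse_sum_log_likelihoods(scores))
```

Summing log-likelihoods is the same as multiplying the subband likelihoods, so the rule weights every subband equally; an MLP combiner, by contrast, can learn to weight subbands differently.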