Results 1–10 of 18
Bayesian Approach for Neural Networks – Review and Case Studies
 Neural Networks
, 2001
"... We give a short review on the Bayesian approach for neural network learning and demonstrate the advantages of the approach in three real applications. We discuss the Bayesian approach with emphasis on the role of prior knowledge in Bayesian models and in classical error minimization approaches. The ..."
Abstract

Cited by 18 (9 self)
 Add to MetaCart
We give a short review on the Bayesian approach for neural network learning and demonstrate the advantages of the approach in three real applications. We discuss the Bayesian approach with emphasis on the role of prior knowledge in Bayesian models and in classical error minimization approaches. The generalization capability of a statistical model, classical or Bayesian, is ultimately based on the prior assumptions. The Bayesian approach permits propagation of uncertainty in quantities which are unknown to other assumptions in the model, which may be more generally valid or easier to guess in the problem. The case problems studied in this paper include a regression, a classification, and an inverse problem. In the most thoroughly analyzed regression problem, the best models were those with less restrictive priors. This emphasizes the major advantage of the Bayesian approach, that we are not forced to guess attributes that are unknown, such as the number of degrees of freedom in the model, nonlinearity of the model with respect to each input variable, or the exact form for the distribution of the model residuals.
Robust Full Bayesian Learning for Neural Networks
, 1999
"... In this paper, we propose a hierarchical full Bayesian model for neural networks. This model treats the model dimension (number of neurons), model parameters, regularisation parameters and noise parameters as random variables that need to be estimated. We develop a reversible jump Markov chain Monte ..."
Abstract

Cited by 12 (9 self)
 Add to MetaCart
In this paper, we propose a hierarchical full Bayesian model for neural networks. This model treats the model dimension (number of neurons), model parameters, regularisation parameters and noise parameters as random variables that need to be estimated. We develop a reversible jump Markov chain Monte Carlo (MCMC) method to perform the necessary computations. We find that the results obtained using this method are not only better than the ones reported previously, but also appear to be robust with respect to the prior specification. In addition, we propose a novel and computationally efficient reversible jump MCMC simulated annealing algorithm to optimise neural networks. This algorithm enables us to maximise the joint posterior distribution of the network parameters and the number of basis functions. It performs a global search in the joint space of the parameters and number of parameters, thereby surmounting the problem of local minima. We show that by calibrating the full hierarchical ...
An Empirical Evaluation of Bayesian Sampling with Hybrid Monte Carlo for Training Neural Network Classifiers
 Neural Networks
, 1998
"... This article gives a concise overview of Bayesian sampling for neural networks, and then presents an extensive evaluation on a set of various benchmark classification problems. The main objective is to study the sensitivity of this scheme to changes in the prior distribution of the parameters and hy ..."
Abstract

Cited by 12 (4 self)
 Add to MetaCart
This article gives a concise overview of Bayesian sampling for neural networks, and then presents an extensive evaluation on a set of various benchmark classification problems. The main objective is to study the sensitivity of this scheme to changes in the prior distribution of the parameters and hyperparameters, and to evaluate the efficiency of the so-called automatic relevance determination (ARD) method. The paper concludes with a comparison of the achieved classification results with those obtained with (i) the evidence scheme and (ii) non-Bayesian methods. Keywords: Bayesian statistics, prior and posterior distribution, parameters and hyperparameters, Gibbs sampling, hybrid Monte Carlo, automatic relevance determination (ARD), evidence approximation, classification problems, benchmarking. 1 Theory: Sampling of network weights and hyperparameters from the posterior distribution. The objective of this section is to give a concise yet self-contained overview of the Bayesian app...
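The hybrid Monte Carlo scheme evaluated in this paper combines Metropolis accept/reject steps with Hamiltonian (leapfrog) dynamics. The following is a minimal, self-contained sketch on a toy 1-D Gaussian target; the step size, trajectory length, and burn-in are arbitrary illustrative choices, not values from the paper:

```python
import numpy as np

def hmc_sample(log_prob, grad_log_prob, x0, n_samples=2000,
               step_size=0.15, n_leapfrog=10, seed=0):
    """Minimal hybrid (Hamiltonian) Monte Carlo sampler."""
    rng = np.random.default_rng(seed)
    x = np.atleast_1d(np.asarray(x0, dtype=float))
    samples = []
    for _ in range(n_samples):
        p = rng.standard_normal(x.shape)            # fresh momentum each iteration
        x_new, p_new = x.copy(), p.copy()
        # leapfrog integration of the Hamiltonian dynamics
        p_new = p_new + 0.5 * step_size * grad_log_prob(x_new)
        for _ in range(n_leapfrog - 1):
            x_new = x_new + step_size * p_new
            p_new = p_new + step_size * grad_log_prob(x_new)
        x_new = x_new + step_size * p_new
        p_new = p_new + 0.5 * step_size * grad_log_prob(x_new)
        # Metropolis accept/reject on the total energy
        h_old = -log_prob(x) + 0.5 * float(p @ p)
        h_new = -log_prob(x_new) + 0.5 * float(p_new @ p_new)
        if rng.uniform() < np.exp(min(0.0, h_old - h_new)):
            x = x_new
        samples.append(x.copy())
    return np.array(samples)

# toy target: a standard 1-D Gaussian "posterior"
log_prob = lambda x: -0.5 * float(x @ x)
grad_log_prob = lambda x: -x
draws = hmc_sample(log_prob, grad_log_prob, x0=[3.0])
print(draws[500:].mean(), draws[500:].std())   # should be near 0 and 1
```

Resampling the momentum at every iteration keeps the chain ergodic; the accept/reject step corrects the discretization error of the leapfrog integrator.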
A COMPARISON OF STATE-OF-THE-ART CLASSIFICATION TECHNIQUES FOR EXPERT AUTOMOBILE INSURANCE CLAIM FRAUD DETECTION
, 2002
"... Several stateoftheart binary classification techniques are experimentally evaluated in the context of expert automobile insurance claim fraud detection. The predictive power of logistic regression, C4.5 decision tree, knearest neighbor, Bayesian learning multilayer perceptron neural network, lea ..."
Abstract

Cited by 12 (3 self)
 Add to MetaCart
Several state-of-the-art binary classification techniques are experimentally evaluated in the context of expert automobile insurance claim fraud detection. The predictive power of logistic regression, C4.5 decision tree, k-nearest neighbor, Bayesian learning multilayer perceptron neural network, least-squares support vector machine, naive Bayes, and tree-augmented naive Bayes classification is contrasted. For most of these algorithm types, we report on several operationalizations using alternative hyperparameter or design choices. We compare these in terms of mean percentage correctly classified (PCC) and mean area under the receiver operating characteristic (AUROC) curve using a stratified, blocked, ten-fold cross-validation experiment. We also contrast algorithm type performance visually by means of the convex hull of the receiver operating characteristic (ROC) curves associated with the alternative operationalizations per algorithm type. The study is based on a data set of 1,399 personal injury protection claims from 1993 accidents collected by the Automobile Insurers Bureau of Massachusetts. To stay as close to real-life operating conditions as possible, we consider only predictors that are known relatively early in the life of a claim. Furthermore, based on the qualification of each available claim by both a verbal expert assessment of suspicion of fraud and a ten-point-scale expert suspicion score, we can
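The two headline metrics of this comparison, PCC and AUROC, can both be computed directly. The sketch below uses the rank-statistic identity AUC = P(score of a positive > score of a negative) on hypothetical toy scores; the 0.5 decision threshold for PCC is an illustrative choice, not one from the paper:

```python
import numpy as np

def auroc(scores, labels):
    """AUROC via the rank statistic: P(score of a positive > score of a negative)."""
    s = np.asarray(scores, dtype=float)
    y = np.asarray(labels, dtype=int)
    pos, neg = s[y == 1], s[y == 0]
    greater = (pos[:, None] > neg[None, :]).sum()   # all positive/negative pairs
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))

def pcc(scores, labels, threshold=0.5):
    """Percentage correctly classified at a fixed decision threshold."""
    preds = (np.asarray(scores, dtype=float) >= threshold).astype(int)
    return float((preds == np.asarray(labels, dtype=int)).mean())

labels = [0, 0, 1, 1, 0, 1]                    # hypothetical fraud / no-fraud labels
scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7]       # hypothetical classifier scores
print(auroc(scores, labels))   # 8 of 9 positive/negative pairs ranked correctly
print(pcc(scores, labels))     # 5 of 6 claims classified correctly at 0.5
```

Unlike PCC, AUROC is threshold-free, which is why the paper also compares ROC convex hulls across operationalizations.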
The EM Algorithm and Neural Networks for Nonlinear State Space Estimation
, 1999
"... In this paper, we derive an EM algorithm for nonlinear state space models. We use it to estimate jointly the neural network weights, the model uncertainty and the noise in the data. In the Estep we apply a forwardbackward RauchTungStriebel smoother to compute the network weights. For the Mstep, ..."
Abstract

Cited by 11 (6 self)
 Add to MetaCart
In this paper, we derive an EM algorithm for nonlinear state space models. We use it to estimate jointly the neural network weights, the model uncertainty and the noise in the data. In the E-step we apply a forward-backward Rauch-Tung-Striebel smoother to compute the network weights. For the M-step, we derive expressions to compute the model uncertainty and the measurement noise. We find that the method is intrinsically very powerful, simple, elegant and stable.
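As an illustration of the E-step/M-step split described in this abstract, here is a deliberately simplified scalar, linear-Gaussian analogue: a plain Kalman filter with Rauch-Tung-Striebel smoother stands in for the extended (neural network) version, and EM re-estimates only the measurement-noise variance in closed form. All parameter values are hypothetical:

```python
import numpy as np

def kalman_rts(y, a, q, r, x0=0.0, p0=1.0):
    """Scalar Kalman filter plus Rauch-Tung-Striebel smoother
    (a linear stand-in for the extended version used with an MLP)."""
    T = len(y)
    xp, pp = np.zeros(T), np.zeros(T)      # predicted mean / variance
    xf, pf = np.zeros(T), np.zeros(T)      # filtered mean / variance
    xm, pm = x0, p0
    for t in range(T):                     # forward (filter) pass
        xp[t], pp[t] = xm, pm
        k = pm / (pm + r)                  # Kalman gain (observation matrix = 1)
        xf[t] = xm + k * (y[t] - xm)
        pf[t] = (1.0 - k) * pm
        xm, pm = a * xf[t], a * a * pf[t] + q
    xs, ps = xf.copy(), pf.copy()          # backward (smoother) pass
    for t in range(T - 2, -1, -1):
        c = pf[t] * a / pp[t + 1]
        xs[t] = xf[t] + c * (xs[t + 1] - xp[t + 1])
        ps[t] = pf[t] + c * c * (ps[t + 1] - pp[t + 1])
    return xs, ps

def em_measurement_noise(y, a=0.95, q=0.1, r_init=5.0, iters=30):
    """E-step: smooth the states; M-step: closed-form update of the
    measurement-noise variance r."""
    r = r_init
    for _ in range(iters):
        xs, ps = kalman_rts(y, a, q, r)               # E-step
        r = float(np.mean((y - xs) ** 2 + ps))        # M-step
    return r

rng = np.random.default_rng(1)
T = 200
x = np.zeros(T)
for t in range(1, T):
    x[t] = 0.95 * x[t - 1] + rng.normal(0.0, np.sqrt(0.1))
y = x + rng.normal(0.0, np.sqrt(0.5), size=T)         # true noise variance 0.5
print(em_measurement_noise(y))                        # estimate should be near 0.5
```

The M-step adds the smoothed state variance `ps` to the squared residual, which is exactly what makes EM differ from naively fitting the noise to the residuals alone.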
Classification With Sparse Grids Using Simplicial Basis Functions
, 2002
"... Recently we presented a new approach [20] to the classification problem arising in data mining. It is based on the regularization network approach but in contrast to other methods, which employ ansatz functions associated to data points, we use a grid in the usually highdimensional feature space fo ..."
Abstract

Cited by 7 (7 self)
 Add to MetaCart
Recently we presented a new approach [20] to the classification problem arising in data mining. It is based on the regularization network approach but in contrast to other methods, which employ ansatz functions associated to data points, we use a grid in the usually high-dimensional feature space for the minimization process. To cope with the curse of dimensionality, we employ sparse grids [52]. Thus, only O(h_n^{-1} n^{d-1}) instead of O(h_n^{-d}) grid points and unknowns are involved. Here d denotes the dimension of the feature space and h_n = 2^{-n} gives the mesh size. We use the sparse grid combination technique [30] where the classification problem is discretized and solved on a sequence of conventional grids with uniform mesh sizes in each dimension. The sparse grid solution is then obtained by linear combination. The method computes a nonlinear classifier but scales only linearly with the number of data points and is well suited for data mining applications where the amount of data is very large, but where the dimension of the feature space is moderately high. In contrast to our former work, where d-linear functions were used, we now apply linear basis functions based on a simplicial discretization. This allows us to handle more dimensions and the algorithm needs fewer operations per data point. We further extend the method to so-called anisotropic sparse grids, where different a priori chosen mesh sizes can be used for the discretization of each attribute. This can improve the run time of the method and the approximation results in the case of data sets with attributes of different importance. We describe the sparse grid combination technique for the classification problem, give implementational details and discuss the complexity of the algorithm. It turns out that...
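The point counts behind the O(h_n^{-1} n^{d-1}) versus O(h_n^{-d}) comparison can be checked numerically. The sketch below enumerates the grids of the combination technique and compares the total number of unknowns against a single full grid. The conventions used (level vectors l with l_i >= 1 and |l|_1 = n + (d-1) - q for q = 0..d-1, points counted with boundaries) are common ones but are an assumption here, not taken from the paper:

```python
from itertools import product

def grid_points(levels):
    """Points of a full grid with mesh size 2^-l_i per dimension, boundary included."""
    n_pts = 1
    for l in levels:
        n_pts *= 2 ** l + 1
    return n_pts

def combination_point_count(n, d):
    """Total unknowns over all grids used by the combination technique:
    level vectors l with l_i >= 1 and |l|_1 = n + (d - 1) - q, q = 0..d-1."""
    total = 0
    for q in range(d):
        target = n + (d - 1) - q
        for levels in product(range(1, target + 1), repeat=d):
            if sum(levels) == target:
                total += grid_points(levels)
    return total

n, d = 6, 2
print(combination_point_count(n, d))   # 1475 unknowns over the combination grids
print((2 ** n + 1) ** d)               # 4225 points for the single full grid
```

Even at this small level and dimension the gap is visible, and it widens rapidly as d grows, which is the whole point of the sparse grid construction.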
Nonlinear State Space Estimation With Neural Networks And The EM Algorithm
, 1999
"... In this paper, we derive an EM algorithm for nonlinear state space models. We use it to estimate jointly the neural network weights, the model uncertainty and the noise in the data. In the Estep we apply a forwardbackward RauchTungStriebel smoother to compute the network weights. For the Mstep, ..."
Abstract

Cited by 5 (0 self)
 Add to MetaCart
In this paper, we derive an EM algorithm for nonlinear state space models. We use it to estimate jointly the neural network weights, the model uncertainty and the noise in the data. In the E-step we apply a forward-backward Rauch-Tung-Striebel smoother to compute the network weights. For the M-step, we derive expressions to compute the model uncertainty and the measurement noise. We find that the method is intrinsically very powerful, simple, elegant and stable. Contents: 1 Introduction; 2 Background; 3 Nonlinear State Space Modelling; 4 Inference with MLPs and extended Kalman smoothing (4.1 The extended Kalman smoother; 4.2 Training MLPs with the EKF); 5 The EM algorithm; 6 The EM algorithm for nonlinear state space models (6.1 Mathematical preliminaries; 6.2 Computing the expectation of the log-likelihood ...)
Gauss-Markov-Potts Priors for Images in Computed Tomography Resulting in Joint Reconstruction and Segmentation
, 2007
"... In many applications of Computed Tomography (CT), we may know that the object under the test is composed of a finite number of materials meaning that the images to be reconstructed are composed of a finite number of homogeneous area. To account for this prior knowledge, we propose a family of Gauss ..."
Abstract

Cited by 4 (2 self)
 Add to MetaCart
In many applications of Computed Tomography (CT), we may know that the object under test is composed of a finite number of materials, meaning that the images to be reconstructed are composed of a finite number of homogeneous areas. To account for this prior knowledge, we propose a family of Gauss-Markov fields with hidden Potts label fields. Then, using these models in a Bayesian inference framework, we are able to jointly reconstruct the images and segment them in an optimal way. In this paper, we first present these prior models, then propose appropriate MCMC or variational methods to compute the posterior mean estimators. We finally show a few results demonstrating the efficiency of the proposed methods for CT with a limited angle and number of projections. Keywords: Computed Tomography; Gauss-Markov-Potts Priors; Bayesian computation; MCMC; Joint Segmentation and Reconstruction. This discretized presentation of CT gives the possibility to analyse the most classical methods of image reconstruction [3, 4]. For example, it is very easy to see that the solution f̂ = H^T g = sum_l H_l^T g_l (5) corresponds to the classical Backprojection (BP); the minimum-norm solution of H f = g, f̂ = H^T (H H^T)^{-1} g = sum_l H_l^T (H_l H_l^T)^{-1} g_l (6), can be identified with the classical Filtered Backprojection (FBP); and the least squares (LS) solution f̂ = (H^T H)^{-1} H^T g (7) can be identified with Backprojection and Filtering (BPF). Also, defining the LS criterion
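The three classical estimators in equations (5)-(7) are easy to verify numerically on a toy system. Here a small random matrix stands in for the projection operator H (a hypothetical choice for illustration); with a square, well-conditioned H both the minimum-norm and the least-squares solutions recover f exactly, while plain backprojection does not unless H is orthogonal:

```python
import numpy as np

rng = np.random.default_rng(0)
f_true = np.array([1.0, 2.0, 3.0, 4.0])
H = rng.standard_normal((4, 4)) + 2.0 * np.eye(4)   # hypothetical projection operator
g = H @ f_true                                       # noiseless projection data

f_bp = H.T @ g                                  # backprojection, eq. (5)
f_mn = H.T @ np.linalg.solve(H @ H.T, g)        # minimum-norm solution, eq. (6)
f_ls = np.linalg.solve(H.T @ H, H.T @ g)        # least-squares solution, eq. (7)

print(np.allclose(f_mn, f_true), np.allclose(f_ls, f_true))  # both solve H f = g
```

In real CT the system is ill-posed and noisy, which is exactly why the paper replaces these purely algebraic inversions with Bayesian priors.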
Joint NDT Image Restoration and Segmentation Using Gauss-Markov-Potts Prior Models and Variational Bayesian Computation
 IEEE Transactions on Image Processing
, 2010
"... In this paper, we propose a method to simultaneously restore and to segment piecewise homogenous images degraded by a known point spread function (PSF) and additive noise. For this purpose, we propose a family of nonhomogeneous GaussMarkov fields with Potts region labels model for images to be use ..."
Abstract

Cited by 4 (1 self)
 Add to MetaCart
In this paper, we propose a method to simultaneously restore and segment piecewise homogeneous images degraded by a known point spread function (PSF) and additive noise. For this purpose, we propose a family of non-homogeneous Gauss-Markov fields with Potts region labels as image models to be used in a Bayesian estimation framework. The joint posterior law of all the unknowns (the unknown image, its segmentation (hidden variable) and all the hyperparameters) is approximated by a separable probability law via the variational Bayes technique. This approximation makes it possible to obtain a practically implementable joint restoration and segmentation algorithm. We present some preliminary results and a comparison with an MCMC Gibbs sampling based algorithm. We may note that the prior models proposed in this work are particularly appropriate for images of scenes or objects that are composed of a finite set of homogeneous materials. This is the case for many images obtained in nondestructive testing (NDT) applications.
Empirical Evaluation of Bayesian Sampling for Neural Classifiers
 ICANN'98: Proceedings of the 8th International Conference on Artificial Neural Networks
, 1998
"... Adopting a Bayesian approach and sampling the network parameters from their posterior distribution is a rather novel and promising method for improving the generalisation performance of neural network predictors. The present empirical study applies this scheme to a set of different synthetic and rea ..."
Abstract

Cited by 3 (0 self)
 Add to MetaCart
Adopting a Bayesian approach and sampling the network parameters from their posterior distribution is a rather novel and promising method for improving the generalisation performance of neural network predictors. The present empirical study applies this scheme to a set of different synthetic and real-world classification problems. The paper focuses on the dependence of the prediction results on the prior distribution of the network parameters and hyperparameters, and provides a critical evaluation of the automatic relevance determination (ARD) scheme for detecting irrelevant inputs. 1 Introduction. Consider a K-fold classification problem, where an m-dimensional feature vector x_t is assigned to one of K classes {C_1, ..., C_K}, indicated by a label vector y_t = (y_t^1, ..., y_t^K) with y_t^k = 1 if x_t ∈ C_k, y_t^k = 0 if x_t ∉ C_k, and ||y_t|| = 1. For a neural network (NN) with K softmax units in the final layer, the network outputs f^k(x_t, w) ∈ [0, 1] can be interp...