Results 1 - 10 of 14
Bayesian Neural Networks and Density Networks
 Nuclear Instruments and Methods in Physics Research, A
, 1994
Abstract
Cited by 39 (8 self)
This paper reviews the Bayesian approach to learning in neural networks, then introduces a new adaptive model, the density network. This is a neural network for which target outputs are provided, but the inputs are unspecified. When a probability distribution is placed on the unknown inputs, a latent variable model is defined that is capable of discovering the underlying dimensionality of a data set. A Bayesian learning algorithm for these networks is derived and demonstrated.

1 Introduction to the Bayesian view of learning

A binary classifier is a parameterized mapping from an input x to an output y ∈ [0, 1]; when its parameters w are specified, the classifier states the probability that an input x belongs to class t = 1, rather than the alternative t = 0. Consider a binary classifier which models the probability as a sigmoid function of x:

P(t = 1 | x, w, H) = y(x; w, H) = 1 / (1 + e^{-w·x})    (1)

This form of model is known to statisticians as a linear logistic model, and in the neural networks ...
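Equation (1), the linear logistic model, can be sketched in a few lines. The function name and toy inputs below are illustrative, not from the paper:

```python
import math

def logistic_classifier(x, w):
    """Linear logistic model: P(t=1 | x, w) = 1 / (1 + exp(-w.x))."""
    activation = sum(wi * xi for wi, xi in zip(w, x))
    return 1.0 / (1.0 + math.exp(-activation))

# With zero weights the classifier is maximally uncertain.
p = logistic_classifier([1.0, 2.0], [0.0, 0.0])  # -> 0.5
```

The output is always strictly between 0 and 1, so it can be read directly as a class probability.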
Issues in Bayesian Analysis of Neural Network Models
, 1998
Abstract
Cited by 31 (0 self)
This paper discusses these issues, exploring the potential of Bayesian ideas in the analysis of NN models. Buntine and Weigend (1991) and MacKay (1992) have provided frameworks for their Bayesian analysis based on Gaussian approximations, and Neal (1993) has applied hybrid Monte Carlo ideas. Ripley (1993) and Cheng and Titterington (1994) have dwelt on the power of these ideas, especially as far as interpretation and architecture selection are concerned. See MacKay (1995) for a recent review. From a statistical modeling point of view, NNs are a special instance of mixture models. Many issues about posterior multimodality and computational strategies in NN modeling are of relevance in the wider class of mixture models. Related recent references in the Bayesian literature on mixture models include Diebolt and Robert (1994), Escobar and West (1994), Robert and Mengersen (1995), Roeder and Wasserman (1995), West (1994), West and Cao (1993), West, Muller and Escobar (1994), and West and Turner (1994). We concentrate on approximation problems, though many of our suggestions can be translated to other areas. For those problems, NNs are viewed as highly nonlinear (semiparametric) approximators, where parameters are typically estimated by least squares. Applications of interest for practitioners include nonlinear regression, stochastic optimisation and regression metamodels for simulation output. The main issue we address here is how to undertake a Bayesian analysis of a NN model, and the uses we may make of it. Our contributions include: an evaluation of computational approaches to Bayesian analysis of NN models, including a novel Markov chain Monte Carlo scheme; a suggestion of a scheme for handling a variable architecture model and a scheme for combining NN models with more ...
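The paper's novel MCMC scheme is not described in the abstract. As a generic illustration of Markov chain Monte Carlo over model parameters, here is a random-walk Metropolis sketch for a one-parameter toy model; all names, priors, and toy data are assumptions made for this example:

```python
import math
import random

def log_posterior(w, xs, ts, alpha=1.0, beta=10.0):
    """Gaussian prior (precision alpha) plus Gaussian likelihood (precision beta)."""
    log_prior = -0.5 * alpha * w * w
    log_lik = -0.5 * beta * sum((t - w * x) ** 2 for x, t in zip(xs, ts))
    return log_prior + log_lik

def metropolis(xs, ts, n_samples=2000, step=0.2, seed=0):
    """Random-walk Metropolis sampling of the posterior over w."""
    rng = random.Random(seed)
    w = 0.0
    lp = log_posterior(w, xs, ts)
    samples = []
    for _ in range(n_samples):
        w_new = w + rng.gauss(0.0, step)          # symmetric proposal
        lp_new = log_posterior(w_new, xs, ts)
        if rng.random() < math.exp(min(0.0, lp_new - lp)):  # accept/reject
            w, lp = w_new, lp_new
        samples.append(w)
    return samples

xs, ts = [1.0, 2.0, 3.0], [2.1, 3.9, 6.2]   # roughly t = 2x
samples = metropolis(xs, ts)
posterior_mean = sum(samples[500:]) / len(samples[500:])  # discard burn-in
```

The posterior mean should land near the regularized least-squares solution, here close to 2.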
Hyperparameters: optimize, or integrate out?
 IN MAXIMUM ENTROPY AND BAYESIAN METHODS, SANTA BARBARA
, 1996
Abstract
Cited by 18 (4 self)
I examine two approximate methods for computational implementation of Bayesian hierarchical models, that is, models which include unknown hyperparameters such as regularization constants. In the 'evidence framework' the model parameters are integrated over, and the resulting evidence is maximized over the hyperparameters. The optimized hyperparameters are used to define a Gaussian approximation to the posterior distribution. In the alternative 'MAP' method, the true posterior probability is found by integrating over the hyperparameters. The true posterior is then maximized over the model parameters, and a Gaussian approximation is made. The similarities of the two approaches, and their relative merits, are discussed, and comparisons are made with the ideal hierarchical Bayesian solution. In moderately ill-posed problems, integration over hyperparameters yields a probability distribution with a skew peak which causes significant biases to arise in the MAP method. In contrast, the evidence framework is shown to introduce negligible predictive error, under straightforward conditions. General lessons are drawn concerning the distinctive properties of inference in many dimensions.
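The evidence framework's treatment of a regularization constant can be illustrated on a one-parameter linear-Gaussian toy model standing in for a network. The fixed-point updates below follow the standard evidence re-estimation formulas (posterior mode, number of well-determined parameters gamma, then alpha = gamma / w^2); all names and the toy data are invented for this sketch:

```python
def evidence_alpha(xs, ts, beta=10.0, alpha=1.0, iters=50):
    """Evidence-style re-estimation of the regularization constant alpha
    for a one-parameter linear model t ~ w*x with noise precision beta."""
    sxx = sum(x * x for x in xs)
    sxt = sum(x * t for x, t in zip(xs, ts))
    for _ in range(iters):
        w_mp = beta * sxt / (alpha + beta * sxx)    # posterior mode
        gamma = beta * sxx / (alpha + beta * sxx)   # well-determined parameters
        alpha = gamma / (w_mp * w_mp)               # evidence re-estimate
    return alpha, w_mp

alpha, w_mp = evidence_alpha([1.0, 2.0, 3.0], [2.1, 3.9, 6.2])
```

The iteration converges quickly here; for data generated near t = 2x the posterior mode settles close to 2.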
A Probabilistic Neural Network Framework for Detection of Malignant Melanoma
 Malignant Melanoma,” Artificial Neural Networks in Cancer Diagnosis, Prognosis and Patient Management
, 1999
Abstract
Cited by 7 (0 self)
Contents
1 INTRODUCTION
  1.1 Malignant melanoma
  1.2 Evolution of malignant melanoma
  1.3 Image acquisition techniques
    1.3.1 Traditional imaging
    1.3.2 Dermatoscopic imaging
  1.4 Dermatoscopic features
2 FEATURE EXTRACTION IN DERMATOSCOPIC IMAGES
  2.1 Image acquisition
  2.2 Image preprocessing
    2.2.1 Median filtering ...
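The median filtering step listed under image preprocessing can be sketched as follows. This is a generic k x k median filter on a 2-D grayscale image, not the paper's implementation; the function name and toy image are assumptions:

```python
def median_filter(image, k=3):
    """Apply a k x k median filter (odd k) to a 2-D grayscale image,
    a common despeckling step before feature extraction."""
    h, w, r = len(image), len(image[0]), k // 2
    out = [[0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            # Gather the neighbourhood, clipped at the image borders.
            window = [image[ii][jj]
                      for ii in range(max(0, i - r), min(h, i + r + 1))
                      for jj in range(max(0, j - r), min(w, j + r + 1))]
            window.sort()
            out[i][j] = window[len(window) // 2]
    return out

# A single bright outlier pixel is removed by the filter.
img = [[0, 0, 0], [0, 255, 0], [0, 0, 0]]
filtered = median_filter(img)  # centre becomes 0
```

Unlike mean filtering, the median discards isolated outliers entirely rather than smearing them into neighbouring pixels.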
Generalization in Neural Networks
 Projektarbejde ved Elektronisk Institut, DTU
, 1993
Abstract
Cited by 5 (0 self)
Abstract This report is concerned with methods for optimizing the generalization ability of neural networks. The framework is developed to deal with regression-type problems, where the networks are trained on a limited amount of noisy data. In this context the problem can be formulated as finding the optimal trade-off between data fit and model complexity. Two paradigms for reducing model complexity are discussed: pruning and weight decay. It is shown by numerical experiments that application of weight decay is essential for obtaining good generalization performance. This is explained by the way in which weight decay confines the space of possible networks to a space of 'reasonable' networks. Two methods for making statistical estimates of the generalization performance without use of validation sets are presented: the Generalization method and the Bayesian method. The advantage of not needing validation sets is that all available data can be utilized in the training phase. This f...
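Weight decay as described above adds a penalty on weight magnitude to the error function. A minimal sketch for a one-parameter regression model, with all names and toy data invented here:

```python
def train_with_weight_decay(xs, ts, decay=0.0, lr=0.01, epochs=500):
    """Gradient descent on squared error plus a weight-decay penalty:
    E = sum (t - w*x)^2 / 2 + decay * w^2 / 2 (one-parameter toy model)."""
    w = 0.0
    for _ in range(epochs):
        grad = -sum((t - w * x) * x for x, t in zip(xs, ts)) + decay * w
        w -= lr * grad
    return w

xs, ts = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]   # exact fit would be w = 2
w_free = train_with_weight_decay(xs, ts, decay=0.0)
w_decay = train_with_weight_decay(xs, ts, decay=1.0)
# The penalty shrinks the solution toward zero: w_decay < w_free
```

The penalty biases the fit toward smaller weights, which is the mechanism the report credits for confining training to 'reasonable' networks.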
Extended Bayesian Learning
 Proceedings of ESANN 97, European Symposium on Artificial neural networks, Bruges
, 1997
Abstract
Cited by 3 (2 self)
In Bayesian learning one represents the relative degree of belief in different values of the weight vector, including biases, by considering a probability distribution function over weight space. In general, this a priori probability is expected to come from a Gaussian with zero mean and flexible variance, which is called a hyperparameter. It can be optimized automatically during training by maximizing the evidence. The extended Bayesian learning (EBL) approach consists of considering a more general form of priors by using several weight classes and by considering the mean of the Gaussian distribution to be another hyperparameter. We propose an algorithm which determines automatically the optimal number of different weight classes and where the weights can change from one class to another. Our approach is applied to several benchmark problems and outperforms simple Bayesian learning as well as other optimization strategies.

1. Introduction

We begin by considering the problem of ...
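The abstract's idea of weights migrating between Gaussian weight classes, each with its own mean and variance, can be illustrated by assigning each weight to the class under which it is most probable. This is a hedged sketch of one plausible assignment step, not the EBL algorithm itself; all names and numbers are assumptions:

```python
import math

def class_log_density(w, mu, var):
    """Log density of a weight under a Gaussian weight class N(mu, var)."""
    return -0.5 * math.log(2 * math.pi * var) - 0.5 * (w - mu) ** 2 / var

def assign_classes(weights, classes):
    """Assign each weight to the weight class (mu, var) under which it is
    most probable -- weights may change class as hyperparameters adapt."""
    return [max(range(len(classes)),
                key=lambda k: class_log_density(w, *classes[k]))
            for w in weights]

classes = [(0.0, 0.1), (3.0, 0.1)]           # two weight classes
weights = [0.1, -0.2, 2.9, 3.2]
labels = assign_classes(weights, classes)    # -> [0, 0, 1, 1]
```

In a full scheme the class means and variances would themselves be re-estimated from the weights assigned to them, alternating with network training.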
Evaluating Confidence Measures in a Neural Network Based Sleep Stager
, 1997
Abstract
Cited by 2 (0 self)
In this paper we report on an extensive investigation of neural networks, multilayer perceptrons (MLPs) in particular, for the task of automatic sleep staging based on electroencephalogram (EEG) and electrooculogram (EOG) signals. After the important first step of preprocessing and feature selection (for which a search-based selection technique could reduce the large number of features to a feature vector of size ten), the main focus was on evaluating the use of so-called "doubt levels" and "confidence intervals" ("error bars") in improving the results by rejecting uncertain cases and patterns not well represented by the training set. The main technique used here is that of Bayesian inference to arrive at distributions of network weights based on training data. We compare the results of the full-blown Bayesian method with a reduced method calculating only the maximum posterior solution and with an MLP trained with the more common gradient descent technique for minimizing an err...
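Rejection via a doubt level, as evaluated above, can be sketched as withholding a decision when the winning class probability is not high enough. The threshold rule and names below are a simplified assumption, not the paper's exact criterion:

```python
def classify_with_rejection(probs, doubt=0.2):
    """Return the winning class index, or None (reject) when the winning
    posterior probability falls below 1 - doubt -- a simple doubt level."""
    best = max(range(len(probs)), key=lambda k: probs[k])
    if probs[best] < 1.0 - doubt:
        return None          # rejected: too uncertain to stage
    return best

confident = classify_with_rejection([0.9, 0.1])    # -> 0
doubtful = classify_with_rejection([0.55, 0.45])   # -> None (rejected)
```

Raising the doubt level trades coverage for accuracy: fewer patterns are classified, but those that are tend to be the confident ones.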
Experiences with Bayesian Learning in a Real World Application
 Advances in Neural Information Processing Systems 10
, 1998
Abstract
Cited by 2 (2 self)
This paper reports on an application of Bayesian-inferred neural network classifiers to the field of automatic sleep staging. To the best of our knowledge this is one of the first real-world applications of Bayesian inference. We therefore want to share our experience of this learning paradigm with a wider audience. The reason for using Bayesian learning for this task is twofold. First, Bayesian inference is known to embody regularization automatically. Second, a side effect of Bayesian learning leads to larger variance of network outputs in regions without training data. This results in well-known moderation effects, which can be used to detect outliers. In a 5-fold cross-validation experiment the full Bayesian solution was not better than a single maximum a posteriori (MAP) solution found with D.J. MacKay's evidence approximation (see [6]). In a second experiment we studied the properties of both solutions in rejecting classification of movement artefacts.

1 Category: Application...
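The moderation effect mentioned above arises when the sigmoid output is averaged over posterior weight samples instead of plugging in a single MAP weight: uncertainty about the weights pulls predictions toward 0.5. A toy illustration (all names and numbers assumed):

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def moderated_output(x, w_samples):
    """Moderated prediction: average the sigmoid over posterior weight
    samples rather than plugging in a single point estimate."""
    return sum(sigmoid(w * x) for w in w_samples) / len(w_samples)

x = 1.0
w_map = 2.0
p_map = sigmoid(w_map * x)                     # plug-in (MAP) prediction
p_mod = moderated_output(x, [0.0, 2.0, 4.0])   # samples around the MAP
# Moderation pulls the prediction toward 0.5: p_mod < p_map
```

Far from the training data the posterior over weights is broad, so moderated outputs sit near 0.5 there, which is what makes them useful for flagging outliers.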
Connectionist Adaptive Control
, 1993
Abstract
Cited by 2 (0 self)
The work considers a general framework for learning control, known as reinforcement learning. It documents the first application of a reinforcement learning controller to the task of regulating an inverted pendulum in hardware. It explores the application of nonlinear parametric models, known as connectionist models or neural networks, to learning control. It approaches learning control as an optimization problem, and proposes a promising new learning control algorithm. The algorithm comprises two optimisations: the first learns what the task is, the second learns how to complete the task efficiently.
Divide and Conquer: Pattern Recognition using Mixtures of Experts
, 1997
Abstract
Cited by 1 (0 self)
speech recognition task. The mixture of experts is shown to be a superior method for speaker adaptation of connectionist models to new conditions. In addition, the significant improvement of the performance of an ensemble of classifiers via the mixture framework is demonstrated. Beyond these applications, a number of theoretical extensions of the mixture of experts have been made in this thesis. The link between hierarchical mixtures of experts (HME) and other tree-based models is described and used to motivate a new training algorithm for the HME, known as tree growing. Tree growing is a constructive algorithm which results in faster training and a more efficient use of parameters than standard training methods. The second extension described is path pruning, which is a fast training and evaluation algorithm for deep hierarchies in which paths through the tree which have low probability are ignored. A stabilising method for the algorithm based on weight decay regularisation is
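The basic mixture-of-experts forward pass combines expert outputs with a softmax gate. A minimal sketch with linear experts and a linear gate; the function names and toy parameters are assumptions, not from the thesis:

```python
import math

def softmax(zs):
    """Numerically stable softmax over a list of gate activations."""
    m = max(zs)
    exps = [math.exp(z - m) for z in zs]
    s = sum(exps)
    return [e / s for e in exps]

def mixture_of_experts(x, expert_ws, gate_ws):
    """Mixture of experts: a softmax gate weights the outputs of linear
    experts, y(x) = sum_k g_k(x) * w_k * x."""
    gates = softmax([v * x for v in gate_ws])
    outputs = [w * x for w in expert_ws]
    return sum(g * o for g, o in zip(gates, outputs))

# Two experts; the gate favours expert 1 for large positive x.
y = mixture_of_experts(2.0, expert_ws=[1.0, 3.0], gate_ws=[-1.0, 1.0])
```

In a hierarchical mixture (HME), each expert is itself such a gated mixture, giving the tree structure that tree growing extends and path pruning traverses selectively.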