Results 1-10 of 25
An Introduction to MCMC for Machine Learning
, 2003
Cited by 221 (2 self)
Abstract: The purpose of this introductory paper is threefold. First, it introduces the Monte Carlo method with an emphasis on probabilistic machine learning. Second, it reviews the main building blocks of modern Markov chain Monte Carlo simulation, thereby providing an introduction to the remaining papers of this special issue. Lastly, it discusses new and interesting research horizons.
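The Markov chain Monte Carlo machinery the paper introduces can be illustrated with a minimal random-walk Metropolis sampler. This is a generic sketch, not the paper's code; the standard-normal target, proposal scale, and sample count are illustrative choices.

```python
import math
import random

def metropolis_hastings(log_target, x0, n_samples, step=0.5, seed=0):
    """Random-walk Metropolis sampler for an unnormalised 1-D log-density."""
    rng = random.Random(seed)
    x, samples = x0, []
    for _ in range(n_samples):
        proposal = x + rng.gauss(0.0, step)        # symmetric random-walk proposal
        log_alpha = log_target(proposal) - log_target(x)
        if math.log(rng.random()) < log_alpha:     # accept with prob min(1, ratio)
            x = proposal
        samples.append(x)                          # rejected moves repeat the state
    return samples

# Illustrative target: standard normal, log-density known only up to a constant.
samples = metropolis_hastings(lambda x: -0.5 * x * x, x0=0.0, n_samples=20000)
mean = sum(samples) / len(samples)
```

Because only the ratio of target densities appears, the normalising constant is never needed, which is what makes the method usable for posterior distributions known only up to proportionality.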
Neural networks for classification: a survey
 IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews
, 2000
Cited by 45 (0 self)
Abstract: Classification is one of the most active research and application areas of neural networks. The literature is vast and growing. This paper summarizes some of the most important developments in neural network classification research. Specifically, the issues of posterior probability estimation, the link between neural and conventional classifiers, the learning and generalization tradeoff in classification, feature variable selection, and the effect of misclassification costs are examined. Our purpose is to provide a synthesis of the published research in this area and to stimulate further research interest and effort in the identified topics. Index Terms: Bayesian classifier, classification, ensemble methods, feature variable selection, learning and generalization, misclassification costs, neural networks.
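One result the survey covers, that a classifier trained with the cross-entropy loss estimates posterior class probabilities, can be demonstrated on synthetic data. The sketch below is an illustrative assumption throughout: the class-conditional Gaussians, learning rate, and iteration count are chosen for the demo, not taken from the paper.

```python
import math
import random

# Synthetic two-class data: x | y=0 ~ N(-1, 1) and x | y=1 ~ N(+1, 1) with equal
# priors, so the true Bayes posterior is P(y=1 | x) = sigmoid(2x).
rng = random.Random(5)
data = [(rng.gauss(-1.0, 1.0), 0) for _ in range(500)] + \
       [(rng.gauss(+1.0, 1.0), 1) for _ in range(500)]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Fit a logistic model by full-batch gradient descent on the cross-entropy loss.
w, b, lr = 0.0, 0.0, 0.5
for _ in range(1000):
    gw = gb = 0.0
    for x, y in data:
        err = sigmoid(w * x + b) - y      # gradient of cross-entropy w.r.t. logit
        gw += err * x
        gb += err
    w -= lr * gw / len(data)
    b -= lr * gb / len(data)

# The trained output approximates the Bayes posterior, e.g. at x = 0.5,
# where the true value is sigmoid(1.0) ~ 0.731.
p_hat = sigmoid(w * 0.5 + b)
```

The learned weight should be close to 2 and the bias close to 0, matching the analytic posterior; the same argument extends to multilayer networks with softmax outputs.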
Robust Full Bayesian Learning for Radial Basis Networks
, 2001
Cited by 24 (4 self)
Abstract: We propose a hierarchical full Bayesian model for radial basis networks. This model treats the model dimension (number of neurons), model parameters, ...
Robust Full Bayesian Learning for Neural Networks
, 1999
Cited by 12 (9 self)
Abstract: In this paper, we propose a hierarchical full Bayesian model for neural networks. This model treats the model dimension (number of neurons), model parameters, regularisation parameters and noise parameters as random variables that need to be estimated. We develop a reversible jump Markov chain Monte Carlo (MCMC) method to perform the necessary computations. We find that the results obtained using this method are not only better than those reported previously, but also appear to be robust with respect to the prior specification. In addition, we propose a novel and computationally efficient reversible jump MCMC simulated annealing algorithm to optimise neural networks. This algorithm enables us to maximise the joint posterior distribution of the network parameters and the number of basis functions. It performs a global search in the joint space of the parameters and number of parameters, thereby surmounting the problem of local minima. We show that by calibrating the full hierarchical ...
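The paper's reversible jump MCMC simulated annealing is too involved for a short sketch, but the underlying simulated-annealing idea, Metropolis moves under a decreasing temperature so the search can escape local minima, can be shown on a toy objective. The objective, cooling schedule, and step size below are illustrative assumptions, not the authors' algorithm.

```python
import math
import random

def simulated_annealing(energy, x0, n_steps=5000, step=0.5, t0=1.0, seed=1):
    """Minimise `energy` with Metropolis moves under a decreasing temperature."""
    rng = random.Random(seed)
    x, e = x0, energy(x0)
    best_x, best_e = x, e
    for k in range(n_steps):
        t = t0 / (1.0 + k)                        # simple 1/k cooling schedule
        cand = x + rng.gauss(0.0, step)
        e_cand = energy(cand)
        # Downhill moves are always accepted; uphill moves with prob exp(-dE/t).
        if e_cand < e or rng.random() < math.exp((e - e_cand) / t):
            x, e = cand, e_cand
            if e < best_e:
                best_x, best_e = x, e
    return best_x, best_e

# Illustrative multimodal objective: a double well with a sinusoidal ripple,
# whose global minimum sits near x = -2 (energy slightly below zero).
f = lambda x: (x * x - 4) ** 2 + 0.3 * math.sin(10 * x)
x_star, e_star = simulated_annealing(f, x0=-3.0)
```

Early high-temperature moves let the chain climb out of the ripple's local minima; as the temperature decays, the chain settles into the deepest basin it has found.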
Sequential Monte Carlo Methods For Optimisation Of Neural Network Models
, 1998
Cited by 10 (0 self)
Abstract: We discuss a novel strategy for training neural networks using sequential Monte Carlo algorithms and propose a new hybrid gradient descent/sampling importance resampling algorithm (HySIR). In terms of both computational time and accuracy, the hybrid SIR is a clear improvement over conventional sequential Monte Carlo techniques. The new algorithm may be viewed as a global optimisation strategy, which allows us to learn the probability distributions of the network weights and outputs in a sequential framework. It is well suited to applications involving online, nonlinear and non-Gaussian signal processing. We show how the new algorithm outperforms extended Kalman filter training on several problems. In particular, we address the problem of pricing option contracts traded in financial markets. In this context, we are able to estimate the one-step-ahead probability density functions of the option prices.
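A plain sampling importance resampling (bootstrap) particle filter, the sequential Monte Carlo baseline that HySIR extends with gradient-descent moves, can be sketched on a toy scalar state-space model. The model coefficients, noise levels, and particle count below are illustrative assumptions, not taken from the paper.

```python
import math
import random

def particle_filter(observations, n_particles=500, q=0.1, r=0.5, seed=2):
    """Bootstrap (SIR) filter for x_t = 0.9 x_{t-1} + N(0, q), y_t = x_t + N(0, r)."""
    rng = random.Random(seed)
    particles = [rng.gauss(0.0, 1.0) for _ in range(n_particles)]
    estimates = []
    for y in observations:
        # 1. Propagate each particle through the transition model.
        particles = [0.9 * x + rng.gauss(0.0, q) for x in particles]
        # 2. Weight by the Gaussian observation likelihood.
        weights = [math.exp(-0.5 * ((y - x) / r) ** 2) for x in particles]
        total = sum(weights)
        weights = [w / total for w in weights]
        # 3. Posterior-mean estimate, then multinomial resampling.
        estimates.append(sum(w * x for w, x in zip(weights, particles)))
        particles = rng.choices(particles, weights=weights, k=n_particles)
    return estimates

# Simulate a trajectory and noisy observations, then filter.
rng = random.Random(0)
xs, x = [], 0.0
for _ in range(100):
    x = 0.9 * x + rng.gauss(0.0, 0.1)
    xs.append(x)
ys = [x + rng.gauss(0.0, 0.5) for x in xs]
est = particle_filter(ys)
```

The filtered estimates track the hidden state with substantially less error than the raw observations; HySIR adds a gradient-descent move to each particle so the same machinery can train network weights online.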
On Bayesian model assessment and choice using cross-validation predictive densities
, 2001
Cited by 7 (7 self)
Abstract: We consider the problem of estimating the distribution of the expected utility of the Bayesian model (expected utility is also known as generalization error). We use cross-validation predictive densities to compute the expected utilities. We demonstrate that in flexible nonlinear models having many parameters, the importance-sampling approximated leave-one-out cross-validation (IS-LOO-CV) proposed in (Gelfand et al., 1992) may not work. We discuss how the reliability of the importance sampling can be evaluated, and in case there is reason to suspect its reliability, we suggest using predictive densities from k-fold cross-validation (k-fold-CV). We also note that k-fold-CV has to be used if the data points have certain dependencies. As the k-fold-CV predictive densities are based on slightly smaller data sets than the full data set, we use a bias correction proposed in (Burman, 1989) when computing the expected utilities. In order to assess the reliability of the estimated expected utilities, we suggest a quick and generic approach based on the Bayesian bootstrap for obtaining samples from the distributions of the expected utilities. Our main goal is to estimate how good (in terms of the application field) the predictive ability of the model is, but the distributions of the expected utilities can also be used for comparing different models. With the proposed method, it is easy to compute the probability that one method has better expected utility than some other method. If the predictive likelihood is used as a utility (instead ...
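The k-fold-CV predictive-density idea can be illustrated with a deliberately simple model: score each fold's points under a Gaussian fitted to the remaining folds, and average the log predictive densities. The model, fold count, and data below are illustrative; the Burman bias correction and Bayesian bootstrap steps from the paper are omitted.

```python
import math
import random
import statistics

def kfold_predictive_logdens(data, k=5):
    """Mean log predictive density of a Gaussian model under k-fold CV.

    Each fold's points are scored under a normal fitted to the other folds."""
    folds = [data[i::k] for i in range(k)]
    scores = []
    for i, test in enumerate(folds):
        train = [x for j, f in enumerate(folds) if j != i for x in f]
        mu, sigma = statistics.mean(train), statistics.stdev(train)
        for y in test:
            scores.append(-0.5 * math.log(2 * math.pi * sigma ** 2)
                          - 0.5 * ((y - mu) / sigma) ** 2)
    return sum(scores) / len(scores)

# For standard-normal data the expected value is about -0.5*log(2*pi) - 0.5 = -1.42.
rng = random.Random(3)
data = [rng.gauss(0.0, 1.0) for _ in range(200)]
score = kfold_predictive_logdens(data)
```

Because every point is scored by a model that never saw it, the average is an (slightly pessimistic, hence the paper's bias correction) estimate of out-of-sample predictive performance, the expected utility in the abstract's terminology.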
Consistency of Posterior Distributions for Neural Networks
 Neural Networks
, 1998
Cited by 5 (1 self)
Abstract: In this paper we show that the posterior distribution for feedforward neural networks is asymptotically consistent. This paper extends earlier results on the universal approximation properties of neural networks to the Bayesian setting. The proof of consistency embeds the problem in a density estimation problem, then uses bounds on the bracketing entropy to show that the posterior is consistent over Hellinger neighborhoods. It then relates this result back to the regression setting. We show consistency both in the setting of the number of hidden nodes growing with the sample size, and in the case where the number of hidden nodes is treated as a parameter. Thus we provide a theoretical justification for using neural networks for nonparametric regression in a Bayesian framework. Keywords: Bayesian statistics, asymptotic consistency, posterior approximation, nonparametric regression, sieve asymptotics, Hellinger distance, bracketing entropy. The author is indebted to Larry Wasserman for all ...
Bayesian neural network approaches to ovarian cancer identification from highresolution mass spectrometry data
 Bioinformatics
, 2005
Learning Hyperparameters for Neural Network Models Using Hamiltonian Dynamics
, 2000
Kiam Choo. MSc thesis, Department of Computer Science, University of Toronto, 2000
Cited by 3 (1 self)
Abstract: We consider a feedforward neural network model with hyperparameters controlling groups of weights. Given some training data, the posterior distribution of the weights and the hyperparameters can be obtained by alternately updating the weights with hybrid Monte Carlo and sampling the hyperparameters using Gibbs sampling. However, this method becomes slow for networks with large hidden layers. We address this problem by incorporating the hyperparameters into the hybrid Monte Carlo update. However, the region of state space under the posterior with large hyperparameters is huge and has low probability density, while the region with small hyperparameters is very small and has very high density. As hybrid Monte Carlo inherently does not move well between such regions, we reparameterize the weights to make the two ...
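The hybrid (Hamiltonian) Monte Carlo update at the heart of the thesis can be sketched in one dimension: simulate Hamiltonian dynamics with the leapfrog integrator, then apply a Metropolis correction for discretization error. The target density, step size, and trajectory length below are illustrative; the hyperparameter reparameterization the thesis develops is not shown.

```python
import math
import random

def hmc_sample(log_p, grad_log_p, x0, n_samples, eps=0.2, n_leapfrog=20, seed=4):
    """Hybrid Monte Carlo: leapfrog dynamics plus a Metropolis accept step."""
    rng = random.Random(seed)
    x, samples = x0, []
    for _ in range(n_samples):
        p = rng.gauss(0.0, 1.0)                      # resample the momentum
        x_new, p_new = x, p
        p_new += 0.5 * eps * grad_log_p(x_new)       # initial half momentum step
        for _ in range(n_leapfrog):
            x_new += eps * p_new                     # full position step
            p_new += eps * grad_log_p(x_new)         # full momentum step
        p_new -= 0.5 * eps * grad_log_p(x_new)       # undo the extra half step
        # Metropolis correction: accept with prob min(1, exp(H_old - H_new)).
        h_old = -log_p(x) + 0.5 * p * p
        h_new = -log_p(x_new) + 0.5 * p_new * p_new
        if math.log(rng.random()) < h_old - h_new:
            x = x_new
        samples.append(x)
    return samples

# Illustrative target: standard normal, so log p(x) = -x^2/2 up to a constant.
samples = hmc_sample(lambda x: -0.5 * x * x, lambda x: -x, x0=0.0, n_samples=3000)
```

Because each proposal follows the gradient for many leapfrog steps, successive samples are nearly independent; the thesis's difficulty arises when hyperparameters reshape this energy landscape between huge low-density and tiny high-density regions.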