Results 1-10 of 11
On Bayesian analysis of mixtures with an unknown number of components
 Journal of the Royal Statistical Society, Series B
, 1997
Marginal likelihood from the Gibbs output
 J. Am. Stat. Assoc
, 1995
Cited by 324 (19 self)
Efficient approximations for the marginal likelihood of Bayesian networks with hidden variables
 Machine Learning
, 1997
Abstract
Cited by 178 (10 self)
We discuss Bayesian methods for learning Bayesian networks when data sets are incomplete. In particular, we examine asymptotic approximations for the marginal likelihood of incomplete data given a Bayesian network. We consider the Laplace approximation and the less accurate but more efficient BIC/MDL approximation. We also consider approximations proposed by Draper (1993) and Cheeseman and Stutz (1995). These approximations are as efficient as BIC/MDL, but their accuracy has not been studied in any depth. We compare the accuracy of these approximations under the assumption that the Laplace approximation is the most accurate. In experiments using synthetic data generated from discrete naive Bayes models having a hidden root node, we find that (1) the BIC/MDL measure is the least accurate, having a bias in favor of simple models, and (2) the Draper and CS measures are the most accurate.
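The BIC/MDL approximation discussed in this abstract has a simple closed form: log p(D | M) ≈ log p(D | θ̂) − (d/2) log n, where θ̂ is the maximum-likelihood estimate and d the number of free parameters. A minimal sketch, applied to a Bernoulli model of our own choosing rather than anything from the paper:

```python
import math

def log_likelihood_bernoulli(heads, n, p):
    """Log-likelihood of observing `heads` successes in `n` Bernoulli trials."""
    return heads * math.log(p) + (n - heads) * math.log(1 - p)

def bic_log_marginal(heads, n):
    """BIC/MDL approximation to the log marginal likelihood:
    log p(D | M) ~= log p(D | theta_hat) - (d / 2) * log(n),
    with d = 1 free parameter for the Bernoulli model."""
    p_hat = heads / n          # maximum-likelihood estimate
    d = 1                      # number of free parameters
    return log_likelihood_bernoulli(heads, n, p_hat) - 0.5 * d * math.log(n)

approx = bic_log_marginal(heads=7, n=10)
```

The penalty term (d/2) log n is what produces the bias toward simple models that the experiments report: it grows with the parameter count but ignores the curvature information that the Laplace approximation retains.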
Dealing with label switching in mixture models
 Journal of the Royal Statistical Society, Series B
, 2000
Abstract
Cited by 109 (0 self)
In a Bayesian analysis of finite mixture models, parameter estimation and clustering are sometimes less straightforward than might be expected. In particular, the common practice of estimating parameters by their posterior mean, and summarising joint posterior distributions by marginal distributions, often leads to nonsensical answers. This is due to the so-called “label-switching” problem, which is caused by symmetry in the likelihood of the model parameters. A frequent response to this problem is to remove the symmetry using artificial identifiability constraints. We demonstrate that this fails in general to solve the problem, and describe an alternative class of approaches, relabelling algorithms, which arise from attempting to minimise the posterior expected loss under a class of loss functions. We describe in detail one particularly simple and general relabelling algorithm, and illustrate its success in dealing with the label-switching problem on two examples.
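A toy version of the relabelling idea can be sketched in a few lines. The actual algorithms iterate to minimise a posterior expected loss; the sketch below simply matches each MCMC draw to a fixed reference draw, and all data and names are illustrative:

```python
from itertools import permutations

def relabel(draws):
    """Greedy relabelling: permute each draw's component labels to minimise
    squared distance to a reference (here, the first draw). This is a toy
    version of loss-minimising relabelling; the algorithms in the paper use
    a decision-theoretic loss and iterate to a fixed point."""
    reference = draws[0]
    k = len(reference)
    relabelled = []
    for draw in draws:
        best = min(permutations(range(k)),
                   key=lambda perm: sum((draw[perm[j]] - reference[j]) ** 2
                                        for j in range(k)))
        relabelled.append([draw[j] for j in best])
    return relabelled

# Two draws of k=2 component means; the second has switched labels.
draws = [[-1.0, 1.0], [0.9, -1.1]]
print(relabel(draws))  # second draw is flipped back: [[-1.0, 1.0], [-1.1, 0.9]]
```

Exhaustive search over permutations is only feasible for small k; for larger k the assignment is typically solved with the Hungarian algorithm.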
Bayesian Analysis of Mixture Models with an Unknown Number of Components: an alternative to reversible jump methods
, 1998
Abstract
Cited by 62 (0 self)
Richardson and Green (1997) present a method of performing a Bayesian analysis of data from a finite mixture distribution with an unknown number of components. Their method is a Markov Chain Monte Carlo (MCMC) approach, which makes use of the "reversible jump" methodology described by Green (1995). We describe an alternative MCMC method which views the parameters of the model as a (marked) point process, extending methods suggested by Ripley (1977) to create a Markov birth-death process with an appropriate stationary distribution. Our method is easy to implement, even in the case of data in more than one dimension, and we illustrate it on both univariate and bivariate data.
Keywords: Bayesian analysis, Birth-death process, Markov process, MCMC, Mixture model, Model choice, Reversible jump, Spatial point process
1 Introduction
Finite mixture models are typically used to model data where each observation is assumed to have arisen from one of k groups, each group being suitably modelle...
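A heavily simplified sketch of the birth-death mechanism: below, births occur at a constant rate and each component dies at a constant per-component rate, so the number of components has a Poisson stationary distribution. The actual method makes each component's death rate depend on a likelihood ratio, tying the chain to the data; that part is omitted here, and all rates are illustrative:

```python
import random

def birth_death_simulation(birth_rate, death_rate, steps, seed=0):
    """Gillespie-style simulation of a toy birth-death process on the number
    of mixture components k. Births occur at constant rate `birth_rate`; each
    existing component dies at rate `death_rate`, so the stationary
    distribution of k is Poisson(birth_rate / death_rate). Returns the
    time-averaged number of components."""
    rng = random.Random(seed)
    k, t, weighted_sum = 1, 0.0, 0.0
    for _ in range(steps):
        total_rate = birth_rate + death_rate * k
        dt = rng.expovariate(total_rate)      # time until the next jump
        weighted_sum += k * dt                # time-weighted occupancy of k
        t += dt
        if rng.random() < birth_rate / total_rate:
            k += 1                            # birth: add a component
        else:
            k -= 1                            # death: remove a component
    return weighted_sum / t

mean_k = birth_death_simulation(birth_rate=3.0, death_rate=1.0, steps=50_000)
```

With birth rate 3 and unit death rate, the long-run average of k settles near 3, which is the check the test below performs.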
Issues in Bayesian Analysis of Neural Network Models
, 1998
Abstract
Cited by 31 (0 self)
This paper discusses these issues exploring the potentiality of Bayesian ideas in the analysis of NN models. Buntine and Weigend (1991) and MacKay (1992) have provided frameworks for their Bayesian analysis based on Gaussian approximations, and Neal (1993) has applied hybrid Monte Carlo ideas. Ripley (1993) and Cheng and Titterington (1994) have dwelt on the power of these ideas, especially as far as interpretation and architecture selection are concerned. See MacKay (1995) for a recent review. From a statistical modeling point of view, NN's are a special instance of mixture models. Many issues about posterior multimodality and computational strategies in NN modeling are of relevance in the wider class of mixture models. Related recent references in the Bayesian literature on mixture models include Diebolt and Robert (1994), Escobar and West (1994), Robert and Mengersen (1995), Roeder and Wasserman (1995), West (1994), West and Cao (1993), West, Muller and Escobar (1994), and West and Turner (1994). We concentrate on approximation problems, though many of our suggestions can be translated to other areas. For those problems, NN's are viewed as highly nonlinear (semiparametric) approximators, where parameters are typically estimated by least squares. Applications of interest for practitioners include nonlinear regression, stochastic optimisation and regression metamodels for simulation output. The main issue we address here is how to undertake a Bayesian analysis of a NN model, and the uses of it we may make. Our contributions include: an evaluation of computational approaches to Bayesian analysis of NN models, including a novel Markov chain Monte Carlo scheme; a suggestion of a scheme for handling a variable architecture model and a scheme for combining NN models with more ...
Bayesian regularization for normal mixture estimation and model-based clustering
, 2005
Abstract
Cited by 27 (4 self)
Normal mixture models are widely used for statistical modeling of data, including cluster analysis. However, maximum likelihood estimation (MLE) for normal mixtures using the EM algorithm may fail as the result of singularities or degeneracies. To avoid this, we propose replacing the MLE by a maximum a posteriori (MAP) estimator, also found by the EM algorithm. For choosing the number of components and the model parameterization, we propose a modified version of BIC, where the likelihood is evaluated at the MAP instead of the MLE. We use a highly dispersed proper conjugate prior, containing a small fraction of one observation’s worth of information. The resulting method avoids degeneracies and singularities, but when these are not present it gives similar results to the standard method using MLE, EM and BIC.
Key words: BIC; EM algorithm; mixture models; model-based clustering; conjugate prior; posterior mode.
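The effect of the MAP estimator on a degenerate component can be illustrated with a one-dimensional variance update. The hyperparameters below are illustrative placeholders, not the paper's recommended prior:

```python
def map_variance(values, mean, prior_scale=0.01, prior_df=3.0):
    """MAP estimate of a normal component's variance under an inverse-gamma
    style conjugate prior (prior_scale and prior_df are illustrative
    hyperparameters, not the paper's exact choices). The prior terms keep
    the estimate strictly positive even when the component has collapsed
    onto a single point, where the MLE would be zero (a singularity)."""
    n = len(values)
    sum_sq = sum((x - mean) ** 2 for x in values)
    return (prior_scale + sum_sq) / (prior_df + n)

# A component that has collapsed onto one observation:
mle = 0.0                                # the MLE variance is exactly zero
map_est = map_variance([2.0], mean=2.0)  # stays strictly positive
```

When the component holds many well-spread points, `sum_sq` and `n` dominate and the MAP estimate is close to the MLE, which is why the method matches the standard approach when no degeneracy is present.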
Dealing With Multimodal Posteriors and Non-Identifiability in Mixture Models
, 1999
Abstract
Cited by 3 (0 self)
In a Bayesian analysis of finite mixture models, the lack of identifiability of the parameters often leads to a posterior distribution which is highly multimodal and symmetric, making it difficult to interpret or summarize. A common approach to this problem is to make the parameters identifiable by imposing artificial constraints. We demonstrate that this may fail to solve the problem, and describe and illustrate an alternative solution which involves post-processing the results of a Markov Chain Monte Carlo (MCMC) scheme. Our method can be viewed either as a method of searching for a reasonable summary of the posterior distribution, or as a method of revising the prior distribution.
KEYWORDS: Bayesian, Classification, Clustering, Identifiability, MCMC, Mixture model, Multimodal posterior
1 Introduction
In this paper we consider problems which arise when taking a Bayesian approach to classification and clustering using mixture models. We consider the setting where we have observation...
Bayesian Model Selection for M/G/1 Queues
Abstract
We describe several issues concerning model selection in M/G/1 queues, that is, systems in which a service unit attends customers arriving following a Poisson process and demanding service according to a general distribution. The main issue here is with this service distribution. We first consider some traditional models. Since any positive, continuous distribution may be well approximated by a mixture of gamma distributions, we suggest using them to model service distributions. We describe a computational scheme for performing inference with those mixtures, including a discussion concerning prior modelling and model selection issues. An application to modelling an email service is provided.
KEYWORDS: Queueing models, M/G/1, Mixtures of gammas, Markov chain Monte Carlo, Model selection, email models.
1 Introduction
We are concerned here with queueing models; see Allen (1990) or Nelson (1995). Stemming from MacGrath et al (1987), MacGrath and Singpurwalla (1987), and Lehoczky (1990),...
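A minimal sketch of a gamma-mixture service distribution, assuming given weights, shapes and rates (all function names and numerical values here are illustrative, not from the paper):

```python
import math
import random

def gamma_mixture_pdf(x, weights, shapes, rates):
    """Density of a finite mixture of gamma distributions,
    f(x) = sum_i w_i * Gamma(x; shape_i, rate_i). Any positive continuous
    service-time distribution can be approximated arbitrarily well by such
    a mixture, which motivates its use for the M/G/1 service law."""
    total = 0.0
    for w, a, b in zip(weights, shapes, rates):
        total += w * (b ** a) * x ** (a - 1) * math.exp(-b * x) / math.gamma(a)
    return total

def sample_service_time(weights, shapes, rates, rng=random):
    """Draw one service time: pick a component, then draw from its gamma."""
    u, acc = rng.random(), 0.0
    for w, a, b in zip(weights, shapes, rates):
        acc += w
        if u <= acc:
            return rng.gammavariate(a, 1.0 / b)  # gammavariate takes scale = 1/rate
    return rng.gammavariate(shapes[-1], 1.0 / rates[-1])

density = gamma_mixture_pdf(1.0, [0.5, 0.5], [1.0, 2.0], [1.0, 1.0])
```

An equal-weight mixture of an exponential (shape 1) and a Gamma(2, 1) component already produces a non-monotone density, something no single exponential service model can capture.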
Posterior Simulation for Feed Forward Neural Network Models
Abstract
research. However, it leads to difficult computational problems, stemming from non-normality and multimodality of posterior distributions, which hinder the use of methods like Laplace integration, Gaussian quadrature and Monte Carlo importance sampling. Multimodality issues have pervaded discussions in neural network research, see e.g. Ripley (1993), and are relevant as well for mixture models, see West, Muller and Escobar (1994) and Crawford (1994), of which FFNN's are a special case. There are three main reasons for multimodality of posteriors in FFNN's. The first one is symmetries due to relabeling; we mitigate this problem by introducing appropriate inequality constraints among parameters. The second, and most worrisome, is the inclusion of several copies of the same term, in our case, terms with the same γ vector. Node duplication may actually be viewed as a manifestation of model mixing. The third one is inherent
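The inequality-constraint fix for relabeling symmetry can be sketched as a canonical ordering of hidden units; representing a unit as a plain list of input weights, and sorting on the first weight, are our simplifications for illustration:

```python
def enforce_ordering(hidden_units):
    """Break the relabeling symmetry of a feed-forward network by imposing an
    inequality constraint: hidden units are sorted by their first input
    weight, so every permutation of the same units maps to one canonical
    representative. Representing a unit as a list of weights is our
    illustrative simplification."""
    return sorted(hidden_units, key=lambda unit: unit[0])

# Two hidden units given in an arbitrary order:
units = [[0.5, 1.0], [-0.2, 3.0]]
canonical = enforce_ordering(units)  # [[-0.2, 3.0], [0.5, 1.0]]
```

Note that this addresses only the first source of multimodality listed in the abstract; duplicated nodes with identical weight vectors still yield an unidentifiable model, which no ordering constraint can repair.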