Unsupervised learning of finite mixture models
 IEEE Transactions on pattern analysis and machine intelligence
, 2002
Cited by 271 (20 self)
Abstract: This paper proposes an unsupervised algorithm for learning a finite mixture model from multivariate data. The adjective "unsupervised" is justified by two properties of the algorithm: 1) it is capable of selecting the number of components and 2) unlike the standard expectation-maximization (EM) algorithm, it does not require careful initialization. The proposed method also avoids another drawback of EM for mixture fitting: the possibility of convergence toward a singular estimate at the boundary of the parameter space. The novelty of our approach is that we do not use a model selection criterion to choose one among a set of pre-estimated candidate models; instead, we seamlessly integrate estimation and model selection in a single algorithm. Our technique can be applied to any type of parametric mixture model for which it is possible to write an EM algorithm; in this paper, we illustrate it with experiments involving Gaussian mixtures. These experiments testify for the good performance of our approach. Index Terms: Finite mixtures, unsupervised learning, model selection, minimum message length criterion, Bayesian methods, expectation-maximization algorithm, clustering.
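For orientation, the standard EM baseline that this abstract contrasts itself with can be sketched as follows. This is a minimal illustration for a one-dimensional Gaussian mixture with a fixed number of components; it is not the paper's algorithm, which also selects the number of components. The function names, the `var_floor` guard against singular estimates, and the synthetic data are all assumptions made for this example.

```python
import math
import random

def normal_pdf(x, mean, var):
    """Density of a univariate normal distribution."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def em_gmm_1d(data, means, variances, weights, n_iter=100, var_floor=1e-6):
    """Standard EM for a 1-D Gaussian mixture with a fixed number of components.

    The initial values must be supplied by the caller, which is exactly the
    sensitivity to initialization that the abstract refers to.
    """
    means, variances, weights = list(means), list(variances), list(weights)
    k, n = len(means), len(data)
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each point.
        resp = []
        for x in data:
            joint = [weights[j] * normal_pdf(x, means[j], variances[j]) for j in range(k)]
            total = max(sum(joint), 1e-300)  # guard against numerical underflow
            resp.append([p / total for p in joint])
        # M-step: weighted updates of weights, means and variances.
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)  # avoid division by zero
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            # Crude floor (an assumption of this sketch) to keep the variance
            # away from the singular boundary of the parameter space.
            variances[j] = max(var, var_floor)
    return means, variances, weights

def demo_data(seed=0, n=200):
    """Synthetic sample: n draws from N(0, 1) followed by n draws from N(5, 1)."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n)] + [rng.gauss(5.0, 1.0) for _ in range(n)]
```

Calling `em_gmm_1d(demo_data(), [-1.0, 6.0], [1.0, 1.0], [0.5, 0.5])` fits two components from a reasonable starting point; a poor starting point or a wrong number of components is where plain EM tends to struggle.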
Model-Based Clustering, Discriminant Analysis, and Density Estimation
 Journal of the American Statistical Association
, 2000
Cited by 270 (24 self)
Cluster analysis is the automated search for groups of related observations in a data set. Most clustering done in practice is based largely on heuristic but intuitively reasonable procedures and most clustering methods available in commercial software are also of this type. However, there is little systematic guidance associated with these methods for solving important practical questions that arise in cluster analysis, such as "How many clusters are there?", "Which clustering method should be used?" and "How should outliers be handled?". We outline a general methodology for model-based clustering that provides a principled statistical approach to these issues. We also show that this can be useful for other problems in multivariate analysis, such as discriminant analysis and multivariate density estimation. We give examples from medical diagnosis, minefield detection, cluster recovery from noisy data, and spatial density estimation. Finally, we mention limitations of the methodology, a...
Practical Bayesian Density Estimation Using Mixtures Of Normals
 Journal of the American Statistical Association
, 1995
Cited by 115 (2 self)
this paper, we propose some solutions to these problems. Our goal is to come up with a simple, practical method for estimating the density. This is an interesting problem in its own right, as well as a first step towards solving other inference problems, such as providing more flexible distributions in hierarchical models. To see why the posterior is improper under the usual reference prior, we write the model in the following way. Let $Z = (Z_1, \ldots, Z_n)$ and $X = (X_1, \ldots, X_n)$. The Z
Computational and Inferential Difficulties With Mixture Posterior Distributions
 Journal of the American Statistical Association
, 1999
Cited by 112 (12 self)
This paper deals with both exploration and interpretation problems related to posterior distributions for mixture models. The specification of mixture posterior distributions means that the presence of k! modes is known immediately. Standard Markov chain Monte Carlo techniques usually have difficulties with well-separated modes such as occur here; the Markov chain Monte Carlo sampler stays within a neighbourhood of a local mode and fails to visit other equally important modes. We show that exploration of these modes can be imposed on the Markov chain Monte Carlo sampler using tempered transitions based on Langevin algorithms. However, as the prior distribution does not distinguish between the different components, the posterior mixture distribution is symmetric and thus standard estimators such as posterior means cannot be used. Since this is also true for most nonsymmetric priors, we propose alternatives for Bayesian inference for permutation invariant posteriors, including a cluster...
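The k! modes mentioned in this abstract come from the fact that a mixture likelihood does not change when the component labels are permuted. The sketch below checks this numerically for a small Gaussian mixture. It is an illustration only: the function names and the example data and parameters are assumptions, not taken from the paper.

```python
import math
from itertools import permutations

def normal_pdf(x, mean, var):
    """Density of a univariate normal distribution."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def mixture_loglik(data, components):
    """Log-likelihood of data under a mixture.

    components is a list of (weight, mean, variance) tuples, one per label.
    """
    return sum(
        math.log(sum(w * normal_pdf(x, m, v) for (w, m, v) in components))
        for x in data
    )

def loglik_over_relabellings(data, components):
    """Evaluate the log-likelihood under every permutation of the labels."""
    return [mixture_loglik(data, list(p)) for p in permutations(components)]

# Hypothetical three-component example: 3! = 6 relabellings of one parameter set.
example_data = [-1.0, 0.2, 4.8, 9.5, 10.1]
example_components = [(0.2, 0.0, 1.0), (0.3, 5.0, 2.0), (0.5, 10.0, 1.5)]
values = loglik_over_relabellings(example_data, example_components)
```

All six entries of `values` agree up to floating-point rounding. With a prior that treats the components symmetrically, the posterior inherits the same invariance, so every mode has k! equivalent copies.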
Markov Chain Monte Carlo methods and the label switching problem in Bayesian mixture modelling
 Statistical Science
Cited by 51 (4 self)
Abstract. In the past ten years there has been a dramatic increase of interest in the Bayesian analysis of finite mixture models. This is primarily because of the emergence of Markov chain Monte Carlo (MCMC) methods. While MCMC provides a convenient way to draw inference from complicated statistical models, there are many, perhaps underappreciated, problems associated with the MCMC analysis of mixtures. The problems are mainly caused by the nonidentifiability of the components under symmetric priors, which leads to so-called label switching in the MCMC output. This means that ergodic averages of component-specific quantities will be identical and thus useless for inference. We review the solutions to the label switching problem, such as artificial identifiability constraints, relabelling algorithms and label invariant loss functions. We also review various MCMC sampling schemes that have been suggested for mixture models and discuss posterior sensitivity to prior specification.
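The simplest of the remedies listed in this abstract, an artificial identifiability constraint, can be sketched as a post-processing step that reorders the components of each draw so that the means are ascending. This is a toy illustration under stated assumptions: the draw format of (weight, mean) tuples, the function names, and the two hand-written draws are hypothetical and do not come from the paper.

```python
def relabel_by_mean(draw):
    """Impose the ordering constraint mean_1 < mean_2 < ... on one draw.

    draw is a list of (weight, mean) tuples, one per component label.
    """
    return sorted(draw, key=lambda comp: comp[1])

def average_means(draws):
    """Ergodic average of the component means, label by label."""
    k = len(draws[0])
    return [sum(d[j][1] for d in draws) / len(draws) for j in range(k)]

# Two hypothetical MCMC draws of the same mixture with the labels swapped.
raw_draws = [
    [(0.4, 0.0), (0.6, 5.0)],
    [(0.6, 5.0), (0.4, 0.0)],
]
relabelled_draws = [relabel_by_mean(d) for d in raw_draws]

raw_average = average_means(raw_draws)                 # [2.5, 2.5]
relabelled_average = average_means(relabelled_draws)   # [0.0, 5.0]
```

Before relabelling, both labels average to the same value, which is the uninformative behaviour the abstract describes. After relabelling, the averages separate. The abstract also names relabelling algorithms and label invariant loss functions as alternatives to this kind of constraint.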
Issues in Bayesian Analysis of Neural Network Models
, 1998
Cited by 31 (0 self)
This paper discusses these issues exploring the potentiality of Bayesian ideas in the analysis of NN models. Buntine and Weigend (1991) and MacKay (1992) have provided frameworks for their Bayesian analysis based on Gaussian approximations and Neal (1993) has applied hybrid Monte Carlo ideas. Ripley (1993) and Cheng and Titterington (1994) have dwelt on the power of these ideas, especially as far as interpretation and architecture selection are concerned. See MacKay (1995) for a recent review. From a statistical modeling point of view NNs are a special instance of mixture models. Many issues about posterior multimodality and computational strategies in NN modeling are of relevance in the wider class of mixture models. Related recent references in the Bayesian literature on mixture models include Diebolt and Robert (1994), Escobar and West (1994), Robert and Mengersen (1995), Roeder and Wasserman (1995), West (1994), West and Cao (1993), West, Muller and Escobar (1994), and West and Turner (1994). We concentrate on approximation problems, though many of our suggestions can be translated to other areas. For those problems, NNs are viewed as highly nonlinear (semiparametric) approximators, where parameters are typically estimated by least squares. Applications of interest for practitioners include nonlinear regression, stochastic optimisation and regression metamodels for simulation output. The main issue we address here is how to undertake a Bayesian analysis of an NN model, and the uses of it we may make. Our contributions include: an evaluation of computational approaches to Bayesian analysis of NN models, including a novel Markov chain Monte Carlo scheme; a suggestion of a scheme for handling a variable architecture model and a scheme for combining NN models with more ...
On Fitting Mixture Models
, 1999
Cited by 22 (4 self)
Consider the problem of fitting a finite Gaussian mixture, with an unknown number of components, to observed data. This paper proposes a new minimum description length (MDL) type criterion, termed MMDL (for mixture MDL), to select the number of components of the model. MMDL is based on the identification of an "equivalent sample size", for each component, which does not coincide with the full sample size. We also introduce an algorithm based on the standard expectation-maximization (EM) approach together with a new agglomerative step, called agglomerative EM (AEM). The experiments here reported have shown that MMDL outperforms existing criteria of comparable computational cost. The good behavior of AEM, namely its good robustness with respect to initialization, is also illustrated experimentally.
Bayesian estimation of switching ARMA models
 Journal of Econometrics
, 1996
Cited by 20 (4 self)
Switching ARMA processes have recently appeared as an efficient modelling to nonlinear time series models, because they can represent multiple or heterogeneous dynamics through simple components. The levels of dependence between the observations are double: at a first level, the parameters of the model (AR, MA or ARMA) are selected by a Markovian procedure. At a second level, the next observation is generated according to a standard time series model. When the model involves a moving average structure, the complexity of the resulting likelihood function is such that simulation techniques, like those proposed by Shephard (1994) and Billio and Monfort (1995), are necessary to derive an inference on the parameters of the model. We propose in this paper a Bayesian approach with a noninformative prior distribution developed in Mengersen and Robert (1996) and Robert and Titterington (1996) in the setup of mixtures of distributions and hidden Markov models, respectively. The computation of th...
Flexible Parametric Measurement Error Models
 Biometrics
, 1999
Cited by 11 (2 self)
SUMMARY. Inferences in measurement error models can be sensitive to modeling assumptions. Specifically, if the model is incorrect, the estimates can be inconsistent. To reduce sensitivity to modeling assumptions and yet still retain the efficiency of parametric inference, we propose using flexible parametric models that can accommodate departures from standard parametric models. We use mixtures of normals for this purpose. We study two cases in detail: a linear errors-in-variables model and a change-point Berkson model.
Bayesian finite mixtures with an unknown number of components: the allocation sampler
 University of Glasgow
, 2005
Cited by 10 (1 self)
A new Markov chain Monte Carlo method for the Bayesian analysis of finite mixture distributions with an unknown number of components is presented. The sampler is characterized by a state space consisting only of the number of components and the latent allocation variables. Its main advantage is that it can be used, with minimal changes, for mixtures of components from any parametric family, under the assumption that the component parameters can be integrated out of the model analytically. Artificial and real data sets are used to illustrate the method and mixtures of univariate and of multivariate normals are explicitly considered. The problem of label switching, when parameter inference is of interest, is addressed in a postprocessing stage.