Bayesian Analysis of Mixture Models with an Unknown Number of Components -- an alternative to reversible jump methods
, 1998
Cited by 41 (0 self)
Richardson and Green (1997) present a method of performing a Bayesian analysis of data from a finite mixture distribution with an unknown number of components. Their method is a Markov Chain Monte Carlo (MCMC) approach, which makes use of the "reversible jump" methodology described by Green (1995). We describe an alternative MCMC method which views the parameters of the model as a (marked) point process, extending methods suggested by Ripley (1977) to create a Markov birth-death process with an appropriate stationary distribution. Our method is easy to implement, even in the case of data in more than one dimension, and we illustrate it on both univariate and bivariate data.
Keywords: Bayesian analysis, Birth-death process, Markov process, MCMC, Mixture model, Model Choice, Reversible Jump, Spatial point process
1 Introduction
Finite mixture models are typically used to model data where each observation is assumed to have arisen from one of k groups, each group being suitably modelle...
Entropies and rates of convergence for maximum likelihood and Bayes estimation for mixtures of normal densities
- Ann. Statist
, 2001
Cited by 23 (9 self)
We study the rates of convergence of the maximum likelihood estimator (MLE) and posterior distribution in density estimation problems, where the densities are location or location-scale mixtures of normal distributions with the scale parameter lying between two positive numbers. The true density is also assumed to lie in this class with the true mixing distribution either compactly supported or having sub-Gaussian tails. We obtain bounds for Hellinger bracketing entropies for this class, and from these bounds, we deduce the convergence rates of (sieve) MLEs in Hellinger distance. The rate turns out to be (log n)^κ / √n, where κ ≥ 1 is a constant that depends on the type of mixtures and the choice of the sieve. Next, we consider a Dirichlet mixture of normals as a prior on the unknown density. We estimate the prior probability of a certain Kullback-Leibler type neighborhood and then invoke a general theorem that computes the posterior convergence rate in terms of the growth rate of the Hellinger entropy and the concentration rate of the prior. The posterior distribution is also seen to converge at the rate (log n)^κ / √n, where κ now depends on the tail behavior of the base measure of the Dirichlet process. 1. Introduction. A
Hypothesis Testing and Model Selection Via Posterior Simulation
- In Practical Markov Chain
, 1995
Cited by 21 (1 self)
Introduction. To motivate the methods described in this chapter, consider the following inference problem in astronomy (Soubiran, 1993). Until fairly recently, it has been believed that the Galaxy consists of two stellar populations, the disk and the halo. More recently, it has been hypothesized that there are in fact three stellar populations, the old (or thin) disk, the thick disk, and the halo, distinguished by their spatial distributions, their velocities, and their metallicities. These hypotheses have different implications for theories of the formation of the Galaxy. Some of the evidence for deciding whether there are two or three populations is shown in Figure 1, which shows radial and rotational velocities for n = 2,370 stars. A natural model for this situation is a mixture model with J components, namely $y_i = \sum_{j=1}^{J} \rho_j \ldots$
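As a rough illustration of what a mixture model with J components computes, the sketch below evaluates a weighted sum of component densities at a point. The abstract above is cut off before it names the component density, so the use of univariate normal components here is an assumption, and the function names `normal_pdf` and `mixture_pdf` are hypothetical.

```python
import math


def normal_pdf(y, mean, sd):
    """Density of a univariate normal distribution at y."""
    z = (y - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def mixture_pdf(y, weights, means, sds):
    """Density of a J-component mixture: sum_j weight_j * f(y | component j).

    Assumption: the components are normal; the source does not say so.
    """
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("mixture weights must sum to 1")
    return sum(w * normal_pdf(y, m, s) for w, m, s in zip(weights, means, sds))


# Example: a two-component mixture evaluated at y = 0.5.
density = mixture_pdf(0.5, weights=[0.7, 0.3], means=[0.0, 3.0], sds=[1.0, 0.5])
```

Choosing between J = 2 and J = 3, as in the stellar-population question, then amounts to comparing models of this form with different numbers of components.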
On Fitting Mixture Models
, 1999
Cited by 18 (4 self)
Consider the problem of fitting a finite Gaussian mixture, with an unknown number of components, to observed data. This paper proposes a new minimum description length (MDL) type criterion, termed MMDL (for mixture MDL), to select the number of components of the model. MMDL is based on the identification of an "equivalent sample size", for each component, which does not coincide with the full sample size. We also introduce an algorithm based on the standard expectation-maximization (EM) approach together with a new agglomerative step, called agglomerative EM (AEM). The experiments reported here show that MMDL outperforms existing criteria of comparable computational cost. The good behavior of AEM, namely its robustness with respect to initialization, is also illustrated experimentally.
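For orientation, the standard EM iteration that the abstract above builds on can be sketched as follows for a univariate Gaussian mixture with a fixed number of components. This is only the textbook EM update; it does not implement the MMDL criterion or the agglomerative step of AEM, whose details the abstract does not give. The function names and the variance floor of 1e-12 are assumptions made for this sketch.

```python
import math


def normal_pdf(y, mean, sd):
    """Density of a univariate normal distribution at y."""
    z = (y - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def log_likelihood(data, weights, means, sds):
    """Log-likelihood of the data under the current mixture parameters."""
    return sum(
        math.log(sum(w * normal_pdf(y, m, s) for w, m, s in zip(weights, means, sds)))
        for y in data
    )


def em_step(data, weights, means, sds):
    """One EM iteration: returns updated (weights, means, sds)."""
    k = len(weights)
    # E-step: responsibility of each component for each observation.
    resp = []
    for y in data:
        joint = [weights[j] * normal_pdf(y, means[j], sds[j]) for j in range(k)]
        total = sum(joint)
        resp.append([p / total for p in joint])
    # M-step: re-estimate parameters from responsibility-weighted data.
    new_w, new_m, new_s = [], [], []
    for j in range(k):
        nj = sum(r[j] for r in resp)
        mu = sum(r[j] * y for r, y in zip(resp, data)) / nj
        var = sum(r[j] * (y - mu) ** 2 for r, y in zip(resp, data)) / nj
        new_w.append(nj / len(data))
        new_m.append(mu)
        new_s.append(math.sqrt(max(var, 1e-12)))  # assumed floor to avoid sd = 0
    return new_w, new_m, new_s


# Example: two well-separated clusters, deliberately rough starting values.
data = [-2.1, -1.9, -2.0, 2.0, 2.1, 1.9]
params = ([0.5, 0.5], [-1.0, 1.0], [1.0, 1.0])
history = [log_likelihood(data, *params)]
for _ in range(20):
    params = em_step(data, *params)
    history.append(log_likelihood(data, *params))
```

Plain EM of this kind depends on its starting values, which is the sensitivity to initialization that the abstract says AEM is designed to reduce.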
Learning hybrid Bayesian networks from data
, 1998
Cited by 9 (1 self)
We illustrate two different methodologies for learning Hybrid Bayesian networks, that is, Bayesian networks containing both continuous and discrete variables, from data. The two methodologies differ in the way of handling continuous data when learning the Bayesian network structure. The first methodology uses discretized data to learn the Bayesian network structure, and the original non-discretized data for the parameterization of the learned structure. The second methodology uses non-discretized data both to learn the Bayesian network structure and its parameterization. For the direct handling of continuous data, we propose the use of artificial neural networks as probability estimators, to be used as an integral part of the scoring metric defined to search the space of Bayesian network structures. With both methodologies, we assume the availability of a complete dataset, with no missing values or hidden variables. We report experimental results aimed at comparing the two methodologies. These results provide evidence that learning with discretized data presents advantages both in terms of efficiency and in terms of accuracy of the learned models over the alternative approach of using non-discretized data.
Rates Of Convergence For The Gaussian Mixture Sieve
- The Annals of Statistics
, 2000
Cited by 8 (0 self)
Gaussian mixtures provide a convenient method of density estimation that lies somewhere between parametric models and kernel...
Perfect Slice Samplers for Mixtures of Distributions
, 1999
Cited by 7 (0 self)
This paper extends the result of Hobert et al. (1999) to the case of general finite mixtures of exponential family distributions, under conjugate priors, by proposing a different and more generic approach to the problem. The foundation of the technique used here relies on the facts that, under conjugate priors, the marginal posterior distribution of the latent variables Z is known in closed form, up to a constant, as exhibited and exploited for importance sampling in Casella et al. (1999), and that, moreover, minimum and maximum points can be found for this distribution. The results of Mira et al. (1999) on perfect slice samplers can then be adapted to this setting. While its practical implementation is limited to small sample sizes, we show that a coupling strategy of Breyer and Roberts (1999) can be easily implemented on the same principle for much larger sample sizes. The paper is organised as follows. In Section 2, we recall the marginalisation argument of Casella et al. (1999) to show that the posterior marginal distribution of Z is available in closed form and allows for a minimum and a maximum point in usual cases. Section 3 then provides a detailed description of the perfect slice sampling techniques, first via a brute-force uniform proposal, then through an augmented distribution. Section 4 introduces a slow but generic perfect sampler based on a single backward chain and evaluates the validity of the approximation of Section 3.
Bayesian Time Series Classification
- Advances in Neural Information Processing Systems 14
, 2002
Cited by 7 (4 self)
This paper proposes an approach to classifying adjacent segments of a time series as belonging to one of K classes. We use a hierarchical model that consists of a feature extraction stage and a generative classifier which is built on top of these features. Such two-stage approaches are often used in signal and image processing. The novel part of our work is that we link these stages probabilistically by using a latent feature space. Using one joint model is a Bayesian requirement, which has the advantage of fusing information according to its certainty.
Bayesian Computational Approaches to Model Selection
, 2000
Cited by 4 (1 self)
this paper was to provide a summary of the state-of-the-art theory on Bayesian model selection and the application of MCMC algorithms. It has been shown how applications of considerable complexity can be handled successfully within this framework. Several methods for dealing with the use of default, improper priors in the Bayesian model selection framework have been shown. Special care has been taken to pinpoint the subtleties of jumping from one parameter space to another, and in general, to show the construction of MCMC samplers in such scenarios. The focus in the paper was on the reversible jump MCMC algorithm as this is the most widely used of all existing methods; it is easy to use, flexible and has nice properties. Many references have been cited, with the emphasis being given to articles with signal processing applications. A Notation
Bayesian Inference for Mixtures of Stable Distributions
Cited by 2 (1 self)
In many different fields such as hydrology, telecommunications, physics of condensed matter and finance, the Gaussian model proves unsatisfactory and has difficulty fitting data with skewness, heavy tails and multimodality. The use of stable distributions allows for modelling skewness and heavy tails but gives rise to inferential problems related to the estimation of the stable distribution's parameters. The aim of this work is to generalise the stable distribution framework by introducing a model that also accounts for multimodality. In particular, we introduce a stable mixture model and a suitable reparameterisation of the mixture, which allow us to make inference on the mixture parameters. We use a fully Bayesian approach and MCMC simulation techniques for the estimation of the posterior distribution.

