Clustering using Monte Carlo Cross-Validation
, 1996
"... Finding the "right" number of clusters, k, for a data set is a difficult, and often illposed, problem. In a probabilistic clustering context, likelihoodratios, penalized likelihoods, and Bayesian techniques are among the more popular techniques. In this paper a new crossvalidated likeli ..."
Abstract

Cited by 66 (0 self)
Finding the "right" number of clusters, k, for a data set is a difficult, and often illposed, problem. In a probabilistic clustering context, likelihoodratios, penalized likelihoods, and Bayesian techniques are among the more popular techniques. In this paper a new crossvalidated likelihood criterion is investigated for determining cluster structure. A practical clustering algorithm based on Monte Carlo crossvalidation (MCCV) is introduced. The algorithm permits the data analyst to judge if there is strong evidence for a particular k, or perhaps weaker evidence over a subrange of k values. Experimental results with Gaussian mixtures on real and simulated data suggest that MCCV provides genuine insight into cluster structure. vfold crossvalidation appears inferior to the penalized likelihood method (BIC), a Bayesian algorithm (AutoClass v2.0), and the new MCCV algorithm. Overall, MCCV and AutoClass appear the most reliable of the methods. MCCV provides the dataminer with a usefu...
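As a concrete illustration of the recipe in this abstract (random train/test splits, fit a mixture on the training half, score held-out log-likelihood, average over splits), here is a minimal sketch. The univariate EM fitting routine, quantile-based initialization, and split parameters are assumptions of this sketch, not the paper's implementation:

```python
import numpy as np

def fit_gmm_1d(x, k, iters=200):
    """Minimal EM for a univariate Gaussian mixture (an illustrative
    stand-in for the probabilistic clustering model)."""
    w = np.full(k, 1.0 / k)
    mu = np.quantile(x, [(j + 0.5) / k for j in range(k)])  # spread-out init
    var = np.full(k, x.var())
    for _ in range(iters):
        # E-step: responsibilities r[i, j] = P(component j | x_i)
        logp = (np.log(w) - 0.5 * np.log(2 * np.pi * var)
                - (x[:, None] - mu) ** 2 / (2 * var))
        logp -= logp.max(axis=1, keepdims=True)
        r = np.exp(logp)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: weighted updates of weights, means, and variances
        n_j = r.sum(axis=0)
        w = n_j / len(x)
        mu = (r * x[:, None]).sum(axis=0) / n_j
        var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / n_j + 1e-6
    return w, mu, var

def avg_loglik(x, w, mu, var):
    """Mean per-point log-likelihood under a fitted mixture (log-sum-exp)."""
    logp = (np.log(w) - 0.5 * np.log(2 * np.pi * var)
            - (x[:, None] - mu) ** 2 / (2 * var))
    m = logp.max(axis=1, keepdims=True)
    return float(np.mean(m[:, 0] + np.log(np.exp(logp - m).sum(axis=1))))

def mccv_score(x, k, n_splits=20, train_frac=0.5, seed=1):
    """MCCV: average held-out log-likelihood over random train/test splits."""
    rng = np.random.default_rng(seed)
    scores = []
    for _ in range(n_splits):
        perm = rng.permutation(len(x))
        cut = int(train_frac * len(x))
        w, mu, var = fit_gmm_1d(x[perm[:cut]], k)
        scores.append(avg_loglik(x[perm[cut:]], w, mu, var))
    return float(np.mean(scores))

# Two well-separated groups: MCCV should favor k = 2 over k = 1.
rng = np.random.default_rng(42)
x = np.concatenate([rng.normal(-3.0, 1.0, 200), rng.normal(3.0, 1.0, 200)])
scores = {k: mccv_score(x, k) for k in (1, 2)}
```

On such clearly bimodal data the held-out score for k = 2 exceeds that for k = 1 by a wide margin, which is the kind of evidence the method is designed to surface.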
Estimating the integrated likelihood via posterior simulation using the harmonic mean identity
 Bayesian Statistics
, 2007
"... The integrated likelihood (also called the marginal likelihood or the normalizing constant) is a central quantity in Bayesian model selection and model averaging. It is defined as the integral over the parameter space of the likelihood times the prior density. The Bayes factor for model comparison a ..."
Abstract

Cited by 48 (2 self)
The integrated likelihood (also called the marginal likelihood or the normalizing constant) is a central quantity in Bayesian model selection and model averaging. It is defined as the integral over the parameter space of the likelihood times the prior density. The Bayes factor for model comparison and Bayesian testing is a ratio of integrated likelihoods, and the model weights in Bayesian model averaging are proportional to the integrated likelihoods. We consider the estimation of the integrated likelihood from posterior simulation output, aiming at a generic method that uses only the likelihoods from the posterior simulation iterations. The key is the harmonic mean identity, which says that the reciprocal of the integrated likelihood is equal to the posterior harmonic mean of the likelihood. The simplest estimator based on the identity is thus the harmonic mean of the likelihoods. While this is an unbiased and simulation-consistent estimator, its reciprocal can have infinite variance and so it is unstable in general. We describe two methods for stabilizing the harmonic mean estimator. In the first one, the parameter space is reduced in such a way that the modified estimator involves a harmonic mean of heavier-tailed densities, thus resulting in a finite variance estimator. The resulting ...
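The identity is easy to see in action on a toy conjugate model where the integrated likelihood is known in closed form. The model and all numbers below are assumptions of this sketch; the prior is deliberately chosen tighter than the likelihood so that the raw estimator's variance happens to be finite, sidestepping the instability the abstract warns about:

```python
import numpy as np

# Toy conjugate model (assumed for illustration):
#   theta ~ N(0, tau2),  y | theta ~ N(theta, s2)
# so the integrated likelihood is exact:  y ~ N(0, s2 + tau2).
y, s2, tau2 = 1.0, 4.0, 1.0

# Exact posterior N(m, v) stands in for MCMC output.
v = 1.0 / (1.0 / s2 + 1.0 / tau2)
m = v * y / s2
rng = np.random.default_rng(0)
theta = rng.normal(m, np.sqrt(v), size=200_000)

# Log-likelihood at each posterior draw -- the only input the method needs.
loglik = -0.5 * np.log(2 * np.pi * s2) - (y - theta) ** 2 / (2 * s2)

# Harmonic mean identity: 1 / p(y) = E_post[ 1 / p(y | theta) ],
# computed in log space (log-sum-exp) for numerical stability.
a = -loglik
amax = a.max()
log_marglik_hm = -(amax + np.log(np.mean(np.exp(a - amax))))

# Exact value for comparison.
log_marglik_exact = (-0.5 * np.log(2 * np.pi * (s2 + tau2))
                     - y ** 2 / (2 * (s2 + tau2)))
```

With these settings the estimate tracks the exact log integrated likelihood closely; swapping the prior and likelihood scales (tau2 > s2) reintroduces the infinite-variance pathology.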
Inference in model-based cluster analysis
, 1995
"... A new approach to cluster analysis has been introduced based on parsimonious geometric modelling of the withingroup covariance matrices in a mixture of multivariate normal distributions, using hierarchical agglomeration and iterative relocation. It works well and is widely used via the MCLUST softw ..."
Abstract

Cited by 31 (7 self)
A new approach to cluster analysis has been introduced based on parsimonious geometric modelling of the within-group covariance matrices in a mixture of multivariate normal distributions, using hierarchical agglomeration and iterative relocation. It works well and is widely used via the MCLUST software available in S-PLUS and StatLib. However, it has several limitations: there is no assessment of the uncertainty about the classification, the partition can be suboptimal, parameter estimates are biased, the shape matrix has to be specified by the user, prior group probabilities are assumed to be equal, the method for choosing the number of groups is based on a crude approximation, and no formal way of choosing between the various possible models is included. Here, we propose a new approach which overcomes all these difficulties. It consists of exact Bayesian inference via Gibbs sampling, and the calculation of Bayes factors (for choosing the model and the number of groups) from the output using the Laplace-Metropolis estimator. It works well in several real and simulated examples.
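The Laplace-Metropolis step mentioned at the end estimates the integrated likelihood by plugging posterior-center and covariance estimates read off the simulation output into the Laplace approximation. A minimal sketch on a toy conjugate model (the model and numbers are assumptions; with a Gaussian posterior the approximation is exact up to Monte Carlo error):

```python
import numpy as np

# Toy conjugate model (assumed for this sketch):
#   theta ~ N(0, tau2),  y | theta ~ N(theta, s2)  =>  y ~ N(0, s2 + tau2).
y, s2, tau2 = 1.0, 4.0, 1.0
v = 1.0 / (1.0 / s2 + 1.0 / tau2)                         # posterior variance
rng = np.random.default_rng(1)
draws = rng.normal(v * y / s2, np.sqrt(v), size=100_000)  # "Gibbs output"

# Laplace-Metropolis (d = 1 here):
#   log p(y) ~= (d/2) log(2*pi) + 0.5 log|Sigma|
#               + log p(y | theta*) + log p(theta*)
# with theta* a posterior-center estimate and Sigma the posterior
# covariance, both estimated from the simulation output.
theta_star = np.median(draws)        # robust estimate of the posterior mode
sigma2 = draws.var()
log_lik = -0.5 * np.log(2 * np.pi * s2) - (y - theta_star) ** 2 / (2 * s2)
log_prior = -0.5 * np.log(2 * np.pi * tau2) - theta_star ** 2 / (2 * tau2)
log_marglik_lm = (0.5 * np.log(2 * np.pi) + 0.5 * np.log(sigma2)
                  + log_lik + log_prior)

# Exact value for comparison.
log_marglik_exact = (-0.5 * np.log(2 * np.pi * (s2 + tau2))
                     - y ** 2 / (2 * (s2 + tau2)))
```

The median is used as the center estimate because it is cheap and robust for MCMC output; in higher dimensions Sigma becomes the sample covariance matrix of the draws.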
Representing Degree Distributions, Clustering, and Homophily in Social Networks With Latent Cluster Random Effects Models
, 2007
"... preparation of this paper. Social network data often involve transitivity, homophily on observed attributes, clustering, and heterogeneity of actors. We propose the latent cluster random effects model to take account of all of these features, and we describe a Bayesian estimation method. The model f ..."
Abstract

Cited by 20 (1 self)
Social network data often involve transitivity, homophily on observed attributes, clustering, and heterogeneity of actors. We propose the latent cluster random effects model to take account of all of these features, and we describe a Bayesian estimation method. The model fits two real datasets well. We show by simulation that networks with the same degree distribution can have very different clustering behaviors. This suggests that scale-free and small-world network models may not be adequate for all types of networks, while our model recovers both the clustering and the degree distribution.
A Bayesian approach to the selection and testing of mixture models
 Statistica Sinica
, 2001
"... Abstract: An important aspect of mixture modeling is the selection of the number of mixture components. In this paper, we discuss the Bayes factor as a selection tool. The discussion will focus on two aspects: computation of the Bayes factor and prior sensitivity. For the computation, we propose a v ..."
Abstract

Cited by 17 (4 self)
An important aspect of mixture modeling is the selection of the number of mixture components. In this paper, we discuss the Bayes factor as a selection tool. The discussion will focus on two aspects: computation of the Bayes factor and prior sensitivity. For the computation, we propose a variant of Chib’s estimator that accounts for the non-identifiability of the mixture components. To reduce the prior sensitivity of the Bayes factor, we propose to extend the model with a hyperprior. We further discuss the use of posterior predictive checks for examining the fit of the model. The ideas are illustrated by means of a psychiatric diagnosis example.
A novel approach for clustering proteomics data using Bayesian fast Fourier transform
 Bioinformatics
, 2005
"... * To Whom correspondence should be addressed ..."
Alternatives to the Gibbs Sampling Scheme
, 1992
"... A variation of the Gibbs sampling scheme is defined by driving the simulated Markov chain by the conditional distributions of an approximation to the posterior rather than the posterior distribution itself. Choosing a multivariate normal mixture form for the approximation enables reparametrization w ..."
Abstract

Cited by 9 (1 self)
A variation of the Gibbs sampling scheme is defined by driving the simulated Markov chain by the conditional distributions of an approximation to the posterior rather than the posterior distribution itself. Choosing a multivariate normal mixture form for the approximation enables reparametrization, which is crucial to improve convergence in the Gibbs sampler. Using an approximation to the posterior density also opens the possibility of including in the algorithm a learning process about the posterior density, which is unknown in the operational sense of evaluating posterior integrals. While ideally this should be done using available pointwise evaluations of the posterior density, this is too difficult in a general framework, and we use instead the currently available Monte Carlo sample to adjust the approximating density. This is done using a simple multivariate implementation of the mixture of Dirichlet density estimation algorithm. Keywords: Markov chain Monte Carlo, Bayesian sampling, stocha...
Bayesian Inference on Mixtures of Distributions
, 2008
"... This survey covers stateoftheart Bayesian techniques for the estimation of mixtures. It complements the earlier Marin et al. (2005) by studying new types of distributions, the multinomial, latent class and t distributions. It also exhibits closed form solutions for Bayesian inference in some disc ..."
Abstract

Cited by 9 (7 self)
This survey covers state-of-the-art Bayesian techniques for the estimation of mixtures. It complements the earlier Marin et al. (2005) by studying new types of distributions, namely the multinomial, latent class, and t distributions. It also exhibits closed-form solutions for Bayesian inference in some discrete setups. Finally, it sheds new light on the computation of Bayes factors via the approximation of Chib (1995).