Results 1–10 of 62
Clustering Using Objective Functions and Stochastic Search
2007
Cited by 29 (3 self)
Abstract:
Summary. A new approach to clustering multivariate data, based on a multilevel linear mixed model, is proposed. A key feature of the model is that observations from the same cluster are correlated, because they share cluster-specific random effects. The inclusion of cluster-specific random effects allows parsimonious departure from an assumed base model for cluster mean profiles. This departure is captured statistically via the posterior expectation, or best linear unbiased predictor. One of the parameters in the model is the true underlying partition of the data, and the posterior distribution of this parameter, which is known up to a normalizing constant, is used to cluster the data. The problem of finding partitions with high posterior probability is not amenable to deterministic methods such as the EM algorithm. Thus, we propose a stochastic search algorithm driven by a Markov chain that is a mixture of two Metropolis–Hastings algorithms: one that makes small-scale changes to individual objects and another that performs large-scale moves involving entire clusters. The methodology proposed is fundamentally different from the well-known finite mixture model approach to clustering, which does not explicitly include the partition as a parameter and involves an independent and identically distributed structure.
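The search strategy in this summary, a Markov chain mixing small-scale single-object moves with large-scale cluster-level moves, can be sketched on a toy objective. The score function, move mix, and all parameter values below are illustrative assumptions, not the paper's mixed-model posterior:

```python
import math
import random

def log_post(labels, data, penalty=2.0):
    """Toy log-posterior over partitions (a stand-in for the paper's
    model): negative within-cluster sum of squares, minus a penalty
    per occupied cluster."""
    clusters = {}
    for lab, x in zip(labels, data):
        clusters.setdefault(lab, []).append(x)
    score = 0.0
    for xs in clusters.values():
        mu = sum(xs) / len(xs)
        score -= sum((x - mu) ** 2 for x in xs)
    return score - penalty * len(clusters)

def mh_partition_search(data, k=4, iters=3000, p_small=0.8, seed=0):
    """Mixture of two symmetric Metropolis-Hastings kernels: a
    small-scale move relabelling one object, and a large-scale move
    reshuffling all members of two clusters."""
    rng = random.Random(seed)
    n = len(data)
    labels = [rng.randrange(k) for _ in range(n)]
    cur = log_post(labels, data)
    best_lab, best = labels[:], cur
    for _ in range(iters):
        prop = labels[:]
        if rng.random() < p_small:
            prop[rng.randrange(n)] = rng.randrange(k)   # small-scale move
        else:
            a, b = rng.sample(range(k), 2)              # large-scale move
            for i in range(n):
                if prop[i] in (a, b):
                    prop[i] = rng.choice((a, b))
        new = log_post(prop, data)
        if math.log(rng.random()) < new - cur:          # MH acceptance
            labels, cur = prop, new
            if cur > best:
                best_lab, best = labels[:], cur
    return best_lab, best

data = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]   # two well-separated 1-D groups
lab, score = mh_partition_search(data)
```

Tracking the best-scoring state visited is a convenience for this toy demonstration; the paper's algorithm is a sampler over the partition posterior, not a pure optimizer.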
Bayesian model-based clustering procedures
 Journal of Computational and Graphical Statistics
Cited by 22 (1 self)
Abstract:
This article establishes a general formulation for Bayesian model-based clustering, in which subset labels are exchangeable, and items are also exchangeable, possibly up to covariate effects. The notational framework is rich enough to encompass a variety of existing procedures, including some recently discussed methods involving stochastic search or hierarchical clustering, but more importantly allows the formulation of clustering procedures that are optimal with respect to a specified loss function. Our focus is on loss functions based on pairwise coincidences, that is, whether pairs of items are clustered into the same subset or not. Optimization of the posterior expected loss function can be formulated as a binary integer programming problem, which can be readily solved by standard software when clustering a modest number of items, but quickly becomes impractical as problem scale increases. To combat this, a new heuristic item-swapping algorithm is introduced. This performs well in our numerical experiments, on both simulated and real data examples. The article includes a comparison of the statistical performance of the (approximate) optimal clustering with earlier methods that are model-based but ad hoc in their detailed definition.
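The pairwise-coincidence loss and a first-improvement item-swapping pass can be sketched as follows, given a matrix of posterior co-clustering probabilities. The matrix values and the sweep schedule are illustrative, not the article's implementation:

```python
def expected_binder_loss(labels, pi):
    """Posterior expected pairwise-coincidence (Binder-type) loss:
    a pair placed together pays 1 - pi[i][j]; a pair split pays pi[i][j]."""
    n = len(labels)
    loss = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            loss += (1.0 - pi[i][j]) if labels[i] == labels[j] else pi[i][j]
    return loss

def item_swap(pi, k, sweeps=50):
    """Greedy item-swapping heuristic: move any item to a label that
    strictly reduces the expected loss; stop after a sweep with no change."""
    n = len(pi)
    labels = [i % k for i in range(n)]
    cur = expected_binder_loss(labels, pi)
    for _ in range(sweeps):
        changed = False
        for i in range(n):
            for lab in range(k):
                if lab == labels[i]:
                    continue
                old = labels[i]
                labels[i] = lab
                trial = expected_binder_loss(labels, pi)
                if trial < cur - 1e-12:
                    cur, changed = trial, True
                else:
                    labels[i] = old
        if not changed:
            break
    return labels

# hypothetical co-clustering probabilities for 4 items: {0,1} and {2,3}
pi_mat = [[1.0, 0.9, 0.1, 0.1],
          [0.9, 1.0, 0.1, 0.1],
          [0.1, 0.1, 1.0, 0.9],
          [0.1, 0.1, 0.9, 1.0]]
labels = item_swap(pi_mat, 2)
```

Recomputing the full loss after every trial move is O(n^2) per move; the article's point is precisely that such local search stays practical where exact binary integer programming does not.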
Bayesian model-based clustering procedures
 Journal of Computational and Graphical Statistics
2006
Cited by 12 (0 self)
Abstract:
This paper establishes a general framework for Bayesian model-based clustering, in which subset labels are exchangeable, and items are also exchangeable, possibly up to covariate effects. It is rich enough to encompass a variety of existing procedures, including some recently discussed methodologies involving stochastic search or hierarchical clustering, but more importantly allows the formulation of clustering procedures that are optimal with respect to a specified loss function. Our focus is on loss functions based on pairwise coincidences, that is, whether pairs of items are clustered into the same subset or not. Optimisation of the posterior expected loss function can be formulated as a binary integer programming problem, which can be readily solved, for example by the simplex method, when clustering a modest number of items, but quickly becomes impractical as problem scale increases. To combat this, a new heuristic item-swapping algorithm is introduced. This performs well in our numerical experiments, on both simulated and real data examples. The paper includes a comparison of the statistical performance of the (approximate) optimal clustering with earlier methods that are model-based but ad hoc in their detailed definition.
Population-based reversible jump Markov chain Monte Carlo
2007
Cited by 12 (3 self)
Abstract:
In this paper we present an extension of population-based Markov chain Monte Carlo (MCMC) to the trans-dimensional case. One of the main challenges in MCMC-based inference is that of simulating from high- and trans-dimensional target measures. In such cases, MCMC methods may not adequately traverse the support of the target, and the simulation results will be unreliable. We develop population methods to deal with such problems, and give a result proving the uniform ergodicity of these population algorithms under mild assumptions. This result is used to demonstrate the superiority, in terms of convergence rate, of a population transition kernel over a reversible jump sampler for a Bayesian variable selection problem. We also give an example of a population algorithm for a Bayesian multivariate mixture model with an unknown number of components. This is applied to gene expression data of 1000 data points in six dimensions, and it is demonstrated that our algorithm outperforms some competing Markov chain samplers.
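The population idea, several chains run at different temperatures with exchange moves between them, can be sketched as follows. The bimodal target, temperature ladder, and proposal scale are illustrative choices, and the trans-dimensional (reversible jump) machinery of the paper is omitted:

```python
import math
import random

def log_target(x):
    """Bimodal toy target: equal-weight normals at -3 and +3, unit sd
    (a stand-in for a multimodal posterior)."""
    a = -0.5 * (x + 3.0) ** 2
    b = -0.5 * (x - 3.0) ** 2
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))

def population_mcmc(iters=5000, betas=(1.0, 0.3, 0.1), seed=1):
    rng = random.Random(seed)
    xs = [0.0 for _ in betas]          # one state per tempered chain
    cold = []                          # samples from the beta = 1 chain
    for _ in range(iters):
        # local random-walk Metropolis update on each tempered chain
        for i, beta in enumerate(betas):
            prop = xs[i] + rng.gauss(0.0, 1.0)
            if math.log(rng.random()) < beta * (log_target(prop) - log_target(xs[i])):
                xs[i] = prop
        # exchange move between a random adjacent pair of chains
        i = rng.randrange(len(betas) - 1)
        logr = (betas[i] - betas[i + 1]) * (log_target(xs[i + 1]) - log_target(xs[i]))
        if math.log(rng.random()) < logr:
            xs[i], xs[i + 1] = xs[i + 1], xs[i]
        cold.append(xs[0])
    return cold

samples = population_mcmc()
```

The hot chains (small beta) see a flattened target and cross between modes freely; exchange moves then feed those crossings back to the cold chain, which is the mechanism behind the improved traversal the abstract describes.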
Gene Expression Time Course Clustering with Countably Infinite Hidden Markov Models
Cited by 12 (0 self)
Abstract:
Most existing approaches to clustering gene expression time course data treat the different time points as independent dimensions and are invariant to permutations, such as reversal, of the experimental time course. Approaches utilizing HMMs have been shown to be helpful in this regard, but are hampered by having to choose model architectures with appropriate complexities. Here we propose, for a clustering application, an HMM with a countably infinite state space; inference in this model is possible by recasting it in the hierarchical Dirichlet process (HDP) framework (Teh et al. 2006), and hence we call it the HDP-HMM. We show that the infinite model outperforms model selection methods over finite models, and traditional time-independent methods, as measured by a variety of external and internal indices for clustering on two large publicly available data sets. Moreover, we show that the infinite models utilize more hidden states and employ richer architectures (e.g. state-to-state transitions) without the damaging effects of overfitting.
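A minimal flavour of the countably infinite state space is the Chinese restaurant process, the partition prior underlying Dirichlet-process constructions such as the HDP. The sketch below is only a prior draw over state allocations, not the full HDP-HMM inference described above:

```python
import random

def crp(n, alpha, seed=0):
    """Chinese restaurant process draw: item i joins existing state k
    with probability counts[k] / (i + alpha), or opens a brand-new
    state with probability alpha / (i + alpha), so the number of
    states is unbounded a priori."""
    rng = random.Random(seed)
    counts = []   # occupancy of each state seen so far
    labels = []
    for i in range(n):
        r = rng.random() * (i + alpha)
        new = len(counts)             # default: open a new state
        acc = 0.0
        for k, c in enumerate(counts):
            acc += c
            if r < acc:
                new = k               # join existing state k
                break
        if new == len(counts):
            counts.append(1)
        else:
            counts[new] += 1
        labels.append(new)
    return labels

labels = crp(50, 1.0)
```

Larger alpha yields more occupied states; the "rich get richer" occupancy weighting is what lets the model use as many hidden states as the data support without a fixed architecture.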
Interacting Multiple Try Algorithms with Different Proposal Distributions
2010
Modeling and Visualizing Uncertainty in Gene Expression Clusters using Dirichlet Process Mixtures
2007
Cited by 9 (1 self)
Abstract:
Although the use of clustering methods has rapidly become one of the standard computational approaches in the literature of microarray gene expression data, little attention has been paid to uncertainty in the results obtained. Dirichlet process mixture models provide a nonparametric Bayesian alternative to the bootstrap approach to modeling uncertainty in gene expression clustering. Most previously published applications of Bayesian model-based clustering methods have been to short time series data. In this paper we present a case study of the application of nonparametric Bayesian clustering methods to the clustering of high-dimensional non-time-series gene expression data using full Gaussian covariances. We use the probability that two genes belong to the same cluster in a Dirichlet process mixture model as a measure of the similarity of their gene expression profiles. Conversely, this probability can be used to define a dissimilarity measure, which, for the purposes of visualization, can be input to one of the standard linkage algorithms used for hierarchical clustering. Biologically plausible results are obtained from the Rosetta compendium of expression profiles which extend previously published cluster analyses of these data.
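The co-clustering-probability-as-similarity idea can be sketched end to end: turn P(same cluster) into a dissimilarity and feed it to a linkage algorithm. The probability matrix and the naive average-linkage agglomeration below are illustrative, not the paper's implementation:

```python
def average_linkage(dist, k):
    """Naive average-linkage agglomeration of items 0..n-1 down to k
    clusters, given a symmetric dissimilarity matrix."""
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # average pairwise dissimilarity between the two groups
                d = sum(dist[i][j] for i in clusters[a] for j in clusters[b])
                d /= len(clusters[a]) * len(clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]   # merge closest pair
        del clusters[b]
    return clusters

# hypothetical posterior co-clustering probabilities for 4 genes
p_same = [[1.0, 0.9, 0.2, 0.1],
          [0.9, 1.0, 0.15, 0.1],
          [0.2, 0.15, 1.0, 0.8],
          [0.1, 0.1, 0.8, 1.0]]
# dissimilarity = 1 - P(i and j share a cluster)
dist = [[1.0 - p_same[i][j] for j in range(4)] for i in range(4)]
clusters = average_linkage(dist, 2)
```

In practice p_same is estimated by the fraction of MCMC samples in which genes i and j share a mixture component; any standard linkage routine can then consume the 1 - p matrix.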
Bayesian MAP Model Selection of Chain Event Graphs
2009
Cited by 5 (1 self)
Abstract:
The class of chain event graph models is a generalisation of the class of discrete Bayesian networks, retaining most of the structural advantages of the Bayesian network for model interrogation, propagation and learning, while more naturally encoding asymmetric state spaces and the order in which events happen. In this paper we demonstrate how, with complete sampling, conjugate closed-form model selection based on product Dirichlet priors is possible, and prove that suitable homogeneity assumptions characterise the product Dirichlet prior on this class of models. We demonstrate our techniques using two educational examples.
Posterior simulation across nonparametric models for functional clustering
 Journal of the Royal Statistical Society B
2007
Cited by 4 (2 self)
Abstract:
Summary. By choosing a species sampling random probability measure for the distribution of the basis coefficients, a general class of nonparametric Bayesian methods for clustering of functional data is developed. Allowing the basis functions to be unknown, one faces the problem of posterior simulation over a high-dimensional space of semiparametric models. To address this problem, we propose a novel Metropolis–Hastings algorithm for moving between models, with a nested generalized collapsed Gibbs sampler for updating the model parameters. Focusing on Dirichlet process priors for the distribution of the basis coefficients in multivariate linear spline models, we apply the approach to the problem of clustering of hormone trajectories. This approach allows the number of clusters and the shape of the trajectories within each cluster to be ...
Penalized Clustering of Large Scale Functional Data with Multiple Covariates
Cited by 4 (0 self)
Abstract:
In this article, we propose a penalized clustering method for large-scale data with multiple covariates through a functional data approach. In the proposed method, responses and covariates are linked together through nonparametric multivariate functions (fixed effects), which have great flexibility in modeling a variety of function features, such as jump points, branching, and periodicity. Functional ANOVA is employed to further decompose the multivariate functions in a reproducing kernel Hilbert space and provide associated notions of main effect and interaction. Parsimonious random effects are used to capture various correlation structures. The mixed-effect models are nested under a general mixture model, in which the heterogeneity of functional data is characterized. We propose a penalized Henderson's likelihood approach for model fitting and design a rejection-controlled EM algorithm for the estimation. Our method selects ...