Results 1 - 10
of
33
How many clusters? Which clustering method? Answers via model-based cluster analysis
- THE COMPUTER JOURNAL
, 1998
"... ..."
Model-Based Clustering, Discriminant Analysis, and Density Estimation
- JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION
, 2000
"... Cluster analysis is the automated search for groups of related observations in a data set. Most clustering done in practice is based largely on heuristic but intuitively reasonable procedures and most clustering methods available in commercial software are also of this type. However, there is little ..."
Abstract
-
Cited by 171 (23 self)
- Add to MetaCart
Cluster analysis is the automated search for groups of related observations in a data set. Most clustering done in practice is based largely on heuristic but intuitively reasonable procedures and most clustering methods available in commercial software are also of this type. However, there is little systematic guidance associated with these methods for solving important practical questions that arise in cluster analysis, such as \How many clusters are there?", "Which clustering method should be used?" and \How should outliers be handled?". We outline a general methodology for model-based clustering that provides a principled statistical approach to these issues. We also show that this can be useful for other problems in multivariate analysis, such as discriminant analysis and multivariate density estimation. We give examples from medical diagnosis, mineeld detection, cluster recovery from noisy data, and spatial density estimation. Finally, we mention limitations of the methodology, a...
An experimental comparison of several clustering and intialization methods
, 1998
"... We examine methods for clustering in high dimensions. In the first part of the paper, we perform an experimental comparison between three batch clustering algorithms: the Expectation–Maximization (EM) algorithm, a “winner take all ” version of the EM algorithm reminiscent of the K-means algorithm, a ..."
Abstract
-
Cited by 67 (0 self)
- Add to MetaCart
We examine methods for clustering in high dimensions. In the first part of the paper, we perform an experimental comparison between three batch clustering algorithms: the Expectation–Maximization (EM) algorithm, a “winner take all ” version of the EM algorithm reminiscent of the K-means algorithm, and model-based hierarchical agglomerative clustering. We learn naive-Bayes models with a hidden root node, using high-dimensional discrete-variable data sets (both real and synthetic). We find that the EM algorithm significantly outperforms the other methods, and proceed to investigate the effect of various initialization schemes on the final solution produced by the EM algorithm. The initializations that we consider are (1) parameters sampled from an uninformative prior, (2) random perturbations of the marginal distribution of the data, and (3) the output of hierarchical agglomerative clustering. Although the methods are substantially different, they lead to learned models that are strikingly similar in quality. 1
A Unified Framework for Model-based Clustering
- Journal of Machine Learning Research
, 2003
"... Model-based clustering techniques have been widely used and have shown promising results in many applications involving complex data. This paper presents a unified framework for probabilistic model-based clustering based on a bipartite graph view of data and models that highlights the commonaliti ..."
Abstract
-
Cited by 43 (6 self)
- Add to MetaCart
Model-based clustering techniques have been widely used and have shown promising results in many applications involving complex data. This paper presents a unified framework for probabilistic model-based clustering based on a bipartite graph view of data and models that highlights the commonalities and differences among existing model-based clustering algorithms. In this view, clusters are represented as probabilistic models in a model space that is conceptually separate from the data space. For partitional clustering, the view is conceptually similar to the ExpectationMaximization (EM) algorithm. For hierarchical clustering, the graph-based view helps to visualize critical/important distinctions between similarity-based approaches and model-based approaches.
MCLUST: Software for Model-based Cluster Analysis
- Journal of Classification
, 1999
"... MCLUST is a software package for cluster analysis written in Fortran and interfaced to the S-PLUS commercial software package1. It implements parameterized Gaussian hierarchical clustering algorithms [16, 1, 7] and the EM algorithm for parameterized Gaussian mixture models [5, 13, 3, 14] with the po ..."
Abstract
-
Cited by 39 (16 self)
- Add to MetaCart
MCLUST is a software package for cluster analysis written in Fortran and interfaced to the S-PLUS commercial software package1. It implements parameterized Gaussian hierarchical clustering algorithms [16, 1, 7] and the EM algorithm for parameterized Gaussian mixture models [5, 13, 3, 14] with the possible addition of a Poisson noise term. MCLUST also includes functions that combine hierarchical clustering, EM and the Bayesian Information Criterion (BIC) in a comprehensive clustering strategy [4, 8]. Methods of this type have shown promise in a number of practical applications, including character recognition [16], tissue segmentation [1], mine eld and seismic fault detection [4], identi cation of textile aws from images [2], and classi cation of astronomical data [3, 15]. Aweb page with related links can be found at
Hierarchical Latent Class Models for Cluster Analysis
- Journal of Machine Learning Research
, 2002
"... Latent class models are used for cluster analysis of categorical data. Underlying such a model is the assumption that the observed variables are mutually independent given the class variable. A serious problem with the use of latent class models, known as local dependence, is that this assumption is ..."
Abstract
-
Cited by 34 (9 self)
- Add to MetaCart
Latent class models are used for cluster analysis of categorical data. Underlying such a model is the assumption that the observed variables are mutually independent given the class variable. A serious problem with the use of latent class models, known as local dependence, is that this assumption is often untrue. In this paper we propose hierarchical latent class models as a framework where the local dependence problem can be addressed in a principled manner. We develop a search-based algorithm for learning hierarchical latent class models from data. The algorithm is evaluated using both synthetic and real-world data.
MCLUST: Software for Model-Based Cluster and Discriminant Analysis
, 1998
"... - k ) , (1) where x represents the data, and k is an integer subscript specifying a particular cluster. Clusters are ellipsoidal, centered at the means k . The covariances # k determine their other geometric features. # Funded by the O#ce of Naval Research under contracts N00014-96-1-0192 an ..."
Abstract
-
Cited by 29 (1 self)
- Add to MetaCart
- k ) , (1) where x represents the data, and k is an integer subscript specifying a particular cluster. Clusters are ellipsoidal, centered at the means k . The covariances # k determine their other geometric features. # Funded by the O#ce of Naval Research under contracts N00014-96-1-0192 and N00014-96-1-0330. 1 MathSoft, Inc., Seattle, WA USA --- http://www.mathsoft.com/splus 2 see http://lib.stat.cmu.edu/R/CRAN 1 Each covariance matrix is parameterized by eigenvalue decomposition in the form # k = # k D k A k D T k , where D k is the orthogonal matrix of eigenvectors, A k is a diagonal matrix whose elements are proportional to the eigenvalues of # k , and # k is a scalar. The orie
AE: MCLUST Version 3 for R: Normal Mixture Modeling and Model-Based Clustering
- Department of Statistics, University of Washington
, 2006
"... MCLUST is a contributed R package for normal mixture modeling and model-based clustering. It provides functions for parameter estimation via the EM algorithm for normal mixture models with a variety of covariance structures, and functions for simulation from these models. Also included are functions ..."
Abstract
-
Cited by 25 (1 self)
- Add to MetaCart
MCLUST is a contributed R package for normal mixture modeling and model-based clustering. It provides functions for parameter estimation via the EM algorithm for normal mixture models with a variety of covariance structures, and functions for simulation from these models. Also included are functions that combine model-based hierarchical clustering, EM for mixture estimation and the Bayesian Information Criterion (BIC) in comprehensive strategies for clustering, density estimation and discriminant analysis. There is additional functionality for displaying and visualizing the models along with clustering and classification results. A number of features of the software have been changed in this version, and the functionality has been expanded to include regularization for normal mixture models via a Bayesian prior. MCLUST is licensed by the University of Washington and distributed through
Model-Based Hierarchical Clustering
- In Proc. 16th Conf. Uncertainty in Artificial Intelligence
, 2000
"... We present an approach to model-based hierarchical clustering by formulating an objective function based on a Bayesian analysis. This model organizes the data into a cluster hierarchy while specifying a complex feature-set partitioning that is a key component of our model. Features can have ei ..."
Abstract
-
Cited by 21 (0 self)
- Add to MetaCart
We present an approach to model-based hierarchical clustering by formulating an objective function based on a Bayesian analysis. This model organizes the data into a cluster hierarchy while specifying a complex feature-set partitioning that is a key component of our model. Features can have either a unique distribution in every cluster or a common distribution over some (or even all) of the clusters. The cluster subsets over which these features have such a common distribution correspond to the nodes (clusters) of the tree representing the hierarchy. We apply this general model to the problem of document clustering for which we use a multinomial likelihood function and Dirichlet priors. Our algorithm consists of a two-stage process wherein we first perform a flat clustering followed by a modified hierarchical agglomerative merging process that includes determining the features that will have common distributions over the merged clusters. The regularization induced...

