## Clustering ensembles: Models of consensus and weak partitions (2005)

### Download Links

- [dataclustering.cse.msu.edu]
- [www.cse.msu.edu]
- DBLP

Venue: IEEE Transactions on Pattern Analysis and Machine Intelligence

Citations: 46 (3 self)

### BibTeX

```bibtex
@ARTICLE{Topchy05clusteringensembles,
  author  = {Alexander Topchy and Anil K. Jain and William Punch},
  title   = {Clustering ensembles: Models of consensus and weak partitions},
  journal = {IEEE Transactions on Pattern Analysis and Machine Intelligence},
  year    = {2005},
  volume  = {27},
  pages   = {1866--1881}
}
```


### Abstract

Clustering ensembles have emerged as a powerful method for improving both the robustness and the stability of unsupervised classification solutions. However, finding a consensus clustering from multiple partitions is a difficult problem that can be approached from graph-based, combinatorial, or statistical perspectives. This study extends previous research on clustering ensembles in several respects. First, we introduce a unified representation for multiple clusterings and formulate the corresponding categorical clustering problem. Second, we propose a probabilistic model of consensus using a finite mixture of multinomial distributions in a space of clusterings. A combined partition is found as a solution to the corresponding maximum likelihood problem using the EM algorithm. Third, we define a new consensus function that is related to the classical intra-class variance criterion using the generalized mutual information definition. Finally, we demonstrate the efficacy of combining partitions generated by weak clustering algorithms that use data projections and random data splits. A simple explanatory model is offered for the behavior of combinations of such weak clustering components. Combination accuracy is analyzed as a function of several parameters that control the power and resolution of component partitions as well as the number of partitions. We also analyze clustering ensembles with incomplete information and the effect of missing cluster labels on the quality of overall consensus. Experimental results demonstrate the effectiveness of the proposed methods on several real-world datasets.
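The mixture-of-multinomials consensus described in the abstract can be sketched directly: each object is represented by its vector of cluster labels across the ensemble, and a finite mixture of multinomials over those label vectors is fit with EM. The code below is an illustrative reconstruction, not the authors' implementation; the function name `em_consensus`, the first-partition initialization heuristic, and the smoothing constant are all our assumptions.

```python
def em_consensus(Y, n_clusters, n_iter=50):
    """Consensus clustering via a finite mixture of multinomials (sketch).

    Y[i][j] is the label object i received from partition j. Each mixture
    component m has weight alpha[m] and, per partition j, a multinomial
    theta[m][j] over that partition's labels. Labels never need aligning
    across partitions, so the label-correspondence problem never arises.
    """
    n, H = len(Y), len(Y[0])
    M = n_clusters
    n_labels = [max(Y[i][j] for i in range(n)) + 1 for j in range(H)]
    # Heuristic soft initialization from the first partition's labels
    # (an assumption of this sketch, chosen for deterministic behavior).
    resp = [[0.9 if Y[i][0] % M == m else 0.1 / max(M - 1, 1)
             for m in range(M)] for i in range(n)]
    for _ in range(n_iter):
        # M-step: mixing weights and smoothed multinomial parameters.
        alpha = [sum(resp[i][m] for i in range(n)) / n for m in range(M)]
        theta = [[[1e-6] * n_labels[j] for j in range(H)] for m in range(M)]
        for i in range(n):
            for m in range(M):
                for j in range(H):
                    theta[m][j][Y[i][j]] += resp[i][m]
        for m in range(M):
            for j in range(H):
                s = sum(theta[m][j])
                theta[m][j] = [t / s for t in theta[m][j]]
        # E-step: responsibility of component m for object i is
        # proportional to alpha[m] * prod_j theta[m][j][Y[i][j]].
        for i in range(n):
            for m in range(M):
                p = alpha[m]
                for j in range(H):
                    p *= theta[m][j][Y[i][j]]
                resp[i][m] = p
            s = sum(resp[i]) or 1.0
            resp[i] = [p / s for p in resp[i]]
    # Consensus label = most responsible component per object.
    return [max(range(M), key=lambda m: resp[i][m]) for i in range(n)]
```

Because each component keeps a separate multinomial per input partition, a test ensemble in which one partition uses flipped label names still groups correctly; the component simply learns that partition's own label distribution.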

### Citations

8089 | Maximum likelihood from incomplete data via the EM algorithm
- Dempster, Laird, et al.
- 1977
Citation Context ...istribution corresponds to a cluster in the target consensus partition, and is assumed to be a multivariate multinomial distribution. The maximum likelihood problem is solved using the EM algorithm [8]. There are several advantages to QMI and EM consensus functions. These include: (i) complete avoidance of solving the label correspondence problem, (ii) low computational complexity, and (iii) abilit...

2152 | Algorithms for Clustering Data
- Jain, Dubes
- 1988
Citation Context ...mining, e.g., see review in [41]. Several recent independent studies [10, 12, 14, 15, 43, 47] have pioneered clustering ensembles as a new branch in the conventional taxonomy of clustering algorithms [26, 27]. Please see the Appendix for detailed review of the related work, including [7, 11, 16, 19, 28, 31, 35]. The problem of clustering combination can be defined generally as follows: given multiple clus...

1631 | Experiments with a new boosting algorithm
- Freund, Schapire
- 1996
Citation Context ...olutions reduce the variance component of the expected error rate and increase the robustness of the solution. From the supervised case we also learn that the proper combination of weak classifiers [32, 25, 18, 6] may achieve arbitrarily low error rates on training data, as well as reduce the predictive error. One can expect that using many simple, but computationally inexpensive components will be preferred t...

1300 | Data clustering: A review
- Jain, Murty, et al.
- 1999
Citation Context ...mining, e.g., see review in [41]. Several recent independent studies [10, 12, 14, 15, 43, 47] have pioneered clustering ensembles as a new branch in the conventional taxonomy of clustering algorithms [26, 27]. Please see the Appendix for detailed review of the related work, including [7, 11, 16, 19, 28, 31, 35]. The problem of clustering combination can be defined generally as follows: given multiple clus...

794 | A fast and high quality multilevel scheme for partitioning irregular graphs
- Karypis, Kumar
- 1998
Citation Context ...ns. We have investigated the performance of a family of consensus functions based on categorical clustering including the co-association based hierarchical methods [15, 16, 17], hypergraph algorithms [47, 29, 30] and our new consensus functions. Combination accuracy is analyzed as a function of the number and the resolution of the clustering components. In addition, we study clustering performance when some c...
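Of the consensus families named in this context, the co-association approach is the simplest to illustrate: count how often each pair of objects lands in the same cluster across the ensemble, then cluster that similarity matrix hierarchically. Below is a minimal average-linkage sketch with invented names (`coassociation_consensus`); the quadratic-memory matrix and greedy merge loop are fine only for toy data, not a faithful copy of any cited algorithm.

```python
def coassociation_consensus(partitions, n_clusters):
    """Evidence-accumulation sketch: co-association + average-linkage merging.

    partitions: list of label lists over the same n objects.
    co[a][b] = fraction of partitions placing objects a and b together.
    """
    n = len(partitions[0])
    H = len(partitions)
    co = [[sum(p[a] == p[b] for p in partitions) / H for b in range(n)]
          for a in range(n)]
    clusters = [[i] for i in range(n)]

    def sim(ca, cb):
        # Average co-association between two candidate clusters.
        return sum(co[a][b] for a in ca for b in cb) / (len(ca) * len(cb))

    while len(clusters) > n_clusters:
        # Greedily merge the most co-associated pair of clusters.
        i, j = max(((i, j) for i in range(len(clusters))
                    for j in range(i + 1, len(clusters))),
                   key=lambda ij: sim(clusters[ij[0]], clusters[ij[1]]))
        clusters[i].extend(clusters[j])
        del clusters[j]

    labels = [0] * n
    for c_id, members in enumerate(clusters):
        for m in members:
            labels[m] = c_id
    return labels
```

Because only co-occurrence counts are used, the input partitions may name their clusters arbitrarily (even with different numbers of clusters) and the consensus is unchanged.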

643 | Knowledge acquisition via incremental conceptual clustering - Fisher - 1987

601 | On the Optimality of the Simple Bayesian Classifier under Zero-One Loss
- Domingos, Pazzani
- 1997
Citation Context ...e clusters in πC are much less sensitive to the conditional independence approximation than the estimated values of probabilities P(y | Θ), as supported by the analysis of naïve Bayes classifier in [9]. The last ingredient of the mixture model is the choice of a probability density P(y | θ) for the components of the vectors yi. Since the variables yij take...

562 | Automatic subspace clustering of high dimensional data for data mining applications
- Agrawal, Gehrke, et al.
- 1998
Citation Context ...he availability of prior information about the data domain is crucial for successful clustering, though such information can be hard to obtain, even from experts. Identification of relevant subspaces [2] or visualization [24] may help to establish the sample data’s conformity to the underlying distributions or, at least, to the proper number of clusters. The exploratory nature of clustering tasks dem...

396 | Cluster ensembles — a knowledge reuse framework for combining multiple partitions
- Strehl, Ghosh
Citation Context ...e functions. Fusion of clusterings using multiple sources of data or features becomes increasingly important in distributed data mining, e.g., see review in [41]. Several recent independent studies [10, 12, 14, 15, 43, 47] have pioneered clustering ensembles as a new branch in the conventional taxonomy of clustering algorithms [26, 27]. Please see the Appendix for detailed review of the related work, including [7, 11, ...

355 | The Random Subspace Method for Constructing Decision Forests
- Ho
- 1998
Citation Context ...olutions reduce the variance component of the expected error rate and increase the robustness of the solution. From the supervised case we also learn that the proper combination of weak classifiers [32, 25, 18, 6] may achieve arbitrarily low error rates on training data, as well as reduce the predictive error. One can expect that using many simple, but computationally inexpensive components will be preferred t...

334 | An analysis of Bayesian classifiers
- Langley, Iba, et al.
- 1992
Citation Context ...nt clustering algorithms (indexed by j) are not truly independent, the approximation by product in Eq. (5) can be justified by the excellent performance of naive Bayes classifiers in discrete domains [34]. Our ultimate goal is to make a discrete label assignment to the data in X through an indirect route of density estimation of Y. The assignments of patterns to the clusters in πC are much less sensit...

278 | Arcing classifiers
- Breiman
- 1998
Citation Context ...olutions reduce the variance component of the expected error rate and increase the robustness of the solution. From the supervised case we also learn that the proper combination of weak classifiers [32, 25, 18, 6] may achieve arbitrarily low error rates on training data, as well as reduce the predictive error. One can expect that using many simple, but computationally inexpensive components will be preferred t...

271 | Bagging, boosting, and C4.5 - Quinlan - 1996

268 | Unsupervised learning of finite mixture models
- Figueiredo, Jain
- 2001
Citation Context ...ssume that the target number of clusters is predetermined. It should be noted, however, that mixture model in unsupervised classification greatly facilitates estimation of the true number of clusters [13]. Maximum likelihood formulation of the problem specifically allows us to estimate M by using additional objective functions during the inference, such as the minimum description length of the model. ...

227 | Latent Variable Models and Factor Analysis
- Bartholomew
- 1987
Citation Context ...ective functions during the inference, such as the minimum description length of the model. In addition, the proposed consensus algorithm can be viewed as a version of Latent Class Analysis (e.g. see [4]), which has rigorous statistical means for quantifying plausibility of a candidate mixture model. Whereas the finite mixture model may not be valid for the patterns in the original space (the initial...

188 | Data Clustering
- Jain, Murty, et al.
- 1999
Citation Context ...mance of different consensus functions. We have investigated the performance of a family of consensus functions based on categorical clustering including the co-association based hierarchical methods [15, 16, 17], hypergraph algorithms [47, 29, 30] and our new consensus functions. Combination accuracy is analyzed as a function of the number and the resolution of the clustering components. In addition, we stud...

184 | Supervised learning from incomplete data via an EM approach
- Ghahramani, Jordan
- 1994
Citation Context ...rmation can occur in clustering combination of distributed data or ensemble of clusterings of non-identical replicas of a dataset. It is possible to apply the EM algorithm in the case of missing data [20], namely missing cluster labels for some of the data points. In these situations, each vector yi in Y can be split into observed and missing components yi = (yi^obs, yi^mis). Incorporation of a miss...

169 | Measures of Association for Cross Classification
- Goodman, Kruskal
- 1954
Citation Context ...orrectly predicted both with the knowledge of clustering πC and without it. The category utility function can also be written as Goodman-Kruskal index for the contingency table between two partitions [22, 39]. The overall utility of the partition πC with respect to all the partitions in Π can be measured as the sum of pair-wise agreements: U(πC, Π) = Σ_{i=1}^{H} U(πC, πi). Therefore, the best median...
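The category utility appearing in this context has a compact computable form (the Gluck-Corter form: U(πC, πi) = Σr p(Cr) Σk p(Lk|Cr)² − Σk p(Lk)²), and the overall utility is just its sum over ensemble members. The sketch below uses our own function names and is an illustration of that formula, not the paper's code.

```python
from collections import Counter

def category_utility(consensus, partition):
    """U(pi_C, pi_i) in the Gluck-Corter form:
    sum_r p(C_r) * sum_k p(L_k | C_r)^2  -  sum_k p(L_k)^2.
    consensus and partition are equal-length label lists."""
    n = len(consensus)
    by_cluster = {}
    for c, l in zip(consensus, partition):
        by_cluster.setdefault(c, []).append(l)
    # Baseline: probability of guessing labels without the consensus.
    base = sum((v / n) ** 2 for v in Counter(partition).values())
    u = 0.0
    for labels in by_cluster.values():
        p_c = len(labels) / n
        cond = sum((v / len(labels)) ** 2 for v in Counter(labels).values())
        u += p_c * cond
    return u - base

def overall_utility(consensus, ensemble):
    """U(pi_C, Pi) = sum_i U(pi_C, pi_i); the best consensus maximizes this."""
    return sum(category_utility(consensus, p) for p in ensemble)
```

For two identical binary partitions with permuted label names the utility is 0.5, and for statistically independent partitions it is 0, matching the interpretation of "extra labels correctly predicted" given in the snippet.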

164 | Support vector clustering
- Ben-Hur, Horn, et al.
- 2001
Citation Context ...rtitions). It is somewhat reminiscent of classification approaches based on kernel methods which rely on linear discriminant functions in the transformed space. For example, Support Vector Clustering [5] seeks spherical clusters after the kernel transformation that corresponds to more complex cluster shapes in the original pattern space. 4 Information-Theoretic Consensus of Clusterings Another candid...

101 | Bagging to improve the accuracy of a clustering procedure
- Dudoit, Fridlyand
Citation Context ...e functions. Fusion of clusterings using multiple sources of data or features becomes increasingly important in distributed data mining, e.g., see review in [41]. Several recent independent studies [10, 12, 14, 15, 43, 47] have pioneered clustering ensembles as a new branch in the conventional taxonomy of clustering algorithms [26, 27]. Please see the Appendix for detailed review of the related work, including [7, 11, ...

96 | Data clustering using evidence accumulation
- Fred, Jain
- 2002
Citation Context ...e functions. Fusion of clusterings using multiple sources of data or features becomes increasingly important in distributed data mining, e.g., see review in [41]. Several recent independent studies [10, 12, 14, 15, 43, 47] have pioneered clustering ensembles as a new branch in the conventional taxonomy of clustering algorithms [26, 27]. Please see the Appendix for detailed review of the related work, including [7, 11, ...

93 | An impossibility theorem for clustering
- Kleinberg
- 2002
Citation Context ...t to supervised classification, clustering is inherently an ill-posed problem, whose solution violates at least one of the common assumptions about scale-invariance, richness, and cluster consistency [33]. Different clustering solutions may seem equally plausible without a priori knowledge about the underlying data distributions. Every clustering algorithm implicitly or explicitly assumes a certain da...

92 | Automated Construction of Classifications: Conceptual Clustering Versus Numerical Taxonomy
- Michalski, Stepp
- 1983
Citation Context ...of components (features) as well as analyze various sample size issues. Perhaps the main advantage of this representation is that it facilitates the use of known algorithms for categorical clustering [37, 48] and allows one to design new consensus heuristics in a transparent way. The extended representation of data X can be illustrated by a table with N rows and (d+H) columns: The consensus clustering is ...

84 | Quantification method of classification process, concept of structural α-entropy, Kybernetika 3
- Havrda, Charvát
- 1967
Citation Context ...will reduce the mutual information criterion to the category utility function discussed before. We proceed from the generalized entropy of degree s for a discrete probability distribution P = (p1, …, pn) [23]: H^s(P) = (2^{1−s} − 1)^{−1} (Σ_{i=1}^{n} p_i^s − 1), s > 0, s ≠ 1. Shannon’s entropy is the limit form of Eq. (18): lim_{s→1} H^s(P) = −Σ_{i=1}^{n} p_i log₂ p_i. Generalized mutual information between ...
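The generalized entropy in this snippet (Eq. (18) of the paper) is easy to check numerically; the function names below are our own. Setting s = 2 yields the quadratic entropy 2(1 − Σ p_i²), the form that links the mutual-information criterion to the category utility function, while s → 1 recovers Shannon entropy.

```python
import math

def generalized_entropy(p, s):
    """Havrda-Charvat generalized entropy of degree s:
    H^s(P) = (2^(1-s) - 1)^(-1) * (sum_i p_i^s - 1),  s > 0, s != 1."""
    assert s > 0 and s != 1
    return (sum(pi ** s for pi in p) - 1) / (2 ** (1 - s) - 1)

def shannon_entropy(p):
    """Limit of H^s(P) as s -> 1: -sum_i p_i log2 p_i."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)
```

For example, the uniform two-outcome distribution gives H²(P) = 1 bit-equivalent, a point mass gives 0, and evaluating at s slightly above 1 approximates the Shannon value.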

84 | A monte carlo algorithm for fast projective clustering
- Procopiuc, Jones, et al.
- 2002
Citation Context ... are an excellent source of clustering diversity that provides different views of the data. Projective clustering is an active topic in data mining. For example, algorithms such as CLIQUE [2] and DOC [42] can discover both useful projections as well as data clusters. Here, however, we are only concerned with the use of random projections for the purpose of clustering combination. Figure 3. Depe...
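The "random projection" weak clusterer this context alludes to is small enough to sketch: project the data onto a random direction and run a tiny 1-D k-means, repeating with fresh directions to generate a diverse ensemble. The names `random_projection_partition` and `weak_ensemble`, the 2-D restriction, and the fixed iteration count are assumptions of this sketch, not the paper's exact procedure.

```python
import math
import random

def random_projection_partition(points, k, rng):
    """One weak partition: project 2-D points onto a random direction,
    then run a small fixed-iteration 1-D k-means on the projections."""
    theta = rng.uniform(0.0, math.pi)
    d = (math.cos(theta), math.sin(theta))
    x = [px * d[0] + py * d[1] for px, py in points]
    centers = rng.sample(x, k)
    labels = [0] * len(x)
    for _ in range(25):
        # Assign each projection to its nearest 1-D center, then update.
        labels = [min(range(k), key=lambda c: abs(xi - centers[c])) for xi in x]
        for c in range(k):
            members = [xi for xi, l in zip(x, labels) if l == c]
            if members:
                centers[c] = sum(members) / len(members)
    return labels

def weak_ensemble(points, k, n_partitions, seed=0):
    """Generate an ensemble of weak partitions from random projections."""
    rng = random.Random(seed)
    return [random_projection_partition(points, k, rng) for _ in range(n_partitions)]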

75 | Dimensionality reduction using genetic algorithms
- Raymer, Punch, et al.
(Show Context)
Citation Context ...true number of clusters. Moreover, the clustering error obtained by EM and MCLA algorithms with k=4 for “Biochemistry” data [1] was the same as found by supervised classifiers applied to this dataset =-=[45]-=-. 6.4 Experiments with Incomplete Partitions. This set of experiments focused on the dependence of clustering accuracy on the number of patterns with missing cluster labels. As before, an ensemble of ... |

45 | Collective, hierarchical clustering from distributed, heterogeneous data
- Johnson, Kargupta
- 2000
(Show Context)
Citation Context ... 43, 47] have pioneered clustering ensembles as a new branch in the conventional taxonomy of clustering algorithms [26, 27]. Please see the Appendix for detailed review of the related work, including =-=[7, 11, 16, 19, 28, 31, 35]-=-. The problem of clustering combination can be defined generally as follows: given multiple clusterings of the data set, find a combined clustering with better quality. While the problem of clustering... |

43 |
Automated star/galaxy discrimination with neural networks
- Odewahn, Stockwell, et al.
- 1992
(Show Context)
Citation Context ...tions. 6.1 Datasets. Table 2 summarizes the details of the datasets. Five datasets of different nature have been used in the experiments. “Biochemical” and “Galaxy” data sets are described in [1] and =-=[40]-=-, respectively. Table 2: Characteristics of the datasets. Dataset No. of No. of No. of Total no. Av. k -means features classes points/class of points error (%) Biochem. 7 2 2138-3404 5542 47.4 Galaxy ... |

38 | Finding Consistent Clusters in Data Partitions
- Fred
- 2001
(Show Context)
Citation Context |

34 |
Protein Data Bank
- Abola, Bernstein, et al.
- 1987
(Show Context)
Citation Context ...sus functions. 6.1 Datasets. Table 2 summarizes the details of the datasets. Five datasets of different nature have been used in the experiments. “Biochemical” and “Galaxy” data sets are described in =-=[1]-=- and [40], respectively. Table 2: Characteristics of the datasets. Dataset No. of No. of No. of Total no. Av. k -means features classes points/class of points error (%) Biochem. 7 2 2138-3404 5542 47.... |

34 |
The median procedure for partitions
- Barthélemy, Leclerc
- 1995
(Show Context)
Citation Context ...tributes, or, in other terms, a median partition problem. Median partition can be viewed as the best summary of the given input partitions. As an optimization problem, median partition is NP-complete =-=[3]-=-, with a continuum of heuristics for an approximate solution. This work focuses on the primary problem of clustering ensembles, namely the consensus function, which creates the combined clustering. We... |

34 | Stochastic discrimination
- Kleinberg
- 1990
(Show Context)
Citation Context |

33 |
Bagging for path-based clustering
- Fischer, Buhmann
(Show Context)
Citation Context |

16 | Voting-Merging: an ensemble method for clustering
- Dimitriadou, Weingessel, et al.
- 2001
(Show Context)
Citation Context ... 43, 47] have pioneered clustering ensembles as a new branch in the conventional taxonomy of clustering algorithms [26, 27]. Please see the Appendix for detailed review of the related work, including =-=[7, 11, 16, 19, 28, 31, 35]-=-. The problem of clustering combination can be defined generally as follows: given multiple clusterings of the data set, find a combined clustering with better quality. While the problem of clustering... |

14 | Evidence Accumulation Clustering based on the K-means algorithm - Fred, Jain - 2002 |

14 |
Ensembles of partitions via data resampling
- Minaei, Topchy, et al.
- 2004
(Show Context)
Citation Context ...andomness or different parameters of some algorithms, e.g. initializations and various values of k in k-means algorithm [35, 15, 16]. 3. Use many subsamples of the data set, such as bootstrap samples =-=[10, 38]-=-. These methods rely on the clustering algorithms, which are powerful on their own, and as such are computationally involved. We argue that it is possible to generate the partitions using weak, but le... |

13 |
Brodley, “Random projection for high dimensional data clustering: A cluster ensemble approach
- Fern, E
- 2003
(Show Context)
Citation Context ... 43, 47] have pioneered clustering ensembles as a new branch in the conventional taxonomy of clustering algorithms [26, 27]. Please see the Appendix for detailed review of the related work, including =-=[7, 11, 16, 19, 28, 31, 35]-=-. The problem of clustering combination can be defined generally as follows: given multiple clusterings of the data set, find a combined clustering with better quality. While the problem of clustering... |

12 |
Comparing, contrasting and combining clusters in viral gene expression data
- Kellam, Liu, et al.
- 2001
(Show Context)
Citation Context |

8 |
Inference with Missing Data
- Rubin
- 1976
(Show Context)
Citation Context ... handling missing data can be found in [20]. Though data with missing cluster labels can be obtained in different ways, we analyze only the case when components of yi are missing completely at random =-=[46]-=-. It means that the probability of a component to be missing does not depend on other observed or unobserved variables. Note, that the outcome of clustering of data subsamples (e.g., bootstrap) is dif... |

7 | Reinterpreting the category utility function
- Mirkin
- 2001
(Show Context)
Citation Context ...orrectly predicted both with the knowledge of clustering πC and without it. The category utility function can also be written as Goodman-Kruskal index for the contingency table between two partitions =-=[22, 39]-=-. The overall utility of the partition πC with respect to all the partitions in Π can be measured as the sum of pair-wise agreements: H U( π , Π ) =∑ U( π , π ) . C C i i= 1 Therefore, the best median... |

6 |
Robust clustering by evolutionary computation
- Gablentz, Koppen, et al.
- 2000
(Show Context)
Citation Context |

6 |
the Utility of Categories
- Gluck, Corter, et al.
- 1985
(Show Context)
Citation Context ...BWEB algorithm in the context of conceptual clustering [48]. COBWEB clustering criterion estimates the partition utility, which is the sum of category utility functions introduced by Gluck and Corter =-=[21]-=-. In our terms, the category utility function U(σ, πi) evaluates the quality of a 1 Here “attributes” (features) refer to the partitions of an ensemble, while the objects refer the original data point... |

6 |
Distributed Data Mining
- Park, Kargupta
- 2003
(Show Context)
Citation Context ...ual clusterings with conflicting objective functions. Fusion of clusterings using multiple 1ssources of data or features becomes increasingly important in distributed data mining, e.g., see review in =-=[41]-=-. Several recent independent studies [10, 12, 14, 15, 43, 47] have pioneered clustering ensembles as a new branch in the conventional taxonomy of clustering algorithms [26, 27]. Please see the Appendi... |

5 |
Using projections to visually cluster high-dimensional data
- Hinneburg, Keim, et al.
- 2003
(Show Context)
Citation Context ...ior information about the data domain is crucial for successful clustering, though such information can be hard to obtain, even from experts. Identification of relevant subspaces [2] or visualization =-=[24]-=- may help to establish the sample data’s conformity to the underlying distributions or, at least, to the proper number of clusters. The exploratory nature of clustering tasks demands efficient methods... |

3 |
Bagged clustering, Working Papers SFB "Adaptive Information Systems and Modeling
- Leisch
- 1999
(Show Context)
Citation Context |

2 | Consensus Clustering: A Resamlping Based Method for Class Discovery and - Monti, Tamayo, et al. - 2003 |