## Solving Cluster Ensemble Problems by Bipartite Graph Partitioning (2004)

Venue: | Proceedings of the International Conference on Machine Learning |

Citations: | 67 - 3 self |

### BibTeX

@INPROCEEDINGS{Fern04solvingcluster,
  author = {Xiaoli Zhang Fern and Carla E. Brodley},
  title = {Solving Cluster Ensemble Problems by Bipartite Graph Partitioning},
  booktitle = {Proceedings of the International Conference on Machine Learning},
  year = {2004}
}

### Abstract

A critical problem in cluster ensemble research is how to combine multiple clusterings to yield a final superior clustering result. Leveraging advanced graph partitioning techniques, we solve this problem by reducing it to a graph partitioning problem. We introduce a new reduction method that constructs a bipartite graph from a given cluster ensemble. The resulting graph models both instances and clusters of the ensemble simultaneously as vertices in the graph. Our approach retains all of the information provided by a given ensemble, allowing the similarity among instances and the similarity among clusters to be considered collectively in forming the final clustering. Further, the resulting graph partitioning problem can be solved efficiently. We empirically evaluate the proposed approach against two commonly used graph formulations and show that it is more robust and achieves comparable or better performance in comparison to its competitors.
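The construction described in the abstract can be illustrated with a minimal sketch, assuming NumPy. Instances and clusters both become vertices, and an edge links an instance to each cluster that contains it. The function names (`build_bipartite_matrix`, `kmeans`, `hbgf`) are hypothetical and do not come from the paper. The partitioning step here is a spectral embedding followed by K-means, which is one possible choice and not necessarily the authors' exact procedure.

```python
import numpy as np


def build_bipartite_matrix(ensemble):
    """Return the instance-by-cluster connectivity matrix A.

    ensemble: list of label sequences, one per base clustering.
    A[i, j] = 1 if instance i belongs to cluster j, else 0. Edges exist
    only between instance vertices and cluster vertices.
    """
    blocks = []
    for labels in ensemble:
        labels = np.asarray(labels)
        ids = np.unique(labels)
        blocks.append((labels[:, None] == ids[None, :]).astype(float))
    return np.hstack(blocks)


def kmeans(X, k, n_iter=50):
    """Plain Lloyd's algorithm with deterministic farthest-point seeding."""
    centers = [X[0]]
    for _ in range(1, k):
        d = np.min([np.sum((X - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(n_iter):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels


def hbgf(ensemble, k):
    """Partition instance and cluster vertices of the bipartite graph together.

    Assumption: the top-k singular vectors of the degree-normalized matrix A
    give the spectral embedding of the bipartite graph; rows are scaled to
    unit length before K-means.
    """
    A = build_bipartite_matrix(ensemble)
    n = A.shape[0]
    d_inst = A.sum(axis=1)  # degree of each instance vertex
    d_clus = A.sum(axis=0)  # degree of each cluster vertex
    An = A / np.sqrt(d_inst)[:, None] / np.sqrt(d_clus)[None, :]
    U, _, Vt = np.linalg.svd(An, full_matrices=False)
    Z = np.vstack([U[:, :k], Vt[:k].T])  # instance rows first, then cluster rows
    Z = Z / np.maximum(np.linalg.norm(Z, axis=1, keepdims=True), 1e-12)
    labels = kmeans(Z, k)
    return labels[:n], labels[n:]


# Usage: three base clusterings of six instances, one of them noisy.
ensemble = [[0, 0, 0, 1, 1, 1], [1, 1, 0, 0, 0, 0], [0, 0, 0, 1, 1, 1]]
instance_labels, cluster_labels = hbgf(ensemble, k=2)
print(instance_labels, cluster_labels)
```

Because cluster vertices are embedded alongside instance vertices, the final partition assigns a label to each original cluster as well as to each instance, which mirrors the abstract's point that both kinds of similarity are considered together.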

### Citations

8563 | Elements of Information Theory
- Cover, Thomas
- 1991
Citation Context ...and arbitrarily selected the values for the other data sets. ...(NMI) criterion (Strehl & Ghosh, 2002). Treating cluster labels and class labels as random variables, NMI measures the mutual information (Cover & Thomas, 1991) shared by the two random variables and normalizes it to a [0, 1] range. Note that the expected NMI value of a random partition of the data is 0 and the optimal value 1 is attained when the class lab... |

2868 | UCI Repository of Machine Learning Databases
- Merz, Murphy
- 1996
Citation Context ...y lung image data set with eight classes (Dy et al., 1999). MODIS and EOS are land cover data sets described by different feature sets. ISOLET6 and GLASS are from the UCI machine learning repository (Blake & Merz, 1998), where ISOLET6 is a subset of the ISOLET spoken letter recognition training set. In particular, ISOLET6 contains the instances of six classes (letters) randomly selected out of twenty six classes (l... |

2590 | Normalized cuts and image segmentation
- Shi, Malik
- 1997
Citation Context ...ems using graph partitioning techniques for two reasons. First, graph partitioning is a well studied area and algorithms such as spectral clustering have been successful in a variety of applications (Shi & Malik, 2000; Dhillon, 2001). Second, cluster ensembles provide a natural way to define similarity measures for computing the weight of the edges in a graph, which is an important and sometimes hard to satisfy pr... |

1097 | On spectral clustering: Analysis and an algorithm - Ng, Jordan, et al. - 2001 |

794 | A fast and high quality multilevel scheme for partitioning irregular graphs
- Karypis, Kumar
- 1998
Citation Context ...uctured, possibly due to the lack of a correspondence structure in the clusters, K-means can still perform reasonably well using the instance vertices. 6.2.2. Multilevel Graph Partition: Metis Metis (Karypis & Kumar, 1998), a multilevel graph partitioning system, approaches the graph partitioning problem from a different angle. It partitions a graph using three basic steps: (1) coarsen the graph by collapsing vertices... |

396 | Cluster ensembles — a knowledge reuse framework for combining multiple partitions
- Strehl, Ghosh
Citation Context ...ated for decades in the statistics, data mining, and machine learning communities. A recent advance of clustering techniques is the development of cluster ensemble or consensus clustering techniques (Strehl & Ghosh, 2002; Fern & Brodley, 2003; Monti et al., 2003; Topchy et al., 2003), which seek to improve clustering performance by first generating multiple partitions of a given data set and then combining them to fo... |

315 | Co-clustering documents and words using bipartite spectral graph partitioning
- Dhillon
- 2001
Citation Context ...titioning techniques for two reasons. First, graph partitioning is a well studied area and algorithms such as spectral clustering have been successful in a variety of applications (Shi & Malik, 2000; Dhillon, 2001). Second, cluster ensembles provide a natural way to define similarity measures for computing the weight of the edges in a graph, which is an important and sometimes hard to satisfy prerequisite for ... |

250 | Information-theoretic coclustering - Dhillon, Mallela, et al. |

211 | Fast spectral methods for ratio cut partitioning and clustering
- Hagen, Kahng
- 1991
Citation Context ...e, various graph partitioning algorithms define different optimization criteria based on the above goal. Examples include the normalized cut criterion (Shi & Malik, 2000) and the ratio cut criterion (Hagen & Kahng, 1992). See (Fjallstrom, 1998) for an in-depth discussion. Here we defer the discussion of our choice of graph partitioning algorithm to Section 6.2. Given the basics of cluster ensembles and graph partiti... |

157 | Consensus Clustering: A Resampling-Based Method for Class Discovery and
- Monti, Tamayo, et al.
- 2003
Citation Context ...ing, and machine learning communities. A recent advance of clustering techniques is the development of cluster ensemble or consensus clustering techniques (Strehl & Ghosh, 2002; Fern & Brodley, 2003; Monti et al., 2003; Topchy et al., 2003), which seek to improve clustering performance by first generating multiple partitions of a given data set and then combining them to form a final (presumably superior) clusterin... |

150 | Document clustering using word clusters via the information bottleneck method
- Slonim, Tishby
Citation Context ...stering instances according to the clusters can be related to clustering documents based on the keywords. Naturally, we can relate CBGF to document clustering approaches based on clustering keywords (Slonim & Tishby, 2000) and HBGF to the bipartite co-clustering approach proposed by Dhillon (2001). This connection raises an interesting question. Can we borrow ideas from document clustering, a highly advanced area, to ... |

101 | Bagging to improve the accuracy of a clustering procedure
- Dudoit, Fridlyand
Citation Context ...4) propose to represent a cluster ensemble as a new set of features describing the instances and produce final clusters by applying K-means and EM to the new features. See Dimitriadou et al., 2001 and Dudoit & Fridlyand, 2003 for other representative techniques for combining clusterings. While performing a thorough comparison of all available techniques is beyond the scope of this paper, in recent studies (Strehl & Ghosh,... |

97 | Random projection for high dimensional data clustering: A cluster ensemble approach
- Fern, Brodley
- 2003
Citation Context ...e statistics, data mining, and machine learning communities. A recent advance of clustering techniques is the development of cluster ensemble or consensus clustering techniques (Strehl & Ghosh, 2002; Fern & Brodley, 2003; Monti et al., 2003; Topchy et al., 2003), which seek to improve clustering performance by first generating multiple partitions of a given data set and then combining them to form a final (presumably... |

91 | Learning Spectral Clustering
- Bach, Jordan
- 2003
Citation Context ... way to define similarity measures for computing the weight of the edges in a graph, which is an important and sometimes hard to satisfy prerequisite for the success of graph partitioning techniques (Bach & Jordan, 2004). Previously, Strehl and Ghosh (2002) proposed two approaches to formulating graph partitioning problems for cluster ensembles. The first formulation is an instance-based approach that models instanc... |

33 | Algorithms for graph partitioning: A survey. Linkoping - Fjallstrom - 1998 |

16 | Voting-Merging: an ensemble method for clustering
- Dimitriadou, Weingessel, et al.
- 2001
Citation Context ...ntly Topchy et al. (2003; 2004) propose to represent a cluster ensemble as a new set of features describing the instances and produce final clusters by applying K-means and EM to the new features. See Dimitriadou et al., 2001 and Dudoit & Fridlyand, 2003 for other representative techniques for combining clusterings. While performing a thorough comparison of all available techniques is beyond the scope of this paper, in re... |

2 | Data clustering using evidence accumulation, ICPR
- Fred, Jain
- 2002
Citation Context ...generate the ensemble. 2.2. Related Work on Combining Clusterings While our paper focuses on combining clusterings by graph partitioning, other alternative approaches exist. A commonly used approach (Fred & Jain, 2002; Fern & Brodley, 2003; Monti et al., 2003) combines the clusterings by first generating a similarity matrix for instances and then applying agglomerative clustering algorithms to produce a final clus... |

1 | The customized-queries approach to CBIR using
- Dy, Brodley, et al.
- 1999
Citation Context ...n our experiments. The characteristics of the data sets are summarized in Table 1 with related parameter choices. HRCT is a high resolution computed tomography lung image data set with eight classes (Dy et al., 1999). MODIS and EOS are land cover data sets described by different feature sets. ISOLET6 and GLASS are from the UCI machine learning repository (Blake & Merz, 1998), where ISOLET6 is a subset of the ISO... |

1 | Combining multiple weak clusterings. ICDM
- Topchy, Jain, et al.
- 2003
Citation Context ...rning communities. A recent advance of clustering techniques is the development of cluster ensemble or consensus clustering techniques (Strehl & Ghosh, 2002; Fern & Brodley, 2003; Monti et al., 2003; Topchy et al., 2003), which seek to improve clustering performance by first generating multiple partitions of a given data set and then combining them to form a final (presumably superior) clustering solution. Such tech... |
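
The citation context for Cover & Thomas (1991) above describes NMI as the mutual information between cluster labels and class labels, normalized to a [0, 1] range. A minimal sketch of that measure follows, assuming NumPy and assuming the geometric-mean normalization $I(X;Y)/\sqrt{H(X)H(Y)}$ used by Strehl & Ghosh (2002). The function name `nmi` is hypothetical.

```python
import numpy as np


def nmi(labels_a, labels_b):
    """Normalized mutual information between two labelings of the same data.

    Assumption: normalization by sqrt(H(a) * H(b)); returns 0.0 when either
    labeling has a single cluster (zero entropy).
    """
    a = np.unique(np.asarray(labels_a).ravel(), return_inverse=True)[1]
    b = np.unique(np.asarray(labels_b).ravel(), return_inverse=True)[1]
    # Joint distribution over (cluster in a, cluster in b).
    joint = np.zeros((a.max() + 1, b.max() + 1))
    np.add.at(joint, (a, b), 1.0)
    joint /= joint.sum()
    pa = joint.sum(axis=1)
    pb = joint.sum(axis=0)
    nz = joint > 0  # skip empty cells to avoid log(0)
    mi = np.sum(joint[nz] * np.log(joint[nz] / np.outer(pa, pb)[nz]))
    ha = -np.sum(pa * np.log(pa))
    hb = -np.sum(pb * np.log(pb))
    if ha == 0 or hb == 0:
        return 0.0
    return float(mi / np.sqrt(ha * hb))


# Usage: compare a clustering against class labels.
score = nmi([0, 0, 1, 1, 2, 2], [1, 1, 0, 0, 0, 2])
print(score)
```

The score does not depend on how clusters are numbered, so a clustering that matches the classes under a relabeling still scores 1.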