## Meta clustering (2006)

### Download Links

- [www.cs.cornell.edu]
- [dspace.library.cornell.edu]
- DBLP

### Other Repositories/Bibliography

Venue: Proceedings of the IEEE International Conference on Data Mining

Citations: 25 (1 self)

### BibTeX

```bibtex
@INPROCEEDINGS{Caruana06metaclustering,
  author    = {Rich Caruana and Mohamed Elhawary and Nam Nguyen},
  title     = {Meta clustering},
  booktitle = {Proceedings of the IEEE International Conference on Data Mining},
  year      = {2006}
}
```

### Abstract

Clustering is ill-defined. Unlike supervised learning where labels lead to crisp performance criteria such as accuracy and squared error, clustering quality depends on how the clusters will be used. Devising clustering criteria that capture what users need is difficult. Most clustering algorithms search for optimal clusterings based on a pre-specified clustering criterion. Our approach differs. We search for many alternate clusterings of the data, and then allow users to select the clustering(s) that best fit their needs. Meta clustering first finds a variety of clusterings and then clusters this diverse set of clusterings so that users must only examine a small number of qualitatively different clusterings. We present methods for automatically generating a diverse set of alternate clusterings, as well as methods for grouping clusterings into meta clusters. We evaluate meta clustering on four test problems and two case studies. Surprisingly, clusterings that would be of most interest to users often are not very compact clusterings.

### Citations

4145 | Pattern Classification and Scene Analysis
- Duda, Hart
- 1973

Citation Context: ...usterings are described in the remainder of Section 2.1. 2.1.1 Diverse Clusterings from K-Means Minima K-means is an iterative refinement algorithm that attempts to minimize a squared error criterion [10]. Each cluster is initialized by setting its mean to a random point in the data set. Each step of the iterative refinement performs two tasks. First, the data points are classified as being a member o...
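The refinement loop described in this context (random-point initialization, then alternating assignment and mean-update steps until no move reduces the squared error) can be sketched in a few lines. This is a minimal illustration of ours, not the paper's implementation:

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Minimal k-means: random-point init, then iterative refinement."""
    rng = np.random.default_rng(seed)
    # Initialize each cluster mean to a random point in the data set.
    means = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Step 1: classify each point as a member of its nearest cluster.
        d = ((X[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        # Step 2: move each mean to the centroid of its current members.
        new_means = np.array([X[labels == j].mean(axis=0) if (labels == j).any()
                              else means[j] for j in range(k)])
        if np.allclose(new_means, means):
            break  # converged: no move reduces the squared error
        means = new_means
    return labels, means
```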

2307 | Algorithms for Clustering Data
- Jain, Dubes
- 1988

Citation Context: ... level. To do this, we need a similarity measure between clusterings. 2.2.1 Measuring the Similarity Between Clusterings Several measures of clustering similarity have been proposed in the literature [17, 18, 19]. Here we use a measure of clustering similarity related to the Rand index [28]: define Iij as 1 if points i and j are in the same cluster in one clustering, but in different clusters in the other clu...

1176 | On spectral clustering: analysis and an algorithm
- Ng, Jordan, et al.
- 2002

Citation Context: ...arison, we also apply other clustering approaches to the four data sets: hierarchical agglomerative clustering (HAC) [21], EM-based mixture model clustering [22], and two types of spectral clustering [23, 1]. HAC is implemented using three different linkage criteria: single (min-link), complete (max-link), and centroid (average-link) [21]. The EM-based mixture model clustering estimates Gaussian mixture ...

709 | UCI Repository of machine learning databases
- Newman, Hettich, et al.
- 1998

Citation Context: ...iables are counts in a bags-of-words model describing the web pages. The auxiliary labels are the 25 web crawls that generated the data. The Covertype data is from the UCI Machine Learning Repository [26]. It contains cartographic variables sampled at 30 × 30 meter grid cells in four wilderness areas in Roosevelt National Forest in northern Colorado. Data was scaled to a mean of zero and a standard de...

565 | Comparing partitions
- Hubert, Arabie
- 1985

Citation Context: ... level. To do this, we need a similarity measure between clusterings. 2.2.1 Measuring the Similarity Between Clusterings Several measures of clustering similarity have been proposed in the literature [17, 18, 19]. Here we use a measure of clustering similarity related to the Rand index [28]: define Iij as 1 if points i and j are in the same cluster in one clustering, but in different clusters in the other clu...

510 | Mixture models: inference and applications to clustering
- McLachlan, Basford
- 1988

Citation Context: ...ther Clustering Methods For a thorough comparison, we also apply other clustering approaches to the four data sets: hierarchical agglomerative clustering (HAC) [21], EM-based mixture model clustering [22], and two types of spectral clustering [23, 1]. HAC is implemented using three different linkage criteria: single (min-link), complete (max-link), and centroid (average-link) [21]. The EM-based mixtur...

491 | Objective criteria for the evaluation of clustering methods
- Rand
- 1971

Citation Context: ...he Similarity Between Clusterings Several measures of clustering similarity have been proposed in the literature [17, 18, 19]. Here we use a measure of clustering similarity related to the Rand index [28]: define Iij as 1 if points i and j are in the same cluster in one clustering, but in different clusters in the other clustering, and Iij is 0 otherwise. The dissimilarity of two clustering models is ...
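The pairwise-disagreement measure defined in this context (Iij = 1 when points i and j are together in one clustering but apart in the other) translates directly into code. The helper below is our own sketch, not code from the paper:

```python
import numpy as np
from itertools import combinations

def clustering_distance(a, b):
    """Dissimilarity related to the Rand index: the fraction of point
    pairs placed in the same cluster in one clustering but in different
    clusters in the other. 0 means the partitions are identical."""
    a, b = np.asarray(a), np.asarray(b)
    n = len(a)
    disagree = 0
    for i, j in combinations(range(n), 2):
        same_a = a[i] == a[j]   # together in clustering a?
        same_b = b[i] == b[j]   # together in clustering b?
        disagree += same_a != same_b
    return disagree / (n * (n - 1) / 2)
```

Note that the distance is invariant to relabeling the clusters, since only co-membership of pairs is compared.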

438 | Multidimensional Scaling
- Cox, Cox
- 2001

Citation Context: ...d features: weights given to one feature can be compensated by weights given to other correlated features. The problem with correlated features can be avoided by applying Principal Component Analysis [8] to the data prior to weighting. PCA rotates the data to find a new orthogonal basis in which feature values are uncorrelated. Random weights applied to the rotated features (components) yields a more...
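The PCA-then-weight idea described in this context can be sketched with an eigendecomposition of the covariance matrix; the data and weights below are illustrative, and the function name is ours:

```python
import numpy as np

def pca_rotate(X):
    """Rotate data onto its principal components so that the
    rotated feature values are uncorrelated."""
    Xc = X - X.mean(axis=0)
    # Eigenvectors of the covariance matrix form the new orthogonal basis.
    cov = np.cov(Xc, rowvar=False)
    _, vecs = np.linalg.eigh(cov)
    return Xc @ vecs

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
X[:, 1] = X[:, 0] + 0.1 * X[:, 1]   # make two features correlated
Z = pca_rotate(X)                   # components are uncorrelated
w = rng.random(Z.shape[1])          # random weights on the components
Zw = Z * w                          # weighting cannot be undone by correlation
```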

425 | Cluster ensembles – a knowledge reuse framework for combining multiple partitions
- Strehl, Ghosh

Citation Context: ...number of ensemble clustering methods improve performance by generating multiple clusterings. We mention only a few here. The cluster ensemble problem is formulated as a graph partitioning problem in [30] where the goal is to encode the clusterings into a graph and then partition it into K parts with the objective of minimizing the sum of the edges connecting those parts. [11] proposes a different clu...

351 | Constrained k-means clustering with background knowledge
- Wagstaff, Cardie, et al.
- 2001

Citation Context: ... for their particular application. In practice, users may have only a vague idea of the desired clustering (and thus may not be able to provide the constraints necessary for semisupervised clustering [9, 32]). Or users may have no idea what to expect from the clustering. The auxiliary labels are meant to represent one clustering users might find useful. In no way are they intended to represent an exclusi...

246 | Refining Initial Points for k-means Clustering
- Bradley, Fayyad
- 1998

Citation Context: ...ce: there is no longer a move that can reduce the squared error. The output of k-means is typically highly dependent on the initialization of the cluster means: the search space has many local minima [3, 5]. In practice, k-means is run many times with many different initializations, and the clustering with the smallest sum-of-squared distances between cluster means and cluster members is returned as the...
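The restart practice described here (run k-means from many random initializations, return the clustering with the smallest sum of squared distances) can be sketched as follows; `kmeans_sse` and `best_of_restarts` are hypothetical names of ours:

```python
import numpy as np

def kmeans_sse(X, k, rng, n_iter=50):
    """One k-means run from a random initialization; returns the
    labels and the sum of squared distances to the cluster means."""
    means = X[rng.choice(len(X), k, replace=False)]
    for _ in range(n_iter):
        labels = ((X[:, None] - means[None]) ** 2).sum(-1).argmin(1)
        means = np.array([X[labels == j].mean(0) if (labels == j).any()
                          else means[j] for j in range(k)])
    sse = ((X - means[labels]) ** 2).sum()
    return labels, sse

def best_of_restarts(X, k, n_restarts=20, seed=0):
    """Keep the clustering with the smallest squared error over
    many differently-initialized runs."""
    rng = np.random.default_rng(seed)
    return min((kmeans_sse(X, k, rng) for _ in range(n_restarts)),
               key=lambda r: r[1])
```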

199 | Cluster analysis of multivariate data: efficiency vs interpretability of classifications
- Forgy
- 1965

Citation Context: ...ecause it works with similarity data, because it does not require the user to prespecify the number of clusters, and because the resulting hierarchy makes navigating the space of clusterings easier. ([13] presents one of the few studies to examine the tradeoffs between clustering complexity, efficiency, and interpretability.) An alternate approach for presenting different clusterings to users is to fi...

173 | A random walks view of spectral segmentation
- Meilă, Shi
- 2001

Citation Context: ...arison, we also apply other clustering approaches to the four data sets: hierarchical agglomerative clustering (HAC) [21], EM-based mixture model clustering [22], and two types of spectral clustering [23, 1]. HAC is implemented using three different linkage criteria: single (min-link), complete (max-link), and centroid (average-link) [21]. The EM-based mixture model clustering estimates Gaussian mixture ...

170 | Consensus clustering: A resampling-based method for class discovery and visualization of gene expression microarray data
- Monti, Tamayo, et al.
- 2003

Citation Context: ...le a variety of Zipf weighting parameters and to explore PCA space. For comparison, the scatter plots in Figure 11 show the consensus clustering found using the cluster aggregation method proposed in [24] (marked with a green “+” in the figures). This is the clustering that represents the consensus of all found clusterings. As expected, the consensus clustering is very compact (because less compact cl...

160 | Agglomerative information bottleneck
- Slonim, Tishby
- 2000

Citation Context: ...ustering to two case studies. Surprisingly, clusterings that would be of most interest to users often are not very compact clusterings. 1. Introduction Clustering performance is difficult to evaluate [29]. In supervised learning, model performance is assessed by comparing model predictions to supervisory targets. In clustering we do not have targets and usually do not know a priori what groupings of t...

107 | A general theory of classificatory sorting strategies: I. Hierarchical systems
- Lance, Williams
- 1966

Citation Context: ...rner (both accurate and compact). 4.4. Other Clustering Methods For a thorough comparison, we also apply other clustering approaches to the four data sets: hierarchical agglomerative clustering (HAC) [21], EM-based mixture model clustering [22], and two types of spectral clustering [23, 1]. HAC is implemented using three different linkage criteria: single (min-link), complete (max-link), and centroid ...

103 | Fast and intuitive clustering of web documents
- Zamir, Etzioni, et al.
- 1997

Citation Context: ...ty of points for clustering. By weighting features before distances are calculated (i.e. multiplying feature values by particular scalars), we can control the importance of each feature to clustering [33]. Clustering many times with different random feature weights allows us to find qualitatively different clusterings for the data using the same clustering algorithm. Feature weighting requires a distr...

97 | An impossibility theorem for clustering
- Kleinberg
- 2003

Citation Context: ...terings. No “correct” clustering exists. Moreover, theoretical work suggests that it is not possible to achieve all of the properties one might desire of clustering in a single clustering of the data [20]. Most clustering methodologies focus on finding optimal or near-optimal clusterings, according to specific clustering criteria. However, this approach often is misguided. When users cannot specify ap...

88 | Convergence properties of the k-means algorithms
- Bottou, Bengio
- 1995

Citation Context: ...ce: there is no longer a move that can reduce the squared error. The output of k-means is typically highly dependent on the initialization of the cluster means: the search space has many local minima [3, 5]. In practice, k-means is run many times with many different initializations, and the clustering with the smallest sum-of-squared distances between cluster means and cluster members is returned as the...

76 | Clustering aggregation
- Gionis, Mannila, et al.
- 2005

Citation Context: ...oes not belong to the cluster, thus capturing the similarity between instances and the similarity between clusters when producing the final clustering. Another cluster ensemble method was proposed by [16] where the objective of the final clustering is to minimize the disagreement between all the clusterings and the final clustering. This final clustering is the one that agrees with most of the cluster...

75 | Solving cluster ensemble problems by bipartite graph partitioning
- Fern, Brodley
- 2004

Citation Context: ... partitioning problem in [30] where the goal is to encode the clusterings into a graph and then partition it into K parts with the objective of minimizing the sum of the edges connecting those parts. [11] proposes a different cluster ensemble method where both clusters and instances are modeled as vertices in a bipartite graph. Edges connect instances with clusters with a weight of zero or one dependi...

70 | Semi-supervised clustering using genetic algorithms
- Demiriz, Bennett, et al.
- 1999

Citation Context: ... for their particular application. In practice, users may have only a vague idea of the desired clustering (and thus may not be able to provide the constraints necessary for semisupervised clustering [9, 32]). Or users may have no idea what to expect from the clustering. The auxiliary labels are meant to represent one clustering users might find useful. In no way are they intended to represent an exclusi...

65 | A mixture model for clustering ensembles
- Topchy, Jain, et al.

Citation Context: ...tive clustering to produce the final clustering after generating a similarity matrix from many base-level clusterings. Cluster aggregation was formulated as a maximum likelihood estimation problem in [31] where the ensemble is modeled as a new set of features that describe the instances and the final clustering is produced by applying K-means while solving an EM problem. Linear programming was used in...

25 | Entropy-based gene ranking without selection bias for the predictive classification of microarray data
- Furlanello, Serafini, et al.
- 2003

Citation Context: ...ion can weight only a few variables highly. We will use a Zipf power law distribution because there is empirical evidence that feature importance is Zipf-distributed in a number of real-world problems [7, 14]. A Zipf distribution describes a range of integer values from 1 to some maximum value K. The frequency of each integer is proportional to 1/i^α where i is the integer value and α is a shape parameter....
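Sampling feature weights from the truncated Zipf distribution described here (P(i) ∝ 1/i^α over the integers 1..K) might look like the following; the function name and defaults are our own:

```python
import numpy as np

def zipf_weights(n_features, K=10, alpha=1.0, seed=0):
    """Draw one integer weight per feature from a truncated Zipf
    distribution over 1..K, where P(i) is proportional to 1 / i**alpha.
    Larger alpha biases the draws toward small values, so only a few
    features receive high weights."""
    rng = np.random.default_rng(seed)
    i = np.arange(1, K + 1)
    p = 1.0 / i ** alpha
    p /= p.sum()                    # normalize to a probability vector
    return rng.choice(i, size=n_features, p=p)
```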

20 | Combining multiple clustering systems
- Boulis, Ostendorf
- 2004

Citation Context: ...where the ensemble is modeled as a new set of features that describe the instances and the final clustering is produced by applying K-means while solving an EM problem. Linear programming was used in [4] to find the relation between the clusters in the different clusterings and the clusters of the final clustering. Simulated annealing and local search was used in [12] to find the final clustering. Th...

13 | Comparing, contrasting and combining clusters in viral gene expression data
- Kellam, Liu, et al.
- 2001

Citation Context: ... level. To do this, we need a similarity measure between clusterings. 2.2.1 Measuring the Similarity Between Clusterings Several measures of clustering similarity have been proposed in the literature [17, 18, 19]. Here we use a measure of clustering similarity related to the Rand index [28]: define Iij as 1 if points i and j are in the same cluster in one clustering, but in different clusters in the other clu...

4 | A new approach to data driven clustering
- Azran, Ghahramani
- 2006

Citation Context: ...terpret if viewed in color. See [6]) tering formed from the clusterings in the branch. The large dots in the scatter plots represent consensus clusterings that a user might select. 7. Related Works [2] presents a very different algorithm for finding alternate clusterings of the data. In this approach a probability matrix that defines the likelihood of jumping from one point to another is used to ge...

3 | Integrating microarray data by consensus clustering
- Filkov, Skiena
- 2003

Citation Context: ...m. Linear programming was used in [4] to find the relation between the clusters in the different clusterings and the clusters of the final clustering. Simulated annealing and local search was used in [12] to find the final clustering. The main difference between these ensemble methods and meta clustering is that most ensemble methods combine the clusterings they find into a one final clustering becaus...

3 | Determining the number of groups from measures of cluster stability
- Mufti, Bertrand, et al.
- 2005

Citation Context: ...ter (if label l appeared in cluster i more often than any other label, then max(Ci|Li) is the number of points in Ci with the label l). Determining the number of clusters in a data set is challenging [25]. Indeed, the “correct” number of clusters depends on how the clustering will be used. For simplicity, we make the unrealistic assumption that the desired number of clusters is predefined. It is easy ...

2 | A feature selection method based on the Shapley value
- Cohen, Dror, et al.
- 2005

Citation Context: ...ion can weight only a few variables highly. We will use a Zipf power law distribution because there is empirical evidence that feature importance is Zipf-distributed in a number of real-world problems [7, 14]. A Zipf distribution describes a range of integer values from 1 to some maximum value K. The frequency of each integer is proportional to 1/i^α where i is the integer value and α is a shape parameter....

1 | Statistical and measurement tools
- Christensen

Citation Context: ...ases, the distribution becomes more biased toward smaller numbers, with only the occasional value approaching K. See Figure 1. Random values from a Zipf distribution can be generated in the manner of [6]. Algorithm 1: Generate a diverse set of clusterings Input: X = {x1, x2, ..., xn} for xi ∈ R^d, k is the number of clusters, m is the number of clusterings to be generated Output: A set of m alternat...
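Combining Zipf-distributed feature weights with repeated k-means runs gives a sketch of the diverse-clustering generation loop (Algorithm 1 above). This is our reconstruction under the stated assumptions, not the authors' code:

```python
import numpy as np

def kmeans_labels(X, k, rng, n_iter=50):
    """Plain k-means; returns the cluster label of each point."""
    means = X[rng.choice(len(X), k, replace=False)]
    for _ in range(n_iter):
        labels = ((X[:, None] - means[None]) ** 2).sum(-1).argmin(1)
        means = np.array([X[labels == j].mean(0) if (labels == j).any()
                          else means[j] for j in range(k)])
    return labels

def diverse_clusterings(X, k, m, K=10, alpha=1.0, seed=0):
    """Generate m alternate clusterings by re-weighting features with
    Zipf-distributed random integer weights (1..K, P(i) ~ 1/i**alpha)
    before each k-means run."""
    rng = np.random.default_rng(seed)
    i = np.arange(1, K + 1)
    p = 1.0 / i ** alpha
    p /= p.sum()
    clusterings = []
    for _ in range(m):
        w = rng.choice(i, size=X.shape[1], p=p)   # one weight per feature
        clusterings.append(kmeans_labels(X * w, k, rng))
    return clusterings
```

Each re-weighting changes the distance metric k-means optimizes, so the m runs land in qualitatively different clusterings rather than m copies of the same minimum.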

1 | WildWorld terrestrial ecoregions
- National Geographic

Citation Context: ...ll, and soil moisture. Each variable was scaled to have a mean of zero and a mean absolute deviation of one. The auxiliary labels are based on the “Terrestrial Ecoregions of the World” available from [15]. The Bergmark data was collected using 25 focused web crawls, each with different keywords. The variables are counts in a bags-of-words model describing the web pages. The auxiliary labels are the 25...

1 | Biogeoinformatics of Hexacorals. Environmental database
- unknown authors

Citation Context: ..., while acknowledging many applications with other good clusterings exist. The Australia Coastal data is a subset of the data available from the Biogeoinformatics of Hexacorals environmental database [27]. The data contain measurements from the Australia coastline at every half-degree of longitude and latitude. The features describe environmental properties of each grid cell such as temperature, salin...
Citation Context ..., while acknowledging many applications with other good clusterings exist. The Australia Coastal data is a subset of the data available from the Biogeoinformatics of Hexacorals environmental database =-=[27]-=-. The data contain measurements from the Australia coastline at every half-degree of longitude and latitude. The features describe environmental properties of each grid cell such as temperature, salin... |