Clustering aggregation
In ICDE 2005
Cited by 102 (2 self)
We consider the following problem: given a set of clusterings, find a clustering that agrees as much as possible with the given clusterings. This problem, clustering aggregation, appears naturally in various contexts. For example, clustering categorical data is an instance of the problem: each categorical variable can be viewed as a clustering of the input rows. Moreover, clustering aggregation can be used as a metaclustering method to improve the robustness of clusterings. The problem formulation does not require a priori information about the number of clusters, and it gives a natural way of handling missing values. We give a formal statement of the clustering-aggregation problem, we discuss related work, and we suggest a number of algorithms. For several of the methods we provide theoretical guarantees on the quality of the solutions. We also show how sampling can be used to scale the algorithms for large data sets. We give an extensive empirical evaluation demonstrating the usefulness of the problem and of the solutions.
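As a rough illustration of what "agrees as much as possible" can mean, the sketch below counts pairwise disagreements between clusterings: a pair of objects counts as a disagreement when one clustering puts the two objects together and the other separates them. The function names (`disagreement_distance`, `total_disagreement`, `best_input_clustering`) are hypothetical, and choosing the best of the input clusterings is only a simple baseline assumed here for illustration, not a restatement of the algorithms the abstract refers to.

```python
from itertools import combinations


def disagreement_distance(a, b):
    """Count object pairs that one clustering joins and the other separates.

    `a` and `b` are label sequences over the same objects; only the grouping
    matters, not the label values themselves.
    """
    if len(a) != len(b):
        raise ValueError("clusterings must label the same objects")
    return sum(
        (a[i] == a[j]) != (b[i] == b[j])
        for i, j in combinations(range(len(a)), 2)
    )


def total_disagreement(candidate, clusterings):
    """Sum of disagreement distances from `candidate` to every input."""
    return sum(disagreement_distance(candidate, c) for c in clusterings)


def best_input_clustering(clusterings):
    """Baseline (an assumption of this sketch): return the input clustering
    that has the smallest total disagreement with all the inputs."""
    return min(clusterings, key=lambda c: total_disagreement(c, clusterings))


inputs = [[0, 0, 1, 1], [0, 0, 1, 1], [0, 0, 0, 1]]
print(best_input_clustering(inputs))
```

Because only co-membership is compared, relabeling the clusters of an input does not change any distance, and inputs with different numbers of clusters can be compared directly.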
Meta clustering
In Proceedings IEEE International Conference on Data Mining, 2006
Cited by 36 (1 self)
Clustering is ill-defined. Unlike supervised learning where labels lead to crisp performance criteria such as accuracy and squared error, clustering quality depends on how the clusters will be used. Devising clustering criteria that capture what users need is difficult. Most clustering algorithms search for optimal clusterings based on a prespecified clustering criterion. Our approach differs. We search for many alternate clusterings of the data, and then allow users to select the clustering(s) that best fit their needs. Meta clustering first finds a variety of clusterings and then clusters this diverse set of clusterings so that users must only examine a small number of qualitatively different clusterings. We present methods for automatically generating a diverse set of alternate clusterings, as well as methods for grouping clusterings into meta clusters. We evaluate meta clustering on four test problems and two case studies. Surprisingly, clusterings that would be of most interest to users often are not very compact clusterings.
Weighted clustering ensembles
In Proceedings of the 6th SIAM International Conference on Data Mining, 2006
Cited by 28 (7 self)
Cluster ensembles offer a solution to challenges inherent to clustering arising from its ill-posed nature. Cluster ensembles can provide robust and stable solutions by leveraging the consensus across multiple clustering results, while averaging out emergent spurious structures that arise due to the various biases to which each participating algorithm is tuned. In this paper, we address the problem of combining multiple weighted clusters which belong to different subspaces of the input space. We leverage the diversity of the input clusterings in order to generate a consensus partition that is superior to the participating ones. Since we are dealing with weighted clusters, our consensus function makes use of the weight vectors associated with the clusters. The experimental results show that our ensemble technique is capable of producing a partition that is as good as or better than the best individual clustering.
Consensus Clusterings
Cited by 19 (0 self)
In this paper we address the problem of combining multiple clusterings without access to the underlying features of the data. This process is known in the literature as clustering ensembles, clustering aggregation, or consensus clustering. Consensus clustering yields a stable and robust final clustering that is in agreement with multiple clusterings. We find that an iterative EM-like method is remarkably effective for this problem. We present three iterative algorithms for finding clustering consensus. An extensive empirical study compares our proposed algorithms with eleven other consensus clustering methods on four data sets using six different clustering performance metrics. The experimental results show that the new ensemble clustering methods produce clusterings that are as good as, and often better than, these other methods.
Advancing Data Clustering via Projective Clustering Ensembles
In SIGMOD, 2011
Cited by 3 (1 self)
Projective Clustering Ensembles (PCE) are a very recent advance in data clustering research which combines the two powerful tools of clustering ensembles and projective clustering. Specifically, PCE enables clustering ensemble methods to handle ensembles composed of projective clustering solutions. PCE has been formalized as an optimization problem with either a two-objective or a single-objective function. Two-objective PCE has been shown to generally produce more accurate clustering results than its single-objective counterpart, although it can handle the object-based and feature-based cluster representations only independently of one another. Moreover, neither of the early formulations of PCE follows any of the standard approaches of clustering ensembles, namely instance-based, cluster-based, and hybrid. In this paper, we propose an alternative formulation of the PCE problem which overcomes the above issues. We investigate the drawbacks of the early formulations of PCE and define a new single-objective formulation of the problem. This formulation is capable of treating the object-based and feature-based cluster representations as a whole, essentially tying them together in a distance computation between a projective clustering solution and a given ensemble. We propose two cluster-based algorithms for computing approximations to the proposed PCE formulation, which have the common merit of conforming to one of the standard approaches of clustering ensembles. Experiments on benchmark datasets have shown the significance of our PCE formulation, as both the proposed heuristics outperform existing PCE methods.
Diversity-based Weighting Schemes for Clustering Ensembles
Cited by 2 (1 self)
Clustering ensembles have recently been recognized as an emerging approach to provide more robust solutions to the data clustering problem. Current clustering ensemble methods typically fall into instance-based, cluster-based, or hybrid approaches; however, most such methods fail to discriminate among the various clusterings that participate in the ensemble. In this paper, we address the problem of weighting clustering ensembles by proposing general weighting approaches based on different implementations of the notion of diversity. We introduce three weighting schemes for clustering ensembles, called Single Weighting, Group Weighting and Dendrogram Weighting, which are independent of the particular clustering ensemble method and designed to take into account correlations among the individual clustering solutions in different ways. We show how these schemes can be instantiated in any instance-based, cluster-based, or hybrid clustering ensemble method. Experiments have shown that the performance of clustering ensemble algorithms increases when the proposed weighting schemes are employed.
A method of clustering combination applied to satellite image analysis
Cited by 2 (2 self)
An algorithm for combining the results of different clusterings is presented in this paper, the objective of which is to find groups of patterns which are common to all clusterings. The idea of the proposed combination is to group those samples which are in the same cluster in most cases. We formulate the combination as the solution of a set of linear equations with binary constraints. The advantage of such a formulation is that it provides an objective function for the combination. To optimize the objective function we propose an original unsupervised algorithm. Furthermore, we propose an extension adapted to very large volumes of data. The combination of clusterings is performed on the results of different clustering algorithms applied to SPOT5 satellite images and shows the effectiveness of the proposed method.
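The idea of grouping samples that fall in the same cluster "in most cases" can be sketched with a simple majority vote over pairs. This is a stand-in for illustration only: it does not implement the paper's formulation as linear equations with binary constraints, and the function name `majority_groups` and the strict more-than-half threshold are assumptions of this sketch.

```python
from itertools import combinations


def majority_groups(clusterings):
    """Group samples that share a cluster in more than half of the inputs.

    Each input is a label sequence over the same samples. Pairs that pass
    the majority vote are merged with union-find, so groups are the
    transitive closure of the pairwise votes (an assumption of this sketch).
    """
    n = len(clusterings[0])
    parent = list(range(n))

    def find(x):
        # Follow parent links to the root, compressing the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j in combinations(range(n), 2):
        together = sum(c[i] == c[j] for c in clusterings)
        if 2 * together > len(clusterings):
            parent[find(j)] = find(i)

    # Relabel roots as consecutive integers in order of first appearance.
    labels = {}
    return [labels.setdefault(find(i), len(labels)) for i in range(n)]


votes = [[0, 0, 1, 1], [0, 0, 1, 1], [0, 0, 0, 1]]
print(majority_groups(votes))
```

The pairwise loop is quadratic in the number of samples, so a sketch like this is only practical for small inputs; for image-scale data some form of sampling or blocking would be needed.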
Mechanism Design for Clustering Aggregation by Selfish Systems
In Seventh IEEE International Conference on Data Mining
We propose a market mechanism that can be applied to the clustering aggregation problem among selfish systems, which tend to lie about their correct clustering during the aggregation process. Our study is a preliminary step toward the development of robust distributed data mining among selfish systems.