Results 1–4 of 4
Consensus Clustering Algorithms: Comparison and Refinement
Abstract

Cited by 18 (0 self)
Consensus clustering is the problem of reconciling clustering information about the same data set coming from different sources or from different runs of the same algorithm. Cast as an optimization problem, consensus clustering is known as median partition, and has been shown to be NP-complete. A number of heuristics have been proposed as approximate solutions, some with performance guarantees. In practice, the problem is apparently easy to approximate, but guidance is needed as to which heuristic to use depending on the number of elements and clusterings given. We have implemented a number of heuristics for the consensus clustering problem, and here we compare their performance, independent of data size, in terms of efficacy and efficiency, on both simulated and real data sets. We find that, based on the underlying algorithms and their behavior in practice, the heuristics can be categorized into two distinct groups, with ramifications as to which one to use in a given situation, and that a hybrid solution is the best bet in general. We have also developed a refined consensus clustering heuristic for occasions when the given clusterings may be too disparate, so that their consensus may not be representative of any one of them, and we show that in practice the refined consensus clusterings can be much superior to the general consensus clustering.
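The median-partition objective described in this abstract can be sketched in a few lines. The following is an illustrative Python sketch, not any paper's implementation; it uses a Mirkin-style pairwise-disagreement count as the distance between partitions and the common "best of k" baseline heuristic, which simply returns the input clustering closest to all the others:

```python
from itertools import combinations

def mirkin_distance(p, q):
    """Number of unordered element pairs clustered together in one
    partition but apart in the other. Partitions are dicts mapping
    element -> cluster label over the same base set."""
    return sum((p[a] == p[b]) != (q[a] == q[b])
               for a, b in combinations(p, 2))

def total_distance(candidate, clusterings):
    """Median-partition objective: total distance of a candidate
    partition to all input clusterings."""
    return sum(mirkin_distance(candidate, c) for c in clusterings)

def best_of_k(clusterings):
    """Baseline heuristic: pick the input clustering that minimizes
    the median-partition objective over the inputs themselves."""
    return min(clusterings, key=lambda c: total_distance(c, clusterings))
```

For example, given three clusterings of {a, b, c, d} where two agree and one dissents, `best_of_k` returns one of the two agreeing clusterings, since it has the smallest total distance to the collection.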
Average Parameterization and Partial Kernelization for Computing Medians
 Proc. 9th LATIN
, 2010
Abstract

Cited by 8 (6 self)
We propose an effective polynomial-time preprocessing strategy for intractable median problems. Developing a new methodological framework, we show that if the input instances of generally intractable problems exhibit a sufficiently high degree of similarity to each other on average, then there are efficient exact solving algorithms. In other words, we show that the median problems Swap Median Permutation, Consensus Clustering, Kemeny Score, and Kemeny Tie Score are all fixed-parameter tractable with respect to the parameter “average distance between input objects”. To this end, we develop the new concept of “partial kernelization” and identify interesting polynomial-time solvable special cases for the considered problems.
Bounding and comparing methods for correlation clustering beyond ILP
 In NAACL-HLT Workshop on Integer Linear Programming for Natural Language Processing (ILP-NLP 2009)
, 2009
Abstract

Cited by 4 (1 self)
We evaluate several heuristic solvers for correlation clustering, the NP-hard problem of partitioning a dataset given pairwise affinities between all points. We experiment on two practical tasks, document clustering and chat disentanglement, to which ILP does not scale. On these datasets, we show that the clustering objective often, but not always, correlates with external metrics, and that local search always improves over greedy solutions. We use semidefinite programming (SDP) to provide a tighter bound, showing that simple algorithms are already close to optimality.
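The greedy-plus-local-search pipeline this abstract evaluates can be illustrated schematically. The sketch below is a hypothetical minimal version, not the paper's solvers: a greedy pass assigns each point to the existing cluster with the highest total signed affinity (or a new singleton), and a local-search pass then relocates single points while doing so raises their affinity to their cluster:

```python
def greedy_cluster(points, aff):
    """Greedy pass: each point joins the cluster with the highest
    total affinity to it, or starts a new cluster if no cluster has
    positive total affinity. aff(i, j) returns a signed score."""
    clusters = []
    for p in points:
        scores = [sum(aff(p, q) for q in c) for c in clusters]
        best = max(range(len(clusters)), key=lambda i: scores[i], default=None)
        if best is not None and scores[best] > 0:
            clusters[best].append(p)
        else:
            clusters.append([p])
    return clusters

def local_search(clusters, aff):
    """One-element moves: relocate a point to the cluster (or a new
    singleton, score 0) that improves its total affinity, until no
    single move helps."""
    improved = True
    while improved:
        improved = False
        for ci in range(len(clusters)):
            for p in list(clusters[ci]):
                cur = sum(aff(p, q) for q in clusters[ci] if q != p)
                best_score, best_cj = 0, None  # singleton baseline
                for cj in range(len(clusters)):
                    if cj == ci:
                        continue
                    s = sum(aff(p, q) for q in clusters[cj])
                    if s > best_score:
                        best_score, best_cj = s, cj
                if best_score > cur:
                    clusters[ci].remove(p)
                    if best_cj is None:
                        clusters.append([p])
                    else:
                        clusters[best_cj].append(p)
                    improved = True
        clusters = [c for c in clusters if c]
    return clusters
```

With affinities that are positive within {a, b} and {c, d} and negative across, the greedy pass alone already recovers the two clusters, and local search confirms no single-point move improves the objective.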
On the Parameterized Complexity of Consensus Clustering
Abstract

Cited by 1 (1 self)
Given a collection C of partitions of a base set S, the NP-hard Consensus Clustering problem asks for a partition of S which has a total Mirkin distance of at most t to the partitions in C, where t is a nonnegative integer. We present a parameterized algorithm for Consensus Clustering with running time O(4.24^k · k^3 + |C| · |S|^2), where k := t/|C| is the average Mirkin distance of the solution partition to the partitions of C. Furthermore, we strengthen previous hardness results for Consensus Clustering, showing that it remains NP-hard even when all input partitions contain at most two subsets. Finally, we study a local search variant of Consensus Clustering, showing W[1]-hardness for the parameter “radius of the Mirkin-distance neighborhood”. In the process, we also consider a local search variant of the related Cluster Editing problem, showing W[1]-hardness for the parameter “radius of the edge modification neighborhood”.
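The parameter k := t/|C| used in this abstract is straightforward to compute for a concrete instance. A minimal self-contained sketch, using the usual pairwise-disagreement form of the Mirkin distance (an assumption; the paper's normalization may differ by a constant factor):

```python
from itertools import combinations

def mirkin_distance(p, q):
    """Number of unordered element pairs on which partitions p and q
    disagree (together in one, apart in the other). Partitions are
    dicts mapping element -> cluster label over the same base set."""
    return sum((p[a] == p[b]) != (q[a] == q[b])
               for a, b in combinations(p, 2))

def average_mirkin(solution, collection):
    """The parameter k := t / |C|: total Mirkin distance t of the
    solution partition to the collection C, divided by |C|."""
    t = sum(mirkin_distance(solution, c) for c in collection)
    return t / len(collection)
```

A small k thus means the solution sits close, on average, to the input partitions, which is exactly the regime in which the O(4.24^k · k^3 + |C| · |S|^2) algorithm is fast.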