Results 1 -
8 of
8
Generation of Alternative Clusterings Using the CAMI Approach
"... Exploratory data analysis aims to discover and generate multiple views of the structure within a dataset. Conventional clustering techniques, however, are designed to only provide a single grouping or clustering of a dataset. In this paper, we introduce a novel algorithm called CAMI, that can uncove ..."
Abstract
-
Cited by 4 (1 self)
- Add to MetaCart
Exploratory data analysis aims to discover and generate multiple views of the structure within a dataset. Conventional clustering techniques, however, are designed to only provide a single grouping or clustering of a dataset. In this paper, we introduce a novel algorithm called CAMI, that can uncover alternative clusterings from a dataset. CAMI takes a mathematically appealing approach, combining the use of mutual information to distinguish between alternative clusterings, coupled with an expectation maximization framework to ensure clustering quality. We experimentally test CAMI on both synthetic and real-world datasets, comparing it against a variety of state-of-the-art algorithms. We demonstrate that CAMI’s performance is high and that its formulation provides a number of advantages compared to existing techniques. 1
Adaptive Cluster Ensemble Selection
"... Cluster ensembles generate a large number of different clustering solutions and combine them into a more robust and accurate consensus clustering. On forming the ensembles, the literature has suggested that higher diversity among ensemble members produces higher performance gain. In contrast, some s ..."
Abstract
-
Cited by 3 (0 self)
- Add to MetaCart
Cluster ensembles generate a large number of different clustering solutions and combine them into a more robust and accurate consensus clustering. On forming the ensembles, the literature has suggested that higher diversity among ensemble members produces higher performance gain. In contrast, some studies also indicated that medium diversity leads to the best performing ensembles. Such contradicting observations suggest that different data, with varying characteristics, may require different treatments. We empirically investigate this issue by examining the behavior of cluster ensembles on benchmark data sets. This leads to a novel framework that selects ensemble members for each data set based on its own characteristics. Our framework first generates a diverse set of solutions and combines them into a consensus partition P*. Based on the diversity between the ensemble members and P*, a subset of ensemble members is selected and combined to obtain the final output. We evaluate the proposed method on benchmark data sets and the results show that the proposed method can significantly improve the clustering performance, often by a substantial margin. In some cases, we were able to produce final solutions that significantly outperform even the best ensemble members. 1
Generalized Cluster Aggregation ∗
"... Clustering aggregation has emerged as an important extension of the classical clustering problem. It refers to the situation in which a number of different (input) clusterings have been obtained for a particular data set and it is desired to aggregate those clustering results to get a better cluster ..."
Abstract
-
Cited by 1 (0 self)
- Add to MetaCart
Clustering aggregation has emerged as an important extension of the classical clustering problem. It refers to the situation in which a number of different (input) clusterings have been obtained for a particular data set and it is desired to aggregate those clustering results to get a better clustering solution. In this paper, we propose a unified framework to solve the clustering aggregation problem, where the aggregated clustering result is obtained by minimizing the (weighted) sum of the Bregman divergence between it and all the input clusterings. Moreover, under our algorithm framework, we also propose a novel cluster aggregation problem where some must-link and cannot-link constraints are given in addition to the input clusterings. Finally the experimental results on some real world data sets are presented to show the effectiveness of our method. 1
Near-Duplicate Detection for Web-Forums
"... Unlike web search engines, current forum search engines lack the ability to identify threads with near-duplicate content and to group these threads in the search results. As a result, forum users are overloaded with duplicate search results. Furthermore, they often create additional similar postings ..."
Abstract
-
Cited by 1 (0 self)
- Add to MetaCart
Unlike web search engines, current forum search engines lack the ability to identify threads with near-duplicate content and to group these threads in the search results. As a result, forum users are overloaded with duplicate search results. Furthermore, they often create additional similar postings without reviewing existing search results. To overcome that problem we have developed a new near-duplicate detection algorithm for forum-threads. The algorithm was implemented in a large case study using a real-world forum serving more than one million users. We compared our work with current algorithms, such as [3, 4], for detecting near-duplicates on machine generated web pages. Our preliminary results show, that we significantly outperform these algorithms and that we are able to group forum threads with a precision of 74%.
Extending Consensus Clustering to Explore Multiple Clustering Views
"... Consensus clustering has emerged as an important extension of the classical clustering problem. Given a set of input clusterings of a given dataset, consensus clustering aims to find a single final clustering which is a better fit in some sense than the existing clusterings. There is a significant d ..."
Abstract
-
Cited by 1 (0 self)
- Add to MetaCart
Consensus clustering has emerged as an important extension of the classical clustering problem. Given a set of input clusterings of a given dataset, consensus clustering aims to find a single final clustering which is a better fit in some sense than the existing clusterings. There is a significant drawback in generating a single consensus clustering since different input clusterings could differ significantly. In this paper, we develop a new framework, called Multiple Consensus Clustering (MCC), to explore multiple clustering views of a given dataset from a set of input clusterings. Instead of generating a single consensus, MCC organizes the different input clusterings into a hierarchical tree structure and allows for interactive exploration of multiple clustering solutions. A dynamic programming algorithm is proposed to obtain a flat partition from the hierarchical tree using the modularity measure. Multiple consensuses are finally obtained by applying consensus clustering algorithms to each cluster of the partition. Extensive experimental results on 11 real world data sets and a case study on a Protein-Protein Interaction (PPI) data set demonstrate the effectiveness of our proposed method. 1
Hierarchical Ensemble Clustering
"... Abstract—Ensemble clustering has emerged as an important elaboration of the classical clustering problems. Ensemble clustering refers to the situation in which a number of different (input) clusterings have been obtained for a particular dataset and it is desired to find a single (consensus) cluster ..."
Abstract
- Add to MetaCart
Abstract—Ensemble clustering has emerged as an important elaboration of the classical clustering problems. Ensemble clustering refers to the situation in which a number of different (input) clusterings have been obtained for a particular dataset and it is desired to find a single (consensus) clustering which is a better fit in some sense than the existing clusterings. Many approaches have been developed to solve ensemble clustering problems over the last few years. However, most of these ensemble techniques are designed for partitional clustering methods. Few research efforts have been reported for ensemble hierarchical clustering methods. In this paper, we propose a hierarchical ensemble clustering framework which can naturally combine both partitional clustering and hierarchical clustering results. We notice the importance of ultra-metric distance for hierarchical clustering and propose a novel method for learning the ultra-metric distance from the aggregated distance matrices and generating final hierarchical clustering with enhanced cluster separation. Experimental results demonstrate the effectiveness of our proposed approaches. Keywords-Hierarchical ensemble clustering; Ultra-metric. I.
Subspace Clustering, Ensemble Clustering, Alternative Clustering, Multiview Clustering: What Can We Learn From Each Other?
"... Though subspace clustering, ensemble clustering, alternative clustering, and multiview clustering are different approaches motivated by different problems and aiming at different goals, there are similar problems in these fields. Here we shortly survey these areas from the point of view of subspace ..."
Abstract
- Add to MetaCart
Though subspace clustering, ensemble clustering, alternative clustering, and multiview clustering are different approaches motivated by different problems and aiming at different goals, there are similar problems in these fields. Here we shortly survey these areas from the point of view of subspace clustering. Based on this survey, we try to identify problems where the different research areas could probably learn from each other. 1.
Diversity-based Weighting Schemes for Clustering Ensembles
"... Clustering ensembles has been recently recognized as an emerging approach to provide more robust solutions to the data clustering problem. Current methods of clustering ensembles typically fall into instance-based, cluster-based, or hybrid approaches; however, most of such methods fail in discrimina ..."
Abstract
- Add to MetaCart
Clustering ensembles has been recently recognized as an emerging approach to provide more robust solutions to the data clustering problem. Current methods of clustering ensembles typically fall into instance-based, cluster-based, or hybrid approaches; however, most of such methods fail in discriminating among the various clusterings that participate to the ensemble. In this paper, we address the problem of weighting clustering ensembles by proposing general weighting approaches based on different implementations of the notion of diversity. We introduce three weighting schemes for clustering ensembles, called Single Weighting, Group Weighting and Dendrogram Weighting, which are independent of the particular method of clustering ensembles and designed to take into account correlations among the individual clustering solutions in different ways. We show how these schemes can be instantiated into any instance-based, cluster-based and hybrid clustering ensembles methods. Experiments have shown that the performance of the clustering ensembles algorithms increases when the proposed weighting schemes are employed. 1

