Results 1–10 of 22
Multiple Non-Redundant Spectral Clustering Views
"... in several different ways for different purposes. For example, images of faces of people can be grouped based on their pose or identity. Web pages collected from universities can be clustered based on the type of web page's owner {faculty, student, staff}, field {physics, math, engineering, computer science}, or identity of the university. In some cases, a data analyst wishes to find a single clustering, but this may require an algorithm to consider multiple clusterings and discard those that are not of interest. In other cases, one may wish to summarize and organize the data according to multiple possible clustering views. In either case, it is important to find multiple clustering solutions which are non-redundant. ..."
Abstract

Cited by 19 (4 self)
Many clustering algorithms only find one clustering solution. However, data can often be grouped and interpreted in many different ways. This is particularly true in the high-dimensional setting, where different subspaces reveal different possible groupings of the data. Instead of committing to one clustering solution, here we introduce a novel method that can provide several non-redundant clustering solutions to the user. Our approach simultaneously learns non-redundant subspaces that provide multiple views and finds a clustering solution in each view. We achieve this by augmenting a spectral clustering objective function to incorporate dimensionality reduction and multiple views and to penalize for redundancy between the views.
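The redundancy notion in this abstract can be illustrated outside the paper's spectral objective. The following is a minimal sketch, assuming normalized mutual information (NMI) as a stand-in measure of redundancy between two labelings; the paper itself builds its penalty into a spectral clustering objective, so this is not the authors' method, and the toy labelings are hypothetical.

```python
import math
from collections import Counter

def nmi(labels_a, labels_b):
    """Normalized mutual information between two labelings:
    ~1 for identical partitions (fully redundant views),
    ~0 for unrelated partitions (non-redundant views)."""
    n = len(labels_a)
    ca, cb = Counter(labels_a), Counter(labels_b)
    joint = Counter(zip(labels_a, labels_b))
    mi = sum((nab / n) * math.log((nab * n) / (ca[a] * cb[b]))
             for (a, b), nab in joint.items())
    ha = -sum((c / n) * math.log(c / n) for c in ca.values())
    hb = -sum((c / n) * math.log(c / n) for c in cb.values())
    if ha == 0.0 or hb == 0.0:
        return 0.0
    return mi / math.sqrt(ha * hb)

# Two views of the same six points: one redundant, one cutting across.
view1 = [0, 0, 0, 1, 1, 1]
view2 = [0, 0, 0, 1, 1, 1]   # fully redundant with view1
view3 = [0, 1, 0, 1, 0, 1]   # groups the points differently
print(nmi(view1, view2))  # 1.0
print(nmi(view1, view3))  # close to 0
```

A multiple-view method in this spirit would prefer solutions like `view3` over `view2` once `view1` has been found.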
Consensus Clusterings
Abstract

Cited by 10 (0 self)
In this paper we address the problem of combining multiple clusterings without access to the underlying features of the data. This process is known in the literature as clustering ensembles, clustering aggregation, or consensus clustering. Consensus clustering yields a stable and robust final clustering that is in agreement with multiple input clusterings. We find that an iterative EM-like method is remarkably effective for this problem. We present three iterative algorithms for finding a clustering consensus. An extensive empirical study compares our proposed algorithms with eleven other consensus clustering methods on four data sets using six different clustering performance metrics. The experimental results show that the new ensemble clustering methods produce clusterings that are as good as, and often better than, these other methods.
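The paper's EM-like algorithms are not reproduced here. As an illustration of the consensus-clustering setting it describes, the sketch below uses the common co-association baseline instead: link two points when they share a cluster in a majority of the input clusterings, then take connected components as the final clustering. The data and the `threshold` parameter are hypothetical.

```python
from itertools import combinations

def consensus(clusterings, threshold=0.5):
    """Co-association consensus: link points i and j if they share a
    cluster in more than `threshold` of the input clusterings, then
    return the connected components as the final labeling."""
    n = len(clusterings[0])
    m = len(clusterings)
    parent = list(range(n))          # union-find over the points

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j in combinations(range(n), 2):
        votes = sum(1 for c in clusterings if c[i] == c[j])
        if votes / m > threshold:
            parent[find(i)] = find(j)

    # Relabel components 0, 1, ... in order of first appearance.
    labels, seen = [], {}
    for x in range(n):
        labels.append(seen.setdefault(find(x), len(seen)))
    return labels

# Three noisy clusterings of five points; consensus recovers {0,1,2} vs {3,4}.
ensemble = [
    [0, 0, 0, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1],
]
print(consensus(ensemble))  # [0, 0, 0, 1, 1]
```

The iterative algorithms studied in the paper aim at the same agreement criterion but refine the consensus labeling rather than thresholding pairwise co-occurrence once.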
Mining Clustering Dimensions
Abstract

Cited by 9 (1 self)
Many real-world datasets can be clustered along multiple dimensions. For example, text documents can be clustered not only by topic, but also by the author's gender or sentiment. Unfortunately, traditional clustering algorithms produce only a single clustering of a dataset, effectively providing a user with just a single view of the data. In this paper, we propose a new clustering algorithm that can discover, in an unsupervised manner, each clustering dimension along which a dataset can be meaningfully clustered. Its ability to reveal the important clustering dimensions of a dataset without supervision is particularly appealing for users who have no idea of how a dataset might be clustered. We demonstrate its viability on several challenging text classification tasks.
Variable Selection in Model-Based Clustering: To Do or To Facilitate
Abstract

Cited by 6 (2 self)
Variable selection for cluster analysis is a difficult problem. The difficulty originates not only from the lack of class information but also from the fact that high-dimensional data are often multifaceted and can be meaningfully clustered in multiple ways. In such a case the effort to find one subset of attributes that presumably gives the "best" clustering may be misguided. It makes more sense to facilitate variable selection by domain experts, that is, to systematically identify the various facets of a data set (each based on a subset of attributes), cluster the data along each one, and present the results to the domain experts for appraisal and selection. In this paper, we propose a generalization of the Gaussian mixture model, show its ability to cluster data along multiple facets, and demonstrate that it is often more reasonable to facilitate variable selection than to perform it.
Variational Inference for Nonparametric Multiple Clustering
Abstract

Cited by 5 (2 self)
Most clustering algorithms produce a single clustering solution. Similarly, feature selection for clustering tries to find one feature subset where one interesting clustering solution resides. However, a single data set may be multifaceted and can be grouped and interpreted in many different ways, especially for high-dimensional data, where feature selection is typically needed. Moreover, different clustering solutions are interesting for different purposes. Instead of committing to one clustering solution, in this paper we introduce a probabilistic nonparametric Bayesian model that can simultaneously discover several possible clustering solutions and the feature subset views that generated each cluster partitioning. We provide a variational inference approach to learn the features and clustering partitions in each view. Our model allows us not only to learn the multiple clusterings and views but also to automatically learn the number of views and the number of clusters in each view.

Keywords: multiple clustering, non-redundant/disparate clustering, feature selection, nonparametric Bayes, variational inference
Subspace Clustering, Ensemble Clustering, Alternative Clustering, Multiview Clustering: What Can We Learn From Each Other?
Abstract

Cited by 4 (3 self)
Though subspace clustering, ensemble clustering, alternative clustering, and multiview clustering are different approaches motivated by different problems and aiming at different goals, these fields face similar problems. Here we briefly survey these areas from the point of view of subspace clustering. Based on this survey, we identify problems where the different research areas could learn from each other.
Consensus clustering + meta clustering = multiple consensus clustering
Florida Artificial Intelligence Research Society Conference, 2011
Abstract

Cited by 2 (0 self)
Consensus clustering and meta clustering are two important extensions of the classical clustering problem. Given a set of input clusterings of a dataset, consensus clustering aims to find a single final clustering that is a better fit, in some sense, than the existing clusterings, while meta clustering aims to group similar input clusterings together so that users only need to examine a small number of different clusterings. In this paper, we present a new approach, MCC (multiple consensus clustering), that explores multiple clustering views of a dataset by combining consensus clustering and meta clustering. In particular, given a set of input clusterings of a particular data set, MCC employs meta clustering to cluster the input clusterings and then uses consensus clustering to generate a consensus for each cluster of input clusterings. Extensive experimental results on 11 real-world data sets demonstrate the effectiveness of our proposed method.
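A rough sketch of the two-stage pipeline this abstract describes, not the paper's MCC algorithm: input clusterings are grouped by pairwise similarity, and the consensus step is simplified here to selecting each group's medoid. The Rand index as the inter-clustering similarity, the greedy grouping rule, and the `threshold` value are all assumptions for illustration.

```python
from itertools import combinations

def rand_index(c1, c2):
    """Fraction of point pairs on which two clusterings agree
    (both co-clustered or both separated)."""
    pairs = list(combinations(range(len(c1)), 2))
    agree = sum((c1[i] == c1[j]) == (c2[i] == c2[j]) for i, j in pairs)
    return agree / len(pairs)

def meta_cluster(clusterings, threshold=0.8):
    """Greedily group input clusterings whose Rand index to a group's
    first member exceeds `threshold`; return one medoid per group as a
    stand-in for that group's consensus."""
    groups = []
    for c in clusterings:
        for g in groups:
            if rand_index(c, g[0]) > threshold:
                g.append(c)
                break
        else:
            groups.append([c])
    # Medoid: member most similar on average to the rest of its group.
    return [max(g, key=lambda c: sum(rand_index(c, o) for o in g))
            for g in groups]

# Four input clusterings of six points: two "topic-like" and two
# "style-like" views; two distinct clustering views survive.
inputs = [
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 1],
    [0, 1, 0, 1, 0, 1],
    [0, 1, 0, 1, 0, 1],
]
print(len(meta_cluster(inputs)))  # 2
```

MCC proper would replace the medoid step with a full consensus clustering computed within each group.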
A Nonparametric Bayesian Model for Multiple Clustering with Overlapping Feature Views
Journal of Machine Learning Research, 2012
Abstract

Cited by 2 (2 self)
Most clustering algorithms produce a single clustering solution. This is inadequate for many data sets that are multifaceted and can be grouped and interpreted in many different ways. Moreover, for high-dimensional data, different features may be relevant or irrelevant to each clustering solution, suggesting the need for feature selection in clustering. Features relevant to one clustering interpretation may differ from those relevant to an alternative interpretation or view of the data. In this paper, we introduce a probabilistic nonparametric Bayesian model that can discover multiple clustering solutions from data together with the feature subsets that are relevant for the clusters in each view. In our model, the features in different views may be shared, and therefore the sets of relevant features are allowed to overlap. We model feature relevance to each view using an Indian Buffet Process and the cluster membership in each view using a Chinese Restaurant Process. We provide an inference approach to learn the latent parameters corresponding to this multiple partitioning problem. Our model not only learns the features and clusters in each view but also automatically learns the number of clusters, the number of views, and the number of features in each view.
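The cluster-membership prior named in this abstract, the Chinese Restaurant Process, can be sketched by its sequential sampling rule: item i joins an existing cluster with probability proportional to the cluster's size, or starts a new one with probability proportional to a concentration parameter alpha. This is only the prior; the Indian Buffet Process feature model and the paper's inference procedure are not shown, and the sample sizes here are arbitrary.

```python
import random

def sample_crp(n, alpha, seed=0):
    """Draw one partition of n items from a Chinese Restaurant Process:
    item i joins existing cluster k with prob |k| / (i + alpha),
    or opens a new cluster with prob alpha / (i + alpha)."""
    rng = random.Random(seed)
    counts = []        # counts[k] = current size of cluster k
    assignment = []
    for i in range(n):
        weights = counts + [alpha]       # existing clusters, then "new"
        r = rng.uniform(0, i + alpha)    # total weight is exactly i + alpha
        acc, k = 0.0, 0
        for k, w in enumerate(weights):
            acc += w
            if r < acc:
                break
        if k == len(counts):
            counts.append(1)             # open a new cluster
        else:
            counts[k] += 1
        assignment.append(k)
    return assignment

labels = sample_crp(20, alpha=1.0, seed=42)
print(labels)
print(len(set(labels)), "clusters")  # the number of clusters is not fixed in advance
```

The "nonparametric" claim in the abstract corresponds to exactly this behavior: the number of clusters grows with the data rather than being specified up front.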
Which Clustering Do You Want? Inducing Your Ideal Clustering with Minimal Feedback
Abstract

Cited by 1 (0 self)
While traditional research on text clustering has largely focused on grouping documents by topic, it is conceivable that a user may want to cluster documents along other dimensions, such as the author's mood, gender, age, or sentiment. Without knowing the user's intention, a clustering algorithm will only group documents along the most prominent dimension, which may not be the one the user desires. To address the problem of clustering documents along the user-desired dimension, previous work has focused on learning a similarity metric from data manually annotated with the user's intention, or on having a human construct a feature space interactively during the clustering process. With the goal of reducing reliance on human knowledge for fine-tuning the similarity function or selecting the relevant features required by these approaches, we propose a novel active clustering algorithm that allows a user to easily select the dimension along which she wants to cluster the documents by inspecting only a small number of words. We demonstrate the viability of our algorithm on a variety of commonly used sentiment datasets.
A Polygon-based Methodology for Mining Related Spatial Datasets
Abstract

Cited by 1 (1 self)
Polygons can serve an important role in the analysis of georeferenced data, as they provide a natural representation for particular types of spatial objects and because they can be used as models for spatial clusters. This paper claims that polygon analysis is particularly useful for mining related spatial datasets. A novel methodology for clustering polygons that have been extracted from different spatial datasets is proposed, consisting of a meta clustering module that clusters polygons and a summary generation module that creates a final clustering from a polygonal meta clustering based on user preferences. Moreover, a density-based polygon clustering algorithm is introduced. Our methodology is evaluated in a real-world case study involving ozone pollution in Texas; it was able to reveal interesting relationships between different ozone hotspots and interesting associations between ozone hotspots and other meteorological variables.

Keywords: spatial data mining, polygon clustering algorithms, mining related
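The abstract does not specify the paper's density-based polygon clustering algorithm. As a stand-in illustration of density-based grouping, here is a minimal DBSCAN-style clustering of hypothetical polygon centroids; the centroid coordinates and the `eps` and `min_pts` parameters are chosen arbitrarily.

```python
from math import hypot

def dbscan(points, eps=1.5, min_pts=2):
    """Minimal DBSCAN: grow a cluster from each unvisited point that has
    at least `min_pts` neighbors within `eps`; label leftovers -1 (noise)."""
    n = len(points)

    def neighbors(i):
        return [j for j in range(n)
                if hypot(points[i][0] - points[j][0],
                         points[i][1] - points[j][1]) <= eps]

    labels = [None] * n
    cluster = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        nbrs = neighbors(i)
        if len(nbrs) < min_pts:
            labels[i] = -1           # noise (may be absorbed by a cluster later)
            continue
        labels[i] = cluster
        queue = list(nbrs)
        while queue:
            j = queue.pop()
            if labels[j] in (None, -1):
                labels[j] = cluster
                more = neighbors(j)
                if len(more) >= min_pts:          # j is a core point: expand
                    queue.extend(k for k in more if labels[k] is None)
        cluster += 1
    return labels

# Centroids of hypothetical polygons: two dense groups and one outlier.
centroids = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10), (50, 50)]
print(dbscan(centroids))  # [0, 0, 0, 1, 1, -1]
```

A density-based method fits the hotspot setting because cluster shapes (like ozone hotspot polygons) need not be convex and the number of clusters is not fixed in advance.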