Simultaneous Unsupervised Learning of Disparate Clusterings
Cited by 23 (0 self)
Most clustering algorithms produce a single clustering for a given data set even when the data can be clustered naturally in multiple ways. In this paper, we address the difficult problem of uncovering disparate clusterings from the data in a totally unsupervised manner. We propose two new approaches for this problem. In the first approach we aim to find good clusterings of the data that are also decorrelated with one another. To this end, we give a new and tractable characterization of decorrelation between clusterings, and present an objective function to capture it. We provide an iterative “decorrelated” k-means type algorithm to minimize this objective function. In the second approach, we model the data as a sum of mixtures and associate each mixture with a clustering. This approach leads us to the problem of learning a convolution of mixture distributions. Though the latter problem can be formulated as one of factorial learning [8, 13, 16], the existing formulations and methods do not perform well on many real high-dimensional data sets. We propose a new regularized factorial learning framework that is more suitable for capturing the notion of disparate clusterings in modern, high-dimensional data sets. The resulting algorithm does well in uncovering multiple clusterings, and is much improved over existing methods. We evaluate our methods on two real-world data sets: a music data set from the text mining domain, and a portrait data set from the computer vision domain. Our methods achieve a substantially higher accuracy than existing factorial learning as well as traditional clustering algorithms.
Finding alternative clusterings using constraints
In Proceedings of the 8th IEEE International Conference on Data Mining (ICDM), 2008
Cited by 19 (1 self)
The aim of data mining is to find novel and actionable insights. However, most algorithms typically just find a single explanation of the data even though alternatives could exist. In this work, we explore a general purpose approach to find an alternative clustering of the data with the aid of must-link and cannot-link constraints. This problem has received little attention in the literature. Our approach can be incorporated into many clustering algorithms that use a distance function, and it compares favorably with existing work.
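To make the constraint vocabulary in the abstract above concrete, here is a minimal sketch of how must-link and cannot-link constraints can be checked against a candidate clustering. It is an illustration only, not the paper's algorithm: the function name `count_violations` and the toy data are assumptions, and constraints are assumed to be given as pairs of item indices.

```python
def count_violations(labels, must_link, cannot_link):
    """Count violated pairwise constraints for a hard clustering.

    labels[i] is the cluster id of item i. A must-link pair (i, j) is
    violated when i and j are in different clusters; a cannot-link pair
    is violated when they share a cluster.
    """
    must_violations = sum(1 for i, j in must_link if labels[i] != labels[j])
    cannot_violations = sum(1 for i, j in cannot_link if labels[i] == labels[j])
    return must_violations, cannot_violations


# Toy example (hypothetical data): four items in two clusters.
labels = [0, 0, 1, 1]
must_link = [(0, 1), (1, 2)]      # (1, 2) is split across clusters
cannot_link = [(2, 3), (0, 3)]    # (2, 3) ends up in the same cluster
violations = count_violations(labels, must_link, cannot_link)
print(violations)
```

A constrained clustering method would typically try to keep both counts low, either as hard requirements or as penalties added to its objective.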
Multiple Non-Redundant Spectral Clustering Views
Cited by 19 (4 self)
Many clustering algorithms only find one clustering solution. However, data can often be grouped and interpreted in many different ways. This is particularly true in the high-dimensional setting where different subspaces reveal different possible groupings of the data. Instead of committing to one clustering solution, here we introduce a novel method that can provide several non-redundant clustering solutions to the user. Our approach simultaneously learns non-redundant subspaces that provide multiple views and finds a clustering solution in each view. We achieve this by augmenting a spectral clustering objective function to incorporate dimensionality reduction and multiple views and to penalize for redundancy between the views.
... in several different ways for different purposes. For example, images of faces of people can be grouped based on their pose or identity. Web pages collected from universities can be clustered based on the type of webpage's owner, {faculty, student, staff}, field, {physics, math, engineering, computer science}, or identity of the university. In some cases, a data analyst wishes to find a single clustering, but this may require an algorithm to consider multiple clusterings and discard those that are not of interest. In other cases, one may wish to summarize and organize the data according to multiple possible clustering views. In either case, it is important to find multiple clustering solutions which are non-redundant.
Generation of Alternative Clusterings Using the CAMI Approach
Cited by 17 (1 self)
Exploratory data analysis aims to discover and generate multiple views of the structure within a dataset. Conventional clustering techniques, however, are designed to only provide a single grouping or clustering of a dataset. In this paper, we introduce a novel algorithm called CAMI that can uncover alternative clusterings from a dataset. CAMI takes a mathematically appealing approach, combining the use of mutual information to distinguish between alternative clusterings, coupled with an expectation maximization framework to ensure clustering quality. We experimentally test CAMI on both synthetic and real-world datasets, comparing it against a variety of state-of-the-art algorithms. We demonstrate that CAMI’s performance is high and that its formulation provides a number of advantages compared to existing techniques.
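The abstract above relies on mutual information to tell alternative clusterings apart. As a rough illustration of that quantity only (this is not CAMI, which couples it with expectation maximization), the sketch below computes the mutual information between two hard clusterings of the same items. The function name `mutual_information` and the toy label lists are assumptions made for this example.

```python
from collections import Counter
from math import log


def mutual_information(labels_a, labels_b):
    """Mutual information (in nats) between two hard clusterings.

    labels_a[i] and labels_b[i] are the cluster ids that the two
    clusterings assign to item i. A value near 0 means the clusterings
    share little information, i.e. they are close to independent views.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both clusterings must label the same items")
    n = len(labels_a)
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    count_ab = Counter(zip(labels_a, labels_b))
    mi = 0.0
    for (a, b), n_ab in count_ab.items():
        # p(a, b) * log( p(a, b) / (p(a) * p(b)) ), written with counts
        mi += (n_ab / n) * log(n_ab * n / (count_a[a] * count_b[b]))
    return mi


# Toy example (hypothetical data): two independent groupings of four items,
# and a relabelled copy of the first grouping.
by_shape = [0, 0, 1, 1]
by_color = [0, 1, 0, 1]
relabelled_shape = [1, 1, 0, 0]

mi_independent = mutual_information(by_shape, by_color)
mi_identical = mutual_information(by_shape, relabelled_shape)
```

In this toy case the two independent groupings have zero mutual information, while the relabelled copy attains the maximum for two balanced clusters, log 2. An alternative-clustering method would favor pairs of clusterings closer to the first situation.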
Unifying Dependent Clustering and Disparate Clustering for Non-homogeneous Data
Cited by 8 (4 self)
Modern data mining settings involve a combination of attribute-valued descriptors over entities as well as specified relationships between these entities. We present an approach to cluster such non-homogeneous datasets by using the relationships to impose either dependent clustering or disparate clustering constraints. Unlike prior work that views constraints as boolean criteria, we present a formulation that allows constraints to be satisfied or violated in a smooth manner. This enables us to achieve dependent clustering and disparate clustering using the same optimization framework by merely maximizing versus minimizing the objective function. We present results on both synthetic data and several real-world datasets.
ASCLU: Alternative subspace clustering
In MultiClust at KDD, 2010
Cited by 5 (3 self)
Finding groups of similar objects in databases is one of the most important data mining tasks. Recently, traditional clustering approaches have been extended to generate alternative clustering solutions. The basic observation is that for each database object multiple meaningful groupings might exist: the data can be clustered from different perspectives. It is thus reasonable to search for clusters that deviate from a given clustering result with which the user is not satisfied. The existing methods focus on full space clustering. However, for today’s applications, where many attributes per object are recorded, traditional clustering is known to generate no meaningful results. Instead, the analysis of subspace projections of the data with subspace or projected clustering techniques is more suitable. In this paper, we develop the first method that detects alternative subspace clusters based on an already known subspace clustering. Considering subspace projections, we can identify alternative clusters based on deviating dimension sets as well as deviating object sets. Thus, we realize different views on the data by using different attributes. Besides the challenge of detecting alternative subspace clusters, our model avoids redundant clusters in the overall result, i.e., the generated clusters are dissimilar to each other. In experiments we analyze the effectiveness of our model and show that meaningful alternative subspace clustering solutions are generated.
Variational Inference for Nonparametric Multiple Clustering
Cited by 5 (2 self)
Most clustering algorithms produce a single clustering solution. Similarly, feature selection for clustering tries to find one feature subset where one interesting clustering solution resides. However, a single data set may be multi-faceted and can be grouped and interpreted in many different ways, especially for high-dimensional data, where feature selection is typically needed. Moreover, different clustering solutions are interesting for different purposes. Instead of committing to one clustering solution, in this paper we introduce a probabilistic nonparametric Bayesian model that can simultaneously discover several possible clustering solutions and the feature subset views that generated each cluster partitioning. We provide a variational inference approach to learn the features and clustering partitions in each view. Our model allows us not only to learn the multiple clusterings and views but also to automatically learn the number of views and the number of clusters in each view.
Keywords: multiple clustering, non-redundant/disparate clustering, feature selection, nonparametric Bayes, variational inference
Subspace Clustering, Ensemble Clustering, Alternative Clustering, Multiview Clustering: What Can We Learn From Each Other?
Cited by 4 (3 self)
Though subspace clustering, ensemble clustering, alternative clustering, and multiview clustering are different approaches motivated by different problems and aiming at different goals, there are similar problems in these fields. Here we briefly survey these areas from the point of view of subspace clustering. Based on this survey, we try to identify problems where the different research areas could probably learn from each other.
Avoiding Bias in Text Clustering Using Constrained K-means and May-Not-Links
Cited by 2 (2 self)
In this paper we present a new clustering algorithm which extends the traditional batch k-means, enabling the introduction of domain knowledge in the form of Must, Cannot, May and May-Not rules between the data points. We have also applied the presented method to the task of avoiding bias in clustering. Evaluation carried out on standard collections showed considerable improvements in effectiveness over previous constrained and non-constrained algorithms for the given task.
Uncovering Many Views of Biological Networks Using Ensembles of Near-Optimal Partitions
1st Intl Workshop on Discovering, Summarizing, and Using Multiple Clusterings, KDD, 2010
Cited by 2 (0 self)
Densely interacting regions of biological networks often correspond to functional modules such as protein complexes. Most algorithms proposed to uncover modules, however, produce one clustering that only reveals a single view of how the cell is organized. We describe two new methods to find ensembles of provably near-optimal modularity partitions that lie within a heuristically constrained search space. We also show how to count the number of solutions in this space that exist within a bounded modularity range. We apply our algorithms to a protein interaction network for S. cerevisiae and show how fine-grained differences between near-optimal partitions can be used to define robust communities. We also propose a technique to find structurally diverse near-optimal solutions and show that these different partitions are enriched for different biological functions. Our results indicate that near-optimal solutions can represent alternative and complementary views of the network’s structure.
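The abstract above ranks partitions by modularity. For readers unfamiliar with the score, the following minimal sketch computes the standard Newman-Girvan modularity of a partition of an undirected, unweighted graph. It only evaluates a given partition and does not reproduce the paper's search for near-optimal ensembles; the function name `modularity`, the edge-list representation, and the toy graph are assumptions made for this example.

```python
from collections import defaultdict


def modularity(edges, community):
    """Newman-Girvan modularity of a partition.

    edges: list of (u, v) pairs of an undirected, unweighted graph.
    community: dict mapping each node to its community id.
    Uses Q = sum over communities c of  L_c / m - (d_c / (2 m))^2,
    where m is the number of edges, L_c the number of edges inside c,
    and d_c the total degree of the nodes in c.
    """
    m = len(edges)
    if m == 0:
        raise ValueError("modularity is undefined for a graph with no edges")
    internal_edges = defaultdict(int)
    total_degree = defaultdict(int)
    for u, v in edges:
        total_degree[community[u]] += 1
        total_degree[community[v]] += 1
        if community[u] == community[v]:
            internal_edges[community[u]] += 1
    return sum(
        internal_edges[c] / m - (total_degree[c] / (2 * m)) ** 2
        for c in total_degree
    )


# Toy graph (hypothetical data): two triangles joined by a single edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_modules = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
one_module = {node: "a" for node in range(6)}

q_two = modularity(edges, two_modules)
q_one = modularity(edges, one_module)
```

On this toy graph the two-triangle partition scores 5/14 (about 0.357), while placing every node in one community scores 0, which matches the intuition that the split partition captures more structure.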