Results 1–10 of 36
Simultaneous Unsupervised Learning of Disparate Clusterings

Cited by 34 (0 self)
Most clustering algorithms produce a single clustering for a given data set even when the data can be clustered naturally in multiple ways. In this paper, we address the difficult problem of uncovering disparate clusterings from the data in a totally unsupervised manner. We propose two new approaches for this problem. In the first approach we aim to find good clusterings of the data that are also decorrelated with one another. To this end, we give a new and tractable characterization of decorrelation between clusterings, and present an objective function to capture it. We provide an iterative “decorrelated” k-means-type algorithm to minimize this objective function. In the second approach, we model the data as a sum of mixtures and associate each mixture with a clustering. This approach leads us to the problem of learning a convolution of mixture distributions. Though the latter problem can be formulated as one of factorial learning [8, 13, 16], the existing formulations and methods do not perform well on many real high-dimensional data sets. We propose a new regularized factorial learning framework that is more suitable for capturing the notion of disparate clusterings in modern, high-dimensional data sets. The resulting algorithm does well in uncovering multiple clusterings, and is much improved over existing methods. We evaluate our methods on two real-world data sets: a music data set from the text mining domain, and a portrait data set from the computer vision domain. Our methods achieve substantially higher accuracy than existing factorial learning as well as traditional clustering algorithms.
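The notion of disparate clusterings can be illustrated with a toy example. The sketch below is not the paper's decorrelated k-means algorithm; it only shows, with a plain Lloyd's k-means, a data set that admits two fully decorrelated groupings depending on which feature is used:

```python
import numpy as np

def kmeans(X, init, iters=10):
    """Plain Lloyd's k-means with fixed initial centroids (deterministic)."""
    centroids = init.astype(float).copy()
    for _ in range(iters):
        # Assign each point to its nearest centroid.
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
        labels = dists.argmin(axis=1)
        # Recompute each centroid as the mean of its cluster.
        for k in range(len(centroids)):
            if (labels == k).any():
                centroids[k] = X[labels == k].mean(axis=0)
    return labels

# Four points on the corners of a square: they can be split by
# feature 0 (left/right) or by feature 1 (bottom/top).
X = np.array([[0.0, 0.0], [0.0, 10.0], [10.0, 0.0], [10.0, 10.0]])

labels_a = kmeans(X[:, :1], init=np.array([[0.0], [10.0]]))  # split by feature 0
labels_b = kmeans(X[:, 1:], init=np.array([[0.0], [10.0]]))  # split by feature 1
print(labels_a, labels_b)
```

Each cluster of the first solution intersects each cluster of the second in exactly one point, i.e. the two clusterings are maximally decorrelated — the property the paper's objective function is designed to enforce.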
Finding alternative clusterings using constraints
 In Proceedings of the 8th IEEE International Conference on Data Mining (ICDM), 2008

Cited by 29 (2 self)
The aim of data mining is to find novel and actionable insights. However, most algorithms typically find just a single explanation of the data even though alternatives could exist. In this work, we explore a general-purpose approach to finding an alternative clustering of the data with the aid of must-link and cannot-link constraints. This problem has received little attention in the literature, and since our approach can be incorporated into any clustering algorithm that uses a distance function, it compares favorably with existing work.
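The mechanism can be sketched as a single constrained assignment step in the style of COP-k-means (an illustrative helper, not the authors' implementation): each point joins the nearest centroid whose current members do not violate a cannot-link with it, so cannot-links derived from a first clustering push the result toward an alternative one.

```python
import numpy as np

def constrained_assign(X, centroids, cannot_links):
    """One assignment pass: nearest feasible centroid, skipping any cluster
    that already contains a cannot-linked point."""
    labels = [-1] * len(X)
    members = {k: set() for k in range(len(centroids))}
    for i, x in enumerate(X):
        order = np.argsort([np.sum((x - c) ** 2) for c in centroids])
        for k in order:
            violates = any((i, j) in cannot_links or (j, i) in cannot_links
                           for j in members[k])
            if not violates:
                labels[i] = int(k)
                members[k].add(i)
                break
    return labels

# 1-D toy data whose natural clustering is {0, 1} vs {10, 11}.
X = np.array([[0.0], [1.0], [10.0], [11.0]])
centroids = np.array([[0.5], [10.5]])

# Cannot-links between points the natural clustering puts together
# force a different, alternative grouping.
labels = constrained_assign(X, centroids, cannot_links={(0, 1), (2, 3)})
print(labels)
```

With these constraints the naturally co-clustered pairs are split apart, yielding labels that cut across the original partition.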
Multiple Non-Redundant Spectral Clustering Views

Cited by 29 (4 self)
Many clustering algorithms only find one clustering solution. However, data can often be grouped and interpreted in several different ways for different purposes. For example, images of faces of people can be grouped based on their pose or identity. Web pages collected from universities can be clustered based on the type of web page’s owner {faculty, student, staff}, field {physics, math, engineering, computer science}, or identity of the university. In some cases, a data analyst wishes to find a single clustering, but this may require an algorithm to consider multiple clusterings and discard those that are not of interest. In other cases, one may wish to summarize and organize the data according to multiple possible clustering views. In either case, it is important to find multiple clustering solutions which are non-redundant. This is particularly true in the high-dimensional setting where different subspaces reveal different possible groupings of the data. Instead of committing to one clustering solution, here we introduce a novel method that can provide several non-redundant clustering solutions to the user. Our approach simultaneously learns non-redundant subspaces that provide multiple views and finds a clustering solution in each view. We achieve this by augmenting a spectral clustering objective function to incorporate dimensionality reduction and multiple views and to penalize for redundancy between the views.
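A common way to penalize redundancy between two views is a kernel dependence measure such as the Hilbert–Schmidt Independence Criterion (HSIC); whether this matches the paper's exact penalty is an assumption, but an empirical HSIC with linear kernels is easy to sketch:

```python
import numpy as np

def hsic(X, Y):
    """Empirical HSIC with linear kernels: tr(K H L H) / (n-1)^2.
    Near zero when the two views carry (linearly) independent structure."""
    n = X.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    K = X @ X.T                          # linear kernel on view 1
    L = Y @ Y.T                          # linear kernel on view 2
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2

view1 = np.array([[1.0], [-1.0], [1.0], [-1.0]])
view2 = np.array([[1.0], [1.0], [-1.0], [-1.0]])

print(hsic(view1, view1))  # high: a view is fully redundant with itself
print(hsic(view1, view2))  # ~0: the two views group the points differently
```

Adding such a term to a spectral clustering objective discourages the learned subspaces from reproducing the same grouping.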
Generation of Alternative Clusterings Using the CAMI Approach

Cited by 25 (3 self)
Exploratory data analysis aims to discover and generate multiple views of the structure within a dataset. Conventional clustering techniques, however, are designed to provide only a single grouping or clustering of a dataset. In this paper, we introduce a novel algorithm called CAMI that can uncover alternative clusterings from a dataset. CAMI takes a mathematically appealing approach, combining the use of mutual information to distinguish between alternative clusterings with an expectation-maximization framework to ensure clustering quality. We experimentally test CAMI on both synthetic and real-world datasets, comparing it against a variety of state-of-the-art algorithms. We demonstrate that CAMI’s performance is high and that its formulation provides a number of advantages compared to existing techniques.
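The quantity CAMI uses to distinguish clusterings — mutual information between two labelings — can be computed directly from their contingency table. This is only the measuring stick, sketched here in isolation; CAMI itself embeds it in an EM framework:

```python
import numpy as np

def mutual_information(a, b):
    """Mutual information (in nats) between two labelings,
    computed from their empirical contingency table."""
    a, b = np.asarray(a), np.asarray(b)
    mi = 0.0
    for i in np.unique(a):
        for j in np.unique(b):
            p_ij = np.mean((a == i) & (b == j))   # joint probability
            p_i, p_j = np.mean(a == i), np.mean(b == j)  # marginals
            if p_ij > 0:
                mi += p_ij * np.log(p_ij / (p_i * p_j))
    return mi

a = [0, 0, 1, 1]
b = [0, 1, 0, 1]
print(mutual_information(a, a))  # log 2: identical clusterings share all info
print(mutual_information(a, b))  # 0.0: the two clusterings are independent
```

Minimizing this quantity between a candidate clustering and an existing one is what makes the candidate a genuine alternative.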
A hierarchical information-theoretic technique for the discovery of non-linear alternative clusterings
 In Proc. of the Int’l Conf. on Knowledge Discovery and Data Mining (KDD’10), 2010

Cited by 16 (5 self)
Discovery of alternative clusterings is an important method for exploring complex datasets. It provides the capability for the user to view clustering behaviour from different perspectives and thus explore new hypotheses. However, current algorithms for alternative clustering have focused mainly on linear scenarios and may not perform as desired for datasets containing clusters with non-linear shapes. Our goal in this paper is to address this challenge of non-linearity. In particular, we propose a novel algorithm to uncover an alternative clustering that is distinctively different from an existing, reference clustering. Our technique is information-theoretic and aims to ensure alternative clustering quality by maximizing the mutual information between clustering labels and data observations, whilst at the same time ensuring alternative clustering distinctiveness by minimizing the information sharing between the two clusterings. We perform experiments to assess our method against a large range of alternative clustering algorithms in the literature. We show that our technique’s performance is generally better for non-linear scenarios and, furthermore, is highly competitive even for simpler, linear scenarios.
Unifying Dependent Clustering and Disparate Clustering for Non-homogeneous Data

Cited by 15 (8 self)
Modern data mining settings involve a combination of attribute-valued descriptors over entities as well as specified relationships between these entities. We present an approach to cluster such non-homogeneous datasets by using the relationships to impose either dependent-clustering or disparate-clustering constraints. Unlike prior work that views constraints as Boolean criteria, we present a formulation that allows constraints to be satisfied or violated in a smooth manner. This enables us to achieve dependent clustering and disparate clustering within the same optimization framework by merely maximizing versus minimizing the objective function. We present results on both synthetic data as well as several real-world datasets.
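The smooth (non-Boolean) constraint idea can be sketched with soft cluster memberships: a pairwise agreement term that takes values anywhere in between "satisfied" and "violated". This is an illustrative form, not the paper's exact objective; maximizing it yields dependent clustering, minimizing it yields disparate clustering.

```python
import numpy as np

def pairwise_agreement(P, Q, pairs):
    """Smooth constraint score: sum, over related pairs, of the dot product
    of their soft cluster-membership vectors. 1 = a fully agreeing pair,
    0 = a fully disagreeing pair; fractional memberships give values between."""
    return sum(float(P[i] @ Q[j]) for i, j in pairs)

# Soft memberships over two clusters for two entity sets.
P = np.array([[1.0, 0.0], [0.0, 1.0]])
Q_dependent = np.array([[1.0, 0.0], [0.0, 1.0]])  # mirrors P
Q_disparate = np.array([[0.0, 1.0], [1.0, 0.0]])  # opposes P

pairs = [(0, 0), (1, 1)]  # specified relationships across the two entity sets
print(pairwise_agreement(P, Q_dependent, pairs))  # 2.0 -> maximize: dependent
print(pairwise_agreement(P, Q_disparate, pairs))  # 0.0 -> minimize: disparate
```

Because the score is differentiable in the soft memberships, the same optimizer can serve both clustering modes by flipping the sign of this term.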
ASCLU: Alternative subspace clustering
 In MultiClust at KDD, 2010

Cited by 6 (4 self)
Finding groups of similar objects in databases is one of the most important data mining tasks. Recently, traditional clustering approaches have been extended to generate alternative clustering solutions. The basic observation is that multiple meaningful groupings might exist for the same database objects: the data can be clustered from different perspectives. It is thus reasonable to search for clusters that deviate from a given clustering result with which the user is not satisfied. Existing methods focus on full-space clustering. However, for today’s applications, where many attributes per object are recorded, traditional clustering is known to generate no meaningful results. Instead, the analysis of subspace projections of the data with subspace or projected clustering techniques is more suitable. In this paper, we develop the first method that detects alternative subspace clusters based on an already known subspace clustering. Considering subspace projections, we can identify alternative clusters based on deviating dimension sets in addition to deviating object sets. Thus, we realize different views on the data by using different attributes. Besides the challenge of detecting alternative subspace clusters, our model avoids redundant clusters in the overall result, i.e., the generated clusters are dissimilar among each other. In experiments, we analyze the effectiveness of our model and show that meaningful alternative subspace clustering solutions are generated.
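The key observation — a subspace cluster can deviate in its object set, its dimension set, or both — can be sketched as a combined overlap measure (a hypothetical form, not the paper's exact redundancy model):

```python
def jaccard(s, t):
    """Jaccard overlap of two sets (0 = disjoint, 1 = identical)."""
    return len(s & t) / len(s | t) if s | t else 0.0

def redundancy(c1, c2):
    """Redundancy of two subspace clusters, each given as (objects, dims):
    high only when BOTH the object sets and the dimension sets overlap."""
    (o1, d1), (o2, d2) = c1, c2
    return jaccard(o1, o2) * jaccard(d1, d2)

known = ({0, 1, 2, 3}, {0, 1})     # known cluster: objects grouped in dims {0,1}
alt_dims = ({0, 1, 2, 3}, {2, 3})  # same objects, deviating dimension set
alt_objs = ({4, 5, 6, 7}, {0, 1})  # same dimensions, deviating object set

print(redundancy(known, alt_dims))  # 0.0: alternative via a different subspace
print(redundancy(known, alt_objs))  # 0.0: alternative via different objects
print(redundancy(known, known))     # 1.0: fully redundant
```

A cluster counts as a valid alternative when its redundancy with every already-reported cluster stays below a threshold, which is what keeps the overall result non-redundant.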
Variational Inference for Nonparametric Multiple Clustering

Cited by 5 (2 self)
Most clustering algorithms produce a single clustering solution. Similarly, feature selection for clustering tries to find one feature subset where one interesting clustering solution resides. However, a single data set may be multi-faceted and can be grouped and interpreted in many different ways, especially for high-dimensional data, where feature selection is typically needed. Moreover, different clustering solutions are interesting for different purposes. Instead of committing to one clustering solution, in this paper we introduce a probabilistic nonparametric Bayesian model that can simultaneously discover several possible clustering solutions and the feature subset views that generated each cluster partitioning. We provide a variational inference approach to learn the features and clustering partitions in each view. Our model allows us not only to learn the multiple clusterings and views but also to automatically learn the number of views and the number of clusters in each view. Keywords: multiple clustering, non-redundant/disparate clustering, feature selection, nonparametric Bayes, variational inference
Subspace Clustering, Ensemble Clustering, Alternative Clustering, Multi-view Clustering: What Can We Learn From Each Other?

Cited by 4 (3 self)
Though subspace clustering, ensemble clustering, alternative clustering, and multi-view clustering are different approaches motivated by different problems and aiming at different goals, these fields face similar problems. Here we briefly survey these areas from the point of view of subspace clustering. Based on this survey, we try to identify problems where the different research areas could learn from each other.
Avoiding Bias in Text Clustering Using Constrained K-means and May-Not-Links

Cited by 4 (4 self)
In this paper we present a new clustering algorithm which extends traditional batch k-means, enabling the introduction of domain knowledge in the form of Must, Cannot, May, and May-Not rules between the data points. In addition, we apply the presented method to the task of avoiding bias in clustering. Evaluation carried out on standard collections showed considerable improvements in effectiveness against previous constrained and non-constrained algorithms for the given task.
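Unlike a hard Cannot rule, a May-Not rule only adds a cost to a candidate cluster rather than forbidding it. One assignment step with such a soft penalty might look like the following sketch (illustrative only; the penalty weight w is an assumed parameter, not taken from the paper):

```python
import numpy as np

def assign_with_may_not(X, centroids, may_not, w=100.0):
    """Sequential assignment: squared distance to the centroid plus a soft
    penalty w for every May-Not-linked point already in the candidate cluster."""
    labels = []
    members = {k: set() for k in range(len(centroids))}
    for i, x in enumerate(X):
        costs = []
        for k, c in enumerate(centroids):
            penalty = w * sum((i, j) in may_not or (j, i) in may_not
                              for j in members[k])
            costs.append(np.sum((x - c) ** 2) + penalty)
        k = int(np.argmin(costs))  # cheapest cluster wins, penalty included
        labels.append(k)
        members[k].add(i)
    return labels

X = np.array([[0.0], [1.0], [10.0]])
centroids = np.array([[0.5], [10.0]])

# Without the May-Not link, points 0 and 1 would share cluster 0; the soft
# penalty makes the distant cluster 1 cheaper for point 1 instead.
print(assign_with_may_not(X, centroids, may_not={(0, 1)}))
```

Because the penalty is finite, a May-Not link can still be overridden when the distance gap is large enough, which is exactly what distinguishes it from a Cannot constraint.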