## Active Semi-Supervision for Pairwise Constrained Clustering

Venue: Proc. 4th SIAM Intl. Conf. on Data Mining (SDM-2004)

Citations: 99 (9 self)

### BibTeX

@INPROCEEDINGS{Basu_activesemi-supervision,
  author    = {Sugato Basu and Arindam Banerjee and Raymond J. Mooney},
  title     = {Active Semi-Supervision for Pairwise Constrained Clustering},
  booktitle = {Proc. 4th SIAM Intl. Conf. on Data Mining (SDM-2004)},
  year      = {2004}
}

### Abstract

Semi-supervised clustering uses a small amount of supervised data to aid unsupervised learning. One typical approach specifies a limited number of must-link and cannot-link constraints between pairs of examples. This paper presents a pairwise constrained clustering framework and a new method for actively selecting informative pairwise constraints to improve clustering performance. Both the clustering and the active learning methods scale easily to large datasets and can handle very high-dimensional data. Experimental and theoretical results confirm that actively querying pairwise constraints significantly improves clustering accuracy given a relatively small amount of supervision.

### Citations

3081 | UCI Repository of Machine Learning Databases. http://www.ics.uci.edu/~mlearn/ mlrepository.html - Blake, Merz - 1998

2067 | Some methods for classification and analysis of multivariate observations
- MacQueen
- 1967
Citation Context: ...very high dimensional data, as our experiments on text datasets demonstrate. Section 2 outlines the pairwise constrained clustering framework, and Section 3 presents a refinement of KMeans clustering [13, 25], called PCKMeans, that utilizes pairwise constraints. In Section 4, we present a method for actively picking good constraints by asking queries of the form "Are these two examples in same or differen...

1929 | Randomized Algorithms
- Motwani, Raghavan
- 1995
Citation Context: ...ll refer to points from the same cluster as having the same color. If the probability of drawing points of different colors is given by ..., then, by an extension of the coupon collector's problem [29], one can show that points of all colors will be drawn with high probability within ... draws. We claim that the farthest-first scheme gets points of all colors within ... attempts with probability ...
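The argument quoted above invokes the coupon collector's problem: if each draw yields one of k cluster "colors" uniformly at random, all colors appear after roughly k·H_k draws in expectation (H_k the k-th harmonic number). A minimal simulation under that uniform-color assumption — helper names are illustrative, not code from the paper:

```python
import random

def draws_to_collect(k, rng):
    """Count random draws until all k colors have been seen at least once."""
    seen = set()
    draws = 0
    while len(seen) < k:
        seen.add(rng.randrange(k))  # draw one color uniformly at random
        draws += 1
    return draws

def expected_draws(k):
    """Closed-form expectation for the uniform case: k * H_k."""
    return k * sum(1.0 / i for i in range(1, k + 1))

rng = random.Random(0)
k = 10
avg = sum(draws_to_collect(k, rng) for _ in range(5000)) / 5000
# For k = 10, k * H_k is about 29.29; the simulated mean should be close.
```

The paper's setting is harder (colors are not equally likely), which is why the quoted passage appeals to an *extension* of the basic result.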

1318 | Combining Labeled and Unlabeled Data with Co-Training
- Blum, Mitchell
- 1998
Citation Context: ...ce labeled data can be expensive to generate. Consequently, semi-supervised learning, learning from a combination of both labeled and unlabeled data, has become a topic of significant recent interest [6, 20, 30]. More specifically, semi-supervised clustering, the use of class labels or pairwise constraints on some examples to aid unsupervised clustering, has been the focus of several recent projects [4, 22, 3...

859 | Text classification from labeled and unlabeled documents using EM
- Nigam, McCallum, et al.
- 2000

728 | Transductive Inference for Text Classification using Support Vector Machines
- Joachims
- 1999

561 | Active learning with statistical models
- Cohn, Ghahramani, et al.
- 1996
Citation Context: ...ing techniques in classification are not applicable in the clustering framework, since the basic underlying concept of reduction of classification error and variance over the distribution of examples [9] is not well-defined for clustering. In the unsupervised setting, Hofmann et al. [19] consider a model of active learning which is different from ours – they have incomplete pairwise similarities betw...

548 | Distance metric learning, with application to clustering with side-information
- Xing, Ng, et al.
- 2004
Citation Context: ...nts and an underlying metric between the points while clustering. Other work with the pairwise constrained clustering model includes learning distance metrics for clustering from pairwise constraints [17, 22, 34]. In this domain, Cohn et al. [8] have proposed iterative user feedback to acquire constraints, but it was not an active learning algorithm. Active learning in the classification framework is a longst...

506 | A sequential algorithm for training text classifiers
- Lewis, Gale
- 1994
Citation Context: ...cation framework is a long-studied problem, where different principles of query selection have been studied, e.g., reduction of the version space size [16], reduction of uncertainty in predicted label [24], maximizing the margin on training data [1], finding high variance data points by density-weighted pool-based sampling [27], etc. However, active learning techniques in classification are not applica...

358 | Selective sampling using the query by committee algorithm
- Freund, Shamir, et al.
- 1997
Citation Context: ...earning algorithm. Active learning in the classification framework is a long-studied problem, where different principles of query selection have been studied, e.g., reduction of the version space size [16], reduction of uncertainty in predicted label [24], maximizing the margin on training data [1], finding high variance data points by density-weighted pool-based sampling [27], etc. However, active lea...

352 | Constrained k-means clustering with background knowledge
- Wagstaff, Cardie, et al.
- 2001
Citation Context: ..., 20, 30]. More specifically, semi-supervised clustering, the use of class labels or pairwise constraints on some examples to aid unsupervised clustering, has been the focus of several recent projects [4, 22, 33, 34]. In a semi-supervised clustering setting, the focus is on clustering large amounts of unlabeled data in the presence of a small amount of supervised data. In this setting, we consider a framework tha...

327 | Concept decompositions for large sparse text data using clustering
- Dhillon, Modha
- 2001
Citation Context: ...hm greedily optimizes the objective ... using a KMeans-type iteration with a modified cluster-assignment step. For experiments with text documents, we used a variant of KMeans called spherical KMeans (SPKMeans) [11] that has computational advantages for sparse high dimensional text data vectors. We will present our algorithm and its motivation based on KMeans in Section 3, but all of it can be easily extended fo...
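The context above mentions spherical KMeans (SPKMeans) [11], which clusters unit-normalized vectors under cosine similarity and recomputes each centroid as the normalized mean of its members. A minimal sketch of one assignment-and-update round, assuming the rows of `X_unit` are already L2-normalized (the function name is illustrative, not from [11]):

```python
import numpy as np

def spkmeans_step(X_unit, centroids):
    """One round of spherical KMeans: assign each unit vector to the
    centroid of maximal cosine similarity (a plain dot product, since
    rows are unit-normalized), then recompute each centroid as the
    L2-normalized mean of its assigned points."""
    sims = X_unit @ centroids.T              # cosine similarity matrix
    labels = np.argmax(sims, axis=1)
    new_centroids = centroids.copy()
    for h in range(len(centroids)):
        members = X_unit[labels == h]
        if len(members):
            m = members.mean(axis=0)
            new_centroids[h] = m / np.linalg.norm(m)  # project back to sphere
    return labels, new_centroids
```

For sparse text vectors the dot products touch only nonzero entries, which is the computational advantage the quoted passage alludes to.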

270 | Employing EM in pool-based active learning for text classification
- McCallum, Nigam
- 1998
Citation Context: ...of the limited supervised data available in a semi-supervised setting, supervised training examples should be actively selected as maximally informative ones rather than chosen at random, if possible [27]. In that case, fewer constraints will be required to significantly improve the clustering accuracy. To this end, we present a new method for actively selecting good pairwise constraints for semi-supe...

239 | Directional Statistics
- Mardia, Jupp
- 2000
Citation Context: ... We assume an identity covariance Gaussian noise model for the observed data (the von Mises-Fisher distribution [26] was considered as the noise model for high-dimensional text data), and also assume that the observed (noisy) data points have been drawn independently of each other following this model. ...

232 | Correlation clustering
- Bansal, Blum, et al.
Citation Context: ...associated violation cost, which PCKMeans does. A soft-constrained algorithm SCOP-KMeans has been recently proposed [32], whose performance would be interesting to compare with PCKMeans. Bansal et al. [3] proposed a theoretical model where they performed clustering using only pairwise constraints, which is different from our model since we consider both constraints and an underlying metric between the...

197 | A best possible heuristic for the k-center problem - Hochbaum, Shmoys - 1985

178 | Markov random fields with efficient approximations
- Boykov, Veksler, et al.
Citation Context: ...fferent clusters (cannot-link) [33]. In real-world unsupervised learning tasks, e.g., clustering for speaker identification in a conversation [17], visual correspondence in multiview image processing [7], clustering multi-spectral information from Mars images [32], etc., considering supervision in the form of constraints is generally more practical than providing class labels, since true labels may b...

166 | From instance-level constraints to space-level constraints: Making the most of prior knowledge in data clustering
- Klein, Kamvar, et al.
- 2002

166 | Approximation algorithms for classification problems with pairwise relationships: Metric labeling and Markov random fields
- Kleinberg, Tardos
- 1999
Citation Context: ...and the cannot-link constraints in ... Let ... be the cluster assignment of a point ... The cost of violating must-link and cannot-link constraints is typically quantified by metrics [23]. We restrict our attention to the uniform metric (also known as the generalized Potts metric), for which the cost of violating a must-link is given by ..., i.e., if the must-linked points...
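Under the uniform (generalized Potts) metric described in the context above, a violated must-link contributes its weight when its endpoints land in different clusters, and a violated cannot-link contributes its weight when its endpoints land in the same cluster. A minimal sketch of that penalty (the helper name and per-pair weights are illustrative assumptions, not the paper's notation):

```python
def constraint_violation_cost(labels, must_link, cannot_link):
    """Uniform (generalized Potts) penalty: a must-link (i, j, w) is
    violated when labels[i] != labels[j]; a cannot-link (i, j, w) is
    violated when labels[i] == labels[j]. Each violation costs w."""
    cost = 0.0
    for i, j, w in must_link:
        if labels[i] != labels[j]:
            cost += w
    for i, j, w in cannot_link:
        if labels[i] == labels[j]:
            cost += w
    return cost

labels = {0: "a", 1: "b", 2: "a"}
# must-link (0,1) is violated (different clusters) and cannot-link (0,2)
# is violated (same cluster), so the total penalty is 2.0 + 1.5 = 3.5.
cost = constraint_violation_cost(labels, [(0, 1, 2.0)], [(0, 2, 1.5)])
```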

164 | Clustering with instance-level constraints
- Wagstaff, Cardie
- 2000
Citation Context: ...sed learning tasks, e.g., clustering for speaker identification in a conversation [17], visual correspondence in multiview image processing [7], clustering multi-spectral information from Mars images [32], etc., considering supervision in the form of constraints is generally more practical than providing class labels, since true labels may be unknown a priori, while it can be easier to specify whether...

163 | Impact of Similarity Measures on Web-page Clustering
- Strehl, Ghosh, et al.
- 2002
Citation Context: ...al information shared by the random variables representing the cluster assignments and the user-labeled class assignments of the data points. We computed NMI following the methodology of Strehl et al. [31]. NMI measures how closely the clustering algorithm could reconstruct the underlying label distribution in the data. If ... is the random variable denoting the cluster assignments of the points, and ... is th...
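Following the Strehl et al. methodology cited above, NMI can be computed directly from two label vectors. The sketch below uses the sqrt(H(X)·H(Y)) normalization associated with [31]; the paper's exact normalization may differ, so treat this as one common variant:

```python
from collections import Counter
from math import log, sqrt

def nmi(cluster_labels, class_labels):
    """Normalized mutual information I(X;Y) / sqrt(H(X) * H(Y)), estimated
    from empirical label counts. Other normalizations (e.g. the average of
    the two entropies) appear in the literature."""
    n = len(cluster_labels)
    px = Counter(cluster_labels)                 # cluster marginal counts
    py = Counter(class_labels)                   # class marginal counts
    pxy = Counter(zip(cluster_labels, class_labels))  # joint counts
    mi = sum((c / n) * log(c * n / (px[x] * py[y]))
             for (x, y), c in pxy.items())
    hx = -sum((c / n) * log(c / n) for c in px.values())
    hy = -sum((c / n) * log(c / n) for c in py.values())
    if hx == 0.0 or hy == 0.0:
        return 0.0  # a single cluster or single class carries no information
    return mi / sqrt(hx * hy)
```

Perfect agreement up to cluster renaming gives NMI close to 1, and independent labelings give NMI close to 0, matching the "how closely the clustering reconstructs the label distribution" reading in the quoted context.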

160 | Semi-supervised Clustering by Seeding
- Basu, Banerjee, et al.
- 2002

147 | Discrete Location Theory
- Mirchandani, Francis, et al.
- 1990
Citation Context: ...points from dense regions of the data space [27]. Such a formulation of active learning would be more robust to outliers, and can be used with more outlier-robust clustering algorithms, e.g., KMedian [28]. Our current clustering model assumes that the constraints are consistent, i.e., there is no noise in the constraints. We are working on incorporating a noise model into our PCC framework, so that we...

104 | Random projection for high dimensional data clustering: A cluster ensemble approach
- Fern, Brodley
Citation Context: ...d Rand Index [22, 33, 34] that are frequently used for evaluation of clustering in the PCC framework are very similar to pairwise F-measure. NMI has also become a popular clustering evaluation metric [2, 12, 15]. We present results using both these evaluation measures and observe from the results that they are strongly correlated. We also show results for the objective function ... Normalized Mutual Info...

103 | Semi-supervised clustering with user feedback
- Cohn, Caruana, et al.
- 2003
Citation Context: ...ints while clustering. Other work with the pairwise constrained clustering model includes learning distance metrics for clustering from pairwise constraints [17, 22, 34]. In this domain, Cohn et al. [8] have proposed iterative user feedback to acquire constraints, but it was not an active learning algorithm. Active learning in the classification framework is a long-studied problem, where different pri...

97 | Query learning strategies using boosting and bagging
- Abe, Mamitsuka
- 1998
Citation Context: ...where different principles of query selection have been studied, e.g., reduction of the version space size [16], reduction of uncertainty in predicted label [24], maximizing the margin on training data [1], finding high variance data points by density-weighted pool-based sampling [27], etc. However, active learning techniques in classification are not applica...

91 | An information-theoretic analysis of hard and soft assignment methods for clustering
- Kearns, Mansour, et al.
- 1997
Citation Context: ...t of labeled examples for each cluster gives significant performance improvements. Under certain generative model-based assumptions, one can connect the mixture of Gaussians model to the KMeans model [21]. A direct calculation using Chernoff bounds shows that if a particular cluster (with an underlying Gaussian model) with true centroid ... is seeded with ... points (drawn independently at random from the cor...

72 | Active data clustering
- Hofmann, Buhmann
- 1997
(Show Context)
Citation Context ...ce the basic underlying concept of reduction of classification error and variance over the distribution of examples [9] is not well-defined for clustering. In the unsupervised setting, Hofmann et al. =-=[19]-=- consider a model of active learning which is different from ours – they have incomplete pairwise similarities between points, and their active learning goal is to select new data, using expected valu... |

66 | Initialization of iterative refinement clustering algorithms
- Fayyad, Reina, et al.
- 1998
Citation Context: ...d set. If such a point exists, it is used to initialize the ... cluster. If there are any more cluster centroids left uninitialized, we initialize them by random perturbations of the global centroid of ... [14]. ... is assigned to a cluster such that it minimizes the sum of the distance of ... to the cluster centroid and the cost of constraint violations incurred by that cluster assignment (by equivalently satisfyi...
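The assignment rule described in the context above — each point takes the cluster that minimizes its distance to the centroid plus the cost of constraint violations against points already assigned — can be sketched as a greedy single sweep. This is a simplified illustration with global violation weights `w` and `w_bar`, not the exact bookkeeping of the published PCKMeans algorithm:

```python
import numpy as np

def pckmeans_assign(X, centroids, must_link, cannot_link, w=1.0, w_bar=1.0):
    """Greedy PCKMeans-style assignment step: sweep the points in order;
    each point takes the cluster minimizing squared distance to the
    centroid plus penalties for violating pairwise constraints against
    points assigned earlier in the sweep. A sketch, not the paper's code."""
    n, k = len(X), len(centroids)
    labels = [-1] * n                       # -1 marks "not yet assigned"
    for i in range(n):
        best, best_cost = 0, float("inf")
        for h in range(k):
            cost = float(np.sum((X[i] - centroids[h]) ** 2))
            for a, b in must_link:          # penalty if partner is elsewhere
                j = b if a == i else a if b == i else None
                if j is not None and labels[j] != -1 and labels[j] != h:
                    cost += w
            for a, b in cannot_link:        # penalty if partner is here
                j = b if a == i else a if b == i else None
                if j is not None and labels[j] == h:
                    cost += w_bar
            if cost < best_cost:
                best, best_cost = h, cost
        labels[i] = best
    return labels
```

With the weights set high enough, a must-link pulls its second endpoint into its partner's cluster even when a nearer centroid exists, which is exactly the trade-off the quoted assignment rule encodes.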

65 | An information-theoretic external cluster-validity measure
- Dom
- 2001

49 | Performance guarantees for hierarchical clustering
- Dasgupta
- 2002
Citation Context: ...farthest-first traversal gives an efficient approximation of the k-center problem [18], and has also been used to construct hierarchical clusterings with performance guarantees at each level of the hierarchy [10]. For our data model (see Appendix A.2), we prove another interesting property of farthest-first traversal (see Appendix A.4) that justifies its use for active learning. In [4], it was observed that i...
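Farthest-first traversal, as described in the context above, repeatedly picks the point farthest from the set chosen so far, where distance to a set is the minimum distance to any of its members; it is the classic Gonzalez/Hochbaum-Shmoys 2-approximation for k-center [18]. A minimal sketch (function name is illustrative):

```python
import numpy as np

def farthest_first(X, k, start=0):
    """Farthest-first traversal: start from one point, then repeatedly add
    the point maximizing its distance to the chosen set (distance to a set
    = min distance to any member). Returns the indices of the k picks."""
    chosen = [start]
    # dist[i] = current distance from point i to the chosen set
    dist = np.linalg.norm(X - X[start], axis=1)
    for _ in range(k - 1):
        nxt = int(np.argmax(dist))          # farthest remaining point
        chosen.append(nxt)
        # adding nxt can only shrink each point's distance to the set
        dist = np.minimum(dist, np.linalg.norm(X - X[nxt], axis=1))
    return chosen
```

On well-separated clusters the first k picks land in k distinct clusters with high probability, which is the property the active-learning phase of the paper exploits.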

36 | Generative model-based clustering of directional data
- Banerjee, Dhillon, et al.
- 2003
Citation Context: ...n the data. If ... is the random variable denoting the cluster assignments of the points, and ... is the random variable denoting the underlying class labels on the points, then the NMI measure is defined as [2]: ... where ... is the mutual information between the two random variables, ... is the Shannon entropy, and ... is the conditional entropy. Pairwise F-measure is defined as ...

20 | Learning distance functions using equivalence relations (ICML)
- Hillel, Hertz, et al.
- 2003
Citation Context: ...two examples must be in the same cluster (must-link) or different clusters (cannot-link) [33]. In real-world unsupervised learning tasks, e.g., clustering for speaker identification in a conversation [17], visual correspondence in multiview image processing [7], clustering multi-spectral information from Mars images [32], etc., considering supervision in the form of constraints is generally more pract...

9 | Hidden Markov random field model for segmentation of brain MR images
- Zhang, Brady, et al.
- 2000
Citation Context: ...the probability of an observation for a given configuration is ... Since the MRF is defined over the hidden true labels of the observed points, this model is called a Hidden Markov Random Field (HMRF) [35], which is a direct generalization of a Hidden Markov Model. Since the posterior probability of a configuration is ..., the PCC objective function is proportional to the negative logarithm ...

4 | A best possible heuristic for the k-center problem
- Hochbaum, Shmoys
- 1985
Citation Context: ...ing point farthest from the traversed set (using the standard notion of distance from a set: ...), and so on. Farthest-first traversal gives an efficient approximation of the k-center problem [18], and has also been used to construct hierarchical clusterings with performance guarantees at each level of the hierarchy [10]. For our data model (see Appendix A.2), we prove another interesting prop...