Results 1 - 7 of 7
A Discriminative Framework for Clustering via Similarity Functions
Cited by 24 (9 self)
Problems of clustering data from pairwise similarity information are ubiquitous in Computer Science. Theoretical treatments typically view the similarity information as ground truth and then design algorithms to (approximately) optimize various graph-based objective functions. However, in most applications, this similarity information is merely based on some heuristic; the ground truth is really the unknown correct clustering of the data points and the real goal is to achieve low error on the data. In this work, we develop a theoretical approach to clustering from this perspective. In particular, motivated by recent work in learning theory that asks “what natural properties of a similarity (or kernel) function are sufficient to be able to learn well?” we ask “what natural properties of a similarity function are sufficient to be able to cluster well?” To study this question we develop a theoretical framework that
Clustering with Interactive Feedback
Cited by 5 (1 self)
In this paper, we initiate a theoretical study of the problem of clustering data under interactive feedback. We introduce a query-based model in which users can provide feedback to a clustering algorithm in a natural way via split and merge requests. We then analyze the “clusterability” of different concept classes in this framework (the ability to cluster correctly with a bounded number of requests under only the assumption that each cluster can be described by a concept in the class) and provide efficient algorithms as well as information-theoretic upper and lower bounds.
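The split/merge interaction described above can be sketched in a few lines. This is only an illustrative simulation, not the paper's algorithm: the function names `user_feedback` and `interactive_cluster` are hypothetical, and the request semantics are an assumption (a split request flags a proposed cluster that mixes points from several target clusters; a merge request flags two proposed clusters that lie inside the same target cluster). The learner here is deliberately naive: it answers a split by breaking the cluster into singletons.

```python
def user_feedback(proposed, target):
    """Simulated user (assumed semantics): return one request, or None if correct."""
    label = {p: i for i, cluster in enumerate(target) for p in cluster}
    # Split request: some proposed cluster mixes several target clusters.
    for c in proposed:
        if len({label[p] for p in c}) > 1:
            return ("split", c)
    # All proposed clusters are pure here, so one representative per cluster suffices.
    for i, a in enumerate(proposed):
        for b in proposed[i + 1:]:
            if label[next(iter(a))] == label[next(iter(b))]:
                return ("merge", a, b)
    return None


def interactive_cluster(points, target):
    """Naive learner: start with one cluster and follow the user's requests."""
    proposed = [frozenset(points)]
    requests = 0
    while True:
        feedback = user_feedback(proposed, target)
        if feedback is None:
            return set(proposed), requests
        requests += 1
        if feedback[0] == "split":
            c = feedback[1]
            # Naive response: break the flagged cluster into singletons.
            proposed = [x for x in proposed if x != c] + [frozenset([p]) for p in c]
        else:
            _, a, b = feedback
            proposed = [x for x in proposed if x not in (a, b)] + [a | b]


target = [frozenset({0, 1, 2}), frozenset({3, 4}), frozenset({5})]
clustering, num_requests = interactive_cluster(range(6), target)
```

On this toy input the learner needs one split and three merges. The naive strategy can use a number of requests linear in the number of points, which is the kind of cost the bounds mentioned in the abstract aim to improve on.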
A theory of similarity functions for clustering
, 2007
Cited by 3 (3 self)
Problems of clustering data from pairwise similarity information are ubiquitous in Computer Science. Theoretical treatments typically view the similarity information as ground truth and then design algorithms to (approximately) optimize various graph-based objective functions. However, in most applications, this similarity information is merely based on some heuristic: the true goal is to cluster the points correctly rather than to optimize any specific graph property. In this work, we initiate a theoretical study of the design of similarity functions for clustering from this perspective. In particular, motivated by recent work in learning theory that asks “what natural properties of a similarity function are sufficient to be able to learn well?” we ask “what natural properties of a similarity function are sufficient to be able to cluster well?” We develop a notion of the clustering complexity of a given property (analogous to notions of capacity in learning theory), that characterizes its information-theoretic usefulness for clustering. We then analyze this complexity for several natural game-theoretic and learning-theoretic properties, as well as design efficient algorithms that are able to take advantage of them. We consider two natural clustering objectives: (a) list clustering: analogous to the notion of list-decoding, the algorithm can produce a small list of clusterings (which a user can select from) and (b) hierarchical clustering: the desired clustering is some
New Theoretical Frameworks for Machine Learning
, 2007
Cited by 3 (0 self)
This thesis develops and analyzes theoretical frameworks for new emerging paradigms of Machine Learning including Semi-supervised, Active, and Similarity-based Learning. These are areas of significant practical importance and significant activity in Machine Learning, and a number of different algorithmic approaches have been developed for each of them. Standard Learning Theory frameworks such as PAC or Statistical Learning Theory models tend to not capture these learning approaches, hence developing sound and rigorous models that provide a thorough understanding of these new paradigms is desirable. The purpose of this thesis is to propose and to study new theoretical frameworks and algorithms for better understanding and extending some of these learning approaches. In addition, this dissertation also presents new applications of techniques from Machine Learning Theory to new emerging areas of Computer Science at large, such as Auction and Mechanism Design. In Machine Learning, there has been growing interest in using unlabeled data together with labeled data due to the availability of large amounts of unlabeled data in many applications. As a result, a number of different algorithmic approaches have been developed for this
Interactive clustering
, 2009
Cited by 1 (0 self)
We consider the problem of clustering with feedback. We study a recently proposed framework for the problem and present new results on clustering geometric concept classes in that model. In this model the clustering algorithm interacts with the user via “split” and “merge” requests to figure out the target clustering. We also give a simple generic algorithm to cluster any concept class in the model. Our algorithm is query-efficient in the sense that it involves only a small amount of interaction with the user. We also present and study two natural generalizations of the original model. The original model assumes that the user response to the algorithm is perfect. We eliminate this limitation by proposing a noisy model for interactive clustering and give an algorithm for learning the class of intervals in that model. We also propose a dynamic model considering the fact that the user might see a random subset of the space of all points at every step. Finally, for datasets satisfying a spectrum of weak to strong properties, we give query bounds, and show that a class of clustering functions containing Single-Linkage will find the target clustering under the strongest property.
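As a rough illustration of the last claim, the following is a minimal sketch of single-linkage agglomerative clustering. The abstract does not define the "strongest property", so the toy data simply assumes that every within-cluster similarity exceeds every cross-cluster similarity; the function name `single_linkage`, the similarity function, and the data are all hypothetical.

```python
from itertools import combinations


def single_linkage(points, similarity, k):
    """Repeatedly merge the two most similar clusters until k clusters remain."""
    clusters = [frozenset([p]) for p in points]
    while len(clusters) > k:
        # Single linkage: the similarity of two clusters is the similarity
        # of their closest pair of points.
        a, b = max(
            combinations(clusters, 2),
            key=lambda pair: max(similarity(x, y) for x in pair[0] for y in pair[1]),
        )
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return set(clusters)


# Toy 1-D data with three well-separated groups (an assumption for illustration).
points = [0.0, 0.1, 0.2, 5.0, 5.1, 9.0, 9.3]


def similarity(x, y):
    # Closer points are more similar.
    return -abs(x - y)


result = single_linkage(points, similarity, k=3)
```

With this separation, stopping at three clusters recovers the three groups. Under weaker conditions the target clustering may only appear as some pruning of the merge tree, so stopping at a fixed `k` is a simplification made here.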
Clustering via Similarity Functions: Theoretical Foundations and Algorithms
Problems of clustering data from pairwise similarity information arise in many different fields. Yet the question of which algorithm is best to use under what conditions, and how good a notion of similarity one needs in order to cluster accurately, remains poorly understood. In this work we propose a new general framework for analyzing clustering from similarity information that directly addresses this question of what properties of a similarity measure are sufficient to cluster accurately and by what kinds of algorithms. We show that in our framework a wide variety of interesting learning-theoretic and game-theoretic properties, including properties motivated by mathematical biology, can be used to cluster well, and we design new efficient algorithms that are able to take advantage of them. We consider two natural clustering objectives: (a) list clustering, where the algorithm’s goal is to produce a small list of clusterings such that at least one of them is approximately correct, and (b) hierarchical clustering, where the algorithm’s goal is to produce a hierarchy such that the desired clustering is some pruning of this tree (which a user could navigate). We develop a notion of the clustering complexity of a given property, analogous to notions of capacity in learning theory, that characterizes information-theoretic usefulness for clustering. We analyze this quantity for a wide range of properties, giving tight upper and lower
Using Spectral Clustering for Finding Students’ Patterns of Behavior in Social Networks
The high dimensionality of the data generated by social networks has been a big challenge for researchers. In order to solve the problems associated with this phenomenon, a number of methods and techniques were developed. Spectral clustering is a data mining method used in many applications; in this paper we used this method to find students’ behavioral patterns performed in an e-learning system. In addition, a software tool was introduced to allow the user (tutor or researcher) to define the data dimensions and input values to obtain appropriate graphs with behavioral patterns that meet his/her needs. Behavioral patterns were compared with students’ study performance and evaluation with relation to their possible usage in collaborative learning.