Results 1 - 10
of
94
Distance metric learning for large margin nearest neighbor classification
- In NIPS
, 2006
"... We show how to learn a Mahanalobis distance metric for k-nearest neighbor (kNN) classification by semidefinite programming. The metric is trained with the goal that the k-nearest neighbors always belong to the same class while examples from different classes are separated by a large margin. On seven ..."
Abstract
-
Cited by 177 (7 self)
- Add to MetaCart
We show how to learn a Mahanalobis distance metric for k-nearest neighbor (kNN) classification by semidefinite programming. The metric is trained with the goal that the k-nearest neighbors always belong to the same class while examples from different classes are separated by a large margin. On seven data sets of varying size and difficulty, we find that metrics trained in this way lead to significant improvements in kNN classification—for example, achieving a test error rate of 1.3 % on the MNIST handwritten digits. As in support vector machines (SVMs), the learning problem reduces to a convex optimization based on the hinge loss. Unlike learning in SVMs, however, our framework requires no modification or extension for problems in multiway (as opposed to binary) classification. 1
Supervised clustering with support vector machines
- in ICML
, 2005
"... Supervised clustering is the problem of training a clustering algorithm to produce desirable clusterings: given sets of items and complete clusterings over these sets, we learn how to cluster future sets of items. Example applications include noun-phrase coreference clustering, and clustering news a ..."
Abstract
-
Cited by 37 (4 self)
- Add to MetaCart
Supervised clustering is the problem of training a clustering algorithm to produce desirable clusterings: given sets of items and complete clusterings over these sets, we learn how to cluster future sets of items. Example applications include noun-phrase coreference clustering, and clustering news articles by whether they refer to the same topic. In this paper we present an SVM algorithm that trains a clustering algorithm by adapting the item-pair similarity measure. The algorithm may optimize a variety of different clustering functions to a variety of clustering performance measures. We empirically evaluate the algorithm for noun-phrase and news article clustering. 1.
Semi-supervised graph clustering: a kernel approach
, 2008
"... Semi-supervised clustering algorithms aim to improve clustering results using limited supervision. The supervision is generally given as pairwise constraints; such constraints are natural for graphs, yet most semi-supervised clustering algorithms are designed for data represented as vectors. In this ..."
Abstract
-
Cited by 24 (1 self)
- Add to MetaCart
Semi-supervised clustering algorithms aim to improve clustering results using limited supervision. The supervision is generally given as pairwise constraints; such constraints are natural for graphs, yet most semi-supervised clustering algorithms are designed for data represented as vectors. In this paper, we unify vector-based and graph-based approaches. We first show that a recently-proposed objective function for semi-supervised clustering based on Hidden Markov Random Fields, with squared Euclidean distance and a certain class of constraint penalty functions, can be expressed as a special case of the weighted kernel k-means objective (Dhillon et al., in Proceedings of the 10th International Conference on Knowledge Discovery and Data Mining, 2004a). A recent theoretical connection between weighted kernel k-means and several graph clustering objectives enables us to perform semi-supervised clustering of data given either as vectors or as a graph. For graph data, this result leads to algorithms for optimizing several new semi-supervised graph clustering objectives. For vector data, the kernel approach also enables us to find clusters with non-linear boundaries in the input data space. Furthermore, we show that recent work on spectral learning (Kamvar et al., in Proceedings of the 17th International Joint Conference on Artificial Intelligence, 2003) may be viewed as a special case of our formulation. We empirically show that our algorithm is able to outperform current state-of-the-art semi-supervised algorithms on both vector-based and graph-based data sets.
Learning with constrained and unlabeled data
- In CVPR
, 2005
"... Classification problems abundantly arise in many computer vision tasks – being of supervised, semi-supervised or unsupervised nature. Even when class labels are not available, a user still might favor certain grouping solutions over others. This bias can be expressed either by providing a clustering ..."
Abstract
-
Cited by 15 (1 self)
- Add to MetaCart
Classification problems abundantly arise in many computer vision tasks – being of supervised, semi-supervised or unsupervised nature. Even when class labels are not available, a user still might favor certain grouping solutions over others. This bias can be expressed either by providing a clustering criterion or cost function and, in addition to that, by specifying pairwise constraints on the assignment of objects to classes. In this work, we discuss a unifying formulation for labelled and unlabelled data that can incorporate constrained data for model fitting. Our approach models the constraint information by the maximum entropy principle. This modeling strategy allows us (i) to handle constraint violations and soft constraints, and, at the same time, (ii) to speed up the optimization process. Experimental results on face classification and image segmentation indicates that the proposed algorithm is computationally efficient and generates superior groupings when compared with alternative techniques. 1.
Simultaneous Unsupervised Learning of Disparate Clusterings
"... Most clustering algorithms produce a single clustering for a given data set even when the data can be clustered naturally in multiple ways. In this paper, we address the difficult problem of uncovering disparate clusterings from the data in a totally unsupervised manner. We propose two new approache ..."
Abstract
-
Cited by 14 (0 self)
- Add to MetaCart
Most clustering algorithms produce a single clustering for a given data set even when the data can be clustered naturally in multiple ways. In this paper, we address the difficult problem of uncovering disparate clusterings from the data in a totally unsupervised manner. We propose two new approaches for this problem. In the first approach we aim to find good clusterings of the data that are also decorrelated with one another. To this end, we give a new and tractable characterization of decorrelation between clusterings, and present an objective function to capture it. We provide an iterative “decorrelated” k-means type algorithm to minimize this objective function. In the second approach, we model the data as a sum of mixtures and associate each mixture with a clustering. This approach leads us to the problem of learning a convolution of mixture distributions. Though the latter problem can be formulated as one of factorial learning [8, 13, 16], the existing formulations and methods do not perform well on many real high-dimensional data sets. We propose a new regularized factorial learning framework that is more suitable for capturing the notion of disparate clusterings in modern, high-dimensional data sets. The resulting algorithm does well in uncovering multiple clusterings, and is much improved over existing methods. We evaluate our methods on two real-world data sets- a music data set from the text mining domain, and a portrait data set from the computer vision domain. Our methods achieve a substantially higher accuracy than existing factorial learning as well as traditional clustering algorithms.
Similarity Scores based on Background Samples
"... Abstract. Evaluating the similarity of images and their descriptors by employing discriminative learners has proven itself to be an effective face recognition paradigm. In this paper we show how “background samples”, that is, examples which do not belong to any of the classes being learned, may prov ..."
Abstract
-
Cited by 12 (3 self)
- Add to MetaCart
Abstract. Evaluating the similarity of images and their descriptors by employing discriminative learners has proven itself to be an effective face recognition paradigm. In this paper we show how “background samples”, that is, examples which do not belong to any of the classes being learned, may provide a significant performance boost to such face recognition systems. In particular, we make the following contributions. First, we define and evaluate the “Two-Shot Similarity ” (TSS) score as an extension to the recently proposed “One-Shot Similarity ” (OSS) measure. Both these measures utilize background samples to facilitate better recognition rates. Second, we examine the ranking of images most similar to a query image and employ these as a descriptor for that image. Finally, we provide results underscoring the importance of proper face alignment in automatic face recognition systems. These contributions in concert allow us to obtain a success rate of 86.83 % on the Labeled Faces in the Wild (LFW) benchmark, outperforming current state-of-the-art results. 1
The one-shot similarity kernel
- In International Conference on Computer Vision (ICCV
, 2009
"... face.com The One-Shot similarity measure has recently been introduced in the context of face recognition where it was used to produce state-of-the-art results. Given two vectors, their One-Shot similarity score reflects the likelihood of each vector belonging in the same class as the other vector an ..."
Abstract
-
Cited by 11 (5 self)
- Add to MetaCart
face.com The One-Shot similarity measure has recently been introduced in the context of face recognition where it was used to produce state-of-the-art results. Given two vectors, their One-Shot similarity score reflects the likelihood of each vector belonging in the same class as the other vector and not in a class defined by a fixed set of “negative ” examples. The potential of this approach has thus far been largely unexplored. In this paper we analyze the One-Shot score and show that: (1) when using a version of LDA as the underlying classifier, this score is a Conditionally Positive Definite kernel and may be used within kernel-methods (e.g., SVM), (2) it can be efficiently computed, and (3) that it is effective as an underlying mechanism for image representation. We further demonstrate the effectiveness of the One-Shot similarity score in a number of applications including multiclass identification and descriptor generation. 1.
Semi-Supervised Clustering via Matrix Factorization
, 2008
"... The recent years have witnessed a surge of interests of semi-supervised clustering methods, which aim to cluster the data set under the guidance of some supervisory information. Usually those supervisory information takes the form of pairwise constraints that indicate the similarity/dissimilarity be ..."
Abstract
-
Cited by 7 (2 self)
- Add to MetaCart
The recent years have witnessed a surge of interests of semi-supervised clustering methods, which aim to cluster the data set under the guidance of some supervisory information. Usually those supervisory information takes the form of pairwise constraints that indicate the similarity/dissimilarity between the two points. In this paper, we propose a novel matrix factorization based approach for semi-supervised clustering. In addition, we extend our algorithm to co-cluster the data sets of different types with constraints. Finally the experiments on UCI data sets and real world Bulletin Board Systems (BBS) data sets show the superiority of our proposed method.
Learnable Similarity Functions and Their Applications to Clustering and Record Linkage
, 2004
"... rship (Xing et al. 2003), and relative comparisons (Schultz & Joachims 2004). These approaches have shown improvements over traditional similarity functions for different data types such as vectors in Euclidean space, strings, and database records composed of multiple text fields. While these initia ..."
Abstract
-
Cited by 6 (0 self)
- Add to MetaCart
rship (Xing et al. 2003), and relative comparisons (Schultz & Joachims 2004). These approaches have shown improvements over traditional similarity functions for different data types such as vectors in Euclidean space, strings, and database records composed of multiple text fields. While these initial results are encouraging, there still remains a large number of similarity functions that are currently unable to adapt to a particular domain. In our research, we attempt to bridge this gap by developing both new learnable similarity functions and methods for their application to particular problems in machine learning and data mining. In preliminary work, we proposed two learnable similarity functions for strings that adapt distance computations given training pairs of equivalent and non-equivalent strings (Bilenko & Mooney 2003a). The first function is based on a probabilistic model of edit distance with affine gaps (Gus- Copyright c # 2004, American Association for Artificial Intelli
PepDist: A New Framework for Protein-Peptide Binding Prediction based on Learning Peptide Distance Functions
- BMC BIOINFORMATICS
, 2006
"... Background: Many different aspects of cellular signalling, trafficking and targeting mechanisms are mediated by interactions between proteins and peptides. Representative examples are MHCpeptide complexes in the immune system. Developing computational methods for protein-peptide binding prediction i ..."
Abstract
-
Cited by 6 (1 self)
- Add to MetaCart
Background: Many different aspects of cellular signalling, trafficking and targeting mechanisms are mediated by interactions between proteins and peptides. Representative examples are MHCpeptide complexes in the immune system. Developing computational methods for protein-peptide binding prediction is therefore an important task with applications to vaccine and drug design. Methods: Previous learning approaches address the binding prediction problem using traditional margin based binary classifiers. In this paper we propose PepDist: a novel approach for predicting binding affinity. Our approach is based on learning peptide-peptide distance functions. Moreover, we suggest to learn a single peptide-peptide distance function over an entire family of proteins (e.g. MHC class I). This distance function can be used to compute the affinity of a novel peptide to any of the proteins in the given family. In order to learn these peptide-peptide distance functions, we formalize the problem as a semi-supervised learning problem with partial information in the form of equivalence constraints. Specifically, we propose to use DistBoost [1,2], which is a semisupervised distance learning algorithm. Results: We compare our method to various state-of-the-art binding prediction algorithms on

