Results 1–10 of 19
Large Scale Online Learning of Image Similarity through Ranking
Abstract

Cited by 29 (2 self)
Learning a measure of similarity between pairs of objects is an important generic problem in machine learning. It is particularly useful in large scale applications like searching for an image that is similar to a given image or finding videos that are relevant to a given video. In these tasks, users look for objects that are not only visually similar but also semantically related to a given object. Unfortunately, the approaches that exist today for learning such semantic similarity do not scale to large datasets. This is both because typically their CPU and storage requirements grow quadratically with the sample size, and because many methods impose complex positivity constraints on the space of learned similarity functions. The current paper presents OASIS, an Online Algorithm for Scalable Image Similarity learning that learns a bilinear similarity measure over sparse representations. OASIS is an online dual approach using the passive-aggressive family of learning algorithms with a large margin criterion and an efficient hinge loss cost. Our experiments show that OASIS is both fast and accurate at a wide range of scales: for a dataset with thousands of images, it achieves better results than existing state-of-the-art methods, while being an order of ...
An Online Algorithm for Large Scale Image Similarity Learning
Abstract

Cited by 11 (0 self)
Learning a measure of similarity between pairs of objects is a fundamental problem in machine learning. It stands in the core of classification methods like kernel machines, and is particularly useful for applications like searching for images that are similar to a given image or finding videos that are relevant to a given video. In these tasks, users look for objects that are not only visually similar but also semantically related to a given object. Unfortunately, current approaches for learning similarity do not scale to large datasets, especially when imposing metric constraints on the learned similarity. We describe OASIS, a method for learning pairwise similarity that is fast and scales linearly with the number of objects and the number of non-zero features. Scalability is achieved through online learning of a bilinear model over sparse representations using a large margin criterion and an efficient hinge loss cost. OASIS is accurate at a wide range of scales: on a standard benchmark with thousands of images, it is more precise than state-of-the-art methods, and faster by orders of magnitude. On 2.7 million images collected from the web, OASIS can be trained within 3 days on a single CPU. The non-metric similarities learned by OASIS can be transformed into metric similarities, achieving higher precisions than similarities that are learned as metrics in the first place. This suggests an approach for learning a metric from data that is larger by orders of magnitude than was handled before.
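The bilinear model and its passive-aggressive triplet update described in this abstract are simple enough to sketch. Below is a minimal, illustrative version of a single OASIS-style update; the margin of 1 and the aggressiveness cap C follow the standard passive-aggressive recipe, while variable names and the toy setup are my own:

```python
import numpy as np

def oasis_update(W, p, p_plus, p_minus, C=0.1):
    """One passive-aggressive step for the bilinear score S_W(a, b) = a^T W b:
    push the relevant image p_plus above the irrelevant p_minus by a margin of 1."""
    loss = max(0.0, 1.0 - p @ W @ p_plus + p @ W @ p_minus)
    if loss == 0.0:
        return W                        # margin satisfied: stay passive
    V = np.outer(p, p_plus - p_minus)   # gradient of the hinge loss w.r.t. W
    vnorm = (V * V).sum()
    if vnorm == 0.0:
        return W
    tau = min(C, loss / vnorm)          # aggressive step, capped by C
    return W + tau * V
```

Because the update touches only the outer product of two sparse vectors, its cost scales with the number of non-zero features, which is the source of the scalability claimed above.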
Efficient Similarity Search for Covariance Matrices via the Jensen-Bregman LogDet Divergence
Abstract

Cited by 5 (2 self)
Covariance matrices provide compact, informative feature descriptors for use in several computer vision applications, such as people-appearance tracking, diffusion-tensor imaging, and activity recognition, among others. A key task in many of these applications is to compare different covariance matrices using a (dis)similarity function. A natural choice here is the Riemannian metric corresponding to the manifold inhabited by covariance matrices. But computations involving this metric are expensive, especially for large matrices and even more so in gradient-based algorithms. To alleviate these difficulties, we advocate a novel dissimilarity measure for covariance matrices: the Jensen-Bregman LogDet Divergence. This divergence enjoys several useful theoretical properties, but its greatest benefits are: (i) lower computational costs (compared to standard approaches); and (ii) amenability for use in nearest-neighbor retrieval. We show numerous experiments to substantiate these claims.
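The divergence itself is one line of linear algebra: JBLD(X, Y) = log det((X + Y)/2) − ½ log det(XY). A minimal sketch, using slogdet for numerical stability:

```python
import numpy as np

def jbld(X, Y):
    """Jensen-Bregman LogDet divergence between SPD matrices X and Y:
    JBLD(X, Y) = log det((X + Y) / 2) - 0.5 * log det(X Y).
    slogdet avoids overflow in the determinants of large matrices."""
    mid = np.linalg.slogdet((X + Y) / 2.0)[1]
    return mid - 0.5 * (np.linalg.slogdet(X)[1] + np.linalg.slogdet(Y)[1])
```

Compared with the affine-invariant Riemannian metric, which needs matrix square roots and logarithms (an eigendecomposition), this costs only three determinant evaluations; that gap is the computational advantage the abstract points to.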
Fast Graph Laplacian Regularized Kernel Learning via Semidefinite–Quadratic–Linear Programming
Abstract

Cited by 4 (0 self)
Kernel learning is a powerful framework for nonlinear data modeling. Using the kernel trick, a number of problems have been formulated as semidefinite programs (SDPs). These include Maximum Variance Unfolding (MVU) (Weinberger et al., 2004) in nonlinear dimensionality reduction, and Pairwise Constraint Propagation (PCP) (Li et al., 2008) in constrained clustering. Although in theory SDPs can be efficiently solved, the high computational complexity incurred in numerically processing the huge linear matrix inequality constraints has rendered the SDP approach unscalable. In this paper, we show that a large class of kernel learning problems can be reformulated as semidefinite-quadratic-linear programs (SQLPs), which only contain a simple positive semidefinite constraint, a second-order cone constraint and a number of linear constraints. These constraints are much easier to process numerically, and the gain in speedup over previous approaches is at least of the order m^2.5, where m is the matrix dimension. Experimental results are also presented to show the superb computational efficiency of our approach.
Online Multiple Kernel Classification
, 2012
Abstract

Cited by 4 (4 self)
Although both online learning and kernel learning have been studied extensively in machine learning, there is limited effort in addressing the intersecting research problems of these two important topics. As an attempt to fill the gap, we address a new research problem, termed Online Multiple Kernel Classification (OMKC), which learns a kernel-based prediction function by selecting a subset of predefined kernel functions in an online learning fashion. OMKC is in general more challenging than typical online learning because both the kernel classifiers and the subset of selected kernels are unknown, and more importantly the solutions to the kernel classifiers and their combination weights are correlated. The proposed algorithms are based on the fusion of two online learning algorithms, i.e., the Perceptron algorithm that learns a classifier for a given kernel, and the Hedge algorithm that combines classifiers by linear weights. We develop stochastic selection strategies that randomly select a subset of kernels for combination and model updating, thus improving the learning efficiency. Our empirical study with 15 data sets shows promising performance of the proposed algorithms for OMKC in both learning efficiency and prediction accuracy.
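The Perceptron-plus-Hedge fusion described above can be sketched in a few lines. This is a toy deterministic variant over precomputed kernel matrices; the stochastic kernel-selection strategies that give the paper its efficiency gains are omitted, and the names are illustrative:

```python
import numpy as np

def omkc_train(Ks, y, beta=0.8):
    """Run one kernel Perceptron per kernel matrix Ks[i] over labels
    y in {-1, +1}, combining the per-kernel votes with Hedge weights."""
    m, T = len(Ks), len(y)
    alphas = np.zeros((m, T))                # Perceptron dual coefficients
    w = np.ones(m) / m                       # Hedge combination weights
    for t in range(T):
        f = np.array([alphas[i] @ Ks[i][:, t] for i in range(m)])
        pred = np.where(f >= 0, 1.0, -1.0)   # per-kernel Perceptron votes
        for i in range(m):
            if pred[i] != y[t]:
                w[i] *= beta                 # Hedge: discount a wrong kernel
            if y[t] * f[i] <= 0:
                alphas[i, t] += y[t]         # Perceptron: correct the margin
        w /= w.sum()
    return alphas, w
```

On a toy problem with one informative kernel and one uninformative kernel, Hedge shifts the combination weight toward the informative one, which is the behavior the abstract relies on.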
Online Learning in the Manifold of Low-Rank Matrices
Abstract

Cited by 3 (0 self)
When learning models that are represented in matrix forms, enforcing a low-rank constraint can dramatically improve the memory and run time complexity, while providing a natural regularization of the model. However, naive approaches for minimizing functions over the set of low-rank matrices are either prohibitively time consuming (repeated singular value decomposition of the matrix) or numerically unstable (optimizing a factored representation of the low-rank matrix). We build on recent advances in optimization over manifolds, and describe an iterative online learning procedure, consisting of a gradient step, followed by a second-order retraction back to the manifold. While the ideal retraction is hard to compute, and so is the projection operator that approximates it, we describe another second-order retraction that can be computed efficiently, with run time and memory complexity of O((n + m)k) for a rank-k matrix of dimension m × n, given rank-one gradients. We use this algorithm, LORETA, to learn a matrix-form similarity measure over pairs of documents represented as high dimensional vectors. LORETA improves the mean average precision over a passive-aggressive approach in a factorized model, and also improves over a full model trained over preselected features using the same memory requirements. LORETA also showed consistent improvement over standard methods in a large (1600 classes) multi-label image classification task.
Relational Divergence Based Classification on Riemannian Manifolds
Abstract

Cited by 1 (1 self)
A recent trend in computer vision is to represent images through covariance matrices, which can be treated as points on a special class of Riemannian manifolds. A popular way of analysing such manifolds is to embed them in Euclidean spaces, a process which can be interpreted as warping the feature space. Embedding manifolds is not without problems, as the manifold structure may not be accurately preserved. In this paper, we propose a new method for analysing Riemannian manifolds, where embedding into Euclidean spaces is not explicitly required. To this end, we propose to represent Riemannian points through their similarities to a set of reference points on the manifold, with the aid of the recently proposed Stein divergence, which is a symmetrised version of Bregman matrix divergence. Classification problems on manifolds are then effectively converted into the problem of finding appropriate machinery over the space of similarities, which can be tackled by conventional Euclidean learning methods such as linear discriminant analysis. Experiments on face recognition, person re-identification and texture classification show that the proposed method outperforms state-of-the-art approaches ...
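The representation itself is straightforward to sketch: each SPD matrix is mapped to the vector of its Stein divergences to a few reference points, and that ordinary Euclidean vector feeds a standard classifier. A minimal illustration, where the Stein divergence is the symmetrised LogDet form S(X, Y) = log det((X + Y)/2) − ½ log det(XY), and where in practice the reference points would be chosen on the manifold (e.g. by clustering) rather than fixed by hand:

```python
import numpy as np

def stein(X, Y):
    """Stein divergence between SPD matrices (symmetrised LogDet form)."""
    return (np.linalg.slogdet((X + Y) / 2.0)[1]
            - 0.5 * (np.linalg.slogdet(X)[1] + np.linalg.slogdet(Y)[1]))

def relational_features(X, refs):
    """Represent SPD matrix X by its divergences to reference points.
    The result lives in Euclidean space, so LDA, SVMs, etc. apply
    without explicitly embedding (and so warping) the manifold."""
    return np.array([stein(X, R) for R in refs])
```

The design choice is that only the relations between points are kept, so no single flat embedding has to preserve the curved structure globally.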
Jensen-Bregman LogDet Divergence with Application to Efficient Similarity Search for Covariance Matrices
Abstract

Cited by 1 (1 self)
Covariance matrices have found success in several computer vision applications, including activity recognition, visual surveillance, and diffusion tensor imaging. This is because they provide an easy platform for fusing multiple features compactly. An important task in all of these applications is to compare two covariance matrices using a (dis)similarity function, for which the common choice is the Riemannian metric on the manifold inhabited by these matrices. As this Riemannian manifold is not flat, the dissimilarities should take into account the curvature of the manifold. As a result, such distance computations tend to slow down, especially when the matrix dimensions are large or gradients are required. Further, in this era of big data analytics, suitability of the metric for efficient nearest neighbor retrieval is an important requirement. To alleviate these difficulties, this paper proposes a novel dissimilarity measure for covariances, the Jensen-Bregman LogDet Divergence (JBLD). This divergence enjoys several desirable theoretical properties while at the same time being computationally less demanding (compared to standard measures). Utilizing the fact that the square root of JBLD is a metric, we address the problem of efficient nearest neighbor retrieval on large covariance datasets via a metric tree data structure. To this end, we propose a K-Means clustering algorithm on JBLD. We demonstrate the superior performance of JBLD on covariance datasets from several computer vision applications.
ON A ZERO-FINDING PROBLEM INVOLVING THE MATRIX EXPONENTIAL
Abstract
Abstract. An important step in the solution of a matrix nearness problem that arises in certain machine learning applications is finding the zero of f(α) = z^T exp(log X + α zz^T) z − b. The matrix-valued exponential and logarithm in f(α) arise from the use of the von Neumann matrix divergence tr(X log X − X log Y − X + Y) to measure the nearness between the positive definite matrices X and Y. A key step of an iterative algorithm used to solve the underlying matrix nearness problem requires the zero of f(α) to be repeatedly computed. In this paper we propose zero-finding algorithms that gain their advantage by exploiting the special structure of the objective function. We show how to efficiently compute the derivative of f, thereby allowing the use of Newton-type methods. In numerical experiments we establish the advantage of our algorithms.
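Since X is positive definite and the perturbation zz^T is rank-one positive semidefinite, f is increasing in α, so even plain bisection locates the zero; the paper's contribution is an efficiently computed derivative that makes Newton-type methods viable. A small sketch of evaluating f and bracketing its zero, with arbitrary illustrative bracket endpoints:

```python
import numpy as np

def sym_apply(M, fn):
    """Apply a scalar function to a symmetric matrix via eigendecomposition."""
    w, V = np.linalg.eigh(M)
    return (V * fn(w)) @ V.T

def f(alpha, X, z, b):
    """f(alpha) = z^T exp(log X + alpha z z^T) z - b, for SPD X."""
    L = sym_apply(X, np.log)                 # matrix logarithm of X
    return z @ sym_apply(L + alpha * np.outer(z, z), np.exp) @ z - b

def zero_of_f(X, z, b, lo=-20.0, hi=20.0, iters=200):
    """Bisection sketch: any bracket with f(lo) < 0 < f(hi) converges.
    The paper's Newton-type methods converge much faster by using f'."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(mid, X, z, b) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

For X = I and a unit vector z, f reduces to e^α − b, so the zero sits at α = log b, which makes a convenient sanity check.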
Mirror Descent for Metric Learning: A Unified Approach
Abstract
Abstract. Most metric learning methods are characterized by diverse loss functions and projection methods, which naturally begs the question: is there a wider framework that can generalize many of these methods? In addition, ever persistent issues are those of scalability to large data sets and the question of kernelizability. We propose a unified approach to Mahalanobis metric learning: an online regularized metric learning algorithm based on the ideas of composite objective mirror descent (COMID). The metric learning problem is formulated as a regularized positive semidefinite matrix learning problem, whose update rules can be derived using the COMID framework. This approach aims to be scalable, kernelizable, and admissible to many different types of Bregman and loss functions, which allows for the tailoring of several different classes of algorithms. The most novel contribution is the use of the trace norm, which yields a sparse metric in its eigenspectrum, thus simultaneously performing feature selection along with metric learning.
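In the simplest instance of this scheme (squared-Frobenius Bregman divergence), one composite mirror-descent step with trace-norm regularization over the PSD cone reduces to a gradient step followed by eigenvalue soft-thresholding, which is where the sparse eigenspectrum comes from. A hedged sketch under that assumption; the step size eta and regularization weight lam are illustrative, and the paper's general Bregman machinery is not reproduced:

```python
import numpy as np

def comid_trace_step(W, G, eta, lam):
    """One composite step for min_W f(W) + lam * tr(W) over PSD matrices:
    take a gradient step W - eta*G, then apply the proximal operator of
    eta*lam times the trace norm restricted to the PSD cone, which shrinks
    each eigenvalue toward zero and clips it at zero."""
    M = W - eta * G
    M = (M + M.T) / 2.0                  # keep the iterate symmetric
    w, V = np.linalg.eigh(M)
    w = np.maximum(w - eta * lam, 0.0)   # trace-norm prox + PSD projection
    return (V * w) @ V.T
```

Because small eigenvalues are driven exactly to zero, the learned Mahalanobis matrix becomes low-rank, performing the implicit feature selection described above.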