Low-Rank Kernel Learning with Bregman Matrix Divergences
Cited by 20 (1 self)
In this paper, we study low-rank matrix nearness problems, with a focus on learning low-rank positive semidefinite (kernel) matrices for machine learning applications. We propose efficient algorithms that scale linearly in the number of data points and quadratically in the rank of the input matrix. Existing algorithms for learning kernel matrices often scale poorly, with running times that are cubic in the number of data points. We employ Bregman matrix divergences as the measures of nearness; these divergences are natural for learning low-rank kernels since they preserve rank as well as positive semidefiniteness. Special cases of our framework yield faster algorithms for various existing learning problems, and experimental results demonstrate that our algorithms can effectively learn both low-rank and full-rank kernel matrices.
Nonnegative matrix approximation: algorithms and applications
2006
Cited by 18 (3 self)
Low-dimensional data representations are crucial to numerous applications in machine learning, statistics, and signal processing. Nonnegative matrix approximation (NNMA) is a method for dimensionality reduction that respects the nonnegativity of the input data while constructing a low-dimensional approximation. NNMA has been used in a multitude of applications, though without commensurate theoretical development. In this report we describe generic methods for minimizing generalized divergences between the input and its low-rank approximant. Some of our general methods are even extensible to arbitrary convex penalties. Our methods yield efficient multiplicative iterative schemes for solving the proposed problems. We also consider interesting extensions such as the use of penalty functions, nonlinear relationships via “link” functions, weighted errors, and multifactor approximations. We present some experiments as an illustration of our algorithms. For completeness, the report also includes a brief literature survey of the various algorithms and the applications of NNMA. Keywords: Nonnegative matrix factorization, weighted approximation, Bregman divergence, multiplicative
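To make "multiplicative iterative schemes" concrete, here is a minimal sketch of the classical multiplicative updates for the squared Frobenius error. This is the well-known baseline scheme, not the report's generalized-divergence methods. The helper names (matmul, multiplicative_step), the small eps guard, and the random test matrices are illustrative assumptions.

```python
import random

def matmul(a, b):
    # Plain nested-list matrix product.
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

def frob_error(v, w, h):
    # Squared Frobenius error ||V - W H||_F^2.
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0])))

def multiplicative_step(v, w, h, eps=1e-12):
    # H <- H * (W^T V) / (W^T W H), then W <- W * (V H^T) / (W H H^T).
    # eps (an assumption of this sketch) only guards against division by zero.
    wt = transpose(w)
    num, den = matmul(wt, v), matmul(matmul(wt, w), h)
    h = [[h[a][j] * num[a][j] / (den[a][j] + eps) for j in range(len(h[0]))]
         for a in range(len(h))]
    ht = transpose(h)
    num, den = matmul(v, ht), matmul(w, matmul(h, ht))
    w = [[w[i][a] * num[i][a] / (den[i][a] + eps) for a in range(len(w[0]))]
         for i in range(len(w))]
    return w, h

rng = random.Random(0)
V = [[rng.random() for _ in range(5)] for _ in range(4)]  # 4x5 nonnegative input
W = [[rng.random() for _ in range(2)] for _ in range(4)]  # rank-2 factors
H = [[rng.random() for _ in range(5)] for _ in range(2)]

errors = [frob_error(V, W, H)]
for _ in range(50):
    W, H = multiplicative_step(V, W, H)
    errors.append(frob_error(V, W, H))
```

Because each update multiplies a nonnegative entry by a nonnegative ratio, the factors stay nonnegative without any projection step, and the recorded errors should not increase from one iteration to the next.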
Metric and Kernel Learning Using a Linear Transformation
Cited by 12 (2 self)
Metric and kernel learning arise in several machine learning applications. However, most existing metric learning algorithms are limited to learning metrics over low-dimensional data, while existing kernel learning algorithms are often limited to the transductive setting and do not generalize to new data points. In this paper, we study the connections between metric learning and kernel learning that arise when studying metric learning as a linear transformation learning problem. In particular, we propose a general optimization framework for learning metrics via linear transformations, and analyze in detail a special case of our framework: minimizing the LogDet divergence subject to linear constraints. We then propose a general regularized framework for learning a kernel matrix, and show it to be equivalent to our metric learning framework. Our theoretical connections between metric and kernel learning have two main consequences: 1) the learned kernel matrix parameterizes a linear transformation kernel function and can be applied inductively to new data points, 2) our result yields a constructive method for kernelizing most existing Mahalanobis metric learning formulations. We demonstrate our learning approach by applying it to large-scale real-world problems in computer vision, text mining, and semi-supervised kernel dimensionality reduction. Keywords: metric learning, kernel learning, linear transformation, matrix divergences, logdet divergence
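As a small illustration of the LogDet divergence named above, the sketch below evaluates the standard definition D_ld(A, B) = tr(A B^-1) - log det(A B^-1) - n. To stay within the standard library it is restricted to 2x2 positive definite matrices, which is an assumption of this sketch; the helper names are illustrative, and nothing here reproduces the paper's optimization procedure.

```python
import math

def det2(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def inv2(m):
    # Closed-form inverse of a 2x2 matrix.
    d = det2(m)
    return [[m[1][1] / d, -m[0][1] / d], [-m[1][0] / d, m[0][0] / d]]

def matmul2(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def logdet_divergence(a, b):
    # D_ld(A, B) = tr(A B^-1) - log det(A B^-1) - n, with n = 2 here.
    m = matmul2(a, inv2(b))
    return (m[0][0] + m[1][1]) - math.log(det2(m)) - 2

A = [[2.0, 0.5], [0.5, 1.0]]
I2 = [[1.0, 0.0], [0.0, 1.0]]
A_scaled = [[3.0 * x for x in row] for row in A]
I2_scaled = [[3.0 * x for x in row] for row in I2]

d_self = logdet_divergence(A, A)                  # zero up to rounding
d_to_identity = logdet_divergence(A, I2)          # tr(A) - log det(A) - 2
d_scaled = logdet_divergence(A_scaled, I2_scaled) # same as d_to_identity
```

The last line checks a simple case of scale invariance: multiplying both arguments by the same positive constant leaves the divergence unchanged.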
Matrix nearness problems with Bregman divergences
SIAM J. Matrix Anal. Appl.
Cited by 12 (2 self)
Abstract. This paper discusses a new class of matrix nearness problems that measure approximation error using a directed distance measure called a Bregman divergence. Bregman divergences offer an important generalization of the squared Frobenius norm and relative entropy, and they all share fundamental geometric properties. In addition, these divergences are intimately connected with exponential families of probability distributions. Therefore, it is natural to study matrix approximation problems with respect to Bregman divergences. This article proposes a framework for studying these problems, discusses some specific matrix nearness problems, and provides algorithms for solving them numerically. These algorithms apply to many classical and novel problems, and they admit a striking geometric interpretation.
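The claim that Bregman divergences generalize both the squared Frobenius norm and relative entropy can be checked numerically. The sketch below uses the standard definition D_phi(x, y) = phi(x) - phi(y) - <grad phi(y), x - y> on plain vectors (a matrix can be flattened entrywise for these two cases). The function names and the example vectors are illustrative assumptions.

```python
import math

def bregman(phi, grad_phi, x, y):
    # D_phi(x, y) = phi(x) - phi(y) - <grad phi(y), x - y>
    inner = sum(g * (xi - yi) for g, xi, yi in zip(grad_phi(y), x, y))
    return phi(x) - phi(y) - inner

# phi(x) = 0.5 * ||x||^2 gives half the squared Euclidean (Frobenius) distance.
def half_sq_norm(x):
    return 0.5 * sum(v * v for v in x)

def half_sq_norm_grad(x):
    return list(x)

# phi(x) = sum(x log x - x) gives the generalized relative entropy.
def neg_entropy(x):
    return sum(v * math.log(v) - v for v in x)

def neg_entropy_grad(x):
    return [math.log(v) for v in x]

x = [0.2, 0.8]
y = [0.5, 0.5]

d_euclid = bregman(half_sq_norm, half_sq_norm_grad, x, y)
d_relent = bregman(neg_entropy, neg_entropy_grad, x, y)

# Direct formulas for comparison.
direct_euclid = 0.5 * sum((a - b) ** 2 for a, b in zip(x, y))
direct_relent = sum(a * math.log(a / b) - a + b for a, b in zip(x, y))
```

Swapping in a different strictly convex phi yields a different divergence from the same three-term formula, which is what makes the family convenient to treat in one framework.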
Structured Metric Learning for High Dimensional Problems
Cited by 11 (3 self)
The success of popular algorithms such as k-means clustering or nearest neighbor searches depends on the assumption that the underlying distance functions reflect domain-specific notions of similarity for the problem at hand. The distance metric learning problem seeks to optimize a distance function subject to constraints that arise from fully supervised or semi-supervised information. Several recent algorithms have been proposed to learn such distance functions in low-dimensional settings. One major shortcoming of these methods is their failure to scale to high-dimensional problems that are becoming increasingly ubiquitous in modern data mining applications. In this paper, we present metric learning algorithms that scale linearly with dimensionality, permitting efficient optimization, storage, and evaluation of the learned metric. This is achieved through our main technical contribution, a framework based on the log-determinant matrix divergence that enables efficient optimization of structured, low-parameter Mahalanobis distances. Experimentally, we evaluate our methods across a variety of high-dimensional domains, including text, statistical software analysis, and collaborative filtering, showing that our methods scale to data sets with tens of thousands or more features. We show that our learned metric can achieve excellent quality with respect to various criteria. For example, in the context of metric learning for nearest neighbor classification, we show that our methods achieve 24% higher accuracy than the baseline distance. Additionally, our methods yield very good precision while providing recall measures up to 20% higher than other baseline methods such as latent semantic analysis.
Tensor Sparse Coding for Region Covariances
Cited by 10 (1 self)
Abstract. Sparse representation of signals has been the focus of much research in recent years. A vast majority of existing algorithms deal with vectors, and higher-order data like images are dealt with by vectorization. However, the structure of the data may be lost in the process, leading to a poorer representation and overall performance degradation. In this paper we propose a novel approach for sparse representation of positive definite matrices, where vectorization would destroy the inherent structure of the data. The sparse decomposition of a positive definite matrix is formulated as a convex optimization problem, which falls under the category of determinant maximization (MAXDET) problems [1], for which efficient interior point algorithms exist. Experimental results are shown with simulated examples as well as in real-world computer vision applications, demonstrating the suitability of the new model. This forms the first step toward extending the cornucopia of sparsity-based algorithms to positive definite matrices.
Online Multiple Kernel Learning: Algorithms and Mistake Bounds
Cited by 9 (9 self)
Abstract. Online learning and kernel learning are two active research topics in machine learning. Although each of them has been studied extensively, there has been limited effort to address their intersection. In this paper, we introduce a new research problem, termed Online Multiple Kernel Learning (OMKL), that aims to learn a kernel-based prediction function from a pool of predefined kernels in an online learning fashion. OMKL is generally more challenging than typical online learning because both the kernel classifiers and their linear combination weights must be learned simultaneously. In this work, we consider two setups for OMKL, i.e., combining binary predictions or real-valued outputs from multiple kernel classifiers, and we propose both deterministic and stochastic approaches in the two setups for OMKL. The deterministic approach updates all kernel classifiers for every misclassified example, while the stochastic approach randomly chooses one or more classifiers for updating according to some sampling strategy. Mistake bounds are derived for all the proposed OMKL algorithms. Keywords: Online learning and relative loss bounds, Kernels
Matrix regularization techniques for online multitask learning
2008
Cited by 8 (2 self)
In this paper we examine the problem of prediction with expert advice in a setup where the learner is presented with a sequence of examples coming from different tasks. In order for the learner to benefit from performing multiple tasks simultaneously, we make assumptions of task relatedness by constraining the comparator to use fewer best experts than the number of tasks. We show how this corresponds naturally to learning under spectral or structural matrix constraints, and propose regularization techniques to enforce the constraints. The regularization techniques proposed here are interesting in their own right, and multitask learning is just one application for the ideas. A theoretical analysis of one such regularizer is performed, and a regret bound that shows the benefits of this setup is reported.
Similarity search on Bregman divergence: Towards non-metric indexing
In VLDB, 2009
Cited by 8 (1 self)
In this paper, we examine the problem of indexing over non-metric distance functions. In particular, we focus on a general class of distance functions, namely Bregman Divergence [6], to support nearest neighbor and range queries. Distance functions such as KL-divergence and Itakura-Saito distance are special cases of Bregman divergence, with wide applications in statistics, speech recognition, and time series analysis, among others. Unlike in metric spaces, key properties such as triangle inequality and distance symmetry do not hold for such distance functions. A direct adaptation of existing indexing infrastructure developed for metric spaces is thus not possible. We devise a novel solution to handle this class of distance measures by expanding and mapping points in the original space to a new extended space. Subsequently, we show how state-of-the-art tree-based indexing methods, for low to moderate dimensional datasets, and vector approximation file (VA-file) methods, for high-dimensional datasets, can be adapted on this extended space to answer such queries efficiently. Improved distance bounding techniques and distribution-based index optimization are also introduced to improve the performance of query answering and index construction respectively, which can be applied on both the R-trees and VA-files. Extensive experiments are conducted to validate our approach on a variety of datasets and a range of Bregman divergence functions.
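The failure of symmetry and of the triangle inequality mentioned above is easy to observe on small inputs. The sketch below uses the standard formulas for the generalized KL divergence and the Itakura-Saito distance on positive vectors; the example points are arbitrary illustrative choices and are not taken from the paper's experiments.

```python
import math

def kl_divergence(p, q):
    # Generalized KL divergence: sum(p log(p/q) - p + q).
    return sum(pi * math.log(pi / qi) - pi + qi for pi, qi in zip(p, q))

def itakura_saito(p, q):
    # Itakura-Saito distance: sum(p/q - log(p/q) - 1).
    return sum(pi / qi - math.log(pi / qi) - 1.0 for pi, qi in zip(p, q))

p = [1.0, 4.0]
q = [2.0, 1.0]

# Symmetry fails: swapping the arguments changes the value.
kl_pq, kl_qp = kl_divergence(p, q), kl_divergence(q, p)
is_pq, is_qp = itakura_saito(p, q), itakura_saito(q, p)

# Triangle inequality fails: going a -> c directly costs more
# than the detour a -> b -> c.
a, b, c = [1.0], [2.0], [4.0]
direct = itakura_saito(a, c)
detour = itakura_saito(a, b) + itakura_saito(b, c)
```

Index structures built for metric spaces typically prune candidates using exactly these two properties, which is why such structures cannot be reused unchanged for these distance functions.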
Inductive regularized learning of kernel functions
Cited by 6 (1 self)
In this paper we consider the fundamental problem of semi-supervised kernel function learning. We first propose a general regularized framework for learning a kernel matrix, and then demonstrate an equivalence between our proposed kernel matrix learning framework and a general linear transformation learning problem. Our result shows that the learned kernel matrices parameterize a linear transformation kernel function and can be applied inductively to new data points. Furthermore, our result gives a constructive method for kernelizing most existing Mahalanobis metric learning formulations. To make our results practical for large-scale data, we modify our framework to limit the number of parameters in the optimization process. We also consider the problem of kernelized inductive dimensionality reduction in the semi-supervised setting. To this end, we introduce a novel method for this problem by considering a special case of our general kernel learning framework where we select the trace norm function as the regularizer. We empirically demonstrate that our framework learns useful kernel functions, improving the kNN classification accuracy significantly in a variety of domains. Furthermore, our kernelized dimensionality reduction technique significantly reduces the dimensionality of the feature space while achieving competitive classification accuracies.