Information-theoretic metric learning
- In NIPS 2006 Workshop on Learning to Compare Examples, 2007
Cited by 67 (8 self)
We formulate the metric learning problem as that of minimizing the differential relative entropy between two multivariate Gaussians under constraints on the Mahalanobis distance function. Via a surprising equivalence, we show that this problem can be solved as a low-rank kernel learning problem. Specifically, we minimize the Burg divergence of a low-rank kernel to an input kernel, subject to pairwise distance constraints. Our approach has several advantages over existing methods. First, we present a natural information-theoretic formulation for the problem. Second, the algorithm utilizes the methods developed by Kulis et al. [6], which do not involve any eigenvector computation; in particular, our method runs faster than most existing techniques. Third, the formulation offers insights into connections between metric learning and kernel learning.
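As a rough illustration of the two quantities named in this abstract, the sketch below evaluates a squared Mahalanobis distance and the Burg (LogDet) matrix divergence. It assumes full-rank, symmetric positive definite inputs; the function names are hypothetical, and neither the low-rank case nor the paper's optimization procedure is covered.

```python
import numpy as np

def mahalanobis_sq(A, x, y):
    """Squared Mahalanobis distance (x - y)^T A (x - y) for a PSD matrix A."""
    d = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return float(d @ A @ d)

def burg_divergence(X, Y):
    """Burg (LogDet) divergence tr(X Y^-1) - log det(X Y^-1) - n.

    Assumption: X and Y are symmetric positive definite (full rank).
    """
    n = X.shape[0]
    M = X @ np.linalg.inv(Y)
    _, logdet = np.linalg.slogdet(M)  # log|det M|, numerically stable
    return float(np.trace(M) - logdet - n)

# Example: compare a scaled metric against the identity (Euclidean) metric.
A0 = np.eye(2)
A = 2.0 * np.eye(2)
print(mahalanobis_sq(A, [1.0, 0.0], [0.0, 0.0]))
print(burg_divergence(A, A0))
```

The divergence is zero exactly when the two matrices coincide, which is why it can serve as a measure of how far a learned metric drifts from a prior one.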
A combinatorial, primal-dual approach to semidefinite programs
- In STOC, 2007
Cited by 43 (5 self)
Semidefinite programs (SDP) have been used in many recent approximation algorithms. We develop a general primal-dual approach to solve SDPs using a generalization of the well-known multiplicative weights update rule to symmetric matrices. For a number of problems, such as Sparsest Cut and Balanced Separator in undirected and directed weighted graphs, and the Min UnCut problem, this yields combinatorial approximation algorithms that are significantly more efficient than interior point methods. The design of our primal-dual algorithms is guided by a robust analysis of rounding algorithms used to obtain integer solutions from fractional ones.
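The matrix generalization of the multiplicative weights rule mentioned here can be sketched as follows: given symmetric loss matrices, form the matrix exponential of their negatively scaled sum and normalize it to unit trace. This is a minimal sketch of the generic update only; the function names and the step size `eta` are assumptions for illustration, and the primal-dual SDP machinery of the paper is not shown.

```python
import numpy as np

def sym_expm(S):
    """Matrix exponential of a symmetric matrix via eigendecomposition.

    The largest eigenvalue is subtracted before exponentiating, so the
    result is exp(S) up to a positive scale factor (harmless after
    trace normalization).
    """
    w, V = np.linalg.eigh(S)
    return (V * np.exp(w - w.max())) @ V.T

def matrix_mw_density(loss_matrices, eta):
    """Density matrix proportional to exp(-eta * sum of loss matrices)."""
    total = np.sum(np.asarray(loss_matrices, dtype=float), axis=0)
    W = sym_expm(-eta * total)
    return W / np.trace(W)

# Example: one loss matrix that penalizes the first coordinate direction.
P = matrix_mw_density([np.diag([1.0, 0.0])], eta=np.log(2.0))
print(P)
```

The result is symmetric positive semidefinite with trace one, so it plays the role that a probability distribution plays in the scalar multiplicative weights rule.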
Learning low-rank kernel matrices
- In ICML, 2006
Cited by 24 (7 self)
Kernel learning plays an important role in many machine learning tasks. However, algorithms for learning a kernel matrix often scale poorly, with running times that are cubic in the number of data points. In this paper, we propose efficient algorithms for learning low-rank kernel matrices; our algorithms scale linearly in the number of data points and quadratically in the rank of the kernel. We introduce and employ Bregman matrix divergences for rank-deficient matrices; these divergences are natural for our problem since they preserve the rank as well as positive semi-definiteness of the kernel matrix. Special cases of our framework yield faster algorithms for various existing kernel learning problems. Experimental results demonstrate the effectiveness of our algorithms in learning both low-rank and full-rank kernels.
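For orientation, the standard full-rank definitions behind the term "Bregman matrix divergence" can be written as below. The symbols \phi, X, Y, and n are notation assumed here rather than taken from the abstract, and the rank-deficient extension that the paper introduces is not reproduced.

```latex
% Bregman matrix divergence generated by a strictly convex,
% differentiable function \phi on symmetric matrices:
D_{\phi}(X, Y) = \phi(X) - \phi(Y)
    - \operatorname{tr}\!\big( \nabla\phi(Y)^{\top} (X - Y) \big)

% von Neumann divergence, from \phi(X) = \operatorname{tr}(X \log X - X):
D_{\mathrm{vN}}(X, Y) = \operatorname{tr}\!\big( X \log X - X \log Y - X + Y \big)

% Burg (LogDet) divergence, from \phi(X) = -\log\det X,
% for n \times n positive definite X and Y:
D_{\mathrm{Burg}}(X, Y) = \operatorname{tr}\!\big( X Y^{-1} \big)
    - \log\det\!\big( X Y^{-1} \big) - n
```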
Randomized PCA algorithms with regret bounds that are logarithmic in the dimension
- In Advances in Neural Information Processing Systems 19 (NIPS 06), 2006
Cited by 13 (5 self)
We design an on-line algorithm for Principal Component Analysis. The instances are projected into a probabilistically chosen low-dimensional subspace. The total expected quadratic approximation error equals the total quadratic approximation error of the best subspace chosen in hindsight plus an additional term that grows linearly in the dimension of the subspace but logarithmically in the dimension of the instances.
Linear Algorithms for Online Multitask Classification
Cited by 12 (2 self)
We design and analyze interacting online algorithms for multitask classification that perform better than independent learners whenever the tasks are related in a certain sense. We formalize task relatedness in different ways, and derive formal guarantees on the performance advantage provided by interaction. Our online analysis gives new stimulating insights into previously known co-regularization techniques, such as the multitask kernels and the margin correlation analysis for multiview learning. In the last part we apply our approach to spectral co-regularization: we introduce a natural matrix extension of the quasiadditive algorithm for classification and prove bounds depending on certain unitarily invariant norms of the matrix of task coefficients.
Nonnegative matrix approximation: algorithms and applications
- 2006
Cited by 11 (3 self)
Low-dimensional data representations are crucial to numerous applications in machine learning, statistics, and signal processing. Nonnegative matrix approximation (NNMA) is a method for dimensionality reduction that respects the nonnegativity of the input data while constructing a low-dimensional approximation. NNMA has been used in a multitude of applications, though without commensurate theoretical development. In this report we describe generic methods for minimizing generalized divergences between the input and its low-rank approximant. Some of our general methods are even extensible to arbitrary convex penalties. Our methods yield efficient multiplicative iterative schemes for solving the proposed problems. We also consider interesting extensions such as the use of penalty functions, non-linear relationships via “link” functions, weighted errors, and multi-factor approximations. We present some experiments as an illustration of our algorithms. For completeness, the report also includes a brief literature survey of the various algorithms and the applications of NNMA. Keywords: Nonnegative matrix factorization, weighted approximation, Bregman divergence, multiplicative
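To give a concrete sense of what a multiplicative iterative scheme looks like, the sketch below uses the widely known multiplicative updates for the squared Frobenius error, which is only one special case of the divergences discussed. It is not the report's general algorithm; the function name, the iteration count, the random initialization, and the small constant `eps` are assumptions for illustration.

```python
import numpy as np

def nnma_frobenius(A, rank, iters=200, eps=1e-9, seed=0):
    """Approximate a nonnegative matrix A by B @ C with B, C >= 0.

    Uses multiplicative updates for the squared Frobenius error.
    eps guards against division by zero (an illustrative choice).
    """
    rng = np.random.default_rng(seed)
    m, n = A.shape
    B = rng.random((m, rank)) + eps
    C = rng.random((rank, n)) + eps
    for _ in range(iters):
        # Each factor is rescaled elementwise by a nonnegative ratio,
        # so nonnegativity is preserved without any projection step.
        C *= (B.T @ A) / (B.T @ B @ C + eps)
        B *= (A @ C.T) / (B @ C @ C.T + eps)
    return B, C

# Example: a nonnegative rank-1 matrix is recovered almost exactly.
A = np.array([[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]])
B, C = nnma_frobenius(A, rank=1)
print(np.linalg.norm(A - B @ C))
```

Because every update multiplies by a nonnegative factor, the iterates stay in the feasible region automatically, which is the main appeal of this family of schemes.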
Low-Rank Kernel Learning with Bregman Matrix Divergences
Cited by 6 (1 self)
In this paper, we study low-rank matrix nearness problems, with a focus on learning low-rank positive semidefinite (kernel) matrices for machine learning applications. We propose efficient algorithms that scale linearly in the number of data points and quadratically in the rank of the input matrix. Existing algorithms for learning kernel matrices often scale poorly, with running times that are cubic in the number of data points. We employ Bregman matrix divergences as the measures of nearness; these divergences are natural for learning low-rank kernels since they preserve rank as well as positive semidefiniteness. Special cases of our framework yield faster algorithms for various existing learning problems, and experimental results demonstrate that our algorithms can effectively learn both low-rank and full-rank kernel matrices.
Fast SDP Algorithms for Constraint Satisfaction Problems
Cited by 5 (3 self)
The class of constraint satisfaction problems (CSPs) captures many fundamental combinatorial optimization problems such as Max Cut, Max q-Cut, Unique Games, and Max k-Sat. Recently, Raghavendra (STOC'08) identified a simple semidefinite programming relaxation that gives the best possible approximation for any CSP, assuming the Unique Games Conjecture. Raghavendra and Steurer (FOCS'09) showed that, independent of the truth of the Unique Games Conjecture, the integrality gap of this relaxation cannot be improved even by adding a large class of valid inequalities. We present an algorithm that finds an approximately optimal solution to this relaxation in near-linear time. Combining this algorithm with a rounding scheme of Raghavendra and Steurer (FOCS'09) leads to an approximation algorithm for any CSP that runs in near-linear time and has an approximation guarantee that matches the integrality gap, which is optimal assuming the Unique Games Conjecture.
Winnowing subspaces
- In ICML, 2007
Cited by 4 (0 self)
We generalize the Winnow algorithm for learning disjunctions to learning subspaces of low rank. Subspaces are represented by symmetric projection matrices. The online algorithm maintains its uncertainty about the hidden low-rank projection matrix as a symmetric positive definite matrix. This matrix is updated using a version of the Matrix Exponentiated Gradient algorithm that is based on matrix exponentials and matrix logarithms. As in the case of the Winnow algorithm, the bounds are logarithmic in the dimension n of the problem, but linear in the rank r of the hidden subspace. We show that the algorithm can be adapted to handle arbitrary matrices of any dimension via a reduction.
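The update based on matrix exponentials and matrix logarithms can be sketched in its generic form: take the matrix logarithm of the current parameter, subtract a scaled symmetric gradient, exponentiate, and renormalize to unit trace. This is a minimal sketch of a generic Matrix Exponentiated Gradient step, not the paper's specific version; the function names, the learning rate `eta`, and the unit-trace normalization are assumptions for illustration.

```python
import numpy as np

def sym_fun(S, f):
    """Apply a scalar function f to a symmetric matrix through its eigenvalues."""
    w, V = np.linalg.eigh(S)
    return (V * f(w)) @ V.T

def meg_update(W, grad, eta):
    """One generic Matrix Exponentiated Gradient step.

    Assumption: W is symmetric positive definite with trace one.
    Returns a matrix proportional to exp(log W - eta * sym(grad)),
    rescaled to unit trace.
    """
    G = (grad + grad.T) / 2.0              # symmetrize the gradient
    L = sym_fun(W, np.log) - eta * G
    w, V = np.linalg.eigh(L)
    E = (V * np.exp(w - w.max())) @ V.T    # shift eigenvalues for stability
    return E / np.trace(E)

# Example: start from the maximally uncertain parameter I / n.
W0 = np.eye(2) / 2.0
W1 = meg_update(W0, np.diag([1.0, 0.0]), eta=np.log(2.0))
print(W1)
```

Working in the log domain keeps the parameter symmetric positive definite after every step, which matches the abstract's description of the maintained uncertainty matrix.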

