Results 1  10
of
116
Learning the discriminative powerinvariance tradeoff
 IN ICCV
, 2007
"... We investigate the problem of learning optimal descriptors for a given classification task. Many handcrafted descriptors have been proposed in the literature for measuring visual similarity. Looking past initial differences, what really distinguishes one descriptor from another is the tradeoff that ..."
Abstract

Cited by 228 (4 self)
 Add to MetaCart
We investigate the problem of learning optimal descriptors for a given classification task. Many handcrafted descriptors have been proposed in the literature for measuring visual similarity. Looking past initial differences, what really distinguishes one descriptor from another is the tradeoff that it achieves between discriminative power and invariance. Since this tradeoff must vary from task to task, no single descriptor can be optimal in all situations. Our focus, in this paper, is on learning the optimal tradeoff for classification given a particular training set and prior constraints. The problem is posed in the kernel learning framework. We learn the optimal, domainspecific kernel as a combination of base kernels corresponding to base features which achieve different levels of tradeoff (such as no invariance, rotation invariance, scale invariance, affine invariance, etc.) This leads to a convex optimisation problem with a unique global optimum which can be solved for efficiently. The method is shown to achieve stateoftheart performance on the UIUC textures, Oxford flowers and Caltech 101 datasets.
Multiclass multiple kernel learning
 In ICML. ACM
"... In many applications it is desirable to learn from several kernels. “Multiple kernel learning” (MKL) allows the practitioner to optimize over linear combinations of kernels. By enforcing sparse coefficients, it also generalizes feature selection to kernel selection. We propose MKL for joint feature ..."
Abstract

Cited by 61 (4 self)
 Add to MetaCart
(Show Context)
In many applications it is desirable to learn from several kernels. “Multiple kernel learning” (MKL) allows the practitioner to optimize over linear combinations of kernels. By enforcing sparse coefficients, it also generalizes feature selection to kernel selection. We propose MKL for joint feature maps. This provides a convenient and principled way for MKL with multiclass problems. In addition, we can exploit the joint feature map to learn kernels on output spaces. We show the equivalence of several different primal formulations including different regularizers. We present several optimization methods, and compare a convex quadratically constrained quadratic program (QCQP) and two semiinfinite linear programs (SILPs) on toy data, showing that the SILPs are faster than the QCQP. We then demonstrate the utility of our method by applying the SILP to three real world datasets. 1.
Nonmyopic active learning of gaussian processes: An explorationexploitation approach
 IN ICML
, 2007
"... When monitoring spatial phenomena, such as the ecological condition of a river, deciding where to make observations is a challenging task. In these settings, a fundamental question is when an active learning, or sequential design, strategy, where locations are selected based on previous measurements ..."
Abstract

Cited by 51 (5 self)
 Add to MetaCart
When monitoring spatial phenomena, such as the ecological condition of a river, deciding where to make observations is a challenging task. In these settings, a fundamental question is when an active learning, or sequential design, strategy, where locations are selected based on previous measurements, will perform significantly better than sensing at an a priori specified set of locations. For Gaussian Processes (GPs), which often accurately model spatial phenomena, we present an analysis and efficient algorithms that address this question. Central to our analysis is a theoretical bound which quantifies the performance difference between active and a priori design strategies. We consider GPs with unknown kernel parameters and present a nonmyopic approach for trading off exploration, i.e., decreasing uncertainty about the model parameters, and exploitation, i.e., nearoptimally selecting observations when the parameters are (approximately) known. We discuss several exploration strategies, and present logarithmic sample complexity bounds for the exploration phase. We then extend our algorithm to handle nonstationary GPs exploiting local structure in the model. A variational approach allows us to perform efficient inference in this class of nonstationary models. We also present extensive empirical evaluation on several realworld problems.
Learning nonlinear combinations of kernels
 In NIPS
, 2009
"... This paper studies the general problem of learning kernels based on a polynomial combination of base kernels. We analyze this problem in the case of regression and the kernel ridge regression algorithm. We examine the corresponding learning kernel optimization problem, show how that minimax problem ..."
Abstract

Cited by 48 (2 self)
 Add to MetaCart
(Show Context)
This paper studies the general problem of learning kernels based on a polynomial combination of base kernels. We analyze this problem in the case of regression and the kernel ridge regression algorithm. We examine the corresponding learning kernel optimization problem, show how that minimax problem can be reduced to a simpler minimization problem, and prove that the global solution of this problem always lies on the boundary. We give a projectionbased gradient descent algorithm for solving the optimization problem, shown empirically to converge in few iterations. Finally, we report the results of extensive experiments with this algorithm using several publicly available datasets demonstrating the effectiveness of our technique. 1
Learning bounds for support vector machines with learned kernels
 In COLT
, 2006
"... Consider the problem of learning a kernel for use in SVM classification. We bound the estimation error of a large margin classifier when the kernel, relative to which this margin is defined, is chosen from a family of kernels based on the training sample. For a kernel family with pseudodimension dφ, ..."
Abstract

Cited by 45 (1 self)
 Add to MetaCart
(Show Context)
Consider the problem of learning a kernel for use in SVM classification. We bound the estimation error of a large margin classifier when the kernel, relative to which this margin is defined, is chosen from a family of kernels based on the training sample. For a kernel family with pseudodimension dφ, we present a bound of Õ(dφ + 1/γ2)/n on the estimation error for SVMs with margin γ. This is the first bound in which the relation between the margin term and the familyofkernels term is additive rather then multiplicative. The pseudodimension of families of linear combinations of base kernels is the number of base kernels. Unlike in previous (multiplicative) bounds, there is no nonnegativity requirement on the coefficients of the linear combinations. We also give simple bounds on the pseudodimension for families of Gaussian kernels.
L2 regularization for learning kernels
 In: Proceedings of the 25th Conference in Uncertainty in Artificial Intelligence
, 2009
"... The choice of the kernel is critical to the success of many learning algorithms but it is typically left to the user. Instead, the training data can be used to learn the kernel by selecting it out of a given family, such as that of nonnegative linear combinations of p base kernels, constrained by a ..."
Abstract

Cited by 44 (4 self)
 Add to MetaCart
(Show Context)
The choice of the kernel is critical to the success of many learning algorithms but it is typically left to the user. Instead, the training data can be used to learn the kernel by selecting it out of a given family, such as that of nonnegative linear combinations of p base kernels, constrained by a trace or L1 regularization. This paper studies the problem of learning kernels with the same family of kernels but with an L2 regularization instead, and for regression problems. We analyze the problem of learning kernels with ridge regression. We derive the form of the solution of the optimization problem and give an efficient iterative algorithm for computing that solution. We present a novel theoretical analysis of the problem based on stability and give learning bounds for orthogonal kernels that contain only an additive term O ( √ p/m) when compared to the standard kernel ridge regression stability bound. We also report the results of experiments indicating that L1 regularization can lead to modest improvements for a small number of kernels, but to performance degradations in largerscale cases. In contrast, L2 regularization never degrades performance and in fact achieves significant improvements with a large number of kernels. 1
Simple and Efficient Multiple Kernel Learning by Group Lasso
"... We consider the problem of how to improve the efficiency of Multiple Kernel Learning (MKL). In literature, MKL is often solved by an alternating approach: (1) the minimization of the kernel weights is solved by complicated techniques, such as Semiinfinite Linear Programming, Gradient Descent, or Le ..."
Abstract

Cited by 43 (4 self)
 Add to MetaCart
We consider the problem of how to improve the efficiency of Multiple Kernel Learning (MKL). In literature, MKL is often solved by an alternating approach: (1) the minimization of the kernel weights is solved by complicated techniques, such as Semiinfinite Linear Programming, Gradient Descent, or Level method; (2) the maximization of SVM dual variables can be solved by standard SVM solvers. However, the minimization step in these methods is usually dependent on its solving techniques or commercial softwares, which therefore limits the efficiency and applicability. In this paper, we formulate a closedform solution for optimizing the kernel weights based on the equivalence between grouplasso and MKL. Although this equivalence is not our invention, our derived variant equivalence not only leads to an efficient algorithm for MKL, but also generalizes to the case for LpMKL (p ≥ 1 and denoting the Lpnorm of kernel weights). Therefore, our proposed algorithm provides a unified solution for the entire family of LpMKL models. Experiments on multiple data sets show the promising performance of the proposed technique compared with other competitive methods. 1.
Optimal kernel selection in kernel Fisher discriminant analysis
 In Proceedings of the TwentyThird International Conference on Machine Learning
, 2006
"... In Kernel Fisher discriminant analysis (KFDA), we carry out Fisher linear discriminant analysis in a high dimensional feature space defined implicitly by a kernel. The performance of KFDA depends on the choice of the kernel; in this paper, we consider the problem of finding the optimal kernel, over ..."
Abstract

Cited by 39 (1 self)
 Add to MetaCart
(Show Context)
In Kernel Fisher discriminant analysis (KFDA), we carry out Fisher linear discriminant analysis in a high dimensional feature space defined implicitly by a kernel. The performance of KFDA depends on the choice of the kernel; in this paper, we consider the problem of finding the optimal kernel, over a given convex set of kernels. We show that this optimal kernel selection problem can be reformulated as a tractable convex optimization problem which interiorpoint methods can solve globally and efficiently. The kernel selection method is demonstrated with some UCI machine learning benchmark examples. 1.
Multiclass Discriminant Kernel Learning via Convex Programming
"... Regularized kernel discriminant analysis (RKDA) performs linear discriminant analysis in the feature space via the kernel trick. Its performance depends on the selection of kernels. In this paper, we consider the problem of multiple kernel learning (MKL) for RKDA, in which the optimal kernel matrix ..."
Abstract

Cited by 36 (0 self)
 Add to MetaCart
Regularized kernel discriminant analysis (RKDA) performs linear discriminant analysis in the feature space via the kernel trick. Its performance depends on the selection of kernels. In this paper, we consider the problem of multiple kernel learning (MKL) for RKDA, in which the optimal kernel matrix is obtained as a linear combination of prespecified kernel matrices. We show that the kernel learning problem in RKDA can be formulated as convex programs. First, we show that this problem can be formulated as a semidefinite program (SDP). Based on the equivalence relationship between RKDA and least square problems in the binaryclass case, we propose a convex quadratically constrained quadratic programming (QCQP) formulation for kernel learning in RKDA. A semiinfinite linear programming (SILP) formulation is derived to further improve the efficiency. We extend these formulations to the multiclass case based on a key result established in this paper. That is, the multiclass RKDA kernel learning problem can be decomposed into a set of binaryclass kernel learning problems which are constrained to share a common kernel. Based on this decomposition property, SDP formulations are proposed for the multiclass case. Furthermore, it leads naturally to QCQP and SILP formulations. As the performance of RKDA depends on the regularization parameter, we show that this parameter can also be optimized in a joint framework with the kernel. Extensive experiments have been conducted and analyzed, and connections to other algorithms are discussed.
TwoStage Learning Kernel Algorithms
"... This paper examines twostage techniques for learning kernels based on a notion of alignment. It presents a number of novel theoretical, algorithmic, and empirical results for alignmentbased techniques. Our results build on previous work by Cristianini et al. (2001), but we adopt a different definit ..."
Abstract

Cited by 35 (0 self)
 Add to MetaCart
This paper examines twostage techniques for learning kernels based on a notion of alignment. It presents a number of novel theoretical, algorithmic, and empirical results for alignmentbased techniques. Our results build on previous work by Cristianini et al. (2001), but we adopt a different definition of kernel alignment and significantly extend that work in several directions: we give a novel and simple concentration bound for alignment between kernel matrices; show the existence of good predictors for kernels with high alignment, both for classification and for regression; give algorithms for learning a maximum alignment kernel by showing that the problem can be reduced to a simple QP; and report the results of extensive experiments with this alignmentbased method in classification and regression tasks, which show an improvement both over the uniform combination of kernels and over other stateoftheart learning kernel methods. 1.