## More Generality in Efficient Multiple Kernel Learning

Citations: 46 (2 self)

### BibTeX

```bibtex
@MISC{Varma_moregenerality,
  author = {Manik Varma and Bodla Rakesh Babu},
  title  = {More Generality in Efficient Multiple Kernel Learning},
  year   = {}
}
```

### Abstract

Recent advances in Multiple Kernel Learning (MKL) have positioned it as an attractive tool for tackling many supervised learning tasks. The development of efficient gradient-descent-based optimization schemes has made it possible to tackle large-scale problems. Simultaneously, MKL-based algorithms have achieved very good results on challenging real-world applications. Yet, despite their successes, MKL approaches are limited in that they focus on learning a linear combination of given base kernels. In this paper, we observe that existing MKL formulations can be extended to learn general kernel combinations subject to general regularization. This can be achieved while retaining all the efficiency of existing large-scale optimization algorithms. To highlight the advantages of generalized kernel learning, we tackle feature selection problems on benchmark vision and UCI databases. It is demonstrated that the proposed formulation can lead to better results not only as compared to traditional MKL but also as compared to state-of-the-art wrapper and filter methods for feature selection.
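The generalized kernel learning the abstract describes can be sketched on a toy problem: per-feature RBF base kernels combined by a product (rather than a sum), with an l1 regularizer on the weights so that uninformative features are switched off. This is only an illustrative sketch, not the paper's algorithm: the inner problem below is a simple ridge-style data-fit term instead of an SVM, and all data, names, and constants (`lam`, `sigma`, the step sizes) are made-up assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 2))
y = np.sign(X[:, 0])                       # only feature 0 is informative

# One squared-distance matrix per feature: the base kernels are per-feature
# RBFs, so a product combination does implicit feature selection via d.
D = [(X[:, [j]] - X[:, [j]].T) ** 2 for j in range(X.shape[1])]

lam, sigma = 1e-2, 1e-2                    # ridge and l1 strengths (assumed)
I = np.eye(len(y))

def objective(d):
    # Product combination: K = prod_j exp(-d_j * dist_j).
    K = np.exp(-sum(dj * Dj for dj, Dj in zip(d, D)))
    alpha = np.linalg.solve(K + lam * I, y)
    # Data-fit term y^T (K + lam I)^{-1} y plus l1 penalty on the weights.
    return y @ alpha + sigma * d.sum(), K, alpha

def gradient(d, K, alpha):
    # Danskin-style gradient, evaluated at the inner optimum alpha:
    # dW/dd_j = -alpha^T (dK/dd_j) alpha + sigma, with dK/dd_j = -D_j * K.
    return np.array([alpha @ ((Dj * K) @ alpha) for Dj in D]) + sigma

d = np.ones(len(D))
W0, K, alpha = objective(d)
W = W0
for _ in range(50):                        # projected gradient descent on d
    g = gradient(d, K, alpha)
    step = 0.1
    while step > 1e-8:                     # backtrack so W never increases
        d_try = np.maximum(d - step * g, 0.0)
        W_try, K_try, a_try = objective(d_try)
        if W_try <= W:
            d, W, K, alpha = d_try, W_try, K_try, a_try
            break
        step /= 2
```

The structure mirrors the two-stage scheme the paper builds on: for fixed weights `d`, solve the inner problem in closed form; then take a projected gradient step on `d` using the gradient at the fixed inner optimum.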

### Citations

544 | Learning the kernel matrix with semidefinite programming
- Lanckriet, Cristianini, et al.

299 | Choosing multiple parameters for support vector machines
- Chapelle, Vapnik, et al.
- 2002
Citation Context: ...erve that it is fairly straightforward to extend traditional MKL formulations to handle generic kernel combinations. Furthermore, the gradient descent optimization developed and used in (Bach, 2008; Chapelle et al., 2002; Rakotomamonjy et al., 2008; Varma & Ray, 2007) can still be applied out of the box. It is therefore possible to learn rich feature representations without having to sacrifice any of the advantages o...

277 | Multiple kernel learning, conic duality, and the SMO algorithm
- Bach, Lanckriet, et al.
- 2004
Citation Context: ...n be learnt as a linear combination of given base kernels. Many MKL formulations have been proposed in the literature. In (Sonnenburg et al., 2006), it was shown that the MKL Block l1 formulation of (Bach et al., 2004) could be expressed as a Semi-infinite Linear Program. Column generation methods and existing SVM solvers could then be used for efficient optimization and to tackle large scale problems involving as...
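For reference, the linear combination these contexts refer to can be written as follows (standard MKL notation, assumed rather than taken from this page):

```latex
% Linear MKL: the learnt kernel is a convex combination of M base kernels
k(\mathbf{x}_i, \mathbf{x}_j) = \sum_{m=1}^{M} d_m \, k_m(\mathbf{x}_i, \mathbf{x}_j),
\qquad d_m \ge 0, \qquad \sum_{m=1}^{M} d_m = 1 .
```

The SILP view of (Sonnenburg et al., 2006) alternates between solving a standard SVM for fixed d and a linear program that updates d, which is what lets existing SVM solvers be reused at scale.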

238 | On kernel target alignment
- Cristianini, Kandola, et al.
- 2001
Citation Context: ...selection methods in Section 4 and present comparative results to them in Section 5. We conclude in Section 6. 2. Related Work Some of the earliest work on MKL was developed in (Crammer et al., 2002; Cristianini et al., 2001). Their focus was on optimizing loss functions such as kernel target alignment rather than the specific classification or regression problem at hand. This was addressed in the influential work of (La...

224 | Large scale multiple kernel learning
- Sonnenburg, Rätsch, et al.
- 2006
Citation Context: ...the kernel from training data. In particular, it focuses on how the kernel can be learnt as a linear combination of given base kernels. Many MKL formulations have been proposed in the literature. In (Sonnenburg et al., 2006), it was shown that the MKL Block l1 formulation of (Bach et al., 2004) could be expressed as a Semi-infinite Linear Program. Column generation methods and existing SVM solvers could then be used for...

148 | Learning the Discriminative Power-Invariance Trade-Off
- Varma, Ray
- 2007
Citation Context: ...raditional MKL formulations to handle generic kernel combinations. Furthermore, the gradient descent optimization developed and used in (Bach, 2008; Chapelle et al., 2002; Rakotomamonjy et al., 2008; Varma & Ray, 2007) can still be applied out of the box. It is therefore possible to learn rich feature representations without having to sacrifice any of the advantages of a well developed, large scale optimization to...

124 | Scalable training of L1-regularized log-linear models
- Andrew, Gao
- 2007
Citation Context: ...es. Stated in another way, our formulation is capable of reaching the same classification accuracy as MKL with only a sixth of the features. We also present comparative results with AdaBoost, OWL-QN (Andrew & Gao, 2007), LP-SVM (Fung & Mangasarian, 2002), Sparse SVM (Chan et al., 2007) and BAHSIC (Song et al., 2007). The rest of the paper is organized as follows: In Section 2 we review the development of MKL from i...

77 | Exploring Large Feature Spaces with Hierarchical Multiple Kernel
- Bach
Citation Context: ...ation, MKL approaches do not consider the fundamental question of what are appropriate feature representations for a given task. Some attempts have been made at addressing this issue. Most recently, (Bach, 2008) develops an innovative way of learning linear combinations of an exponential number of kernels of a certain type. The method can therefore be lev...

67 | Learning gender with support faces
- Moghaddam, Yang
- 2002
Citation Context: ...learning on various feature selection problems. We investigate gender identification from frontal facial images using as few pixels as possible. It is demonstrated that, on the benchmark database of (Moghaddam & Yang, 2002), GMKL can outperform MKL by as much as 10% for a given number of features. Similarly, on the... [Table 1: Gender identification results. The final r...]

59 | Kernel Design Using Boosting
- Crammer, Keshet, et al.
- 2002
Citation Context: ... review other feature selection methods in Section 4 and present comparative results to them in Section 5. We conclude in Section 6. 2. Related Work Some of the earliest work on MKL was developed in (Crammer et al., 2002; Cristianini et al., 2001). Their focus was on optimizing loss functions such as kernel target alignment rather than the specific classification or regression problem at hand. This was addressed in t...

50 | A feature selection Newton method for support vector machine classification
- Fung, Mangasarian
- 2004
Citation Context: ...r formulation is capable of reaching the same classification accuracy as MKL with only a sixth of the features. We also present comparative results with AdaBoost, OWL-QN (Andrew & Gao, 2007), LP-SVM (Fung & Mangasarian, 2002), Sparse SVM (Chan et al., 2007) and BAHSIC (Song et al., 2007). The rest of the paper is organized as follows: In Section 2 we review the development of MKL from initial work to current state-of-the...

39 | Learning convex combinations of continuously parameterized basic kernels
- Argyriou, Micchelli, et al.
- 2005
Citation Context: ... & Ray, 2007) via gradient descent optimization and (Bach, 2008) opened up the possibility of training on an exponentially large number of kernels. Other interesting approaches have been proposed in (Argyriou et al., 2005; Ong et al., 2005; Zien & Ong, 2007) and include Hyperkernels and multi-class MKL. Note that these methods essentially learn linear combinations of base kernels subject to l1, or sometimes l2 (Cristi...

34 | Boosting sex identification performance
- Baluja, Rowley
- 2007
Citation Context: ...kernel weights could not influence the pre-learnt weak classifiers. Of course, in traditional boosting, the weak classifiers and the weights are learnt together and we present comparative results to (Baluja & Rowley, 2007), which represents a state-of-the-art boosting method for gender identification. OWL-QN (Andrew & Gao, 2007): This is a large-scale implementation of l1 logistic regression. The method learns a functi...

33 | Supervised Feature Selection via Dependence Estimation
- Song, Smola, et al.
- 2007
Citation Context: ...as MKL with only a sixth of the features. We also present comparative results with AdaBoost, OWL-QN (Andrew & Gao, 2007), LP-SVM (Fung & Mangasarian, 2002), Sparse SVM (Chan et al., 2007) and BAHSIC (Song et al., 2007). The rest of the paper is organized as follows: In Section 2 we review the development of MKL from initial work to current state-of-the-art. We then present our formulation in Section 3. The formula...

29 | The theory of Max-Min and its application to weapons allocation problems
- Danskin
- 1967
Citation Context: ...refore, T(d) = W(d) for any given value of d, and it is sufficient for us to show that W is differentiable and calculate ∇_d W. Proof of the differentiability of W_C and W_R comes from Danskin's Theorem (Danskin, 1967). Since the feasible set is compact, the gradient can be shown to exist if k, r, ∇_d k and ∇_d r are smoothly varying functions of d and if α*, the value of α that optimizes W, is unique. Furthermore, ...
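The role Danskin's theorem plays here can be made concrete with the standard MKL gradient computation, sketched in assumed notation (Y = diag(y), A the dual-feasible set; these symbols are not taken from this page):

```latex
% Inner SVM dual value as a function of the kernel parameters d
W(d) = \max_{\alpha \in \mathcal{A}} \;
       \mathbf{1}^{\top}\alpha
       - \tfrac{1}{2}\,\alpha^{\top} Y\, K(d)\, Y \alpha

% Danskin: if the maximizer alpha* is unique, W is differentiable with
\frac{\partial W}{\partial d_m}
  = -\tfrac{1}{2}\,{\alpha^*}^{\top} Y\,
    \frac{\partial K}{\partial d_m}\, Y \alpha^*
```

That is, the gradient is obtained by differentiating the objective at the fixed inner optimum α*, without having to differentiate α* itself, which is what makes the gradient descent schemes cited above cheap to run.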

17 | Column-generation boosting methods for mixture of kernels
- Bi, Zhang, et al.
Citation Context: ...od approach in Gaussian Processes. 4. Feature Selection Methods We compare our formulation to traditional MKL as well as the following feature selection methods. Boosting: The LPBoost formulation of (Bi et al., 2004) is similar to that of standard MKL, and boosting generalizes standard MKL's decision function. Boosting can therefore be used to learn linear combinations of base kernels. Individual "weak classifier...

16 | Direct convex relaxations of sparse SVM
- Chan, Vasconcelos, et al.
Citation Context: ...e same classification accuracy as MKL with only a sixth of the features. We also present comparative results with AdaBoost, OWL-QN (Andrew & Gao, 2007), LP-SVM (Fung & Mangasarian, 2002), Sparse SVM (Chan et al., 2007) and BAHSIC (Song et al., 2007). The rest of the paper is organized as follows: In Section 2 we review the development of MKL from initial work to current state-of-the-art. We then present our formul...
