## Multi-Task Feature and Kernel Selection for SVMs (2004)


### Download Links

- [www.aicml.cs.ualberta.ca]
- [www1.cs.columbia.edu]
- [www.cs.columbia.edu]
- DBLP

### Other Repositories/Bibliography

Venue: Proc. of ICML 2004

Citations: 68 (3 self)

### BibTeX

```
@INPROCEEDINGS{Jebara04multi-taskfeature,
  author    = {Tony Jebara},
  title     = {Multi-Task Feature and Kernel Selection for SVMs},
  booktitle = {Proc. of ICML 2004},
  year      = {2004}
}
```


### Abstract

We compute a common feature selection or kernel selection configuration for multiple support vector machines (SVMs) trained on different yet inter-related datasets. The method is advantageous when multiple classification tasks and differently labeled datasets exist over a common input space. Different datasets can mutually reinforce a common choice of representation or relevant features for their various classifiers. We derive a multi-task representation learning approach using the maximum entropy discrimination formalism. The resulting convex algorithms maintain the global solution properties of support vector machines. However, in addition to multiple SVM classification/regression parameters they also jointly estimate an optimal subset of features or optimal combination of kernels. Experiments are shown on standardized datasets.
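The abstract's core idea, a single kernel choice shared across related tasks, can be illustrated with a much simpler baseline than the paper's MED algorithm: pick one convex combination of base kernels and score it by summed cross-validated accuracy over all tasks. The sketch below is an assumption-laden toy (synthetic tasks, a linear/RBF kernel pair, grid search over the mixing weight), not the paper's method.

```python
# Toy sketch: one convex kernel combination shared by all tasks,
# scored by summed cross-validation accuracy. Not the MED approach.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score
from sklearn.metrics.pairwise import rbf_kernel, linear_kernel

rng = np.random.default_rng(0)

def make_task(n=80, d=10, informative=3):
    # Related tasks: only the first `informative` features carry signal.
    X = rng.normal(size=(n, d))
    w = np.zeros(d)
    w[:informative] = 1.0
    y = np.sign(X @ w + 0.1 * rng.normal(size=n)).astype(int)
    return X, y

tasks = [make_task() for _ in range(3)]

def combined_kernel(X, alpha):
    # Convex combination of a linear and an RBF base kernel (still PSD).
    return alpha * linear_kernel(X, X) + (1.0 - alpha) * rbf_kernel(X, X)

best_alpha, best_score = None, -np.inf
for alpha in np.linspace(0.0, 1.0, 11):      # candidate shared mixing weights
    score = 0.0
    for X, y in tasks:                        # one SVM per task, common kernel
        K = combined_kernel(X, alpha)
        clf = SVC(kernel="precomputed", C=1.0)
        score += cross_val_score(clf, K, y, cv=3).mean()
    if score > best_score:
        best_alpha, best_score = alpha, score

print("shared alpha:", best_alpha, "mean accuracy:", best_score / len(tasks))
```

Because the tasks share informative features, the jointly chosen weight tends to work well for every task, which is the mutual-reinforcement effect the abstract describes; the paper instead optimizes the shared representation and all SVMs jointly in one convex problem.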

### Citations

583 | Learning the kernel matrix with semidefinite programming - Lanckriet, Cristianini, et al.

500 | Multitask learning
- Caruana
- 1997

Citation Context: "... Multi-task learning or meta-learning leverages these many datasets synergistically, aggregating them and augmenting the effective size of the total training data (Baxter, 1995; Thrun & Pratt, 1997; Caruana, 1997). This can lead to improvement in overall classification and regression performance compared to learning the tasks in isolation. We elaborate metalearning in a support vector machine setting and focus..."

151 | A model of inductive bias learning
- Baxter

Citation Context: "...classification or regression machine by using the maximum entropy discrimination (MED) framework. While previous efforts suggest finding these representations for a single task, recent theoretical results (Baxter, 2000; Ben-David & Schuller, 2003) suggest that improvements are possible with multi-task learning. This article combines the above motivations into a joint multi-task feature and kernel selection SVM fram..."

106 | Learning to Learn - Thrun, Pratt - 1998

92 | Exploiting task relatedness for multiple task learning
- Ben-David, Schuller

Citation Context: "...gression machine by using the maximum entropy discrimination (MED) framework. While previous efforts suggest finding these representations for a single task, recent theoretical results (Baxter, 2000; Ben-David & Schuller, 2003) suggest that improvements are possible with multi-task learning. This article combines the above motivations into a joint multi-task feature and kernel selection SVM framework. This paper is organiz..."

91 | Learning internal representations
- Baxter
- 1995

Citation Context: "...ssification or regression scenarios). Multi-task learning or meta-learning leverages these many datasets synergistically, aggregating them and augmenting the effective size of the total training data (Baxter, 1995; Thrun & Pratt, 1997; Caruana, 1997). This can lead to improvement in overall classification and regression performance compared to learning the tasks in isolation. We elaborate metalearning in a supp..."

90 | Model selection for Support Vector Machines
- Chapelle, Vapnik
- 2000

Citation Context: "...g SVM will often hinge on the appropriate choice of the kernel. However, searching for different kernels either via trial-and-error or other exhaustive means can be a computationally daunting problem (Chapelle & Vapnik, 1999). This search is particularly difficult if we also want to consider combining kernels in convex combinations (a continuous optimization problem) to mix various nonlinear mappings in search of the opt..."

48 | Feature Selection and Dualities in Maximum Entropy Discrimination
- Jebara, Jaakkola
- 2000

Citation Context: "...ppearing in Proceedings of the 21st International Conference on Machine Learning, Banff, Canada, 2004. Copyright 2004 by the first author. representation either as a feature selection configuration (Jebara & Jaakkola, 2000; Weston et al., 2000) or as a convex kernel combination (Lanckriet et al., 2002; Cristianini et al., 2001). This is done jointly while estimating a support vector classification or regression machine..."

8 | Regularized multi-task learning
- Evgeniou, Pontil
- 2004

Citation Context: "...we start with a factorized prior over tasks yet will eventually find a posterior that need not remain factorized afterwards (starting with non-factorized priors may be possible as in related work by (Evgeniou & Pontil, 2004)). We again utilize zero-mean white Gaussian priors for models, Bernoulli priors for switches, zero-mean Gaussian priors for biases and exponential priors for margins. We can now readily evaluate the ..."

6 | On kernel-target alignment. NIPS
- Cristianini, Shawe-Taylor, et al.
- 2001

Citation Context: "...opyright 2004 by the first author. representation either as a feature selection configuration (Jebara & Jaakkola, 2000; Weston et al., 2000) or as a convex kernel combination (Lanckriet et al., 2002; Cristianini et al., 2001). This is done jointly while estimating a support vector classification or regression machine by using the maximum entropy discrimination (MED) framework. While previous efforts suggest finding these..."

3 | Maximum entropy discrimination. NIPS
- Jaakkola, Meila, et al.
- 1999

Citation Context: "...rleaved within each section when appropriate. Section 7 ends with a discussion. 2. Support Vector Machines as a Maximum Entropy Problem The maximum entropy discrimination (MED) formalism introduced in (Jaakkola et al., 1999) is a flexible generalization of support vector machines. MED produces a solution that is a distribution over parameter models P(Θ) rather than finding a single parameter setting Θ*. It is this chara..."
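The excerpt above can be made concrete with a standard statement of the MED problem (notation is the usual one from the MED literature, not copied from this page): rather than a single parameter vector, MED finds the distribution closest to a prior that satisfies expected-margin constraints on every training example.

```latex
\min_{P(\Theta,\gamma)} \; \mathrm{KL}\!\left( P(\Theta,\gamma) \,\|\, P_0(\Theta,\gamma) \right)
\quad \text{s.t.} \quad
\int P(\Theta,\gamma)\,\big[ y_t\, \mathcal{L}(X_t \mid \Theta) - \gamma_t \big] \, d\Theta\, d\gamma \;\ge\; 0,
\qquad t = 1,\dots,T,
```

where $P_0$ is the prior, $\mathcal{L}(X_t \mid \Theta)$ is the discriminant function evaluated on example $X_t$ with label $y_t \in \{-1,+1\}$, and $\gamma_t$ are margin variables with their own prior. With Gaussian priors over linear models and suitable margin priors this recovers the SVM solution, which is why MED is described as a flexible generalization of SVMs.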
