## Learning non-linear combinations of kernels (2009)

Venue: NIPS

Citations: 24 (2 self)

### BibTeX

```bibtex
@INPROCEEDINGS{Cortes09learningnon-linear,
  author    = {Corinna Cortes and Mehryar Mohri and Afshin Rostamizadeh},
  title     = {Learning non-linear combinations of kernels},
  booktitle = {Advances in Neural Information Processing Systems (NIPS)},
  year      = {2009}
}
```

### Abstract

This paper studies the general problem of learning kernels based on a polynomial combination of base kernels. We analyze this problem in the case of regression and the kernel ridge regression algorithm. We examine the corresponding learning kernel optimization problem, show how that minimax problem can be reduced to a simpler minimization problem, and prove that the global solution of this problem always lies on the boundary. We give a projection-based gradient descent algorithm for solving the optimization problem, shown empirically to converge in a few iterations. Finally, we report the results of extensive experiments with this algorithm on several publicly available datasets, demonstrating the effectiveness of our technique.
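The abstract describes learning the weights of a polynomial combination of base kernels for kernel ridge regression via projection-based gradient descent. The following is a minimal sketch of that idea, not the paper's exact algorithm: it assumes a quadratic (degree-2) combination, the Hadamard square of the weighted sum of base Gram matrices, minimizes F(µ) = yᵀ(K_µ + λI)⁻¹y, and projects onto a nonnegative L2 ball. The function names, step size, and projection set are illustrative choices.

```python
import numpy as np

def combined_kernel(mu, base_kernels):
    # Quadratic combination: elementwise (Hadamard) square of sum_k mu_k K_k.
    S = sum(m * K for m, K in zip(mu, base_kernels))
    return S * S

def objective_and_grad(mu, base_kernels, y, lam):
    # F(mu) = y^T (K_mu + lam I)^{-1} y; minimizing F corresponds to the
    # reduced minimization problem described in the abstract.
    S = sum(m * K for m, K in zip(mu, base_kernels))
    K = S * S
    alpha = np.linalg.solve(K + lam * np.eye(len(y)), y)
    F = float(y @ alpha)
    # dK/dmu_k = 2 * S ∘ K_k (Hadamard product), so
    # dF/dmu_k = -alpha^T (dK/dmu_k) alpha, which is <= 0 since S ∘ K_k
    # is PSD for mu >= 0 (Schur product theorem).
    grad = np.array([-alpha @ (2.0 * S * Kk) @ alpha for Kk in base_kernels])
    return F, grad

def project(mu, radius=1.0):
    # Projection onto the nonnegative L2 ball {mu >= 0, ||mu|| <= radius}.
    mu = np.maximum(mu, 0.0)
    norm = np.linalg.norm(mu)
    return mu if norm <= radius else mu * (radius / norm)

def learn_mu(base_kernels, y, lam=0.1, eta=0.01, iters=100):
    # Projected gradient descent on the kernel weights mu.
    mu = project(np.ones(len(base_kernels)))
    for _ in range(iters):
        _, g = objective_and_grad(mu, base_kernels, y, lam)
        mu = project(mu - eta * g)
    return mu
```

Because the gradient components are never positive, the thresholding to keep µ nonnegative is inactive in practice and the iterate is driven to the boundary of the ball, consistent with the abstract's claim that the global solution lies on the boundary.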

### Citations

8980 | Statistical Learning Theory
- Vapnik
- 1998
Citation Context: ...ccess in a variety of tasks [17,19]. Classification algorithms such as support vector machines (SVMs) [6, 10], regression algorithms, e.g., kernel ridge regression and support vector regression (SVR) [16,22], and general dimensionality reduction algorithms such as kernel PCA (KPCA) [18] all benefit from kernel methods. Positive definite symmetric (PDS) kernel functions implicitly specify an inner product...

2171 | Support-vector networks
- Cortes, Vapnik
- 1995
Citation Context: ...of our technique. 1 Introduction Learning algorithms based on kernels have been used with much success in a variety of tasks [17,19]. Classification algorithms such as support vector machines (SVMs) [6, 10], regression algorithms, e.g., kernel ridge regression and support vector regression (SVR) [16,22], and general dimensionality reduction algorithms such as kernel PCA (KPCA) [18] all benefit from kern...

2028 | Learning with Kernels
- Scholkopf, Smola
- 2002
Citation Context: ...using several publicly available datasets demonstrating the effectiveness of our technique. 1 Introduction Learning algorithms based on kernels have been used with much success in a variety of tasks [17,19]. Classification algorithms such as support vector machines (SVMs) [6, 10], regression algorithms, e.g., kernel ridge regression and support vector regression (SVR) [16,22], and general dimensionality...

1291 | A training algorithm for optimal margin classifiers
- Boser, Guyon, et al.
Citation Context: ...of our technique. 1 Introduction Learning algorithms based on kernels have been used with much success in a variety of tasks [17,19]. Classification algorithms such as support vector machines (SVMs) [6, 10], regression algorithms, e.g., kernel ridge regression and support vector regression (SVR) [16,22], and general dimensionality reduction algorithms such as kernel PCA (KPCA) [18] all benefit from kern...

1048 | Nonlinear component analysis as a kernel eigenvalue problem
- Schölkopf, Smola, et al.
- 1998
Citation Context: ...or machines (SVMs) [6, 10], regression algorithms, e.g., kernel ridge regression and support vector regression (SVR) [16,22], and general dimensionality reduction algorithms such as kernel PCA (KPCA) [18] all benefit from kernel methods. Positive definite symmetric (PDS) kernel functions implicitly specify an inner product in a high-dimension Hilbert space where large-margin solutions are sought. So l...

786 | Kernel Methods for Pattern Analysis
- Shawe-Taylor, Cristianini
- 2004
Citation Context: ...using several publicly available datasets demonstrating the effectiveness of our technique. 1 Introduction Learning algorithms based on kernels have been used with much success in a variety of tasks [17,19]. Classification algorithms such as support vector machines (SVMs) [6, 10], regression algorithms, e.g., kernel ridge regression and support vector regression (SVR) [16,22], and general dimensionality...

544 | Learning the kernel matrix with semidefinite programming
- Lanckriet, Cristianini, et al.
Citation Context: ...rkernels [14] or general convex classes of kernels [2], the great majority of analyses and algorithmic results focus on learning finite linear combinations of base kernels as originally considered by [12]. However, despite the substantial progress made in the theoretical understanding and the design of efficient algorithms for the problem of learning such linear combinations of kernels, no method seem...

299 | Choosing multiple parameters for support vector machines
- Chapelle, Vapnik, et al.
- 2002
Citation Context: ...monstrate the effectiveness of the algorithm under a number of conditions. For general performance improvement, we chose a number of UCI datasets frequently used in kernel learning experiments, e.g., [7,12,15]. For learning with thousands of kernels, we chose the sentiment analysis dataset of Blitzer et al. [5]. Finally, for learning with higher-order polynomials, we selected datasets with large number of ...

221 | Large scale multiple kernel learning
- Sonnenburg, Rätsch, et al.
Citation Context: ...arch. Furthermore, as shown later, the thresholding step that forces µ′ to be positive is unnecessary since ∇F is never positive. Note that Algorithm 1 is simpler than the wrapper method proposed by [20]. Because of the closed form expression (10), we do not alternate between solving for the dual variables and performing a gradient step in the kernel parameters. We only need to optimize with respect ...

106 | Biographies, bollywood, boom-boxes and blenders: Domain adaptation for sentiment classification
- Blitzer, Dredze, et al.
Citation Context: ...t, we chose a number of UCI datasets frequently used in kernel learning experiments, e.g., [7,12,15]. For learning with thousands of kernels, we chose the sentiment analysis dataset of Blitzer et al. [5]. Finally, for learning with higher-order polynomials, we selected datasets with large number of examples such as kin-8nm from the Delve repository. The experiments were run on a 2.33 GHz Intel Xeon P...

104 | Ridge regression learning algorithm in dual variables
- Saunders, Gammerman, et al.
- 1998
Citation Context: ...ccess in a variety of tasks [17,19]. Classification algorithms such as support vector machines (SVMs) [6, 10], regression algorithms, e.g., kernel ridge regression and support vector regression (SVR) [16,22], and general dimensionality reduction algorithms such as kernel PCA (KPCA) [18] all benefit from kernel methods. Positive definite symmetric (PDS) kernel functions implicitly specify an inner product...

96 | Learning the kernel function via regularization
- Micchelli, Pontil

78 | Learning the kernel with hyperkernels
- Ong, Smola, et al.
Citation Context: ...analysis of the problem both in classification and regression [1, 8, 9, 11, 13, 15, 21]. With the exception of a few publications considering infinite-dimensional kernel families such as hyperkernels [14] or general convex classes of kernels [2], the great majority of analyses and algorithmic results focus on learning finite linear combinations of base kernels as originally considered by [12]. However...

76 | Exploring large feature spaces with hierarchical multiple kernel learning
- Bach
Citation Context: ...considered by [23]. However, here too, experimental results have not demonstrated a consistent performance improvement for the general learning task. Another method, hierarchical multiple kernel learning [3], considers learning a linear combination of an exponential number of linear kernels, which can be efficiently represented as a product of sums. Thus, this method can also be classified as learning a ...

66 | Multi-task feature and kernel selection for svms
- Jebara
- 2004
Citation Context: ...ects of this problem, including deriving efficient solutions to the optimization problems it generates and providing a better theoretical analysis of the problem both in classification and regression [1, 8, 9, 11, 13, 15, 21]. With the exception of a few publications considering infinite-dimensional kernel families such as hyperkernels [14] or general convex classes of kernels [2], the great majority of analyses and algor...

46 | More generality in efficient multiple kernel learning
- Varma, Babu
- 2009
Citation Context: ...kshop). This suggests exploring other non-linear families of kernels to obtain consistent and significant performance improvements. Non-linear combinations of kernels have been recently considered by [23]. However, here too, experimental results have not demonstrated a consistent performance improvement for the general learning task. Another method, hierarchical multiple kernel learning [3], considers lear...

39 | Learning convex combinations of continuously parameterized basic kernels
- Argyriou, Micchelli, et al.
- 2005

34 | A dc-programming algorithm for kernel selection
- ARGYRIOU, HAUSER, et al.
- 2006
Citation Context: ...ects of this problem, including deriving efficient solutions to the optimization problems it generates and providing a better theoretical analysis of the problem both in classification and regression [1, 8, 9, 11, 13, 15, 21]. With the exception of a few publications considering infinite-dimensional kernel families such as hyperkernels [14] or general convex classes of kernels [2], the great majority of analyses and algor...

28 | Learning bounds for support vector machines with learned kernels
- Srebro, Ben-David
- 2006
Citation Context: ...ects of this problem, including deriving efficient solutions to the optimization problems it generates and providing a better theoretical analysis of the problem both in classification and regression [1, 8, 9, 11, 13, 15, 21]. With the exception of a few publications considering infinite-dimensional kernel families such as hyperkernels [14] or general convex classes of kernels [2], the great majority of analyses and algor...

20 | L2 regularization for learning kernels
- Cortes, Mohri, et al.
- 2009

5 | Learning sequence kernels
- Cortes, Mohri, et al.