## Multiple kernel learning, conic duality, and the SMO algorithm (2004)

Venue: Proceedings of the 21st International Conference on Machine Learning (ICML 2004)

Citations: 277 (29 self)

### BibTeX

@INPROCEEDINGS{Bach04multiplekernel,
  author = {Francis R. Bach and Gert R. G. Lanckriet},
  title = {Multiple kernel learning, conic duality, and the SMO algorithm},
  booktitle = {Proceedings of the 21st International Conference on Machine Learning (ICML)},
  year = {2004}
}

### Abstract

While classical kernel-based classifiers are based on a single kernel, in practice it is often desirable to base classifiers on combinations of multiple kernels. Lanckriet et al. (2004) considered conic combinations of kernel matrices for the support vector machine (SVM), and showed that the optimization of the coefficients of such a combination reduces to a convex optimization problem known as a quadratically-constrained quadratic program (QCQP). Unfortunately, current convex optimization toolboxes can solve this problem only for a small number of kernels and a small number of data points; moreover, the sequential minimal optimization (SMO) techniques that are essential in large-scale implementations of the SVM cannot be applied because the cost function is non-differentiable. We propose a novel dual formulation of the QCQP as a second-order cone programming problem, and show how to exploit the technique of Moreau-Yosida regularization to yield a formulation to which SMO techniques can be applied. We present experimental results that show that our SMO-based algorithm is significantly more efficient than the general-purpose interior point methods available in current optimization toolboxes.
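As an illustration of the conic kernel combinations the abstract refers to, here is a minimal numpy sketch (the toy data and weights are invented for the example): a nonnegative combination of positive semidefinite kernel matrices is itself positive semidefinite, which is what makes the combined matrix a valid SVM kernel.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))          # toy data: 20 points, 5 features

# Two base kernels: linear and Gaussian (RBF).
K_lin = X @ X.T
sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
K_rbf = np.exp(-sq / 2.0)

# Conic (nonnegative) combination: K = sum_j d_j K_j with d_j >= 0.
d = np.array([0.7, 0.3])
K = d[0] * K_lin + d[1] * K_rbf

# A conic combination of PSD matrices stays PSD, so K is a valid kernel.
eigs = np.linalg.eigvalsh(K)
print(eigs.min() >= -1e-8)  # True
```

The multiple-kernel-learning problem the paper addresses is the joint optimization of the weights `d` and the SVM dual variables, rather than fixing `d` by hand as done here.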

### Citations

745 | Nonlinear Programming
- Bertsekas
- 2004

Citation Context: ...3.1). Unfortunately, as is well known in the non-smooth optimization literature, this means that simple local descent algorithms such as SMO may fail to converge or may converge to incorrect values (Bertsekas, 1995). Indeed, in preliminary attempts to solve the QCQP using SMO we ran into exactly these convergence problems. One class of solutions to non-smooth optimization problems involves constructing a smooth...

545 | Learning the kernel matrix with semidefinite programming
- Lanckriet, Cristianini, et al.

Citation Context: ...“multiple kernel learning” problem can in principle be solved via cross-validation, several recent papers have focused on more efficient methods for kernel learning (Chapelle et al., 2002; Grandvalet & Canu, 2003; Lanckriet et al., 2004; Ong et al., 2003). In this paper we focus on the framework proposed by Lanckriet et al. (2004), which involves joint optimization of the coefficients in a conic combination of kernel matrices and th...

468 | Making large-scale support vector machine learning practical
- Joachims
- 1999

Citation Context: ...do not suffice in large-scale applications of the SVM, and a second major reason for the rise to prominence of the SVM is the development of special-purpose algorithms for solving the QP (Platt, 1998; Joachims, 1998; Keerthi et al., 2001). Recent developments in the literature on the SVM and other kernel methods have emphasized the need to consider multiple kernels, or parameterizations of kernels, and not a sin...

323 | Algorithms for Minimization Without Derivatives
- Brent
- 1973

Citation Context: ...performed in closed form for the MY-regularized SKM. However, since each line search is the minimization of a convex function, we can use efficient one-dimensional root finding, such as Brent’s method (Brent, 1973). 4.3. Theoretical bounds In order to be able to check efficiently the approximate optimality condition (OPT3) of Section 3.3, we need estimates for α and η from the solution of the MY-regularized SK...
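The context above notes that each line search is the minimization of a one-dimensional convex function, for which Brent-style methods are efficient. As a hedged sketch, golden-section search below stands in for Brent's more elaborate bracketing, and the objective is an invented convex function, not the paper's line-search objective:

```python
# Golden-section search: a simple stand-in for Brent's method when
# minimizing a one-dimensional convex (unimodal) function on [a, b].
def golden_section(f, a, b, tol=1e-8):
    phi = (5 ** 0.5 - 1) / 2          # inverse golden ratio, ~0.618
    c, d = b - phi * (b - a), a + phi * (b - a)
    while b - a > tol:
        if f(c) < f(d):
            b, d = d, c
            c = b - phi * (b - a)
        else:
            a, c = c, d
            d = a + phi * (b - a)
    return (a + b) / 2

# Illustrative convex, non-smooth objective; its minimizer is t = 0.8.
f = lambda t: (t - 1.3) ** 2 + abs(t)
t_star = golden_section(f, -5.0, 5.0)
print(round(t_star, 4))  # 0.8
```

Brent's actual method mixes golden-section steps with inverse quadratic interpolation for faster convergence on smooth stretches; the shrink-the-bracket structure is the same.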

299 | Choosing multiple parameters for support vector machines
- Chapelle, Vapnik, et al.
- 2002

Citation Context: ...data sources. While this so-called “multiple kernel learning” problem can in principle be solved via cross-validation, several recent papers have focused on more efficient methods for kernel learning (Chapelle et al., 2002; Grandvalet & Canu, 2003; Lanckriet et al., 2004; Ong et al., 2003). In this paper we focus on the framework proposed by Lanckriet et al. (2004), which involves joint optimization of the coefficients...

191 | Improvements to Platt’s SMO algorithm for SVM classifier design, Neural Computation 13
- Keerthi, Shevade, et al.
- 2001

Citation Context: ...large-scale applications of the SVM, and a second major reason for the rise to prominence of the SVM is the development of special-purpose algorithms for solving the QP (Platt, 1998; Joachims, 1998; Keerthi et al., 2001). Recent developments in the literature on the SVM and other kernel methods have emphasized the need to consider multiple kernels, or parameterizations of kernels, and not a single fixed kernel. This...

189 | Feature selection via concave minimization and support vector machines
- Bradley, Mangasarian
- 1998

152 | Applications of second-order cone programming
- Lobo, Vandenberghe, et al.
- 1998

Citation Context: ...duality and optimality conditions. For a given optimization problem there are many ways of deriving a dual problem. In our particular case, we treat problem (P) as a second-order cone program (SOCP) (Lobo et al., 1998), which yields the following dual (see Appendix A for the derivation): min ½γ² − α⊤e (D), w.r.t. γ ∈ ℝ, α ∈ ℝⁿ, s.t. 0 ≤ α ≤ C, α⊤y = 0, ‖∑ᵢ αᵢyᵢxⱼᵢ‖₂ ≤ dⱼγ, ∀j ∈ {1,...,m}. In addition, the...
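The dual (D) quoted above can be sanity-checked numerically. The sketch below uses invented toy data (not the paper's experiments): for any dual-feasible α, the smallest γ satisfying all m second-order cone constraints is the largest ratio ‖∑ᵢ αᵢyᵢxⱼᵢ‖₂ / dⱼ over blocks j, and the dual objective is ½γ² − α⊤e.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 30, 3                          # n points, m feature blocks (one per kernel)
X = [rng.normal(size=(n, 4)) for _ in range(m)]   # X[j][i] plays the role of x_{ji}
y = np.concatenate([np.ones(15), -np.ones(15)])   # balanced labels
d = np.array([1.0, 1.0, 1.0])         # weights d_j in the cone constraints
C = 1.0

# A dual-feasible alpha: 0 <= alpha <= C, and alpha @ y = 0 since classes balance.
alpha = np.full(n, 0.5)

# Each constraint reads ||sum_i alpha_i y_i x_{ji}||_2 <= d_j * gamma, so the
# smallest feasible gamma is the largest norm-to-weight ratio over blocks j.
norms = np.array([np.linalg.norm(Xj.T @ (alpha * y)) for Xj in X])
gamma = (norms / d).max()

dual_obj = 0.5 * gamma ** 2 - alpha.sum()   # (1/2) gamma^2 - alpha^T e
print((norms <= d * gamma + 1e-12).all())   # True: all cone constraints hold
```

At the optimum of (D), γ is tight in at least one block; kernels whose constraint is slack receive zero weight in the primal conic combination.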

64 | The MOSEK Interior Point Optimizer for Linear Programming: An Implementation of the Homogeneous Algorithm
- Andersen, Andersen
- 2000

Citation Context: ...problem—a quadratically-constrained quadratic program (QCQP). This problem is more challenging than a QP, but it can also be solved in principle by general-purpose optimization toolboxes such as Mosek (Andersen & Andersen, 2000). Again, however, this existing algorithmic solution suffices only for small problems (small numbers of kernels and data points), and improved algorithmic solutions akin to sequential minimization op...

49 | Practical aspects of the Moreau-Yosida regularization: Theoretical preliminaries
- Lemarechal, Sagastizábal, et al.
- 1997

Citation Context: ...constructing a smooth approximate problem out of a non-smooth problem. In particular, Moreau-Yosida (MY) regularization is an effective general solution methodology that is based on inf-convolution (Lemarechal & Sagastizabal, 1997). It can be viewed in terms of the dual problem as simply adding a quadratic regularization term to the dual objective function. Unfortunately, in our setting, this creates a new difficulty—we lose t...
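The inf-convolution view of Moreau-Yosida regularization mentioned above can be made concrete with the textbook example: the MY envelope of the non-smooth f(y) = |y| is the smooth Huber function. A brute-force sketch (grid minimization, illustrative only, not the paper's construction):

```python
import numpy as np

# Moreau-Yosida regularization smooths f by inf-convolution with a quadratic:
#   f_mu(x) = min_y  f(y) + (1 / (2 * mu)) * (x - y)**2
# Classic example: for f(y) = |y| the envelope is the Huber function.
mu = 0.5
f = np.abs

def moreau_envelope(x, grid):
    # Brute-force inf-convolution over a dense grid of candidate y values.
    return min(f(y) + (x - y) ** 2 / (2 * mu) for y in grid)

def huber(x):
    # Closed-form envelope of |.|: quadratic near 0, linear (shifted) outside.
    return x * x / (2 * mu) if abs(x) <= mu else abs(x) - mu / 2

grid = np.linspace(-3, 3, 60001)
for x in (-2.0, -0.3, 0.0, 0.4, 1.5):
    assert abs(moreau_envelope(x, grid) - huber(x)) < 1e-4
print("Moreau envelope of |x| matches the Huber function")
```

The difficulty the context alludes to is that, unlike this toy case, applying MY regularization to the paper's dual changes which sparsity structure survives, which the authors then have to recover.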

4 | Adaptive Scaling for Feature Selection in SVMs. Neural Information Processing Systems
- Grandvalet, Canu
- 2002

Citation Context: ...so-called “multiple kernel learning” problem can in principle be solved via cross-validation, several recent papers have focused on more efficient methods for kernel learning (Chapelle et al., 2002; Grandvalet & Canu, 2003; Lanckriet et al., 2004; Ong et al., 2003). In this paper we focus on the framework proposed by Lanckriet et al. (2004), which involves joint optimization of the coefficients in a conic combination o...

2 | Hyperkernels. Neural Information Processing Systems
- Ong, Smola
- 2003

Citation Context: ...in principle be solved via cross-validation, several recent papers have focused on more efficient methods for kernel learning (Chapelle et al., 2002; Grandvalet & Canu, 2003; Lanckriet et al., 2004; Ong et al., 2003). In this paper we focus on the framework proposed by Lanckriet et al. (2004), which involves joint optimization of the coefficients in a conic combination of kernel matrices and the coefficients of...