Results 1  10
of
109
Large scale multiple kernel learning
 JOURNAL OF MACHINE LEARNING RESEARCH
, 2006
"... While classical kernelbased learning algorithms are based on a single kernel, in practice it is often desirable to use multiple kernels. Lanckriet et al. (2004) considered conic combinations of kernel matrices for classification, leading to a convex quadratically constrained quadratic program. We s ..."
Abstract

Cited by 222 (18 self)
 Add to MetaCart
While classical kernelbased learning algorithms are based on a single kernel, in practice it is often desirable to use multiple kernels. Lanckriet et al. (2004) considered conic combinations of kernel matrices for classification, leading to a convex quadratically constrained quadratic program. We show that it can be rewritten as a semiinfinite linear program that can be efficiently solved by recycling the standard SVM implementations. Moreover, we generalize the formulation and our method to a larger class of problems, including regression and oneclass classification. Experimental results show that the proposed algorithm works for hundred thousands of examples or hundreds of kernels to be combined, and helps for automatic model selection, improving the interpretability of the learning result. In a second part we discuss general speed up mechanism for SVMs, especially when used with sparse feature maps as appear for string kernels, allowing us to train a string kernel SVM on a 10 million realworld splice data set from computational biology. We integrated multiple kernel learning in our machine learning toolbox SHOGUN for which the source code is publicly available at
Consistency of the group lasso and multiple kernel learning
 JOURNAL OF MACHINE LEARNING RESEARCH
, 2007
"... We consider the leastsquare regression problem with regularization by a block 1norm, i.e., a sum of Euclidean norms over spaces of dimensions larger than one. This problem, referred to as the group Lasso, extends the usual regularization by the 1norm where all spaces have dimension one, where it ..."
Abstract

Cited by 162 (28 self)
 Add to MetaCart
We consider the leastsquare regression problem with regularization by a block 1norm, i.e., a sum of Euclidean norms over spaces of dimensions larger than one. This problem, referred to as the group Lasso, extends the usual regularization by the 1norm where all spaces have dimension one, where it is commonly referred to as the Lasso. In this paper, we study the asymptotic model consistency of the group Lasso. We derive necessary and sufficient conditions for the consistency of group Lasso under practical assumptions, such as model misspecification. When the linear predictors and Euclidean norms are replaced by functions and reproducing kernel Hilbert norms, the problem is usually referred to as multiple kernel learning and is commonly used for learning from heterogeneous data sources and for non linear variable selection. Using tools from functional analysis, and in particular covariance operators, we extend the consistency results to this infinite dimensional case and also propose an adaptive scheme to obtain a consistent model estimate, even when the necessary condition required for the non adaptive scheme is not satisfied.
More efficiency in multiple kernel learning
 In ICML
, 2007
"... An efficient and general multiple kernel learning (MKL) algorithm has been recently proposed by Sonnenburg et al. (2006). This approach has opened new perspectives since it makes the MKL approach tractable for largescale problems, by iteratively using existing support vector machine code. However, i ..."
Abstract

Cited by 57 (6 self)
 Add to MetaCart
An efficient and general multiple kernel learning (MKL) algorithm has been recently proposed by Sonnenburg et al. (2006). This approach has opened new perspectives since it makes the MKL approach tractable for largescale problems, by iteratively using existing support vector machine code. However, it turns out that this iterative algorithm needs several iterations before converging towards a reasonable solution. In this paper, we address the MKL problem through an adaptive 2norm regularization formulation. Weights on each kernel matrix are included in the standard SVM empirical risk minimization problem with a ℓ1 constraint to encourage sparsity. We propose an algorithm for solving this problem and provide an new insight on MKL algorithms based on block 1norm regularization by showing that the two approaches are equivalent. Experimental results show that the resulting algorithm converges rapidly and its efficiency compares favorably to other MKL algorithms. 1.
Multiclass multiple kernel learning
 In ICML. ACM
"... In many applications it is desirable to learn from several kernels. “Multiple kernel learning” (MKL) allows the practitioner to optimize over linear combinations of kernels. By enforcing sparse coefficients, it also generalizes feature selection to kernel selection. We propose MKL for joint feature ..."
Abstract

Cited by 44 (3 self)
 Add to MetaCart
In many applications it is desirable to learn from several kernels. “Multiple kernel learning” (MKL) allows the practitioner to optimize over linear combinations of kernels. By enforcing sparse coefficients, it also generalizes feature selection to kernel selection. We propose MKL for joint feature maps. This provides a convenient and principled way for MKL with multiclass problems. In addition, we can exploit the joint feature map to learn kernels on output spaces. We show the equivalence of several different primal formulations including different regularizers. We present several optimization methods, and compare a convex quadratically constrained quadratic program (QCQP) and two semiinfinite linear programs (SILPs) on toy data, showing that the SILPs are faster than the QCQP. We then demonstrate the utility of our method by applying the SILP to three real world datasets. 1.
Efficient Margin Maximizing with Boosting
, 2002
"... AdaBoost produces a linear combination of base hypotheses and predicts with the sign of this linear combination. It has been observed that the generalization error of the algorithm continues to improve even after all examples are classified correctly by the current signed linear combination, whic ..."
Abstract

Cited by 35 (7 self)
 Add to MetaCart
AdaBoost produces a linear combination of base hypotheses and predicts with the sign of this linear combination. It has been observed that the generalization error of the algorithm continues to improve even after all examples are classified correctly by the current signed linear combination, which can be viewed as hyperplane in feature space where the base hypotheses form the features.
Learning interpretable SVMs for biological sequence classification
 BMC BIOINFORMATICS
, 2005
"... We propose novel algorithms for solving the socalled Support Vector Multiple Kernel Learning problem and show how they can be used to understand the resulting support vector decision function. While classical kernelbased algorithms (such as SVMs) are based on a single kernel, in Multiple Kernel Le ..."
Abstract

Cited by 32 (9 self)
 Add to MetaCart
We propose novel algorithms for solving the socalled Support Vector Multiple Kernel Learning problem and show how they can be used to understand the resulting support vector decision function. While classical kernelbased algorithms (such as SVMs) are based on a single kernel, in Multiple Kernel Learning a quadraticallyconstraint quadratic program is solved in order to find a sparse convex combination of a set of support vector kernels. We show how this problem can be cast into a semiinfinite linear optimization problem which can in turn be solved efficiently using a boostinglike iterative method in combination with standard SVM optimization algorithms. The proposed method is able to deal with thousands of examples while combining hundreds of kernels within reasonable time. In the second part we show how this technique can be used to understand the obtained decision function in order to extract biologically relevant knowledge about the sequence analysis problem at hand. We consider the problem of splice site identification and combine string kernels at different sequence positions and with various substring (oligomer) lengths. The proposed algorithm computes a sparse weighting over the length and the substring, highlighting which substrings are important for discrimination. Finally, we propose a bootstrap scheme in order to reliably identify a few statistically significant positions, which can then be used for further analysis such as consensus finding.
Optimal kernel selection in kernel Fisher discriminant analysis
 In Proceedings of the TwentyThird International Conference on Machine Learning
, 2006
"... In Kernel Fisher discriminant analysis (KFDA), we carry out Fisher linear discriminant analysis in a high dimensional feature space defined implicitly by a kernel. The performance of KFDA depends on the choice of the kernel; in this paper, we consider the problem of finding the optimal kernel, over ..."
Abstract

Cited by 28 (1 self)
 Add to MetaCart
In Kernel Fisher discriminant analysis (KFDA), we carry out Fisher linear discriminant analysis in a high dimensional feature space defined implicitly by a kernel. The performance of KFDA depends on the choice of the kernel; in this paper, we consider the problem of finding the optimal kernel, over a given convex set of kernels. We show that this optimal kernel selection problem can be reformulated as a tractable convex optimization problem which interiorpoint methods can solve globally and efficiently. The kernel selection method is demonstrated with some UCI machine learning benchmark examples. 1.