Results 1–10 of 19
Multiple kernel learning algorithms
JMLR, 2011
Abstract

Cited by 25 (1 self)
In recent years, several methods have been proposed to combine multiple kernels instead of using a single one. These different kernels may correspond to using different notions of similarity or may be using information coming from multiple sources (different representations or different feature subsets). In trying to organize and highlight the similarities and differences between them, we give a taxonomy of and review several multiple kernel learning algorithms. We perform experiments on real data sets for better illustration and comparison of existing algorithms. We see that, though there may not be large differences in terms of accuracy, the algorithms do differ in complexity as given by the number of stored support vectors, in the sparsity of the solution as given by the number of used kernels, and in training time complexity. We see that, overall, using multiple kernels instead of a single one is useful, and we believe that combining kernels in a nonlinear or data-dependent way seems more promising than linear combination for fusing information provided by simple linear kernels, whereas linear methods are more reasonable when combining complex Gaussian kernels.
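The linear kernel combinations this survey compares can be sketched in a few lines. The helper below is a hypothetical illustration (names and kernels are my own choices, not from the paper): given base Gram matrices, it forms the combined Gram matrix K = Σ_m η_m K_m with non-negative weights normalized onto the simplex, which preserves positive semi-definiteness.

```python
import numpy as np

def combine_kernels(gram_matrices, weights):
    """Linearly combine base Gram matrices with non-negative simplex weights.
    A convex combination of PSD matrices is itself PSD."""
    weights = np.asarray(weights, dtype=float)
    if np.any(weights < 0):
        raise ValueError("kernel weights must be non-negative")
    weights = weights / weights.sum()          # project onto the simplex
    return sum(w * K for w, K in zip(weights, gram_matrices))

# Two toy base kernels on three points: linear and RBF (gamma = 1).
X = np.array([[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
K_lin = X @ X.T
K_rbf = np.exp(-np.square(X[:, None] - X[None, :]).sum(-1))
K = combine_kernels([K_lin, K_rbf], [1.0, 1.0])
```

Nonlinear or data-dependent combinations, which the survey finds promising for simple base kernels, would replace the weighted sum above with, e.g., products or input-dependent weights.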
ℓp-norm multiple kernel learning
Journal of Machine Learning Research, 2011
Abstract

Cited by 24 (3 self)
Learning linear combinations of multiple kernels is an appealing strategy when the right choice of features is unknown. Previous approaches to multiple kernel learning (MKL) promote sparse kernel combinations to support interpretability and scalability. Unfortunately, this ℓ1-norm MKL is rarely observed to outperform trivial baselines in practical applications. To allow for robust kernel mixtures that generalize well, we extend MKL to arbitrary norms. We devise new insights on the connection between several existing MKL formulations and develop two efficient interleaved optimization strategies for arbitrary norms, that is, ℓp-norms with p ≥ 1. This interleaved optimization is much faster than the commonly used wrapper approaches, as demonstrated on several data sets. A theoretical analysis and an experiment on controlled artificial data shed light on the appropriateness of sparse, non-sparse, and ℓ∞-norm MKL in various scenarios. Importantly, empirical applications of ℓp-norm MKL to three real-world problems from computational biology show that non-sparse MKL achieves accuracies that surpass the state of the art. Data sets, source code to reproduce the experiments, implementations of the algorithms, and
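A core ingredient of interleaved ℓp-norm MKL is the analytic kernel-weight update given the per-kernel margin norms ‖w_m‖: set θ_m ∝ ‖w_m‖^{2/(p+1)} and rescale so that ‖θ‖_p = 1. The sketch below shows only this update step under that assumption, not the full interleaved solver:

```python
import numpy as np

def lp_mkl_weights(kernel_norms, p):
    """Closed-form lp-norm kernel-weight update: theta_m proportional to
    ||w_m||^(2/(p+1)), rescaled so that the lp-norm of theta equals 1."""
    norms = np.asarray(kernel_norms, dtype=float)
    theta = norms ** (2.0 / (p + 1.0))
    return theta / np.linalg.norm(theta, ord=p)

# Three base kernels with decreasing per-kernel norms; p = 2 gives a
# non-sparse weighting that still favors the stronger kernels.
theta = lp_mkl_weights([2.0, 1.0, 0.5], p=2.0)
```

As p grows, the resulting weights become more uniform; as p approaches 1, they approach the sparse ℓ1-norm MKL solution.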
A Family of Simple Non-Parametric Kernel Learning Algorithms
Abstract

Cited by 6 (4 self)
Previous studies of Non-Parametric Kernel Learning (NPKL) usually formulate the learning task as a Semi-Definite Programming (SDP) problem that is often solved by general-purpose SDP solvers. However, for N data examples, the time complexity of NPKL using a standard interior-point SDP solver can be as high as O(N^6.5), which prevents NPKL methods from being applied to real applications, even for data sets of moderate size. In this paper, we present a family of efficient NPKL algorithms, termed “SimpleNPKL”, which can learn non-parametric kernels from a large set of pairwise constraints efficiently. In particular, we propose two efficient SimpleNPKL algorithms. The first is a SimpleNPKL algorithm with linear loss, which enjoys a closed-form solution that can be efficiently computed by the Lanczos sparse eigen-decomposition technique. The second is a SimpleNPKL algorithm with other loss functions (including square hinge loss, hinge loss, and square loss) that can be reformulated as a saddle-point optimization problem, which can in turn be solved by a fast iterative algorithm. In contrast to previous NPKL approaches, our empirical results show that the proposed technique, while maintaining the same accuracy, is significantly more efficient and scalable. Finally, we also demonstrate that the proposed technique can speed up many kernel learning tasks, including colored maximum variance unfolding, minimum volume embedding, and structure preserving embedding.
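The closed-form linear-loss case rests on sparse eigen-decomposition. As a rough, hypothetical stand-in for that idea (not the paper's exact formula), one can recover a PSD kernel from a symmetric constraint/similarity matrix by keeping only its top non-negative eigenpairs, which is exactly the computation a Lanczos solver performs sparsely:

```python
import numpy as np

def psd_from_top_eigs(A, k):
    """Keep the top-k non-negative eigenpairs of symmetric A, yielding a
    PSD low-rank kernel (a dense analogue of the Lanczos computation)."""
    vals, vecs = np.linalg.eigh(A)          # eigenvalues in ascending order
    vals, vecs = vals[-k:], vecs[:, -k:]    # top-k eigenpairs
    vals = np.maximum(vals, 0.0)            # clip negatives to stay PSD
    return (vecs * vals) @ vecs.T

# A small symmetric matrix standing in for aggregated pairwise constraints.
A = np.array([[2.0, -1.0, 0.0],
              [-1.0, 2.0, -1.0],
              [0.0, -1.0, 2.0]])
K = psd_from_top_eigs(A, k=2)
```

For the O(N^6.5) SDP route this replaces, the point is that only k ≪ N eigenpairs of a sparse matrix are needed, which Lanczos delivers at far lower cost.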
Improving Web Image Search by Bag-based Reranking
, 2011
Abstract

Cited by 4 (3 self)
Given a textual query in traditional text-based image retrieval (TBIR), relevant images are reranked using visual features after the initial text-based search. In this paper, we propose a new bag-based reranking framework for large-scale TBIR. Specifically, we first cluster relevant images using both textual and visual features. By treating each cluster as a “bag” and the images in the bag as “instances”, we formulate this problem as a multi-instance (MI) learning problem. MI learning methods such as mi-SVM can be readily incorporated into our bag-based reranking framework. Observing that at least a certain portion of a positive bag consists of positive instances, while a negative bag might also contain positive instances, we further adopt a more suitable generalized MI (GMI) setting for this application. To address the ambiguities in the instance labels of the positive and negative bags under this GMI setting, we develop a new method, referred to as GMI-SVM, that enhances retrieval performance by propagating labels from the bag level to the instance level. To acquire bag annotations for (G)MI learning, we propose a bag ranking method that ranks all bags according to a defined bag ranking score. The top-ranked bags are used as pseudo-positive training bags, while pseudo-negative training bags are obtained by randomly sampling a few irrelevant images that are not associated with the textual query. Comprehensive experiments on the challenging real-world data set NUS-WIDE demonstrate that our framework with automatic bag annotation achieves the best performance compared with existing image reranking methods. Our experiments also show that GMI-SVM achieves better performance when using manually labeled training bags obtained from relevance feedback.
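The automatic pseudo-annotation step can be sketched minimally. In this hypothetical illustration the bag ranking score is simply the mean per-image relevance, an assumed stand-in for the paper's actual score; the top-ranked bags become pseudo-positive training bags:

```python
def rank_bags(bags, top_k):
    """Rank bags by mean instance relevance and split off the top_k as
    pseudo-positive bags. `bags` maps bag_id -> list of relevance scores."""
    scored = sorted(bags, key=lambda b: sum(bags[b]) / len(bags[b]),
                    reverse=True)
    return scored[:top_k], scored[top_k:]

# Three toy bags (clusters of images) with per-image relevance scores.
bags = {"b1": [0.9, 0.8], "b2": [0.2, 0.3], "b3": [0.7, 0.6]}
pos, rest = rank_bags(bags, top_k=2)
```

Pseudo-negative bags would then be drawn from images unrelated to the query, completing the training set for the (G)MI learner.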
Two-Layer Multiple Kernel Learning
Abstract

Cited by 4 (3 self)
Multiple Kernel Learning (MKL) aims to learn kernel machines for solving a real machine learning problem (e.g., classification) by exploring combinations of multiple kernels. The traditional MKL approach is in general “shallow” in the sense that the target kernel is simply a linear (or convex) combination of some base kernels. In this paper, we investigate a framework of Multi-Layer Multiple Kernel Learning (MLMKL) that aims to learn “deep” kernel machines by exploring combinations of multiple kernels in a multi-layer structure, which goes beyond the conventional MKL approach. Through a multi-layer mapping, the proposed MLMKL framework offers higher flexibility than regular MKL for finding the optimal kernel for a given application. As a first attempt at this new MKL framework, we present a Two-Layer Multiple Kernel Learning (2LMKL) method together with two efficient algorithms for classification tasks. We analyze their generalization performance and conduct an extensive set of experiments over 16 benchmark datasets, in which encouraging results show that our method performs better than conventional MKL methods.
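One natural two-layer construction (an illustrative assumption, not necessarily the paper's exact parameterization) wraps a weighted sum of base-kernel evaluations in an outer nonlinearity, here an exponential, so the result is no longer a linear combination of the base kernels:

```python
import numpy as np

def two_layer_kernel(base_vals, mu):
    """Two-layer kernel value: outer exponential over a weighted inner
    combination, k(x, x') = exp(sum_m mu_m * k_m(x, x'))."""
    return np.exp(np.dot(mu, base_vals))

# Evaluations k_m(x, x') of three base kernels at one pair of points,
# and inner-layer weights mu (both made up for illustration).
base_vals = np.array([0.5, 0.2, 0.1])
mu = np.array([1.0, 0.5, 0.0])
val = two_layer_kernel(base_vals, mu)
```

Because exp preserves positive definiteness of the inner combination, the composite is still a valid kernel, yet the weights mu now act inside a nonlinear map rather than as a flat mixture.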
Open Access
Abstract
Background: The lack of sufficient training data is the limiting factor for many Machine Learning applications in Computational Biology. If data is available for several different but related problem domains, Multitask Learning algorithms can be used to learn a model based on all available information. In Bioinformatics, many problems can be cast into the Multitask Learning scenario by incorporating data from several organisms. However, combining information from several tasks requires careful consideration of the degree of similarity between tasks. Our proposed method simultaneously learns or refines the similarity between tasks along with the Multitask Learning classifier. This is done by formulating the Multitask Learning problem as Multiple Kernel Learning, using the recently published q-norm MKL algorithm. Results: We demonstrate the performance of our method on two problems from Computational Biology. First, we show that our method is able to improve performance on a splice-site dataset with a given hierarchical task structure by refining the task relationships. Second, we consider an MHC-I dataset, for which we assume no knowledge about the degree of task relatedness. Here, we are able to learn the task similarities ab initio along with the Multitask classifiers. In both cases, we outperform the baseline methods we compare against. Conclusions: We present a novel approach to Multitask Learning that is capable of learning task similarity along
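A standard way to encode task relatedness in a kernel machine, which the MKL formulation above builds on, is a composite kernel K((x, s), (x', t)) = A[s, t] · k(x, x'), where A is a task-similarity matrix. The sketch below (hypothetical names, toy data) forms the composite Gram matrix for labeled examples tagged with task indices:

```python
import numpy as np

def multitask_gram(K_base, tasks, A):
    """Composite Gram matrix: entry (i, j) equals
    A[task_i, task_j] * K_base[i, j]."""
    return A[np.ix_(tasks, tasks)] * K_base

# Two examples from two tasks, a base kernel on the inputs, and an
# assumed task-similarity matrix A (which the paper learns via MKL).
K_base = np.array([[1.0, 0.5], [0.5, 1.0]])
tasks = np.array([0, 1])
A = np.array([[1.0, 0.3], [0.3, 1.0]])
K = multitask_gram(K_base, tasks, A)
```

Learning A itself, rather than fixing it, is what casting the problem as q-norm MKL over per-task-pair kernels enables.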
Towards Large-scale and Ultrahigh-dimensional Feature Selection via Feature Generation
Abstract
In many real-world applications, such as text mining, it is desirable to select the most relevant features or variables to improve generalization ability or to provide a better interpretation of the prediction models. In this paper, a novel adaptive feature scaling (AFS) scheme is proposed, introducing a feature scaling vector d ∈ [0, 1]^m to alleviate the bias brought by the scaling of diverse features. By reformulating the resultant AFS model as a semi-infinite programming problem, a novel feature generating method is presented to identify the most relevant features for classification problems. In contrast to traditional feature selection methods, the new formulation has the advantage of handling extremely high-dimensional and large-scale problems. With an exact solution to the worst-case analysis in the identification of relevant features, the proposed feature generating scheme converges globally. More importantly, the proposed scheme facilitates group selection with or without special structures. Comprehensive experiments on a wide range of synthetic and real-world datasets demonstrate that the proposed method achieves better or competitive performance compared with existing (group) feature selection methods in terms of generalization performance and training efficiency. C++ and MATLAB implementations of our algorithm are available at
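The "feature generating" idea is to grow the active feature set incrementally by solving the worst-case subproblem: at each round, activate the features that most violate the current solution. In this simplified, hypothetical sketch the violation score is |X^T y| (an assumed stand-in for the paper's exact criterion):

```python
import numpy as np

def generate_features(X, y, batch, active=None):
    """Pick the `batch` highest-scoring inactive features, where the score
    |X^T y| stands in for the worst-case violation in the SIP subproblem."""
    active = set() if active is None else set(active)
    scores = np.abs(X.T @ y)
    order = [j for j in np.argsort(-scores) if j not in active]
    return order[:batch]

# Toy design matrix (3 examples, 3 features) and labels.
X = np.array([[1.0, 0.0, 0.2],
              [1.0, 0.1, -0.1],
              [-1.0, 0.0, 0.3]])
y = np.array([1.0, 1.0, -1.0])
picked = generate_features(X, y, batch=2)
```

Only the scores of candidate features ever need to be touched, which is why the scheme scales to ultrahigh-dimensional problems where materializing all features is infeasible.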
A Randomized Mirror Descent Algorithm for Large Scale Multiple Kernel Learning
Abstract
We consider the problem of simultaneously learning to linearly combine a very large number of kernels and learning a good predictor based on the learnt kernel. When the number of kernels d to be combined is very large, multiple kernel learning methods whose computational cost scales linearly in d are intractable. We propose a randomized version of the mirror descent algorithm to overcome this issue, under the objective of minimizing the group p-norm penalized empirical risk. The key to achieving the required exponential speedup is the computationally efficient construction of low-variance estimates of the gradient. We propose importance-sampling-based estimates, and find that the ideal distribution samples a coordinate with probability proportional to the magnitude of the corresponding gradient. We show that in the case of learning the coefficients of a polynomial kernel, the combinatorial structure of the base kernels to be combined allows sampling from this distribution in O(log(d)) time, making the total computational cost of achieving an ε-optimal solution O(log(d)/ε²), and thereby allowing our method to operate for very large values of d. Experiments with simulated and real data confirm that the new algorithm is computationally more efficient than its state-of-the-art alternatives.
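The importance-sampling estimator described here is easy to state concretely: sample one coordinate i with probability q_i proportional to |g_i| and return (g_i / q_i) on that coordinate, which is unbiased for the full gradient g. A minimal sketch (toy gradient, not the paper's O(log d) sampler):

```python
import numpy as np

def sampled_gradient(g, rng):
    """Unbiased one-coordinate estimate of the gradient g: sample index i
    with probability q_i = |g_i| / ||g||_1 and return (g_i / q_i) * e_i."""
    q = np.abs(g) / np.abs(g).sum()
    i = rng.choice(len(g), p=q)
    est = np.zeros_like(g)
    est[i] = g[i] / q[i]
    return est

# Averaging many draws should recover g itself (unbiasedness).
g = np.array([3.0, -1.0, 0.5])
rng = np.random.default_rng(0)
avg = np.mean([sampled_gradient(g, rng) for _ in range(5000)], axis=0)
```

Sampling proportional to |g_i| minimizes the estimator's variance among all coordinate distributions; the paper's contribution is doing this draw in O(log d) time by exploiting the polynomial-kernel structure.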
Wavelet Kernel Learning
Abstract
This paper addresses the problem of optimal feature extraction from a wavelet representation. Our work aims at building features by selecting wavelet coefficients resulting from signal or image decomposition on an adapted wavelet basis. For this purpose, we jointly learn, in a kernelized large-margin context, the wavelet shape as well as the appropriate scale and translation of the wavelets, hence the name “wavelet kernel learning”. This problem is posed as a multiple kernel learning problem where the number of kernels can be very large. For solving such a problem, we introduce a novel multiple kernel learning algorithm based on active constraint methods. We furthermore propose variants of this algorithm that can produce approximate solutions more efficiently. Empirical analyses show that our active constraint MKL algorithm achieves state-of-the-art efficiency. When used for wavelet kernel learning, our experimental results show that the proposed approaches are competitive with respect to the state of the art on Brain-Computer Interface and Brodatz texture datasets.
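The active-constraint strategy for a very large kernel pool can be sketched as a greedy loop: keep a small active kernel set and repeatedly activate the kernel with the largest violation score until none exceeds a tolerance. This hypothetical sketch fixes the violation scores up front, whereas a real solver would recompute them after each refit:

```python
def active_set_select(violations, tol):
    """Greedy active-constraint loop: repeatedly activate the kernel with
    the largest violation score until all remaining scores are <= tol."""
    active = []
    remaining = dict(enumerate(violations))
    while remaining:
        j = max(remaining, key=remaining.get)
        if remaining[j] <= tol:
            break          # no kernel violates the condition any more
        active.append(j)
        del remaining[j]
    return active

# Four candidate wavelet kernels with toy violation scores.
active = active_set_select([0.05, 0.9, 0.4, 0.01], tol=0.1)
```

Because most candidate wavelet kernels never enter the active set, the per-iteration cost depends on the active set size rather than on the full (possibly enormous) kernel dictionary.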