Results 1-10 of 21
Multiple kernel learning algorithms
JMLR, 2011
Abstract

Cited by 72 (1 self)
In recent years, several methods have been proposed to combine multiple kernels instead of using a single one. These different kernels may correspond to using different notions of similarity or may be using information coming from multiple sources (different representations or different feature subsets). In trying to organize and highlight the similarities and differences between them, we give a taxonomy of and review several multiple kernel learning algorithms. We perform experiments on real data sets for better illustration and comparison of existing algorithms. We see that though there may not be large differences in terms of accuracy, there are differences between them in complexity as given by the number of stored support vectors, the sparsity of the solution as given by the number of used kernels, and training time complexity. We see that overall, using multiple kernels instead of a single one is useful and believe that combining kernels in a nonlinear or data-dependent way seems more promising than linear combination in fusing information provided by simple linear kernels, whereas linear methods are more reasonable when combining complex Gaussian kernels.
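As a rough illustration of the linear combination this abstract refers to, the Python sketch below forms a weighted sum of a linear and a Gaussian Gram matrix. The function names, the sample points, the bandwidth `gamma`, and the fixed weights are all assumptions chosen for illustration; a multiple kernel learning algorithm would learn the weights rather than fix them by hand.

```python
import math


def linear_kernel(x, z):
    """Plain dot product between two points."""
    return sum(a * b for a, b in zip(x, z))


def gaussian_kernel(x, z, gamma=0.5):
    """Gaussian (RBF) kernel; gamma is an assumed bandwidth parameter."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


def gram(kernel, points):
    """Gram matrix of a kernel function over a list of points."""
    return [[kernel(p, q) for q in points] for p in points]


def combine(grams, weights):
    """Convex combination of Gram matrices (weights >= 0, summing to 1)."""
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be nonnegative and sum to 1")
    n = len(grams[0])
    return [
        [sum(w * g[i][j] for w, g in zip(weights, grams)) for j in range(n)]
        for i in range(n)
    ]


# Hypothetical toy data and hand-picked weights (not learned).
points = [(0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
K_lin = gram(linear_kernel, points)
K_rbf = gram(gaussian_kernel, points)
K = combine([K_lin, K_rbf], [0.25, 0.75])
```

Because each base Gram matrix is positive semidefinite and the weights are nonnegative, the combined matrix `K` is again a valid kernel matrix and could be passed to any kernel method that accepts a precomputed kernel.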
Multiclass multiple kernel learning
In ICML. ACM
Abstract

Cited by 53 (3 self)
In many applications it is desirable to learn from several kernels. “Multiple kernel learning” (MKL) allows the practitioner to optimize over linear combinations of kernels. By enforcing sparse coefficients, it also generalizes feature selection to kernel selection. We propose MKL for joint feature maps. This provides a convenient and principled way for MKL with multiclass problems. In addition, we can exploit the joint feature map to learn kernels on output spaces. We show the equivalence of several different primal formulations including different regularizers. We present several optimization methods, and compare a convex quadratically constrained quadratic program (QCQP) and two semi-infinite linear programs (SILPs) on toy data, showing that the SILPs are faster than the QCQP. We then demonstrate the utility of our method by applying the SILP to three real-world datasets.
Optimal kernel selection in kernel Fisher discriminant analysis
In Proceedings of the Twenty-Third International Conference on Machine Learning, 2006
Abstract

Cited by 33 (1 self)
In Kernel Fisher discriminant analysis (KFDA), we carry out Fisher linear discriminant analysis in a high-dimensional feature space defined implicitly by a kernel. The performance of KFDA depends on the choice of the kernel; in this paper, we consider the problem of finding the optimal kernel, over a given convex set of kernels. We show that this optimal kernel selection problem can be reformulated as a tractable convex optimization problem which interior-point methods can solve globally and efficiently. The kernel selection method is demonstrated with some UCI machine learning benchmark examples.
Multiclass Discriminant Kernel Learning via Convex Programming
Abstract

Cited by 30 (0 self)
Regularized kernel discriminant analysis (RKDA) performs linear discriminant analysis in the feature space via the kernel trick. Its performance depends on the selection of kernels. In this paper, we consider the problem of multiple kernel learning (MKL) for RKDA, in which the optimal kernel matrix is obtained as a linear combination of pre-specified kernel matrices. We show that the kernel learning problem in RKDA can be formulated as convex programs. First, we show that this problem can be formulated as a semidefinite program (SDP). Based on the equivalence relationship between RKDA and least squares problems in the binary-class case, we propose a convex quadratically constrained quadratic programming (QCQP) formulation for kernel learning in RKDA. A semi-infinite linear programming (SILP) formulation is derived to further improve the efficiency. We extend these formulations to the multiclass case based on a key result established in this paper. That is, the multiclass RKDA kernel learning problem can be decomposed into a set of binary-class kernel learning problems which are constrained to share a common kernel. Based on this decomposition property, SDP formulations are proposed for the multiclass case. Furthermore, it leads naturally to QCQP and SILP formulations. As the performance of RKDA depends on the regularization parameter, we show that this parameter can also be optimized in a joint framework with the kernel. Extensive experiments have been conducted and analyzed, and connections to other algorithms are discussed.
Hierarchic Bayesian models for kernel learning
In ICML: 22nd International Conference on Machine Learning, 2005
Abstract

Cited by 30 (6 self)
The integration of diverse forms of informative data by learning an optimal combination of base kernels in classification or regression problems can provide enhanced performance when compared to that obtained from any single data source. We present a Bayesian hierarchical model which enables kernel learning and present effective variational Bayes estimators for regression and classification. Illustrative experiments demonstrate the utility of the proposed method. Matlab code replicating results reported is available at
Nonlinear Adaptive Distance Metric Learning for Clustering
Abstract

Cited by 13 (1 self)
A good distance metric is crucial for many data mining tasks. To learn a metric in the unsupervised setting, most metric learning algorithms project observed data to a low-dimensional manifold, where geometric relationships such as pairwise distances are preserved. It can be extended to the nonlinear case by applying the kernel trick, which embeds the data into a feature space by specifying the kernel function that computes the dot products between data points in the feature space. In this paper, we propose a novel unsupervised Nonlinear Adaptive Metric Learning algorithm, called NAML, which performs clustering and distance metric learning simultaneously. NAML first maps the data to a high-dimensional space through a kernel function; then applies a linear projection to find a low-dimensional manifold where the separability of the data is maximized; and finally performs clustering in the low-dimensional space. The performance of NAML depends on the selection of the kernel function and the projection. We show that the joint kernel learning, dimensionality reduction, and clustering can be formulated as a trace maximization problem, which can be solved via an iterative procedure in the EM framework. Experimental results demonstrated the efficacy of the proposed algorithm.
SimpleMKL
2008
Abstract

Cited by 10 (0 self)
Multiple kernel learning (MKL) aims at simultaneously learning a kernel and the associated predictor in supervised learning settings. For the support vector machine, an efficient and general multiple kernel learning algorithm, based on semi-infinite linear programming, has been recently proposed. This approach has opened new perspectives since it makes MKL tractable for large-scale problems, by iteratively using existing support vector machine code. However, it turns out that this iterative algorithm needs numerous iterations for converging towards a reasonable solution. In this paper, we address the MKL problem through a weighted 2-norm regularization formulation with an additional constraint on the weights that encourages sparse kernel combinations. Apart from learning the combination, we solve a standard SVM optimization problem, where the kernel is defined as a linear combination of multiple kernels. We propose an algorithm, named SimpleMKL, for solving this MKL problem and provide a new insight on MKL algorithms based on mixed-norm regularization by showing that the two approaches are equivalent. We show how SimpleMKL can be applied beyond binary classification, for problems like regression, clustering (one-class classification) or multiclass classification. Experimental results show that the proposed algorithm converges
Learning the kernel matrix in discriminant analysis via quadratically constrained quadratic programming
In Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2007
Abstract

Cited by 6 (1 self)
The kernel function plays a central role in kernel methods. In this paper, we consider the automated learning of the kernel matrix over a convex combination of pre-specified kernel matrices in Regularized Kernel Discriminant Analysis (RKDA), which performs linear discriminant analysis in the feature space via the kernel trick. Previous studies have shown that this kernel learning problem can be formulated as a semidefinite program (SDP), which is however computationally expensive, even with the recent advances in interior point methods. Based on the equivalence relationship between RKDA and least squares problems in the binary-class case, we propose a Quadratically Constrained Quadratic Programming (QCQP) formulation for the kernel learning problem, which can be solved more efficiently than SDP. While most existing work on kernel learning deals with binary-class problems only, we show that our QCQP formulation can be extended naturally to the multiclass case. Experimental results on both binary-class and multiclass benchmark data sets show the efficacy of the proposed QCQP formulations.
Bilinear Analysis for Kernel Selection and Nonlinear Feature Extraction
IEEE Transactions on Neural Networks, 2007
Abstract

Cited by 5 (0 self)
This paper presents a unified criterion, Fisher + kernel criterion (FKC), for feature extraction and recognition. This new criterion is intended to extract the most discriminant features in different nonlinear spaces, and then fuse these features under a unified measurement. Thus, FKC can simultaneously achieve nonlinear discriminant analysis and kernel selection. In addition, we present an efficient algorithm, Fisher + kernel analysis (FKA), which utilizes bilinear analysis to optimize the new criterion. The FKA algorithm can alleviate the ill-posed problem that exists in traditional kernel discriminant analysis (KDA), and usually has no singularity problem. The effectiveness of our proposed algorithm is validated by a series of face-recognition experiments on several different databases. Index Terms: bilinear analysis, discriminant analysis, face recognition, feature extraction, Fisher criterion, kernel selection.
Feature Selection and Kernel Design via Linear Programming
Abstract

Cited by 3 (0 self)
The definition of object (e.g., data point) similarity is critical to the performance of many machine learning algorithms, both in terms of accuracy and computational efficiency. However, it is often the case that a similarity function is unknown or chosen by hand. This paper introduces a formulation that, given relative similarity comparisons among triples of points of the form "object i is more like object j than object k," constructs a kernel function that preserves the given relationships. Our approach is based on learning a kernel that is a combination of functions taken from a set of base functions (these could be kernels as well). The formulation is based on defining an optimization problem that can be
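To make the triple comparison concrete, the Python sketch below checks whether "object i is more like object j than object k" holds under a kernel built as a weighted combination of base Gram matrices, using the standard kernel-induced squared distance K[i][i] + K[j][j] - 2 K[i][j]. The helper names, the two base kernels, the toy points, and the fixed weights are assumptions for illustration only; the sketch only verifies a triple and does not reproduce the optimization problem the abstract describes.

```python
def combine(base_grams, weights):
    """Weighted sum of base Gram matrices (weights assumed nonnegative)."""
    n = len(base_grams[0])
    return [
        [sum(w * g[i][j] for w, g in zip(weights, base_grams)) for j in range(n)]
        for i in range(n)
    ]


def kernel_distance_sq(K, i, j):
    """Squared feature-space distance induced by the kernel matrix K."""
    return K[i][i] + K[j][j] - 2.0 * K[i][j]


def triplet_satisfied(K, i, j, k):
    """True if object i is closer to object j than to object k under K."""
    return kernel_distance_sq(K, i, j) < kernel_distance_sq(K, i, k)


# Hypothetical one-dimensional objects and two base kernels.
xs = [0.0, 1.0, 3.0]
K_linear = [[a * b for b in xs] for a in xs]            # k(a, b) = a * b
K_quadratic = [[(a * b) ** 2 for b in xs] for a in xs]  # k(a, b) = (a * b)^2

# Hand-picked weights; a learning method would choose these instead.
K = combine([K_linear, K_quadratic], [0.5, 0.5])

holds = triplet_satisfied(K, 0, 1, 2)     # object 0 vs. objects 1 and 2
violated = triplet_satisfied(K, 1, 2, 0)  # object 1 vs. objects 2 and 0
```

A learning procedure in this spirit would search for weights under which all supplied triples evaluate to true, rather than checking a fixed choice as done here.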