Multiclass multiple kernel learning
- In ICML. ACM
Cited by 26 (3 self)
Abstract
In many applications it is desirable to learn from several kernels. “Multiple kernel learning” (MKL) allows the practitioner to optimize over linear combinations of kernels. By enforcing sparse coefficients, it also generalizes feature selection to kernel selection. We propose MKL for joint feature maps. This provides a convenient and principled way for MKL with multiclass problems. In addition, we can exploit the joint feature map to learn kernels on output spaces. We show the equivalence of several different primal formulations including different regularizers. We present several optimization methods, and compare a convex quadratically constrained quadratic program (QCQP) and two semi-infinite linear programs (SILPs) on toy data, showing that the SILPs are faster than the QCQP. We then demonstrate the utility of our method by applying the SILP to three real world datasets.
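As a minimal illustration of the "linear combinations of kernels" that MKL optimizes over, the sketch below forms K = sum_m beta_m K_m with the weights beta restricted to the probability simplex; a zero weight switches a kernel off, which is how sparse coefficients turn into kernel selection. The 1-D Gaussian kernels, their widths and the fixed weights are assumptions chosen for illustration; the code does not implement the paper's joint feature maps or its QCQP/SILP solvers, which would learn beta rather than take it as input.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def gaussian_kernel(xs: Sequence[float], sigma: float) -> Matrix:
    """Gram matrix of an isotropic Gaussian kernel on 1-D inputs (illustrative choice)."""
    return [[math.exp(-(a - b) ** 2 / (2.0 * sigma ** 2)) for b in xs] for a in xs]


def combine_kernels(kernels: Sequence[Matrix], beta: Sequence[float]) -> Matrix:
    """Return sum_m beta[m] * kernels[m] for weights on the probability simplex."""
    if len(kernels) != len(beta):
        raise ValueError("need exactly one weight per kernel")
    if any(b < 0 for b in beta) or abs(sum(beta) - 1.0) > 1e-9:
        raise ValueError("weights must be nonnegative and sum to 1")
    n = len(kernels[0])
    return [[sum(b * K[i][j] for b, K in zip(beta, kernels)) for j in range(n)]
            for i in range(n)]


xs = [0.0, 1.0, 2.0]
kernels = [gaussian_kernel(xs, s) for s in (0.5, 1.0, 2.0)]
beta = [0.0, 0.7, 0.3]  # sparse weights: the narrowest kernel is switched off
K = combine_kernels(kernels, beta)
```

Because every Gaussian Gram matrix has ones on its diagonal and the weights sum to one, the combined matrix keeps a unit diagonal; a nonnegative combination of positive semidefinite matrices is again positive semidefinite, so K remains a valid kernel.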
A DC-programming algorithm for kernel selection
- In Proceedings of the Twenty-third International Conference on Machine Learning, 2006
Cited by 20 (2 self)
Abstract
We address the problem of learning a kernel for a given supervised learning task. Our approach consists in searching within the convex hull of a prescribed set of basic kernels for one which minimizes a convex regularization functional. A unique feature of this approach compared to others in the literature is that the number of basic kernels can be infinite. We only require that they are continuously parameterized. For example, the basic kernels could be isotropic Gaussians with variance in a prescribed interval or even Gaussians parameterized by multiple continuous parameters. Our work builds upon a formulation involving a minimax optimization problem and a recently proposed greedy algorithm for learning the kernel. Although this optimization problem is not convex, it belongs to the larger class of DC (difference of convex functions) programs. Therefore, we apply recent results from DC optimization theory to create a new algorithm for learning the kernel. Our experimental results on benchmark data sets show that this algorithm outperforms a previously proposed method.
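The abstract relies on the DC (difference of convex functions) structure of the problem. A minimal sketch of the generic DC algorithm (DCA), on an assumed toy objective rather than the kernel-learning functional: write f = g - h with g and h convex, linearize h at the current iterate, and minimize the resulting convex function. For f(x) = x**4 - 2*x**2 with g(x) = x**4 and h(x) = 2*x**2, the convex subproblem has the closed-form solution x = cbrt(x_k). The functions `f` and `dca_quartic` are hypothetical names introduced here; this is not the paper's algorithm.

```python
import math


def f(x: float) -> float:
    """Toy nonconvex objective f(x) = x**4 - 2*x**2 (an assumption, for illustration)."""
    return x ** 4 - 2.0 * x ** 2


def dca_quartic(x0: float, iters: int = 100) -> float:
    """DCA for f = g - h with convex parts g(x) = x**4 and h(x) = 2*x**2."""
    x = x0
    for _ in range(iters):
        slope = 4.0 * x  # h'(x_k): linearize h at the current iterate
        # Convex subproblem: minimize g(x) - slope * x. Its optimality
        # condition 4*x**3 = slope gives the real cube root of slope / 4.
        target = slope / 4.0
        x = math.copysign(abs(target) ** (1.0 / 3.0), target)
    return x


x_star = dca_quartic(0.2)  # approaches the local minimizer x = 1, where f = -1
```

Each DCA step minimizes a convex upper bound of f that touches f at the current iterate, so the objective never increases from one iterate to the next; this descent property is what makes DC programming attractive for nonconvex problems such as the minimax formulation above.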
Multi-class Discriminant Kernel Learning via Convex Programming
Cited by 11 (0 self)
Abstract
Regularized kernel discriminant analysis (RKDA) performs linear discriminant analysis in the feature space via the kernel trick. Its performance depends on the selection of kernels. In this paper, we consider the problem of multiple kernel learning (MKL) for RKDA, in which the optimal kernel matrix is obtained as a linear combination of pre-specified kernel matrices. We show that the kernel learning problem in RKDA can be formulated as convex programs. First, we show that this problem can be formulated as a semidefinite program (SDP). Based on the equivalence between RKDA and least squares problems in the binary-class case, we propose a convex quadratically constrained quadratic programming (QCQP) formulation for kernel learning in RKDA. A semi-infinite linear programming (SILP) formulation is derived to further improve efficiency. We extend these formulations to the multi-class case based on a key result established in this paper: the multi-class RKDA kernel learning problem can be decomposed into a set of binary-class kernel learning problems that are constrained to share a common kernel. Based on this decomposition property, SDP formulations are proposed for the multi-class case, and the decomposition also leads naturally to QCQP and SILP formulations. As the performance of RKDA depends on the regularization parameter, we show that this parameter can also be optimized jointly with the kernel. Extensive experiments have been conducted and analyzed, and connections to other algorithms are discussed.
Learning from Incomplete Data with Infinite Imputations
Cited by 1 (0 self)
Abstract
We address the problem of learning decision functions from training data in which some attribute values are unobserved. This problem can arise, for instance, when training data is aggregated from multiple sources, and some sources record only a subset of attributes. We derive a generic joint optimization problem in which the distribution governing the missing values is a free parameter. We show that the optimal solution concentrates the density mass on finitely many imputations, and provide a corresponding algorithm for learning from incomplete data. We report on empirical results on benchmark data, and on the email spam application that motivates our work.
Learning to Integrate Data from Different Sources and Tasks
, 2007
Abstract
I, Andreas Argyriou, confirm that the work presented in this thesis is my own. Where information has been derived from other sources, I confirm that this has been indicated in the thesis. Supervised learning aims at developing models with good generalization properties using input/output empirical data. Methods which use linear functions, and especially kernel methods such as ridge regression, support vector machines and logistic regression, have been extensively applied for this purpose. The first question we study deals with selecting kernels appropriate for a specific supervised task. To this end we formulate a methodology for learning combinations of prescribed basic kernels, which can be applied to a variety of kernel methods. Unlike previous approaches, it can address cases in which the set of basic kernels is infinite and even uncountable, like the set of all Gaussian kernels. We also propose an algorithm which is conceptually simple and is based on existing kernel methods. Secondly, we address the problem of learning common feature representations across multiple tasks. It has been empirically and theoretically shown that, when different tasks are related, it is possible to exploit task relatedness
Predicting Abnormal Returns From News Using Text Classification
, 2009
Abstract
We show how text from news articles can be used to predict intraday price movements of financial assets using support vector machines. Multiple kernel learning is used to combine equity returns with text as predictive features in order to increase classification performance, and we develop an analytic center cutting plane method to solve the kernel learning problem efficiently. This method exhibits linear convergence but requires very few gradient evaluations (each of them a support vector machine classification problem), making it particularly efficient on the large sample sizes considered in this application.
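To make the cutting-plane idea concrete, here is a minimal one-dimensional sketch, assuming a single kernel weight theta in [0, 1] (for example, the mix between a returns kernel and a text kernel) and a convex objective whose gradient is available from an oracle. If only the two active cuts are kept (redundant cuts dropped), the localization set stays an interval, and the analytic center of an interval, the maximizer of log(x - lo) + log(hi - x), is its midpoint; the method then reduces to bisection on the sign of the gradient. The quadratic stand-in objective and the name `accpm_1d` are hypothetical; the paper's method works in higher dimensions, where computing the analytic center requires Newton steps.

```python
from typing import Callable


def accpm_1d(grad: Callable[[float], float], lo: float, hi: float,
             tol: float = 1e-10) -> float:
    """Cutting-plane minimization of a convex 1-D function on [lo, hi].

    The localization set is kept as an interval (redundant cuts dropped),
    so its analytic center is simply the midpoint.
    """
    while hi - lo > tol:
        center = 0.5 * (lo + hi)  # analytic center of the current interval
        g = grad(center)          # one oracle call (in MKL, one SVM solve)
        if g > 0:
            # Cut: for y > center, f(y) >= f(center) + g*(y - center) > f(center).
            hi = center
        elif g < 0:
            lo = center
        else:
            return center         # zero gradient: center is optimal
    return 0.5 * (lo + hi)


# Stand-in objective f(t) = (t - 0.3)**2, so grad(t) = 2*(t - 0.3).
theta = accpm_1d(lambda t: 2.0 * (t - 0.3), 0.0, 1.0)
```

Each step halves the interval, which is linear convergence with rate 1/2, and reaching a tolerance of 1e-10 from a unit interval takes only about 34 gradient evaluations. This mirrors the trade-off the abstract describes: the convergence rate is only linear, but each iteration costs one (possibly expensive) oracle call.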
Assisting Main Task Learning by Heterogeneous Auxiliary Tasks with Applications to Skin Cancer Screening
- Ning Situ
Abstract
In typical classification problems, high-level concept features provided by a domain expert are usually available during classifier training but not during its deployment. We address this problem from a multitask learning (MTL) perspective by treating these features as auxiliary learning tasks. Previous efforts in MTL have mostly assumed that all tasks have the same input space. However, auxiliary tasks can have different input spaces, since their learning targets are different. To handle such heterogeneous inputs, we present a new model that uses heterogeneous auxiliary tasks to help main-task learning. We first formulate a convex optimization problem for the proposed model, then analyze its hypothesis class and derive true risk bounds. Finally, we compare the proposed model with other relevant methods on skin cancer screening and on public datasets. Our results show that the proposed method is highly competitive with these methods.
Feature Selection and Kernel Learning for Local Learning-Based Clustering
Abstract
Abstract—The performance of most clustering algorithms relies heavily on the representation of the data in the input space or in the Hilbert space of kernel methods. This paper aims to obtain an appropriate data representation through feature selection or kernel learning within the framework of the Local Learning-Based Clustering (LLC) method (Wu and Schölkopf 2006), which can outperform global learning-based methods on high-dimensional data lying on a manifold. Specifically, we associate a weight with each feature or kernel and incorporate it into the built-in regularization of the LLC algorithm to account for the relevance of each feature or kernel to the clustering. Accordingly, the weights are estimated iteratively during the clustering process. We show that the resulting weighted regularization, with an additional constraint on the weights, is equivalent to a known sparsity-promoting penalty. Hence, the weights of irrelevant features or kernels can be shrunk toward zero. Extensive experiments on benchmark data sets show the efficacy of the proposed methods. Index Terms—High-dimensional data, local learning-based clustering, feature selection, kernel learning, sparse weighting.
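A standard identity illustrates how a weighted regularizer with a constraint on the weights can become a sparsity-promoting penalty: for weights theta on the probability simplex, the minimum of sum_j w_j**2 / theta_j equals (sum_j |w_j|)**2, the squared L1 norm, and is attained at theta_j = |w_j| / sum_k |w_k| (this follows from the Cauchy-Schwarz inequality). Whether this is exactly the penalty used in the paper is an assumption; the sketch only checks the identity numerically and does not implement LLC. The function names are hypothetical.

```python
from typing import List, Sequence


def optimal_weights(w: Sequence[float]) -> List[float]:
    """Minimizer of sum_j w_j**2 / theta_j over theta_j >= 0, sum_j theta_j = 1."""
    total = sum(abs(v) for v in w)
    if total == 0:
        raise ValueError("w must have at least one nonzero entry")
    return [abs(v) / total for v in w]


def weighted_penalty(w: Sequence[float], theta: Sequence[float]) -> float:
    """sum_j w_j**2 / theta_j, with the convention 0 / 0 = 0 for switched-off features."""
    out = 0.0
    for v, t in zip(w, theta):
        if t > 0:
            out += v * v / t
        elif v != 0:
            return float("inf")  # nonzero coefficient with zero weight: infeasible
    return out


w = [3.0, 0.0, -1.0]
theta = optimal_weights(w)  # [0.75, 0.0, 0.25]: the irrelevant feature gets weight 0
```

At the optimum the penalty equals (3 + 0 + 1)**2 = 16, and any other feasible weighting gives a larger value. Features whose coefficients are zero receive zero weight, which is the "shrunk toward zero" behavior the abstract describes.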
Contents lists available at ScienceDirect Pattern Recognition
Journal homepage: www.elsevier.com/locate/pr
Mathematical Programming for . . .
, 2009
Abstract
The primary focus of this work is optimization algorithms for statistical learning tools and, in particular, the development and implementation of large-scale algorithms for sparse principal component analysis (PCA) and kernel optimization. Sparse PCA seeks sparse factors, or linear combinations of the data variables, that explain a maximum amount of variance in the data while having only a limited number of nonzero coefficients. We first enhance a recent first-order algorithm for a semidefinite relaxation of sparse PCA using numerically cheaper approximate gradients, allowing us to work with larger data sets. These results are applied to some classic clustering and feature selection problems arising in biology. We next examine classification problems, specifically using support vector machines (SVM), which depend heavily on the choice of an input kernel matrix. Kernel learning seeks to improve classification performance by minimizing an upper bound on test error over a set of kernel matrices. Current classification methods, such as SVM, require positive semidefinite kernel matrices. We first address this limitation using kernel learning to incorporate indefinite kernels into SVM, and describe
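To show what "sparse factors ... explaining a maximum amount of variance ... with only a limited number of nonzero coefficients" means in practice, the sketch below uses a simple truncated power iteration: multiply by the covariance matrix, keep the k largest entries in magnitude, and renormalize. This is a swapped-in heuristic chosen for brevity, not the first-order semidefinite-relaxation algorithm described above, and the 4 x 4 covariance matrix is an illustrative assumption.

```python
import math
from typing import List, Sequence


def truncated_power_method(cov: Sequence[Sequence[float]], k: int,
                           iters: int = 200) -> List[float]:
    """Heuristic sparse leading eigenvector: power iteration with hard thresholding to k entries."""
    n = len(cov)
    x = [1.0 / math.sqrt(n)] * n
    for _ in range(iters):
        y = [sum(cov[i][j] * x[j] for j in range(n)) for i in range(n)]
        # Keep only the k entries of largest magnitude (the sparsity constraint).
        keep = set(sorted(range(n), key=lambda i: abs(y[i]), reverse=True)[:k])
        y = [y[i] if i in keep else 0.0 for i in range(n)]
        norm = math.sqrt(sum(v * v for v in y))
        if norm == 0.0:
            break
        x = [v / norm for v in y]
    return x


def explained_variance(cov: Sequence[Sequence[float]], x: Sequence[float]) -> float:
    """Variance x^T C x captured by the unit-norm factor x."""
    n = len(cov)
    return sum(x[i] * cov[i][j] * x[j] for i in range(n) for j in range(n))


# Variables 0 and 1 are strongly correlated; all pairs have some weak correlation.
cov = [[4.0, 1.8, 0.2, 0.2],
       [1.8, 4.0, 0.2, 0.2],
       [0.2, 0.2, 1.0, 0.1],
       [0.2, 0.2, 0.1, 1.0]]
x = truncated_power_method(cov, k=2)
```

With k = 2 the factor uses only variables 0 and 1 and captures a variance of 5.8, whereas the ordinary leading eigenvector of this matrix would put small nonzero loadings on all four variables. Unlike a convex relaxation, this heuristic carries no optimality guarantee.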

