Results 1 - 10
of
11
Multiclass multiple kernel learning
- In ICML. ACM
"... In many applications it is desirable to learn from several kernels. “Multiple kernel learning” (MKL) allows the practitioner to optimize over linear combinations of kernels. By enforcing sparse coefficients, it also generalizes feature selection to kernel selection. We propose MKL for joint feature ..."
Abstract
-
Cited by 26 (3 self)
- Add to MetaCart
In many applications it is desirable to learn from several kernels. “Multiple kernel learning” (MKL) allows the practitioner to optimize over linear combinations of kernels. By enforcing sparse coefficients, it also generalizes feature selection to kernel selection. We propose MKL for joint feature maps. This provides a convenient and principled way for MKL with multiclass problems. In addition, we can exploit the joint feature map to learn kernels on output spaces. We show the equivalence of several different primal formulations including different regularizers. We present several optimization methods, and compare a convex quadratically constrained quadratic program (QCQP) and two semi-infinite linear programs (SILPs) on toy data, showing that the SILPs are faster than the QCQP. We then demonstrate the utility of our method by applying the SILP to three real world datasets. 1.
Optimal kernel selection in kernel Fisher discriminant analysis
- In Proceedings of the Twenty-Third International Conference on Machine Learning
, 2006
"... In Kernel Fisher discriminant analysis (KFDA), we carry out Fisher linear discriminant analysis in a high dimensional feature space defined implicitly by a kernel. The performance of KFDA depends on the choice of the kernel; in this paper, we consider the problem of finding the optimal kernel, over ..."
Abstract
-
Cited by 18 (1 self)
- Add to MetaCart
In Kernel Fisher discriminant analysis (KFDA), we carry out Fisher linear discriminant analysis in a high dimensional feature space defined implicitly by a kernel. The performance of KFDA depends on the choice of the kernel; in this paper, we consider the problem of finding the optimal kernel, over a given convex set of kernels. We show that this optimal kernel selection problem can be reformulated as a tractable convex optimization problem which interior-point methods can solve globally and efficiently. The kernel selection method is demonstrated with some UCI machine learning benchmark examples. 1.
Multi-class Discriminant Kernel Learning via Convex Programming
"... Regularized kernel discriminant analysis (RKDA) performs linear discriminant analysis in the feature space via the kernel trick. Its performance depends on the selection of kernels. In this paper, we consider the problem of multiple kernel learning (MKL) for RKDA, in which the optimal kernel matrix ..."
Abstract
-
Cited by 11 (0 self)
- Add to MetaCart
Regularized kernel discriminant analysis (RKDA) performs linear discriminant analysis in the feature space via the kernel trick. Its performance depends on the selection of kernels. In this paper, we consider the problem of multiple kernel learning (MKL) for RKDA, in which the optimal kernel matrix is obtained as a linear combination of pre-specified kernel matrices. We show that the kernel learning problem in RKDA can be formulated as convex programs. First, we show that this problem can be formulated as a semidefinite program (SDP). Based on the equivalence relationship between RKDA and least square problems in the binary-class case, we propose a convex quadratically constrained quadratic programming (QCQP) formulation for kernel learning in RKDA. A semi-infinite linear programming (SILP) formulation is derived to further improve the efficiency. We extend these formulations to the multi-class case based on a key result established in this paper. That is, the multi-class RKDA kernel learning problem can be decomposed into a set of binary-class kernel learning problems which are constrained to share a common kernel. Based on this decomposition property, SDP formulations are proposed for the multi-class case. Furthermore, it leads naturally to QCQP and SILP formulations. As the performance of RKDA depends on the regularization parameter, we show that this parameter can also be optimized in a joint framework with the kernel. Extensive experiments have been conducted and analyzed, and connections to other algorithms are discussed.
Nonlinear Adaptive Distance Metric Learning for Clustering ABSTRACT
"... A good distance metric is crucial for many data mining tasks. To learn a metric in the unsupervised setting, most metric learning algorithms project observed data to a lowdimensional manifold, where geometric relationships such as pairwise distances are preserved. It can be extended to the nonlinear ..."
Abstract
-
Cited by 5 (1 self)
- Add to MetaCart
A good distance metric is crucial for many data mining tasks. To learn a metric in the unsupervised setting, most metric learning algorithms project observed data to a lowdimensional manifold, where geometric relationships such as pairwise distances are preserved. It can be extended to the nonlinear case by applying the kernel trick, which embeds the data into a feature space by specifying the kernel function that computes the dot products between data points in the feature space. In this paper, we propose a novel unsupervised Nonlinear Adaptive Metric Learning algorithm, called NAML, which performs clustering and distance metric learning simultaneously. NAML first maps the data to a high-dimensional space through a kernel function; then applies a linear projection to find a low-dimensional manifold where the separability of the data is maximized; and finally performs clustering in the low-dimensional space. The performance of NAML depends on the selection of the kernel function and the projection. We show that the joint kernel learning, dimensionality reduction, and clustering can be formulated as a trace maximization problem, which can be solved via an iterative procedure in the EM framework. Experimental results demonstrated the efficacy of the proposed algorithm.
Learning the kernel matrix in discriminant analysis via quadratically constrained quadratic programming
- In Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
, 2007
"... The kernel function plays a central role in kernel methods. In this paper, we consider the automated learning of the kernel matrix over a convex combination of pre-specified kernel matrices in Regularized Kernel Discriminant Analysis (RKDA), which performs linear discriminant analysis in the feature ..."
Abstract
-
Cited by 4 (1 self)
- Add to MetaCart
The kernel function plays a central role in kernel methods. In this paper, we consider the automated learning of the kernel matrix over a convex combination of pre-specified kernel matrices in Regularized Kernel Discriminant Analysis (RKDA), which performs linear discriminant analysis in the feature space via the kernel trick. Previous studies have shown that this kernel learning problem can be formulated as a semidefinite program (SDP), which is however computationally expensive, even with the recent advances in interior point methods. Based on the equivalence relationship between RKDA and least square problems in the binary-class case, we propose a Quadratically Constrained Quadratic Programming (QCQP) formulation for the kernel learning problem, which can be solved more efficiently than SDP. While most existing work on kernel learning deal with binary-class problems only, we show that our QCQP formulation can be extended naturally to the multi-class case. Experimental results on both binary-class and multiclass benchmark data sets show the efficacy of the proposed QCQP formulations.
SimpleMKL
, 2008
"... Multiple kernel learning (MKL) aims at simultaneously learning a kernel and the associated predictor in supervised learning settings. For the support vector machine, an efficient and general multiple kernel learning algorithm, based on semi-infinite linear programming, has been recently proposed. Th ..."
Abstract
-
Cited by 4 (0 self)
- Add to MetaCart
Multiple kernel learning (MKL) aims at simultaneously learning a kernel and the associated predictor in supervised learning settings. For the support vector machine, an efficient and general multiple kernel learning algorithm, based on semi-infinite linear programming, has been recently proposed. This approach has opened new perspectives since it makes MKL tractable for large-scale problems, by iteratively using existing support vector machine code. However, it turns out that this iterative algorithm needs numerous iterations for converging towards a reasonable solution. In this paper, we address the MKL problem through a weighted 2-norm regularization formulation with an additional constraint on the weights that encourages sparse kernel combinations. Apart from learning the combination, we solve a standard SVM optimization problem, where the kernel is defined as a linear combination of multiple kernels. We propose an algorithm, named SimpleMKL, for solving this MKL problem and provide a new insight on MKL algorithms based on mixed-norm regularization by showing that the two approaches are equivalent. We show how SimpleMKL can be applied beyond binary classification, for problems like regression, clustering (one-class classification) or multiclass classification. Experimental results show that the proposed algorithm converges
Feature Selection and Kernel Design via Linear Programming
"... The definition of object (e.g., data point) similarity is critical to the performance of many machine learning algorithms, both in terms of accuracy and computational efficiency. However, it is often the case that a similarity function is unknown or chosen by hand. This paper introduces a formulatio ..."
Abstract
-
Cited by 2 (0 self)
- Add to MetaCart
The definition of object (e.g., data point) similarity is critical to the performance of many machine learning algorithms, both in terms of accuracy and computational efficiency. However, it is often the case that a similarity function is unknown or chosen by hand. This paper introduces a formulation that given relative similarity comparisons among triples of points of the form object i is more like object j than object k, it constructs a kernel function that preserves the given relationships. Our approach is based on learning a kernel that is a combination of functions taken from a set of base functions (these could be kernels as well). The formulation is based on defining an optimization problem that can be
Multi-Class Classifiers and Their Underlying Shared Structure
"... Multi-class problems have a richer structure than binary classification problems. Thus, they can potentially improve their performance by exploiting the relationship among class labels. While for the purposes of providing an automated classification result this class structure does not need to be ex ..."
Abstract
- Add to MetaCart
Multi-class problems have a richer structure than binary classification problems. Thus, they can potentially improve their performance by exploiting the relationship among class labels. While for the purposes of providing an automated classification result this class structure does not need to be explicitly unveiled, for human level analysis or interpretation this is valuable. We develop a multi-class large margin classifier that extracts and takes advantage of class relationships. We provide a bi-convex formulation that explicitly learns a matrix that captures these class relationships and is de-coupled from the feature weights. Our representation can take advantage of the class structure to compress the model by reducing the number of classifiers employed, maintaining high accuracy even with large compression. In addition, we present an efficient formulation in terms of speed and memory. 1
Discernibility Concept in Classification Problems
"... The main idea behind this project is that the pattern classification process can be enhanced by taking into account the geometry of class structure in datasets of interest. In contrast to previous work in the literature, this research not only develops a measure of discernibility of individual patte ..."
Abstract
- Add to MetaCart
The main idea behind this project is that the pattern classification process can be enhanced by taking into account the geometry of class structure in datasets of interest. In contrast to previous work in the literature, this research not only develops a measure of discernibility of individual patterns but also consistently applies it to various stages of the classification process. The applications of the discernibility concept cover a wide range of issues from pre-processing to the actual classification and beyond that. Specifically, we apply it for: (a) finding feature subsets of similar classification quality (applicable in diverse ensembles), (b) feature selection, (c) data reduction, (d) reject option, and (e) enhancing the k-NN classifier. Also, a number of auxiliary algorithms and measures are developed to facilitate the proposed methodology. Experiments have been carried out using datasets of the University of California at Irvine (UCI) repository. The experiments provide numerical evidence that the proposed approach does improve the performance of various classifiers. This, together with its simplicity renders it a novel,
Discriminant Analysis for Dimensionality Reduction: An Overview of Recent Developments
"... Many biometric applications such as face recognition involve data with a large number of features [1–3]. Analysis of such data is challenging due to the curse-ofdimensionality [4, 5], which states that an enormous number of samples are required to perform accurate predictions on problems with a high ..."
Abstract
- Add to MetaCart
Many biometric applications such as face recognition involve data with a large number of features [1–3]. Analysis of such data is challenging due to the curse-ofdimensionality [4, 5], which states that an enormous number of samples are required to perform accurate predictions on problems with a high dimensionality. Dimensionality

