Learning the Kernel Matrix with Semi-Definite Programming
2002
Cited by 368 (16 self)
Abstract
Kernel-based learning algorithms work by embedding the data into a Euclidean space, and then searching for linear relations among the embedded data points. The embedding is performed implicitly, by specifying the inner products between each pair of points in the embedding space. This information is contained in the so-called kernel matrix, a symmetric and positive semidefinite matrix that encodes the relative positions of all points. Specifying this matrix amounts to specifying the geometry of the embedding space and inducing a notion of similarity in the input space, both classical model selection problems in machine learning. In this paper we show how the kernel matrix can be learned from data via semi-definite programming (SDP) techniques. When applied ...
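To make the object concrete: a kernel matrix is a Gram matrix K[i][j] = k(x_i, x_j). The sketch below builds one with a Gaussian (RBF) kernel, an illustrative choice not taken from the paper, and checks symmetry and positive semidefiniteness (the properties every valid kernel matrix has) by attempting a Cholesky factorization with a small diagonal jitter. It does not implement the paper's SDP-based learning of K; all function names are hypothetical.

```python
import math

def rbf_kernel_matrix(points, gamma=1.0):
    """Gram matrix K[i][j] = exp(-gamma * ||x_i - x_j||^2) for a list of points."""
    n = len(points)
    K = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            sq_dist = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
            K[i][j] = math.exp(-gamma * sq_dist)
    return K

def is_symmetric(K, tol=1e-12):
    n = len(K)
    return all(abs(K[i][j] - K[j][i]) <= tol for i in range(n) for j in range(n))

def is_psd(K, jitter=1e-10):
    """Numerical PSD test: succeeds iff K + jitter * I admits a Cholesky
    factorization (up to rounding), i.e. K is PSD up to the jitter."""
    n = len(K)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = K[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                s += jitter
                if s <= 0.0:  # nonpositive pivot: K has a (numerically) negative eigenvalue
                    return False
                L[i][i] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    return True
```

An RBF Gram matrix on distinct points passes both checks, while a symmetric matrix such as [[1, 2], [2, 1]] (eigenvalues 3 and -1) fails the PSD test.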
Multiple kernel learning, conic duality, and the SMO algorithm
In Proceedings of the 21st International Conference on Machine Learning (ICML), 2004
Cited by 168 (19 self)
Abstract
While classical kernel-based classifiers are based on a single kernel, in practice it is often desirable to base classifiers on combinations of multiple kernels. Lanckriet et al. (2004) considered conic combinations of kernel matrices for the support vector machine (SVM), and showed that the optimization of the coefficients of such a combination reduces to a convex optimization problem known as a quadratically constrained quadratic program (QCQP). Unfortunately, current convex optimization toolboxes can solve this problem only for a small number of kernels and a small number of data points; moreover, the sequential minimal optimization (SMO) techniques that are essential in large-scale implementations of the SVM cannot be applied because the cost function is non-differentiable. We propose a novel dual formulation of the QCQP as a second-order cone programming problem, and show how to exploit the technique of Moreau-Yosida regularization to yield a formulation to which SMO techniques can be applied. We present experimental results that show that our SMO-based algorithm is significantly more efficient than the general-purpose interior point methods available in current optimization toolboxes.
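The building block here is a conic (nonnegative) combination K = sum_k mu_k K_k of fixed kernel matrices. A minimal sketch, assuming plain nested lists and an optional trace normalization trace(K) = c of the kind often imposed in multiple kernel learning formulations: since v^T K v = sum_k mu_k v^T K_k v, nonnegative weights keep K positive semidefinite. Optimizing the weights (the QCQP/SOCP and SMO machinery of the paper) is not shown; the function names are hypothetical.

```python
def conic_combination(kernels, weights, trace=None):
    """Return K = sum_k mu_k * K_k for nonnegative weights mu_k.
    If `trace` is given, rescale K so that trace(K) == trace."""
    if len(kernels) != len(weights):
        raise ValueError("one weight per kernel matrix is required")
    if any(mu < 0 for mu in weights):
        raise ValueError("a conic combination requires nonnegative weights")
    n = len(kernels[0])
    K = [[sum(mu * Kk[i][j] for mu, Kk in zip(weights, kernels)) for j in range(n)]
         for i in range(n)]
    if trace is not None:
        current = sum(K[i][i] for i in range(n))  # assumed nonzero
        K = [[v * trace / current for v in row] for row in K]
    return K

def quad_form(K, v):
    """v^T K v; nonnegative for every v exactly when K is PSD."""
    return sum(v[i] * K[i][j] * v[j] for i in range(len(v)) for j in range(len(v)))
```

For example, combining [[1, 0], [0, 0]] and [[1, 1], [1, 1]] with weights 0.5 and 2.0 gives [[2.5, 2.0], [2.0, 2.0]], which is again PSD.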
A robust minimax approach to classification
Journal of Machine Learning Research, 2002
Cited by 47 (6 self)
Abstract
When constructing a classifier, the probability of correct classification of future data points should be maximized. We consider a binary classification problem where the mean and covariance matrix of each class are assumed to be known. No further assumptions are made with respect to the class-conditional distributions. Misclassification probabilities are then controlled in a worst-case setting: that is, under all possible choices of class-conditional densities with given mean and covariance matrix, we minimize the worst-case (maximum) probability of misclassification of future data points. For a linear decision boundary, this desideratum is translated in a very direct way into a (convex) second-order cone optimization problem, with complexity similar to a support vector machine problem. The minimax problem can be interpreted geometrically as minimizing the maximum of the Mahalanobis distances to the two classes. We address the issue of robustness with respect to estimation errors (in the means and covariances of the ...
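For a fixed direction a, the worst-case bound has a closed form: inf Pr{a^T x >= b} >= alpha holds over all distributions with mean x_bar and covariance Sigma_x exactly when a^T x_bar - b >= kappa(alpha) * sqrt(a^T Sigma_x a), with kappa(alpha) = sqrt(alpha / (1 - alpha)). Normalizing a^T (x_bar - y_bar) = 1 then gives kappa = 1 / (sqrt(a^T Sigma_x a) + sqrt(a^T Sigma_y a)), alpha = kappa^2 / (1 + kappa^2) and b = a^T x_bar - kappa * sqrt(a^T Sigma_x a). The sketch below only evaluates a given direction; choosing the best a is the second-order cone problem mentioned above and is not solved here. Function names are hypothetical.

```python
import math

def quad(a, S):
    """a^T S a for a vector a and a square matrix S given as nested lists."""
    return sum(a[i] * S[i][j] * a[j] for i in range(len(a)) for j in range(len(a)))

def mpm_for_direction(a, mean_x, cov_x, mean_y, cov_y):
    """Evaluate the minimax bound for a fixed direction a.
    Returns (alpha, a_normalized, b): class x is predicted when a_normalized^T z >= b,
    and alpha lower-bounds the probability of correct classification of each class
    for every distribution with the given means and covariances."""
    gap = sum(ai * (mx - my) for ai, mx, my in zip(a, mean_x, mean_y))
    if gap <= 0:
        raise ValueError("direction must satisfy a^T (mean_x - mean_y) > 0")
    a = [ai / gap for ai in a]       # normalize so that a^T (mean_x - mean_y) = 1
    sx = math.sqrt(quad(a, cov_x))   # sqrt(a^T Sigma_x a)
    sy = math.sqrt(quad(a, cov_y))   # sqrt(a^T Sigma_y a)
    kappa = 1.0 / (sx + sy)
    alpha = kappa ** 2 / (1.0 + kappa ** 2)
    b = sum(ai * m for ai, m in zip(a, mean_x)) - kappa * sx
    return alpha, a, b
```

For two unit-covariance classes centered at (1, 0) and (-1, 0), the direction (1, 0) yields alpha = 0.5 and b = 0, while the direction (1, 1) yields the smaller guarantee alpha = 1/3.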
KNITRO: An integrated package for nonlinear optimization
Large Scale Nonlinear Optimization, pp. 35–59, 2006
Cited by 19 (3 self)
Abstract
This paper describes Knitro 5.0, a C package for nonlinear optimization that combines complementary approaches to nonlinear optimization to achieve robust performance over a wide range of application requirements. The package is designed for solving large-scale, smooth nonlinear programming problems, and it is also effective for the following special cases: unconstrained optimization, nonlinear systems of equations, least squares, and linear and quadratic programming. Various algorithmic options are available, including two interior methods and an active-set method. The package provides crossover techniques between algorithmic options as well as automatic selection of options and settings.
Extracting Shared Subspace for Multi-label Classification
Cited by 17 (1 self)
Abstract
Multi-label problems arise in various domains such as multitopic document categorization and protein function prediction. One natural way to deal with such problems is to construct a binary classifier for each label, resulting in a set of independent binary classification problems. Since the multiple labels share the same input space, and the semantics conveyed by different labels are usually correlated, it is essential to exploit the correlation information contained in different labels. In this paper, we consider a general framework for extracting shared structures in multi-label classification. In this framework, a common subspace is assumed to be shared among multiple labels. We show that the optimal solution to the proposed formulation can be obtained by solving a generalized eigenvalue problem, though the problem is nonconvex. For high-dimensional problems, direct computation of the solution is expensive, and we develop an efficient algorithm for this case. One appealing feature of the proposed framework is that it includes several well-known algorithms as special cases, thus elucidating their intrinsic relationships. We have conducted extensive experiments on eleven multitopic web page categorization tasks, and results demonstrate the effectiveness of the proposed formulation in comparison with several representative algorithms.
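The computational core named above is a generalized eigenvalue problem A v = lambda B v. The matrices of the paper's formulation are not given in this abstract, so the sketch below only illustrates that step on a 2 x 2 symmetric-definite pencil (A symmetric, B symmetric positive definite), whose eigenvalues are the roots of the quadratic det(A - lambda B) = 0. For realistic sizes one would use a library routine such as scipy.linalg.eigh(A, B) instead; gen_eig_2x2 is a hypothetical helper.

```python
import math

def gen_eig_2x2(A, B):
    """Eigenvalues of A v = lambda B v for symmetric 2x2 A and symmetric positive
    definite 2x2 B, as the roots of det(A - lambda B) = 0, in ascending order."""
    (a11, a12), (_, a22) = A
    (b11, b12), (_, b22) = B
    # det(A - lambda B) = qa*lambda^2 + qb*lambda + qc
    qa = b11 * b22 - b12 ** 2                   # det(B) > 0 for positive definite B
    qb = -(a11 * b22 + a22 * b11 - 2 * a12 * b12)
    qc = a11 * a22 - a12 ** 2                   # det(A)
    disc = qb ** 2 - 4 * qa * qc                # nonnegative for symmetric-definite pencils
    root = math.sqrt(max(disc, 0.0))            # clamp tiny negative rounding errors
    return ((-qb - root) / (2 * qa), (-qb + root) / (2 * qa))
```

For such pencils the largest root equals the maximum of the generalized Rayleigh quotient v^T A v / v^T B v, which is why the leading generalized eigenvectors are the natural candidates when a subspace is chosen to maximize such a ratio.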
Interior Point Methods: Current Status and Future Directions
1997
Cited by 11 (0 self)
Abstract
This article provides a synopsis of the major developments in interior point methods for mathematical programming in the last thirteen years, and discusses current and future research directions in interior point methods, with a brief selective guide to the research literature.
AMS Subject Classification: 90C, 90C05, 90C60
Keywords: Linear Programming, Newton's Method, Interior Point Methods, Barrier Method, Semidefinite Programming, Self-Concordance, Convex Programming, Condition Numbers
An earlier version of this article appeared in OPTIMA (Mathematical Programming Society Newsletter), No. 51, 1996.
Affiliations: M.I.T. Sloan School of Management, Building E40-149A, Cambridge, MA 02139, USA (e-mail: rfreund@mit.edu); The Institute of Statistical Mathematics, 4-6-7 Minami-Azabu, Minato-ku, Tokyo 106, Japan (e-mail: mizuno@ism.ac.jp).
1 Introduction and Synopsis
The purpose of this article is twofold: to provide a synopsis of the major developments in ...
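One of the methods listed in the keywords is the barrier method. The sketch below is a textbook log-barrier (path-following) method on a deliberately simple problem, minimize c^T x subject to l <= x <= u, chosen because it is separable, so each centering step is a one-dimensional Newton iteration. It takes damped Newton steps of length 1/(1 + lam), where lam is the Newton decrement, which keeps iterates strictly feasible for this self-concordant barrier, and it stops once the standard central-path suboptimality bound m/t falls below eps (m = number of inequality constraints). It illustrates the idea under these assumptions and is not an algorithm taken from the survey; the function name is hypothetical.

```python
import math

def barrier_box_lp(c, lower, upper, t0=1.0, mu=10.0, eps=1e-8, newton_tol=1e-10):
    """Minimize c^T x subject to lower <= x <= upper (lower < upper) with a basic
    log-barrier method. Returns (x, t); up to centering accuracy, the objective
    is within m/t < eps of the optimum."""
    x = [(l + u) / 2.0 for l, u in zip(lower, upper)]  # strictly feasible starting point
    m = 2 * len(c)                                      # number of inequality constraints
    t = t0
    while True:
        # Centering: minimize t*c_j*x_j - log(x_j - l_j) - log(u_j - x_j) for each j.
        # The problem is separable, so Newton's method runs one coordinate at a time.
        for j, (cj, l, u) in enumerate(zip(c, lower, upper)):
            xj = x[j]
            for _ in range(200):
                g = t * cj - 1.0 / (xj - l) + 1.0 / (u - xj)   # first derivative
                h = 1.0 / (xj - l) ** 2 + 1.0 / (u - xj) ** 2  # second derivative
                lam = abs(g) / math.sqrt(h)                     # Newton decrement
                if lam * lam / 2.0 <= newton_tol:
                    break
                # Damped Newton step of length 1/(1 + lam): stays strictly inside (l, u).
                xj -= (g / h) / (1.0 + lam)
            x[j] = xj
        if m / t < eps:  # central-path suboptimality bound
            return x, t
        t *= mu
```

For c = (1, -2) on the box [0, 1] x [0, 3], the iterates approach the optimal vertex (0, 3) from the interior.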
Multi-class Discriminant Kernel Learning via Convex Programming
Cited by 11 (0 self)
Abstract
Regularized kernel discriminant analysis (RKDA) performs linear discriminant analysis in the feature space via the kernel trick. Its performance depends on the selection of kernels. In this paper, we consider the problem of multiple kernel learning (MKL) for RKDA, in which the optimal kernel matrix is obtained as a linear combination of pre-specified kernel matrices. We show that the kernel learning problem in RKDA can be formulated as convex programs. First, we show that this problem can be formulated as a semidefinite program (SDP). Based on the equivalence relationship between RKDA and least squares problems in the binary-class case, we propose a convex quadratically constrained quadratic programming (QCQP) formulation for kernel learning in RKDA. A semi-infinite linear programming (SILP) formulation is derived to further improve the efficiency. We extend these formulations to the multi-class case based on a key result established in this paper, namely that the multi-class RKDA kernel learning problem can be decomposed into a set of binary-class kernel learning problems which are constrained to share a common kernel. Based on this decomposition property, SDP formulations are proposed for the multi-class case; the decomposition also leads naturally to QCQP and SILP formulations. As the performance of RKDA depends on the regularization parameter, we show that this parameter can also be optimized in a joint framework with the kernel. Extensive experiments have been conducted and analyzed, and connections to other algorithms are discussed.
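The binary-class link to least squares suggests the following sketch: with the kernel weights theta_i held fixed, fit regularized least squares in the combined feature space by solving (K + lam*I) alpha = y with K = sum_i theta_i K_i. The +1/-1 target coding, the fixed weights and the helper names are assumptions made for illustration (the exact target coding used in the equivalence may differ); in the paper the weights, and the regularization parameter, are optimized through the SDP, QCQP or SILP formulations, which are not reproduced here.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (A square, nonsingular)."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented matrix [A | b]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def kernel_rls(kernels, theta, y, lam):
    """Regularized least squares with the combined kernel K = sum_i theta_i K_i.
    Returns the dual coefficients alpha and the fitted values K alpha."""
    n = len(y)
    K = [[sum(t * Ki[r][s] for t, Ki in zip(theta, kernels)) for s in range(n)]
         for r in range(n)]
    A = [[K[r][s] + (lam if r == s else 0.0) for s in range(n)] for r in range(n)]
    alpha = solve(A, y)
    fitted = [sum(K[r][s] * alpha[s] for s in range(n)) for r in range(n)]
    return alpha, fitted
```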
Lean clause-sets: Generalizations of minimally unsatisfiable clause-sets
Discrete Applied Mathematics, 2000
Cited by 11 (7 self)
Abstract
We study the problem of (efficiently) deleting those clauses from conjunctive normal forms (clause-sets) that cannot contribute to any proof of unsatisfiability. For that purpose we introduce the notion of an autarky system, which associates with every clause-set a canonical normal form obtained by deleting superfluous clauses. Clause-sets from which no clauses can be deleted are called lean; they form a natural generalization of minimally unsatisfiable clause-sets, open the possibility for combinatorial approaches, and include satisfiable instances as well. Three special examples of autarky systems are considered: general autarkies, linear autarkies (based on linear programming) and matching autarkies (based on matching theory). We give new characterizations of lean and linearly lean clause-sets by "universal linear programming problems," while matching lean clause-sets are characterized in terms of "deficiency," the difference between the number of clauses and the number of variables, and ...
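A (general) autarky is a partial assignment that satisfies every clause in which it assigns at least one variable; deleting the clauses it touches preserves satisfiability, and a clause-set is lean when it has no nontrivial autarky. The sketch below, assuming DIMACS-style integer literals and hypothetical function names, checks and applies an autarky, plus an exponential brute-force lean test meant only for tiny inputs. Linear and matching autarkies and the paper's linear programming characterizations are not covered.

```python
from itertools import product

def is_autarky(clauses, assignment):
    """True if the partial assignment (dict var -> bool) satisfies every clause that
    contains at least one assigned variable. Clauses are lists of nonzero ints."""
    def satisfied(clause):
        return any(abs(l) in assignment and assignment[abs(l)] == (l > 0) for l in clause)
    touched = [c for c in clauses if any(abs(l) in assignment for l in c)]
    return all(satisfied(c) for c in touched)

def apply_autarky(clauses, assignment):
    """Delete the clauses touched by an autarky; the result is satisfiable iff the input is."""
    if not is_autarky(clauses, assignment):
        raise ValueError("assignment is not an autarky")
    return [c for c in clauses if not any(abs(l) in assignment for l in c)]

def is_lean(clauses):
    """Brute force over all 3^n partial assignments (tiny inputs only):
    lean iff no nonempty partial assignment over the clause-set's variables is an autarky."""
    variables = sorted({abs(l) for c in clauses for l in c})
    for values in product((None, False, True), repeat=len(variables)):
        assignment = {v: b for v, b in zip(variables, values) if b is not None}
        if assignment and is_autarky(clauses, assignment):
            return False
    return True
```

For example, the minimally unsatisfiable clause-set [[1, 2], [-1, 2], [1, -2], [-1, -2]] is lean, whereas [[1, 2], [-2, 3], [4, -5], [5, -4]] has the autarky {3: True}, which removes the clause [-2, 3].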
Lagrangian Heuristics Based on Disaggregated Bundle Methods for Hydrothermal Unit Commitment
IEEE Transactions on Power Systems, 2003
Cited by 8 (6 self)
Abstract
The paper presents a simple and effective Lagrangian relaxation approach for the solution of the optimal short-term unit commitment problem in hydrothermal power-generation systems. The proposed approach, based on a disaggregated Bundle method for the solution of the dual problem, with a new warm-starting procedure, achieves accurate solutions in few iterations. The adoption of a disaggregated Bundle method not only improves the convergence of the proposed approach but also provides information that is suitably exploited for generating a feasible solution of the primal problem and for obtaining an optimal hydro scheduling. A comparison between the proposed Lagrangian approach and other approaches, based on sub-gradient and Bundle methods, is presented for a simple yet reasonable formulation of the Hydrothermal Unit Commitment problem.
Index Terms: Power generation operation, Hydrothermal unit ...
Nomenclature (partial, recovered from the extracted text): R_t: spinning reserves in each period t. u: I-rows T-columns matrix, whose rows are the T-dimensional arrays u_i of the 0-1 variables u_{i,t} indicating the commitment state of thermal unit i during period t. p_I: I-rows T-columns matrix, whose rows are the T-dimensional arrays p_i of production levels p_{i,t} of thermal unit i during each period t. p_H: H-rows T-columns matrix, whose rows are the T-dimensional arrays p_h of production ...
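To show what relaxing the coupling constraints and solving the dual means here, the sketch below relaxes the demand constraint sum_i p_{i,t} >= D_t of a toy thermal-only unit commitment with multipliers lam_t >= 0, so that every unit and period decouples, and maximizes the dual with a plain projected subgradient method. That is one of the baselines the abstract compares against, not the disaggregated Bundle method itself. Hydro units, spinning reserves and intertemporal constraints are omitted, and all data and names are hypothetical. A brute-force primal solution checks weak duality: the best dual value never exceeds the primal optimum.

```python
from itertools import combinations

# Hypothetical toy data: (fixed commitment cost F_i, marginal cost c_i, capacity P_i).
UNITS = [(10.0, 2.0, 50.0), (5.0, 3.0, 30.0), (0.0, 5.0, 40.0)]
DEMAND = [40.0, 70.0]  # D_t for each period t

def unit_response(lam_t, unit):
    """Minimize F*u + (c - lam_t)*p over u in {0, 1}, 0 <= p <= P*u; returns (value, p)."""
    F, c, P = unit
    p = P if c < lam_t else 0.0  # once committed, produce at capacity only if profitable
    value = F + (c - lam_t) * p
    return (value, p) if value < 0.0 else (0.0, 0.0)  # otherwise stay off

def dual_value(lam):
    """Lagrangian dual q(lam) and a subgradient D_t - sum_i p_{i,t} for each period."""
    q, grad = 0.0, []
    for lam_t, d_t in zip(lam, DEMAND):
        responses = [unit_response(lam_t, u) for u in UNITS]
        q += lam_t * d_t + sum(v for v, _ in responses)
        grad.append(d_t - sum(p for _, p in responses))
    return q, grad

def subgradient(iterations=500, step0=0.05):
    """Projected subgradient ascent on q; returns the best dual value and the last lam."""
    lam, best = [0.0] * len(DEMAND), float("-inf")
    for k in range(iterations):
        q, g = dual_value(lam)
        best = max(best, q)
        step = step0 / (k + 1) ** 0.5  # diminishing step size
        lam = [max(0.0, l + step * gt) for l, gt in zip(lam, g)]  # project onto lam >= 0
    return best, lam

def primal_optimum():
    """Brute force: per period, try every set of committed units, dispatch in merit order."""
    total = 0.0
    for d_t in DEMAND:
        best_t = float("inf")
        for r in range(1, len(UNITS) + 1):
            for subset in combinations(UNITS, r):
                if sum(P for _, _, P in subset) < d_t:
                    continue  # not enough capacity to meet demand
                cost, remaining = sum(F for F, _, _ in subset), d_t
                for F, c, P in sorted(subset, key=lambda unit: unit[1]):
                    p = min(P, remaining)
                    cost += c * p
                    remaining -= p
                best_t = min(best_t, cost)
        total += best_t
    return total
```

Because the units and periods decouple once demand is priced by lam_t, each dual evaluation costs one pass over units and periods; the gap between the best dual value and the primal optimum is what Lagrangian heuristics then try to close with a feasible schedule.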
Robust classification with interval data
2003
Cited by 8 (0 self)
Abstract
We consider a binary, linear classification problem in which the data points are assumed to be unknown, but bounded within given hyper-rectangles, i.e., the covariates are bounded within intervals explicitly given for each data point separately. We address the problem of designing a robust classifier in this setting by minimizing the worst-case value of a given loss function, over all possible choices of the data in these multi-dimensional intervals. We examine in detail the application of this methodology to three specific loss functions, arising in support vector machines, in logistic regression and in minimax probability machines. We show that in each case, the resulting problem is amenable to efficient interior-point algorithms for convex optimization. The methods tend to produce sparse classifiers, i.e., they induce many zero coefficients in the resulting weight vectors, and we provide some theoretical grounds for this property. After presenting possible extensions of this framework to handle label errors and other uncertainty models, we discuss in some detail our implementation, which exploits the potential sparsity, or a more general property referred to as regularity, of the input matrices.
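For the support vector machine (hinge) loss, the worst case over a hyper-rectangle has a closed form: if each covariate x_j may move by at most rho_j around x_hat_j, the adversary lowers the margin y(w^T x + b) by exactly sum_j rho_j |w_j|. A minimal sketch with hypothetical names computes this robust hinge loss and checks it against enumeration of the box vertices. The penalty is a weighted l1 norm of w, which is one intuitive way to see why such robust formulations tend to yield sparse weight vectors; the paper's own argument and its logistic-regression and minimax-probability-machine cases are not reproduced.

```python
from itertools import product

def robust_hinge(w, b, x_hat, rho, y):
    """Worst-case hinge loss max(0, 1 - y*(w.x + b)) over all x with |x_j - x_hat_j| <= rho_j."""
    margin = y * (sum(wj * xj for wj, xj in zip(w, x_hat)) + b)
    penalty = sum(r * abs(wj) for r, wj in zip(rho, w))  # largest possible margin reduction
    return max(0.0, 1.0 - margin + penalty)

def brute_force_hinge(w, b, x_hat, rho, y):
    """Check by enumerating the 2^n box vertices (the loss is convex in x,
    so its maximum over the box is attained at a vertex)."""
    worst = 0.0
    for signs in product((-1.0, 1.0), repeat=len(x_hat)):
        x = [xj + s * r for xj, s, r in zip(x_hat, signs, rho)]
        margin = y * (sum(wj * xj for wj, xj in zip(w, x)) + b)
        worst = max(worst, 1.0 - margin)
    return worst
```

With w = (1, -2), b = 0.5, x_hat = (0.3, 0.1), rho = (0.2, 0.05) and y = 1, the nominal hinge loss is 0.4 while the robust one is 0.7.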

