Results 1–10 of 62
Second Order Cone Programming Approaches for Handling Missing and Uncertain Data
Journal of Machine Learning Research, 2006
Cited by 33 (9 self)
We propose a novel second-order cone programming formulation for designing robust classifiers which can handle uncertainty in observations. Similar formulations are also derived for designing regression functions which are robust to uncertainties in the regression setting. The proposed formulations are independent of the underlying distribution, requiring only the existence of second-order moments. These formulations are then specialized to the case of missing values in observations for both classification and regression problems. Experiments show that the proposed formulations outperform imputation.
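The distribution-free reduction described above can be made concrete: with only the mean and covariance of an observation known, the multivariate Chebyshev inequality turns the chance constraint Prob(y(w·x − b) ≥ 0) ≥ η into the second-order cone constraint y(w·μ − b) ≥ κ·√(wᵀΣw), with κ = √(η/(1−η)). A minimal sketch of checking that constraint (function name and numbers are illustrative, not from the paper):

```python
import numpy as np

def robust_margin_constraint(w, b, mu, Sigma, y, eta=0.9):
    """Check the second-order cone constraint that replaces the chance
    constraint Prob(y * (w @ x - b) >= 0) >= eta when only the mean mu
    and covariance Sigma of x are known (multivariate Chebyshev bound)."""
    kappa = np.sqrt(eta / (1.0 - eta))   # kappa = sqrt(eta / (1 - eta))
    margin = y * (w @ mu - b)            # signed margin at the class mean
    spread = np.sqrt(w @ Sigma @ w)      # std. dev. of w @ x
    return margin >= kappa * spread

# A mean far from the hyperplane with small spread satisfies the constraint;
# inflating the covariance violates it.
w, mu = np.array([1.0, 0.0]), np.array([5.0, 0.0])
print(robust_margin_constraint(w, 0.0, mu, 0.1 * np.eye(2), y=+1))   # True
print(robust_margin_constraint(w, 0.0, mu, 10.0 * np.eye(2), y=+1))  # False
```

The full formulations in the paper optimize w and b subject to such cone constraints; the sketch only evaluates one.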
Fast asymmetric learning for cascade face detection
IEEE Transactions on Pattern Analysis and Machine Intelligence 30 (2008) 369
Cited by 31 (3 self)
A cascade face detector uses a sequence of node classifiers to distinguish faces from non-faces. This paper presents a new approach to designing node classifiers in the cascade detector. Previous methods used machine learning algorithms that simultaneously select features and form ensemble classifiers. We argue that if these two parts are decoupled, we gain the freedom to design a classifier that explicitly addresses the difficulties caused by the asymmetric learning goal. This paper makes three contributions. The first is a categorization of the asymmetries in the learning goal and an account of why they make face detection hard. The second is the Forward Feature Selection (FFS) algorithm and a fast caching strategy for AdaBoost; FFS and the fast AdaBoost reduce training time by approximately 100 and 50 times, respectively, compared to a naive implementation of the AdaBoost feature selection method. The last contribution is the Linear Asymmetric Classifier (LAC), a classifier that explicitly handles the asymmetric learning goal as a well-defined constrained optimization problem. We demonstrate experimentally that LAC results in improved ensemble classifier performance.
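The core FFS idea, decoupling feature selection from classifier formation and caching each feature's predictions once up front, can be sketched roughly as follows (a toy majority-vote variant; the names and the voting rule are simplifications, not the paper's exact algorithm):

```python
import numpy as np

def forward_feature_selection(preds, labels, n_select):
    """Greedy forward selection of weak classifiers.  `preds` is an
    (n_features, n_samples) array of cached {0, 1} predictions computed
    once up front -- this caching is what makes repeated evaluation cheap.
    Each round adds the feature that minimizes the majority-vote
    ensemble error on the training labels."""
    n_feat, _ = preds.shape
    selected = []
    votes = np.zeros(preds.shape[1])
    for _ in range(n_select):
        best_f, best_err = None, np.inf
        for f in range(n_feat):
            if f in selected:
                continue
            new_votes = votes + preds[f]
            # majority vote among the len(selected) + 1 classifiers
            ensemble = (2 * new_votes > len(selected) + 1).astype(int)
            err = np.mean(ensemble != labels)
            if err < best_err:
                best_f, best_err = f, err
        selected.append(best_f)
        votes += preds[best_f]
    return selected

# Feature 0 alone classifies the toy labels perfectly, so it is picked first:
preds = np.array([[1, 0, 1, 0], [1, 1, 1, 1], [0, 0, 0, 0]])
labels = np.array([1, 0, 1, 0])
print(forward_feature_selection(preds, labels, 1))  # [0]
```

The speedup reported in the abstract comes from evaluating only the cached prediction table in the inner loop rather than re-applying every feature to every sample.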
A Direct Method for Building Sparse Kernel Learning Algorithms
Journal of Machine Learning Research, 2006
Cited by 25 (0 self)
Many kernel learning algorithms, including support vector machines, result in a kernel machine, such as a kernel classifier, whose key component is a weight vector in a feature space implicitly introduced by a positive definite kernel function. This weight vector is usually obtained by solving a convex optimization problem. Based on this fact, we present a direct method to build sparse kernel learning algorithms by adding one more constraint to the original convex optimization problem, such that the sparseness of the resulting kernel machine is explicitly controlled while performance is kept as high as possible. A gradient-based approach is provided to solve this modified optimization problem. Applying ...
Theory and applications of Robust Optimization
2007
Cited by 24 (5 self)
In this paper, we survey the primary research, both theoretical and applied, in the field of Robust Optimization (RO). Our focus will be on the computational attractiveness of RO approaches, as well as the modeling power and broad applicability of the methodology. In addition to surveying the most prominent theoretical results of RO over the past decade, we will also present some recent results linking RO to adaptable models for multi-stage decision-making problems. Finally, we will highlight successful applications of RO across a wide spectrum of domains, including, but not limited to, finance, statistics, learning, and engineering.
Multiclass Discriminant Kernel Learning via Convex Programming
Cited by 22 (0 self)
Regularized kernel discriminant analysis (RKDA) performs linear discriminant analysis in the feature space via the kernel trick. Its performance depends on the selection of kernels. In this paper, we consider the problem of multiple kernel learning (MKL) for RKDA, in which the optimal kernel matrix is obtained as a linear combination of pre-specified kernel matrices. We show that the kernel learning problem in RKDA can be formulated as a convex program. First, we show that this problem can be formulated as a semidefinite program (SDP). Based on the equivalence between RKDA and least-squares problems in the binary-class case, we propose a convex quadratically constrained quadratic programming (QCQP) formulation for kernel learning in RKDA. A semi-infinite linear programming (SILP) formulation is derived to further improve efficiency. We extend these formulations to the multiclass case based on a key result established in this paper: the multiclass RKDA kernel learning problem can be decomposed into a set of binary-class kernel learning problems which are constrained to share a common kernel. Based on this decomposition property, SDP formulations are proposed for the multiclass case, and they lead naturally to QCQP and SILP formulations. As the performance of RKDA depends on the regularization parameter, we show that this parameter can also be optimized in a joint framework with the kernel. Extensive experiments have been conducted and analyzed, and connections to other algorithms are discussed.
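The MKL parameterization the abstract describes, a learned kernel formed as a nonnegative combination of pre-specified Gram matrices, can be illustrated in a few lines (names are illustrative; solving for the weights is what requires the SDP/QCQP/SILP formulations, which are omitted here):

```python
import numpy as np

def combine_kernels(kernels, weights):
    """Form K = sum_i theta_i * K_i: a nonnegative combination of
    pre-specified Gram matrices, which stays symmetric positive
    semidefinite and is the object the MKL formulations optimize over."""
    weights = np.asarray(weights, dtype=float)
    assert np.all(weights >= 0), "MKL kernel weights must be nonnegative"
    return sum(t * K for t, K in zip(weights, kernels))

X = np.array([[0.0], [1.0], [2.0]])
K_lin = X @ X.T                          # linear kernel on 1-D inputs
K_rbf = np.exp(-0.5 * (X - X.T) ** 2)    # Gaussian (RBF) kernel
K = combine_kernels([K_lin, K_rbf], [0.3, 0.7])
print(np.all(np.linalg.eigvalsh(K) >= -1e-10))  # still PSD: True
```

Nonnegativity of the weights is what guarantees the combination remains a valid (positive semidefinite) kernel.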
Robustness and regularization of support vector machines
Journal of Machine Learning Research 10 (2009) 1485
Cited by 19 (4 self)
We consider regularized support vector machines (SVMs) and show that they are precisely equivalent to a new robust optimization formulation. We show that this equivalence of robust optimization and regularization has implications for both algorithms and analysis. In terms of algorithms, the equivalence suggests more general SVM-like algorithms for classification that explicitly build in protection against noise while at the same time controlling overfitting. On the analysis front, the equivalence of robustness and regularization provides a robust-optimization interpretation for the success of regularized SVMs. We use this new robustness interpretation of SVMs to give a new proof of consistency of (kernelized) SVMs, thus establishing robustness as the reason regularized SVMs generalize well.
Robust Fisher discriminant analysis
In Advances in Neural Information Processing Systems, 2006
Cited by 15 (3 self)
Fisher linear discriminant analysis (LDA) can be sensitive to the problem data. Robust Fisher LDA systematically alleviates this sensitivity by explicitly incorporating a model of data uncertainty into the classification problem and optimizing for the worst-case scenario under this model. The main contribution of this paper is to show that, with general convex uncertainty models on the problem data, robust Fisher LDA can be carried out using convex optimization. For a certain type of product-form uncertainty model, robust Fisher LDA can be carried out at a cost comparable to standard Fisher LDA. The method is demonstrated with some numerical examples. Finally, we show how to extend these results to robust kernel Fisher discriminant analysis, i.e., robust Fisher LDA in a high-dimensional feature space.
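For context, nominal Fisher LDA has the closed-form solution that the robust variant generalizes: w = (S₁ + S₂)⁻¹(μ₁ − μ₂). A minimal numpy sketch of this nominal base case (the robust worst-case optimization over uncertainty sets would need a convex solver and is not shown):

```python
import numpy as np

def fisher_lda_direction(X1, X2):
    """Nominal Fisher LDA: w maximizes (w @ (mu1 - mu2))**2 / (w @ S @ w)
    with S = S1 + S2, solved in closed form as w = S^{-1} (mu1 - mu2).
    Robust Fisher LDA instead optimizes the worst case of this ratio over
    convex uncertainty sets for the class moments."""
    mu1, mu2 = X1.mean(axis=0), X2.mean(axis=0)
    S = np.cov(X1.T) + np.cov(X2.T)      # pooled within-class scatter
    w = np.linalg.solve(S, mu1 - mu2)
    return w / np.linalg.norm(w)

# Two classes separated along the first axis: the direction recovers it.
X1 = np.array([[4.0, 0.0], [5.0, 1.0], [6.0, -1.0], [5.0, 0.0]])
X2 = np.array([[-4.0, 0.0], [-5.0, 1.0], [-6.0, -1.0], [-5.0, 0.0]])
print(fisher_lda_direction(X1, X2))  # ~[1, 0]
```

The robust version hardens exactly this ratio against perturbations of the means and covariances before solving.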
Robust classification with interval data
2003
Cited by 14 (0 self)
We consider a binary, linear classification problem in which the data points are assumed to be unknown, but bounded within given hyper-rectangles, i.e., the covariates are bounded within intervals given explicitly for each data point separately. We address the problem of designing a robust classifier in this setting by minimizing the worst-case value of a given loss function over all possible choices of the data in these multi-dimensional intervals. We examine in detail the application of this methodology to three specific loss functions, arising in support vector machines, in logistic regression, and in minimax probability machines. We show that in each case the resulting problem is amenable to efficient interior-point algorithms for convex optimization. The methods tend to produce sparse classifiers, i.e., they induce many zero coefficients in the resulting weight vectors, and we provide some theoretical grounds for this property. After presenting possible extensions of this framework to handle label errors and other uncertainty models, we discuss in some detail our implementation, which exploits the potential sparsity, or a more general property referred to as regularity, of the input matrices.
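The interval-uncertainty model admits a closed-form worst case for the hinge loss: with each coordinate perturbed by |δⱼ| ≤ ρⱼ, the adversary shrinks the margin by Σⱼ ρⱼ|wⱼ|, a weighted ℓ₁ term that gives one intuition for the sparsity tendency the abstract mentions. A sketch (hypothetical helper, not the paper's code):

```python
import numpy as np

def worstcase_hinge(w, b, x, y, rho):
    """Worst-case hinge loss when coordinate j of x is only known to lie
    in [x_j - rho_j, x_j + rho_j]: the adversary shifts every coordinate
    against the margin, shrinking it by sum_j rho_j * |w_j|.  That
    weighted ell-1 term is one reason the robust classifiers tend to
    come out sparse."""
    margin = y * (w @ x - b) - rho @ np.abs(w)
    return max(0.0, 1.0 - margin)

w, x, rho = np.array([1.0, -2.0]), np.array([2.0, 1.0]), np.array([0.5, 0.5])
print(worstcase_hinge(w, 0.0, x, 1, rho))  # 2.5
```

Because the perturbation enters linearly, the worst case is always attained at a corner of the hyper-rectangle, which is what the closed form exploits.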
Selected topics in robust convex optimization
Mathematical Programming, Series B, 2007
Cited by 14 (2 self)
Robust Optimization is a rapidly developing methodology for handling optimization problems affected by non-stochastic, uncertain-but-bounded data perturbations. In this paper, we overview several selected topics in this popular area, specifically: (1) recent extensions of the basic concept of the robust counterpart of an optimization problem with uncertain data, (2) tractability of robust counterparts, (3) links between RO and traditional chance-constrained settings of problems with stochastic data, and (4) a novel generic application of the RO methodology in robust linear control.
Keywords: optimization under uncertainty · robust optimization · convex programming · chance constraints · robust linear control
The minimum error minimax probability machine
Journal of Machine Learning Research, 2004
Cited by 14 (4 self)
We construct a distribution-free Bayes-optimal classifier called the Minimum Error Minimax Probability Machine (MEMPM) in a worst-case setting, i.e., under all possible choices of class-conditional densities with a given mean and covariance matrix. By assuming no specific distribution for the data, our model is distinguished from traditional Bayes-optimal approaches, where an assumption on the data distribution is a must. This model extends the Minimax Probability Machine (MPM), a recently proposed novel classifier, and is shown to be the general case of MPM. Moreover, it includes another special case, named the Biased Minimax Probability Machine, which is appropriate for handling biased classification. One appealing feature of MEMPM is that it contains an explicit performance indicator, i.e., a lower bound on the worst-case accuracy, which is shown to be tighter than that of MPM. We provide conditions under which the worst-case Bayes-optimal classifier converges to the Bayes-optimal classifier. We demonstrate how to apply a more general statistical framework to estimate model input parameters robustly. We also show how to extend our model to nonlinear classification by exploiting kernelization techniques. A series of experiments on both synthetic data sets and real-world benchmark data sets validates our proposition and demonstrates the effectiveness of our model.
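The explicit performance indicator mentioned above has a simple closed form in the MPM setting: for a class with mean μ and covariance Σ classified by sign(wᵀx − b), the worst-case accuracy over all distributions with those moments is κ²/(1 + κ²), where κ = (wᵀμ − b)/√(wᵀΣw). A sketch (function name is illustrative):

```python
import numpy as np

def worstcase_accuracy(w, b, mu, Sigma):
    """Lower bound on the accuracy of sign(w @ x - b) for a class with
    mean mu and covariance Sigma, valid for *every* distribution with
    those moments: alpha = kappa^2 / (1 + kappa^2), where
    kappa = (w @ mu - b) / sqrt(w @ Sigma @ w).  This is the explicit
    performance indicator of the MPM family."""
    kappa = (w @ mu - b) / np.sqrt(w @ Sigma @ w)
    if kappa <= 0:
        return 0.0   # mean already on the wrong side: no guarantee
    return kappa**2 / (1.0 + kappa**2)

w = np.array([1.0, 0.0])
print(worstcase_accuracy(w, 0.0, np.array([3.0, 0.0]), np.eye(2)))  # 0.9
```

MEMPM's contribution is to optimize (a weighted combination of) such per-class bounds directly rather than the single shared bound of the original MPM.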