## A robust minimax approach to classification (2002)


Venue: Journal of Machine Learning Research

Citations: 61 (7 self)

### BibTeX

@ARTICLE{Lanckriet02arobust,
  author  = {Gert R. G. Lanckriet and Laurent El Ghaoui and Chiranjib Bhattacharyya and Michael I. Jordan},
  title   = {A robust minimax approach to classification},
  journal = {Journal of Machine Learning Research},
  volume  = {3},
  pages   = {555--582},
  year    = {2002}
}

### Abstract

When constructing a classifier, the probability of correct classification of future data points should be maximized. We consider a binary classification problem in which the mean and covariance matrix of each class are assumed to be known, with no further assumptions made about the class-conditional distributions. Misclassification probabilities are then controlled in a worst-case setting: that is, over all possible class-conditional densities with the given means and covariance matrices, we minimize the worst-case (maximum) probability of misclassifying future data points. For a linear decision boundary, this desideratum translates directly into a (convex) second-order cone optimization problem, with complexity similar to that of a support vector machine problem. The minimax problem can be interpreted geometrically as minimizing the maximum of the Mahalanobis distances to the two classes. We also address robustness with respect to estimation errors in the means and covariances of the classes.
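For a linear boundary, the minimax formulation described in the abstract reduces to minimizing the sum of two Mahalanobis-type norms under a normalization constraint; the optimal value gives κ∗ and the worst-case error bound α∗ = 1/(1 + κ∗²). Below is a minimal sketch of that reduction, assuming the class means and covariances are given; `mpm_linear` is an illustrative name, and a general-purpose constrained optimizer stands in for a dedicated SOCP solver:

```python
import numpy as np
from scipy.optimize import minimize

def mpm_linear(mu1, S1, mu2, S2):
    """Sketch of the minimax probability machine with a linear boundary.
    Solves  min_a ||S1^{1/2} a|| + ||S2^{1/2} a||  s.t.  a^T (mu1 - mu2) = 1;
    then kappa* = 1 / (optimal value) and the worst-case misclassification
    probability is alpha* = 1 / (1 + kappa*^2)."""
    d = mu1 - mu2
    # Cholesky factors: Sigma = R R^T, so ||R^T a|| = sqrt(a^T Sigma a)
    R1, R2 = np.linalg.cholesky(S1), np.linalg.cholesky(S2)
    obj = lambda a: np.linalg.norm(R1.T @ a) + np.linalg.norm(R2.T @ a)
    cons = {"type": "eq", "fun": lambda a: a @ d - 1.0}
    a0 = d / (d @ d)                              # feasible starting point
    a = minimize(obj, a0, constraints=[cons], method="SLSQP").x
    kappa = 1.0 / obj(a)
    alpha = 1.0 / (1.0 + kappa**2)                # worst-case error bound
    b = a @ mu1 - kappa * np.linalg.norm(R1.T @ a)  # boundary: a^T x = b
    return a, b, alpha
```

SciPy's SLSQP here is only a stand-in for the interior-point SOCP codes (SeDuMi, Mosek) cited below; any solver that handles one linear equality constraint will do for a sketch of this size.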

### Citations

2649 |
Introduction to Statistical Pattern Recognition, 2nd edition
- Fukunaga
- 1990
Citation Context: ...r Discriminant Analysis. A discriminant hyperplane based on the first two moments can also be computed via Fisher discriminant analysis (FDA). This involves solving the following optimization problem (Fukunaga, 1990): max_a f(a) = |a^T(x̄ − ȳ)| / √(a^T Σ_x a + a^T Σ_y a) (16). The optimal a for the FDA cost function corresponds to a direction which gives good separation between the two projecte... |
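The FDA objective quoted above has a closed-form maximizer, a* ∝ (Σ_x + Σ_y)⁻¹(x̄ − ȳ). A minimal numpy sketch, assuming sample means and covariances are supplied (`fda_direction` is an illustrative name):

```python
import numpy as np

def fda_direction(xbar, Sx, ybar, Sy):
    """Fisher discriminant direction: the maximizer of
    |a^T (xbar - ybar)| / sqrt(a^T Sx a + a^T Sy a)
    is proportional to (Sx + Sy)^{-1} (xbar - ybar)."""
    a = np.linalg.solve(Sx + Sy, xbar - ybar)
    return a / np.linalg.norm(a)   # scale is irrelevant; normalize
```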

2030 | Learning with Kernels
- Scholkopf, Smola
- 2002
Citation Context: ...vely. This brings the total complexity to O(n³ + Nn²), where N is the number of data points. This is the same complexity as the quadratic programs one has to solve in linear support vector machines (Schölkopf and Smola, 2002). To gain more insight into the nature of the problem, we propose the following simple and perhaps more transparent iterative least-squares method to globally solve the problem. Similar iterative pro... |

736 | SeDuMi 1.02, a MATLAB toolbox for optimizing over symmetric cones
- Sturm
- 1999
Citation Context: ...olving the Optimization Problem. Problem (6) is a convex optimization problem, more precisely a second order cone program (SOCP) (Boyd and Vandenberghe, 2001). General-purpose programs such as SeDuMi (Sturm, 1999) or Mosek (Andersen and Andersen, 2000) can handle those problems efficiently. These codes use interior-point methods for SOCP (Nesterov and Nemirovsky, 1994; Lobo et al., 1998), which yield a worst-... |

315 | Fisher’s discriminant analysis with kernels
- Mika, Rätsch, et al.
- 1999
Citation Context: ...estimator and results in an increase in the upper bound on the probability of misclassification of future data. A third important feature of this approach is that, as in linear discriminant analysis (Mika et al., 1999), it is possible to generalize the basic methodology to allow nonlinear decision boundaries via the use of Mercer kernels. The resulting nonlinear classifiers are competitive with existing classifier... |

278 | Arcing classifiers
- Breiman
- 1998
Citation Context: ... in the article of Breiman (1998)) and to the performance of an SVM with linear kernel (SVML) and an SVM with Gaussian kernel (SVMG). The results are comparable with those in the existing literature (Breiman, 1998) and with those obtained with support vector machines. Also, we notice that the bound is smaller than the test-set accuracy in all cases. Furthermore, it is smaller for a linear decision boundary than for the n... |

259 | Least squares support vector machine classifiers - Suykens, Vandewalle - 1999 |

248 |
Interior-point polynomial methods in convex programming
- Nesterov, Nemirovsky
- 1994
Citation Context: ...ndenberghe, 2001). General-purpose programs such as SeDuMi (Sturm, 1999) or Mosek (Andersen and Andersen, 2000) can handle those problems efficiently. These codes use interior-point methods for SOCP (Nesterov and Nemirovsky, 1994; Lobo et al., 1998), which yield a worst-case complexity of O(n³). Of course, the first and second moments of x, y must be estimated beforehand, using for example sample moment plug-in estimates... |

215 | The Relevance Vector Machine - Tipping - 2000 |

66 | Optimal inequalities in probability theory: A convex optimization approach - Bertsimas, Popescu - 2002 |

64 |
The MOSEK Interior Point Optimizer for Linear Programming: An Implementation of the Homogeneous Algorithm
- Andersen, Andersen
- 2000
Citation Context: ... Problem. Problem (6) is a convex optimization problem, more precisely a second order cone program (SOCP) (Boyd and Vandenberghe, 2001). General-purpose programs such as SeDuMi (Sturm, 1999) or Mosek (Andersen and Andersen, 2000) can handle those problems efficiently. These codes use interior-point methods for SOCP (Nesterov and Nemirovsky, 1994; Lobo et al., 1998), which yield a worst-case complexity of O(n³). Of course, th... |

59 | Duality and geometry in SVM classifiers
- Bennett, Bredensteiner
- 2000
Citation Context: ... both classes. The optimal worst-case misclassification probability is related to the above optimal value κ∗ by 1 − α∗ = 1/(1 + κ∗²). An interesting analogy to this result can be found in SVM classification (Bennett and Bredensteiner, 2000; Crisp and Burges, 1999). In that case, the dual to the problem of finding the maximal margin is the problem of finding points in the convex hulls of the two classes that are closest. This correspond... |

27 |
A geometric interpretation of ν-SVM classifiers
- Crisp, Burges
- 2000
Citation Context: ...lassification probability is related to the above optimal value κ∗ by 1 − α∗ = 1/(1 + κ∗²). An interesting analogy to this result can be found in SVM classification (Bennett and Bredensteiner, 2000; Crisp and Burges, 1999). In that case, the dual to the problem of finding the maximal margin is the problem of finding points in the convex hulls of the two classes that are closest. This corresponds to a quadratic program... |

24 | Vicinal risk minimization
- Chapelle, Weston, et al.
- 2000
Citation Context: ...e "typical" rather than the boundary points, the MPM is in some sense similar to the relevance vector machine proposed in Tipping (2000). Furthermore, the MPM is related to vicinal risk minimization (Chapelle et al., 2001; Vapnik, 1999), in which SVMs were "improved" using the covariance of the classes to push the hyperplane away from the samples that belong to the class with the largest covariance matrix. 2.6 Making... |

22 | Minimax probability machine - Lanckriet, Ghaoui, et al. - 2002 |

22 | Multivariate chebyshev inequalities - Marshall, Olkin - 1960 |

20 | Robust Novelty Detection with Single-Class MPM - Lanckriet, Ghaoui, et al. - 2002 |

19 |
Classification into two multivariate normal distributions with different covariance matrices
- Anderson, Bahadur
- 1962
Citation Context: ...002b). 7. Conclusions. The problem of linear discrimination has a long and distinguished history. Many results on misclassification rates have been obtained by making distributional assumptions (e.g., Anderson and Bahadur, 1962). Our results, on the other hand, make use of the moment-based inequalities of Marshall and Olkin (1960) to obtain distribution-free results for linear discriminants. We considered the case of binary... |

17 |
Asymptotics in Bayesian computation
- Kass, Tierney, et al.
- 1988
Citation Context: ...the mean uncertainty is motivated by the standard statistical approach to estimating a region of confidence based on Laplace (that is, second-order) approximations to a likelihood function (see e.g., Kass et al., 1988). The model for uncertainty in the covariance matrix is perhaps less classical from a statistical viewpoint, but it leads to a regularization term of a classical form. The specific values of ν and ρ in... |

10 |
Applications of second order cone programming. Linear Algebra and its Applications 284
- Lobo, Vandenberghe, et al.
- 1998
Citation Context: ...ose programs such as SeDuMi (Sturm, 1999) or Mosek (Andersen and Andersen, 2000) can handle those problems efficiently. These codes use interior-point methods for SOCP (Nesterov and Nemirovsky, 1994; Lobo et al., 1998), which yield a worst-case complexity of O(n³). Of course, the first and second moments of x, y must be estimated beforehand, using for example sample moment plug-in estimates... |

7 |
A geometric interpretation of ν-SVM classifiers
- Crisp, Burges
- 1999
Citation Context: ...case misclassification probability is related to the above optimal value κ∗ by 1 − α∗ = 1/(1 + κ∗²). An interesting analogy to this result can be found in SVM classification (Bennett and Bredensteiner, 2000; Crisp and Burges, 1999). In that case, the dual to the problem of finding the maximal margin is the problem of finding points in the convex hulls of the two classes that are closest. This corresponds to a quadratic program... |

7 |
Convex optimization. Course notes for EE364
- Boyd, Vandenberghe
- 2001
Citation Context: ...f Theorem 2. □ 2.3 Solving the Optimization Problem. Problem (6) is a convex optimization problem, more precisely a second order cone program (SOCP) (Boyd and Vandenberghe, 2001). General-purpose programs such as SeDuMi (Sturm, 1999) or Mosek (Andersen and Andersen, 2000) can handle those problems efficiently. These codes use interior-point methods for SOCP (Nesterov and Nem... |

6 | Fast training of support vector classifiers - Pérez-Cruz, Alarcón-Diana, et al. - 2001 |

1 | A geometric interpretation of ν-SVM classifiers - Lanckriet, Bhattacharyya, et al. - 1999 |
