The minimum error minimax probability machine
Journal of Machine Learning Research, 2004
Abstract

Cited by 13 (4 self)
We construct a distribution-free Bayes optimal classifier called the Minimum Error Minimax Probability Machine (MEMPM) in a worst-case setting, i.e., under all possible choices of class-conditional densities with a given mean and covariance matrix. By assuming no specific distributions for the data, our model is thus distinguished from traditional Bayes optimal approaches, where an assumption on the data distribution is a must. This model is extended from the Minimax Probability Machine (MPM), a recently proposed novel classifier, and is demonstrated to be the general case of MPM. Moreover, it includes another special case, named the Biased Minimax Probability Machine, which is appropriate for handling biased classification. One appealing feature of MEMPM is that it contains an explicit performance indicator, i.e., a lower bound on the worst-case accuracy, which is shown to be tighter than that of MPM. We provide conditions under which the worst-case Bayes optimal classifier converges to the Bayes optimal classifier. We demonstrate how to apply a more general statistical framework to estimate model input parameters robustly. We also show how to extend our model to nonlinear classification by exploiting kernelization techniques. A series of experiments on both synthetic data sets and real-world benchmark data sets validates our proposition and demonstrates the effectiveness of our model.
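The worst-case accuracy bound that MEMPM generalizes, from the underlying MPM model, has a simple closed form, alpha = kappa^2 / (1 + kappa^2). The sketch below is not from the paper: the means, the shared isotropic covariance sigma^2 * I, and all numeric values are illustrative assumptions, used because the isotropic case admits a closed-form kappa.

```python
import math

# Minimal sketch of the MPM worst-case accuracy bound, specialized to the
# case where both classes share an isotropic covariance sigma^2 * I.  In
# that case the optimal separating direction is the mean difference, and
#     1/kappa = 2 * sigma / ||mu_x - mu_y||.
# The bound alpha = kappa^2 / (1 + kappa^2) holds for ANY pair of
# distributions with these first two moments (means and sigma below are
# illustrative values, not taken from the paper).

def mpm_bound_isotropic(mu_x, mu_y, sigma):
    d = math.sqrt(sum((a - b) ** 2 for a, b in zip(mu_x, mu_y)))
    kappa = d / (2.0 * sigma)          # class separation in std deviations
    return kappa ** 2 / (1.0 + kappa ** 2)

alpha = mpm_bound_isotropic((0.0, 0.0), (4.0, 0.0), sigma=1.0)
# kappa = 4/2 = 2, so alpha = 4/5 = 0.8
```

Larger separation (or smaller spread) drives kappa up and the guaranteed worst-case accuracy toward 1.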
Robust sparse hyperplane classifiers: application to uncertain molecular profiling data
Journal of Computational Biology, 2004
Abstract

Cited by 11 (1 self)
Key words: robust sparse hyperplanes; second-order cone program; linear programming; breast cancer; molecular profiling; two-class high-dimensional data
Learning Classifiers from Imbalanced Data Based on Biased Minimax Probability Machine
, 2004
Abstract

Cited by 11 (0 self)
We consider the problem of binary classification on imbalanced data, in which nearly all the instances are labelled as one class, while far fewer instances are labelled as the other, usually more important, class. Traditional machine learning methods that seek accurate performance over the full range of instances are not suitable for this problem, since they tend to classify all the data into the majority, usually less important, class. Moreover, some current methods try to use intermediate factors, e.g., the distribution of the training set, decision thresholds, or cost matrices, to influence the bias of the classification. However, it remains uncertain whether these methods can improve performance in a systematic way. In this paper, we propose a novel model named the Biased Minimax Probability Machine. Unlike previous methods, this model directly controls the worst-case real accuracy of classification of future data to build biased classifiers, and hence provides a rigorous treatment of imbalanced data. Experimental results comparing the novel model with three competitive methods, i.e., the Naive Bayesian classifier, the k-Nearest Neighbor method, and the decision tree method C4.5, demonstrate the superiority of our model.
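The abstract's point that accuracy-seeking learners "tend to classify all the data into the majority class" can be made concrete with a toy count. This is a minimal sketch with made-up class proportions, not data from the paper:

```python
# A degenerate classifier that always predicts the majority class scores
# 95% accuracy on a 95:5 imbalanced set while recalling none of the
# minority class -- the failure mode that biased classifiers address.
# (The 95:5 split is an illustrative assumption, not from the paper.)

labels = [0] * 95 + [1] * 5          # 95 majority, 5 minority instances
predictions = [0] * 100              # classify everything as the majority

accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(labels)
minority_recall = (
    sum(p == y == 1 for p, y in zip(predictions, labels))
    / sum(y == 1 for y in labels)
)
# accuracy = 0.95, minority_recall = 0.0
```

High overall accuracy is therefore uninformative here; controlling the worst-case accuracy of each class separately, as BMPM does, avoids this collapse.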
Generalized Chebyshev bounds via semidefinite programming
 SIAM Review
Abstract

Cited by 9 (1 self)
Abstract. A sharp lower bound on the probability of a set defined by quadratic inequalities, given the first two moments of the distribution, can be efficiently computed using convex optimization. This result generalizes Chebyshev’s inequality for scalar random variables. Two semidefinite programming formulations are presented, with a constructive proof based on convex optimization duality and elementary linear algebra. Key words. Semidefinite programming, convex optimization, duality theory, Chebyshev inequalities, moment problems. AMS subject classifications. 90C22, 90C25, 6008.
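For intuition, the scalar special case that this result generalizes is the classical Chebyshev inequality, P(|X - mu| >= k*sigma) <= 1/k^2, valid for any distribution with those two moments. A minimal sketch, where the uniform sample and the value of k are illustrative choices, not from the paper:

```python
import math
import random

# Chebyshev's inequality bounds the tail probability of ANY distribution
# given only its mean and standard deviation.  Here we check it against
# the empirical tail frequency of a Uniform(0, 1) sample (an illustrative
# distribution choice; the bound itself is distribution-free).

random.seed(0)
mu = 0.5
sigma = math.sqrt(1.0 / 12.0)         # std deviation of Uniform(0, 1)
k = 1.5
bound = 1.0 / k ** 2                  # Chebyshev upper bound = 1/k^2

sample = [random.random() for _ in range(100_000)]
freq = sum(abs(x - mu) >= k * sigma for x in sample) / len(sample)
assert freq <= bound                  # the bound is never violated
```

The semidefinite programming formulations in the paper play the same role for multivariate sets defined by quadratic inequalities, where no such elementary closed form exists.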
Gaussian margin machines
In Proceedings of the International Conference on Artificial Intelligence and Statistics (AISTATS), 2009
Abstract

Cited by 7 (3 self)
We introduce Gaussian Margin Machines (GMMs), which maintain a Gaussian distribution over weight vectors for binary classification. The learning algorithm for these machines seeks the least informative distribution that will classify the training data correctly with high probability. One formulation can be expressed as a convex constrained optimization problem whose solution can be represented linearly in terms of training instances and their inner and outer products, supporting kernelization. The algorithm admits a natural PAC-Bayesian justification and is shown to minimize a quantity directly related to a PAC-Bayesian generalization bound. A preliminary evaluation on handwriting recognition data shows that our algorithm improves on SVMs for the same task, achieving lower test error and lower test error variance.
Biased Minimax Probability Machine for Medical Diagnosis
In the Eighth International Symposium on Artificial Intelligence and Mathematics, 2004
Abstract

Cited by 6 (3 self)
The Minimax Probability Machine (MPM) constructs a classifier that provides a worst-case bound on the probability of misclassification of future data points, based on reliable estimates of the means and covariance matrices of the classes from the training data, and achieves performance comparable to a state-of-the-art classifier, the Support Vector Machine. In this paper, we eliminate the assumption of an unbiased weight for each class in the MPM and develop a critical extension, named the Biased Minimax Probability Machine (BMPM), to deal with biased classification tasks, especially in medical diagnostic applications. We outline the theoretical derivation of the BMPM. Moreover, we demonstrate that this model can be transformed into a concave-convex Fractional Programming (FP) problem or a pseudo-concave problem. After illustrating our model on a synthetic dataset and applying it to real-world medical diagnosis datasets, we obtain encouraging and promising experimental results.
Nash Equilibria of Static Prediction Games
Abstract

Cited by 6 (1 self)
The standard assumption of identically distributed training and test data is violated when an adversary can exercise some control over the generation of the test data. In a prediction game, a learner produces a predictive model while an adversary may alter the distribution of input data. We study single-shot prediction games in which the cost functions of learner and adversary are not necessarily antagonistic. We identify conditions under which the prediction game has a unique Nash equilibrium, and derive algorithms that find the equilibrial prediction models. In a case study, we empirically explore properties of Nash-equilibrial prediction models for email spam filtering.
Second order cone programming formulations for feature selection
 Journal of Machine Learning Research
Abstract

Cited by 4 (1 self)
This paper addresses the issue of feature selection for linear classifiers given the moments of the class-conditional densities. The problem is posed as finding a minimal set of features such that the resulting classifier has a low misclassification error. Using a bound on the misclassification error involving the mean and covariance of the class-conditional densities, and minimizing an L1 norm as an approximate criterion for feature selection, a second-order cone programming formulation is derived. To handle errors in the estimation of means and covariances, a tractable robust formulation is also discussed. In a slightly different setting, the Fisher discriminant is derived, and feature selection for the Fisher discriminant is also discussed. Experimental results on synthetic data sets and on real-life microarray data show that the proposed formulations are competitive with the state-of-the-art linear programming formulation.
Pareto optimal linear classification
In Proc. ICML, 2006
Abstract

Cited by 4 (0 self)
We consider the problem of choosing a linear classifier that minimizes misclassification probabilities in two-class classification, which is a bi-criterion problem involving a trade-off between two objectives. We assume that the class-conditional distributions are Gaussian. This assumption makes it computationally tractable to find Pareto optimal linear classifiers, i.e., linear classifiers that are not outperformed on both objectives by any other linear classifier. The main purpose of this paper is to establish several robustness properties of these classifiers with respect to variations and uncertainties in the distributions. We also extend the results to kernel-based classification. Finally, we show how to carry out the trade-off analysis empirically with a finite number of given labeled data.
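Under the paper's Gaussian assumption, the two objectives being traded off have closed forms. A minimal 1-D sketch, in which the weight, threshold, and class moments are illustrative assumptions rather than values from the paper: for a classifier sign(w*x - b) with class-conditionals N(mu_i, sigma_i^2), each misclassification probability is a standard normal CDF of a normalized margin.

```python
import math

# Closed-form error probabilities of a 1-D linear classifier sign(w*x - b)
# under Gaussian class-conditional densities.  Sweeping the threshold b
# traces the bi-criterion trade-off curve between the two error rates.
# (All numeric values below are illustrative, not from the paper.)

def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def error_probabilities(w, b, mu0, s0, mu1, s1):
    # class 0 should land on the w*x - b < 0 side, class 1 on the > 0 side
    p_err0 = 1.0 - normal_cdf((b - w * mu0) / (abs(w) * s0))
    p_err1 = normal_cdf((b - w * mu1) / (abs(w) * s1))
    return p_err0, p_err1

p0, p1 = error_probabilities(w=1.0, b=2.0, mu0=0.0, s0=1.0, mu1=4.0, s1=1.0)
# midpoint threshold with equal variances: both errors equal Phi(-2)
```

A Pareto optimal classifier is one for which no other (w, b) lowers one of these probabilities without raising the other; the symmetric choice above is one point on that curve.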