Results 1–10 of 10
Robust Fisher discriminant analysis
In Advances in Neural Information Processing Systems
, 2006
Abstract

Cited by 15 (3 self)
Fisher linear discriminant analysis (LDA) can be sensitive to the problem data. Robust Fisher LDA can systematically alleviate this sensitivity by explicitly incorporating a model of data uncertainty into the classification problem and optimizing for the worst-case scenario under this model. The main contribution of this paper is to show that, with general convex uncertainty models on the problem data, robust Fisher LDA can be carried out using convex optimization. For a certain type of product-form uncertainty model, robust Fisher LDA can be carried out at a cost comparable to standard Fisher LDA. The method is demonstrated with some numerical examples. Finally, we show how to extend these results to robust kernel Fisher discriminant analysis, i.e., robust Fisher LDA in a high-dimensional feature space.
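As context for the abstract above, the nominal (non-robust) Fisher LDA direction it builds on has a closed form, w ∝ (Σ₁ + Σ₂)⁻¹(μ₁ − μ₂). A minimal sketch (function and variable names are illustrative, not taken from the paper):

```python
import numpy as np

def fisher_lda_direction(X1, X2):
    """Nominal Fisher LDA: the discriminant direction maximizing the
    ratio of between-class to within-class scatter has the closed form
    w proportional to (S1 + S2)^{-1} (mu1 - mu2)."""
    mu1, mu2 = X1.mean(axis=0), X2.mean(axis=0)
    S1 = np.cov(X1, rowvar=False)  # per-class covariance estimates
    S2 = np.cov(X2, rowvar=False)
    w = np.linalg.solve(S1 + S2, mu1 - mu2)
    return w / np.linalg.norm(w)   # unit-norm direction

rng = np.random.default_rng(0)
X1 = rng.normal(loc=[0.0, 0.0], scale=1.0, size=(200, 2))
X2 = rng.normal(loc=[3.0, 3.0], scale=1.0, size=(200, 2))
w = fisher_lda_direction(X1, X2)
```

The robust version in the paper replaces the fixed (μ, Σ) estimates with convex uncertainty sets and optimizes the worst case over them, which is what requires convex optimization rather than this one-line solve.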
Biased Minimax Probability Machine for Medical Diagnosis
In the Eighth International Symposium on Artificial Intelligence and Mathematics
, 2004
Abstract

Cited by 6 (3 self)
The Minimax Probability Machine (MPM) constructs a classifier that provides a worst-case bound on the probability of misclassification of future data points, based on reliable estimates of the means and covariance matrices of the classes from the training data, and achieves performance comparable to a state-of-the-art classifier, the Support Vector Machine. In this paper, we eliminate the assumption of an unbiased weight for each class in the MPM and develop a critical extension, named the Biased Minimax Probability Machine (BMPM), to deal with biased classification tasks, especially in medical diagnostic applications. We outline the theoretical derivation of the BMPM. Moreover, we demonstrate that this model can be transformed into a concave-convex Fractional Programming (FP) problem or a pseudo-concave problem. After illustrating our model on a synthetic dataset and applying it to real-world medical diagnosis datasets, we obtain encouraging and promising experimental results.
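For reference, the worst-case bound the MPM literature relies on has a simple closed form: for a direction w with the optimally chosen threshold, the worst-case correct-classification probability over all distributions matching the given means and covariances is κ²/(1 + κ²). A minimal sketch (names are illustrative; the conventions follow the symmetric two-class MPM of Lanckriet et al., not the biased variant of this paper):

```python
import numpy as np

def mpm_worst_case_accuracy(w, mu1, mu2, S1, S2):
    """Distribution-free (Chebyshev-type) bound used by the MPM:
    with the optimal threshold for direction w, every distribution
    matching (mu, Sigma) is classified correctly with probability
    at least kappa^2 / (1 + kappa^2)."""
    kappa = (w @ (mu1 - mu2)) / (np.sqrt(w @ S1 @ w) + np.sqrt(w @ S2 @ w))
    return kappa**2 / (1.0 + kappa**2)

mu1, mu2 = np.array([0.0, 0.0]), np.array([4.0, 0.0])
S1 = S2 = np.eye(2)
# w points from class 2 toward class 1, so kappa = 4 / (1 + 1) = 2.
alpha = mpm_worst_case_accuracy(np.array([-1.0, 0.0]), mu1, mu2, S1, S2)
```

The BMPM of this paper drops the symmetry and controls the two classes' worst-case accuracies separately, which is what introduces the bias toward the important class.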
Pareto optimal linear classification
 in Proc. ICML, 2006
Abstract

Cited by 4 (0 self)
We consider the problem of choosing a linear classifier that minimizes misclassification probabilities in two-class classification, which is a bicriterion problem involving a trade-off between two objectives. We assume that the class-conditional distributions are Gaussian. This assumption makes it computationally tractable to find Pareto-optimal linear classifiers, whose classification capabilities are inferior to no other linear ones. The main purpose of this paper is to establish several robustness properties of those classifiers with respect to variations and uncertainties in the distributions. We also extend the results to kernel-based classification. Finally, we show how to carry out trade-off analysis empirically with a finite number of given labeled data.
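Under the Gaussian class-conditional assumption, the two objectives of the bicriterion problem are available in closed form: for the classifier sign(wᵀx + b), each class's misclassification probability is a normal tail probability. A small stdlib-only sketch (names are illustrative):

```python
import math

def normal_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def gaussian_error_pair(w, b, mu1, S1, mu2, S2):
    """Misclassification probabilities of sign(w.x + b) when class 1
    (labeled +) is N(mu1, S1) and class 2 (labeled -) is N(mu2, S2).
    These are the two objectives traded off on the Pareto frontier."""
    def quad(S):  # w^T S w
        return sum(w[i] * sum(S[i][j] * w[j] for j in range(len(w)))
                   for i in range(len(w)))
    m1 = sum(wi * mi for wi, mi in zip(w, mu1)) + b
    m2 = sum(wi * mi for wi, mi in zip(w, mu2)) + b
    e1 = normal_cdf(-m1 / math.sqrt(quad(S1)))  # class-1 point on wrong side
    e2 = normal_cdf(m2 / math.sqrt(quad(S2)))   # class-2 point on wrong side
    return e1, e2

I = [[1.0, 0.0], [0.0, 1.0]]
e1, e2 = gaussian_error_pair([1.0, 0.0], 0.0, [2.0, 0.0], I, [-2.0, 0.0], I)
```

Sweeping b (or the relative weighting of e1 and e2) traces out the Pareto frontier that the paper studies.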
Maximum Margin based Semi-supervised Spectral Kernel Learning
Abstract

Cited by 2 (0 self)
Abstract — Semi-supervised kernel learning has been attracting increasing research interest recently. It works by learning an embedding of the data from the input space into a Hilbert space using both labeled and unlabeled data, and then searching for relations among the embedded data points. One of the most well-known semi-supervised kernel learning approaches is the spectral kernel learning methodology, which usually tunes the spectrum empirically or by optimizing some generalized performance measure. However, since the kernel design process does not involve the bias of a kernel-based learning algorithm, the resulting kernel matrix does not necessarily facilitate a specific learning algorithm. To supplement the spectral kernel learning methods, this paper proposes a novel approach, which not only learns a kernel matrix by maximizing another generalized performance measure, the margin between two classes of data, but also leads directly to a convex optimization method for learning the margin parameters in support vector machines. Moreover, experimental results demonstrate that our proposed spectral kernel learning method achieves promising results against other spectral kernel learning methods.
MaxiMin Margin Machine: Learning Large Margin Classifiers Locally and Globally
Abstract

Cited by 2 (1 self)
Abstract — We propose a novel large margin classifier, called the MaxiMin Margin Machine (M4). This model learns the decision boundary both locally and globally. In comparison, other large margin classifiers construct separating hyperplanes either only locally or only globally. For example, a state-of-the-art large margin classifier, the Support Vector Machine (SVM), considers data only locally, while another significant model, the Minimax Probability Machine (MPM), builds the decision hyperplane exclusively from global information. As a major contribution, we show that the SVM yields the same solution as M4 when the data satisfy certain conditions, and that the MPM can be regarded as a relaxation of M4. Moreover, based on our proposed local and global view of data, another popular model, Linear Discriminant Analysis, can easily be interpreted and extended as well. We describe the M4 model definition, provide a geometrical interpretation, present theoretical justifications, and propose a practical sequential conic programming method to solve the optimization problem. We also show how to exploit Mercer kernels to extend M4 to nonlinear classification. Furthermore, we perform a series of evaluations on both synthetic data sets and real-world benchmark data sets. Comparison with the SVM and the MPM demonstrates the advantages of our new model.
Index Terms — classification, large margin, kernel methods, second-order cone programming, learning locally and globally
Generative Prior Knowledge for Discriminative Classification
Abstract

Cited by 1 (0 self)
We present a novel framework for integrating prior knowledge into discriminative classifiers. Our framework allows discriminative classifiers such as Support Vector Machines (SVMs) to utilize prior knowledge specified in the generative setting. The dual objective of fitting the data and respecting prior knowledge is formulated as a bilevel program, which is solved (approximately) via iterative application of second-order cone programming. To test our approach, we consider the problem of using WordNet (a semantic database of the English language) to improve low-sample classification accuracy in newsgroup categorization. WordNet is viewed as an approximate, but readily available, source of background knowledge, and our framework is capable of utilizing it in a flexible way.
Maximizing Sensitivity in Medical Diagnosis Using Biased Minimax Probability Machine
Abstract

Cited by 1 (0 self)
Abstract—The challenging task of medical diagnosis based on machine learning techniques requires an inherent bias: the diagnosis should favor the “ill” class over the “healthy” class, since misdiagnosing a patient as a healthy person may delay therapy and aggravate the illness. Therefore, the objective in this task is not to improve the overall accuracy of the classification, but to improve the sensitivity (the accuracy on the “ill” class) while maintaining an acceptable specificity (the accuracy on the “healthy” class). Some current methods adopt roundabout ways to impose a bias toward the important class, i.e., they try to utilize intermediate factors to influence the classification. However, it remains uncertain whether these methods can improve classification performance systematically. In this paper, by engaging a novel learning tool, the Biased Minimax Probability Machine (BMPM), we deal with this issue in a more elegant way and directly achieve the objective of appropriate medical diagnosis. More specifically, the BMPM directly controls the worst-case accuracies to incorporate a bias toward the “ill” class. Moreover, in a distribution-free way, the BMPM derives the decision rule so as to maximize the worst-case sensitivity while maintaining an acceptable worst-case specificity. By directly controlling the accuracies, the BMPM provides a more rigorous way to handle medical diagnosis; by deriving a distribution-free decision rule, the BMPM distinguishes itself from a large family of classifiers, namely the generative classifiers, in which an assumption on the data distribution is necessary. We evaluate the performance of the model and compare it with three traditional classifiers: the nearest neighbor, the naive Bayesian, and the C4.5 classifiers. The test results on two medical datasets, the breast-cancer dataset and the heart-disease dataset, show that the BMPM outperforms the other three models.
Index Terms—Biased classification, medical diagnosis, minimax probability machine, worst-case accuracy.
Learning with Unlabeled Data
Abstract
We consider the problem of learning from both labeled and unlabeled data through an analysis of the quality of the unlabeled data. Usually, learning from both labeled and unlabeled data is regarded as semi-supervised learning, where the unlabeled data and the labeled data are assumed to be generated from the same distribution. When this assumption is not satisfied, new learning paradigms are needed in order to effectively explore the information underlying the unlabeled data. This thesis consists of two parts: the first part analyzes the fundamental assumptions of semi-supervised learning and proposes a few efficient semi-supervised learning models; the second part discusses three learning frameworks in order to deal with the case where the unlabeled data do not satisfy the conditions of semi-supervised learning. In the first part, we deal with the unlabeled data that are in
Conjugate Relation between Loss Functions and Uncertainty Sets in Classification Problems
Abstract
There are two main approaches to binary classification problems: the loss function approach and the uncertainty set approach. The loss function approach is widely used in real-world data analysis, and statistical decision theory has been used to elucidate its properties, such as statistical consistency. Conditional probabilities can also be estimated by using the minimum solution of the loss function. In the uncertainty set approach, an uncertainty set is defined for each binary label from the training samples, and the best separating hyperplane between the two uncertainty sets is used as the decision function. Although the uncertainty set approach provides an intuitive understanding of learning algorithms, its statistical properties have not been sufficiently studied. In this paper, we show that the uncertainty set is deeply connected with the convex conjugate of a loss function. On the basis of this conjugate relation, we propose a way of revising the uncertainty set approach so that it has good statistical properties, such as statistical consistency. We also introduce statistical models corresponding to uncertainty sets in order to estimate conditional probabilities. Finally, we present numerical experiments verifying that learning with the revised uncertainty sets improves prediction accuracy.
An Extension of a Minimax Approach to Multiple Classification
, 2006
Abstract
When the mean vectors and the covariance matrices of two classes are available in a binary classification problem, Lanckriet et al. [6] propose a minimax approach for finding a linear classifier that minimizes the worst-case (maximum) misclassification probability. We extend the minimax approach to a multiple classification problem, where the number m of classes may be more than two. We assume that the mean vectors and the covariance matrices of all the classes are available, but make no further assumptions with respect to the class-conditional distributions. We then define a problem of finding linear classifiers that minimize the worst-case misclassification probability ᾱ. Unfortunately, no efficient algorithms for solving this problem are known, so we introduce the maximum pairwise misclassification probability β̄ instead of ᾱ. It is shown that β̄ is a lower bound on ᾱ and a good approximation of ᾱ when m or ᾱ is small. We define a problem of finding linear classifiers that minimize the probability β̄ and show some basic properties of the problem. The problem is then transformed into a parametric second-order cone programming (SOCP) problem. We propose an algorithm for solving it by exploiting properties of the problem. We conduct preliminary numerical experiments and confirm that classifiers computed by our method work very well on benchmark problems.
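The pairwise quantity β̄ described above can be illustrated numerically: for each pair of classes, the distribution-free worst-case misclassification probability of the optimal pairwise separator is 1/(1 + κ²) (the complement of the two-class MPM bound), and β̄ is the maximum over all pairs. A one-dimensional sketch with scalar means and variances (an assumption made for brevity; the paper works with full mean vectors and covariance matrices):

```python
import itertools
import math

def pairwise_worst_case_error(mu_i, mu_j, var_i, var_j):
    """Worst-case misclassification probability for the optimal
    separator between classes i and j, over all distributions with
    these means/variances: 1 / (1 + kappa^2)."""
    kappa = abs(mu_i - mu_j) / (math.sqrt(var_i) + math.sqrt(var_j))
    return 1.0 / (1.0 + kappa**2)

# Three 1-D classes; beta_bar is the maximum pairwise worst-case error.
mus = [0.0, 4.0, 10.0]
variances = [1.0, 1.0, 1.0]
beta_bar = max(pairwise_worst_case_error(mus[i], mus[j],
                                         variances[i], variances[j])
               for i, j in itertools.combinations(range(len(mus)), 2))
```

Here the binding pair is the closest one (means 0 and 4, κ = 2), which is why β̄ is governed by the hardest pairwise separation; ᾱ can only be larger, consistent with β̄ ≤ ᾱ.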