Results 1–8 of 8
Nightmare at test time: Robust learning by feature deletion
In ICML, 2006
Cited by 23 (2 self)
When constructing a classifier from labeled data, it is important not to assign too much weight to any single input feature, in order to increase the robustness of the classifier. This is particularly important in domains with nonstationary feature distributions or with input sensor failures. A common approach to achieving such robustness is to introduce regularization that spreads the weight more evenly between the features. However, this strategy is very generic and cannot induce robustness specifically tailored to the classification task at hand. In this work, we introduce a new algorithm for avoiding single-feature over-weighting by analyzing robustness using a game-theoretic formalization. We develop classifiers that are optimally resilient to deletion of features in a minimax sense, and show how to construct such classifiers using quadratic programming. We illustrate the applicability of our methods on spam filtering and handwritten digit recognition tasks, where feature deletion is indeed a realistic noise model.
1. Building Robust Classifiers
When constructing classifiers over high-dimensional spaces such as texts or images, one is inherently faced with the problem of under-sampling the true data distribution. Even so-called “discriminative” methods, which focus on minimizing classification error (or a bound on it), are exposed to this difficulty, since the training objective is calculated over the observed input vectors only and thus may not be a good approximation of the average objective on the test data. This is especially important in settings such as document
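The minimax view above can be made concrete for a fixed linear classifier. The sketch below is a minimal illustration, not the paper's quadratic-programming training procedure. It assumes a bias-free linear score and labels in {-1, +1}, and it computes the worst-case margin y·(w·x) when an adversary may delete (zero out) at most k features of one example. For a linear score, the adversary's best move is to delete the k features with the largest positive contributions y·w_j·x_j. The helper name `worst_case_margin` is hypothetical.

```python
def worst_case_margin(w, x, y, k):
    """Smallest margin y * (w . x) after deleting (zeroing) at most k features.

    Hypothetical helper: assumes a linear classifier without bias and y in {-1, +1}.
    """
    # Contribution of each feature to the margin.
    contributions = [y * wj * xj for wj, xj in zip(w, x)]
    full_margin = sum(contributions)
    # Deleting feature j removes contributions[j]; only positive terms hurt the margin,
    # so the adversary removes the k largest positive contributions.
    positive = sorted((c for c in contributions if c > 0), reverse=True)
    return full_margin - sum(positive[:k])
```

Take w = [2.0, 0.5, 0.5, 0.5], x = [1, 1, 1, 1] and y = 1. Deleting a single feature drops the margin from 3.5 to 1.5. The evenly spread w = [0.875] * 4 has the same full margin but keeps 2.625. This is the kind of single-feature over-weighting the abstract describes.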
Theory and applications of Robust Optimization
2007
Cited by 9 (4 self)
In this paper we survey the primary research, both theoretical and applied, in the field of Robust Optimization (RO). Our focus will be on the computational attractiveness of RO approaches, as well as the modeling power and broad applicability of the methodology. In addition to surveying the most prominent theoretical results of RO over the past decade, we will also present some recent results linking RO to adaptable models for multi-stage decision-making problems. Finally, we will highlight successful applications of RO across a wide spectrum of domains, including, but not limited to, finance, statistics, learning, and engineering.
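A standard example of the computational tractability emphasized above is the robust linear constraint with ellipsoidal uncertainty. Requiring aᵀx ≤ b for every a = a0 + P u with ‖u‖₂ ≤ 1 is equivalent to the single deterministic constraint a0ᵀx + ‖Pᵀx‖₂ ≤ b. This is a textbook RO result, offered here as an illustration rather than as a summary of the survey. The sketch evaluates that worst-case left-hand side. The helper names are hypothetical, and P is assumed to be given as a list of rows.

```python
import math

def robust_lhs(a0, P, x):
    """Worst case of a^T x over a = a0 + P u with ||u||_2 <= 1.

    P is an n x m matrix given as a list of n rows (n = len(x)).
    """
    nominal = sum(ai * xi for ai, xi in zip(a0, x))
    m = len(P[0])
    # (P^T x)_k = sum_i P[i][k] * x[i]
    ptx = [sum(P[i][k] * x[i] for i in range(len(x))) for k in range(m)]
    # The maximizing u is P^T x / ||P^T x||, which adds ||P^T x||_2 to the nominal value.
    return nominal + math.sqrt(sum(v * v for v in ptx))

def is_robustly_feasible(a0, P, x, b):
    """True if a^T x <= b holds for every a in the ellipsoid."""
    return robust_lhs(a0, P, x) <= b
```

The square-root term is why robust linear programs with ellipsoidal uncertainty become second-order cone programs rather than linear programs.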
Nash Equilibria of Static Prediction Games
Cited by 2 (0 self)
The standard assumption of identically distributed training and test data is violated when an adversary can exercise some control over the generation of the test data. In a prediction game, a learner produces a predictive model while an adversary may alter the distribution of input data. We study single-shot prediction games in which the cost functions of learner and adversary are not necessarily antagonistic. We identify conditions under which the prediction game has a unique Nash equilibrium, and derive algorithms that find the equilibrial prediction models. In a case study, we empirically explore properties of Nash-equilibrial prediction models for email spam filtering.
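To make "non-antagonistic costs" and "Nash equilibrium" concrete, the sketch below sets up a toy scalar game. It is an assumption-laden illustration, not the paper's model or algorithm.

- **Learner:** picks a ridge weight w so that predictions w·x match the labels y.
- **Adversary:** shifts every input by δ so that predictions match its own targets z, and pays μ·δ² for the shift.

The two costs are not negatives of each other. Alternating exact best responses is a simple heuristic: if it converges, the fixed point is a Nash equilibrium, because each player is best-responding to the other. Convergence is not guaranteed in general. All function names, the targets z, and the penalties λ and μ are hypothetical.

```python
def learner_best_response(x, y, delta, lam):
    """w minimizing sum_i (w*(x_i + delta) - y_i)^2 + lam * w^2 (closed form)."""
    shifted = [xi + delta for xi in x]
    num = sum(s * yi for s, yi in zip(shifted, y))
    den = sum(s * s for s in shifted) + lam
    return num / den

def adversary_best_response(x, z, w, mu):
    """delta minimizing sum_i (w*(x_i + delta) - z_i)^2 + mu * delta^2 (closed form)."""
    n = len(x)
    residual = sum(w * xi - zi for xi, zi in zip(x, z))
    return -w * residual / (n * w * w + mu)

def best_response_iteration(x, y, z, lam, mu, iters=200):
    """Alternate best responses; a fixed point is a pure-strategy Nash equilibrium."""
    w, delta = 0.0, 0.0
    for _ in range(iters):
        w = learner_best_response(x, y, delta, lam)
        delta = adversary_best_response(x, z, w, mu)
    return w, delta
```

With x = y = [1, 2, 3], z = [0, 0, 0] and λ = μ = 10, the iteration settles quickly. At the resulting point, the adversary pushes inputs downward (δ < 0) while the learner keeps a positive weight.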
CS 294 – Practical Machine Learning Term Project: Robust Classification for Data with Interval Uncertainty and Label Errors
2006
A robust linear binary classification problem is considered. Robustness is sought for data with interval uncertainty, i.e., the data points themselves are unknown, but their means and bounds on their components are known. A convex optimization formulation of the problem is derived, and the method is applied to genomic microarray data. The framework is then extended to data with uncertainty due to label errors, and a convex optimization formulation for this problem is derived as well. Implementation of this extension is postponed due to the lack of a suitable data set.
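For a linear classifier, "classify every point of the interval box correctly" has a closed form. Write c for the box center and r for the half-widths. The minimum of y·(w·x + b) over the box is then y·(w·c + b) − Σᵢ |wᵢ|·rᵢ. The robust margin constraint y·(w·c + b) − Σᵢ |wᵢ|·rᵢ ≥ 1 is convex in (w, b), since the absolute values can be handled with auxiliary variables. The sketch below is the classical box-uncertainty reformulation only. It does not use the known mean separately, as the project's formulation may, and the helper name is hypothetical.

```python
def worst_case_interval_margin(w, b, lower, upper, y):
    """Minimum of y * (w . x + b) over the box lower <= x <= upper.

    Closed form: y * (w . center + b) - sum_i |w_i| * radius_i.
    """
    center = [(lo + hi) / 2 for lo, hi in zip(lower, upper)]
    radius = [(hi - lo) / 2 for lo, hi in zip(lower, upper)]
    nominal = y * (sum(wi * ci for wi, ci in zip(w, center)) + b)
    # Each coordinate can move by radius_i against the sign of y * w_i.
    return nominal - sum(abs(wi) * ri for wi, ri in zip(w, radius))
```

Because the objective is linear, the minimum is attained at a vertex of the box. A brute-force check over the vertices is therefore an easy sanity test in low dimensions.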
Interval Data Classification under Partial Information: A Chance-Constraint Approach
Abstract. This paper presents a novel methodology for constructing maximum-margin classifiers that are robust to interval-valued uncertainty in examples. The idea is to employ chance constraints, which ensure that the uncertain examples are classified correctly with high probability. The key novelty is in employing Bernstein bounding schemes to relax the resulting chance-constrained program as a convex second-order cone program. The Bernstein-based relaxations presented in the paper require knowledge of only the support and mean of the uncertain examples, and make no assumptions about the distribution of the underlying uncertainty. Classifiers built using the proposed methodology model interval-valued uncertainty in a less conservative fashion and hence are expected to generalize better than existing methods. Experimental results on synthetic and real-world datasets show that the proposed classifiers are better equipped to handle interval-valued uncertainty than state-of-the-art methods.
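The mechanism can be illustrated with a simpler inequality. The paper uses Bernstein bounds; the sketch below plainly swaps in Hoeffding's inequality, which likewise needs only the support [lᵢ, uᵢ] and the mean μ. It additionally assumes the components of X are independent.

With mean margin m = y·(w·μ + b) > 0, Hoeffding's inequality gives

P(y·(w·X + b) ≤ 0) ≤ exp(−2m² / Σᵢ wᵢ²(uᵢ − lᵢ)²).

Requiring this bound to be at most ε is the same as requiring

m ≥ √(ln(1/ε)/2) · ‖diag(u − l) w‖₂.

The left side is linear in (w, b) and the right side is a scaled norm, so this is a second-order cone constraint. The function names are hypothetical.

```python
import math

def hoeffding_misclassification_bound(w, b, mean, lower, upper, y):
    """Hoeffding upper bound on P(y * (w . X + b) <= 0).

    Assumes independent components X_i supported on [lower_i, upper_i] with the given mean.
    """
    margin = y * (sum(wi * mi for wi, mi in zip(w, mean)) + b)
    spread = sum((wi * (hi - lo)) ** 2 for wi, lo, hi in zip(w, lower, upper))
    if margin <= 0:
        return 1.0  # the bound is vacuous when the mean point is not classified correctly
    if spread == 0:
        return 0.0  # every weighted component is fixed, so the margin is deterministic and positive
    return min(1.0, math.exp(-2.0 * margin ** 2 / spread))

def satisfies_chance_constraint(w, b, mean, lower, upper, y, eps):
    """Second-order cone form of 'Hoeffding bound <= eps'."""
    margin = y * (sum(wi * mi for wi, mi in zip(w, mean)) + b)
    norm = math.sqrt(sum((wi * (hi - lo)) ** 2 for wi, lo, hi in zip(w, lower, upper)))
    return margin >= math.sqrt(math.log(1.0 / eps) / 2.0) * norm
```

The two functions agree by construction: the cone constraint holds exactly when the Hoeffding bound is at most ε.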
Robust Logistic Regression with Bounded Data Uncertainties
Abstract. Building on previous work in robust optimization, we present a formulation of robust logistic regression under bounded data uncertainties. The robust estimates are obtained using block coordinate gradient descent with iterative group thresholding, which zeros out highly uncertain variables. For high-dimensional problems with uncertain measurements, we discuss the addition of regularization penalties such that both robustness and block sparsity are imposed in the parameter estimates. An empirical approach to estimating the uncertainty magnitude is presented through the use of quantiles. We compare ℓ1-Logistic Regression against ℓ1-Robust Logistic Regression on a real gene expression data set and achieve reductions of 10% to 20% in the worst-case false alarm rate and probability of error, illustrating the value of robust classifiers in risk-sensitive domains when confronted with uncertain measurements.
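The "group thresholding" step can be illustrated with the standard group soft-thresholding operator. This is the proximal map of λ·‖w_g‖₂, which either shrinks a block toward zero or sets the whole block to zero. The sketch pairs it with one block proximal-gradient step on the ordinary mean logistic loss. That loss is an assumption: the paper's robust loss is not reproduced here. The helper names, step size, and penalty are hypothetical.

```python
import math

def group_soft_threshold(block, threshold):
    """Proximal operator of threshold * ||block||_2: shrink the block, or zero it entirely."""
    norm = math.sqrt(sum(v * v for v in block))
    if norm <= threshold:
        return [0.0] * len(block)
    scale = 1.0 - threshold / norm
    return [scale * v for v in block]

def block_prox_step(w, X, y, block_idx, step, lam):
    """One proximal-gradient update of the coordinates in block_idx.

    Loss: mean of log(1 + exp(-y_i * w . x_i)) with labels in {-1, +1}
    (plain logistic loss, no overflow guard for very large margins).
    """
    n = len(X)
    grad = [0.0] * len(block_idx)
    for xi, yi in zip(X, y):
        z = yi * sum(wj * xij for wj, xij in zip(w, xi))
        # d/dw log(1 + exp(-z)) = -y * x * sigmoid(-z), with sigmoid(-z) = 1 / (1 + exp(z))
        coef = -yi / (1.0 + math.exp(z)) / n
        for k, j in enumerate(block_idx):
            grad[k] += coef * xi[j]
    moved = [w[j] - step * g for j, g in zip(block_idx, grad)]
    new_block = group_soft_threshold(moved, step * lam)
    out = list(w)
    for j, v in zip(block_idx, new_block):
        out[j] = v
    return out
```

A large penalty zeros the whole block at once. This block-level sparsity is what lets such a scheme discard a group of uncertain variables together rather than one coefficient at a time.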
Inverse Problems in High-Dimensional Stochastic Systems under Uncertainty
2010
If I can attain half of the success you have achieved in marriage and in life, I will have lived a full and purposeful life. A son could not ask for better parents.
ACKNOWLEDGEMENTS
I am extremely grateful to have been advised by a brilliantly creative human being. Professor Alfred Hero has allowed me to mature as an independent researcher capable of abstractly analyzing complex problems. Through day-to-day interactions, coursework, and discussions about research, I am forever grateful for the time I have spent with my committee members, Professors Burns, Burmeister, Shedden, and Zhu. I will always appreciate the hands-on interaction and development of ideas with my post-doctoral researchers Mark Kliger and Ami Wiesel. I am so appreciative of the time and effort you both spent with me, especially early in my graduate career. I cannot thank Arvind Rao enough for his wisdom and insight into developing good research topics, and for sharing a few good laughs on our road trip to Madison, WI. I know he will be a very successful faculty member someday. For both academic collaboration and extracurricular mischief, I will never forget the moments spent with fellow graduate students and now lifelong friends Yongsheng Huang and Arnau Tibau Puig. Most importantly, I thank my beautiful wife and best friend Erica for her love and patience over these past four years while I pursued my Ph.D.
Universal Learning over Related Distributions and Adaptive Graph Transduction
Abstract. The basic assumption that “training and test data are drawn from the same distribution” is often violated in reality. In this paper, we propose one common solution that covers various scenarios of learning under “different but related distributions” in a single framework. Explicit examples include (a) sample selection bias between training and testing data, (b) transfer learning, or no labeled data in the target domain, and (c) noisy or uncertain training data. The main motivation is that one could ideally solve as many problems as possible with a single approach. The proposed solution extends graph transduction using the maximum margin principle over unlabeled data. The error of the proposed method is bounded under reasonable assumptions even when the training and testing distributions are different. Experimental results demonstrate that the proposed method improves on traditional graph transduction by as much as 15% in accuracy and AUC in all common situations of distribution difference. Most importantly, it outperforms, by up to 10% in accuracy, several state-of-the-art approaches proposed to solve specific categories of distribution difference, e.g., BRSD [1] for sample selection bias, CDSC [2] for transfer learning, etc. The main claim is that adaptive graph transduction is a general and competitive method that handles distribution differences implicitly, without needing to know the exact type. These include at least sample selection bias, transfer learning, and uncertainty mining, as well as similar settings that have not yet been studied. The source code and datasets are available from the authors.
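For readers unfamiliar with graph transduction, the sketch below shows only a classic label-spreading baseline, not the paper's maximum-margin adaptive method. Label scores are propagated over a symmetrically normalized affinity matrix S = D^(-1/2) W D^(-1/2) by iterating F ← α·S·F + (1 − α)·Y, where Y holds +1/−1 for labeled nodes and 0 otherwise. The affinity matrix, α, and the function name are illustrative assumptions, and the graph is assumed to have no isolated nodes.

```python
import math

def label_propagation(W, labels, alpha=0.9, iters=200):
    """Label spreading over a graph.

    W: symmetric nonnegative affinity matrix (list of lists) with no isolated nodes.
    labels: +1 / -1 for labeled nodes, 0 for unlabeled ones.
    Returns one score per node; its sign is the predicted label.
    """
    n = len(W)
    degree = [sum(row) for row in W]
    # Symmetric normalization S = D^{-1/2} W D^{-1/2}.
    S = [[W[i][j] / math.sqrt(degree[i] * degree[j]) for j in range(n)] for i in range(n)]
    F = [float(v) for v in labels]
    # Converges because the spectral radius of alpha * S is alpha < 1.
    for _ in range(iters):
        F = [alpha * sum(S[i][j] * F[j] for j in range(n)) + (1 - alpha) * labels[i]
             for i in range(n)]
    return F
```

On a four-node chain 0–1–2–3 with node 0 labeled +1 and node 3 labeled −1, the unlabeled nodes inherit the sign of their nearer labeled end. The paper's contribution is to go beyond this baseline when training and test distributions differ.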

