Results 1–10 of 22
Nightmare at test time: Robust learning by feature deletion
In ICML, 2006
Cited by 42 (3 self)
Abstract:
When constructing a classifier from labeled data, it is important not to assign too much weight to any single input feature, in order to increase the robustness of the classifier. This is particularly important in domains with nonstationary feature distributions or with input sensor failures. A common approach to achieving such robustness is to introduce regularization which spreads the weight more evenly between the features. However, this strategy is very generic, and cannot induce robustness specifically tailored to the classification task at hand. In this work, we introduce a new algorithm for avoiding single feature overweighting by analyzing robustness using a game-theoretic formalization. We develop classifiers which are optimally resilient to deletion of features in a minimax sense, and show how to construct such classifiers using quadratic programming. We illustrate the applicability of our methods on spam filtering and handwritten digit recognition tasks, where feature deletion is indeed a realistic noise model. 1. Building Robust Classifiers. When constructing classifiers over high-dimensional spaces such as texts or images, one is inherently faced with the problem of undersampling of the true data distribution. Even so-called “discriminative” methods which focus on minimizing classification error (or a bound on it) are exposed to this difficulty, since the training objective will be calculated over the observed input vectors only, and thus may not be a good approximation of the average objective on the test data. This is especially important in settings such as document …
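For a fixed linear classifier, the inner "feature deletion" problem the abstract alludes to has a simple closed form: the worst-case adversary deletes the up-to-k features whose contributions help the correct label the most. A minimal numpy sketch of that inner maximization (function name and toy numbers are illustrative, not from the paper):

```python
import numpy as np

def worst_case_margin(w, x, y, k):
    """Margin y * <w, x> after an adversary deletes up to k features,
    choosing the ones whose contributions help the correct label most."""
    contrib = y * w * x                  # per-feature contribution to the margin
    drop = np.sort(contrib)[-k:]         # k largest (most helpful) contributions
    drop = drop[drop > 0]                # deleting a harmful feature never pays off
    return float(y * np.dot(w, x) - drop.sum())

w = np.array([2.0, -1.0, 0.5, 3.0])
x = np.array([1.0, 1.0, 1.0, 1.0])
print(worst_case_margin(w, x, y=1.0, k=1))   # adversary deletes the weight-3.0 feature
```

The paper's contribution is to fold this inner maximization into the learning objective itself and solve the resulting minimax problem for w via quadratic programming.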
Theory and applications of Robust Optimization
2007
Cited by 26 (5 self)
Abstract:
In this paper we survey the primary research, both theoretical and applied, in the field of Robust Optimization (RO). Our focus will be on the computational attractiveness of RO approaches, as well as the modeling power and broad applicability of the methodology. In addition to surveying the most prominent theoretical results of RO over the past decade, we will also present some recent results linking RO to adaptable models for multi-stage decision-making problems. Finally, we will highlight successful applications of RO across a wide spectrum of domains, including, but not limited to, finance, statistics, learning, and engineering.
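A canonical construction in this literature is the robust counterpart of a linear constraint under box uncertainty: requiring a·x ≤ b for every a in the box [ā − δ, ā + δ] is equivalent to the single deterministic constraint ā·x + δ·|x| ≤ b. A small sketch of that worst-case check (names and numbers are illustrative):

```python
import numpy as np

def robust_feasible(a_bar, delta, b, x):
    """Check a.x <= b for every a with |a_i - a_bar_i| <= delta_i.
    The worst case is attained at a_i = a_bar_i + delta_i * sign(x_i),
    giving the robust counterpart  a_bar.x + delta.|x| <= b."""
    worst = np.dot(a_bar, x) + np.dot(delta, np.abs(x))
    return bool(worst <= b)

a_bar = np.array([1.0, 2.0])
delta = np.array([0.5, 0.5])
print(robust_feasible(a_bar, delta, b=4.0, x=np.array([1.0, 1.0])))   # 3 + 1 = 4 <= 4
print(robust_feasible(a_bar, delta, b=4.0, x=np.array([1.5, 1.0])))   # 3.5 + 1.25 > 4
```

The survey's point about computational attractiveness is that such robust counterparts often stay in a tractable class (here, a linear program gains only absolute-value terms).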
Nash Equilibria of Static Prediction Games
Cited by 7 (2 self)
Abstract:
The standard assumption of identically distributed training and test data is violated when an adversary can exercise some control over the generation of the test data. In a prediction game, a learner produces a predictive model while an adversary may alter the distribution of input data. We study single-shot prediction games in which the cost functions of learner and adversary are not necessarily antagonistic. We identify conditions under which the prediction game has a unique Nash equilibrium, and derive algorithms that will find the equilibrial prediction models. In a case study, we empirically explore properties of Nash-equilibrial prediction models for email spam filtering.
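The "not necessarily antagonistic" setting can be illustrated with a toy two-player game (purely an illustration, not the paper's learner/adversary model): each player has its own quadratic cost, the two costs are not negatives of each other, and iterated best responses converge to the unique Nash equilibrium.

```python
# Toy two-player game with non-antagonistic quadratic costs:
#   J1(a, b) = a**2 - a*b   -> player 1 best response: a = b / 2
#   J2(a, b) = b**2 + a*b   -> player 2 best response: b = -a / 2
# The unique Nash equilibrium is (0, 0); the best-response map is a
# contraction here, so alternating best responses converge to it.
a, b = 1.0, 1.0
for _ in range(50):
    a = b / 2.0       # player 1 best-responds to the current b
    b = -a / 2.0      # player 2 best-responds to the current a
print(a, b)           # both converge toward the equilibrium (0, 0)
```

The paper's technical contribution is identifying conditions (on the learner's and adversary's actual cost functions) under which such a unique equilibrium exists and can be computed.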
Static prediction games for adversarial learning problems
Journal of Machine Learning Research
Cited by 4 (0 self)
Abstract:
The standard assumption of identically distributed training and test data is violated when the test data are generated in response to the presence of a predictive model. This becomes apparent, for example, in the context of email spam filtering. Here, email service providers employ spam filters, and spam senders engineer campaign templates to achieve a high rate of successful deliveries despite the filters. We model the interaction between the learner and the data generator as a static game in which the cost functions of the learner and the data generator are not necessarily antagonistic. We identify conditions under which this prediction game has a unique Nash equilibrium and derive algorithms that find the equilibrial prediction model. We derive two instances, the Nash logistic regression and the Nash support vector machine, and empirically explore their properties in a case study on email spam filtering.
Stackelberg games for adversarial prediction problems
In Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2011
Cited by 4 (1 self)
Abstract:
The standard assumption of identically distributed training and test data is violated when test data are generated in response to a predictive model. This becomes apparent, for example, in the context of email spam filtering, where an email service provider employs a spam filter and the spam sender can take this filter into account when generating new emails. We model the interaction between learner and data generator as a Stackelberg competition in which the learner plays the role of the leader and the data generator may react to the leader’s move. We derive an optimization problem to determine the solution of this game and present several instances of the Stackelberg prediction game. We show that the Stackelberg prediction game generalizes existing prediction models. Finally, we explore properties of the discussed models empirically in the context of email spam filtering.
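The Stackelberg structure, leader commits first and the follower best-responds, can be sketched on a toy game (illustrative costs, not the paper's prediction game): the leader minimizes its cost after substituting the follower's closed-form best response.

```python
import numpy as np

# Toy Stackelberg game:
#   follower cost  J_f(w, d) = d**2 - w*d   -> best response d*(w) = w / 2
#   leader cost    J_l(w, d) = (w - 1)**2 + d**2
# The leader minimizes J_l(w, d*(w)), anticipating the follower's reaction.
def follower_best_response(w):
    return w / 2.0

def leader_cost(w):
    d = follower_best_response(w)
    return (w - 1.0) ** 2 + d ** 2

grid = np.linspace(-2.0, 2.0, 4001)                       # crude grid search
w_star = grid[np.argmin([leader_cost(w) for w in grid])]
print(round(float(w_star), 3))                            # analytic optimum is w = 0.8
```

Note the asymmetry with a Nash game: the leader optimizes over its committed move with the follower's reaction already folded in, which is exactly the bilevel flavor of the paper's optimization problem.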
Using machine teaching to identify optimal training-set attacks on machine learners
In The Twenty-Ninth AAAI Conference on Artificial Intelligence, 2015
Cited by 3 (3 self)
Abstract:
We investigate a problem at the intersection of machine learning and security: training-set attacks on machine learners. In such attacks an attacker contaminates the training data so that a specific learning algorithm would produce a model profitable to the attacker. Understanding training-set attacks is important as more intelligent agents (e.g., spam filters and robots) are equipped with learning capability and can potentially be hacked via the data they receive from the environment. This paper identifies the optimal training-set attack on a broad family of machine learners. First we show that the optimal training-set attack can be formulated as a bilevel optimization problem. Then we show that for machine learners with certain Karush-Kuhn-Tucker conditions we can solve the bilevel problem efficiently using gradient methods on an implicit function. As examples, we demonstrate optimal training-set attacks on support vector machines, logistic regression, and linear regression with extensive experiments. Finally, we discuss potential defenses against such attacks.
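The bilevel idea can be sketched in the simplest possible instance (1-D ridge regression, chosen here because its inner problem has a closed form; the paper handles a broader family of learners via KKT conditions). The learned weight is an explicit function of the training labels, so the attacker can take gradient steps on a poisoned label through that function. All names and numbers are illustrative:

```python
import numpy as np

# Inner problem (ridge regression, 1-D): w(y) = sum(x*y) / (sum(x*x) + lam).
# The attacker controls one label y[j] and does gradient descent on its own
# objective (w - w_target)**2, differentiating w through the closed form.
x = np.array([1.0, 2.0, 3.0])
y = np.array([1.0, 2.0, 3.0])          # clean labels: learner recovers w near 1
lam = 1.0
w_target = 0.0                          # attacker wants the model pushed to 0
j = 2                                   # index of the poisoned training point

def learn(y):
    return np.dot(x, y) / (np.dot(x, x) + lam)

for _ in range(200):
    w = learn(y)
    grad = 2.0 * (w - w_target) * (x[j] / (np.dot(x, x) + lam))  # d/dy_j of (w - w_t)^2
    y[j] -= 0.5 * grad                  # attacker gradient step
print(round(float(learn(y)), 3))        # learned w driven near the attacker's target
```

For learners without a closed-form solution, the paper replaces this explicit derivative with an implicit-function gradient obtained from the learner's KKT conditions.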
Convex Adversarial Collective Classification
Cited by 2 (2 self)
Abstract:
In this paper, we present a novel method for robustly performing collective classification in the presence of a malicious adversary that can modify up to a fixed number of binary-valued attributes. Our method is formulated as a convex quadratic program that guarantees optimal weights against a worst-case adversary in polynomial time. In addition to increased robustness against active adversaries, this kind of adversarial regularization can also lead to improved generalization even when no adversary is present. In experiments on real and simulated data, our method consistently outperforms both non-adversarial and non-relational baselines.
Adversarial Support Vector Machine Learning
Cited by 2 (1 self)
Abstract:
Many learning tasks such as spam filtering and credit card fraud detection face an active adversary that tries to avoid detection. For learning problems that deal with an active adversary, it is important to model the adversary’s attack strategy and develop robust learning models to mitigate the attack. These are the two objectives of this paper. We consider two attack models: a free-range attack model that permits arbitrary data corruption and a restrained attack model that anticipates more realistic attacks that a reasonable adversary would devise under penalties. We then develop optimal SVM learning strategies against the two attack models. The learning algorithms minimize the hinge loss while assuming the adversary is modifying data to maximize the loss. Experiments are performed on both artificial and real data sets. We demonstrate that optimal solutions may be overly pessimistic when the actual attacks are much weaker than expected. More importantly, we demonstrate that it is possible to develop a much more resilient SVM learning model while making loose assumptions on the data corruption models. When derived under the restrained attack model, our optimal SVM learning strategy provides more robust overall performance under a wide range of attack parameters.
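For a fixed linear SVM, the inner "adversary maximizes the hinge loss" step has a closed form when the corruption is bounded in an L-infinity ball, one common way to formalize an unrestricted ("free-range"-style) corruption budget; the paper's restrained model additionally penalizes large modifications. A hedged sketch of that closed form (names and bound are illustrative):

```python
import numpy as np

def adversarial_hinge(w, x, y, eps):
    """Worst-case hinge loss when the adversary may shift x anywhere in an
    L-infinity ball of radius eps.  The maximizing shift is
    delta = -eps * y * sign(w), which adds eps * ||w||_1 to the clean term:
        max(0, 1 - y*<w, x> + eps * ||w||_1)."""
    return max(0.0, 1.0 - y * np.dot(w, x) + eps * np.abs(w).sum())

w = np.array([1.0, -2.0])
x = np.array([2.0, 1.0])
print(adversarial_hinge(w, x, y=1.0, eps=0.0))   # clean hinge: max(0, 1 - 0) = 1.0
print(adversarial_hinge(w, x, y=1.0, eps=0.5))   # adds 0.5 * ||w||_1 = 1.5 -> 2.5
```

Training against this worst case amounts to minimizing the clean hinge loss plus an attack-dependent penalty on w, which is why overly generous corruption bounds yield the overly pessimistic solutions the abstract mentions.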
A Simple Geometric Interpretation of SVM using Stochastic Adversaries
Cited by 1 (0 self)
Abstract:
We present a minimax framework for classification that considers stochastic adversarial perturbations to the training data. We show that for binary classification it is equivalent to SVM, but with a very natural interpretation of the regularization parameter. In the multiclass case, we obtain that our formulation is equivalent to regularizing the hinge loss with the maximum norm of the weight vector (i.e., the two-infinity norm). We test this new regularization scheme and show that it is competitive with the Frobenius regularization commonly used for multiclass SVM. We proceed to analyze various forms of stochastic perturbations and obtain compact optimization problems for the optimal classifiers. Taken together, our results illustrate the advantage of using stochastic perturbations rather than deterministic ones, as well as offer a simple geometric interpretation for SVM optimization.
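The two regularizers compared in the abstract differ only in how they aggregate the per-class weight rows: the "two-infinity" norm takes the largest row norm, while the Frobenius norm sums all squared entries. A minimal sketch (toy matrix, illustrative function name):

```python
import numpy as np

def two_inf_norm(W):
    """Maximum Euclidean norm over the rows of W (one row per class) --
    the 'two-infinity' regularizer described for the multiclass case."""
    return float(np.max(np.linalg.norm(W, axis=1)))

W = np.array([[3.0, 4.0],    # row norm 5
              [1.0, 0.0]])   # row norm 1
print(two_inf_norm(W))                 # 5.0: only the worst row matters
print(float(np.linalg.norm(W)))        # Frobenius: sqrt(26), all rows contribute
```

Penalizing only the largest row norm leaves the remaining class weights unconstrained by the regularizer, which is the geometric distinction the paper's multiclass result turns on.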
Interval Data Classification under Partial Information: A Chance-Constraint Approach
Cited by 1 (1 self)
Abstract:
This paper presents a novel methodology for constructing maximum-margin classifiers which are robust to interval-valued uncertainty in examples. The idea is to employ chance-constraints which ensure that the uncertain examples are classified correctly with high probability. The key novelty is in employing Bernstein bounding schemes to relax the resulting chance-constrained program as a convex second-order cone program. The Bernstein-based relaxations presented in the paper require knowledge of the support and mean of the uncertain examples alone, and make no assumptions on distributions regarding the underlying uncertainty. Classifiers built using the proposed methodology model interval-valued uncertainty in a less conservative fashion and hence are expected to generalize better than existing methods. Experimental results on synthetic and real-world datasets show that the proposed classifiers are better equipped to handle interval-valued uncertainty than the state of the art.
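The general shape of such a relaxation is a second-order-cone surrogate: the chance constraint P[y(w·x + b) ≥ 1] ≥ 1 − η on an uncertain example with mean μ and covariance Σ is replaced by y(w·μ + b) ≥ 1 + κ·sqrt(wᵀΣw), where κ encodes the confidence level. A sketch of the feasibility check (illustrative numbers; here κ would come from a Chebyshev- or Gaussian-style bound, whereas the paper derives its bound from Bernstein schemes using support and mean only):

```python
import numpy as np

def chance_feasible(w, b, mu, Sigma, y, kappa):
    """Check the second-order-cone surrogate of a classification chance
    constraint:  y * (w.mu + b) >= 1 + kappa * sqrt(w' Sigma w).
    Larger kappa means higher required confidence, hence a tighter check."""
    margin = y * (np.dot(w, mu) + b)
    spread = np.sqrt(np.dot(w, Sigma @ w))
    return bool(margin >= 1.0 + kappa * spread)

w = np.array([1.0, 1.0]); b = 0.0
mu = np.array([2.0, 2.0]); Sigma = 0.25 * np.eye(2)
print(chance_feasible(w, b, mu, Sigma, y=1.0, kappa=2.0))   # 4 >= 1 + 2*sqrt(0.5)
print(chance_feasible(w, b, mu, Sigma, y=1.0, kappa=5.0))   # tighter bound fails
```

The less conservative the bound used to pick κ, the larger the feasible set of classifiers, which is the source of the generalization advantage the abstract claims for the Bernstein relaxation.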