Adversarial Classification (2004)

by Nilesh Dalvi, Pedro Domingos, Mausam, Sumit Sanghai, Deepak Verma
In KDD

Abstract:

Essentially all data mining algorithms assume that the data-generating process is independent of the data miner's activities. However, in many domains, including spam detection, intrusion detection, fraud detection, surveillance and counter-terrorism, this is far from the case: the data is actively manipulated by an adversary seeking to make the classifier produce false negatives. In these domains, the performance of a classifier can degrade rapidly after it is deployed, as the adversary learns to defeat it. Currently the only solution to this is repeated, manual, ad hoc reconstruction of the classifier. In this paper we develop a formal framework and algorithms for this problem. We view classification as a game between the classifier and the adversary, and produce a classifier that is optimal given the adversary's optimal strategy. Experiments in a spam detection domain show that this approach can greatly outperform a classifier learned in the standard way, and (within the parameters of the problem) automatically adapt the classifier to the adversary's evolving manipulations.
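
The game described in the abstract can be illustrated with a toy sketch. This is not the paper's algorithm: the word weights, the bias, the unit cost per added word, and the function names below are all hypothetical, and the "aware" classifier uses a simple worst-case heuristic in place of the paper's cost-sensitive, game-theoretic formulation. It only shows the three roles involved: a standard classifier, an adversary that modifies spam at minimal cost to evade it, and a classifier that anticipates that modification.

```python
# Toy sketch only: hypothetical weights, not the paper's method or data.
WEIGHTS = {"viagra": 3.0, "free": 1.5, "meeting": -1.0, "report": -1.5, "thanks": -2.0}
BIAS = -1.0


def score(words):
    """Linear log-odds-style score; a positive value means 'spam'."""
    return BIAS + sum(WEIGHTS.get(w, 0.0) for w in set(words))


def naive_classify(words):
    """Standard classifier that ignores the adversary."""
    return score(words) > 0


def adversary_modify(words, budget):
    """Add up to `budget` absent 'good' words (assumed unit cost each),
    most negative weight first, stopping once the naive classifier is evaded."""
    words = set(words)
    candidates = sorted(
        (w for w in WEIGHTS if WEIGHTS[w] < 0 and w not in words),
        key=WEIGHTS.get,
    )
    for w in candidates[:budget]:
        if not naive_classify(words):
            break
        words.add(w)
    return words


def aware_classify(words, budget):
    """Heuristic stand-in for an adversary-aware classifier: assume the
    `budget` most negative words present may have been inserted, and
    score the message without them."""
    words = set(words)
    suspicious = sorted(
        (w for w in words if WEIGHTS.get(w, 0.0) < 0),
        key=WEIGHTS.get,
    )[:budget]
    return score(words - set(suspicious)) > 0


spam = {"viagra", "free"}
evasive = adversary_modify(spam, budget=2)
legit = {"meeting", "report", "thanks"}
```

In this sketch the modified message slips past `naive_classify` but is still flagged by `aware_classify`, while the legitimate message is accepted by both. A worst-case heuristic like this can raise false positives on other inputs, which is the trade-off a cost-sensitive formulation is meant to weigh.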
