## Dirichlet-enhanced spam filtering based on biased samples (2007)

### Download Links

- [books.nips.cc]
- [www.informatik.hu-berlin.de]
- [www.cs.uni-potsdam.de]
- [www.mpi-inf.mpg.de]
- DBLP

### Other Repositories/Bibliography

Venue: Advances in Neural Information Processing Systems 19

Citations: 33 (7 self)

### BibTeX

```bibtex
@inproceedings{Bickel07dirichlet-enhancedspam,
  author    = {Steffen Bickel and Tobias Scheffer},
  title     = {Dirichlet-enhanced spam filtering based on biased samples},
  booktitle = {Advances in Neural Information Processing Systems 19},
  year      = {2007},
  pages     = {161--168},
  publisher = {MIT Press}
}
```

### Abstract

We study a setting that is motivated by the problem of filtering spam messages for many users. Each user receives messages according to an individual, unknown distribution, reflected only in the unlabeled inbox. The spam filter for a user is required to perform well with respect to this distribution. Labeled messages from publicly available sources can be utilized, but they are governed by a distinct distribution, not adequately representing most inboxes. We devise a method that minimizes a loss function with respect to a user’s personal distribution based on the available biased sample. A nonparametric hierarchical Bayesian model furthermore generalizes across users by learning a common prior which is imposed on new email accounts. Empirically, we observe that bias-corrected learning outperforms naive reliance on the assumption of independent and identically distributed data; Dirichlet-enhanced generalization across users outperforms a single (“one size fits all”) filter as well as independent filters for all users.
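The abstract's core idea, minimizing a loss with respect to the user's personal distribution using a sample governed by a different one, amounts to importance weighting. A minimal sketch with toy 1-D Gaussians (not the paper's actual estimator; all quantities below are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# Biased sample from p = N(0, 1); the "target" distribution is q = N(1, 1).
x = rng.normal(0.0, 1.0, size=100_000)

# Loss of a fixed predictor c = 1 on each example.
losses = (x - 1.0) ** 2

# Importance weights w(x) = q(x)/p(x); for these two Gaussians this is exp(x - 1/2).
weights = np.exp(x - 0.5)

naive = losses.mean()                                   # estimates E_p[loss] ≈ 2
corrected = np.sum(weights * losses) / np.sum(weights)  # estimates E_q[loss] ≈ 1
```

The re-weighted average recovers the expected loss under the target distribution even though every example was drawn from the biased source.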

### Citations

1292 | Sample Selection Bias as a Specification Error
- Heckman
- 1979

Citation Context: ...n to depend on several characteristics of the person queried; work that led to a method for correcting sample selection bias for a class of regression problems has been distinguished by a Nobel Prize [6]. In machine learning, the case of training data that is only biased with respect to the ratio of class labels has been studied [4, 7]. Zadrozny [14] has derived a bias correction theorem that applies...

716 | A Bayesian analysis of some nonparametric problems
- Ferguson
- 1973

Citation Context: ...us assume a parametric form (we employ a logistic model), and let w_{n+1} be the parameters that satisfy p(s = 1|x, θ_{n+1}, λ) = p(s = 1|x; w_{n+1}) ∝ p(x|λ)/p(x|θ_{n+1}). We resort to a Dirichlet process (DP) [5] G(w_i) as a model for the prior belief on w_{n+1} given w_1, ..., w_n. A Dirichlet process G|{α, G0} ∼ DP(α, G0) with concentration parameter α and base distribution G0 generates parameters w_i: The first...
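The generative behavior of DP(α, G0) described in this excerpt, where each new draw is either fresh from the base distribution or a repeat of an earlier draw, can be illustrated with the Pólya-urn scheme; α, G0, and the sample size below are arbitrary choices for illustration:

```python
import random

def dp_polya_urn_sample(n, alpha, base_draw, seed=0):
    """Draw n values w_1..w_n from a Dirichlet process DP(alpha, G0) via the
    Polya-urn scheme: w_i is a fresh draw from G0 with probability
    alpha/(alpha + i), and a copy of an earlier draw otherwise."""
    rng = random.Random(seed)
    draws = []
    for i in range(n):
        if rng.random() < alpha / (alpha + i):
            draws.append(base_draw(rng))      # new value from base distribution G0
        else:
            draws.append(rng.choice(draws))   # repeat an existing value
    return draws

samples = dp_polya_urn_sample(1000, alpha=2.0, base_draw=lambda r: r.gauss(0, 1))
# The draws cluster: far fewer distinct values than draws.
```

The clustering property (roughly α·log(n/α) distinct values among n draws) is what lets the model share bias parameters across users.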

375 | Markov Chain Sampling Methods for Dirichlet Process Mixture Models
- Neal
- 2000

Citation Context: ...le, an estimate w_{n+1}|L, {U_i}_{i=1}^n ∼ Ĝ has to be based on the available data. Exact calculation of this prior requires integrating over the w_1, ..., w_n; since this is not feasible, MCMC sampling [10] or variational approximation [1] can be used. In our application, the model of p(s|x, θ_i, λ) involves a regularized logistic regression in a space of more than 800,000 dimensions. In each iteration o...

270 | The foundations of cost-sensitive learning
- Elkan
- 2001

Citation Context: ...class of regression problems has been distinguished by a Nobel Prize [6]. In machine learning, the case of training data that is only biased with respect to the ratio of class labels has been studied [4, 7]. Zadrozny [14] has derived a bias correction theorem that applies when the bias is conditionally independent of the class label given the instance, and when every instance has a nonzero probability o...

242 | Support Vector Machines for Spam Categorization
- Drucker, Wu, et al.
- 1999

Citation Context: ...a Bias-Corrected Support Vector Machine. Given the requirement of high accuracy and the need to handle many attributes, SVMs are widely acknowledged to be a good learning mechanism for spam filtering [2]. The final bias-corrected SVM f_{n+1} can be trained by re-sampling or re-weighting L according to s(x) = p(s = 1|θ_i, λ) / p(s = 1|x, U_{n+1}; L, U_1, ..., U_n), where p(s|x, U_{n+1}; L, U_1, ..., U_n) is the empirical s...
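The re-weighting step described here, training on the labeled sample L with a per-example weight s(x), can be sketched with a weighted logistic loss standing in for the SVM's hinge loss (a toy numpy illustration with synthetic uniform weights, not the paper's estimator):

```python
import numpy as np

rng = np.random.default_rng(1)

def train_weighted_logreg(X, y, s, lr=0.1, steps=500, reg=1e-3):
    """Gradient descent on an L2-regularized logistic loss in which each
    example i is weighted by s[i], playing the role of the bias-correction
    weight s(x_i)."""
    theta = np.zeros(X.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-X @ theta))           # sigmoid predictions
        grad = X.T @ (s * (p - y)) / s.sum() + reg * theta
        theta -= lr * grad
    return theta

# Toy data: two Gaussian classes in 2-D.
X = np.vstack([rng.normal(-1.0, 1.0, size=(200, 2)),
               rng.normal(+1.0, 1.0, size=(200, 2))])
y = np.concatenate([np.zeros(200), np.ones(200)])

s = np.ones(len(y))  # uniform weights recover ordinary (uncorrected) training
theta = train_weighted_logreg(X, y, s)
accuracy = np.mean((X @ theta > 0) == (y == 1))
```

Replacing the uniform `s` with density-ratio weights up-weights the labeled examples that resemble the target user's inbox, which is the effect the excerpt describes.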

183 | The class imbalance problem: A systematic study
- Japkowicz, Stephen

Citation Context: ...class of regression problems has been distinguished by a Nobel Prize [6]. In machine learning, the case of training data that is only biased with respect to the ratio of class labels has been studied [4, 7]. Zadrozny [14] has derived a bias correction theorem that applies when the bias is conditionally independent of the class label given the instance, and when every instance has a nonzero probability o...

126 | The Enron corpus: A new dataset for email classification research
- Klimt, Yang
- 2004

Citation Context: ...s caught in “spam traps” – email addresses that are published on the web in an invisible font and are harvested by spammers [11] – the Enron corpus that was disclosed in the course of the Enron trial [8], and SpamAssassin data. These collections have diverse properties and none of them represents the global distribution of all emails, let alone the distribution received by some particular user. The r...

81 | Text categorization based on regularized linear classifiers
- Zhang, Oles
- 2001

Citation Context: ...x, λ, θ_i) = p(s = 1|x; w_i) = 1/(1 + e^⟨w_i,x⟩) (6), with likelihood ∏_{x_u∈U_i} p(s = 0|x_u, w) ∏_{x_ℓ∈L} p(s = 1|x_ℓ, w) (7). We train parameters w_i = argmax_w log P(s_{U_i,L}|w, U_i, L) + log η(w) (we write η(w) for the regularizer) [15] using the fast implementation of regularized logistic regression of [9]. 2.2 Dirichlet-Enhanced Bias Estimation. This section addresses estimation of the sample bias p(s|x, θ_{n+1}, λ) for a new user n+1...

80 | Learning and evaluating classifiers under sample selection bias
- Zadrozny

Citation Context: ...on problems has been distinguished by a Nobel Prize [6]. In machine learning, the case of training data that is only biased with respect to the ratio of class labels has been studied [4, 7]. Zadrozny [14] has derived a bias correction theorem that applies when the bias is conditionally independent of the class label given the instance, and when every instance has a nonzero probability of being drawn i...

43 | Variational methods for the Dirichlet process
- Blei, Jordan

Citation Context: ...∼ Ĝ has to be based on the available data. Exact calculation of this prior requires integrating over the w_1, ..., w_n; since this is not feasible, MCMC sampling [10] or variational approximation [1] can be used. In our application, the model of p(s|x, θ_i, λ) involves a regularized logistic regression in a space of more than 800,000 dimensions. In each iteration of the MCMC process or the variati...

25 | Correcting sample selection bias in maximum entropy density estimation
- Dudík, Schapire, et al.
- 2005

Citation Context: ...ally independent of the class label given the instance, and when every instance has a nonzero probability of being drawn into the sample. Sample bias correction for maximum entropy density estimation [3] and the analysis of the generalization error under covariate shift [12] follow the same intuition. In our email spam filtering setting, a server handles many email accounts (in case of our industrial...

10 | Logistic Regression for Data Mining and High-Dimensional Classification
- Komarek
- 2004

Citation Context: ...∏_{x_ℓ∈L} p(s = 1|x_ℓ, w) (7). We train parameters w_i = argmax_w log P(s_{U_i,L}|w, U_i, L) + log η(w) (we write η(w) for the regularizer) [15] using the fast implementation of regularized logistic regression of [9]. 2.2 Dirichlet-Enhanced Bias Estimation. This section addresses estimation of the sample bias p(s|x, θ_{n+1}, λ) for a new user n+1 by generalizing across existing users U_1, ..., U_n. The resulting est...

8 | An introduction to nonparametric hierarchical Bayesian modelling with a focus on multi-agent learning
- Tresp, Yu
- 2004

Citation Context: ...000 dimensions. In each iteration of the MCMC process or the variational inference of [1], logistic density estimators for all users would need to be trained, which is prohibitive. We therefore follow [13] and approximate the Dirichlet process as Ĝ(w) ≈ (αG0 + Σ_{i=1}^n φ_i δ(w*_i)) / (α + n) (11). Compared to the original Equation 8, the sum of point distributions at true parameters w_i is replaced by a weight...
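Sampling from the truncated approximation Ĝ(w) ≈ (αG0 + Σ_i φ_i δ(w*_i))/(α + n) can be sketched as a two-part mixture: with probability α/(α + n) draw fresh from G0, otherwise return a representative point w*_i with probability proportional to φ_i. The values below are made up for illustration, and the φ_i are assumed to sum to n:

```python
import random

def sample_from_truncated_dp(alpha, g0_draw, points, phis, rng):
    """Sample w from G_hat ≈ (alpha*G0 + sum_i phi_i * delta(w*_i)) / (alpha + n),
    assuming the phi_i sum to n: with probability alpha/(alpha + n) draw from
    the base distribution G0, otherwise pick a point mass w*_i with
    probability proportional to phi_i."""
    n = sum(phis)
    if rng.random() < alpha / (alpha + n):
        return g0_draw(rng)
    return rng.choices(points, weights=phis, k=1)[0]

rng = random.Random(0)
points = [0.5, 1.5, -0.3]   # illustrative representative parameters w*_i
phis = [2.0, 1.5, 0.5]      # illustrative weights phi_i (sum to n = 4)
draws = [sample_from_truncated_dp(1.0, lambda r: r.gauss(0, 1), points, phis, rng)
         for _ in range(10_000)]
frac_reused = sum(d in points for d in draws) / len(draws)  # ≈ n/(alpha+n) = 0.8
```

Because only the representative points and their weights are stored, no per-user density estimator has to be retrained inside the sampler, which is the computational point of the approximation.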

5 | Model selection under covariate shift
- Sugiyama, Müller
- 2005

Citation Context: ...instance has a nonzero probability of being drawn into the sample. Sample bias correction for maximum entropy density estimation [3] and the analysis of the generalization error under covariate shift [12] follow the same intuition. In our email spam filtering setting, a server handles many email accounts (in case of our industrial partner, several millions), and delivers millions of emails per day. A...

3 | Understanding how spammers steal your e-mail address: An analysis of the first six months of data from Project Honey Pot
- Prince, Dahl, et al.
- 2005

Citation Context: ...non-spam) sources are publicly available. They include collections of emails caught in “spam traps” – email addresses that are published on the web in an invisible font and are harvested by spammers [11] – the Enron corpus that was disclosed in the course of the Enron trial [8], and SpamAssassin data. These collections have diverse properties and none of them represents the global distribution of all...