Spam Filtering with Naive Bayes -- Which Naive Bayes? (2006)

by Vangelis Metsis
Third Conference on Email and Anti-Spam (CEAS

Abstract:

Naive Bayes is very popular in commercial and open-source anti-spam e-mail filters. There are, however, several forms of Naive Bayes, something the anti-spam literature does not always acknowledge. We discuss five different versions of Naive Bayes, and compare them on six new, non-encoded datasets that contain ham messages of particular Enron users and fresh spam messages. The new datasets, which we make publicly available, are more realistic than previous comparable benchmarks, because they maintain the temporal order of the messages in the two categories, and they emulate the varying proportion of spam and ham messages that users receive over time. We adopt an experimental procedure that emulates the incremental training of personalized spam filters, and we plot ROC curves that allow us to compare the different versions of NB over the entire tradeoff between true positives and true negatives.
