Abstract:
In addressing the growing problem of junk E-mail on the Internet, we examine methods for the automated construction of filters to eliminate such unwanted messages from a user's mail stream. By casting this problem in a decision-theoretic framework, we are able to use probabilistic learning methods in conjunction with a notion of differential misclassification cost to produce filters that are especially appropriate for the nuances of this task. While this may appear, at first, to be a straightforward text classification problem, we show that by considering domain-specific features of this problem in addition to the raw text of E-mail messages, we can produce much more accurate filters. Finally, we show the efficacy of such filters in a real-world usage scenario, arguing that this technology is mature enough for deployment.

Introduction
As the number of users connected to the Internet continues to skyrocket, electronic mail (E-mail) is quickly becoming one of the fastest and m...
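The abstract's combination of a probabilistic classifier with differential misclassification cost can be illustrated with a small sketch. It assumes a Bernoulli naive Bayes model with Laplace smoothing (the abstract says only "probabilistic learning methods"), a cost ratio λ equal to the cost of filing a legitimate message as junk divided by the cost of letting a junk message through, and two hypothetical hand-crafted features (`multiple_exclamations`, `many_non_alphanumeric`) standing in for the "domain-specific features". Minimizing expected cost means labeling a message junk only when P(junk|x)·C(junk→legit) > P(legit|x)·C(legit→junk), which rearranges to P(junk|x) > λ/(1+λ). All names, thresholds, and the default λ = 999 are illustrative assumptions, not the authors' settings.

```python
import math
from collections import Counter


def extract_features(message):
    """Map a message to a set of binary features.

    Word features come from naive whitespace tokenization. The two extra
    features are HYPOTHETICAL stand-ins for domain-specific evidence; their
    names and the 10% threshold are assumptions for illustration.
    """
    feats = {"word:" + w.lower().strip(".,!?") for w in message.split()}
    feats.discard("word:")  # drop tokens that were pure punctuation
    if "!!" in message:
        feats.add("multiple_exclamations")
    non_alnum = sum(not (c.isalnum() or c.isspace()) for c in message)
    if message and non_alnum / len(message) > 0.1:
        feats.add("many_non_alphanumeric")
    return feats


class NaiveBayesFilter:
    """Bernoulli naive Bayes over binary features, with Laplace smoothing."""

    def __init__(self):
        self.class_counts = Counter()
        self.feature_counts = {"junk": Counter(), "legit": Counter()}
        self.vocab = set()

    def fit(self, messages, labels):
        for message, label in zip(messages, labels):
            feats = extract_features(message)
            self.class_counts[label] += 1
            self.feature_counts[label].update(feats)  # count messages containing each feature
            self.vocab |= feats
        return self

    def prob_junk(self, message):
        """Posterior P(junk | features); features unseen in training are ignored."""
        feats = extract_features(message)
        total = sum(self.class_counts.values())
        log_score = {}
        for c in ("junk", "legit"):
            n_c = self.class_counts[c]
            score = math.log((n_c + 1) / (total + 2))  # smoothed class prior
            for f in self.vocab:
                p = (self.feature_counts[c][f] + 1) / (n_c + 2)  # smoothed P(f present | c)
                score += math.log(p if f in feats else 1.0 - p)
            log_score[c] = score
        # Normalize in log space so long messages do not underflow.
        m = max(log_score.values())
        odds = {c: math.exp(s - m) for c, s in log_score.items()}
        return odds["junk"] / (odds["junk"] + odds["legit"])


def junk_threshold(cost_ratio):
    """Expected-cost threshold: label junk iff P(junk|x) > cost_ratio / (1 + cost_ratio),
    where cost_ratio = cost(legit filed as junk) / cost(junk let through)."""
    return cost_ratio / (1.0 + cost_ratio)


def classify(nb, message, cost_ratio=999.0):
    """Cost-sensitive decision; the default ratio of 999 is an assumption."""
    return "junk" if nb.prob_junk(message) > junk_threshold(cost_ratio) else "legit"
```

With λ = 999 the filter demands a posterior above 0.999 before discarding anything, so it errs toward letting junk through rather than losing legitimate mail; setting λ = 1 recovers the ordinary most-probable-class rule.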