Microsoft Research, MIT AI Lab, and University of Washington; Dept. of Computer Science and Engineering
Box 352350; Seattle, WA 98195
Abusive messages (flames) can be both a source of frustration and a waste of time for Internet users. This paper describes some approaches to flame recognition, including a prototype system, Smokey. Smokey builds a 47-element feature vector from the syntax and semantics of each sentence and then combines the vectors for all sentences within a message. Quinlan's C4.5 decision-tree generator was run on a training set of 720 messages to derive feature-based rules. On a separate test set of 460 messages, these rules correctly categorized 64% of the flames and 98% of the non-flames. Additional techniques for greater accuracy and user customization are also discussed.

Introduction

Flames are one of the current hazards of on-line communication. While some people enjoy exchanging flames, most users consider these abusive and insulting messages a nuisance or even upsetting. I describe Smokey, a prototype system to automatically recognize email flames. Smokey combines natural-langu...