An Intelligent Support System for Developing Text Classifiers
2004
Abstract
In this paper we introduce a generic text mining system called iTM. iTM builds models and supports the classification process. It handles different kinds of sources: text files, e-mail/newsgroup messages, and HTML pages. iTM provides rich user feedback by showing the words and phrases of the text that are most significant in decision making. We have implemented a few active learning methods that reduce the number of manually labeled examples required. We demonstrate the usefulness of iTM by using it to label 13,000 HTML pages of the www.cs.vu.nl domain. The full classification process took a few hours. With 550 manually labeled examples, the accuracy reached 74%. Moreover, we ran a number of experiments on the 20-newsgroups dataset. We demonstrate that our system reduces the number of manually labeled documents needed to achieve 80% accuracy by 45%. Furthermore, with only 100 labeled examples, our methods reach 42% accuracy.
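
The abstract does not say which active learning methods or which classifier iTM uses. As a rough illustration of the general idea only, the following sketch shows pool-based active learning with least-confidence sampling on top of a small multinomial Naive Bayes model. All names here (`train_nb`, `predict_proba`, `most_uncertain`, `active_learning`, `oracle`) are hypothetical and are not taken from iTM.

```python
import math
from collections import Counter, defaultdict


def train_nb(labeled):
    """Fit a multinomial Naive Bayes model on (text, label) pairs."""
    word_counts = defaultdict(Counter)  # label -> word frequencies
    class_counts = Counter()            # label -> number of documents
    vocab = set()
    for text, label in labeled:
        words = text.lower().split()
        word_counts[label].update(words)
        class_counts[label] += 1
        vocab.update(words)
    return word_counts, class_counts, vocab


def predict_proba(model, text):
    """Return {label: posterior probability}, using Laplace smoothing."""
    word_counts, class_counts, vocab = model
    total_docs = sum(class_counts.values())
    log_scores = {}
    for label, n_docs in class_counts.items():
        total_words = sum(word_counts[label].values())
        score = math.log(n_docs / total_docs)
        for word in text.lower().split():
            count = word_counts[label][word]
            score += math.log((count + 1) / (total_words + len(vocab)))
        log_scores[label] = score
    # Normalize in a numerically stable way.
    top = max(log_scores.values())
    exps = {label: math.exp(s - top) for label, s in log_scores.items()}
    norm = sum(exps.values())
    return {label: value / norm for label, value in exps.items()}


def most_uncertain(model, pool):
    """Pick the pool document whose top-class probability is lowest."""
    return min(pool, key=lambda doc: max(predict_proba(model, doc).values()))


def active_learning(seed, pool, oracle, budget):
    """Ask the oracle (a human annotator, in practice) for up to `budget` labels."""
    labeled, pool = list(seed), list(pool)
    for _ in range(min(budget, len(pool))):
        model = train_nb(labeled)
        doc = most_uncertain(model, pool)
        pool.remove(doc)
        labeled.append((doc, oracle(doc)))
    return train_nb(labeled), labeled
```

In this sketch the annotator is only asked about the document the current model is least sure of, which is one common way to reduce the number of manual labels. The smoothing scheme, the whitespace tokenizer, and the least-confidence criterion are assumptions chosen for brevity.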

