Boosting and Rocchio Applied to Text Filtering (1998) [83 citations — 2 self]

Abstract:

We discuss two learning algorithms for text filtering: modified Rocchio and a boosting algorithm called AdaBoost. We show how both algorithms can be adapted to maximize any general utility matrix that associates cost (or gain) for each pair of machine prediction and correct label. We first show that AdaBoost significantly outperforms another highly effective text filtering algorithm. We then compare AdaBoost and Rocchio over three large text filtering tasks. Overall both algorithms are comparable and are quite effective. AdaBoost produces better classifiers than Rocchio when the training collection contains a very large number of relevant documents. However, on these tasks, Rocchio runs much faster than AdaBoost.

1 Introduction

It is becoming increasingly hard to cope with the explosion of electronic information that is now available. Information filtering systems that automatically send articles of potential interest are becoming a necessity in this information age. Typically users i...
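The utility matrix mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's adaptation of AdaBoost or Rocchio; it only shows how a utility matrix scores a set of accept/reject decisions, and how a decision threshold on any real-valued document scorer could be picked to maximize that utility on training data. The matrix values, the function names `total_utility` and `best_threshold`, and the sample scores are all assumptions made for illustration.

```python
# Hypothetical utility matrix, indexed as UTILITY[prediction][label].
# prediction: 0 = reject the document, 1 = send it to the user.
# label:      0 = non-relevant,        1 = relevant.
# The values are illustrative assumptions: sending a relevant document
# gains 3, sending a non-relevant one costs 2, rejecting costs nothing.
UTILITY = [[0.0, 0.0],
           [-2.0, 3.0]]


def total_utility(predictions, labels, utility=UTILITY):
    """Sum the utility of each (prediction, correct label) pair."""
    return sum(utility[int(p)][int(y)] for p, y in zip(predictions, labels))


def best_threshold(scores, labels, utility=UTILITY):
    """Pick the score threshold whose accept/reject decisions maximize utility.

    A document is accepted when its score is >= the threshold. Candidate
    thresholds are the observed scores plus +inf (reject everything).
    """
    candidates = sorted(set(scores)) + [float("inf")]
    best_t, best_u = None, float("-inf")
    for t in candidates:
        predictions = [s >= t for s in scores]
        u = total_utility(predictions, labels, utility)
        if u > best_u:
            best_t, best_u = t, u
    return best_t, best_u


# Made-up classifier scores and relevance labels for four training documents.
scores = [0.9, 0.8, 0.4, 0.2]
labels = [1, 0, 1, 0]
threshold, utility_value = best_threshold(scores, labels)
```

Changing the entries of `UTILITY` changes which threshold wins, which is the sense in which a classifier's decisions can be tuned toward an arbitrary cost or gain for each pair of prediction and correct label.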

Citations

2329 Introduction to modern information retrieval – Salton - 1983
1205 A decision-theoretic generalization of on-line learning and an application to boosting – Freund, Schapire - 1997
1053 Text Categorization with Support Vector Machines: Learning with Many Relevant Features – Joachims - 1998
1045 Experiments with a new boosting algorithm – Freund, Schapire - 1996
594 Relevance feedback in information retrieval – Rocchio - 1971
500 Boosting the margin: A new explanation for the effectiveness of voting methods – Schapire, Freund, et al. - 1998
465 Improving retrieval performance by relevance feedback – Salton, Buckley - 1990
346 An evaluation of statistical approaches to text categorization – Yang - 1999
282 A sequential algorithm for training text classifiers – Lewis, Gale - 1994
261 Pivoted document length normalization – Singhal, Buckley, et al. - 1996
222 Bagging, boosting, and C4.5 – Quinlan - 1996
215 Some simple effective approximations to the 2-Poisson model for probabilistic weighted retrieval – Robertson, Walker - 1994
213 A comparison of two learning algorithms for text categorization – Lewis, Ringuette - 1994
209 Training algorithms for linear text classifiers – Lewis, Schapire, et al. - 1996
194 Context-sensitive learning methods for text categorization – Cohen, Singer - 1996
174 Overview of the Third Text REtrieval Conference – Harman - 1995
172 An evaluation of phrasal and clustered representations on a text categorization task – Lewis - 1992
147 Employing EM in pool-based active learning for text classification – McCallum, Nigam - 1998
127 Expert network: Effective and efficient learning from human decisions in text categorization and retrieval – Yang - 1994
109 Generalized vector space model in information retrieval – Wong, Ziarko, et al. - 1985
105 Empirical support for winnow and weighted-majority based algorithms: results on a calendar scheduling domain – Blum - 1997
102 Optimization of relevance feedback weights – Buckley, Salton - 1995
89 Feature selection, perceptron learning, and a usability case study for text categorization – Ng, Goh, et al. - 1997
87 Overview of the sixth text retrieval conference – Voorhees, Harman - 1998
83 Incremental Relevance Feedback for Information Filtering – Allan - 1996
74 Towards language independent automated learning of text categorization models – Apté, Damerau, et al. - 1994
73 Bias, variance, and arcing classifiers – Breiman - 1996
72 Evaluating and optimizing autonomous text classification systems – Lewis - 1995
55 Using and combining predictors that specialize – Freund, Schapire, et al. - 1997
49 Learning routing queries in a query zone – Singhal, Mitra, et al. - 1997
44 Document filtering with inference networks – Callan - 1996
44 Noise reduction in a statistical approach to text categorization – Yang - 1995
40 Method combination for document filtering – Hull, Pedersen, et al. - 1996
38 The TREC-7 filtering track: description and analysis – Hull - 1998
27 The TREC-4 filtering track – Lewis - 1996
23 Automatic Text Processing: The Transformation, Analysis, and Retrieval of Information by Computer – Salton - 1989
21 AT&T at TREC-6 – Singhal - 1998
16 The Importance of Proper Weighting Methods – Buckley - 1993
8 Document Retrieval Systems: Optimization and Evaluation – Rocchio - 1966
3 Improving performance by relevance feedback – Salton - 1990
2 An evaluation of statistical approaches to text categorization – Yang - 1999