A Probabilistic Analysis of the Rocchio Algorithm with TFIDF for Text Categorization (1997)

by Thorsten Joachims
Citations: 285 - 1 self

BibTeX

@INPROCEEDINGS{Joachims97aprobabilistic,
    author = {Thorsten Joachims},
    title = {A Probabilistic Analysis of the Rocchio Algorithm with TFIDF for Text Categorization},
    booktitle = {},
    year = {1997},
    pages = {143--151}
}

Abstract

The Rocchio relevance feedback algorithm is one of the most popular and widely applied learning methods from information retrieval. Here, a probabilistic analysis of this algorithm is presented in a text categorization framework. The analysis gives theoretical insight into the heuristics used in the Rocchio algorithm, particularly the word weighting scheme and the similarity metric. It also suggests improvements which lead to a probabilistic variant of the Rocchio classifier. The Rocchio classifier, its probabilistic variant, and a naive Bayes classifier are compared on six text categorization tasks. The results show that the probabilistic algorithms are preferable to the heuristic Rocchio classifier not only because they are more well-founded, but also because they achieve better performance.
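As a rough illustration of the heuristic classifier the abstract refers to, the following is a minimal sketch of a Rocchio-style classifier with TFIDF word weighting and cosine similarity. It is not the paper's exact formulation: the function names, the `alpha` and `beta` defaults, the particular TF-IDF variant, and the toy documents are all assumptions made for this example, and the probabilistic variant is not shown.

```python
import math
from collections import Counter, defaultdict


def fit_idf(docs):
    """Inverse document frequency log(N / df) for every training word."""
    n = len(docs)
    df = Counter(w for doc in docs for w in set(doc))
    return {w: math.log(n / df[w]) for w in df}


def tfidf(doc, idf):
    """Length-normalized TF-IDF vector as a sparse dict; unseen words are dropped."""
    vec = {w: tf * idf[w] for w, tf in Counter(doc).items() if w in idf}
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {w: v / norm for w, v in vec.items()} if norm > 0 else {}


def train_rocchio(docs, labels, alpha=1.0, beta=0.25):
    """Build one prototype per class: alpha * mean(positives) - beta * mean(negatives).

    alpha and beta are illustrative values (an assumption of this sketch).
    Negative prototype components are clipped to zero.
    """
    idf = fit_idf(docs)
    vectors = [tfidf(d, idf) for d in docs]
    prototypes = {}
    for c in set(labels):
        pos = [v for v, y in zip(vectors, labels) if y == c]
        neg = [v for v, y in zip(vectors, labels) if y != c]
        proto = defaultdict(float)
        for v in pos:
            for w, x in v.items():
                proto[w] += alpha * x / len(pos)
        for v in neg:
            for w, x in v.items():
                proto[w] -= beta * x / len(neg)
        prototypes[c] = {w: x for w, x in proto.items() if x > 0}
    return idf, prototypes


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(x * b.get(w, 0.0) for w, x in a.items())
    na = math.sqrt(sum(x * x for x in a.values()))
    nb = math.sqrt(sum(x * x for x in b.values()))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def classify(doc, idf, prototypes):
    """Assign the class whose prototype is most similar to the document."""
    vec = tfidf(doc, idf)
    # sorted() makes tie-breaking deterministic.
    return max(sorted(prototypes), key=lambda c: cosine(prototypes[c], vec))


# Hypothetical toy data, already tokenized.
DOCS = [
    ["ball", "goal", "team"],
    ["team", "match", "goal"],
    ["vote", "election", "party"],
    ["party", "vote", "law"],
]
LABELS = ["sport", "sport", "politics", "politics"]

idf, prototypes = train_rocchio(DOCS, LABELS)
print(classify(["goal", "match"], idf, prototypes))
print(classify(["vote", "law"], idf, prototypes))
```

The word weighting (`tfidf`) and the similarity metric (`cosine`) are kept as separate functions because these are the two heuristics the abstract singles out for analysis.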

Citations

4071 C4.5: Programs for machine learning - Quinlan - 1993
1216 Term-weighting approaches in automatic text retrieval - Salton, Buckley - 1988
728 Relevance feedback in information retrieval - Rocchio - 1971
688 Estimation of dependencies based on empirical data - Vapnik - 1982
353 Newsweeder: Learning to filter netnews - Lang - 1995
290 WebWatcher: A Learning Apprentice for the World Wide Web. AAAI Spring Symposium on Information Gathering from Heterogeneous, Distributed Environments - Armstrong, Freitag, et al. - 1995
207 Automated learning of decision rules for text categorization - Apte, Damerau, et al. - 1994
186 An evaluation of phrasal and clustered representations on a text categorization task - Lewis - 1992
172 Greedy attribute selection - Caruana, Freitag
147 A Comparison of Classifiers and Document Representations for the Routing Problem - Schutze, Hull, et al. - 1995
142 The Effect of Adding Relevance Information in a Relevance Feedback Environment - Buckley, Salton, et al. - 1994
141 Representation and Learning in Information Retrieval - Lewis - 1992
116 Learning information retrieval agents: Experiments with automated web browsing. On-line Working - Balabanovic, Shoham
84 Classification algorithms - James - 1985
78 Models for retrieval with probabilistic indexing - Fuhr - 1989
32 Introduction to Probability Theory and Statistical Inference - Larson - 1982
21 A news story categorization system - Hayes, Knecht, et al. - 1988
11 Developments in Automatic Text - Salton - 1991
10 A Comparison of Search Term Weighting: Term Relevance vs. Inverse Document Frequency - Wu, Salton - 1981
4 A note on inverse document frequency weighting scheme - Wong, Yao - 1989
2 Explanation and Generalization of Vector Models - Bookstein - 1982
1 Experiments in Learning from Text, to be published as a technical report - Freitag, Hirsh, et al. - 1996
1 An Analysis of Vector Space Models Based on - Wang, Wong, et al. - 1992