MetaCart Sign in to MyCiteSeerX

Include Citations | Advanced Search | Help

Disambiguated Search | Include Citations | Advanced Search | Help

A Probabilistic Analysis of the Rocchio Algorithm with TFIDF for Text Categorization (1997) [250 citations — 2 self]

Abstract:

The Rocchio relevance feedback algorithm is one of the most popular and widely applied learning methods from information retrieval. Here, a probabilistic analysis of this algorithm is presented in a text categorization framework. The analysis gives theoretical insight into the heuristics used in the Rocchio algorithm, particularly the word weighting scheme and the similarity metric. It also suggests improvements which lead to a probabilistic variant of the Rocchio classifier. The Rocchio classifier, its probabilistic variant, and a naive Bayes classifier are compared on six text categorization tasks. The results show that the probabilistic algorithms are preferable to the heuristic Rocchio classifier not only because they are more well-founded, but also because they achieve better performance.

Citations

No citations identified.