Pivoted Document Length Normalization (1996) [261 citations — 17 self]

by Amit Singhal, Chris Buckley, Mandar Mitra

Abstract:

Automatic information retrieval systems have to deal with documents of varying lengths in a text collection. Document length normalization is used to fairly retrieve documents of all lengths. In this study, we observe that a normalization scheme that retrieves documents of all lengths with similar chances as their likelihood of relevance will outperform another scheme which retrieves documents with chances very different from their likelihood of relevance. We show that the retrieval probabilities for a particular normalization method deviate systematically from the relevance probabilities across different collections. We present pivoted normalization, a technique that can be used to modify any normalization function thereby reducing the gap between the relevance and the retrieval probabilities. Training pivoted normalization on one collection, we can successfully use it on other (new) text collections, yielding a robust, collection independent normalization technique. We use the idea o...
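The abstract describes pivoted normalization only in words. The following is a minimal sketch, assuming the linear form usually attributed to the full paper, (1 - slope) * pivot + slope * old_norm, with the pivot set to the collection's average old normalization factor. The function names, the slope value, and the sample norms are hypothetical and do not come from this page.

```python
from typing import List, Optional


def pivoted_norm(old_norm: float, pivot: float, slope: float) -> float:
    """Tilt an existing normalization factor around a pivot point.

    Assumed form: (1 - slope) * pivot + slope * old_norm.
    With slope < 1, documents whose old factor is below the pivot get a
    larger factor (retrieved less readily), and documents above the pivot
    get a smaller one (retrieved more readily).
    """
    return (1.0 - slope) * pivot + slope * old_norm


def pivoted_norms(old_norms: List[float], slope: float,
                  pivot: Optional[float] = None) -> List[float]:
    """Apply pivoted normalization to every document's old factor.

    If no pivot is given, the average old factor is used (an assumption
    made for this sketch).
    """
    if not old_norms:
        return []
    if pivot is None:
        pivot = sum(old_norms) / len(old_norms)
    return [pivoted_norm(n, pivot, slope) for n in old_norms]


if __name__ == "__main__":
    # Hypothetical old normalization factors for a short, an average,
    # and a long document.
    old = [10.0, 20.0, 30.0]
    print(pivoted_norms(old, slope=0.25))
```

In this sketch the slope would be the single parameter tuned on a training collection; a slope of 1.0 leaves the old normalization unchanged.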

Citations

2329 Introduction to modern information retrieval – Salton - 1983
915 Term-weighting approaches in automatic text retrieval – Salton, Buckley - 1988
260 Okapi at TREC-3 – Robertson, Walker, et al. - 1992
215 Some simple effective approximations to 2-Poisson method for probabilistic weighted retrieval – Robertson, Walker - 1994
189 Inference networks for document retrieval – Turtle, Croft - 1990
174 Overview of the Third Text REtrieval Conference – Harman - 1995
109 Generalized vector space model in information retrieval – Wong, Ziarko, et al. - 1985
40 Document retrieval and routing using the INQUERY system – Broglio, Callan, et al. - 1995
23 Automatic Text Processing: The Transformation, Analysis, and Retrieval of Information by Computer – Salton - 1989
22 Automatic query expansion using SMART: TREC 3 – Buckley, Salton, et al. - 1995
16 The Importance of Proper Weighting Methods – Buckley - 1993
13 Length Normalization in Degraded Text Collections – Singhal, Salton, et al. - 1995
3 Document Length Normalization. Information Processing and Management (to appear) – Singhal, Salton, et al. - 1995