A Study of Smoothing Methods for Language Models Applied to Ad Hoc Information Retrieval (2001)

by Chengxiang Zhai, John Lafferty
Citations: 498 - 33 self

BibTeX

@INPROCEEDINGS{Zhai01astudy,
    author = {Chengxiang Zhai and John Lafferty},
    title = {A Study of Smoothing Methods for Language Models Applied to Ad Hoc Information Retrieval},
    booktitle = {},
    year = {2001},
    pages = {334--342}
}

Abstract

Language modeling approaches to information retrieval are attractive and promising because they connect the problem of retrieval with that of language model estimation, which has been studied extensively in other application areas such as speech recognition. The basic idea of these approaches is to estimate a language model for each document, and to then rank documents by the likelihood of the query according to the estimated language model. A central issue in language model estimation is smoothing, the problem of adjusting the maximum likelihood estimator to compensate for data sparseness. In this article, we study the problem of language model smoothing and its influence on retrieval performance. We examine the sensitivity of retrieval performance to the smoothing parameters and compare several popular smoothing methods on different test collections. Experimental results show that not only is the retrieval performance generally sensitive to the smoothing parameters, but also the sensitivity pattern is affected by the query type, with performance being more sensitive to smoothing for verbose queries than for keyword queries. Verbose queries also generally require more aggressive smoothing to achieve optimal performance. This suggests that smoothing plays two different roles: to make the estimated document language model more accurate and to “explain” the noninformative words in the query. In order to decouple these two distinct roles of
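
The query-likelihood idea described in the abstract can be illustrated with a minimal sketch. The abstract does not name the smoothing methods it compares, so the two shown here (Jelinek-Mercer interpolation and Dirichlet prior smoothing) are chosen only as common examples. The function names, the whitespace tokenizer, and the parameter defaults (`lam=0.5`, `mu=2000.0`) are all assumptions for illustration, not values or code from the paper.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: naive lowercase whitespace tokenization, for illustration only.
    return text.lower().split()


def collection_model(docs):
    # Maximum likelihood unigram model over the whole collection.
    counts = Counter()
    for doc in docs:
        counts.update(tokenize(doc))
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def jelinek_mercer(word, doc_counts, doc_len, p_coll, lam=0.5):
    # Linear interpolation of the document ML estimate with the collection model.
    # lam is a hypothetical default; larger lam means more smoothing.
    p_ml = doc_counts[word] / doc_len if doc_len else 0.0
    return (1 - lam) * p_ml + lam * p_coll.get(word, 0.0)


def dirichlet(word, doc_counts, doc_len, p_coll, mu=2000.0):
    # Dirichlet prior smoothing: the collection model acts as mu pseudo-counts.
    # mu is a hypothetical default; larger mu means more smoothing.
    return (doc_counts[word] + mu * p_coll.get(word, 0.0)) / (doc_len + mu)


def query_log_likelihood(query, doc, p_coll, smooth):
    # log p(query | document model) under a unigram model.
    doc_counts = Counter(tokenize(doc))
    doc_len = sum(doc_counts.values())
    score = 0.0
    for w in tokenize(query):
        p = smooth(w, doc_counts, doc_len, p_coll)
        if p == 0.0:
            # Word never seen anywhere in the collection.
            return float("-inf")
        score += math.log(p)
    return score


def rank(query, docs, smooth=dirichlet):
    # Return document indices ordered by descending query log-likelihood.
    p_coll = collection_model(docs)
    scored = [(query_log_likelihood(query, d, p_coll, smooth), i)
              for i, d in enumerate(docs)]
    return [i for _, i in sorted(scored, key=lambda t: t[0], reverse=True)]
```

Without smoothing, any document missing a single query word would receive zero likelihood; both functions above avoid that by borrowing probability mass from the collection model. Calling `rank("cat mat", docs, smooth=jelinek_mercer)` with different `lam` values is one way to observe the kind of parameter sensitivity the abstract refers to.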
