## A Study of Smoothing Methods for Language Models Applied to Information Retrieval (2001)

### Download Links

- [www-2.cs.cmu.edu]
- [www.cs.cmu.edu]
- [www.iro.umontreal.ca]
- [sifaka.cs.uiuc.edu]
- [www.aladdin.cs.cmu.edu]
- [www-poleia.lip6.fr]
- [www-connex.lip6.fr]
- [hachita.nmsu.edu]
- DBLP

### Other Repositories/Bibliography

Citations: 753 (39 self)

### BibTeX

@MISC{Zhai01astudy,
  author = {Chengxiang Zhai and John Lafferty},
  title = {A Study of Smoothing Methods for Language Models Applied to Information Retrieval},
  year = {2001}
}

### Abstract

In this paper we study the problem of language model smoothing and its influence on retrieval performance. We examine the sensitivity of retrieval performance to the smoothing parameters and compare several popular smoothing methods on different test collections. Experimental results show that not only is retrieval performance generally sensitive to the smoothing parameters, but also that the sensitivity pattern is affected by the query type, with performance being more sensitive to smoothing for verbose queries than for keyword queries. Verbose queries also generally require more aggressive smoothing to achieve optimal performance. This suggests that smoothing plays two different roles: to make the estimated document language model more accurate, and to "explain" the non-informative words in the query. In order to decouple these two distinct roles of smoothing, we propose a two-stage smoothing strategy, which yields better sensitivity patterns and facilitates setting the smoothing parameters automatically. We further propose methods for estimating the smoothing parameters automatically. Evaluation on five different databases and four types of queries indicates that the two-stage smoothing method with the proposed parameter estimation methods consistently gives retrieval performance that is close to, or better than, the best results achieved using a single smoothing method and exhaustive parameter search on the test data.
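
The two-stage strategy described in the abstract can be sketched as a query-likelihood scorer: Dirichlet-prior smoothing first (to estimate an accurate document model), then Jelinek-Mercer interpolation with the background model (to explain non-informative query words). This is a minimal sketch, not the paper's implementation; the function name and the defaults `mu=2000.0` and `lam=0.5` are illustrative, not the paper's estimated parameter values.

```python
import math
from collections import Counter

def two_stage_score(query, doc, collection, mu=2000.0, lam=0.5):
    """Rank score: log-likelihood of the query under a two-stage
    smoothed document language model (illustrative sketch)."""
    doc_counts = Counter(doc)
    coll_counts = Counter(collection)
    coll_len = len(collection)
    score = 0.0
    for w in query:
        p_bg = coll_counts[w] / coll_len  # background model p(w|C)
        # Stage 1: Dirichlet-prior smoothing -> more accurate document model.
        p_dir = (doc_counts[w] + mu * p_bg) / (len(doc) + mu)
        # Stage 2: Jelinek-Mercer interpolation with the background model
        # -> "explains" common, non-informative query words.
        score += math.log((1 - lam) * p_dir + lam * p_bg)
    return score
```

Documents are then ranked by this score for a given query; a document containing the query's content words scores higher than one that does not.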

### Citations

1673 | Term weighting approaches in automatic text retrieval
- Salton, Buckley
- 1988
(Show Context)
Citation Context ...s kinds of logic models and probabilistic models (e.g., [14, 3, 15, 22]). On the other hand, there have been many empirical studies of models, including many variants of the vector space model (e.g., [17, 18, 19]). In some cases, there have been theoretically motivated models that also perform well empirically; for example, the BM25 retrieval function, motivated by the 2-Poisson probabilistic retrieval model,... |

945 | A language modeling approach to information retrieval
- PONTE, B
- 1998
(Show Context)
Citation Context ...s how well the document "fits" the particular query q. In the simplest case, p(d) is assumed to be uniform, and so does not affect document ranking. This assumption has been taken in most existing work [1, 13, 12, 5, 20]. In other cases, p(d) can be used to capture non-textual information, e.g., the length of a document or links in a web page, as well as other format/style features of a document. In our study, we ass... |

939 | A vector space model for automatic indexing - Salton, Wong, et al. - 1975 |

927 | An empirical study of smoothing techniques for language modeling
- Chen, Goodman
- 1998
(Show Context)
Citation Context ...ity to the unseen words and improve the accuracy of word probability estimation in general. There are many smoothing methods that have been proposed, mostly in the context of speech recognition tasks [2]. In general, all smoothing methods are trying to discount the probabilities of the words seen in the text, and to then assign the extra probability mass to the unseen words according to some "fallbac... |
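
The discount-and-reassign idea described in this context can be illustrated with Jelinek-Mercer interpolation, one of the three methods the paper evaluates. This is a minimal sketch; the function name and `lam=0.7` are illustrative choices, and the background distribution stands in for the "fallback" model p(w|C).

```python
from collections import Counter

def jelinek_mercer(doc, collection_model, lam=0.7):
    """Smoothed document model p(w|d) = (1 - lam) * p_ml(w|d) + lam * p(w|C):
    each seen word's maximum-likelihood estimate is discounted by a factor
    (1 - lam), and the freed mass goes to the background (fallback) model,
    so unseen words receive nonzero probability."""
    counts = Counter(doc)
    n = len(doc)
    return {w: (1 - lam) * counts[w] / n + lam * p_bg
            for w, p_bg in collection_model.items()}
```

Because both component distributions sum to one, the interpolated model is itself a valid probability distribution over the vocabulary.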

700 | Estimation of Probabilities from Sparse Data for Language Model Component of a Speech Recognizer
- Katz
- 1987
(Show Context)
Citation Context ...ained by the efficiency of the smoothing method. We selected three representative methods that are popular and relatively efficient to implement. We excluded some well-known methods, such as Katz smoothing [7] and Good-Turing estimation [4], because of the efficiency constraint. Although the methods we evaluated are simple, the issues that they bring to light are relevant to more advanced methods. The thre... |

650 | Improving retrieval performance by relevance feedback
- Salton, Buckley
- 1990
(Show Context)
Citation Context ...s kinds of logic models and probabilistic models (e.g., [14, 3, 15, 22]). On the other hand, there have been many empirical studies of models, including many variants of the vector space model (e.g., [17, 18, 19]). In some cases, there have been theoretically motivated models that also perform well empirically; for example, the BM25 retrieval function, motivated by the 2-Poisson probabilistic retrieval model,... |

492 | Okapi at TREC-3 - ROBERTSON, WALKER, et al. - 1995 |

399 |
The population frequencies of species and the estimation of population parameters. Biometrika 40:237–264
- Good
- 1953
(Show Context)
Citation Context ...othing method. We selected three representative methods that are popular and relatively efficient to implement. We excluded some well-known methods, such as Katz smoothing [7] and Good-Turing estimation [4], because of the efficiency constraint. Although the methods we evaluated are simple, the issues that they bring to light are relevant to more advanced methods. The three methods are described below. ... |

386 | Pivoted document length normalization
- Singhal, Buckley, et al.
- 1996
(Show Context)
Citation Context ...s kinds of logic models and probabilistic models (e.g., [14, 3, 15, 22]). On the other hand, there have been many empirical studies of models, including many variants of the vector space model (e.g., [17, 18, 19]). In some cases, there have been theoretically motivated models that also perform well empirically; for example, the BM25 retrieval function, motivated by the 2-Poisson probabilistic retrieval model,... |

352 | Interpolated estimation of markov source parameters from sparse data - JELINEK, MERCER - 1980 |

339 | Relevance-based language models - Lavrenko, Croft - 2001 |

321 | Document language models, query models, and risk minimization for information retrieval - Lafferty, Zhai - 2001 |

299 | Improved backing-off for m-gram language modeling - Kneser, Ney - 1995 |

283 | Information retrieval as statistical translation
- Berger, Lafferty
- 1999
(Show Context)
Citation Context ...retrieval [13, 1, 10, 5]. The basic idea behind the new approach is extremely simple: estimate a language model for each document, and rank documents by the likelihood of the query according to the language model. Yet this ne... |

203 |
A hidden Markov model information retrieval system
- Miller, Leek, et al.
- 1999
(Show Context)
Citation Context ...retrieval [13, 1, 10, 5]. The basic idea behind the new approach is extremely simple: estimate a language model for each document, and rank documents by the likelihood of the query according to the language model. Yet this ne... |

201 | A general language model for information retrieval
- Song, Croft
- 1999
(Show Context)
Citation Context ...s how well the document "fits" the particular query q. In the simplest case, p(d) is assumed to be uniform, and so does not affect document ranking. This assumption has been taken in most existing work [1, 13, 12, 5, 20]. In other cases, p(d) can be used to capture non-textual information, e.g., the length of a document or links in a web page, as well as other format/style features of a document. In our study, we ass... |

184 |
On structuring probabilistic dependencies in stochastic language modeling
- Ney, Essen, et al.
- 1994
(Show Context)
Citation Context ...ace method is a special case of this technique. Absolute discounting. The idea of the absolute discounting method is to lower the probability of seen words by subtracting a constant from their counts [11]. It is similar to the Jelinek-Mercer method, but differs in that it discounts the seen word probability by subtracting a constant instead of multiplying it by (1 − λ). The model is given by p_s(w | d) = ... |
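
The absolute discounting model described in this context can be sketched as follows: subtract a constant δ from each seen word's count, and give the freed mass, δ·|d|_u/|d| for |d|_u unique terms, to the background model. A minimal illustration; the function name and `delta=0.7` are assumptions, and the background distribution stands in for p(w|C).

```python
from collections import Counter

def absolute_discount(doc, collection_model, delta=0.7):
    """Smoothed model: p_s(w|d) = max(c(w,d) - delta, 0)/|d|
    + (delta * |d|_u / |d|) * p(w|C).  Unlike Jelinek-Mercer, the
    discount is a subtracted constant, not a multiplicative factor."""
    counts = Counter(doc)
    n = len(doc)
    sigma = delta * len(counts) / n  # total probability mass freed
    return {w: max(counts[w] - delta, 0.0) / n + sigma * p_bg
            for w, p_bg in collection_model.items()}
```

For 0 < δ < 1 every seen count (≥ 1) stays nonnegative after discounting, and the freed mass exactly equals σ, so the result is again a proper distribution.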

183 | A non-classical logic for Information Retrieval - Rijsbergen - 1986 |

135 | The Importance of Prior Probabilities for Entry Page Search - Kraaij, Westerveld, et al. |

112 | Probabilistic models in information retrieval
- Fuhr
- 1992
(Show Context)
Citation Context ...ways. On the one hand, theoretical studies of an underlying model have been developed; this direction is, for example, represented by the various kinds of logic models and probabilistic models (e.g., [14, 3, 15, 22]). On the other hand, there have been many empirical studies of models, including many variants of the vector space model (e.g., [17, 18, 19]). In some cases, there have been theoretically motivated m... |

109 | Twenty-one at TREC-7: Ad-hoc and cross-language track
- Hiemstra, Kraaij
- 1998
(Show Context)
Citation Context ...retrieval [13, 1, 10, 5]. The basic idea behind the new approach is extremely simple: estimate a language model for each document, and rank documents by the likelihood of the query according to the language model. Yet this ne... |

99 |
On Modeling Information Retrieval with Probabilistic Inference
- Wong, Yao
- 1995
(Show Context)
Citation Context ...ways. On the one hand, theoretical studies of an underlying model have been developed; this direction is, for example, represented by the various kinds of logic models and probabilistic models (e.g., [14, 3, 15, 22]). On the other hand, there have been many empirical studies of models, including many variants of the vector space model (e.g., [17, 18, 19]). In some cases, there have been theoretically motivated m... |

83 | A Hierarchical Dirichlet Language Model - MacKay, Peto - 1994 |

63 | Model-based feedback in the KL-divergence retrieval model - Zhai, Lafferty - 2001 |

53 |
Probabilistic models of indexing and searching
- Robertson, Rijsbergen, et al.
- 1981
(Show Context)
Citation Context ...ways. On the one hand, theoretical studies of an underlying model have been developed; this direction is, for example, represented by the various kinds of logic models and probabilistic models (e.g., [14, 3, 15, 22]). On the other hand, there have been many empirical studies of models, including many variants of the vector space model (e.g., [17, 18, 19]). In some cases, there have been theoretically motivated m... |

53 | On the estimation of ‘small’ probabilities by leaving-one-out - Ney, Essen, et al. - 1995 |

42 | Improving two-stage ad-hoc retrieval for short queries - Kwok, Chan |

13 | van Rijsbergen - 1979 |

8 |
Improved smoothing for m-gram language modeling
- Kneser, Ney
- 1995
(Show Context)
Citation Context ...mplement the role of query modeling. Finally, there are many other effective smoothing algorithms that we have not yet tested (e.g., Good-Turing smoothing [4], Katz smoothing [7], Kneser-Ney smoothing [8]); evaluation of them would be a natural further research direction. It is also very important to study how to exploit the past relevance judgments, the current query, and the current database to trai... |

2 |
Okapi at TREC-3, The Third Text REtrieval Conference
- Robertson, Walker, et al.
- 1995
(Show Context)
Citation Context ...ly motivated models that also perform well empirically; for example, the BM25 retrieval function, motivated by the 2-Poisson probabilistic retrieval model, has proven to be quite effective in practice [16]. Recently, a new approach based on language modeling has been successfully applied to the problem of ad hoc retrieval... |
