## A Study of Smoothing Methods for Language Models Applied to Ad Hoc Information Retrieval (2001)

Citations: 757 (39 self)

### BibTeX

@INPROCEEDINGS{Zhai01astudy,

author = {Chengxiang Zhai and John Lafferty},

title = {A Study of Smoothing Methods for Language Models Applied to Ad Hoc Information Retrieval},

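booktitle = {Proceedings of the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval},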

year = {2001},

pages = {334--342}

}

### Citations

1713 | Term Weighting Approaches in Automatic Text Retrieval - Salton, Buckley - 1987
Citation Context ...s kinds of logic models and probabilistic models (e.g., [14, 3, 15, 22]). On the other hand, there have been many empirical studies of models, including many variants of the vector space model (e.g., [17, 18, 19]). In some cases, there have been theoretically motivated models that also perform well empirically; for example, the BM25 retrieval function, motivated by the 2-Poisson probabilistic retrieval model,... |

971 | A vector space model for automatic indexing - Salton, Wong, et al. - 1975 |

956 | A language modeling approach to information retrieval - Ponte, Croft - 1998
Citation Context ...s how well the document "fits" the particular query q. In the simplest case, p(d) is assumed to be uniform, and so does not affect document ranking. This assumption has been taken in most existing work [1, 13, 12, 5, 20]. In other cases, p(d) can be used to capture non-textual information, e.g., the length of a document or links in a web page, as well as other format/style features of a document. In our study, we ass... |

949 | An empirical study of smoothing techniques for language modeling - Chen, Goodman - 1998
Citation Context ...ity to the unseen words and improve the accuracy of word probability estimation in general. There are many smoothing methods that have been proposed, mostly in the context of speech recognition tasks [2]. In general, all smoothing methods are trying to discount the probabilities of the words seen in the text, and to then assign the extra probability mass to the unseen words according to some "fallbac... |

706 | Estimation of probabilities from sparse data for the language model component of a speech recognizer - Katz - 1987
Citation Context ...ained by the efficiency of the smoothing method. We selected three representative methods that are popular and relatively efficient to implement. We excluded some well-known methods, such as Katz smoothing [7] and Good-Turing estimation [4], because of the efficiency constraint. Although the methods we evaluated are simple, the issues that they bring to light are relevant to more advanced methods. The thre... |

663 | Improving retrieval performance by relevance feedback - Salton, Buckley - 1990
Citation Context ...s kinds of logic models and probabilistic models (e.g., [14, 3, 15, 22]). On the other hand, there have been many empirical studies of models, including many variants of the vector space model (e.g., [17, 18, 19]). In some cases, there have been theoretically motivated models that also perform well empirically; for example, the BM25 retrieval function, motivated by the 2-Poisson probabilistic retrieval model,... |

505 | Okapi at TREC-3 - Robertson, Walker, et al. - 1994 |

415 | The population frequency of species and the estimation of population parameters - Good - 1953
Citation Context ...othing method. We selected three representative methods that are popular and relatively efficient to implement. We excluded some well-known methods, such as Katz smoothing [7] and Good-Turing estimation [4], because of the efficiency constraint. Although the methods we evaluated are simple, the issues that they bring to light are relevant to more advanced methods. The three methods are described below. ... |

392 | Document length normalization - Singhal, Salton, et al. - 1996
Citation Context ...s kinds of logic models and probabilistic models (e.g., [14, 3, 15, 22]). On the other hand, there have been many empirical studies of models, including many variants of the vector space model (e.g., [17, 18, 19]). In some cases, there have been theoretically motivated models that also perform well empirically; for example, the BM25 retrieval function, motivated by the 2-Poisson probabilistic retrieval model,... |

359 | Interpolated estimation of Markov source parameters from sparse data - Jelinek, Mercer - 1980 |

344 | Relevance-based Language Models - Lavrenko, Croft - 2001 |

324 | Document language models, query models, and risk minimization for information retrieval - Lafferty, Zhai - 2001 |

309 | Improved backing-off for m-gram language modeling - Kneser, Ney - 1995 |

285 | Information retrieval as statistical translation - Berger, Lafferty - 1999
Citation Context ...rs or to redistribute to lists, requires prior specific permission and/or a fee. SIGIR'01, September 9-12, 2001, New Orleans, Louisiana, USA. Copyright 2001 ACM 1-58113-331-6/01/0009 ...$5.00. ...trieval [13, 1, 10, 5]. The basic idea behind the new approach is extremely simple: estimate a language model for each document, and rank documents by the likelihood of the query according to the language model. Yet this ne... |

206 | A General Language Model for Information Retrieval - Song, Croft - 1999
Citation Context ...s how well the document "fits" the particular query q. In the simplest case, p(d) is assumed to be uniform, and so does not affect document ranking. This assumption has been taken in most existing work [1, 13, 12, 5, 20]. In other cases, p(d) can be used to capture non-textual information, e.g., the length of a document or links in a web page, as well as other format/style features of a document. In our study, we ass... |

204 | A hidden Markov model information retrieval system - Miller, Leek, et al. - 1999
Citation Context ...trieval [13, 1, 10, 5]. The basic idea behind the new approach is extremely simple: estimate a language model for each document, and rank documents by the likelihood of the query according to the language model. Yet this ne... |

185 | On structuring probabilistic dependences in stochastic language modelling - Ney, Essen, et al. - 1994
Citation Context ...ace method is a special case of this technique. Absolute discounting. The idea of the absolute discounting method is to lower the probability of seen words by subtracting a constant from their counts [11]. It is similar to the Jelinek-Mercer method, but differs in that it discounts the seen word probability by subtracting a constant instead of multiplying it by (1-λ). The model is given by p_s(w | d) = ... |

183 | A non-classical logic for Information Retrieval - Rijsbergen - 1986 |

136 | The importance of prior probabilities for entry page search - Kraaij, Westerveld, et al. - 2002 |

114 | Probabilistic models in information retrieval - Fuhr - 1992
Citation Context ...ways. On the one hand, theoretical studies of an underlying model have been developed; this direction is, for example, represented by the various kinds of logic models and probabilistic models (e.g., [14, 3, 15, 22]). On the other hand, there have been many empirical studies of models, including many variants of the vector space model (e.g., [17, 18, 19]). In some cases, there have been theoretically motivated m... |

110 | Twenty-One at TREC-7: ad-hoc and cross-language track - Hiemstra, Kraaij - 1999
Citation Context ...trieval [13, 1, 10, 5]. The basic idea behind the new approach is extremely simple: estimate a language model for each document, and rank documents by the likelihood of the query according to the language model. Yet this ne... |

100 | On modeling information retrieval with probabilistic inference - Wong, Yao - 1995
Citation Context ...ways. On the one hand, theoretical studies of an underlying model have been developed; this direction is, for example, represented by the various kinds of logic models and probabilistic models (e.g., [14, 3, 15, 22]). On the other hand, there have been many empirical studies of models, including many variants of the vector space model (e.g., [17, 18, 19]). In some cases, there have been theoretically motivated m... |

86 | A hierarchical dirichlet language model - MacKay, Peto - 1995 |

63 | Model-based feedback in the KL-divergence retrieval model - Zhai, Lafferty - 2001 |

54 | Probabilistic models of indexing and searching - Robertson, Rijsbergen, et al. - 1981
Citation Context ...ways. On the one hand, theoretical studies of an underlying model have been developed; this direction is, for example, represented by the various kinds of logic models and probabilistic models (e.g., [14, 3, 15, 22]). On the other hand, there have been many empirical studies of models, including many variants of the vector space model (e.g., [17, 18, 19]). In some cases, there have been theoretically motivated m... |

53 | On the estimation of ‘small’ probabilities by leaving-one-out - Ney, Essen, et al. - 1995 |

42 | Improving Two-Stage Ad-Hoc Retrieval for Short Queries - Kwok, Chan - 1998 |

13 | van Rijsbergen - 1979 |

8 | Improved smoothing for m-gram language modeling - Kneser, Ney - 1995
Citation Context ...mplement the role of query modeling. Finally, there are many other effective smoothing algorithms that we have not yet tested (e.g., Good-Turing smoothing [4], Katz smoothing [7], Kneser-Ney smoothing [8]); evaluation of them would be a natural further research direction. It is also very important to study how to exploit the past relevance judgments, the current query, and the current database to trai... |

8 | Interpolated estimation of Markov source parameters from sparse data - Jelinek, Mercer - 1980 |

3 | A hierarchical Dirichlet language - MacKay, L - 1995 |

2 | Okapi at TREC-3, The Third Text REtrieval - Robertson, Walker, et al. - 1995
Citation Context ...ly motivated models that also perform well empirically; for example, the BM25 retrieval function, motivated by the 2-Poisson probabilistic retrieval model, has proven to be quite effective in practice [16]. Recently, a new approach based on language modeling has been successfully applied to the problem of ad hoc re... Permission to make digital or hard copies of all or part of this work for personal or cla... |

1 | A Study of Smoothing Methods for Language Models 33 - LAVRENKO, CROFT - 2001 |

1 | Okapi at TREC-3, The Third Text REtrieval - Robertson, Walker, et al. - 1995
Citation Context ...y motivated models that also perform well empirically; for example, the BM25 retrieval function, motivated by the 2-Poisson probabilistic retrieval model, has proven to be quite effective in practice [16]. Recently, a new approach based on language modeling has been successfully applied to the problem of ad hoc re... |
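
The citation contexts above describe ranking documents by the likelihood of the query under a smoothed document language model, and they contrast Jelinek-Mercer interpolation with absolute discounting. The following is a minimal Python sketch of that idea, assuming the commonly cited forms of the two estimators (the excerpts truncate the exact formula, so the expressions in the docstrings are an assumption). The function names, the toy corpus, and the parameter values `lam=0.5` and `delta=0.7` are all hypothetical and are not taken from the paper's experiments. The sketch also assumes every query word occurs somewhere in the collection.

```python
import math
from collections import Counter


def collection_model(docs):
    """Maximum-likelihood unigram model p(w|C) over all documents."""
    counts = Counter(w for doc in docs for w in doc)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def jelinek_mercer(w, doc_counts, doc_len, p_coll, lam=0.5):
    """Assumed form: (1 - lam) * c(w;d)/|d| + lam * p(w|C)."""
    return (1 - lam) * doc_counts[w] / doc_len + lam * p_coll.get(w, 0.0)


def absolute_discount(w, doc_counts, doc_len, p_coll, delta=0.7):
    """Assumed form: max(c(w;d) - delta, 0)/|d| + sigma * p(w|C),
    where sigma = delta * (number of unique words in d) / |d|."""
    sigma = delta * len(doc_counts) / doc_len
    return max(doc_counts[w] - delta, 0.0) / doc_len + sigma * p_coll.get(w, 0.0)


def query_log_likelihood(query, doc, p_coll, smooth):
    """Sum of log p(w|d) over query words, using the given smoothing function."""
    counts = Counter(doc)
    return sum(math.log(smooth(w, counts, len(doc), p_coll)) for w in query)


if __name__ == "__main__":
    # Toy corpus (illustrative only): each document is a list of tokens.
    docs = [["a", "b", "a"], ["b", "c"]]
    p_coll = collection_model(docs)
    query = ["a", "b"]
    for name, smooth in [("jelinek-mercer", jelinek_mercer),
                         ("absolute-discount", absolute_discount)]:
        scores = [query_log_likelihood(query, d, p_coll, smooth) for d in docs]
        ranking = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
        print(name, ranking)
```

In both estimators the probability mass removed from seen words is exactly the mass given to the collection model, so each smoothed distribution still sums to one over the collection vocabulary. The difference is how the mass is removed: Jelinek-Mercer scales seen-word probabilities by (1 - lam), while absolute discounting subtracts a constant from each seen count.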