## A Language Modeling Approach to Information Retrieval (1998)

Citations: 945 (38 self-citations)

### BibTeX

@INPROCEEDINGS{Ponte98alanguage,

author = {Jay M. Ponte and W. Bruce Croft},

title = {A Language Modeling Approach to Information Retrieval},

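booktitle = {Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval},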

year = {1998},

pages = {275--281},

publisher = {ACM}

}

### Abstract

Models of document indexing and document retrieval have been extensively studied. The integration of these two classes of models has been the goal of several researchers but it is a very difficult problem. We argue that much of the reason for this is the lack of an adequate indexing model. This suggests that perhaps a better indexing model would help solve the problem. However, we feel that making unwarranted parametric assumptions will not lead to better retrieval performance. Furthermore, making prior assumptions about the similarity of documents is not warranted either. Instead, we propose an approach to retrieval based on probabilistic language modeling. We estimate models for each document individually. Our approach to modeling is non-parametric and integrates document indexing and document retrieval into a single model. One advantage of our approach is that collection statistics which are used heuristically in many other retrieval models are an integral part of our model. We have...

### Citations

2560 | Density Estimation for Statistics and Data Analysis (Chapman and Hall)
- Silverman
- 1986

Citation Context: ...parametric assumptions, as is done in the 2-Poisson model, where it is assumed that terms follow a mixture of two Poisson distributions. As Silverman said, "the data will be allowed to speak for themselves" [16]. We feel that it is unnecessary to construct a parametric model of the data when we have the actual data. Instead, we rely on non-parametric methods. Regarding the second assumption, the 2-Poisson m...
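
To make the parametric versus non-parametric contrast concrete, here is a minimal sketch (not from the paper) that compares the empirical distribution of one term's within-document frequencies with a single Poisson fitted to the same mean. The tf counts are hypothetical and deliberately bursty; `empirical_pmf` and `poisson_pmf` are illustrative names.

```python
import math
from collections import Counter


def empirical_pmf(tfs):
    """Non-parametric estimate: relative frequency of each observed tf value."""
    counts = Counter(tfs)
    n = len(tfs)
    return {k: c / n for k, c in sorted(counts.items())}


def poisson_pmf(k, lam):
    """Parametric alternative: Poisson probability of k occurrences at rate lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


# Hypothetical tf counts of one term in ten documents (illustrative only).
tfs = [0, 0, 0, 0, 0, 0, 1, 1, 5, 7]
lam = sum(tfs) / len(tfs)  # maximum-likelihood Poisson rate: 1.4

emp = empirical_pmf(tfs)
for k, p in emp.items():
    print(f"tf={k}: empirical={p:.2f}  poisson={poisson_pmf(k, lam):.3f}")
```

With these made-up counts the empirical estimate puts 0.6 of the mass at tf = 0, while the Poisson with mean 1.4 puts about 0.25 there; the non-parametric estimate simply reports what was observed.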

819 | On estimation of a probability density function and mode
- Parzen
- 1962

724 | Automatic Text Processing
- Salton
- 1989

Citation Context: ...ieval task itself. The best example of this is the vector space model, which allows one to talk about the task of retrieval apart from implementation details such as storage media and data structures [15]. A second sense of the word 'model' is the probabilistic sense, where it refers to an explanatory model of the data. This was the intention behind the 2-Poisson model. We add a third sense of the word when...

696 | Statistical Analysis of Finite Mixture Distributions
- Titterington, Smith, et al.

Citation Context: ...sense, this is not surprising. For large values of n, one can fit a very complex distribution arbitrarily closely by a mixture of n parametric models if one has enough data to estimate the parameters [18]. However, what is somewhat surprising is the closeness of fit for relatively small values of n reported by Margulis [10]. Nevertheless, the n-Poisson model has not brought about increased retrieval e...

648 | Relevance weighting of search terms
- Robertson, Sparck Jones
- 1976

Citation Context: ...set of specialty words or assume a preexisting classification of documents into elite and non-elite sets. Two well-known probabilistic approaches to retrieval are the Robertson and Sparck Jones model [14] and the Croft and Harper model [3]. Both of these models estimate the probability of relevance of each document to the query. Our approach differs in that we do not focus on relevance except to the e...
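
The Robertson and Sparck Jones model weights a term by how differently it is distributed in relevant and non-relevant documents. A minimal sketch of the relevance weight as it is commonly stated, with 0.5 added to each count to avoid zeros; the parameter names `N`, `n`, `R`, `r` follow common textbook notation and are assumptions, not notation from this page.

```python
import math


def rsj_weight(N, n, R, r):
    """Robertson/Sparck Jones relevance weight with 0.5 added to each cell.

    N: documents in the collection     n: documents containing the term
    R: known relevant documents        r: relevant documents containing the term
    """
    return math.log(((r + 0.5) * (N - n - R + r + 0.5)) /
                    ((n - r + 0.5) * (R - r + 0.5)))


# Hypothetical counts: a term in 50 of 1000 documents, 10 judged relevant.
print(rsj_weight(1000, 50, 10, 8))  # term in 8 of the 10 relevant documents
print(rsj_weight(1000, 50, 10, 1))  # term in only 1 of them
```

A term concentrated in the relevant set gets a much larger weight than one that appears there only rarely.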

386 | Some simple effective approximations to the 2-Poisson model for probabilistic weighted retrieval
- Robertson, Walker
- 1994

Citation Context: ...ness. The success of the 2-Poisson model has been somewhat limited, but it should be noted that Robertson's tf, which has been quite successful, was intended to behave similarly to the 2-Poisson model [12]. Other researchers have proposed a mixture model of more than two Poisson distributions in order to better fit the observed data. Margulis proposed the n-Poisson model and tested the idea empirically...
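
Robertson's tf is a saturating function of term frequency: the first few occurrences matter most, which mimics the 2-Poisson "elite" behavior without estimating mixture parameters. The sketch below uses the later BM25 form of that tf component as a stand-in; the constants `k1 = 1.2` and `b = 0.75` are common defaults and assumptions here, not necessarily the exact variant discussed in this paper.

```python
def saturating_tf(tf, dl, avg_dl, k1=1.2, b=0.75):
    """BM25-style tf component (a stand-in for Robertson's tf).

    Grows with tf but levels off toward k1 + 1, so the first occurrences of a
    term count for more than later ones; b controls document-length normalization.
    """
    norm = k1 * ((1 - b) + b * dl / avg_dl)
    return tf * (k1 + 1) / (tf + norm)


# For an average-length document, compare 1, 2, 5, 20 and 100 occurrences.
for tf in (1, 2, 5, 20, 100):
    print(tf, round(saturating_tf(tf, dl=100, avg_dl=100), 3))
```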

174 | Using probabilistic models of document retrieval without relevance information
- Croft, Harper
- 1979

Citation Context: ...reexisting classification of documents into elite and non-elite sets. Two well-known probabilistic approaches to retrieval are the Robertson and Sparck Jones model [14] and the Croft and Harper model [3]. Both of these models estimate the probability of relevance of each document to the query. Our approach differs in that we do not focus on relevance except to the extent that the process of query pro...
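
Croft and Harper's model covers the case where no relevance judgments exist. As their result is usually stated, if every query term is assumed to have the same probability p of occurring in a relevant document, and its probability in non-relevant documents is estimated by n / N, the per-term weight becomes log(p / (1 - p)) + log((N - n) / n), an idf-like collection statistic. A minimal sketch; the function names are hypothetical, and the weight turns negative for terms in more than half of the collection.

```python
import math


def croft_harper_weight(N, n, p=0.5):
    """Per-term weight without relevance information.

    Assumes each query term occurs in a relevant document with the same
    probability p, and estimates its non-relevant probability as n / N.
    """
    return math.log(p / (1 - p)) + math.log((N - n) / n)


def score(doc_terms, query_terms, df, N, p=0.5):
    """Sum the weights of distinct query terms that occur in the document."""
    return sum(croft_harper_weight(N, df[t], p) for t in set(query_terms) if t in doc_terms)


df = {"retrieval": 10, "model": 400}  # hypothetical document frequencies
print(croft_harper_weight(1000, 10))   # rare term
print(croft_harper_weight(1000, 400))  # common term
print(score({"retrieval", "model", "text"}, ["retrieval", "model"], df, N=1000))
```

With p = 0.5 the constant term vanishes, leaving only the collection statistic.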

97 | A probabilistic approach to automatic keyword indexing (Parts I & II)
- Harter
- 1975 (July-August)

Citation Context: ...SIGIR'98, Melbourne, Australia © 1998 ACM 1-58113-015-5 8/98 $5.00. ...also to Harter [7]. By analogy to manual indexing, the task was to assign a subset of words contained in a document (the 'specialty words') as indexing terms. The probability model was intended to indicate the useful i...

88 | Models for retrieval with probabilistic indexing
- Fuhr
- 1989

Citation Context: ...the query. Our approach differs in that we do not focus on relevance except to the extent that the process of query production is correlated with it. An additional probabilistic model is that of Fuhr [4]. A notable feature of the Fuhr model is the integration of indexing and retrieval models. The main difference between this approach and ours is that in the Fuhr model the collection statistics are us...

88 | Text segmentation by topic
- Ponte, Croft
- 1997

Citation Context: ...ototype retrieval engine known as Labrador to test our approach. This engine was originally implemented as a high-throughput retrieval system in the context of our previous work on topic segmentation [13]. For these experiments, the system does tokenization, stopping and stemming in the usual way. We have implemented both standard tf.idf weighting as well as our language modeling approach. 4.3 Recall/Pre...
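
The page does not say which tf.idf variant Labrador used. As a hedged illustration of the kind of baseline involved, here is a minimal tokenize, stop, and score pipeline using raw tf times log(N / df); the stopword list is a tiny placeholder, stemming is omitted, and `tokenize` and `tfidf_scores` are hypothetical names.

```python
import math
import re
from collections import Counter

# Tiny placeholder stopword list; a real system would use a much longer one.
STOPWORDS = {"a", "an", "and", "in", "of", "the", "to"}


def tokenize(text):
    """Lowercase, split into alphabetic tokens, and drop stopwords (no stemming)."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def tfidf_scores(query, docs):
    """Score each document by the sum over query terms of tf * log(N / df)."""
    doc_counts = [Counter(tokenize(d)) for d in docs]
    n_docs = len(docs)
    df = Counter(t for counts in doc_counts for t in counts)  # documents containing t
    q_terms = tokenize(query)
    return [
        sum(counts[t] * math.log(n_docs / df[t]) for t in q_terms if t in counts)
        for counts in doc_counts
    ]


docs = ["The cat sat", "The dog sat", "A cat and a cat"]
print(tfidf_scores("cat", docs))
```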

72 | Probabilistic models for automatic indexing
- Bookstein, Swanson
- 1976

Citation Context: ...rd tf.idf method of retrieval. Now we take a brief look at some existing models of document indexing. We begin our discussion of indexing models with the 2-Poisson model, due to Bookstein and Swanson [1] and ...
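
In the 2-Poisson model, a term's within-document frequency is drawn from one of two Poisson distributions: a high-rate one for documents that are "elite" for the term and a low-rate one for the rest. The sketch below evaluates the mixture and, by Bayes' rule, the probability that a document is elite given its tf; the parameter values in the usage example are made up.

```python
import math


def poisson(k, lam):
    """Poisson probability of k occurrences at rate lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def two_poisson(k, pi_elite, lam_elite, lam_non):
    """P(tf = k) under a 2-Poisson mixture.

    With probability pi_elite a document is elite for the term (rate lam_elite);
    otherwise it is non-elite (rate lam_non).
    """
    return pi_elite * poisson(k, lam_elite) + (1 - pi_elite) * poisson(k, lam_non)


def p_elite_given_tf(k, pi_elite, lam_elite, lam_non):
    """Posterior probability that a document with tf = k is elite (Bayes' rule)."""
    return pi_elite * poisson(k, lam_elite) / two_poisson(k, pi_elite, lam_elite, lam_non)


# Made-up parameters: 10% of documents are elite, averaging 4 occurrences
# versus 0.2 in non-elite documents.
for k in range(6):
    print(k, round(p_elite_given_tf(k, 0.1, 4.0, 0.2), 3))
```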

48 | Efficient Probabilistic Inference for Text Retrieval
- Turtle, Croft
- 1991

Citation Context: ...approach, we are able to avoid using heuristic methods since we are not inferring concepts from terms. Another recent probabilistic approach is the INQUERY inference network model by Turtle and Croft [19]. Similar to the Fuhr model, Turtle and Croft integrate indexing and retrieval by making inferences of concepts from features. Features include words, phrases and more complex structured features. Evi...
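
INQUERY combines evidence, expressed as beliefs in [0, 1], through query operators. The functions below follow how the basic operators are commonly described (#and as a product, #or as a noisy-or, #not as a complement, #sum and #wsum as plain and weighted averages). Treat them as a sketch, not INQUERY's actual implementation, which also handles default beliefs and term-belief estimation; the function names are hypothetical and `math.prod` needs Python 3.8+.

```python
import math


def bel_and(beliefs):
    """#and: product of the component beliefs."""
    return math.prod(beliefs)


def bel_or(beliefs):
    """#or: 1 minus the probability that none of the components holds."""
    return 1 - math.prod(1 - p for p in beliefs)


def bel_not(belief):
    """#not: complement of a single belief."""
    return 1 - belief


def bel_sum(beliefs):
    """#sum: unweighted average of the component beliefs."""
    return sum(beliefs) / len(beliefs)


def bel_wsum(weights, beliefs):
    """#wsum: weighted average of the component beliefs."""
    return sum(w * p for w, p in zip(weights, beliefs)) / sum(weights)


# Hypothetical beliefs that a document supports two query concepts.
b = [0.6, 0.5]
print(bel_and(b), bel_or(b), bel_sum(b), bel_wsum([2, 1], b), bel_not(b[0]))
```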

43 | An inquiry into the nature of frequency distributions representative of multiple happenings with particular reference to the occurrence of multiple attack of disease or repeated accidents
- Greenwood, GU
- 1920

38 | A new method of weighting query terms for ad-hoc retrieval
- Kwok
- 1996

Citation Context: ...n section 5. In section 3 we will discuss our probability estimation procedure. One statistic that we will be using is the average probability of term occurrence. A similar statistic was used by Kwok [9] for a different purpose. Kwok used the unnormalized average tf to estimate the importance of a term with respect to the query. In our approach we use the average of tf normalized by document length i...
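
The distinction drawn here is between Kwok's raw average tf and the length-normalized average used for the generation probability. A minimal sketch computing both over the documents that contain each term; documents are pre-tokenized lists, and `average_probabilities` is a hypothetical name.

```python
from collections import Counter


def average_probabilities(docs):
    """Return (p_avg, mean_tf) for every term in the pre-tokenized documents.

    p_avg[t]:   mean of tf(t, d) / dl_d over documents d that contain t
                (length-normalized, as in the generation-probability estimate)
    mean_tf[t]: mean of the raw tf(t, d) over the same documents
                (unnormalized, like the statistic attributed to Kwok)
    """
    norm_sum, raw_sum, df = Counter(), Counter(), Counter()
    for tokens in docs:
        dl = len(tokens)
        for t, tf in Counter(tokens).items():
            norm_sum[t] += tf / dl
            raw_sum[t] += tf
            df[t] += 1
    p_avg = {t: norm_sum[t] / df[t] for t in df}
    mean_tf = {t: raw_sum[t] / df[t] for t in df}
    return p_avg, mean_tf


docs = [["a", "b"], ["a", "a", "c", "d"]]
p_avg, mean_tf = average_probabilities(docs)
print(p_avg["a"], mean_tf["a"])
```

Term `a` gets the same normalized value (0.5) in both documents even though its raw tf doubles, because the second document is twice as long; its raw mean tf is 1.5.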

34 | A new probabilistic model of text classification and retrieval
- Kalt
- 1996

Citation Context: ...e. In our approach we have been able to avoid this extra complexity and perform retrieval according to a single probabilistic model. The most similar approach to the one we have taken is that of Kalt [8]. In this model, documents are assumed to be generated by a stochastic process: a multinomial model. The task Kalt investigated was text classification. Each document was treated as a sample from a la...
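
Kalt's setting, in which each document is a sample from a class-specific multinomial language model, can be sketched as a multinomial classifier. Add-one smoothing, equal class priors, and the function names are assumptions for illustration, not details of Kalt's model as described here.

```python
import math
from collections import Counter


def train(class_docs):
    """Estimate one multinomial (unigram) model per class.

    class_docs maps a label to a list of tokenized documents. Add-one smoothing
    over the shared vocabulary is an assumption of this sketch.
    """
    vocab = {t for docs in class_docs.values() for d in docs for t in d}
    models = {}
    for label, docs in class_docs.items():
        counts = Counter(t for d in docs for t in d)
        total = sum(counts.values())
        models[label] = {t: (counts[t] + 1) / (total + len(vocab)) for t in vocab}
    return models


def classify(tokens, models):
    """Return the label whose multinomial gives the tokens the highest log-likelihood.

    Equal class priors are assumed; the multinomial coefficient is the same for
    every class, so it is dropped. Tokens outside the vocabulary are ignored.
    """
    counts = Counter(tokens)

    def log_likelihood(model):
        return sum(n * math.log(model[t]) for t, n in counts.items() if t in model)

    return max(models, key=lambda label: log_likelihood(models[label]))


models = train({
    "sports": [["ball", "goal", "ball"]],
    "finance": [["stock", "bond", "stock"]],
})
print(classify(["goal", "ball", "referee"], models))  # "referee" is unseen
```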

33 | Efficiency of Nonparametric Density Estimators
- Terrell
- 1984

22 | A probability distribution model for information retrieval
- Wong, Yao
- 1989

Citation Context: ...tf to estimate the importance of a term with respect to the query. In our approach we use the average of tf normalized by document length in the estimation of the generation probability. Wong and Yao [20] proposed a model in which they represented documents according to a probability distribution. They then developed two separate approaches to retrieval, one based on utility theory and the other based...

18 | Modelling documents with multiple Poisson distributions
- Margulis
- 1993

Citation Context: ...Other researchers have proposed a mixture model of more than two Poisson distributions in order to better fit the observed data. Margulis proposed the n-Poisson model and tested the idea empirically [10]. The conclusion of this study was that a mixture of n Poisson distributions provides a very close fit to the data. In a certain sense, this is not surprising. For large values of n one can fit a very...
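
Margulis fit mixtures of n Poisson distributions to term-frequency data. The page does not say how the mixtures were fit; expectation-maximization (EM) is one standard way, sketched below for any number of components. The initial rates, iteration count, data, and function names are assumptions.

```python
import math


def poisson(k, lam):
    """Poisson probability of k occurrences at rate lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def log_likelihood(data, weights, rates):
    """Log-likelihood of the counts under a Poisson mixture."""
    return sum(
        math.log(sum(w * poisson(k, lam) for w, lam in zip(weights, rates)))
        for k in data
    )


def fit_poisson_mixture(data, init_rates, iters=200):
    """Fit an n-component Poisson mixture to tf counts with EM.

    n is len(init_rates); mixing weights start uniform.
    """
    n = len(init_rates)
    weights = [1.0 / n] * n
    rates = list(init_rates)
    for _ in range(iters):
        # E-step: each component's responsibility for each observation.
        resp = []
        for k in data:
            joint = [w * poisson(k, lam) for w, lam in zip(weights, rates)]
            total = sum(joint)
            resp.append([j / total for j in joint])
        # M-step: new weight = average responsibility;
        # new rate = responsibility-weighted mean count.
        for i in range(n):
            r_i = sum(r[i] for r in resp)
            weights[i] = r_i / len(data)
            rates[i] = sum(r[i] * k for r, k in zip(resp, data)) / r_i
    return weights, rates


# Hypothetical tf counts: mostly 0s and 1s, plus a bursty group around 5.
data = [0] * 80 + [1] * 10 + [4, 5, 6, 5, 4, 7, 5, 6, 4, 5]
init = [0.5, 3.0]
weights, rates = fit_poisson_mixture(data, init)
print(weights, rates)
```

Each EM iteration cannot decrease the log-likelihood, so comparing `log_likelihood` before and after fitting is a quick sanity check.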

9 | Construction of improved estimators in multiparameter estimation for discrete exponential families, Ann
- Ghosh, Hwang, et al.
- 1983

Citation Context: ...ind this formula is that as the tf gets further away from the normalized mean, the mean probability becomes riskier to use as an estimate. For a somewhat related use of the geometric distribution, see [5]. Now we will use this risk function as a mixing parameter in our calculation of $\hat{p}(Q|M_d)$, our estimate of the probability of producing the query for a given document model, as follows: Let $\hat{p}(t|M_d) =$ ...
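
The snippet cuts off before the estimator itself. The sketch below follows the definitions as we understand them from the full paper: when t occurs in d, $\hat{p}(t|M_d) = p_{ml}(t,d)^{(1.0 - \hat{R}_{t,d})} \times p_{avg}(t)^{\hat{R}_{t,d}}$, otherwise $cf_t / cs$; and $\hat{p}(Q|M_d)$ multiplies $\hat{p}(t|M_d)$ over the query terms and $1.0 - \hat{p}(t|M_d)$ over all other vocabulary terms. It works in log space, treats the vocabulary as the terms seen in the collection, skips query terms that never occur in the collection, and assumes no estimate equals exactly 1. These choices and all function names are assumptions of this sketch, not the authors' implementation.

```python
import math
from collections import Counter


def build_stats(docs):
    """Per-document term counts plus collection statistics p_avg(t), cf_t, cs."""
    doc_counts = [Counter(d) for d in docs]
    df, pml_sum, cf = Counter(), Counter(), Counter()
    for counts, tokens in zip(doc_counts, docs):
        for t, tf in counts.items():
            df[t] += 1
            pml_sum[t] += tf / len(tokens)
            cf[t] += tf
    p_avg = {t: pml_sum[t] / df[t] for t in df}
    cs = sum(cf.values())  # total number of tokens in the collection
    return doc_counts, p_avg, cf, cs


def risk(tf, mean_tf):
    """Geometric risk R_{t,d} for observing tf when the expected count is mean_tf."""
    return (1.0 / (1.0 + mean_tf)) * (mean_tf / (1.0 + mean_tf)) ** tf


def p_term(t, counts, dl, p_avg, cf, cs):
    """Estimate p(t|M_d): risk-weighted blend of p_ml and p_avg if t occurs in d,
    otherwise the collection estimate cf_t / cs."""
    tf = counts[t]
    if tf == 0:
        return cf[t] / cs
    p_ml = tf / dl
    r = risk(tf, p_avg[t] * dl)
    return p_ml ** (1.0 - r) * p_avg[t] ** r


def log_query_likelihood(query, d, docs, stats):
    """log p(Q|M_d): query terms contribute p(t|M_d), all other vocabulary
    terms contribute 1 - p(t|M_d)."""
    doc_counts, p_avg, cf, cs = stats
    counts, dl = doc_counts[d], len(docs[d])
    q = set(query)
    score = 0.0
    for t in cf:  # vocabulary = every term seen in the collection
        p = p_term(t, counts, dl, p_avg, cf, cs)
        score += math.log(p) if t in q else math.log(1.0 - p)
    return score


docs = [
    ["language", "model", "retrieval", "model"],
    ["vector", "space", "model"],
    ["query", "language", "model", "query"],
]
stats = build_stats(docs)
scores = [log_query_likelihood(["language", "model"], d, docs, stats) for d in range(len(docs))]
print(scores)
```

On this toy collection the first document, which contains both query terms and repeats "model", scores highest, and the document lacking "language" scores lowest.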

4 | Topic detection and tracking segmentation task
- Yamron
- 1997

Citation Context: ...guage modeling. The phrase 'language model' is used by the speech recognition community to refer to a probability distribution that captures the statistical regularities of the generation of language [21]. In the context of the retrieval task, we can treat the generation of queries as a random process. Generally speaking, language models for speech attempt to predict the probability of the next word i...
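
In speech recognition, the canonical example is an n-gram model that predicts the next word from the preceding ones. A minimal bigram sketch with maximum-likelihood counts, no smoothing, and a `<s>` start marker (all assumptions). It illustrates the speech-recognition usage the snippet refers to, not the retrieval model itself, which does not condition on word order.

```python
from collections import Counter


def train_bigram(sentences):
    """Count context words and word pairs for a maximum-likelihood bigram model."""
    contexts, pairs = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence  # "<s>" marks the sentence start
        contexts.update(tokens[:-1])  # every token that is followed by another
        pairs.update(zip(tokens[:-1], tokens[1:]))
    return contexts, pairs


def next_word_prob(prev, word, model):
    """P(word | prev) = count(prev, word) / count(prev); 0.0 for unseen contexts."""
    contexts, pairs = model
    if contexts[prev] == 0:
        return 0.0
    return pairs[(prev, word)] / contexts[prev]


model = train_bigram([["the", "cat", "sat"], ["the", "dog", "sat"]])
print(next_word_prob("the", "cat", model))  # 1 of 2 continuations of "the"
```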

1 | Probability Functions, in Handbook of Mathematical Functions
- Zelen, Severo
- 1964

Citation Context: ...ith different term frequencies. In order to benefit from the robustness of this estimator and to minimize the risk, we will model the risk for a term t in a document d using the geometric distribution [22] as follows:

$$\hat{R}_{t,d} = \left(\frac{1.0}{1.0 + \bar{f}_t}\right) \times \left(\frac{\bar{f}_t}{1.0 + \bar{f}_t}\right)^{\mathrm{tf}_{t,d}}$$

where $\bar{f}_t$ is the mean term frequency of term t in documents where t occurs, i.e., $p_{avg}(t) \times dl_d$. Another way to ...
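
The risk function is the probability that a geometric distribution with mean $\bar{f}_t$ assigns to the observed $\mathrm{tf}_{t,d}$. A small sketch evaluating it for a hypothetical term with $p_{avg}(t) = 0.01$ in a 200-token document, so $\bar{f}_t = 2.0$; the numbers and the name `risk` are illustrative only.

```python
def risk(tf, mean_tf):
    """R_{t,d}: geometric probability of seeing tf occurrences when the mean is mean_tf."""
    return (1.0 / (1.0 + mean_tf)) * (mean_tf / (1.0 + mean_tf)) ** tf


# Hypothetical term: p_avg(t) = 0.01 in a 200-token document, so mean_tf = 2.0.
mean_tf = 0.01 * 200
for tf in range(1, 6):
    print(tf, round(risk(tf, mean_tf), 4))
```

With these values the risk falls from about 0.22 at tf = 1 to about 0.04 at tf = 5. Summing `risk(tf, mean_tf)` over tf = 0, 1, 2, ... gives 1, and the implied mean is $\bar{f}_t$, which is a quick way to confirm the formula was transcribed correctly.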