Results 1 - 3 of 3
A Language Modeling Approach to Information Retrieval, 1998
Abstract - Cited by 684 (25 self)
Models of document indexing and document retrieval have been extensively studied. The integration of these two classes of models has been the goal of several researchers but it is a very difficult problem. We argue that much of the reason for this is the lack of an adequate indexing model. This suggests that perhaps a better indexing model would help solve the problem. However, we feel that making unwarranted parametric assumptions will not lead to better retrieval performance. Furthermore, making prior assumptions about the similarity of documents is not warranted either. Instead, we propose an approach to retrieval based on probabilistic language modeling. We estimate models for each document individually. Our approach to modeling is non-parametric and integrates document indexing and document retrieval into a single model. One advantage of our approach is that collection statistics which are used heuristically in many other retrieval models are an integral part of our model. We have...
Statistical Models for Text Segmentation - Machine Learning, 1999
Abstract - Cited by 1 (0 self)
This paper introduces a new statistical approach to automatically partitioning text into coherent segments. The approach is based on a technique that incrementally builds an exponential model to extract features that are correlated with the presence of boundaries in labeled training text. The models use two classes of features: topicality features that use adaptive language models in a novel way to detect broad changes of topic, and cue-word features that detect occurrences of specific words, which may be domain-specific, that tend to be used near segment boundaries. Assessment of our approach on quantitative and qualitative grounds demonstrates its effectiveness in two very different domains, Wall Street Journal news articles and television broadcast news story transcripts. Quantitative results on these domains are presented using a new probabilistically motivated error metric, which combines precision and recall in a natural and flexible way. This metric is used to make a quantitative ...
DEEP FOCUS -- Hydra-headed Metadata
Abstract
After Sept. 11, authorities said information-stove-piping by intelligence agencies was one of the biggest stumbling blocks in the fight against terrorism. Now, two leading researchers discuss different approaches to merging government files, and cracking open their secrets.

