Results 1-10 of 15
Passage-Level Evidence in Document Retrieval, 1994
Cited by 179 (4 self)
The increasing lengths of documents in full-text collections encourage renewed interest in the ranking and retrieval of document passages. Past research showed that evidence from passages can improve retrieval results, but it also raised questions about how passages are defined, how they can be ranked efficiently, and what their proper role is in long, structured documents.
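The abstract above leaves open how passages are defined and ranked. One common option, shown here only as a minimal sketch, is to cut a document into fixed-length overlapping word windows and score the document by its best window. The window size, stride, and the simple query-term-count score are assumptions for illustration, not the method of this paper; `best_passage_score` and its helpers are hypothetical names.

```python
from collections import Counter


def windows(tokens, size, stride):
    """Yield fixed-length word windows that start every `stride` words (the last may be shorter)."""
    if not tokens:
        return
    start = 0
    while True:
        yield tokens[start:start + size]
        if start + size >= len(tokens):
            break
        start += stride


def passage_score(window, query_terms):
    """Illustrative passage score: number of query-term occurrences in the window."""
    counts = Counter(window)
    return sum(counts[t] for t in query_terms)


def best_passage_score(text, query, size=50, stride=25):
    """Score a document by its single best-matching passage (0 for an empty document)."""
    tokens = text.lower().split()
    query_terms = set(query.lower().split())
    return max((passage_score(w, query_terms) for w in windows(tokens, size, stride)), default=0)
```

With a stride smaller than the window size the windows overlap, so a query-matching phrase that straddles a boundary between non-overlapping windows still falls inside a single window.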
Subtopic Structuring for Full-Length Document Access, 1993
Cited by 169 (8 self)
We argue that the advent of large volumes of full-length text, as opposed to short texts like abstracts and newswire, should be accompanied by corresponding new approaches to information access. Toward this end, we discuss the merits of imposing structure on full-length text documents; that is, a partition of the text into coherent multi-paragraph units that represent the pattern of subtopics that comprise the text. Using this structure, we can make a distinction between the main topics, which occur throughout the length of the text, and the subtopics, which are of only limited extent. We discuss why recognition of subtopic structure is important and how, to some degree of accuracy, it can be found. We describe a new way of specifying queries on full-length documents and then describe an experiment in which making use of the recognition of local structure achieves better results on a typical information retrieval task than does a standard IR measure.
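A partition into multi-paragraph subtopic units can be approximated by comparing the vocabulary on either side of each paragraph gap and cutting where similarity dips sharply. The sketch below is loosely in the spirit of TextTiling-style segmentation; the block size `k`, raw term-frequency vectors, and the mean-minus-half-standard-deviation cutoff are illustrative assumptions, not necessarily the procedure used in this paper.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values()) * sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def gap_similarities(paragraphs, k=2):
    """Similarity across each gap, comparing up to k paragraphs on either side."""
    bags = [Counter(p.lower().split()) for p in paragraphs]
    sims = []
    for gap in range(1, len(bags)):
        left = sum(bags[max(0, gap - k):gap], Counter())
        right = sum(bags[gap:gap + k], Counter())
        sims.append(cosine(left, right))
    return sims


def depth_scores(sims):
    """How deep each gap sits in a valley: climb to the nearest peak on each side."""
    depths = []
    for i, s in enumerate(sims):
        left, j = s, i
        while j > 0 and sims[j - 1] >= left:
            j -= 1
            left = sims[j]
        right, j = s, i
        while j < len(sims) - 1 and sims[j + 1] >= right:
            j += 1
            right = sims[j]
        depths.append((left - s) + (right - s))
    return depths


def subtopic_boundaries(paragraphs, k=2):
    """Return indices i such that a subtopic boundary falls just before paragraph i."""
    sims = gap_similarities(paragraphs, k)
    if not sims:
        return []
    depths = depth_scores(sims)
    mean = sum(depths) / len(depths)
    std = math.sqrt(sum((d - mean) ** 2 for d in depths) / len(depths))
    cutoff = mean - std / 2  # assumed cutoff: deeper-than-usual valleys become boundaries
    return [i + 1 for i, d in enumerate(depths) if d > cutoff and d > 0]
```

For four paragraphs where the first two discuss one subject and the last two another, the only deep valley is at the middle gap, so the function reports a single boundary before paragraph 2.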
Automatic Query Expansion Using SMART: TREC 3 - In Proceedings of the Third Text REtrieval Conference (TREC-3)
Cited by 139 (2 self)
The Smart information retrieval project emphasizes completely automatic approaches to the understanding and retrieval of large quantities of text. We continue our work in TREC 3, performing runs in the routing, ad-hoc, and foreign language environments. Our major focus is massive query expansion: adding from 300 to 530 terms to each query. These terms come from known relevant documents in the case of routing, and from just the top retrieved documents in the case of ad-hoc and Spanish. This approach improves effectiveness by 7% to 25% across the various experiments. Other ad-hoc work extends our investigations into combining global similarities, giving an overall indication of how a document matches a query, with local similarities identifying a smaller part of the document which matches the query. Using an overlapping text window definition of "local", we achieve a 16% improvement. Introduction For over 30 years, the Smart project at Cornell University has been interested in the analy...
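Expanding a query from just the top retrieved documents is a form of pseudo-relevance feedback. The following is a minimal Rocchio-style sketch (positive feedback only) under assumed settings: raw term frequencies, hypothetical `alpha`/`beta` weights, and a handful of expansion terms rather than the hundreds reported above. It is not the SMART system's implementation.

```python
from collections import Counter


def expand_query(query, ranked_docs, top_docs=2, new_terms=3, alpha=1.0, beta=0.5):
    """Pseudo-relevance feedback: treat the top retrieved documents as relevant and
    add their highest-weighted terms to the query (Rocchio-style, positive part only)."""
    q = Counter(query.lower().split())
    feedback = ranked_docs[:top_docs]
    totals = Counter()
    for doc in feedback:
        totals.update(doc.lower().split())
    # Centroid of the feedback documents: average term frequency.
    n = max(len(feedback), 1)
    centroid = {t: c / n for t, c in totals.items()}
    expanded = {t: alpha * w for t, w in q.items()}
    # New terms: not already in the query, highest centroid weight first (ties by name).
    candidates = sorted((t for t in centroid if t not in q), key=lambda t: (-centroid[t], t))
    for t in candidates[:new_terms]:
        expanded[t] = beta * centroid[t]
    # Original query terms are reinforced by their centroid weight as well.
    for t in q:
        expanded[t] += beta * centroid.get(t, 0.0)
    return expanded
```

For example, `expand_query("solar power", ["solar panels power panels", "solar energy panels", "unrelated text"], top_docs=2, new_terms=2)` adds `panels` and `energy` from the two top-ranked documents while ignoring the third.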
Cognitive perspectives of information retrieval interaction: elements of a cognitive IR theory - Journal of Documentation, 1996
Cited by 96 (7 self)
The objective of the paper is to amalgamate theories of text retrieval from various research traditions into a cognitive theory for information retrieval interaction. Set in a cognitive framework, the paper outlines the concept of polyrepresentation applied to both the user's cognitive space and the information space of IR systems. The concept seeks to represent the current user's information need, problem state, and domain work task or interest in a structure of causality. Further, it implies that we should apply different methods of representation and a variety of IR techniques of different cognitive and functional origin simultaneously to each semantic full-text entity in the information space. The cognitive differences imply that by applying cognitive overlaps of information objects, originating from different interpretations of such objects through time and by type, the degree of uncertainty inherent in IR is decreased. Polyrepresentation and the use of cognitive overlaps are associated with, but not identical to, data
MURAX: A Robust Linguistic Approach For Question Answering Using An On-Line Encyclopedia, 1993
Cited by 76 (1 self)
Robust linguistic methods are applied to the task of answering closed-class questions using a corpus of natural language. The methods are illustrated in a broad domain: answering general-knowledge questions using an on-line encyclopedia.
Automatic Routing and Ad-hoc Retrieval Using SMART: TREC 2 - Proceedings of the Second Text REtrieval Conference (TREC-2), pages 45-56. NIST Special Publication, 1994
Cited by 42 (5 self)
The Smart information retrieval project emphasizes completely automatic approaches to the understanding and retrieval of large quantities of text. We continue our work in the TREC 2 environment, performing both routing and ad-hoc experiments. The ad-hoc work extends our investigations into combining global similarities, giving an overall indication of how a document matches a query, with local similarities identifying a smaller part of the document which matches the query. The performance of the ad-hoc runs is good, but it is clear we are not yet taking full advantage of the available local information. Our routing experiments use conventional relevance feedback approaches to routing, but with a much greater degree of query expansion than was done in TREC 1. The length of a query vector is increased by a factor of 5 to 10 by adding terms found in previously seen relevant documents. This approach improves effectiveness by 30-40% over the original query. Introduction For over 30 years...
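The combination of global and local similarity can be sketched as a whole-document cosine plus the best cosine over overlapping windows. The linear combination with a `weight` parameter, the window size and stride, and raw term-frequency vectors are all assumptions made for illustration; the abstract does not say how SMART combines the two scores.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values()) * sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def local_similarity(tokens, query_vec, size=50, stride=25):
    """Best cosine between the query and any overlapping window of the document."""
    last = max(len(tokens) - size, 0)
    starts = list(range(0, last + 1, stride))
    if starts[-1] != last:
        starts.append(last)  # make sure the tail of the document is covered
    return max(cosine(Counter(tokens[s:s + size]), query_vec) for s in starts)


def combined_score(doc, query, weight=0.5, size=50, stride=25):
    """Hypothetical combination: global cosine plus a weighted best local cosine."""
    tokens = doc.lower().split()
    q = Counter(query.lower().split())
    return cosine(Counter(tokens), q) + weight * local_similarity(tokens, q, size, stride)
```

A document that matches the query strongly in one short region but weakly overall gets a low global score and a high local score, which is the situation the combination is meant to reward.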
Relevance Feedback With Too Much Data, 1995
Cited by 34 (3 self)
Modern text collections often contain large documents which span several subject areas. Such documents are problematic for relevance feedback since inappropriate terms can easily be chosen. This study explores the highly effective approach of feeding back passages of large documents. A less-expensive method which discards long documents is also reviewed and found to be effective if there are enough relevant documents. A hybrid approach which feeds back short documents and passages of long documents may be the best compromise. 1 Introduction As the amount of on-line text has increased, so has the size of individual documents in those collections. Information retrieval methods that could easily be applied to the full text of abstracts or short documents are sometimes less effective or prohibitively expensive for large documents. This problem has led to a resurgence of interest in techniques for handling large texts, including passage retrieval, theme identification, document su...
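The hybrid compromise described above (whole text for short relevant documents, best passage for long ones) can be sketched as a feedback-term selector. The length threshold, passage size, exhaustive window search, and frequency-based term ranking are hypothetical choices, not parameters taken from the study.

```python
from collections import Counter


def best_passage(tokens, query_terms, size):
    """Return the fixed-length window with the most query-term occurrences (first one on ties)."""
    best, best_hits = tokens[:size], -1
    for start in range(0, max(len(tokens) - size, 0) + 1):
        window = tokens[start:start + size]
        hits = sum(1 for t in window if t in query_terms)
        if hits > best_hits:
            best, best_hits = window, hits
    return best


def feedback_terms(query, relevant_docs, long_doc=200, passage_size=100, n_terms=10):
    """Hybrid feedback: whole text for short relevant documents, best passage for long ones."""
    query_terms = set(query.lower().split())
    counts = Counter()
    for doc in relevant_docs:
        tokens = doc.lower().split()
        if len(tokens) <= long_doc:
            source = tokens
        else:
            source = best_passage(tokens, query_terms, passage_size)
        counts.update(t for t in source if t not in query_terms)
    # Most frequent candidate terms first, ties broken alphabetically.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [t for t, _ in ranked[:n_terms]]
```

For example, with `docs = ["jaguar speed cat", "car engine car engine jaguar speed cat fast"]`, `feedback_terms("jaguar speed", docs, long_doc=5, passage_size=3)` returns `["cat", "engine"]`, whereas feeding back the whole long document also pulls in the off-topic `car`.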
Automatic Text Decomposition and Structuring, 1994
Cited by 29 (0 self)
Sophisticated text similarity measurements are used to determine relationships between natural-language texts and text segments. The resulting linked hypertext maps are used to identify different text types and text structures, leading to improved text access and utilization. Examples of text decomposition are given for expository and non-expository texts. 1 Automatic Text Comparison Methods The vector processing model of retrieval has been used with substantial success to manipulate large collections of natural-language text. In vector processing, texts or text excerpts, as well as requests for information, are represented by sets of terms, or term vectors. Collectively the terms assigned to a given text are used to represent text content. Substantially identical methods are usable for determining collection structure (by comparing pairs of text vectors with each other and identifying text pairs found to be sufficiently similar), and for retrieving information (by comparing query vecto...
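Determining collection structure by comparing pairs of text vectors can be illustrated by building a term vector for each segment and linking every pair whose cosine similarity clears a threshold, which yields a simple text relationship map. This is a minimal sketch: raw term frequencies without stopword removal or term weighting, and the 0.3 threshold is an arbitrary assumption.

```python
import math
from collections import Counter
from itertools import combinations


def term_vector(text):
    """Raw term-frequency vector for a text segment."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values()) * sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def relationship_map(segments, threshold=0.3):
    """Link every pair of segments whose similarity is at least `threshold`."""
    vectors = [term_vector(s) for s in segments]
    links = []
    for i, j in combinations(range(len(vectors)), 2):
        sim = cosine(vectors[i], vectors[j])
        if sim >= threshold:
            links.append((i, j, round(sim, 3)))
    return links
```

Each link is a `(segment_i, segment_j, similarity)` triple; a real system would typically remove stopwords and weight terms before comparing, since otherwise common words like "the" inflate similarity.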
Information genealogy: Uncovering the flow of ideas in non-hyperlinked document databases - In Knowledge Discovery and Data Mining (KDD) Conference, 2007
Cited by 11 (1 self)
We now have incrementally grown databases of text documents ranging back for over a decade in areas ranging from personal email to news articles and conference proceedings. While accessing individual documents is easy, methods for overviewing and understanding these collections as a whole are lacking in number and in scope. In this paper, we address one such global analysis task, namely the problem of automatically uncovering how ideas spread through the collection over time. We refer to this problem as Information Genealogy. In contrast to bibliometric methods that are limited to collections with explicit citation structure, we investigate content-based methods requiring only the text and timestamps of the documents. In particular, we propose a language-modeling approach and a likelihood ratio test to detect influence between documents in a statistically well-founded way. Furthermore, we show how this method can be used to infer citation graphs and to identify the most influential documents in the collection. Experiments on the NIPS conference proceedings and the Physics ArXiv show that our method is more effective than methods based on
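A likelihood ratio test for influence can be sketched with unigram language models: ask whether a later document is explained better by a mixture of an earlier document's model and a background model than by the background alone. The additive smoothing constant `eps`, the grid search over the mixing weight, and the function names are illustrative assumptions; this is not the exact formulation from the paper, and turning the statistic into a significance decision (for example, choosing a threshold) is left open.

```python
import math
from collections import Counter


def unigram(tokens, vocab, eps=0.01):
    """Additively smoothed unigram model over a fixed vocabulary."""
    counts = Counter(tokens)
    total = len(tokens) + eps * len(vocab)
    return {w: (counts[w] + eps) / total for w in vocab}


def log_likelihood(tokens, model):
    return sum(math.log(model[w]) for w in tokens)


def influence_statistic(new_doc, old_doc, corpus, grid=20):
    """Likelihood-ratio statistic: does mixing in old_doc's model explain new_doc better
    than the background model alone?"""
    new_tokens = new_doc.lower().split()
    old_tokens = old_doc.lower().split()
    background_tokens = [w for doc in corpus for w in doc.lower().split()]
    vocab = set(new_tokens) | set(old_tokens) | set(background_tokens)
    background = unigram(background_tokens, vocab)
    source = unigram(old_tokens, vocab)
    null_ll = log_likelihood(new_tokens, background)
    best_ll = null_ll  # mixing weight 0 recovers the null (background-only) model
    for k in range(1, grid + 1):
        lam = k / grid
        mixture = {w: lam * source[w] + (1 - lam) * background[w] for w in vocab}
        best_ll = max(best_ll, log_likelihood(new_tokens, mixture))
    return 2.0 * (best_ll - null_ll)
```

A statistic of 0 means no mixing weight improved on the background model; larger values indicate that the earlier document's vocabulary helps explain the later one.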

