Passage-Level Evidence in Document Retrieval
, 1994
Cited by 179 (4 self)
The increasing lengths of documents in full-text collections encourage renewed interest in the ranking and retrieval of document passages. Past research showed that evidence from passages can improve retrieval results, but it also raised questions about how passages are defined, how they can be ranked efficiently, and what their proper role is in long, structured documents.
Subtopic Structuring for Full-Length Document Access
, 1993
Cited by 169 (8 self)
We argue that the advent of large volumes of full-length text, as opposed to short texts like abstracts and newswire, should be accompanied by corresponding new approaches to information access. Toward this end, we discuss the merits of imposing structure on full-length text documents; that is, a partition of the text into coherent multi-paragraph units that represent the pattern of subtopics that comprise the text. Using this structure, we can make a distinction between the main topics, which occur throughout the length of the text, and the subtopics, which are of only limited extent. We discuss why recognition of subtopic structure is important and how, to some degree of accuracy, it can be found. We describe a new way of specifying queries on full-length documents and then describe an experiment in which making use of the recognition of local structure achieves better results on a typical information retrieval task than does a standard IR measure.
Automatic Query Expansion Using SMART : TREC 3
- In Proceedings of the Third Text REtrieval Conference (TREC-3)
Cited by 139 (2 self)
The Smart information retrieval project emphasizes completely automatic approaches to the understanding and retrieval of large quantities of text. We continue our work in TREC 3, performing runs in the routing, ad-hoc, and foreign language environments. Our major focus is massive query expansion: adding from 300 to 530 terms to each query. These terms come from known relevant documents in the case of routing, and from just the top retrieved documents in the case of ad-hoc and Spanish. This approach improves effectiveness by 7% to 25% in the various experiments. Other ad-hoc work extends our investigations into combining global similarities, giving an overall indication of how a document matches a query, with local similarities identifying a smaller part of the document which matches the query. Using an overlapping text window definition of "local", we achieve a 16% improvement. Introduction For over 30 years, the Smart project at Cornell University has been interested in the analy...
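The ad-hoc expansion described in this abstract (adding terms drawn from the top retrieved documents) can be illustrated with a minimal sketch. It is not the SMART implementation: the scoring function is a toy term-count match, the function names `rank` and `expand_query` are hypothetical, and the abstract's 300 to 530 added terms are scaled down to a handful so the example stays readable.

```python
from collections import Counter

def rank(docs, query_terms):
    """Order document indices by a toy score: occurrences of query terms.

    Assumption: a real system would use weighted vector similarity instead.
    """
    q = set(query_terms)
    scores = [sum(1 for tok in doc if tok in q) for doc in docs]
    # Higher score first; ties broken by original position for determinism.
    return sorted(range(len(docs)), key=lambda i: (-scores[i], i))

def expand_query(docs, query_terms, top_docs=2, new_terms=3):
    """Append the most frequent non-query terms from the top-ranked documents."""
    q = set(query_terms)
    top = rank(docs, query_terms)[:top_docs]
    counts = Counter(tok for i in top for tok in docs[i] if tok not in q)
    # Most frequent first; alphabetical order breaks ties.
    candidates = sorted(counts, key=lambda t: (-counts[t], t))
    return list(query_terms) + candidates[:new_terms]

docs = [
    ["passage", "retrieval", "ranking", "passage"],
    ["passage", "ranking", "window"],
    ["cooking", "recipes"],
]
expanded = expand_query(docs, ["passage"], top_docs=2, new_terms=2)
print(expanded)
```

In the routing case the abstract instead takes terms from known relevant documents; under the same assumptions that would amount to replacing the `rank` step with a supplied list of relevant document indices.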
Document and Passage Retrieval Based on Hidden Markov Models
- In Proceedings of the Seventeenth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval
, 1994
Cited by 49 (2 self)
Introduced is a new approach to Information Retrieval developed on the basis of Hidden Markov Models (HMMs). HMMs are shown to provide a mathematically sound framework for retrieving documents--documents with predefined boundaries and also entities of information that are of arbitrary lengths and formats (passage retrieval). Our retrieval model is shown to encompass promising capabilities: First, the position of occurrences of indexing features can be used for indexing. Positional information is essential, for instance, when considering phrases, negation, and the proximity of features. Second, from training collections we can derive automatically optimal weights for arbitrary features. Third, a query dependent structure can be determined for every document by segmenting the documents into passages that are either relevant or irrelevant to the query. The theoretical analysis of our retrieval model is complemented by the results of preliminary experiments.
Automatic Routing and Ad-hoc Retrieval Using SMART : TREC 2
- Proceedings of the Second Text REtrieval Conference (TREC-2), pages 45--56. NIST Special Publication
, 1994
Cited by 42 (5 self)
The Smart information retrieval project emphasizes completely automatic approaches to the understanding and retrieval of large quantities of text. We continue our work in the TREC 2 environment, performing both routing and ad-hoc experiments. The ad-hoc work extends our investigations into combining global similarities, giving an overall indication of how a document matches a query, with local similarities identifying a smaller part of the document which matches the query. The performance of the ad-hoc runs is good, but it is clear we are not yet taking full advantage of the available local information. Our routing experiments use conventional relevance feedback approaches to routing, but with a much greater degree of query expansion than was done in TREC 1. The length of a query vector is increased by a factor of 5 to 10 by adding terms found in previously seen relevant documents. This approach improves effectiveness by 30--40% over the original query. Introduction For over 30 years...
Effective ranking with arbitrary passages
- Journal of the American Society for Information Science and Technology
, 2001
Cited by 40 (1 self)
Text retrieval systems store a great variety of documents, from abstracts, newspaper articles, and Web pages to journal articles, books, court transcripts, and legislation. Collections of diverse types of documents expose shortcomings in current approaches to ranking. Use of short fragments of documents, called passages, instead of whole documents can overcome these shortcomings: passage ranking provides convenient units of text to return to the user, can avoid the difficulties of comparing documents of different length, and enables identification of short blocks of relevant material among otherwise irrelevant text. In this article, we compare several kinds of passage in an extensive series of experiments. We introduce a new type of passage, overlapping fragments of either fixed or variable length. We show that ranking with these arbitrary passages gives substantial improvements in retrieval effectiveness over traditional document ranking schemes, particularly for queries on collections of long documents. Ranking with arbitrary passages shows consistent improvements compared to ranking with whole documents, and to ranking with previous passage types that depend on document structure or topic shifts in documents.
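The idea of overlapping fixed-length passages can be illustrated with a short sketch. This is an assumption-laden illustration rather than the article's method: the window length, the stride, and the term-count scoring are placeholders, and the names `fixed_length_passages`, `passage_score`, and `best_passage` are hypothetical.

```python
from collections import Counter

def fixed_length_passages(tokens, length=150, stride=75):
    """Split a token list into overlapping windows of `length` tokens.

    A new window starts every `stride` tokens; the final window may be
    shorter so that the end of the document is always covered.
    """
    if len(tokens) <= length:
        return [tokens]
    starts = range(0, len(tokens) - length + stride, stride)
    return [tokens[s:s + length] for s in starts]

def passage_score(passage, query_terms):
    """Toy score (an assumption): total occurrences of query terms."""
    counts = Counter(passage)
    return sum(counts[t] for t in query_terms)

def best_passage(tokens, query_terms, length=150, stride=75):
    """Return (score, passage) for the highest-scoring window.

    Ties go to the earliest window.
    """
    passages = fixed_length_passages(tokens, length, stride)
    scored = [(passage_score(p, query_terms), i) for i, p in enumerate(passages)]
    score, idx = max(scored, key=lambda pair: (pair[0], -pair[1]))
    return score, passages[idx]

tokens = "a b c d cat dog e f g h".split()
windows = fixed_length_passages(tokens, length=4, stride=2)
score, passage = best_passage(tokens, ["cat", "dog"], length=4, stride=2)
print(len(windows), score, passage)
```

Ranking documents by the score of their best passage, as sketched here, sidesteps direct comparison of documents of very different lengths, which is one of the benefits the abstract attributes to passage ranking.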
Efficient Passage Ranking for Document Databases
- ACM Transactions on Information Systems
, 1999
Cited by 39 (5 self)
Queries to text collections are resolved by ranking the documents in the collection and returning the highest-scoring documents to the user. An alternative retrieval method is to rank passages, that is, short fragments of documents, a strategy that can improve effectiveness and identify relevant material in documents that are too large for users to consider as a whole. However, ranking of passages can considerably increase retrieval costs. In this paper we explore alternative query evaluation techniques, and develop new techniques for evaluating queries on passages. We show experimentally that, appropriately implemented, effective passage retrieval is practical in limited memory on a desktop machine. Compared to passage ranking with adaptations of current document ranking algorithms, our new "DO-TOS" passage ranking algorithm requires only a fraction of the resources, at the cost of a small loss of effectiveness.
Financial Information Extraction using pre-defined and user-definable Templates in the LOLITA System
- Proceedings of the Fifteenth International Conference on Computational Linguistics (COLING-92)
, 1997
Cited by 2 (0 self)
Financial operators today have access to an extremely large amount of data, both quantitative and qualitative, real-time or historical, and can use this information to support their decision-making process. Quantitative data are largely processed by automatic computer programs, often based on artificial intelligence techniques, that produce quantitative analysis, such as historical price analysis or technical analysis of price behaviour. By contrast, little progress has been made in the processing of qualitative data, which mainly consists of financial news articles from financial newspapers or on-line news providers. As a result, financial market players are overloaded with qualitative information which is potentially extremely useful but, due to lack of time, is often ignored. The goal of this work is to reduce the qualitative data overload of financial operators. The research involves the identification of the information in the source financial articles which is relevant ...
Combining Positive and Negative Query Feedback in Passage Retrieval
- In Proceedings of RIAO, Coupling
, 2004
Cited by 1 (1 self)
Information Retrieval Systems aim at retrieving relevant documents according to the information needs which users express. Most Information Retrieval Systems focus on passage retrieval where the granularity of information retrieved is not the document but a smaller unit such as a sentence or passage. These systems try to better answer the users' needs by giving more importance to the most relevant document parts. This paper addresses the problem of passage retrieval as defined by the TREC novelty track, subtask 1, where the aim is retrieving relevant sentences from relevant documents. We define a new term weighting function that takes non-relevancy information into account and that is based on query evidence only, meaning that it does not need global parameters such as tf.idf term weights. Our method is evaluated on both the 2002 and 2003 TREC novelty collections, where we show that taking into account the narrative part that describes nonrelevant documents is useful, as is emphasising terms from topic titles.

