Refining Event Extraction Through Cross-document Inference
- Proc
, 2008
Cited by 16 (11 self)
We apply the hypothesis of “One Sense Per Discourse” (Yarowsky, 1995) to information extraction (IE), and extend the scope of “discourse” from one single document to a cluster of topically-related documents. We employ a similar approach to propagate consistent event arguments across sentences and documents. Combining global evidence from related documents with local decisions, we design a simple scheme to conduct cross-document inference for improving the ACE event extraction task. Without using any additional labeled data, this new approach obtained 7.6% higher F-Measure in trigger labeling and 6% higher F-Measure in argument labeling over a state-of-the-art IE system which extracts events independently for each sentence.
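As a rough illustration of combining local decisions with cluster-level evidence, the sketch below lets a low-confidence trigger label be overridden by the label that dominates for the same word across a topically related cluster. The confidence-weighted vote, the `min_conf` threshold, and the function name `propagate_trigger_labels` are all assumptions made for this example; the abstract does not specify the actual inference rules.

```python
from collections import Counter, defaultdict


def propagate_trigger_labels(mentions, min_conf=0.5):
    """Refine local trigger labels using cluster-wide evidence.

    mentions: list of (doc_id, word, label, confidence) tuples produced by a
    sentence-level extractor over one cluster of related documents.
    Returns a list of (doc_id, word, label) in the same order.
    """
    # Accumulate confidence-weighted votes for each word across the cluster.
    votes = defaultdict(Counter)
    for _doc, word, label, conf in mentions:
        votes[word][label] += conf

    refined = []
    for doc, word, label, conf in mentions:
        dominant, _weight = votes[word].most_common(1)[0]
        # Hypothetical rule: only low-confidence local decisions are replaced
        # by the dominant label for that word in the cluster.
        if conf < min_conf:
            label = dominant
        refined.append((doc, word, label))
    return refined


cluster = [
    ("d1", "fire", "Attack", 0.9),
    ("d2", "fire", "End-Position", 0.3),
    ("d3", "fire", "Attack", 0.8),
]
refined = propagate_trigger_labels(cluster)
```

Here the uncertain "End-Position" reading of "fire" in `d2` is replaced by "Attack", the reading the rest of the cluster supports; confident local decisions are left untouched.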
Creating a test collection for citation-based IR experiments
- In Proceedings of HLT-06
, 2006
Cited by 8 (5 self)
We present an approach to building a test collection of research papers. The approach is based on the Cranfield 2 tests but uses as its vehicle a current conference; research questions and relevance judgements of all cited papers are elicited from conference authors. The resultant test collection is different from TREC’s in that it comprises scientific articles rather than newspaper text and, thus, allows for IR experiments that include citation information. The test collection currently consists of 170 queries with relevance judgements; the document collection is the ACL Anthology. We describe properties of our queries and relevance judgements, and demonstrate the use of the test collection in an experimental setup. One potentially problematic property of our collection is that queries have a low number of relevant documents; we discuss ways of alleviating this.
Refining Event Extraction through Unsupervised Cross-document Inference
Cited by 8 (2 self)
We apply the hypothesis of “One Sense Per Discourse” (Yarowsky, 1995) to information extraction (IE), and extend the scope of “discourse” from one single document to a cluster of topically-related documents. We employ a similar approach to propagate consistent event arguments across sentences and documents. Combining global evidence from related documents with local decisions, we design a simple unsupervised learning scheme to conduct cross-document inference for improving the ACE event extraction task. Without using any additional labeled data, this new approach obtained 7.6% higher F-Measure in trigger labeling and 6% higher F-Measure in argument labeling over a state-of-the-art IE system which extracts events independently for each sentence.
Statistical Sentence Extraction for Information Distillation
- in International Conference on Acoustics, Speech, and Signal Processing
, 2007
Cited by 7 (4 self)
Information distillation aims to extract the most useful pieces of information related to a given query from massive, possibly multilingual, audio and textual document sources. One critical component in a distillation engine is detecting sentences to be extracted from each relevant document. In this paper, we present a statistical sentence extraction approach for distillation. Basically, we frame this task as a classification problem, where each candidate sentence in documents is classified as relevant to the query or not. These documents may be in textual or audio format and in a number of languages. For audio documents, we use both manual and automatic transcriptions; for non-English documents, we use automatic translations. In this work, we use AdaBoost, a discriminative classification method, with both lexical and semantic features. The results indicate 11%-13% relative improvement over a baseline keyword-spotting-based approach. We also show the robustness of our method on the audio subset of the document sources using manual and automatic transcriptions. Index Terms — information distillation, information extraction, language understanding, speech processing, natural language processing
Integrating Several Annotation Layers for Statistical Information Distillation
- in Proceedings of the IEEE ASRU Workshop
, 2007
Cited by 6 (4 self)
We present a sentence extraction algorithm for Information Distillation, a task where for a given templated query, relevant passages must be extracted from massive audio and textual document sources. For each sentence of the relevant documents (that are assumed to be known from the upstream stages) we employ statistical classification methods to estimate the extent of its relevance to the query, whereby two aspects of relevance are taken into account: the template (type) of the query and its slots (free-text descriptions of names, organizations, topic, events and so on, around which templates are centered). The idiosyncrasy of the presented method is in the choice of features used for classification. We extract our features from charts, compilations of elements from various annotation levels, such as word transcriptions, syntactic and semantic parses, and Information Extraction annotations. In our experiments we show that this integrated approach outperforms a purely lexical baseline by as much as 30% relative in terms of F-measure. We also investigate the algorithm’s behavior under noisy conditions, by comparing its performance on ASR output and on corresponding manual transcriptions.
The Impact of Document Level Ranking on Focused Retrieval
Cited by 5 (4 self)
Document retrieval techniques have proven to be competitive methods in the evaluation of focused retrieval. Although focused approaches such as XML element retrieval and passage retrieval allow for locating the relevant text within a document, using the larger context of the whole document often leads to superior document level ranking. In this paper we investigate the impact of using the document retrieval ranking in two collections used in the INEX 2008 Ad hoc and Book Tracks: the relatively short documents of the Wikipedia collection and the much longer books in the Book Track collection. We experiment with several methods of combining document and element retrieval approaches. Our findings are that 1) we can get the best of both worlds and improve upon both individual retrieval strategies by retaining the document ranking of the document retrieval approach and replacing the documents by the retrieved elements of the element retrieval approach, and 2) using document level ranking has a positive impact on focused retrieval in Wikipedia, but has more impact on the much longer books in the Book Track collection.
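The combination described in finding 1) can be sketched as follows: keep the order of the document run and substitute each document with the elements the element run retrieved from it. This is a minimal sketch under assumptions; the function name `rerank_elements`, the tuple layout, ordering elements within a document by their own score, and dropping documents with no retrieved elements are choices made for the example, not details given in the abstract.

```python
def rerank_elements(doc_ranking, element_hits):
    """Order retrieved elements by the rank of their containing document.

    doc_ranking: list of document ids, best first (document retrieval run).
    element_hits: list of (doc_id, element_id, score) from an element run.
    Returns a list of (doc_id, element_id) pairs.
    """
    # Group the element run by containing document.
    by_doc = {}
    for doc_id, element_id, score in element_hits:
        by_doc.setdefault(doc_id, []).append((score, element_id))

    result = []
    for doc_id in doc_ranking:
        # Assumption: within a document, elements keep their element-run order
        # (highest score first); documents without elements contribute nothing.
        for _score, element_id in sorted(by_doc.get(doc_id, []), key=lambda hit: -hit[0]):
            result.append((doc_id, element_id))
    return result


docs = ["B", "A", "C"]
hits = [("A", "/a/sec[1]", 0.9), ("B", "/b/p[2]", 0.4), ("B", "/b/p[1]", 0.7)]
merged = rerank_elements(docs, hits)
```

In this toy run the elements of document `B` come first because `B` leads the document ranking, even though an element of `A` has the highest element score.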
Exploiting Information Extraction Annotations for Document Retrieval in Distillation Tasks
- in InterSpeech
, 2007
Cited by 5 (2 self)
Information distillation aims to extract relevant pieces of information related to a given query from massive, possibly multilingual, audio and textual document sources. In this paper, we present our approach for using information extraction annotations to augment document retrieval for distillation. We take advantage of the fact that some of the distillation queries can be associated with annotation elements introduced for the NIST Automatic Content Extraction (ACE) task. We experimentally show that using the ACE events to constrain the document set returned by an information retrieval engine significantly improves the precision at various recall rates for two different query templates. Index Terms: information distillation, information retrieval, information extraction, document retrieval
Using Information Extraction to Improve Cross-lingual Document Retrieval
Cited by 3 (3 self)
We present a filtering mechanism using two cross-lingual information extraction (CLIE) systems for improving document relevance of cross-lingual information retrieval (CLIR) for queries conforming to predefined templates. Experiments on retrieving Chinese documents in response to English GALE arrest queries show that this approach can obtain a 12.7% absolute improvement in relevance (representing a 24.8% relative error reduction) for the top 25 retrieved documents. We also demonstrate that Chinese IE can provide a valuable supplement to English IE to enhance retrieval performance.
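The two reported figures can be related with a little arithmetic. The sketch below assumes that "error" means one minus the relevance rate, so that the relative error reduction equals the absolute gain divided by the baseline error; the implied baseline and improved relevance rates are derived under that assumption and are not stated in the abstract. The function name `implied_relevance` is hypothetical.

```python
def implied_relevance(abs_gain, rel_error_reduction):
    """Back out baseline and improved relevance from the two reported gains.

    Assumes error = 1 - relevance, hence
    rel_error_reduction = abs_gain / baseline_error.
    """
    baseline_error = abs_gain / rel_error_reduction
    baseline_relevance = 1.0 - baseline_error
    improved_relevance = baseline_relevance + abs_gain
    return baseline_relevance, improved_relevance


# Figures from the abstract: 12.7% absolute gain, 24.8% relative error reduction.
baseline, improved = implied_relevance(0.127, 0.248)
```

Under this reading, the baseline relevance for the top 25 documents would be roughly 0.49 and the filtered result roughly 0.61.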

