Results 1-10 of 216
Summarizing Scientific Articles - Experiments with Relevance and Rhetorical Status
Computational Linguistics, 2002
Cited by 103 (2 self)
In this paper we argue that scientific articles require a different summarization strategy than, for instance, news articles. We propose a strategy which concentrates on the rhetorical status of statements in the article: Material for summaries is selected in such a way that summaries can highlight the new contribution of the source paper and situate it with respect to earlier work. We provide a gold standard for summaries of this kind consisting of a substantial corpus of conference articles in computational linguistics with human judgements of rhetorical status and relevance. We present several experiments measuring our judges' agreement on these annotations. We also present an algorithm which, on the basis of the annotated training material, selects content and classifies it into a fixed set of seven rhetorical categories. The output of this extraction and classification system can be viewed as a single-document summary in its own right; alternatively, it can be used to generate task-oriented and user-tailored summaries designed to give users an overview of a scientific field.
The Rhetorical Parsing, Summarization, and Generation of Natural Language Texts
1997
Cited by 98 (9 self)
This thesis is an inquiry into the nature of the high-level, rhetorical structure of unrestricted natural language texts, computational means to enable its derivation, and two applications (in automatic summarization and natural language generation) that follow from the ability to build such structures automatically. The thesis proposes a first-order formalization of the high-level, rhetorical structure of text. The formalization assumes that text can be sequenced into elementary units; that discourse relations hold between textual units of various sizes; that some textual units are more important to the writer's purpose than others; and that trees are a good approximation of the abstract structure of text. The formalization also introduces a linguistically motivated compositionality criterion, which is shown to hold for the text structures that are valid. The thesis proposes, analyzes theoretically, and compares empirically four algorithms for determining the valid text structures of ...
A Critique and Improvement of an Evaluation Metric for Text Segmentation
Computational Linguistics, 2002
Building a Discourse-Tagged Corpus in the Framework of Rhetorical Structure Theory
Current Directions in Discourse and Dialogue, 2001
Cited by 71 (2 self)
We describe our experience in developing a discourse-annotated corpus for community-wide use. Working in ...
The Automated Acquisition of Topic Signatures for Text Summarization
Proc. of the COLING Conference, 2000
Cited by 69 (7 self)
In order to produce a good summary, one has to identify the most relevant portions of a given text. We describe in this paper a method for automatically training topic signatures (sets of related words, with associated weights, organized around head topics) and illustrate with signatures we created with 6,194 TREC collection texts over 4 selected topics. We describe the possible integration of topic signatures with ontologies and its evaluation on an automated text summarization system.
1 Introduction
This paper describes the automated creation of what we call topic signatures, constructs that can play a central role in automated text summarization and information retrieval. Topic signatures can be used to identify the presence of a complex concept: a concept that consists of several related components in fixed relationships. Restaurant-visit, for example, involves at least the concepts menu, eat, pay, and possibly waiter, and Dragon Boat Festival (in Taiwan) involves the concepts calamus...
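The abstract does not say how the weights in a topic signature are computed. As a minimal sketch, assume each term is weighted by Dunning's log-likelihood ratio (-2 log λ), which compares the term's frequency in topic-relevant texts against a background collection, and assume plain whitespace tokenization. The names `llr` and `topic_signature` are hypothetical, not taken from the paper.

```python
import math
from collections import Counter
from typing import Iterable, List, Tuple


def _log_l(k: int, n: int, p: float) -> float:
    """Binomial log-likelihood k*log(p) + (n-k)*log(1-p), with 0*log(0) = 0."""
    total = 0.0
    if k > 0:
        total += k * math.log(p)
    if n - k > 0:
        total += (n - k) * math.log(1 - p)
    return total


def llr(k1: int, n1: int, k2: int, n2: int) -> float:
    """Dunning's -2 log(lambda): the term occurs k1 times among n1 relevant
    tokens and k2 times among n2 background tokens."""
    p1, p2 = k1 / n1, k2 / n2
    p = (k1 + k2) / (n1 + n2)  # pooled rate under the "no difference" hypothesis
    return 2 * (_log_l(k1, n1, p1) + _log_l(k2, n2, p2)
                - _log_l(k1, n1, p) - _log_l(k2, n2, p))


def topic_signature(relevant: Iterable[str], background: Iterable[str],
                    top_n: int = 10) -> List[Tuple[str, float]]:
    """Return the top_n (term, weight) pairs over-represented in `relevant`."""
    rel = Counter(w for doc in relevant for w in doc.lower().split())
    bg = Counter(w for doc in background for w in doc.lower().split())
    n1, n2 = sum(rel.values()), sum(bg.values())
    # The ratio is symmetric, so keep only terms relatively more frequent in `relevant`.
    scored = [(t, llr(rel[t], n1, bg[t], n2))
              for t in rel if rel[t] / n1 > bg[t] / n2]
    scored.sort(key=lambda tw: tw[1], reverse=True)
    return scored[:top_n]


relevant = ["menu eat pay waiter", "menu eat pay"]
background = ["stock market price", "market price rise"]
print(topic_signature(relevant, background, top_n=3))
```

A real signature builder would also need stemming, stopword handling, and a significance cutoff on the weight rather than a fixed `top_n`.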
The Automatic Construction of Large-Scale Corpora for Summarization Research
University of California, Berkeley, 1999
Cited by 58 (2 self)
Summarization research is notorious for its lack of adequate corpora: today, there exist only a few small collections of texts whose units have been manually annotated for textual importance. Given the cost and tediousness of the annotation process, it is very unlikely that we will ever manually annotate sufficiently large corpora of texts for textual importance. To circumvent this problem, we have developed an algorithm that constructs such corpora automatically. Our algorithm takes as input an ⟨Abstract, Text⟩ tuple and generates the corresponding Extract, i.e., the set of clauses (sentences) in the Text that were used to write the Abstract. The performance of the algorithm is shown to be close to that of humans by means of an empirical experiment. The experiment also suggests extraction strategies that could improve the performance of automatic summarization systems.
1 Introduction
1.1 Motivation
All research on the automatic generation of generic abstracts assumes that the firs...
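The abstract does not give the algorithm's similarity measure or search strategy. The following is a hypothetical greedy sketch, assuming clauses are already segmented and that similarity is bag-of-words cosine: starting from the whole Text, it repeatedly drops the clause whose removal most increases similarity to the Abstract, and stops when no single removal helps. `build_extract` is an illustrative name, not the paper's code.

```python
import math
from collections import Counter
from typing import List


def _bag(texts: List[str]) -> Counter:
    """Bag-of-words counts over a list of clauses (whitespace tokens)."""
    return Counter(w for t in texts for w in t.lower().split())


def _cosine(a: Counter, b: Counter) -> float:
    dot = sum(v * b[t] for t, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def build_extract(abstract: str, clauses: List[str]) -> List[int]:
    """Greedily delete clauses while deletion raises similarity to the abstract.

    Returns the indices of the clauses that remain (the Extract).
    """
    target = _bag([abstract])
    kept = list(range(len(clauses)))
    current = _cosine(_bag([clauses[i] for i in kept]), target)
    while len(kept) > 1:
        best_sim, best_drop = current, None
        for i in kept:
            rest = [clauses[j] for j in kept if j != i]
            sim = _cosine(_bag(rest), target)
            if sim > best_sim:
                best_sim, best_drop = sim, i
        if best_drop is None:  # no single deletion improves similarity
            break
        kept.remove(best_drop)
        current = best_sim
    return kept


clauses = ["the algorithm builds extracts automatically",
           "weather was nice today",
           "we thank the reviewers"]
print(build_extract("the algorithm builds extracts automatically", clauses))  # [0]
```

Greedy deletion can stop at a local optimum; it is used here only because it is short and easy to follow.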
Multi-Document Summarization By Sentence Extraction
In Proceedings of the ANLP/NAACL Workshop on Automatic Summarization, 2000
Cited by 54 (0 self)
This paper discusses a text extraction approach to multi-document summarization that builds on single-document summarization methods by using additional, available information about the document set as a whole and the relationships between the documents. Multi-document summarization differs from single-document summarization in that the issues of compression, speed, redundancy and passage selection are critical in the formation of useful summaries.
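Redundancy is one of the issues the abstract singles out. One standard way to handle it in extraction is Maximal Marginal Relevance (MMR), which scores each candidate by its relevance minus its similarity to sentences already chosen. The sketch below is a generic MMR selector, not necessarily this paper's method; it assumes relevance is cosine similarity to the centroid of all pooled sentences, whitespace tokenization, and a hypothetical trade-off parameter `lam`.

```python
import math
from collections import Counter
from typing import List


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(v * b[t] for t, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def mmr_select(sentences: List[str], k: int, lam: float = 0.5) -> List[int]:
    """Greedy MMR over sentences pooled from every document in the set.

    Relevance is similarity to the sum of all sentence vectors (a centroid up
    to scale, which cosine ignores); redundancy is the highest similarity to
    any sentence already selected. Returns indices in selection order.
    """
    vecs = [Counter(s.lower().split()) for s in sentences]
    centroid = sum(vecs, Counter())
    selected: List[int] = []
    remaining = list(range(len(sentences)))

    def mmr_score(i: int) -> float:
        relevance = cosine(vecs[i], centroid)
        redundancy = max((cosine(vecs[i], vecs[j]) for j in selected), default=0.0)
        return lam * relevance - (1 - lam) * redundancy

    while remaining and len(selected) < k:
        best = max(remaining, key=mmr_score)  # ties go to the earliest index
        selected.append(best)
        remaining.remove(best)
    return selected


docs = ["the cat sat on the mat", "the cat sat on the mat", "a dog barked loudly"]
print(mmr_select(docs, 2, lam=1.0))  # [0, 1]
print(mmr_select(docs, 2, lam=0.5))  # [0, 2]
```

With `lam=1.0` the selector ignores redundancy and picks the duplicate sentence; lowering `lam` makes it prefer a sentence that adds new content.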
Inter-Coder Agreement for Computational Linguistics
Computational Linguistics, 2008
Cited by 54 (1 self)
This article is a survey of methods for measuring agreement among corpus annotators. It exposes the mathematics and underlying assumptions of agreement coefficients, covering Krippendorff’s alpha as well as Scott’s pi and Cohen’s kappa; discusses the use of coefficients in several annotation tasks; and argues that weighted, alpha-like coefficients, traditionally less used than kappa-like measures in Computational Linguistics, may be more appropriate for many corpus annotation tasks – but that their use makes the interpretation of the value of the coefficient even harder.
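The coefficients the survey compares differ mainly in how they estimate chance agreement. A minimal sketch for two coders, nominal labels, and no missing data: with observed agreement A_o and expected agreement A_e, both coefficients are (A_o - A_e) / (1 - A_e), but Cohen's kappa takes A_e from each coder's own label distribution, while Scott's pi pools both coders' labels. Krippendorff's alpha and weighted variants are not shown, and the function names are illustrative.

```python
from collections import Counter
from typing import Hashable, Sequence


def observed_agreement(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """A_o: fraction of items on which the two coders chose the same label."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohen_kappa(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Cohen's kappa: A_e from each coder's own label distribution."""
    n = len(a)
    ca, cb = Counter(a), Counter(b)
    a_e = sum((ca[k] / n) * (cb[k] / n) for k in set(ca) | set(cb))
    return (observed_agreement(a, b) - a_e) / (1 - a_e)


def scott_pi(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Scott's pi: A_e from the labels of both coders pooled together."""
    n = len(a)
    pooled = Counter(a) + Counter(b)
    a_e = sum((c / (2 * n)) ** 2 for c in pooled.values())
    return (observed_agreement(a, b) - a_e) / (1 - a_e)


coder_1 = ["A", "A", "B", "B"]
coder_2 = ["A", "B", "B", "B"]
print(cohen_kappa(coder_1, coder_2))         # 0.5
print(round(scott_pi(coder_1, coder_2), 3))  # 0.467
```

The two values diverge whenever the coders' label distributions differ, as in this example; both are undefined when A_e = 1, for instance when every item receives the same single label from both coders.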
Advances in automatic meeting record creation and access
In Proc. IEEE ICASSP, 2001
Cited by 52 (6 self)
Oral communication is transient, but many important decisions, social contracts and fact findings are first carried out in an oral setup, documented in written form and later retrieved. At Carnegie Mellon University's Interactive Systems Laboratories we have been experimenting with the documentation of meetings. This paper summarizes part of the progress that we have made in this test bed, specifically on the questions of automatic transcription using LVCSR, information access using non-keyword-based methods, summarization and user interfaces. The system is capable of automatically constructing a searchable and browsable audiovisual database of meetings and providing access to these records.
Discourse Trees Are Good Indicators of Importance in Text
Advances in Automatic Text Summarization, 1999
Cited by 50 (6 self)
Researchers in computational linguistics have long speculated that the nuclei of the rhetorical structure tree of a text form an adequate "summary" of the text for which that tree was built. However, to my knowledge, there has been no experiment to confirm how valid this speculation really is. In this paper, I describe a psycholinguistic experiment that shows that the concepts of discourse structure and nuclearity can be used effectively in text summarization. More precisely, I show that there is a strong correlation between the nuclei of the discourse structure of a text and what readers perceive to be the most important units in that text. In addition, I propose and evaluate the quality of an automatic, discourse-based summarization system that implements the methods that were validated by the psycholinguistic experiment. The evaluation indicates that although the system does not yet match the results that would be obtained if discourse trees had been built manuall...
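The abstract does not spell out how the summarizer scores units. A simplified sketch, assuming the promotion-set idea from Marcu's discourse work: each internal node promotes the units of its nucleus children, and a unit scores higher the closer to the root it is promoted. Parenthetical units, relation types, and tree construction are all ignored, and every name here (`Node`, `promotion`, `score`, `rank_units`) is hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Node:
    """Hypothetical RST-style tree node: a leaf holds a unit id, an internal
    node holds children; `nucleus` says whether this node is a nucleus of
    its parent's relation."""
    unit: Optional[int] = None
    children: List["Node"] = field(default_factory=list)
    nucleus: bool = True


def promotion(node: Node) -> Set[int]:
    """A leaf promotes itself; an internal node promotes its nuclei's units."""
    if node.unit is not None:
        return {node.unit}
    return set().union(*(promotion(c) for c in node.children if c.nucleus))


def depth(node: Node) -> int:
    """Number of levels in the tree (a lone leaf has depth 1)."""
    return 1 + max((depth(c) for c in node.children), default=0)


def score(unit: int, node: Node, d: int) -> int:
    """d if `unit` is promoted to `node`, otherwise the best score found one
    level down (d - 1); 0 if the unit is never promoted below this node."""
    if unit in promotion(node):
        return d
    return max((score(unit, c, d - 1) for c in node.children), default=0)


def rank_units(root: Node, units: List[int]) -> List[int]:
    """Order units from most to least important (ties keep input order)."""
    d = depth(root)
    return sorted(units, key=lambda u: score(u, root, d), reverse=True)


# Units 1 (nucleus) and 2 (satellite) form the root's nucleus;
# units 3 (nucleus) and 4 (satellite) form the root's satellite.
root = Node(children=[
    Node(children=[Node(unit=1), Node(unit=2, nucleus=False)]),
    Node(children=[Node(unit=3), Node(unit=4, nucleus=False)], nucleus=False),
])
print(rank_units(root, [1, 2, 3, 4]))  # [1, 3, 2, 4]
```

Unit 3 outranks unit 2 because it is promoted to a subtree one level below the root, whereas unit 2 is a satellite that is promoted only to its own leaf.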

