Using Lexical Chains for Text Summarization
, 1997
Cited by 276 (7 self)
We investigate one technique to produce a summary of an original text without requiring its full semantic interpretation, but instead relying on a model of the topic progression in the text derived from lexical chains. We present a new algorithm to compute lexical chains in a text, merging several robust knowledge sources: the WordNet thesaurus, a part-of-speech tagger and shallow parser for the identification of nominal groups, and a segmentation algorithm derived from (Hearst, 1994). Summarization proceeds in three steps: the original text is first segmented, lexical chains are constructed, strong chains are identified and significant sentences are extracted from the text. We present in this paper empirical results on the identification of strong chains and of significant sentences.
Summarizing Text Documents: Sentence Selection and Evaluation Metrics
- In Research and Development in Information Retrieval
, 1999
Cited by 156 (5 self)
Human-quality text summarization systems are difficult to design, and even more difficult to evaluate, in part because documents can differ along several dimensions, such as length, writing style and lexical usage. Nevertheless, certain cues can often help suggest the selection of sentences for inclusion in a summary. This paper presents our analysis of news-article summaries generated by sentence selection. Sentences are ranked for potential inclusion in the summary using a weighted combination of statistical and linguistic features. The statistical features were adapted from standard IR methods. The potential linguistic ones were derived from an analysis of news-wire summaries. To evaluate these features we use a normalized version of precision-recall curves, with a baseline of random sentence selection, as well as analyze the properties of such a baseline. We illustrate our discussions with empirical results showing the importance of corpus-dependent baseline summarization standards, compression ratios and carefully crafted long queries.
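The ranking and baseline described in this abstract can be illustrated with a minimal sketch. The feature names (`query_overlap`, `lead_position`), the weights, and all function names are hypothetical, chosen only for illustration; the abstract does not list the paper's actual features. The evaluation shown is plain precision and recall, not the normalized precision-recall curves the paper uses.

```python
import random


def score_sentence(features, weights):
    """Weighted linear combination of one sentence's feature values."""
    return sum(weights.get(name, 0.0) * value for name, value in features.items())


def rank_sentences(feature_rows, weights):
    """Return sentence indices ordered from highest to lowest score."""
    return sorted(
        range(len(feature_rows)),
        key=lambda i: score_sentence(feature_rows[i], weights),
        reverse=True,
    )


def random_baseline(n_sentences, k, seed=0):
    """Baseline: choose k sentence indices uniformly at random."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(n_sentences), k))


def precision_recall(selected, relevant):
    """Plain (unnormalized) precision and recall of a selection."""
    hits = len(set(selected) & set(relevant))
    precision = hits / len(selected) if selected else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical feature values for three sentences.
weights = {"query_overlap": 0.7, "lead_position": 0.3}
rows = [
    {"query_overlap": 0.2, "lead_position": 1.0},
    {"query_overlap": 0.9, "lead_position": 0.0},
    {"query_overlap": 0.1, "lead_position": 0.0},
]
ranking = rank_sentences(rows, weights)
```

Comparing `precision_recall(ranking[:k], relevant)` against the same measure for `random_baseline(len(rows), k)` gives a rough sense of how much the features help over chance at a given summary length.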
The Rhetorical Parsing, Summarization, and Generation of Natural Language Texts
, 1997
Cited by 98 (9 self)
This thesis is an inquiry into the nature of the high-level, rhetorical structure of unrestricted natural language texts, computational means to enable its derivation, and two applications (in automatic summarization and natural language generation) that follow from the ability to build such structures automatically. The thesis proposes a first-order formalization of the high-level, rhetorical structure of text. The formalization assumes that text can be sequenced into elementary units; that discourse relations hold between textual units of various sizes; that some textual units are more important to the writer's purpose than others; and that trees are a good approximation of the abstract structure of text. The formalization also introduces a linguistically motivated compositionality criterion, which is shown to hold for the text structures that are valid. The thesis proposes, analyzes theoretically, and compares empirically four algorithms for determining the valid text structures of ...
Enriching very large ontologies using the WWW
- Proceedings of the ECAI 2000 workshop “Ontology Learning”
, 2000
Cited by 83 (4 self)
This paper explores the possibility to exploit text on the world wide web in order to enrich the concepts in existing ontologies. First, a method to retrieve documents from the WWW related to a concept is described. These document collections are used 1) to construct topic signatures (lists of topically related words) for each concept in WordNet, and 2) to build hierarchical clusters of the concepts (the word senses) that lexicalize a given word. The overall goal is to overcome two shortcomings of WordNet: the lack of topical links among concepts, and the proliferation of senses. Topic signatures are validated on a word sense disambiguation task with good results, which are improved when the hierarchical clusters are used. 1 INTRODUCTION Knowledge acquisition is a long-standing problem in both Artificial Intelligence and Computational Linguistics. Semantic and world knowledge acquisition pose a problem with no simple answer. Huge efforts and investments have been made to...
Generic text summarization using relevance measure and latent semantic analysis
- in Proceedings of the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval
, 2001
Cited by 72 (1 self)
In this paper, we propose two generic text summarization methods that create text summaries by ranking and extracting sentences from the original documents. The first method uses standard IR methods to rank sentence relevances, while the second method uses the latent semantic analysis technique to identify semantically important sentences, for summary creations. Both methods strive to select sentences that are highly ranked and different from each other. This is an attempt to create a summary with a wider coverage of the document's main content and less redundancy. Performance evaluations on the two summarization methods are conducted by comparing their summarization outputs with the manual summaries generated by three independent human evaluators. The evaluations also study the influence of different VSM weighting schemes on the text summarization performances. Finally, the causes of the large disparities in the evaluators' manual summarization results are investigated, and discussions on human text summarization patterns are presented.
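The idea of selecting sentences that are both highly ranked and different from each other can be sketched as a greedy loop. This is only a minimal illustration, assuming raw term-frequency vectors, a simple regex tokenizer, and an inner-product relevance score; the function names are hypothetical, and the weighting schemes the paper actually studies are not given in the abstract.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase alphabetic tokens (an assumed, simplistic tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def summarize_by_relevance(sentences, k):
    """Greedily pick up to k sentences that are relevant and non-redundant."""
    sent_vecs = [Counter(tokenize(s)) for s in sentences]
    doc_vec = Counter()
    for vec in sent_vecs:
        doc_vec.update(vec)

    def score(i):
        # Inner product of the sentence vector with the current document vector.
        return sum(count * doc_vec[term] for term, count in sent_vecs[i].items())

    remaining = set(range(len(sentences)))
    chosen = []
    while remaining and len(chosen) < k:
        best = max(sorted(remaining), key=score)
        if score(best) == 0:
            break  # every remaining term is already covered
        chosen.append(best)
        remaining.discard(best)
        # Drop the chosen sentence's terms so near-duplicates score low next round.
        for term in sent_vecs[best]:
            doc_vec.pop(term, None)
    return [sentences[i] for i in sorted(chosen)]


example = ["cats chase mice", "cats chase mice daily", "dogs bark loudly"]
summary = summarize_by_relevance(example, 2)
```

In the example, the first sentence is almost a copy of the second, so once the second is chosen its terms are removed and the unrelated third sentence is picked next rather than the near-duplicate.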
The Automated Acquisition of Topic Signatures for Text Summarization
- Proc. Of the COLING Conference
, 2000
Cited by 69 (7 self)
In order to produce a good summary, one has to identify the most relevant portions of a given text. We describe in this paper a method for automatically training topic signatures (sets of related words, with associated weights, organized around head topics) and illustrate with signatures we created with 6,194 TREC collection texts over 4 selected topics. We describe the possible integration of topic signatures with ontologies and its evaluation on an automated text summarization system. 1 Introduction This paper describes the automated creation of what we call topic signatures, constructs that can play a central role in automated text summarization and information retrieval. Topic signatures can be used to identify the presence of a complex concept: a concept that consists of several related components in fixed relationships. Restaurant-visit, for example, involves at least the concepts menu, eat, pay, and possibly waiter, and Dragon Boat Festival (in Taiwan) involves the concepts calamus...
The Automatic Construction of Large-Scale Corpora for Summarization Research
- University of California, Berkeley
, 1999
Cited by 58 (2 self)
Summarization research is notorious for its lack of adequate corpora: today, there exist only a few small collections of texts whose units have been manually annotated for textual importance. Given the cost and tediousness of the annotation process, it is very unlikely that we will ever manually annotate for textual importance sufficiently large corpora of texts. To circumvent this problem, we have developed an algorithm that constructs such corpora automatically. Our algorithm takes as input an ⟨Abstract, Text⟩ tuple and generates the corresponding Extract, i.e., the set of clauses (sentences) in the Text that were used to write the Abstract. The performance of the algorithm is shown to be close to that of humans by means of an empirical experiment. The experiment also suggests extraction strategies that could improve the performance of automatic summarization systems. 1 Introduction 1.1 Motivation All research on the automatic generation of generic abstracts assumes that the firs...
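The abstract does not describe how the Extract is derived from the ⟨Abstract, Text⟩ tuple, so the following is only one plausible sketch, not the paper's procedure. It assumes bag-of-words cosine similarity and greedy deletion: sentences are dropped one at a time for as long as dropping one makes the remaining text more similar to the abstract. All function names are hypothetical.

```python
import math
import re
from collections import Counter


def bag(text):
    """Bag-of-words vector over lowercase alphabetic tokens."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two Counter vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_extract(abstract, sentences):
    """Greedily delete sentences while similarity to the abstract improves."""
    target = bag(abstract)
    kept = list(range(len(sentences)))

    def similarity(indices):
        return cosine(bag(" ".join(sentences[i] for i in indices)), target)

    current = similarity(kept)
    while len(kept) > 1:
        best_sim, best_drop = current, None
        for i in kept:
            candidate = similarity([j for j in kept if j != i])
            if candidate > best_sim:
                best_sim, best_drop = candidate, i
        if best_drop is None:
            break  # no single deletion improves the match
        kept.remove(best_drop)
        current = best_sim
    return [sentences[i] for i in kept]


text = [
    "Annotation is slow and costly.",
    "The algorithm builds corpora automatically.",
    "We thank our sponsors.",
]
extract = build_extract("the algorithm builds corpora automatically", text)
```

Because a deletion is accepted only when similarity strictly increases, the loop always terminates, and at least one sentence is always kept.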
Multi-Document Summarization By Sentence Extraction
- In Proceedings of the ANLP/NAACL Workshop on Automatic Summarization
, 2000
Cited by 54 (0 self)
This paper discusses a text extraction approach to multidocument summarization that builds on single-document summarization methods by using additional, available information about the document set as a whole and the relationships between the documents. Multi-document summarization differs from single in that the issues of compression, speed, redundancy and passage selection are critical in the formation of useful summaries.
Ultra-Summarization: A Statistical Approach to Generating Highly Condensed Non-Extractive Summaries
- In SIGIR99
, 1999
Cited by 41 (0 self)
Using current extractive summarization techniques, it is impossible to produce a coherent document summary shorter than a single sentence, or to produce a summary that conforms to particular stylistic constraints. Ideally, one would prefer to understand the document, and to generate an appropriate summary directly from the results of that understanding. Absent a comprehensive natural language understanding system, an approximation must be used. This paper presents an alternative statistical model of a summarization process, which jointly applies statistical models of the term selection and term ordering process to produce brief coherent summaries in a style learned from a training corpus. 1 Introduction Summarization is one of the most important capabilities required in writing. Effective summarization, like effective writing, is neither easy nor innate; rather, it is a skill that is developed through instruction and practice [Hidi and Anderson, 1986; Hooper et al., 1994]. Generating...

