We investigate one technique to produce a summary of an original text without requiring its full semantic interpretation, relying instead on a model of the topic progression in the text derived from lexical chains. We present a new algorithm to compute lexical chains in a text, merging several robust knowledge sources: the WordNet thesaurus, a part-of-speech tagger and shallow parser for the identification of nominal groups, and a segmentation algorithm derived from (Hearst, 1994). Summarization proceeds in three steps: the original text is first segmented, lexical chains are constructed, strong chains are identified and significant sentences are extracted from the text. We present in this paper empirical results on the identification of strong chains and of significant sentences.
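The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the `TOY_CONCEPTS` dictionary is a hypothetical stand-in for WordNet, the segmentation step is omitted, the "strong chain" test is an assumed length threshold (`min_length`), and picking the first sentence that mentions each strong chain is only one plausible extraction heuristic. All function names are illustrative.

```python
from collections import defaultdict

# Hypothetical toy thesaurus standing in for WordNet: word -> concept label.
TOY_CONCEPTS = {
    "car": "vehicle", "truck": "vehicle", "vehicle": "vehicle",
    "engine": "machine", "motor": "machine", "machine": "machine",
}


def build_chains(sentences):
    """Group word occurrences that share a concept into one chain.

    Returns a dict mapping concept -> list of (sentence_index, word).
    """
    chains = defaultdict(list)
    for idx, sentence in enumerate(sentences):
        for token in sentence.lower().split():
            word = token.strip(".,;:!?")
            concept = TOY_CONCEPTS.get(word)
            if concept is not None:
                chains[concept].append((idx, word))
    return dict(chains)


def strong_chains(chains, min_length=3):
    """Keep chains with at least min_length members (an assumed criterion)."""
    return {c: m for c, m in chains.items() if len(m) >= min_length}


def extract_summary(sentences, chains):
    """Return, in document order, the first sentence touched by each chain."""
    picked = sorted({members[0][0] for members in chains.values()})
    return [sentences[i] for i in picked]


sentences = [
    "The car would not start.",
    "It rained all day.",
    "A truck towed the vehicle away.",
    "The engine was replaced.",
]
chains = build_chains(sentences)
summary = extract_summary(sentences, strong_chains(chains))
print(summary)
```

With this toy input only the "vehicle" chain reaches the assumed threshold, so the extract consists of the first sentence in which that chain appears. A real system would replace the dictionary lookup with thesaurus relations and word-sense choices, which this sketch does not attempt.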
|
1072
|
Introduction to WordNet: An On-line Lexical Database
– Miller, Beckwith, et al.
- 1990
|
|
491
|
Rhetorical Structure Theory: Toward a functional theory of text organization
– Mann, Thompson
- 1988
|
|
340
|
The automatic creation of literature abstracts
– Luhn
- 1958
|
|
258
|
A trainable document summarizer
– Kupiec, Pedersen, et al.
- 1995
|
|
219
|
Multi-paragraph segmentation of expository text
– Hearst
- 1994
|
|
218
|
Lexical cohesion computed by thesaural relations as an indicator of the structure of text
– Morris, Hirst
- 1991
|
|
161
|
A Simple Rule-Based Part-Of-Speech Tagger
– Brill
- 1992
|
|
145
|
Constructing literature abstracts by computer: techniques and prospects
– Paice
- 1990
|
|
132
|
Lexical chains as representations of context for the detection and correction of malapropisms. In Christiane Fellbaum, editor, WordNet: An Electronic Lexical Database
– Hirst, St-Onge
- 1998
|
|
101
|
Rhetorical structure theory: Description and construction of text structures
– Mann, Thompson
- 1987
|
|
97
|
From discourse structures to text summaries
– Marcu
- 1997
|
|
87
|
Automated Text Summarization in SUMMARIST
– Hovy, Lin
- 1999
|
|
80
|
Automatic text structuring and summarization
– Salton, Singhal, et al.
- 1997
|
|
80
|
Estimating upper and lower bounds on the performance of word-sense disambiguation programs
– Gale, Church, et al.
- 1992
|
|
73
|
Generating summaries of multiple news articles
– McKeown, Klavans, et al.
- 1995
|
|
69
|
Sentence Extraction as a Classification Task
– Teufel, Moens
- 1997
|
|
67
|
Summarization Evaluation Methods: Experiments and Analysis
– Jing, Barzilay, et al.
- 1998
|
|
47
|
Intention-based segmentation: Human reliability and correlation with linguistic cues
– Passonneau, Litman
- 1993
|
|
38
|
Patterns of Lexis in text
– Hoey
- 1991
|
|
33
|
Abstract Generation based on Rhetorical Structure Extraction
– Ono, Sumita, et al.
- 1994
|
|
24
|
What Might Be in a Summary
– Jones
- 1993
|
|
22
|
Salience-based content characterisation of text documents
– Boguraev, Kennedy
- 1997
|
|
19
|
Towards the automatic recognition of anaphoric features in English text: the impersonal pronoun ’it’. Computer Speech and Language
– Paice, Husk
- 1987
|
|
17
|
A Computational Analysis of Lexical Cohesion with Applications in Information Retrieval
– Stairmand
- 1996
|
|
16
|
The Formation of Abstracts by the Selection of Sentences: Part 1: Sentence Selection by Men and
– Rath, Resnick, et al.
- 1961
|
|
7
|
Lexical Chains for Summarization
– Barzilay
- 1997
|
|
7
|
New methods in automatic extracting
– Edmundson
- 1969
|
|
5
|
An Overview of the Third Text Retrieval Conference
– Harman
- 1994
|
|
2
|
Coherence and coreference
– Hobbs
- 1978
|
|
2
|
Parsing, linguistic resources and semantic analysis, for abstracting and categorization
– Black
- 1994
|