Abstract:
When humans produce summaries of documents, they do not simply extract sentences and concatenate them. Rather, they create new sentences that are grammatical, that cohere with one another, and that capture the most salient pieces of information in the original document. Given that large collections of text/abstract pairs are available online, it is now possible to envision algorithms that are trained to mimic this process. In this paper, we focus on sentence compression, a simpler version of this larger challenge. We aim to achieve two goals simultaneously: our compressions should be grammatical, and they should retain the most important pieces of information. These two goals can conflict. We devise both noisy-channel and decision-tree approaches to the problem, and we evaluate results against manual compressions and a simple baseline.

Introduction:
Most of the research in automatic summarization has focused on extraction, i.e., on identifying the most important claus...
Citations
3403 | C4.5: Programs for Machine Learning | Quinlan | 1993
592 | A stochastic parts program and noun phrase parser for unrestricted text | Church | 1988
565 | The mathematics of statistical machine translation: Parameter estimation | Brown, Pietra, et al. | 1993
498 | Statistical Methods for Speech Recognition | Jelinek | 1997
240 | Statistical decision-tree models for parsing | Magerman | 1995
170 | Information retrieval as statistical translation | Berger, J. | 1999
140 | Three generative, lexicalized models for statistical parsing | Collins | 1997
83 | Information fusion in the context of multi-document summarization | Barzilay, McKeown, et al. | 1999
48 | Towards multidocument summarization by reformulation: Progress and prospects | McKeown, Klavans, et al. | 1999
36 | Producing Intelligent Telegraphic Text Reduction to Provide an Audio Scanning Service for the Blind | Grefenstette | 1998
35 | The Decomposition of Human-Written Summary Sentences | Jing, McKeown | 1999
34 | Forest-based statistical sentence generation | Langkilde | 2000
31 | Improving summaries by revising them | Mani, Gates, et al. | 1999
3 | Semi-automatic captioning of TV programs, an Australian perspective | Robert-Ribes, Pfeiffer, et al. | 1999
2 | Closed captioning in America: Looking beyond compliance, in | Linke-Ellis | 1999