Statistics-Based Summarization - Step One: Sentence Compression (2000) [52 citations — 2 self]

by Kevin Knight, Daniel Marcu

Abstract:

When humans produce summaries of documents, they do not simply extract sentences and concatenate them. Rather, they create new sentences that are grammatical, that cohere with one another, and that capture the most salient pieces of information in the original document. Given that large collections of text/abstract pairs are available online, it is now possible to envision algorithms that are trained to mimic this process. In this paper, we focus on sentence compression, a simpler version of this larger challenge. We aim to achieve two goals simultaneously: our compressions should be grammatical, and they should retain the most important pieces of information. These two goals can conflict. We devise both noisy-channel and decision-tree approaches to the problem, and we evaluate results against manual compressions and a simple baseline.

Introduction

Most of the research in automatic summarization has focused on extraction, i.e., on identifying the most important claus...
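The noisy-channel approach named in the abstract can be illustrated with a small sketch. This record does not describe the paper's actual models, so everything below is an assumption made for illustration: compression is restricted to deleting words, the source model log P(s) is a toy bigram table, the channel model log P(t | s) charges a per-word cost for each word that would have to be added back to recover the long sentence, and all probabilities are invented. The function names (`source_logprob`, `channel_logprob`, `compress`) are hypothetical.

```python
import math
from itertools import combinations

def source_logprob(short, bigram_logprob, unseen=-8.0):
    """Toy source model log P(s): sum of bigram scores over the padded sentence."""
    padded = ["<s>"] + short + ["</s>"]
    return sum(bigram_logprob.get(pair, unseen) for pair in zip(padded, padded[1:]))

def channel_logprob(dropped, insert_logprob, default=-3.0):
    """Toy channel model log P(t | s): cost of re-inserting each dropped word."""
    return sum(insert_logprob.get(word, default) for word in dropped)

def compress(words, bigram_logprob, insert_logprob):
    """Exhaustively search word-deletion compressions for the best-scoring one."""
    best, best_score = None, -math.inf
    n = len(words)
    for k in range(1, n + 1):
        for kept in combinations(range(n), k):
            short = [words[i] for i in kept]
            dropped = [words[i] for i in range(n) if i not in kept]
            score = (source_logprob(short, bigram_logprob)
                     + channel_logprob(dropped, insert_logprob))
            if score > best_score:
                best, best_score = short, score
    return best

# Invented log-probabilities, for illustration only.
BIGRAMS = {
    ("<s>", "the"): -1.0, ("the", "dog"): -1.0, ("dog", "barked"): -1.0,
    ("barked", "</s>"): -1.0, ("barked", "loudly"): -2.0, ("loudly", "</s>"): -1.0,
}
# Words the channel adds easily (cheap to drop) versus rarely (costly to drop).
INSERTS = {"loudly": -0.5, "the": -4.0, "dog": -6.0, "barked": -6.0}

result = compress(["the", "dog", "barked", "loudly"], BIGRAMS, INSERTS)
print(" ".join(result))
```

With these invented tables the search keeps "the dog barked": dropping "loudly" costs little under the channel model and leaves a fluent string under the source model, while dropping a content word is penalized by both. The two terms correspond to the two goals the abstract describes, grammaticality and retention of important information. The exhaustive search is exponential in sentence length and is only workable for a toy example.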

Citations

3403 C4.5: Programs for Machine Learning – Quinlan - 1993
592 A stochastic parts program and noun phrase parser for unrestricted text – Church - 1988
565 The mathematics of statistical machine translation: Parameter estimation – Brown, Della Pietra, et al. - 1993
498 Statistical Methods for Speech Recognition – Jelinek - 1997
240 Statistical decision-tree models for parsing – Magerman - 1995
170 Information retrieval as statistical translation – Berger, J. - 1999
140 Three generative, lexicalized models for statistical parsing – Collins - 1997
83 Information fusion in the context of multi-document summarization – Barzilay, McKeown, et al. - 1999
48 Towards multidocument summarization by reformulation: Progress and prospects – McKeown, Klavans, et al. - 1999
36 Producing Intelligent Telegraphic Text Reduction to Provide an Audio Scanning Service for the Blind – Grefenstette - 1998
35 The Decomposition of Human-Written Summary Sentences – Jing, McKeown - 1999
34 Forest-based statistical sentence generation – Langkilde - 2000
31 Improving summaries by revising them – Mani, Gates, et al. - 1999
3 Semi-automatic captioning of TV programs, an Australian perspective – Robert-Ribes, Pfeiffer, et al. - 1999
2 Closed captioning in America: Looking beyond compliance – Linke-Ellis - 1999