A compositional context sensitive multi-document summarizer: exploring the factors that influence summarization
- In Proc. of SIGIR
, 2006
Cited by 18 (3 self)
The usual approach for automatic summarization is sentence extraction, where key sentences from the input documents are selected based on a suite of features. While word frequency often is used as a feature in summarization, its impact on system performance has not been isolated. In this paper, we study the contribution to summarization of three factors related to frequency: content word frequency, composition functions for estimating sentence importance from word frequency, and adjustment of frequency weights based on context. We carry out our analysis using datasets from the Document Understanding Conferences, studying not only the impact of these features on automatic summarizers, but also their role in human summarization. Our research shows that a frequency based summarizer can achieve performance comparable to that of state-of-the-art systems, but only with a good composition function; context sensitivity improves performance and significantly reduces repetition.
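The three factors named in this abstract (word frequency, a composition function, and context adjustment) can be illustrated with a minimal sketch. The abstract does not say which composition function or adjustment the authors use, so the choices below are assumptions made for illustration only: the sentence score is the mean probability of its words, and after a sentence is picked the probability of each of its words is squared so that repeated content becomes less attractive. The function name `summarize_by_frequency` and whitespace tokenization are likewise hypothetical.

```python
from collections import Counter


def summarize_by_frequency(sentences, max_sentences=2):
    """Greedy frequency-based sentence extraction (illustrative sketch)."""
    tokenized = [s.lower().split() for s in sentences]
    counts = Counter(w for toks in tokenized for w in toks)
    total = sum(counts.values())
    # Unigram probability of each word over the whole input.
    prob = {w: c / total for w, c in counts.items()}

    chosen = []
    # Empty sentences are skipped so the mean below is always defined.
    remaining = [i for i in range(len(sentences)) if tokenized[i]]
    while remaining and len(chosen) < max_sentences:
        # Composition function (assumed): mean word probability of the sentence.
        best = max(
            remaining,
            key=lambda i: sum(prob[w] for w in tokenized[i]) / len(tokenized[i]),
        )
        chosen.append(best)
        remaining.remove(best)
        # Context adjustment (assumed): squaring shrinks the weight of
        # words that the summary already covers.
        for w in set(tokenized[best]):
            prob[w] = prob[w] ** 2
    # Return the selected sentences in their original order.
    return [sentences[i] for i in sorted(chosen)]
```

With the input `["the cat sat", "the cat sat", "dogs bark loudly"]`, the adjustment step makes the second pick the unrelated third sentence rather than the exact repeat, which is the repetition-reducing effect the abstract attributes to context sensitivity.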
Multi-Document Summarization by Maximizing Informative Content-Words
- In Proceedings of IJCAI-07 (The 20th International Joint Conference on Artificial Intelligence)
, 2007
Cited by 9 (0 self)
We show that a simple procedure based on maximizing the number of informative content-words can produce some of the best reported results for multi-document summarization. We first assign a score to each term in the document cluster, using only frequency and position information, and then we find the set of sentences in the document cluster that maximizes the sum of these scores, subject to length constraints. Our overall results are the best reported on the DUC-2004 summarization task for the ROUGE-1 score, and are the best, but not statistically significantly different from the best system in MSE-2005. Our system is also substantially simpler than the previous best system.
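The two-step procedure in this abstract (score terms, then pick the sentence set with the highest total score under a length limit) can be sketched as follows. This is not the authors' implementation: the abstract gives neither the scoring formula nor the search method, so the sketch assumes a hypothetical score of one point per occurrence plus a `lead_bonus` for occurrences in a document's first sentence, and it uses exhaustive search, which is only practical for a handful of sentences. Each distinct term is counted once per summary, which is one reading of "maximizing the number of informative content-words".

```python
from collections import Counter
from itertools import combinations


def term_scores(documents, lead_bonus=1.0):
    """Score terms by frequency and position (assumed formula).

    documents: list of documents, each a list of tokenized sentences.
    """
    scores = Counter()
    for doc in documents:
        for pos, sentence in enumerate(doc):
            for word in sentence:
                # Every occurrence counts; first-sentence occurrences count extra.
                scores[word] += 1.0 + (lead_bonus if pos == 0 else 0.0)
    return scores


def best_sentence_set(sentences, scores, max_words):
    """Exhaustively find the sentence set with the highest covered-term score."""
    best, best_score = (), 0.0
    for k in range(1, len(sentences) + 1):
        for combo in combinations(range(len(sentences)), k):
            if sum(len(sentences[i]) for i in combo) > max_words:
                continue  # violates the length constraint
            # Distinct terms only, so redundant sentences add nothing.
            covered = set().union(*(sentences[i] for i in combo))
            total = sum(scores[w] for w in covered)
            if total > best_score:
                best, best_score = combo, total
    return list(best), best_score
```

Because the objective is defined over the whole set rather than per sentence, two sentences that share most of their terms score little more together than either does alone.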
Multi-stage compaction approach to broadcast news summarisation
- in Proceedings of Eurospeech 2005
, 2005
Cited by 8 (1 self)
This paper presents a fully automatic, multi-stage compaction approach to broadcast news summarisation, targeting transcripts from automatic speech recognition (ASR) systems. It employs a network of multi-layer perceptrons to remove incorrectly transcribed words based on confidence scores, and to select significant chunks at multiple stages based on tf.idf scores and named entity frequency. The resulting summaries are assessed using a combination of a cross-comprehension test and a fluency test, and are finally compared with an automatic evaluation scheme. The experimental results show the approach can produce summaries with good information content.
Towards holistic summarization – selecting summaries, not sentences
, 2006
Cited by 7 (2 self)
In this paper we present a novel method for automatic text summarization through text extraction, using computational semantics. The new idea is to view all the extracted text as a whole and compute a score for the total impact of the summary, instead of ranking for instance individual sentences. A greedy search strategy is used to search through the space of possible summaries and select the summary with the highest score of those found. The aim has been to construct a summarizer that can be quickly assembled, with the use of only a very few basic language tools, for languages that lack large amounts of structured or annotated data or advanced tools for linguistic processing. The proposed method is largely language independent, though we only evaluate it on English in this paper, using ROUGE scores on texts from, among others, the DUC 2004 task 2. On this task our method performs better than several of the systems evaluated there, but worse than the best systems.
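The idea of scoring a candidate summary as a whole and searching greedily can be sketched as below. The abstract does not describe the semantic representation or the scoring function, so this sketch swaps in a simple stand-in: the score of a summary is the cosine similarity between its bag-of-words vector and that of the full text. The names `cosine` and `greedy_summary` are hypothetical, and the loop stops early when no added sentence improves the whole-summary score.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def greedy_summary(sentences, max_sentences):
    """Greedily grow a summary, always scoring the summary as a whole."""
    tokenized = [s.lower().split() for s in sentences]
    doc_vec = Counter(w for toks in tokenized for w in toks)
    chosen, current = [], 0.0
    while len(chosen) < max_sentences:
        best_i, best_score = None, current
        for i in range(len(sentences)):
            if i in chosen:
                continue
            # Score the entire candidate summary, not sentence i alone.
            candidate = Counter(w for j in chosen + [i] for w in tokenized[j])
            score = cosine(candidate, doc_vec)
            if score > best_score:
                best_i, best_score = i, score
        if best_i is None:
            break  # no remaining sentence improves the summary
        chosen.append(best_i)
        current = best_score
    return [sentences[i] for i in sorted(chosen)], current
```

A greedy search of this kind only explores a small part of the space of possible summaries, so the result is the best summary found rather than a guaranteed optimum, matching the abstract's wording.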
Resource Lean and Portable Automatic Text Summarization
, 2007
Cited by 5 (1 self)
Today, with digitally stored information available in abundance, even for many minor languages, this information must by some means be filtered and extracted in order to avoid drowning in it. Automatic summarization is one such technique, where a computer summarizes a longer text to a shorter non-redundant form. Apart from the major languages of the world there are a lot of languages for which large bodies of data aimed at language technology research are to a high degree lacking. There might also not be resources available to develop such bodies of data, since it is usually time consuming and requires substantial manual labor, hence being expensive. Nevertheless, there will still be a need for automatic text summarization for these languages in order to subdue this constantly increasing amount of electronically produced text. This thesis thus sets the focus on automatic summarization of text and the evaluation of summaries using as few human resources as possible. The resources that are used should, to as high an extent as possible, be already existing, not specifically aimed at summarization or evaluation of summaries and, preferably, created as part of natural literary processes.
Summary in Context: Searching Versus Browsing
Cited by 5 (0 self)
The use of text summaries in information-seeking research has focused on query-based summaries. Extracting content that resembles the query alone, however, ignores the greater context of the document. Such context may be central to the purpose and meaning of the document. We developed a generic, a query-based, and a hybrid summarizer, each with differing amounts of document context. The generic summarizer used a blend of discourse information and information obtained through traditional surface-level analysis. The query-based summarizer used only query-term information, and the hybrid summarizer used some discourse information along with query-term information. The validity of the generic summarizer was shown through an intrinsic evaluation using a well-established corpus of human-generated summaries. All three summarizers were then compared in an information-seeking experiment involving 297 subjects. Results from the information-seeking experiment showed that the generic summaries outperformed all others in the browse tasks, while the query-based and hybrid summaries outperformed the generic summary in the search tasks. Thus, the document context of generic summaries helped users browse, while such context was not helpful in search tasks. Such results are interesting given that generic summaries have not been studied in search tasks and that the majority of Internet search engines rely solely on query-based summaries.
Similarity-based Multilingual Multi-Document Summarization
- IEEE Transactions on Information Theory
, 2005
Cited by 3 (0 self)
We present a new approach for summarizing clusters of documents on the same event, some of which are machine translations of foreign-language documents and some of which are English. Our approach to multilingual multi-document summarization uses text similarity to choose sentences from English documents based on the content of the machine translated documents. A manual evaluation shows that 68% of the sentence replacements improve the summary, and the overall summarization approach outperforms first-sentence extraction baselines in automatic ROUGE-based evaluations.
Paraphrastic sentence compression with a character-based metric: Tightening without deletion
- In Proceedings of ACL, Workshop on Monolingual Text-To-Text Generation
, 2011
Cited by 3 (2 self)
We present a substitution-only approach to sentence compression which “tightens” a sentence by reducing its character length. Replacing phrases with shorter paraphrases yields paraphrastic compressions as short as 60% of the original length. In support of this task, we introduce a novel technique for re-ranking paraphrases extracted from bilingual corpora. At high compression rates, paraphrastic compressions outperform a state-of-the-art deletion model in an oracle experiment. For further compression, deleting from oracle paraphrastic compressions preserves more meaning than deletion alone. In either setting, paraphrastic compression shows promise for surpassing deletion-only methods.
QARLA: A framework for the evaluation of text summarization systems
- in ACL, Ann Arbor
, 2005
Cited by 2 (1 self)
This paper presents a probabilistic framework, QARLA, for the evaluation of text summarisation systems. The input of the framework is a set of manual (reference) summaries, a set of baseline (automatic) summaries and a set of similarity metrics between summaries. It provides i) a measure to evaluate the quality of any set of similarity metrics, ii) a measure to evaluate the quality of a summary using an optimal set of similarity metrics, and iii) a measure to evaluate whether the set of baseline summaries is reliable or may produce biased results. Compared to previous approaches, our framework is able to combine different metrics and evaluate the quality of a set of metrics without any a-priori weighting of their relative importance. We provide quantitative evidence about the effectiveness of the approach to improve the automatic evaluation of text summarisation systems by combining several similarity metrics.
Arabic/English Multi-document Summarization with CLASSY – The Past and the Future
- In A. Gelbukh (Ed.): CICLing 2008, LNCS 4919
, 2008
Cited by 1 (0 self)
Automatic document summarization has become increasingly important due to the quantity of written material generated worldwide. Generating good quality summaries enables users to cope with larger amounts of information. English-document summarization is a difficult task. Yet it is not sufficient. Environmental, economic, and other global issues make it imperative for English speakers to understand how other countries and cultures perceive and react to important events. CLASSY (Clustering, Linguistics, And Statistics for Summarization Yield) is an automatic, extract-generating, summarization system that uses linguistic trimming and statistical methods to generate generic or topic(/query)-driven summaries for single documents or clusters of documents. CLASSY has performed well in the Document Understanding Conference (DUC) evaluations and the Multi-lingual (Arabic/English) Summarization Evaluations (MSE). We present a description of CLASSY. We follow this with experiments and results from the MSE evaluations and conclude with a discussion of on-going work to improve the quality of the summaries, both English-only and multi-lingual, that CLASSY generates.

