A compositional context sensitive multi-document summarizer: exploring the factors that influence summarization
- In Proc. of SIGIR, 2006
Cited by 18 (3 self)
The usual approach for automatic summarization is sentence extraction, where key sentences from the input documents are selected based on a suite of features. While word frequency often is used as a feature in summarization, its impact on system performance has not been isolated. In this paper, we study the contribution to summarization of three factors related to frequency: content word frequency, composition functions for estimating sentence importance from word frequency, and adjustment of frequency weights based on context. We carry out our analysis using datasets from the Document Understanding Conferences, studying not only the impact of these features on automatic summarizers, but also their role in human summarization. Our research shows that a frequency based summarizer can achieve performance comparable to that of state-of-the-art systems, but only with a good composition function; context sensitivity improves performance and significantly reduces repetition.
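The three factors named in this abstract (content word frequency, a composition function, and context adjustment) can be illustrated with a minimal sketch. The abstract does not give the formulas, so everything concrete here is an assumption made for illustration: the function names, the three composition choices (`sum`, `avg`, `prod`), and the context update that squares the probability of words already placed in the summary. Sentences are assumed to be pre-tokenized lists of content words.

```python
import math
from collections import Counter


def word_probs(sentences):
    """Estimate unigram probabilities of content words over the whole input."""
    counts = Counter(w for sent in sentences for w in sent)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def score(sentence, probs, compose="avg"):
    """Compose word probabilities into one sentence score (illustrative choices)."""
    vals = [probs[w] for w in sentence]
    if not vals:
        return 0.0
    if compose == "sum":
        return sum(vals)
    if compose == "prod":
        return math.prod(vals)
    return sum(vals) / len(vals)  # "avg"


def summarize(sentences, max_sentences=2, compose="avg", context=True):
    """Greedily pick sentence indices; optionally down-weight words already used."""
    probs = word_probs(sentences)
    remaining = list(range(len(sentences)))
    chosen = []
    while remaining and len(chosen) < max_sentences:
        best = max(remaining, key=lambda i: score(sentences[i], probs, compose))
        chosen.append(best)
        remaining.remove(best)
        if context:
            # Assumed context adjustment: squaring shrinks the weight of
            # words the summary already covers, which discourages repetition.
            for w in set(sentences[best]):
                probs[w] = probs[w] ** 2
    return chosen


sentences = [
    ["storm", "floods", "coast"],
    ["storm", "floods", "city"],
    ["rescue", "teams", "arrive"],
]
print(summarize(sentences, context=False))
print(summarize(sentences, context=True))
```

In this toy input, switching `context` on changes the second pick from a near-duplicate of the first sentence to the sentence about a different topic, which is the kind of repetition effect the abstract attributes to context sensitivity.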
Multiple alternative sentence compressions for automatic text summarization
- In Proceedings of the 2007 Document Understanding Conference (DUC-2007) at HLT/NAACL 2007, 2007
Cited by 9 (3 self)
We perform multi-document summarization by generating compressed versions of source sentences as summary candidates and using weighted features of these candidates to construct summaries. We combine a parse-and-trim approach with a novel technique for producing multiple alternative compressions for source sentences. In addition, we use a novel method for tuning the feature weights that maximizes the change in the ROUGE-2 score (∆ROUGE) between the already existing summary state and the new state that results from the addition of the candidate under consideration. We also describe experiments using a new paraphrase-based feature for redundancy checking. Finally, we present the results of our DUC2007 submissions and some ideas for future work.
Query Independent Sentence Scoring approach to DUC 2006
- In Document Understanding Conference, 2006
Multi-Document Summarization by Maximizing Informative Content-Words
- In Proceedings of IJCAI-07 (The 20th International Joint Conference on Artificial Intelligence), 2007
Cited by 9 (0 self)
We show that a simple procedure based on maximizing the number of informative content-words can produce some of the best reported results for multi-document summarization. We first assign a score to each term in the document cluster, using only frequency and position information, and then we find the set of sentences in the document cluster that maximizes the sum of these scores, subject to length constraints. Our overall results are the best reported on the DUC-2004 summarization task for the ROUGE-1 score, and are the best, but not statistically significantly different from the best system in MSE-2005. Our system is also substantially simpler than the previous best system.
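The two steps described in this abstract (score each term from frequency and position, then choose sentences that maximize the summed score of the terms they cover within a length limit) can be sketched as follows. This is not the authors' procedure: the abstract does not say how the maximization is solved, so a plain greedy search is swapped in here, and the scoring rule (one point per occurrence plus a hypothetical `position_bonus` for a document's first sentence) is an assumption for illustration only.

```python
from collections import Counter


def term_scores(docs, position_bonus=1.0):
    """Score terms by frequency, with an assumed bonus for first-sentence terms.

    docs: list of documents, each a list of tokenized sentences.
    """
    scores = Counter()
    for doc in docs:
        for idx, sent in enumerate(doc):
            for w in sent:
                scores[w] += 1.0 + (position_bonus if idx == 0 else 0.0)
    return scores


def greedy_select(sentences, scores, max_words):
    """Greedily add the sentence with the largest gain in covered-term score.

    Each distinct term counts once, so repeating covered terms earns nothing.
    Greedy search is an approximation, not a guaranteed optimum.
    """
    covered, chosen, used = set(), [], 0
    while True:
        best, best_gain = None, 0.0
        for i, sent in enumerate(sentences):
            if i in chosen or used + len(sent) > max_words:
                continue
            gain = sum(scores[w] for w in set(sent) - covered)
            if gain > best_gain:
                best, best_gain = i, gain
        if best is None:
            break
        chosen.append(best)
        covered |= set(sentences[best])
        used += len(sentences[best])
    return chosen


docs = [
    [["storm", "hits", "coast"], ["rescue", "teams", "arrive"]],
    [["storm", "hits", "city"], ["teams", "arrive"]],
]
sentences = [sent for doc in docs for sent in doc]
print(greedy_select(sentences, term_scores(docs), max_words=6))
```

Counting each covered term only once is what makes the objective penalize redundancy without a separate similarity check.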
Enhancing Single-document Summarization by Combining RankNet and Third-party Sources
Cited by 6 (1 self)
We present a new approach to automatic summarization based on neural nets, called NetSum. We extract a set of features from each sentence that helps identify its importance in the document. We apply novel features based on news search query logs and Wikipedia entities. Using the RankNet learning algorithm, we train a pair-based sentence ranker to score every sentence in the document and identify the most important sentences. We apply our system to documents gathered from CNN.com, where each document includes highlights and an article. Our system significantly outperforms the standard baseline in the ROUGE-1 measure on over 70% of our document set.
Long-Answer Question Answering and Rhetorical-Semantic Relations
- 2007
Cited by 1 (0 self)
Over the past decade, Question Answering (QA) has generated considerable interest and participation in the fields of Natural Language Processing and Information Retrieval. Conferences such as TREC, CLEF and DUC have examined various aspects of the QA task in the academic community. In the commercial world, major search engines from Google, Microsoft and Yahoo have integrated basic QA capabilities into their core web search. These efforts have focused largely on so-called “factoid” questions seeking a single fact, such as the birthdate of an individual or the capital city of a country. Yet in the past few years, there has been growing recognition of a broad class of “long-answer” questions which cannot be satisfactorily answered in this framework, such as those seeking a definition, explanation, or other descriptive information in response. In this thesis, we consider the problem of answering such questions, with particular focus on the contribution to be made by integrating rhetorical and semantic models. We present DefScriber, a system for answering definitional (“What is X?”), biographical (“Who is X?”) and other long-answer questions using a hybrid of goal- and data-driven methods. Our goal-driven, or top-down, approach is motivated by a set of definitional pred-
Using dependency-based . . .
- 2006
As research in text-to-text paraphrase generation progresses, it has the potential to improve the quality of generated text. However, the use of paraphrase generation methods creates a secondary problem. We must ensure that generated novel sentences are not inconsistent with the text from which they were generated. We propose that a machine learning approach be used to filter out inconsistent novel sentences, or False Paraphrases. To train such a filter, we use the Microsoft Research Paraphrase corpus and investigate whether features based on syntactic dependencies can aid us in this task. Like Finch et al. (2005), we obtain a classification accuracy of 75.6%, the best known performance for this corpus. We also examine the strengths and weaknesses of dependency-based features and conclude that they may be useful in more accurately classifying cases of False Paraphrase.
Capturing Sentence Prior for Query-Based Multi-Document Summarization
In this paper, we consider a real-world information synthesis task: generation of a fixed-length multi-document summary that satisfies a specific information need. This task was mapped to topic-oriented, informative multi-document summarization. We also tried to estimate, given the human-written reference summaries and the document set, the maximum performance (ROUGE-1 scores) that can be achieved by an extraction-based summarization technique. Motivated by the observation that current approaches are far behind the estimated maximum performance, we looked at Information Retrieval techniques to improve the relevance scoring of sentences with respect to the information need. Following an information-theoretic approach, we identified a measure to capture the notion of importance, or prior, of a sentence. Following a different decomposition of the Probability Ranking Principle, the calculated importance/prior is incorporated into the final sentence score by weighted linear combination. To evaluate performance, we explored information sources such as the WWW and encyclopedias for computing the information measure in a set of different experiments. A t-test shows the improvement on the DUC 2005 data set to be significant (p ∼ 0.05). The same system outperformed the rest of the systems at the DUC 2006 challenge in terms of ROUGE scores, with a significant margin over the next best system.

