Results 1 - 10 of 27
Multi-document summarization via sentence-level semantic analysis and symmetric matrix factorization
- In SIGIR ’08: Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval, 2008
Cited by 12 (4 self)
Multi-document summarization aims to create a compressed summary while retaining the main characteristics of the original set of documents. Many approaches use statistics and machine learning techniques to extract sentences from documents. In this paper, we propose a new multi-document summarization framework based on sentence-level semantic analysis and symmetric non-negative matrix factorization. We first calculate sentence-sentence similarities using semantic analysis and construct the similarity matrix. Then symmetric matrix factorization, which has been shown to be equivalent to normalized spectral clustering, is used to group sentences into clusters. Finally, the most informative sentences are selected from each group to form the summary. Experimental results on DUC2005 and DUC2006 data sets demonstrate the improvement of our proposed framework over the implemented existing summarization systems. A further study on the factors that benefit the high performance is also conducted.
Towards an Iterative Reinforcement Approach for Simultaneous Document Summarization and Keyword Extraction
Cited by 9 (1 self)
Though both document summarization and keyword extraction aim to extract concise representations from documents, these two tasks have usually been investigated independently. This paper proposes a novel iterative reinforcement approach to simultaneously
CollabSum: Exploiting Multiple Document Clustering for Collaborative Single Document Summarizations
- In Proc. of ACM SIGIR, 2007
Cited by 4 (0 self)
Almost all existing methods conduct the summarization tasks for single documents separately without interactions for each document under the assumption that the documents are considered independent of each other. This paper proposes a novel framework called CollabSum for collaborative single document summarizations by making use of mutual influences of multiple documents within a cluster context. In this study, CollabSum is implemented by first employing the clustering algorithm to obtain appropriate document clusters and then exploiting the graph-ranking based algorithm for collaborative document summarizations within each cluster. Both the within-document and cross-document relationships between sentences are incorporated in the algorithm. Experiments on the DUC2001 and DUC2002 datasets demonstrate the encouraging performance of the proposed approach. Different clustering algorithms have been investigated and we find that the summarization performance relies positively on the quality of the document clusters. Categories and Subject Descriptors: H.3.1 [Information Storage and Retrieval]: Content Analysis
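The graph-ranking step mentioned in this abstract can be illustrated with a generic PageRank-style iteration over a sentence affinity graph. This is a minimal sketch, not the CollabSum algorithm itself: the function name `rank_sentences`, the damping value, and the idea of passing one combined affinity matrix (within-document and cross-document links already merged) are assumptions made for illustration.

```python
from typing import List


def rank_sentences(weights: List[List[float]],
                   damping: float = 0.85,
                   iterations: int = 50) -> List[float]:
    """Score sentences by a PageRank-style walk over an affinity matrix.

    weights[j][i] is the (hypothetical) affinity between sentences j and i;
    it may combine within-document and cross-document links.
    """
    n = len(weights)
    if n == 0:
        return []
    scores = [1.0 / n] * n
    # Row sums are used to normalise each sentence's outgoing affinity.
    row_sums = [sum(row) for row in weights]
    for _ in range(iterations):
        updated = []
        for i in range(n):
            incoming = sum(
                scores[j] * weights[j][i] / row_sums[j]
                for j in range(n)
                if row_sums[j] > 0
            )
            updated.append((1.0 - damping) / n + damping * incoming)
        scores = updated
    return scores
```

Sentences with the highest scores would then be taken as summary candidates; how the two kinds of links are weighted against each other is left open here.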
Multi-topic based query-oriented summarization
- SIAM International Conference Data Mining, 2009
Cited by 4 (0 self)
Query-oriented summarization aims at extracting an informative summary from a document collection for a given query. It is very useful to help users grasp the main information related to a query. Existing work can be mainly classified into two categories: supervised methods and unsupervised methods. The former require training examples, which limits them to predefined domains. The latter usually use clustering algorithms to find ‘centered’ sentences as the summary. However, they do not consider the query information, so the summary is a general one about the document collection itself. Moreover, most existing work assumes that the documents related to the query talk about only one topic. Unfortunately, statistics show that a large portion of summarization tasks talk about multiple topics. In this paper, we try to break the limitations of the existing methods and study a new setup of the problem of multi-topic based query-oriented summarization. We propose using a probabilistic approach to solve this problem. More specifically, we propose two strategies to incorporate the query information into a probabilistic model. Experimental results on two different genres of data show that our proposed approach can effectively extract a multi-topic summary from a document collection and that the summarization performance is better than that of baseline methods. The approach is quite general and can be applied to many other mining tasks, for example product opinion analysis and question answering.
Multi-Document Summarization using Sentence-based Topic Models
Cited by 3 (0 self)
Most of the existing multi-document summarization methods decompose the documents into sentences and work directly in the sentence space using a term-sentence matrix. However, the knowledge on the document side, i.e. the topics embedded in the documents, can help the context understanding and guide the sentence selection in the summarization procedure. In this paper, we propose a new Bayesian sentence-based topic model for summarization by making use of both the term-document and term-sentence associations. An efficient variational Bayesian algorithm is derived for model parameter estimation. Experimental results on benchmark data sets show the effectiveness of the proposed model for the multi-document summarization task.
Sentence Position revisited: A robust light-weight Update Summarization ‘baseline’ Algorithm
Cited by 3 (1 self)
In this paper, we describe a sentence-position-based summarizer built on a sentence position policy, created from the evaluation testbed of recent summarization tasks at Document Understanding Conferences (DUC). We show that the summarizer thus built is able to outperform most systems participating in task-focused summarization evaluations at Text Analysis Conferences (TAC) 2008. Our experiments also show that such a method would perform better at producing short summaries (up to 100 words) than longer summaries. Further, we discuss the baselines traditionally used for summarization evaluation and suggest the revival of an old baseline to suit the current summarization task at TAC: the Update Summarization task.
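To make the idea of a position-based baseline concrete, the sketch below fills a word budget with the earliest sentences of each document, visiting documents in round-robin order. The fixed "earlier is better" rule, the function name `position_baseline`, and the whitespace word count are assumptions for illustration; the policy in the paper is derived from DUC evaluation data and may rank positions differently.

```python
from typing import List


def position_baseline(documents: List[List[str]], max_words: int = 100) -> List[str]:
    """Build a summary from leading sentences under a word budget.

    documents: each document is a list of sentences in their original order.
    Sentences are taken position by position (all first sentences, then all
    second sentences, and so on) until the next sentence would exceed max_words.
    """
    summary: List[str] = []
    used = 0
    depth = max((len(doc) for doc in documents), default=0)
    for position in range(depth):
        for doc in documents:
            if position >= len(doc):
                continue
            sentence = doc[position]
            length = len(sentence.split())
            if used + length > max_words:
                # Stop at the first sentence that does not fit the budget.
                return summary
            summary.append(sentence)
            used += length
    return summary
```

The default budget of 100 words mirrors the short-summary setting the abstract mentions.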
Text summarization model based on maximum coverage problem and its variant
- In: Proc. Conf. of the European Chapter of the ACL, 2009
Cited by 3 (1 self)
We discuss text summarization in terms of maximum coverage problem and its variant. We explore some decoding algorithms including the ones never used in this summarization formulation, such as a greedy algorithm with performance guarantee, a randomized algorithm, and a branch-and-bound method. On the basis of the results of comparative experiments, we also augment the summarization model so that it takes into account the relevance to the document cluster. Through experiments, we showed that the augmented model is superior to the best-performing method of DUC’04 on ROUGE-1 without stopwords.
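The maximum coverage view of summarization can be sketched with a plain cost-scaled greedy decoder: repeatedly pick the sentence that covers the most not-yet-covered words per word of length, while staying within a length budget. This is an illustrative sketch only. The name `greedy_max_coverage`, the use of lowercase whitespace tokens as the units to cover, and the unweighted coverage objective are assumptions, and this simple greedy is not necessarily the variant with a performance guarantee that the paper evaluates.

```python
from typing import List, Optional, Set


def greedy_max_coverage(sentences: List[str], budget_words: int) -> List[int]:
    """Return indices of sentences chosen greedily by new words covered per word."""
    word_sets: List[Set[str]] = [set(s.lower().split()) for s in sentences]
    lengths = [len(s.split()) for s in sentences]
    covered: Set[str] = set()
    chosen: List[int] = []
    used = 0
    remaining = set(range(len(sentences)))
    while remaining:
        best: Optional[int] = None
        best_gain = 0.0
        for i in sorted(remaining):
            # Skip empty sentences and sentences that would break the budget.
            if lengths[i] == 0 or used + lengths[i] > budget_words:
                continue
            gain = len(word_sets[i] - covered) / lengths[i]
            if gain > best_gain:
                best, best_gain = i, gain
        if best is None:
            # Nothing fits, or nothing adds new coverage.
            break
        chosen.append(best)
        covered |= word_sets[best]
        used += lengths[best]
        remaining.discard(best)
    return chosen
```

Ties are broken in favour of the earlier sentence, which is a design choice of this sketch rather than something stated in the abstract.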
Graph-Based Multi-Modality Learning for Topic-Focused Multi-Document Summarization
Cited by 2 (1 self)
Graph-based manifold-ranking methods have been successfully applied to topic-focused multi-document summarization. This paper further proposes to use the multi-modality manifold-ranking algorithm for extracting topic-focused summary from multiple documents by considering the within-document sentence relationships and the cross-document sentence relationships as two separate modalities (graphs). Three different fusion schemes, namely linear form, sequential form and score combination form, are exploited in the algorithm. Experimental results on the DUC benchmark datasets demonstrate the effectiveness of the proposed multi-modality learning algorithms with all the three fusion schemes.
Multi-Document Summarization by Information Distance
Cited by 2 (1 self)
We are now living in a world where information is growing and updating quickly. Knowledge can be acquired more efficiently with the help of automatic document summarization and updating techniques. This paper describes a novel approach for multi-document update summarization. The best summary is defined as the one that has the minimal information distance to the entire document set, and the best update summary as the one that has the minimal conditional information distance to a document cluster given that a prior document cluster has already been read. We propose two methods to approximate information distance between two documents, one by compression and the other by the coding theory. Experiments on the DUC 2007 dataset and the TAC 2008 dataset have proved that our method closely correlates with the human-written summaries and outperforms LexRank in many categories under the ROUGE evaluation criterion.
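A compression-based approximation of information distance can be illustrated with the normalized compression distance (NCD), computed here with zlib. This is a sketch under assumptions: the abstract does not say which compressor or normalization the authors use, and the helper names `compressed_size`, `ncd`, and `closest_candidate` are hypothetical.

```python
import zlib
from typing import List


def compressed_size(text: str) -> int:
    """Length in bytes of the zlib-compressed UTF-8 encoding of text."""
    return len(zlib.compress(text.encode("utf-8"), 9))


def ncd(x: str, y: str) -> float:
    """Normalized compression distance; smaller means more shared information."""
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + " " + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


def closest_candidate(candidates: List[str], document_set: str) -> str:
    """Return the candidate summary with the smallest NCD to the document set."""
    return min(candidates, key=lambda c: ncd(c, document_set))
```

Under this approximation, selecting a summary amounts to calling `closest_candidate` with a list of candidate summaries and the concatenated document set. The conditional distance used for update summaries would need a separate estimate and is not covered here.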
Query-Focused Summaries or Query-Biased Summaries?
Cited by 1 (1 self)
In the context of the Document Understanding Conferences, the task of Query-Focused Multi-Document Summarization is intended to improve agreement in content among human-generated model summaries. Query-focus also aids the automated summarizers in directing the summary at specific topics, which may result in better agreement with these model summaries. However, while query focus correlates with performance, we show that high-performing automatic systems produce summaries with disproportionally higher query term density than human summarizers do. Experimental evidence suggests that automatic systems heavily rely on query term occurrence and repetition to achieve good performance.
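One simple way to measure query term density is the fraction of summary tokens that also occur in the query. The sketch below uses that definition as an assumption, since the abstract does not give the exact formula; the names `tokenize` and `query_term_density` and the lowercase alphanumeric tokenization are likewise hypothetical.

```python
import re
from typing import List


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def query_term_density(summary: str, query: str) -> float:
    """Fraction of summary tokens that appear in the query (repeats count)."""
    query_terms = set(tokenize(query))
    tokens = tokenize(summary)
    if not tokens:
        return 0.0
    hits = sum(1 for token in tokens if token in query_terms)
    return hits / len(tokens)
```

Because repeated query terms each count as a hit, a summary that repeats the query wording scores higher, which is the kind of effect the abstract describes for automatic systems.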

