Results 1 - 10
of
29
Lexrank: Graph-based lexical centrality as salience in text summarization
- Journal of Artificial Intelligence Research
, 2004
"... We introduce a stochastic graph-based method for computing relative importance of textual units for Natural Language Processing. We test the technique on the problem of Text Summarization (TS). Extractive TS relies on the concept of sentence salience to identify the most important sentences in a doc ..."
Abstract
-
Cited by 81 (5 self)
- Add to MetaCart
We introduce a stochastic graph-based method for computing relative importance of textual units for Natural Language Processing. We test the technique on the problem of Text Summarization (TS). Extractive TS relies on the concept of sentence salience to identify the most important sentences in a document or set of documents. Salience is typically defined in terms of the presence of particular important words or in terms of similarity to a centroid pseudo-sentence. We consider a new approach, LexRank, for computing sentence importance based on the concept of eigenvector centrality in a graph representation of sentences. In this model, a connectivity matrix based on intra-sentence cosine similarity is used as the adjacency matrix of the graph representation of sentences. Our system, based on LexRank ranked in first place in more than one task in the recent DUC 2004 evaluation. In this paper we present a detailed analysis of our approach and apply it to a larger data set including data from earlier DUC evaluations. We discuss several methods to compute centrality using the similarity graph. The results show that degree-based methods (including LexRank) outperform both centroid-based methods and other systems participating in DUC in most of the cases. Furthermore, the LexRank with threshold method outperforms the other degree-based techniques including continuous LexRank. We also show that our approach is quite insensitive to the noise in the data that may result from an imperfect topical clustering of documents. 1.
Lexrank: graph-based centrality as salience in text summarization
- Journal of Artificial Intelligence Research (JAIR
, 2004
"... We introduce a stochastic graph-based method for computing relative importance of textual units for Natural Language Processing. We test the technique on the problem of Text Summarization (TS). Extractive TS relies on the concept of sentence salience to identify the most important sentences in a doc ..."
Abstract
-
Cited by 52 (6 self)
- Add to MetaCart
We introduce a stochastic graph-based method for computing relative importance of textual units for Natural Language Processing. We test the technique on the problem of Text Summarization (TS). Extractive TS relies on the concept of sentence salience to identify the most important sentences in a document or set of documents. Salience is typically defined in terms of the presence of particular important words or in terms of similarity to a centroid pseudo-sentence. We consider a new approach, LexRank, for computing sentence importance based on the concept of eigenvector centrality in a graph representation of sentences. In this model, a connectivity matrix based on intra-sentence cosine similarity is used as the adjacency matrix of the graph representation of sentences. Our system, based on LexRank ranked in first place in more than one task in the recent DUC 2004 evaluation. In this paper we present a detailed analysis of our approach and apply it to a larger data set including data from earlier DUC evaluations. We discuss several methods to compute centrality using the similarity graph. The results show that degree-based methods (including LexRank) outperform both centroid-based methods and other systems participating in DUC in most of the cases. Furthermore, the LexRank with threshold method outperforms the other degree-based techniques including continuous LexRank. We also show that our approach is quite insensitive to the noise in the data that may result from an imperfect topical clustering of documents. 1.
Newsinessence: summarizing online news topics
- Communications of the ACM
, 2005
"... These days, Internet users are frequently turning to the Web for news rather than going to traditional sources such as newspapers or television. This trend is likely to continue, according to a recent Forrester report [1], which found that people who have been using the Web for several years are mor ..."
Abstract
-
Cited by 19 (3 self)
- Add to MetaCart
These days, Internet users are frequently turning to the Web for news rather than going to traditional sources such as newspapers or television. This trend is likely to continue, according to a recent Forrester report [1], which found that people who have been using the Web for several years are more likely to cut back on their reading of print newspapers than people with less Internet experience. Already, the New York Times ’ online news source
Customization in a Unified Framework for Summarizing Medical Literature
, 2005
"... Objectives: We present the summarization system in the PERSIVAL medical digital library. Although we discuss the context of our summarization research within the PERSIVAL platform, the primary focus of this article is on strategies to define and generate customized summaries. ..."
Abstract
-
Cited by 16 (2 self)
- Add to MetaCart
Objectives: We present the summarization system in the PERSIVAL medical digital library. Although we discuss the context of our summarization research within the PERSIVAL platform, the primary focus of this article is on strategies to define and generate customized summaries.
Summarization System Evaluation Revisited: N-Gram Graphs
- ACM TRANSACTIONS ON SPEECH AND LANGUAGE PROCESSING
, 2008
"... This article presents a novel automatic method (AutoSummENG) for the evaluation of summarization systems, based on comparing the character n-gram graphs representation of the extracted summaries and a number of model summaries. The presented approach is language neutral, due to its statistical natur ..."
Abstract
-
Cited by 15 (11 self)
- Add to MetaCart
This article presents a novel automatic method (AutoSummENG) for the evaluation of summarization systems, based on comparing the character n-gram graphs representation of the extracted summaries and a number of model summaries. The presented approach is language neutral, due to its statistical nature, and appears to hold a level of evaluation performance that matches and even exceeds other contemporary evaluation methods. Within this study, we measure the effectiveness of different representation methods, namely, word and character n-gram graph and histogram, different n-gram neighborhood indication methods as well as different comparison methods between the supplied representations. A theory for the a priori determination of the methods ’ parameters along with supporting experiments concludes the study to provide a complete alternative to existing methods concerning the automatic summary system evaluation process.
Newsinessence: A system for domain-independent, real-time news clustering and multi-document summarization
- In Proceedings of the Human Language Technology Conference (HLT-01
, 2001
"... NEWSINESSENCE is a system for finding, visualizing and summarizing a topic-based cluster of news stories. In the generic scenario for NEWSINESSENCE, a user selects a single news story from a news Web site. Our system then searches other live sources of ..."
Abstract
-
Cited by 15 (0 self)
- Add to MetaCart
NEWSINESSENCE is a system for finding, visualizing and summarizing a topic-based cluster of news stories. In the generic scenario for NEWSINESSENCE, a user selects a single news story from a news Web site. Our system then searches other live sources of
Abstraction Summarization for Managing the Biomedical Research Literature
- IN PROCEEDINGS OF THE HLT/NAACL 2004 WORKSHOP ON COMPUTATIONAL LEXICAL SEMANTICS
, 2004
"... We explore a semantic abstraction approach to automatic summarization in the biomedical domain. The approach relies on a semantic processor that functions as the source interpreter and produces a list of predications. A transformation stage then generalizes and condenses this list, ultimately ..."
Abstract
-
Cited by 13 (6 self)
- Add to MetaCart
We explore a semantic abstraction approach to automatic summarization in the biomedical domain. The approach relies on a semantic processor that functions as the source interpreter and produces a list of predications. A transformation stage then generalizes and condenses this list, ultimately generating a conceptual condensate for a disorder input topic. The final condensate is displayed in graphical form. We provide a set of principles for the transformation stage and describe the application of this approach to multidocument input. Finally, we examine the characteristics and quality of the condensates produced.
Revisions that Improve Cohesion in Multi-document Summaries: A Preliminary Study
- In Proceedings of the Workshop on Automatic Summarization (including DUC 2002). Philadelphia: Association for Computational Linguistics
, 2002
"... Extractive summaries produced from multiple source documents suffer from an array of problems with respect to text cohesion. In this preliminary study, we seek to understand what problems occur in such summaries and how often. ..."
Abstract
-
Cited by 11 (3 self)
- Add to MetaCart
Extractive summaries produced from multiple source documents suffer from an array of problems with respect to text cohesion. In this preliminary study, we seek to understand what problems occur in such summaries and how often.
Learning Cross-document Structural Relationships using Boosting
- IN PROCEEDINGS OF IJC-NLP 2004, HAINAN ISLAND
, 2004
"... Multi-document discoure analysis has emerged with the potential of improving various information retrieval applications. Based on the newly proposed Cross-document Structure Theory (CST), this paper describes an empirical study that uses boosting to classify CST relationships between sentence pairs ..."
Abstract
-
Cited by 7 (3 self)
- Add to MetaCart
Multi-document discoure analysis has emerged with the potential of improving various information retrieval applications. Based on the newly proposed Cross-document Structure Theory (CST), this paper describes an empirical study that uses boosting to classify CST relationships between sentence pairs extracted from topically related documents. We show that the binary classi er for determining existence of structural relationships signi cantly outperforms the baseline. We also achieve promising results on the multi-class case in which the full taxonomy of relationships are considered.
Single-document and multi-document summary evaluation via relative utility
- In Poster Session, Proceedings of the ACM CIKM conference
, 2003
"... We present a series of experiments to demonstrate the validity of Relative Utility (RU) as a measure for evaluating extractive summarization systems. Like some other evaluation metrics, it compares sentence selection between machine and reference summarizers. Additionally, RU is applicable in both s ..."
Abstract
-
Cited by 7 (0 self)
- Add to MetaCart
We present a series of experiments to demonstrate the validity of Relative Utility (RU) as a measure for evaluating extractive summarization systems. Like some other evaluation metrics, it compares sentence selection between machine and reference summarizers. Additionally, RU is applicable in both single-document and multi-document summarization, is extendable to arbitrary compression rates with no extra annotation effort, and takes into account both random system performance and interjudge agreement. RU also provides an option for penalizing summaries that include sentences with redundant information. Our results are based on the JHU summary corpus and indicate that Relative Utility is a reasonable, and often superior alternative to several common summary evaluation metrics. We also give a comparison of RU with some other well-known metrics with respect to the correlation with the human judgements on the DUC corpus. 1

