Inferring strategies for sentence ordering in multidocument news summarization
Journal of Artificial Intelligence Research, 2002
Cited by 72 (7 self)
The problem of organizing information for multidocument summarization so that the generated summary is coherent has received relatively little attention. While sentence ordering for single document summarization can be determined from the ordering of sentences in the input article, this is not the case for multidocument summarization where summary sentences may be drawn from different input articles. In this paper, we propose a methodology for studying the properties of ordering information in the news genre and describe experiments done on a corpus of multiple acceptable orderings we developed for the task. Based on these experiments, we implemented a strategy for ordering information that combines constraints from chronological order of events and topical relatedness. Evaluation of our augmented algorithm shows a significant improvement of the ordering over two baseline strategies.
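The abstract describes an ordering strategy that combines chronological order with topical relatedness. The Python below is a minimal sketch of that general idea, not the paper's algorithm. It assumes each sentence carries the publication date of its source article and its position within that article; the word-overlap (Jaccard) grouping, the 0.2 threshold, and the names `Sentence` and `order_sentences` are hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Sentence:
    text: str
    pub_date: int  # hypothetical: publication date of the source article, as an ordinal
    position: int  # hypothetical: index of the sentence within its source article


def words(text):
    # Lowercased word set with surrounding punctuation stripped.
    return {w.lower().strip(".,;:!?") for w in text.split()}


def jaccard(a, b):
    wa, wb = words(a), words(b)
    union = wa | wb
    return len(wa & wb) / len(union) if union else 0.0


def order_sentences(sentences, threshold=0.2):
    # 1. Greedy topical grouping: a sentence joins the first block that already
    #    holds a sentence with word overlap >= threshold; otherwise it starts a new block.
    blocks = []
    for s in sentences:
        for block in blocks:
            if any(jaccard(s.text, t.text) >= threshold for t in block):
                block.append(s)
                break
        else:
            blocks.append([s])

    # 2. Chronology: sort inside each block by (date, position), then order the
    #    blocks by their earliest sentence.
    key = lambda s: (s.pub_date, s.position)
    for block in blocks:
        block.sort(key=key)
    blocks.sort(key=lambda b: key(b[0]))
    return [s.text for block in blocks for s in block]
```

Keeping topically related sentences together lets a later sentence on the same topic be pulled forward next to its earlier counterpart, where a purely chronological ordering would interleave unrelated material between them.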
Tracking and Summarizing News on a Daily Basis with Columbia's Newsblaster
2002
Cited by 50 (14 self)
Recently, there have been significant advances in several areas of language technology, including clustering, text categorization, and summarization. However, efforts to combine technology from these areas in a practical system for information access have been limited. In this paper, we present Columbia's Newsblaster system for online news summarization. Many of the tools developed at Columbia over the years are combined together to produce a system that crawls the web for news articles, clusters them on specific topics and produces multidocument summaries for each cluster.
Sentence Fusion for Multidocument News Summarization
Computational Linguistics, 2005
Cited by 49 (3 self)
A system that can produce informative summaries, highlighting common information found in many online documents, will help Web users to pinpoint information that they need without extensive reading. In this article, we introduce sentence fusion, a novel text-to-text generation technique for synthesizing common information across documents. Sentence fusion involves bottom-up local multisequence alignment to identify phrases conveying similar information and statistical generation to combine common phrases into a sentence. Sentence fusion moves the summarization field from the use of purely extractive methods to the generation of abstracts that contain sentences not found in any of the input documents and can synthesize information across sources.
Newsjunkie: Providing Personalized Newsfeeds via Analysis of Information Novelty
In WWW2004, 2004
Cited by 44 (4 self)
We present a principled methodology for filtering news stories by formal measures of information novelty, and show how the techniques can be used to custom-tailor newsfeeds based on information that a user has already reviewed. We review methods for analyzing novelty and then describe Newsjunkie, a system that personalizes news for users by identifying the novelty of stories in the context of stories they have already reviewed. Newsjunkie employs novelty-analysis algorithms that represent articles as words and named entities. The algorithms analyze inter- and intra-document dynamics by considering how information evolves over time from article to article, as well as within individual articles. We review the results of a user study undertaken to gauge the value of the approach over legacy time-based review of newsfeeds, and also to compare the performance of alternate distance metrics that are used to estimate the dissimilarity between candidate new articles and sets of previously reviewed articles.
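The abstract scores candidate articles by their dissimilarity from a set of previously reviewed articles. As a minimal sketch of one such distance, the Python below pools the reviewed articles into a single bag-of-words vector and uses cosine distance from it as a novelty score. Cosine distance is an assumed stand-in here; the abstract says several alternate metrics were compared but does not name them, and the functions `bag`, `novelty`, and `rank_by_novelty` are hypothetical. Named entities, which Newsjunkie also uses, are omitted.

```python
import math
from collections import Counter


def bag(text):
    # Word-count vector with lowercasing and punctuation stripping.
    return Counter(w.lower().strip(".,;:!?\"'") for w in text.split())


def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a if w in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def novelty(candidate, reviewed):
    # Pool all previously reviewed articles into one word-count vector,
    # then score the candidate by cosine distance (1 - similarity) from it.
    seen = Counter()
    for doc in reviewed:
        seen.update(bag(doc))
    return 1.0 - cosine(bag(candidate), seen)


def rank_by_novelty(candidates, reviewed):
    # Most novel candidates first.
    return sorted(candidates, key=lambda c: novelty(c, reviewed), reverse=True)
```

Pooling the reviewed set into one vector is the simplest option; comparing against each reviewed article and taking the minimum distance is an equally plausible alternative that penalizes near-duplicates of any single earlier story.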
Multi-document summarization of evaluative text
In EACL ’06: Proceedings of the 11th Conference of the European Chapter of the ACL, 2006
Cited by 20 (0 self)
We present and compare two approaches to the task of summarizing evaluative arguments. The first is a sentence extraction-based approach while the second is a language generation-based approach. We evaluate these approaches in a user study and find that they quantitatively perform equally well. Qualitatively, however, we find that they perform well for different but complementary reasons. We conclude that an effective method for summarizing evaluative arguments must effectively synthesize the two approaches.
A compositional context sensitive multi-document summarizer: exploring the factors that influence summarization
In Proc. of SIGIR, 2006
Cited by 18 (3 self)
The usual approach for automatic summarization is sentence extraction, where key sentences from the input documents are selected based on a suite of features. While word frequency often is used as a feature in summarization, its impact on system performance has not been isolated. In this paper, we study the contribution to summarization of three factors related to frequency: content word frequency, composition functions for estimating sentence importance from word frequency, and adjustment of frequency weights based on context. We carry out our analysis using datasets from the Document Understanding Conferences, studying not only the impact of these features on automatic summarizers, but also their role in human summarization. Our research shows that a frequency based summarizer can achieve performance comparable to that of state-of-the-art systems, but only with a good composition function; context sensitivity improves performance and significantly reduces repetition.
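The abstract names three frequency-related factors: content word frequency, a composition function that turns word weights into a sentence score, and context-sensitive adjustment of those weights. The Python below is a minimal sketch of a summarizer built from those three pieces, similar in spirit to SumBasic-style frequency summarizers. The specific choices are assumptions, not the paper's: the tiny stopword list, the mean and sum composition options, and the squaring update (p becomes p squared for words in an already selected sentence). The function name `summarize` is hypothetical.

```python
from collections import Counter

# Hypothetical, deliberately tiny stopword list; a real system would use a fuller one.
STOPWORDS = {"the", "a", "an", "of", "in", "on", "and", "to", "is", "was", "for"}


def content_words(sentence):
    tokens = (w.lower().strip(".,;:!?\"'") for w in sentence.split())
    return [t for t in tokens if t and t not in STOPWORDS]


def summarize(sentences, max_sentences=2, compose="mean"):
    # 1. Content-word probabilities estimated from the whole input.
    counts = Counter(w for s in sentences for w in content_words(s))
    total = sum(counts.values())
    prob = {w: c / total for w, c in counts.items()}

    def score(s):
        ws = content_words(s)
        if not ws:
            return 0.0
        vals = [prob[w] for w in ws]
        # 2. Composition function: mean or sum of the word probabilities.
        return sum(vals) / len(vals) if compose == "mean" else sum(vals)

    summary = []
    remaining = list(sentences)
    while remaining and len(summary) < max_sentences:
        best = max(remaining, key=score)
        summary.append(best)
        remaining.remove(best)
        # 3. Context adjustment: discount words already covered (p -> p**2),
        #    so later picks are less likely to repeat them.
        for w in set(content_words(best)):
            prob[w] = prob[w] ** 2
    return summary
```

With a near-duplicate pair in the input, the squaring update makes the second copy lose to a sentence with fresh content, which mirrors the abstract's observation that context sensitivity reduces repetition. Switching `compose` between "mean" and "sum" changes which sentence is picked first, since summing favors longer sentences.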
Interactive multimedia summaries of evaluative text
In IUI ’06: Proceedings of the 11th International Conference on Intelligent User Interfaces, 2006
Cited by 13 (1 self)
We present an interactive multimedia interface for automatically summarizing large corpora of evaluative text (e.g. online product reviews). We rely on existing techniques for extracting knowledge from the corpora but present a novel approach for conveying that knowledge to the user. Our system presents the extracted knowledge in a hierarchical visualization mode as well as in a natural language summary. We propose a method for reasoning about the extracted knowledge so that the natural language summary can include only the most important information from the corpus. Our approach is interactive in that it allows the user to explore in the original dataset through intuitive visual and textual methods. Results of a formative evaluation of our interface show general satisfaction among users with our approach.
Language Models for Hierarchical Summarization
2003
Cited by 8 (2 self)
Hierarchies have long been used for organization, summarization, and access to information. In this dissertation we define summarization in terms of a probabilistic language model and use this definition to explore a new technique for automatically generating topic hierarchies. We use the language model to characterize the documents that will be summarized and then apply a graph-theoretic algorithm to determine the best topic words for the hierarchical summary. This work is very different from previous attempts to generate topic hierarchies because it relies on statistical analysis and language modeling to identify descriptive words for a document and organize the words in a hierarchical structure. We compare ...
The (non)utility of linguistic features for predicting prominence in spontaneous speech
In IEEE/ACL 2006 Workshop on Spoken Language Technology, 2006
Cited by 5 (3 self)
Conversational speech is characterized by prosodic variability which makes pitch accent prediction for this genre especially difficult. The linguistic literature points out that complex features such as information status, contrast and animacy help predict pitch accent placement. In this paper, we use a corpus annotated for such features to determine if they improve prominence prediction over traditional shallow features such as frequency and part-of-speech, or over new ones that we introduce. We demonstrate that while correlated with prominence, complex linguistic features do not improve prediction accuracy. Furthermore, the performance of our classifier is quite close to the ceiling defined by variability in human accent placement. An oracle experiment demonstrates, though, that at least some accuracy improvement is still possible.
Index Terms: prosody, prominence, givenness, contrast, animacy

