Using Lexical Chains for Text Summarization
, 1997
Cited by 276 (7 self)
We investigate one technique to produce a summary of an original text without requiring its full semantic interpretation, but instead relying on a model of the topic progression in the text derived from lexical chains. We present a new algorithm to compute lexical chains in a text, merging several robust knowledge sources: the WordNet thesaurus, a part-of-speech tagger and shallow parser for the identification of nominal groups, and a segmentation algorithm derived from (Hearst, 1994). Summarization proceeds in three steps: the original text is first segmented, lexical chains are constructed, strong chains are identified and significant sentences are extracted from the text. We present in this paper empirical results on the identification of strong chains and of significant sentences.
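The pipeline described in this abstract (build chains of related words, pick a strong chain, extract a sentence it touches) can be illustrated with a toy sketch. Everything below is an assumption made for illustration and not the authors' algorithm: `TOY_CONCEPTS` is a hypothetical hand-written lookup standing in for WordNet, the sentence splitter stands in for real text segmentation, and chain strength is simply the number of chain members.

```python
import re
from collections import defaultdict

# Hypothetical toy thesaurus standing in for WordNet: word -> concept label.
TOY_CONCEPTS = {
    "car": "vehicle", "truck": "vehicle", "vehicle": "vehicle",
    "engine": "vehicle",
    "rain": "weather", "storm": "weather", "weather": "weather",
}

def split_sentences(text):
    """Naive sentence splitter (a stand-in for real text segmentation)."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def build_chains(sentences):
    """Group word occurrences that share a concept into one chain.

    Returns {concept: [(sentence_index, word), ...]} in reading order.
    """
    chains = defaultdict(list)
    for i, sentence in enumerate(sentences):
        for word in re.findall(r"[a-z]+", sentence.lower()):
            concept = TOY_CONCEPTS.get(word)
            if concept is not None:
                chains[concept].append((i, word))
    return dict(chains)

def strongest_chain(chains):
    """Pick the chain with the most members (a simplistic strength score)."""
    return max(chains, key=lambda concept: len(chains[concept]))

def summarize(text):
    """Return the first sentence that mentions the strongest chain."""
    sentences = split_sentences(text)
    chains = build_chains(sentences)
    if not chains:
        return ""
    best = strongest_chain(chains)
    first_index = chains[best][0][0]
    return sentences[first_index]
```

For example, `summarize("It began to rain. The car would not start. The truck had a bad engine too.")` selects the sentence that opens the longest chain. A real system would replace the lookup table with thesaurus relations and use a more careful strength score.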
Automated Text Summarization in SUMMARIST
, 1999
Cited by 112 (10 self)
SUMMARIST is an attempt to create a robust automated text summarization system, based on the equation: summarization = topic identification + interpretation + generation. Each of these stages contains several independent modules, many of them trained on large corpora of text. We describe the system's architecture and provide details of some of its modules.
The Rhetorical Parsing, Summarization, and Generation of Natural Language Texts
, 1997
Cited by 98 (9 self)
This thesis is an inquiry into the nature of the high-level, rhetorical structure of unrestricted natural language texts, computational means to enable its derivation, and two applications (in automatic summarization and natural language generation) that follow from the ability to build such structures automatically. The thesis proposes a first-order formalization of the high-level, rhetorical structure of text. The formalization assumes that text can be sequenced into elementary units; that discourse relations hold between textual units of various sizes; that some textual units are more important to the writer's purpose than others; and that trees are a good approximation of the abstract structure of text. The formalization also introduces a linguistically motivated compositionality criterion, which is shown to hold for the text structures that are valid. The thesis proposes, analyzes theoretically, and compares empirically four algorithms for determining the valid text structures of ...
Functional Analysis and
- Semi-Groups, Amer. Math. Soc. Colloq. Publ
, 1957
Cited by 70 (1 self)
In this paper we describe a Cross Document Summarizer XDoX designed specifically to summarize large document sets (50-500 documents and more). Such sets of documents are typically obtained from routing or filtering systems run against a continuous stream of data, such as a newswire. XDoX works by identifying the most salient themes within the set (at the granularity level that is regulated by the user) and composing an extraction summary, which reflects these main themes. In the current version, XDoX is not optimized to produce a summary based on a few unrelated documents; indeed, such summaries are best obtained simply by concatenating summaries of individual documents. We show examples of summaries obtained in our tests as well as from our participation in the first Document
Multi-document summarization by graph search and matching
- In Proceedings of the Fifteenth National Conference on Artificial Intelligence (AAAI-97)
, 1997
Cited by 62 (1 self)
We describe a new method for summarizing similarities and differences in a pair of related documents using a graph representation for text. Concepts denoted by words, phrases, and proper names in the document are represented positionally as nodes in the graph along with edges corresponding to semantic relations between items. Given a perspective in terms of which the pair of documents is to be summarized, the algorithm first uses a spreading activation technique to discover, in each document, nodes semantically related to the topic. The activated graphs of each document are then matched to yield a graph corresponding to similarities and differences between the pair, which is rendered in natural language. An evaluation of these techniques has been carried out.
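The two stages named in this abstract (spreading activation from a topic, then matching the activated graphs) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's method: the graph is a plain dict `{node: {neighbor: weight}}` with weights in (0, 1], `decay` and `threshold` are hypothetical parameters, and "matching" is reduced to set intersection and difference over activated nodes.

```python
def spread_activation(graph, seeds, decay=0.5, threshold=0.1):
    """Propagate activation outward from the seed (topic) nodes.

    graph: {node: {neighbor: weight}} with weights assumed in (0, 1].
    Returns {node: activation} for every node whose activation
    reaches the threshold; seeds start at 1.0.
    """
    activation = {node: 1.0 for node in seeds}
    frontier = list(seeds)
    while frontier:
        node = frontier.pop()
        for neighbor, weight in graph.get(node, {}).items():
            value = activation[node] * weight * decay
            # Keep only the strongest activation that reaches a node.
            if value >= threshold and value > activation.get(neighbor, 0.0):
                activation[neighbor] = value
                frontier.append(neighbor)
    return activation

def compare(graph_a, graph_b, topic):
    """Return (common, only_a, only_b) sets of topic-related nodes."""
    active_a = set(spread_activation(graph_a, [topic]))
    active_b = set(spread_activation(graph_b, [topic]))
    common = active_a & active_b
    return common, active_a - common, active_b - common
```

Because each hop multiplies activation by at most `decay`, activation shrinks along every path and the loop terminates even on cyclic graphs. The three returned sets correspond loosely to the similarities and differences that the paper goes on to render in natural language.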
Temporal Summaries of News Topics
, 2001
Cited by 60 (3 self)
We discuss technology to help a person monitor changes in news coverage over time. We define temporal summaries of news stories as extracting a single sentence from each event within a news topic, where the stories are presented one at a time and sentences from a story must be ranked before the next story can be considered. We explain a method for evaluation, and describe an evaluation corpus that we have built. We also propose several methods for constructing temporal summaries and evaluate their effectiveness in comparison to degenerate cases. We show that simple approaches are effective, but that the problem is far from solved. Keywords: Summarization, Experimental Design and Metrics.
Training a Selection Function for Extraction
, 1999
Cited by 33 (1 self)
In this paper we compare the performance of several heuristics in generating informative generic/query-oriented extracts for newspaper articles in order to learn how topic prominence affects the performance of each heuristic. We study how different query types can affect the performance of each heuristic and discuss the possibility of using machine learning algorithms to automatically learn good combination functions to combine several heuristics. We also briefly describe the design, implementation, and performance of a multilingual text summarization system, SUMMARIST. Keywords: Automated text summarization, topic extraction, summary evaluation.
Salience-based Content Characterisation of Text Documents
- Advances in Automatic Text Summarization
, 1997
Cited by 27 (3 self)
Traditionally, the document summarisation task has been tackled either as a natural language processing problem, with an instantiated meaning template being rendered into coherent prose, or as a passage extraction problem, where certain fragments (typically sentences) of the source document are deemed to be highly representative of its content, and thus delivered as meaningful "approximations" of it. Balancing the conflicting requirements of depth and accuracy of a summary, on the one hand, and document and domain independence on the other, has proven a very hard problem. This paper describes a novel approach to content characterisation of text documents. It is domain- and genre-independent, by virtue of not requiring an in-depth analysis of the full meaning. At the same time, it remains closer to the core meaning by choosing a different granularity of its representations (phrasal expressions rather than sentences or paragraphs), by exploiting a notion of discourse contiguity and coherence for the purposes of uniform coverage and context maintenance, and by utilising a strong linguistic notion of salience, as a more appropriate and representative measure of a document's "aboutness". 1 Capsule overviews. The majority of techniques for "summarisation", as applied to average-length documents, fall within two broad categories: those that rely on template instantiation and those that rely on passage extraction. Work in the former framework traces its roots to some pioneering research by DeJong [Y] and Tait [29]; more recently, the DARPA-sponsored TIPSTER programme ([2]), and, in particular, the message understanding conferences (MUC; e.g. [6] and [1]), have provided fertile ground for such work, by placing the emphasis of document analysis on the identification and extraction of certain core entities and facts in a document, which are "pac...

