Results 1 - 10 of 42
Using Lexical Chains for Text Summarization
, 1997
Cited by 276 (7 self)
We investigate one technique to produce a summary of an original text without requiring its full semantic interpretation, but instead relying on a model of the topic progression in the text derived from lexical chains. We present a new algorithm to compute lexical chains in a text, merging several robust knowledge sources: the WordNet thesaurus, a part-of-speech tagger and shallow parser for the identification of nominal groups, and a segmentation algorithm derived from (Hearst, 1994). Summarization proceeds in three steps: the original text is first segmented, lexical chains are constructed, strong chains are identified and significant sentences are extracted from the text. We present in this paper empirical results on the identification of strong chains and of significant sentences.
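The chain construction, strong-chain selection, and sentence extraction described in this abstract can be illustrated with a toy sketch. It is not the paper's algorithm: the `TOY_CONCEPTS` table is an invented stand-in for a WordNet lookup, chain strength is assumed to be simply the number of chain members, the segmentation step is omitted, and all function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical stand-in for a thesaurus lookup: maps a noun to a concept label.
# A real system would consult WordNet; these entries are invented for the demo.
TOY_CONCEPTS = {
    "car": "vehicle", "truck": "vehicle", "engine": "vehicle",
    "bank": "finance", "loan": "finance",
}

def build_chains(sentences):
    """Group candidate words into chains keyed by their toy concept label."""
    chains = defaultdict(list)  # concept -> list of (sentence_index, word)
    for idx, sentence in enumerate(sentences):
        for token in sentence.lower().split():
            word = token.strip(".,;:!?")
            concept = TOY_CONCEPTS.get(word)
            if concept is not None:
                chains[concept].append((idx, word))
    return dict(chains)

def strongest_chain(chains):
    """Pick the chain with the most members (an assumed strength measure)."""
    return max(chains, key=lambda concept: len(chains[concept]))

def extract_sentence(sentences, chains):
    """Return the first sentence containing a member of the strongest chain."""
    concept = strongest_chain(chains)
    first_idx = min(idx for idx, _ in chains[concept])
    return sentences[first_idx]

sentences = [
    "The bank approved the loan.",
    "The car needed a new engine.",
    "A truck towed the car away.",
]
chains = build_chains(sentences)
summary = extract_sentence(sentences, chains)
print(summary)
```

Here the "vehicle" chain has four members against two for "finance", so the sentence that first introduces the vehicle chain is selected.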
The Rhetorical Parsing, Summarization, and Generation of Natural Language Texts
, 1997
Cited by 98 (9 self)
This thesis is an inquiry into the nature of the high-level, rhetorical structure of unrestricted natural language texts, computational means to enable its derivation, and two applications (in automatic summarization and natural language generation) that follow from the ability to build such structures automatically. The thesis proposes a first-order formalization of the high-level, rhetorical structure of text. The formalization assumes that text can be sequenced into elementary units; that discourse relations hold between textual units of various sizes; that some textual units are more important to the writer's purpose than others; and that trees are a good approximation of the abstract structure of text. The formalization also introduces a linguistically motivated compositionality criterion, which is shown to hold for the text structures that are valid. The thesis proposes, analyzes theoretically, and compares empirically four algorithms for determining the valid text structures of ...
Modeling local coherence: An entity-based approach
- In Proceedings of ACL 2005
, 2005
Cited by 70 (5 self)
This paper considers the problem of automatic assessment of local coherence. We present a novel entity-based representation of discourse which is inspired by Centering Theory and can be computed automatically from raw text. We view coherence assessment as a ranking learning problem and show that the proposed discourse representation supports the effective learning of a ranking function. Our experiments demonstrate that the induced model achieves significantly higher accuracy than a state-of-the-art coherence model.
Improving word sense disambiguation in lexical chaining
- In Proceedings of IJCAI
, 2003
Cited by 37 (0 self)
Previous algorithms to compute lexical chains suffer either from a lack of accuracy in word sense disambiguation (WSD) or from computational inefficiency. In this paper, we present a new linear-time algorithm for lexical chaining that adopts the assumption of one sense per discourse. Our results show an improvement over previous algorithms when evaluated on a WSD task.
Cohesion and Collocation: Using Context Vectors in Text Segmentation
- In Proceedings of the 37th Annual Meeting of the Association for Computational Linguistics (Student Session)
, 1999
Cited by 16 (0 self)
Collocational word similarity is considered a source of text cohesion that is hard to measure and quantify. The work presented here explores the use of information from a training corpus in measuring word similarity and evaluates the method in the text segmentation task. An implementation, the VecTile system, produces similarity curves over texts using pre-compiled vector representations of the contextual behavior of words. The performance of this system is shown to improve over that of the purely string-based TextTiling algorithm (Hearst, 1997). 1 Background. The notion of text cohesion rests on the intuition that a text is "held together" by a variety of internal forces. Much of the relevant linguistic literature is indebted to Halliday and Hasan (1976), where cohesion is defined as a network of relationships between locations in the text, arising from (i) grammatical factors (co-reference, use of pro-forms, ellipsis and sentential connectives), and (ii) lexical factors (reiteration and collocation). Subsequent work has further developed this taxonomy (Hoey, 1991) and explored its implications in such areas as paragraphing (Longacre, 1979; Bond and Hayes, 1984; Stark, 1988), relevance (Sperber and Wilson, 1995) and discourse structure (Grosz and Sidner, 1986).
Semantic Density Analysis: Comparing word meaning across time and phonetic space
Cited by 5 (0 self)
This paper presents a new statistical method for detecting and tracking changes in word meaning, based on Latent Semantic Analysis. By comparing the density of semantic vector clusters this method allows researchers to make statistical inferences on questions such as whether the meaning of a word changed across time or if a phonetic cluster is associated with a specific meaning. Possible applications of this method are then illustrated in tracing the semantic change of ‘dog’, ‘do’, and ‘deer’ in early English and examining and comparing phonaesthemes.
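One way to picture "density of semantic vector clusters" is as the average pairwise cosine similarity among the context vectors of a word's occurrences. The sketch below assumes that definition, which the abstract does not confirm; the two-dimensional vectors are invented toy data rather than Latent Semantic Analysis output, and the statistical inference step is left out.

```python
import math
from itertools import combinations

def cosine(u, v):
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def density(vectors):
    """Mean pairwise cosine similarity (an assumed density measure)."""
    pairs = list(combinations(vectors, 2))
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)

# Invented context vectors for one word in two hypothetical periods.
early = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # contexts spread apart
late = [[1.0, 0.1], [1.0, 0.0], [0.9, 0.1]]   # contexts tightly grouped

print(density(early), density(late))
```

Under this assumed measure, the tighter cluster yields the higher density, which is the kind of difference one would then test for significance.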
GistSumm: A Summarization Tool Based on a New Extractive Method
, 2003
Cited by 4 (1 self)
This paper presents a new extractive approach to automatic summarization based on the gist of the source text. The gist-based system, called GistSumm (GIST SUMMarizer), uses the gist as a guideline to identify and select text segments to include in the final extract. Automatically produced extracts have been evaluated under the light of gist preservation and textuality.
Lexical Cohesion, Discourse Segmentation and Document Summarization
- In RIAO-2000, Content-Based Multimedia Information Access
, 2000
Cited by 4 (2 self)
Summaries automatically derived by sentence extraction are known to exhibit some coherence degradation, readability deterioration, and topical under-representation. We propose a strategy for improving upon these problems, aiming to generate more cohesive summaries by analyzing the lexical cohesion factors in the source document texts. As an initial experiment, we have looked at one particular factor, lexical repetition, which is instrumental to the topical make-up of a text. We have developed a framework for integrating a lexical repetition-based model of discourse segmentation capable of detecting shifts in topic, with a linguistically-aware summarizer which utilizes notions of salience and dynamically-adjustable size of the resulting summaries. We show that even by utilizing lexical repetition alone, summaries are of comparable, and under certain conditions better, quality than those delivered by a state-of-the-art sentence-based summarizer. This is encouraging for a broad pla...
Cut-and-Paste Text Summarization
, 2001
Cited by 4 (1 self)
Automatic text summarization provides a concise summary for a document. In this thesis, we present a cut-and-paste approach to addressing the text generation problem in domain-independent, single-document summarization. We found that professional abstractors often reuse the text in an original document for producing the text in a summary. But rather than simply extracting the original text, as in most existing automatic summarizers, humans often edit the extracted sentences. We call such editing operations “revision operations”. Our summarizer simulates two revision operations that are frequently used by humans: sentence reduction and sentence combination. Sentence reduction removes inessential phrases from sentences and sentence combination merges sentences and phrases together. The sentence reduction algorithm we propose relies on multiple sources of knowledge to decide when it is appropriate to delete a phrase from a sentence, including linguistic knowledge, probabilities trained from corpus examples, and context information. The sentence combination module relies on a set of rules to decide how to combine sentences and phrases and when to combine them. Sentence reduction aims to improve the conciseness of
Second-order Cohesion
- Computational Intelligence
, 2000
Cited by 3 (1 self)
Similarity in contextual behavior between words is considered a source of “lexical cohesion,” which is otherwise hard to measure or quantify. Such contextual similarity is used by an implementation for text segmentation, the VecTile system, which uses precompiled vector representations of words to produce similarity curves over texts. The performance of this system is shown to improve over that of the TextTiling algorithm of Hearst (1997). Key words: text segmentation, information retrieval, cohesion, singular value decomposition.
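The idea of a similarity curve built from precompiled word vectors can be sketched as follows. This is an illustration under stated assumptions, not the VecTile implementation: the `TOY_VECTORS` table is invented, each block is represented by the sum of its word vectors, adjacent blocks are compared by cosine similarity, and the lowest point of the curve is taken as a candidate topic boundary.

```python
import math

# Hypothetical precompiled context vectors; a real system would derive
# them from a training corpus.
TOY_VECTORS = {
    "rain": [1.0, 0.0], "storm": [0.9, 0.1], "cloud": [0.8, 0.2],
    "loan": [0.0, 1.0], "bank": [0.1, 0.9], "credit": [0.2, 0.8],
}
DIMENSIONS = 2

def block_vector(words):
    """Sum the vectors of all known words in a block of text."""
    total = [0.0] * DIMENSIONS
    for word in words:
        vec = TOY_VECTORS.get(word)
        if vec is not None:
            total = [t + x for t, x in zip(total, vec)]
    return total

def cosine(u, v):
    """Cosine similarity, returning 0.0 when either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def similarity_curve(tokens, window):
    """Compare the `window` tokens before and after each gap position."""
    curve = []
    for gap in range(window, len(tokens) - window + 1):
        left = block_vector(tokens[gap - window:gap])
        right = block_vector(tokens[gap:gap + window])
        curve.append((gap, cosine(left, right)))
    return curve

tokens = ["rain", "storm", "cloud", "loan", "bank", "credit"]
curve = similarity_curve(tokens, window=2)
# The gap with the lowest similarity is the candidate boundary.
boundary = min(curve, key=lambda item: item[1])[0]
print(curve, boundary)
```

With these toy vectors the curve dips at the gap between "cloud" and "loan", where the weather vocabulary gives way to the finance vocabulary.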

