TextTiling: Segmenting text into multi-paragraph subtopic passages
Computational Linguistics, 1997
Cited by 275 (1 self)
TextTiling is a technique for subdividing texts into multi-paragraph units that represent passages, or subtopics. The discourse cues for identifying major subtopic shifts are patterns of lexical co-occurrence and distribution. The algorithm is fully implemented and is shown to produce segmentation that corresponds well to human judgments of the subtopic boundaries of 12 texts. Multi-paragraph subtopic segmentation should be useful for many text analysis tasks, including information retrieval and summarization.
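To make the idea of lexical co-occurrence as a boundary cue concrete, here is a minimal sketch. It is not the published TextTiling procedure: it simply scores each gap between adjacent paragraphs by the cosine similarity of their word counts and hypothesizes a boundary where the score is low. The tokenizer, the function names, and the `threshold` value are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def gap_scores(paragraphs: list[str]) -> list[float]:
    """Lexical similarity across each gap between adjacent paragraphs."""
    # Assumed tokenizer: lowercase alphabetic runs, no stop-word removal.
    bags = [Counter(re.findall(r"[a-z]+", p.lower())) for p in paragraphs]
    return [cosine(bags[i], bags[i + 1]) for i in range(len(bags) - 1)]


def boundaries(paragraphs: list[str], threshold: float = 0.1) -> list[int]:
    """Indices i where a subtopic boundary is hypothesized after paragraph i.

    The fixed threshold is an illustrative assumption, not a tuned value.
    """
    return [i for i, s in enumerate(gap_scores(paragraphs)) if s < threshold]


if __name__ == "__main__":
    demo = [
        "the cat sat on the mat",
        "the cat chased the mat",
        "stocks fell sharply today",
        "stocks rose sharply today",
    ]
    print(gap_scores(demo))
    print(boundaries(demo))
```

In the demo, the only gap with no shared vocabulary is the one between the second and third paragraphs, so that is the single boundary proposed.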
The Rhetorical Parsing, Summarization, and Generation of Natural Language Texts
1997
Cited by 98 (9 self)
This thesis is an inquiry into the nature of the high-level, rhetorical structure of unrestricted natural language texts, computational means to enable its derivation, and two applications (in automatic summarization and natural language generation) that follow from the ability to build such structures automatically. The thesis proposes a first-order formalization of the high-level, rhetorical structure of text. The formalization assumes that text can be sequenced into elementary units; that discourse relations hold between textual units of various sizes; that some textual units are more important to the writer's purpose than others; and that trees are a good approximation of the abstract structure of text. The formalization also introduces a linguistically motivated compositionality criterion, which is shown to hold for the text structures that are valid. The thesis proposes, analyzes theoretically, and compares empirically four algorithms for determining the valid text structures of ...
Latent Semantic Analysis for Text Segmentation
In Proceedings of EMNLP, 2001
Cited by 44 (1 self)
This paper describes a method for linear text segmentation that is more accurate or at least as accurate as state-of-the-art methods (Utiyama and Isahara, 2001
Segmentation of Expository Texts by Hierarchical Agglomerative Clustering
1997
Cited by 32 (0 self)
We propose a method for segmentation of expository texts based on hierarchical agglomerative clustering. The method uses paragraphs as the basic segments for identifying hierarchical discourse structure in the text, applying lexical similarity between them as the proximity test. Linear segmentation can be induced from the identified structure through application of two simple rules. However, the hierarchy can also be used for intelligent exploration of the text. The proposed segmentation algorithm is evaluated against an accepted linear segmentation method and shows comparable results.
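The clustering step described in this abstract can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: it assumes that only neighbouring clusters may merge, that lexical similarity is the cosine of raw word counts, and that a merged cluster is represented by the sum of its members' counts. The two rules for inducing a linear segmentation are not given in the abstract and are not reproduced here. All function names are hypothetical.

```python
import math
import re
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def adjacent_hac(paragraphs: list[str]):
    """Agglomerative clustering restricted to neighbouring paragraphs.

    Returns the merge history as (left_span, right_span, similarity),
    where a span is an inclusive (first, last) pair of paragraph indices.
    Restricting merges to neighbours is an assumption of this sketch.
    """
    clusters = [
        ((i, i), Counter(re.findall(r"[a-z]+", p.lower())))
        for i, p in enumerate(paragraphs)
    ]
    merges = []
    while len(clusters) > 1:
        # Pick the most lexically similar pair of adjacent clusters.
        best = max(
            range(len(clusters) - 1),
            key=lambda i: cosine(clusters[i][1], clusters[i + 1][1]),
        )
        (left_span, left_bag) = clusters[best]
        (right_span, right_bag) = clusters[best + 1]
        merges.append((left_span, right_span, cosine(left_bag, right_bag)))
        # Replace the pair with one cluster covering both spans.
        clusters[best:best + 2] = [
            ((left_span[0], right_span[1]), left_bag + right_bag)
        ]
    return merges


if __name__ == "__main__":
    demo = [
        "the cat sat on the mat",
        "the cat chased the mat",
        "stocks fell sharply today",
        "stocks rose sharply today",
    ]
    for left, right, sim in adjacent_hac(demo):
        print(left, right, round(sim, 2))
```

The merge history is the hierarchy: early merges join closely related paragraphs, and the last merge, with the lowest similarity, marks the coarsest division of the text.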
Filled Pauses As Markers Of Discourse Structure
1996
Cited by 17 (0 self)
This study aims to test quantitatively whether filled pauses (FPs) may highlight discourse structure. More specifically, it is first investigated whether FPs are more typical in the vicinity of major discourse boundaries. Secondly, the FPs are analyzed acoustically, to check whether those occurring at major discourse boundaries are segmentally and prosodically different from those at shallower breaks. Analyses of twelve spontaneous monologues (Dutch) show that phrases following major discourse boundaries more often contain FPs. Additionally, FPs after stronger breaks tend to occur phrase-initially, whereas the majority of the FPs after weak boundaries are in phrase-internal position. Also, acoustic observations reveal that FPs at major discourse boundaries are both segmentally and prosodically distinct. They also differ with respect to the distribution of neighbouring silent pauses.
Semantics of paragraphs
Computational Linguistics, 1991
Cited by 17 (3 self)
We present a computational theory of the paragraph. Within it we formally define coherence, give semantics to the adversative conjunction "but" and to the Gricean maxim of quantity, and present some new methods for anaphora resolution. The theory precisely characterizes the relationship between the content of the paragraph and background knowledge needed for its understanding. This is achieved by introducing a new type of logical theory consisting of an object level, corresponding to the content of the paragraph, a referential level, which is a new logical level encoding background knowledge, and a metalevel containing constraints on models of discourse (e.g. a formal version of Gricean maxims). We also propose specific mechanisms of interaction between these levels, resembling both classical provability and abduction. Paragraphs are then represented by a class of structures called p-models.
Cohesion and Collocation: Using Context Vectors in Text Segmentation
In Proceedings of the 37th Annual Meeting of the Association for Computational Linguistics (Student Session), 1999
Cited by 16 (0 self)
Collocational word similarity is considered a source of text cohesion that is hard to measure and quantify. The work presented here explores the use of information from a training corpus in measuring word similarity and evaluates the method in the text segmentation task. An implementation, the VecTile system, produces similarity curves over texts using pre-compiled vector representations of the contextual behavior of words. The performance of this system is shown to improve over that of the purely string-based TextTiling algorithm (Hearst, 1997). 1 Background. The notion of text cohesion rests on the intuition that a text is "held together" by a variety of internal forces. Much of the relevant linguistic literature is indebted to Halliday and Hasan (1976), where cohesion is defined as a network of relationships between locations in the text, arising from (i) grammatical factors (co-reference, use of pro-forms, ellipsis and sentential connectives), and (ii) lexical factors (reiteration and collocation). Subsequent work has further developed this taxonomy (Hoey, 1991) and explored its implications in such areas as paragraphing (Longacre, 1979; Bond and Hayes, 1984; Stark, 1988), relevance (Sperber and Wilson, 1995) and discourse structure (Grosz and Sidner, 1986).
Using Cohesion and Coherence Models for Text Summarization
AAAI Symposium Technical Report SS-98-06, 1998
Cited by 10 (0 self)
In this paper we investigate two classes of techniques to determine what is salient in a text, as a means of deciding whether that information should be included in a summary. We introduce three methods based on text cohesion, which models text in terms of relations between words or referring expressions, to help determine how tightly connected the text is. We also describe a method based on text coherence, which models text in terms of macro-level relations between clauses or sentences to help determine the overall argumentative structure of the text. The paper compares salience scores produced by the cohesion and coherence methods and compares them with human judgments. The results show that while the coherence method beats the cohesion methods in accuracy of determining clause salience, the best cohesion method can reach 76% of the accuracy levels of the coherence method in determining salience. Further, two of the cohesion methods each yield significant positive correlations with the human salience judgments. We also compare the types of discourse-related text structure discovered by cohesion and coherence methods.
A model of revision in natural language generation
In 24th Annual Meeting of the Association for Computational Linguistics, 1986
Cited by 9 (0 self)
We outline a model of generation with revision, focusing on improving textual coherence. We argue that high-quality text is more easily produced by iteratively revising and regenerating, as people do, rather than by using an architecturally more complex single-pass generator. As a general area of study, the revision process presents interesting problems: Recognition of flaws in text requires a descriptive theory of what constitutes well-written prose and a parser which can build a representation in those terms. Improving text requires associating flaws with strategies for improvement. The strategies, in turn, need to know what adjustments to the decisions made during the initial generation will produce appropriate modifications to the text. We compare our treatment of revision with those of Mann and Moore (1981), Gabriel (1984), and Mann (1983).

