Results 1 - 6 of 6
Tracking Point of View in Narrative
- Computational Linguistics, 1994
Cited by 49 (10 self)
This paper presents this algorithm, gives demonstrations of an implemented system, and describes the results of some preliminary empirical studies, which lend support to the algorithm.
Cohesion and Collocation: Using Context Vectors in Text Segmentation
- In Proceedings of the 37th Annual Meeting of the Association for Computational Linguistics (Student Session), 1999
Cited by 16 (0 self)
Collocational word similarity is considered a source of text cohesion that is hard to measure and quantify. The work presented here explores the use of information from a training corpus in measuring word similarity and evaluates the method in the text segmentation task. An implementation, the VecTile system, produces similarity curves over texts using pre-compiled vector representations of the contextual behavior of words. The performance of this system is shown to improve over that of the purely string-based TextTiling algorithm (Hearst, 1997).
1 Background
The notion of text cohesion rests on the intuition that a text is "held together" by a variety of internal forces. Much of the relevant linguistic literature is indebted to Halliday and Hasan (1976), where cohesion is defined as a network of relationships between locations in the text, arising from (i) grammatical factors (co-reference, use of pro-forms, ellipsis and sentential connectives), and (ii) lexical factors (reiteration and collocation). Subsequent work has further developed this taxonomy (Hoey, 1991) and explored its implications in such areas as paragraphing (Longacre, 1979; Bond and Hayes, 1984; Stark, 1988), relevance (Sperber and Wilson, 1995) and discourse structure (Grosz and Sidner, 1986).
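The idea of a similarity curve computed from word context vectors, as described in the abstract of this entry, can be illustrated with a minimal sketch. It is not the system from the paper: the function names, the window size, and the hand-made toy vectors are all assumptions made for illustration, whereas the paper compiles its vectors from a training corpus. The sketch sums the vectors of the words on each side of a candidate gap and scores the gap by cosine similarity; low points on the curve suggest a possible segment boundary.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(dim, 0.0) for dim, weight in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def block_vector(tokens, word_vectors):
    """Sum the context vectors of all tokens in a block (unknown words are skipped)."""
    total = {}
    for token in tokens:
        for dim, weight in word_vectors.get(token, {}).items():
            total[dim] = total.get(dim, 0.0) + weight
    return total


def similarity_curve(tokens, word_vectors, window=2):
    """Score each gap by the similarity of the blocks to its left and right."""
    curve = []
    for gap in range(window, len(tokens) - window + 1):
        left = block_vector(tokens[gap - window:gap], word_vectors)
        right = block_vector(tokens[gap:gap + window], word_vectors)
        curve.append((gap, cosine(left, right)))
    return curve


# Hypothetical toy vectors; a real system would derive these from a corpus.
toy_vectors = {
    "cat": {"animal": 1.0},
    "dog": {"animal": 1.0},
    "stock": {"finance": 1.0},
    "bond": {"finance": 1.0},
}
toy_tokens = ["cat", "dog", "cat", "stock", "bond", "stock"]
toy_curve = similarity_curve(toy_tokens, toy_vectors, window=2)
# The gap with the lowest score is the most likely boundary in this toy text.
likely_boundary = min(toy_curve, key=lambda pair: pair[1])[0]
```

In the toy text the lowest score falls at the gap between the animal words and the finance words, which is the behaviour a segmenter built on such a curve would rely on.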
The influence of layout on the interpretation of referring expressions
- Multidisciplinary Approaches to Discourse. Amsterdam & Nodus Publications, pages 133--141. Presented at the Multidisciplinary Approaches to Discourse (MAD) workshop, August 2001, 2001
Cited by 5 (1 self)
The division of text into visual segments such as sentences, paragraphs and sections achieves many
Automatic paragraph identification: A study across languages and domains
- In Proceedings of the Conference on Empirical Methods in Natural Language Processing, 2004
Cited by 3 (1 self)
In this paper we investigate whether paragraphs can be identified automatically in different languages and domains. We propose a machine learning approach which exploits textual and discourse cues and we assess how well humans perform on this task. Our best models achieve an accuracy that is significantly higher than the best baseline and, for most data sets, comes to within 6% of human performance.
Broad coverage paragraph segmentation across languages and domains
- ACM Trans. Speech Lang. Process., 2006
Cited by 3 (0 self)
This paper considers the problem of automatic paragraph segmentation. The task is relevant for speech-to-text applications whose output transcripts do not usually contain punctuation or paragraph indentation and are naturally difficult to read and process. Text-to-text generation applications (e.g., summarisation) could also benefit from an automatic paragraph segmentation mechanism which indicates topic shifts and provides visual targets to the reader. We present a paragraph segmentation model which exploits a variety of knowledge sources (including textual cues, syntactic and discourse-related information) and evaluate its performance in different languages and domains. Our experiments demonstrate that the proposed approach significantly outperforms our baselines and in many cases comes to within a few percent of human performance. Finally, we integrate our method with a single-document summariser and show that it is useful for structuring the output of automatically generated text.

