Document Structure
Computational Linguistics, 2003
Cited by 30 (8 self)
... document structure can be seen as an extension of Nunberg's 'text-grammar'; it is also closely related to 'logical' mark-up in languages like HTML and LaTeX. We show that by using this intermediate representation, several subtasks in language generation and language understanding can be defined more cleanly.
Probabilistic Head-Driven Parsing for Discourse Structure
2005
Cited by 13 (4 self)
We describe a data-driven approach to building interpretable discourse structures for appointment scheduling dialogues. We represent discourse structures as headed trees and model them with probabilistic head-driven parsing techniques. We show that dialogue-based features regarding turn-taking and domain-specific goals have a large positive impact on performance.
Annotation for and Robust Parsing of Discourse Structure on Unrestricted Texts
Cited by 3 (1 self)
Predicting discourse structure on naturally occurring texts and dialogs is challenging and computationally intensive. Attempts to construct hand-built systems have run into problems both in how to specify the required knowledge and how to perform the necessary computations efficiently. Data-driven approaches have recently been shown to handle challenging aspects of discourse successfully without relying on fine-grained semantic detail, but they require annotated material for training. We describe our effort to annotate Segmented Discourse Representation Structures on Wall Street Journal texts, arguing that graph-based representations are necessary for adequately capturing the dependencies found in the data. We then explore two data-driven parsing strategies for recovering discourse structures. We show that the generative PCFG model of B&L is inherently limited by its inability to incorporate new features when learning from small data sets, and we show how recent developments in dependency parsing and discriminative learning can be used to get around this problem and thereby improve parsing accuracy. Results from exploratory experiments on Verbmobil dialogs and our annotated newswire texts are given; they suggest that these methods do enhance performance and that richer feature sets could yield significant further improvements.
Automatic Classification of Discourse Markers on the Basis of Their Co-Occurrences
2003
A long-standing linguistic hypothesis asserts that the meanings of words are related to the contexts in which they appear (Miller and Charles 1991). This paper explores this hypothesis by showing that co-occurrences of discourse markers reflect the meanings of the discourse markers themselves. An experiment in classifying discourse markers by their semantic class, e.g. temporal or causal, was carried out, achieving an accuracy level of approximately 75%. Analysis shows that performance differs widely across classes: temporal and negative polarity markers are classified most accurately, while additive and hypothetical markers are classified poorly.

