Unsupervised Learning of Narrative Event Chains
Cited by 29 (3 self)
Hand-coded scripts were used in the 1970s and 1980s as knowledge backbones that enabled inference and other NLP tasks requiring deep semantic knowledge. We propose unsupervised induction of similar schemata called narrative event chains from raw newswire text. A narrative event chain is a partially ordered set of events related by a common protagonist. We describe a three-step process for learning narrative event chains. The first step uses unsupervised distributional methods to learn narrative relations between events sharing coreferring arguments. The second applies a temporal classifier to partially order the connected events. Finally, the third prunes and clusters self-contained chains from the space of events. We introduce two evaluations: the narrative cloze to evaluate event relatedness, and an order coherence task to evaluate narrative order. We show a 36% improvement over baseline for narrative prediction and 25% for temporal coherence.
Learning Semantic Correspondences with Less Supervision
Cited by 25 (3 self)
A central problem in grounded language acquisition is learning the correspondences between a rich world state and a stream of text which references that world state. To deal with the high degree of ambiguity present in this setting, we present a generative model that simultaneously segments the text into utterances and maps each utterance to a meaning representation grounded in the world state. We show that our model generalizes across three domains of increasing difficulty: Robocup sportscasting, weather forecasts (a new domain), and NFL recaps.
Discourse generation using utility-trained coherence models
In Proc. ACL-06, 2006
Cited by 16 (0 self)
We describe a generic framework for integrating various stochastic models of discourse coherence in a manner that takes advantage of their individual strengths. Algorithms for searching and training these stochastic coherence models are an integral part of this framework. We evaluate the performance of our models and algorithms and show empirically that utility-trained log-linear coherence models outperform each of the individual coherence models considered.
The Hong Kong Polytechnic University at DUC2005
In Proceedings of the Document Understanding Conference (DUC 2005), 2005
Cited by 11 (1 self)
This paper discusses the query-based multi-document summarization techniques implemented by the Hong Kong Polytechnic University at DUC 2005. The summarization system is built under the framework of MEAD. In addition to borrowing the features provided by MEAD for text summarization, including centroid and sentence length, we also introduce entity-based, pattern-based, term-based and semantic-based features, in particular for query relevance judgment. This is our first participation in DUC, and the evaluation results are encouraging. Our system ranks competitively in DUC 2005, especially in ROUGE evaluations.
The Embra System at DUC 2005: Query-oriented Multi-document Summarization with a Very Large Latent Semantic Space
In Proceedings of the Document Understanding Conference (DUC) 2005, 2005
Cited by 10 (2 self)
We present the Embra system, a first-time entry to DUC for 2005 which performed at or above median for the manual assessment of responsiveness and on 4 out of 5 linguistic quality questions. The system takes a novel approach to relevance and redundancy, modeling sentence similarity using a latent semantic space constructed over a very large corpus. We present a simple approach to modeling specificity based on named entities which shows a small improvement over baseline. Finally, we discuss coherence and present a sentence reordering algorithm with a component-level evaluation demonstrating a positive effect.
A Unified Local and Global Model for Discourse Coherence
Cited by 9 (1 self)
NOTE TO READERS: We have recently detected a software bug which affects the results of our standalone entity grid experiments. (The bug was in our syntactic analysis code, which incorrectly failed to label the second object of a conjoint VP; in the phrase “wash the dishes and clean the sink”, ‘dishes’ would be correctly labeled as O but ‘sink’ mislabeled as X.) This bug happened to have an unfortunate interaction with the “This is preliminary information” preamble mentioned in section 5. The results in table 2 above the line are incorrect; our relaxed entity grid does not outperform the naive grid on the discriminative test. This implies that our argument motivating the relaxed model at the end of section 2 is misguided. The design and performance of the joint model are unaffected. We present a model for discourse coherence which combines the local entity-based approach of (Barzilay and Lapata, 2005) and the HMM-based content model
Learning to say it well: Reranking realizations by predicted synthesis quality
In Proceedings of COLING-ACL 2006, 2006
Cited by 8 (4 self)
This paper presents a method for adapting a language generator to the strengths and weaknesses of a synthetic voice, thereby improving the naturalness of synthetic speech in a spoken language dialogue system. The method trains a discriminative reranker to select paraphrases that are predicted to sound natural when synthesized. The ranker is trained on realizer and synthesizer features in supervised fashion, using human judgements of synthetic voice quality on a sample of the paraphrases representative of the generator’s capability. Results from a cross-validation study indicate that discriminative paraphrase reranking can achieve substantial improvements in naturalness on average, ameliorating the problem of highly variable synthesis quality typically encountered with today’s unit selection synthesizers.
Revisiting Readability: A Unified Framework for Predicting Text Quality
Cited by 8 (1 self)
We combine lexical, syntactic, and discourse features to produce a highly predictive model of human readers’ judgments of text readability. This is the first study to take into account such a variety of linguistic factors and the first to empirically demonstrate that discourse relations are strongly associated with the perceived quality of text. We show that various surface metrics generally expected to be related to readability are not very good predictors of readability judgments in our Wall Street Journal corpus. We also establish that readability predictors behave differently depending on the task: predicting text readability or ranking the readability. Our experiments indicate that discourse relations are the one class of features that exhibits robustness across these two tasks.
A review of recent corpus-based methods for evaluating information ordering in text production
In Proceedings of Corpus Linguistics 2005 Workshop on Using Corpora for NLG, 2005
Cited by 4 (2 self)
This paper surveys the corpus-based methods for evaluating Information Ordering (IO) which emerged recently in the literature on text production. First, we discuss how different assumptions about the input to IO make the preparation of corpora suitable for evaluation more challenging in Natural Language Generation than in Automatic Multidocument Summarisation. Then, we present the types of corpora and performance measures employed by the reviewed work, emphasising the considerable consensus that has emerged in these two aspects of automatic evaluation of IO.
First steps towards dialogue modeling from an un-annotated human-human corpus
In 5th Workshop on Knowledge and Reasoning in Practical Dialogue Systems, 2007
Cited by 4 (1 self)
Virtual human characters equipped with natural language dialogue capability have proved useful in many fields, such as simulation training and interactive games. Behind such dialogue managers there generally lies a complex, knowledge-rich, rule-based system. Building such a system involves meticulous annotation of data and hand authoring of rules. In this paper we build a statistical dialogue model from role-play and Wizard of Oz dialogue corpora with virtually no annotation. We compare these methods with the traditional approaches. We have evaluated these systems for perceived appropriateness of response, and the results are presented here.

