Supplementing Entity Coherence with Local Rhetorical Relations for Information Ordering
Cited by 3 (0 self)
This paper investigates whether the model of local rhetorical coherence suggested in Knott et al. (2001) can boost the performance of the Centering-based metrics of entity coherence employed by Karamanis et al. (2004) for the task of information ordering. Rhetorical coherence is integrated into the way Centering’s basic data structures are derived from the annotated features of the GNOME corpus. The results indicate that (a) the simplest metric continues to perform better than its competitors even when local rhetorical coherence is taken into account, and (b) this extra coherence constraint decreases its performance. Keywords: Information Ordering, Centering Theory, Rhetorical Coherence.
Content Modeling Using Latent Permutations
Cited by 3 (2 self)
We present a novel Bayesian topic model for learning discourse-level document structure. Our model leverages insights from discourse theory to constrain latent topic assignments in a way that reflects the underlying organization of document topics. We propose a global model in which both topic selection and ordering are biased to be similar across a collection of related documents. We show that this space of orderings can be effectively represented using a distribution over permutations called the Generalized Mallows Model. We apply our method to three complementary discourse-level tasks: cross-document alignment, document segmentation, and information ordering. Our experiments show that incorporating our permutation-based model in these applications yields substantial improvements in performance over previously proposed methods.
Coreference Systems based on Kernels Methods
Cited by 3 (0 self)
Various types of structural information – e.g., about the type of constructions in which binding constraints apply, or about the structure of names – play a central role in coreference resolution, often in combination with lexical information (as in expletive detection). Kernel functions appear to be a promising candidate to capture structure-sensitive similarities and complex feature combinations, but care is required to ensure they are exploited in the best possible fashion. In this paper we propose kernel functions for three subtasks of coreference resolution – binding constraint detection, expletive identification, and aliasing – together with an architecture to integrate them within the standard framework for coreference resolution.
Automatic Evaluation of Linguistic Quality in Multi-Document Summarization
Cited by 3 (0 self)
To date, few attempts have been made to develop and validate methods for automatic evaluation of linguistic quality in text summarization. We present the first systematic assessment of several diverse classes of metrics designed to capture various aspects of well-written text. We train and test linguistic quality models on consecutive years of NIST evaluation data in order to show the generality of results. For grammaticality, the best results come from a set of syntactic features. Focus, coherence and referential clarity are best evaluated by a class of features measuring local coherence on the basis of cosine similarity between sentences, coreference information, and summarization-specific features. Our best results are 90% accuracy for pairwise comparisons of competing systems over a test set of several inputs and 70% for ranking summaries of a specific input.
Entity-based local coherence modelling using topological fields
Cited by 3 (0 self)
One goal of natural language generation is to produce coherent text that presents information in a logical order. In this paper, we show that topological fields, which model high-level clausal structure, are an important component of local coherence in German. First, we show in a sentence ordering experiment that topological field information improves the entity grid model of Barzilay and Lapata (2008) more than grammatical role and simple clausal order information do, particularly when manual annotations of this information are not available. Then, we incorporate the model enhanced with topological fields into a natural language generation system that generates constituent orders for German text, and show that the added coherence component improves performance slightly, though not statistically significantly.
Cognitively motivated features for readability assessment
In European Conference for Computational Linguistics (EACL), 2009
Cited by 3 (1 self)
We investigate linguistic features that correlate with the readability of texts for adults with intellectual disabilities (ID). Based on a corpus of texts (including some experimentally measured for comprehension by adults with ID), we analyze the significance of novel discourse-level features related to the cognitive factors underlying our users’ literacy challenges. We develop and evaluate a tool for automatically rating the readability of texts for these users. Our experiments show that our discourse-level, cognitively-motivated features improve automatic readability assessment.
Data-Driven Response Generation in Social Media
Cited by 3 (2 self)
We present a data-driven approach to generating responses to Twitter status posts, based on phrase-based Statistical Machine Translation. We find that mapping conversational stimuli onto responses is more difficult than translating between languages, due to the wider range of possible responses, the larger fraction of unaligned words/phrases, and the presence of large phrase pairs whose alignment cannot be further decomposed. After addressing these challenges, we compare approaches based on SMT and Information Retrieval in a human evaluation. We show that SMT outperforms IR on this task, and its output is preferred over actual human responses in 15% of cases. As far as we are aware, this is the first work to investigate the use of phrase-based SMT to directly translate a linguistic stimulus into an appropriate response.
Deciding on Units of Analysis within Centering Theory
Cited by 2 (2 self)
Many efforts in corpora annotation start with segmenting discourse into units of analysis. In this paper, we present a method for deciding on segmentation units within Centering Theory (Grosz, Joshi, & Weinstein, 1995). We survey the different existing methods to break down discourse into utterances and discuss the results of a comparison study among them. The contribution of our study is that it was carried out with spoken data and in two different languages (English and Spanish). Our comparison suggests that the best unit of analysis for Centering-based annotation is the finite clause. The final result is a set of guidelines for how to segment discourse for Centering analysis, which is also potentially applicable to other analyses.
Incremental Text Structuring with Online Hierarchical Ranking
Cited by 2 (0 self)
Many emerging applications require documents to be repeatedly updated. Such documents include newsfeeds, webpages, and shared community resources such as Wikipedia. In this paper we address the task of inserting new information into existing texts. In particular, we wish to determine the best location in a text for a given piece of new information. For this process to succeed, the insertion algorithm should be informed by the existing document structure. Lengthy real-world texts are often hierarchically organized into chapters, sections, and paragraphs. We present an online ranking model which exploits this hierarchical structure – representationally in its features and algorithmically in its learning procedure. When tested on a corpus of Wikipedia articles, our hierarchically informed model predicts the correct insertion paragraph more accurately than baseline methods.
Automatic Factual Question Generation from Text
Cited by 2 (0 self)
Texts with potential educational value are becoming available through the Internet (e.g., Wikipedia, news services). However, using these new texts in classrooms introduces many challenges, one of which is that they usually lack practice exercises and assessments. Here, we address part of this challenge by automating the creation of a specific type of assessment item. Specifically, we focus on automatically generating factual WH questions. Our goal is to create an automated system that can take as input a text and produce as output questions for assessing a reader’s knowledge of the information in the text. The questions could then be presented to a teacher, who could select and revise the ones that he or she judges to be useful. After introducing the problem, we describe some of the computational and linguistic challenges presented by factual question generation. We then present an implemented system that leverages existing natural language processing techniques to address some of these challenges. The system uses a combination of manually encoded transformation rules and a statistical question ranker trained on a tailored dataset of labeled system output. We present experiments that evaluate individual components of the system as well as the system as a whole. We found, among other things, that the question ranker roughly doubled the acceptability

