A Common Theory of Information Fusion from Multiple Text Sources Step One: Cross-Document Structure
, 2000
Cited by 41 (11 self)
We introduce CST (cross-document structure theory), a paradigm for multi-document analysis. CST takes into account the rhetorical structure of clusters of related textual documents. We present a taxonomy of cross-document relationships. We argue that CST can be the basis for multi-document summarization guided by user preferences for summary length, information provenance, cross-source agreement, and chronological ordering of facts.
Learning content selection rules for generating object descriptions in dialogue
- Journal of Artificial Intelligence Research
, 2005
Cited by 30 (1 self)
A fundamental requirement of any task-oriented dialogue system is the ability to generate object descriptions that refer to objects in the task domain. The subproblem of content selection for object descriptions in task-oriented dialogue has been the focus of much previous work and a large number of models have been proposed. In this paper, we use the annotated COCONUT corpus of task-oriented design dialogues to develop feature sets based on Dale and Reiter’s (1995) incremental model, Brennan and Clark’s (1996) conceptual pact model, and Jordan’s (2000b) intentional influences model, and use these feature sets in a machine learning experiment to automatically learn a model of content selection for object descriptions. Since Dale and Reiter’s model requires a representation of discourse structure, the corpus annotations are used to derive a representation based on Grosz and Sidner’s (1986) theory of the intentional structure of discourse, as well as two very simple representations of discourse structure based purely on recency. We then apply the rule-induction program RIPPER to train and test the content selection component of an object description generator on a set of 393 object descriptions from the corpus. To our
Customization in a Unified Framework for Summarizing Medical Literature
, 2005
Cited by 16 (2 self)
Objectives: We present the summarization system in the PERSIVAL medical digital library. Although we discuss the context of our summarization research within the PERSIVAL platform, the primary focus of this article is on strategies to define and generate customized summaries.
Corpus-Trained Text Generation for Summarization
, 2002
Cited by 7 (0 self)
We explore how machine learning can be employed to learn rulesets for the traditional modules of content planning and surface realization. Our approach takes advantage of semantically annotated corpora to induce preferences for content planning and constraints on realizations of these plans. We applied this methodology to an annotated corpus of indicative summaries to derive constraint rules that can assist in generating summaries for new, unseen material.
Semi-Supervised Named Entity Recognition: Learning to Recognize 100 Entity Types with Little Supervision
Cited by 4 (0 self)
Table of contents: List of tables; List of figures; Abstract
Report on the CONALD Workshop on Learning from Text and the Web
- of Intelligent Systems, J. Stefan Inst., Jamova
, 1998
Cited by 3 (0 self)
Moo], organization and presentation of documents in information retrieval systems [GS, Hof], collaborative filtering [dVN], lexicon learning [GBGH], query reformulation [KK], text generation [Rad] and analysis of the statistical properties of text [MA]. In short, the state of the art in learning from text and the web is that a broad range of methods are currently being applied to many important and interesting tasks. There remain numerous open research questions, however. Broadly, the goals of the work presented at the workshop fall into two overlapping categories: (i) making textual information available in a structured format so that it can be used for complex queries and problem solving, and (ii) assisting users in finding, organizing and managing information represented in text sources. As an example of research aimed at the former goal, Muslea, Minton and Knoblock [MMK] have developed an approach to learning wrappers for semi-structured Web sources, such as restau
Which Session: G
, 2000
Under consideration for other conferences (specify)? NO We introduce CST (cross-document structure theory), a paradigm for multi-document analysis. CST takes into account the rhetorical structure of clusters of related textual documents. We present a taxonomy of cross-document relationships. We argue that CST can be the basis for multi-document summarization guided by user preferences for summary length, information provenance, cross-source agreement, and chronological ordering of facts. ACL-411
A Description of the CIDR System as Used for TDT-2
- In DARPA Broadcast News Workshop
, 1999
We describe several experimental parameters and a parallelization technique used in our online document clustering system, CIDR. These modifications were introduced into CIDR to reduce the running time so that incoming documents can be clustered in almost real time. We discuss how several of these parameters are justified on linguistic grounds and report preliminary quantitative results on the effects that these parameters have on speed and accuracy.
1. INTRODUCTION
We report our experience with the development and testing of CIDR, a system for the automated placement of text documents into topical clusters. Our focus in CIDR is somewhat unusual. We have started from the assumption that our clustering system should aim for maximal efficiency, so that it will be able to classify tens of thousands of documents in real time. This puts a premium on operational speed rather than classification accuracy, and raises a number of interesting research questions, namely, what modifications to a sta...

