Learning correlations between linguistic indicators and semantic constraints: Reuse of context-dependent descriptions of entities (1998)

by Dragomir R Radev
Venue: In Proc. of COLING/ACL 98

Results 1 - 9 of 9

A Common Theory of Information Fusion from Multiple Text Sources Step One: Cross-Document Structure

by Dragomir R. Radev, 2000
Cited by 41 (11 self)
We introduce CST (cross-document structure theory), a paradigm for multi-document analysis. CST takes into account the rhetorical structure of clusters of related textual documents. We present a taxonomy of cross-document relationships. We argue that CST can be the basis for multi-document summarization guided by user preferences for summary length, information provenance, cross-source agreement, and chronological ordering of facts.

Learning content selection rules for generating object descriptions in dialogue

by Pamela W. Jordan, Marilyn A. Walker - Journal of Artificial Intelligence Research, 2005
Cited by 30 (1 self)
A fundamental requirement of any task-oriented dialogue system is the ability to generate object descriptions that refer to objects in the task domain. The subproblem of content selection for object descriptions in task-oriented dialogue has been the focus of much previous work and a large number of models have been proposed. In this paper, we use the annotated COCONUT corpus of task-oriented design dialogues to develop feature sets based on Dale and Reiter’s (1995) incremental model, Brennan and Clark’s (1996) conceptual pact model, and Jordan’s (2000b) intentional influences model, and use these feature sets in a machine learning experiment to automatically learn a model of content selection for object descriptions. Since Dale and Reiter’s model requires a representation of discourse structure, the corpus annotations are used to derive a representation based on Grosz and Sidner’s (1986) theory of the intentional structure of discourse, as well as two very simple representations of discourse structure based purely on recency. We then apply the rule-induction program RIPPER to train and test the content selection component of an object description generator on a set of 393 object descriptions from the corpus. To our

Customization in a Unified Framework for Summarizing Medical Literature

by N. Elhadad, M.-Y. Kan, J.L. Klavans, K.R. McKeown, 2005
Cited by 16 (2 self)
Objectives: We present the summarization system in the PERSIVAL medical digital library. Although we discuss the context of our summarization research within the PERSIVAL platform, the primary focus of this article is on strategies to define and generate customized summaries.

Corpus-Trained Text Generation for Summarization

by Min-Yen Kan, Kathleen R. McKeown, 2002
Cited by 7 (0 self)
We explore how machine learning can be employed to learn rulesets for the traditional modules of content planning and surface realization. Our approach takes advantage of semantically annotated corpora to induce preferences for content planning and constraints on realizations of these plans. We applied this methodology to an annotated corpus of indicative summaries to derive constraint rules that can assist in generating summaries for new, unseen material.

Semi-Supervised Named Entity Recognition: Learning to Recognize 100 Entity Types with Little Supervision

by David Nadeau
Cited by 4 (0 self)

Report on the CONALD Workshop on Learning from Text and the Web

by Jaime Carbonell, Mark Craven, Steve Fienberg, Tom Mitchell, Yiming Yang - of Intelligent Systems, J. Stefan Inst., Jamova, 1998
Cited by 3 (0 self)
Moo], organization and presentation of documents in information retrieval systems [GS, Hof], collaborative filtering [dVN], lexicon learning [GBGH], query reformulation [KK], text generation [Rad] and analysis of the statistical properties of text [MA]. In short, the state of the art in learning from text and the web is that a broad range of methods are currently being applied to many important and interesting tasks. There remain numerous open research questions, however. Broadly, the goals of the work presented at the workshop fall into two overlapping categories: (i) making textual information available in a structured format so that it can be used for complex queries and problem solving, and (ii) assisting users in finding, organizing and managing information represented in text sources. As an example of research aimed at the former goal, Muslea, Minton and Knoblock [MMK] have developed an approach to learning wrappers for semi-structured Web sources, such as restau

Which Session: G

by Dragomir R. Radev, 2000
We introduce CST (cross-document structure theory), a paradigm for multi-document analysis. CST takes into account the rhetorical structure of clusters of related textual documents. We present a taxonomy of cross-document relationships. We argue that CST can be the basis for multi-document summarization guided by user preferences for summary length, information provenance, cross-source agreement, and chronological ordering of facts.

A Description Of The Cidr System As Used For Tdt-2

by Dragomir Radev, Vasileios Hatzivassiloglou, Kathleen R. McKeown - In DARPA Broadcast News Workshop, 1999
We describe several experimental parameters and a parallelization technique used in our online document clustering system, CIDR. These modifications were introduced into CIDR to reduce the running time so that incoming documents can be clustered in almost real time. We discuss how several of these parameters are justified on linguistic grounds and report preliminary quantitative results on the effects that these parameters have on speed and accuracy. 1. INTRODUCTION We report our experience with the development and testing of CIDR, a system for the automated placement of text documents into topical clusters. Our focus in CIDR is somewhat unusual. We have started from the assumption that our clustering system should aim for maximal efficiency, so that it will be able to classify tens of thousands of documents in real time. This puts a premium on operational speed rather than classification accuracy, and raises a number of interesting research questions, namely, what modifications to a sta...

Acknowledgements Participants

by Robert Dale, Michael White
Abstract not found