Summarizing Scientific Articles - Experiments with Relevance and Rhetorical Status
- Computational Linguistics, 2002
Cited by 103 (2 self)
In this paper we argue that scientific articles require a different summarization strategy than, for instance, news articles. We propose a strategy which concentrates on the rhetorical status of statements in the article: Material for summaries is selected in such a way that summaries can highlight the new contribution of the source paper and situate it with respect to earlier work. We provide a gold standard for summaries of this kind consisting of a substantial corpus of conference articles in computational linguistics with human judgements of rhetorical status and relevance. We present several experiments measuring our judges' agreement on these annotations. We also present an algorithm which, on the basis of the annotated training material, selects content and classifies it into a fixed set of seven rhetorical categories. The output of this extraction and classification system can be viewed as a single-document summary in its own right; alternatively, it can be used to generate task-oriented and user-tailored summaries designed to give users an overview of a scientific field.
Discourse segmentation of multi-party conversation
- In 41st Annual Meeting of ACL, 2003
Cited by 65 (1 self)
We present a domain-independent topic segmentation algorithm for multi-party speech. Our feature-based algorithm combines knowledge about content using a text-based algorithm as a feature and about form using linguistic and acoustic cues about topic shifts extracted from speech. This segmentation algorithm uses automatically induced decision rules to combine the different features. The embedded text-based algorithm builds on lexical cohesion and has performance comparable to state-of-the-art algorithms based on lexical information. A significant error reduction is obtained by combining the two knowledge sources.
Minimum cut model for spoken lecture segmentation
- In Proceedings of the Annual Meeting of the Association for Computational Linguistics (COLING-ACL 2006), 2006
Cited by 35 (7 self)
We consider the task of unsupervised lecture segmentation. We formalize segmentation as a graph-partitioning task that optimizes the normalized cut criterion. Our approach moves beyond localized comparisons and takes into account long-range cohesion dependencies. Our results demonstrate that global analysis improves the segmentation accuracy and is robust in the presence of speech recognition errors.
From frequency to meaning: Vector space models of semantics
- Journal of Artificial Intelligence Research, 2010
Cited by 34 (0 self)
Computers understand very little of the meaning of human language. This profoundly limits our ability to give instructions to computers, the ability of computers to explain their actions to us, and the ability of computers to analyse and process text. Vector space models (VSMs) of semantics are beginning to address these limits. This paper surveys the use of VSMs for semantic processing of text. We organize the literature on VSMs according to the structure of the matrix in a VSM. There are currently three broad classes of VSMs, based on term–document, word–context, and pair–pattern matrices, yielding three classes of applications. We survey a broad range of applications in these three categories and we take a detailed look at a specific open source project in each category. Our goal in this survey is to show the breadth of applications of VSMs for semantics, to provide a new perspective on VSMs for those who are already familiar with the area, and to provide pointers into the literature for those who are less familiar with the field.
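As an illustration of the first matrix class named in this abstract, the following is a minimal sketch of a term–document matrix with raw counts, where two documents are compared by the cosine of their column vectors. The function names (`term_document_matrix`, `column`, `cosine`), the whitespace tokenizer, and the toy documents are assumptions made for this example and are not taken from the surveyed paper; practical systems typically add weighting such as tf-idf.

```python
import math
from collections import Counter


def term_document_matrix(docs):
    """Return (vocab, matrix); matrix[i][j] is the count of term i in document j.

    Tokenization is a plain lowercase whitespace split (an assumption
    made for this sketch).
    """
    counts = [Counter(doc.lower().split()) for doc in docs]
    vocab = sorted(set().union(*counts))
    matrix = [[c[term] for c in counts] for term in vocab]
    return vocab, matrix


def column(matrix, j):
    """Extract the vector for document j (one column of the matrix)."""
    return [row[j] for row in matrix]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


docs = ["the cat sat", "the cat ran", "stocks fell sharply"]
vocab, matrix = term_document_matrix(docs)
# Documents 0 and 1 share two of their three terms; documents 0 and 2 share none.
print(cosine(column(matrix, 0), column(matrix, 1)))
print(cosine(column(matrix, 0), column(matrix, 2)))
```

Documents that share vocabulary get a cosine near 1, and documents with no terms in common get 0.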
A Statistical Model for Domain-Independent Text Segmentation
- In Proceedings of the 9th Conference of the European Chapter of the Association for Computational Linguistics, 2001
Cited by 28 (0 self)
We propose a statistical method that finds the maximum-probability segmentation of a given text. This method does not require training data because it estimates probabilities from the given text. Therefore, it can be applied to any text in any domain. An experiment showed that the method is more accurate than or at least as accurate as a state-of-the-art text segmentation system.
Generating overview summaries of ongoing email thread discussions
- In Proceedings of COLING 2004, 2004
Cited by 25 (0 self)
The tedious task of responding to a backlog of email is one which is familiar to many researchers. As a subset of email management, we address the problem of constructing a summary of email discussions. Specifically, we examine ongoing discussions which will ultimately culminate in a consensus in a decision-making process. Our summary provides a snapshot of the current state of affairs of the discussion and facilitates a speedy response from the user, who might be the bottleneck in some matter being resolved. We present a method which uses the structure of the thread dialogue and word vector techniques to determine which sentence in the thread should be extracted as the main issue. Our solution successfully identifies the sentence containing the issue of the thread being discussed, which is potentially more informative than the subject line.
Topic-based document segmentation with probabilistic latent semantic analysis
- In Proceedings of CIKM (McLean, 2002
Summarising Scientific Articles - Experiments with Relevance and Rhetorical Status
- Computational Linguistics
Cited by 17 (3 self)
Machine (COLING94), S.Tojo 28 9411023 Abstract Generation Based on Rhetorical Structure Extraction (COLING94), K.Ono et al.
Meeting Structure Annotation: Data and Tools
- In Proceedings of the SIGdial Workshop on Discourse and Dialogue, 2005
Cited by 17 (8 self)
We present a set of annotations of hierarchical topic segmentations and action item subdialogues collected over 65 meetings from the ICSI and ISL meeting corpora, designed to support automatic meeting understanding and analysis. We describe an architecture for representing, annotating, and analyzing multi-party discourse, including: an ontology of multimodal discourse, a programming interface for that ontology, and an audiovisual toolkit which facilitates browsing and annotating discourse, as well as visualizing and adjusting features for machine learning tasks.
Bayesian Unsupervised Topic Segmentation
Cited by 17 (3 self)
This paper describes a novel Bayesian approach to unsupervised topic segmentation. Unsupervised systems for this task are driven by lexical cohesion: the tendency of well-formed segments to induce a compact and consistent lexical distribution. We show that lexical cohesion can be placed in a Bayesian context by modeling the words in each topic segment as draws from a multinomial language model associated with the segment; maximizing the observation likelihood in such a model yields a lexically-cohesive segmentation. This contrasts with previous approaches, which relied on hand-crafted cohesion metrics. The Bayesian framework provides a principled way to incorporate additional features such as cue phrases, a powerful indicator of discourse structure that has not been previously used in unsupervised segmentation systems. Our model yields consistent improvements over an array of state-of-the-art systems on both text and speech datasets. We also show that both an entropy-based analysis and a well-known previous technique can be derived as special cases of the Bayesian framework.
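The likelihood idea in this abstract can be illustrated with a deliberately simplified sketch: each candidate segment is scored by the log-likelihood of its words under a multinomial estimated from that segment alone, and the split with the highest total score is kept. This is not the paper's model or inference procedure. The add-alpha smoothed point estimate, the restriction to a single boundary, the exhaustive search, and all names (`segment_log_likelihood`, `best_single_boundary`, `alpha`) are assumptions made for this example.

```python
import math
from collections import Counter


def segment_log_likelihood(sentences, vocab_size, alpha=0.1):
    """Log-likelihood of a segment's words under its own smoothed multinomial.

    Word probabilities are add-alpha estimates from the segment's counts
    (a simplification chosen for this sketch).
    """
    counts = Counter(word for sentence in sentences for word in sentence)
    total = sum(counts.values())
    denom = total + alpha * vocab_size
    return sum(n * math.log((n + alpha) / denom) for n in counts.values())


def best_single_boundary(sentences, alpha=0.1):
    """Return the index b that maximizes the likelihood of sentences[:b] + sentences[b:]."""
    if len(sentences) < 2:
        raise ValueError("need at least two sentences to place a boundary")
    vocab_size = len({word for sentence in sentences for word in sentence})

    def score(b):
        return (segment_log_likelihood(sentences[:b], vocab_size, alpha)
                + segment_log_likelihood(sentences[b:], vocab_size, alpha))

    return max(range(1, len(sentences)), key=score)


# Toy input: two sentences about pets followed by two about finance.
toy = [
    ["cat", "dog", "cat"],
    ["dog", "cat", "dog"],
    ["stock", "bond", "stock"],
    ["bond", "stock", "bond"],
]
print(best_single_boundary(toy))
```

A segment whose words come from a small, consistent vocabulary is assigned a higher likelihood than one that mixes two vocabularies, which is why the boundary falls between the two topics in the toy input.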

