Results 1 - 10
of
21
Unsupervised topic modelling for multi-party spoken discourse
- In COLING-ACL 2006
, 2006
"... We present a method for unsupervised topic modelling which adapts methods used in document classification (Blei et al., 2003; Griffiths and Steyvers, 2004) to unsegmented multi-party discourse transcripts. We show how Bayesian inference in this generative model can be used to simultaneously address ..."
Abstract
-
Cited by 42 (8 self)
- Add to MetaCart
We present a method for unsupervised topic modelling which adapts methods used in document classification (Blei et al., 2003; Griffiths and Steyvers, 2004) to unsegmented multi-party discourse transcripts. We show how Bayesian inference in this generative model can be used to simultaneously address the problems of topic segmentation and topic identification: automatically segmenting multi-party meetings into topically coherent segments with performance which compares well with previous unsupervised segmentation-only methods (Galley et al., 2003) while simultaneously extracting topics which rate highly when assessed for coherence by human judges. We also show that this method appears robust in the face of off-topic dialogue and speech recognition errors. 1
Minimum cut model for spoken lecture segmentation
- In Proceedings of the Annual Meeting of the Association for Computational Linguistics (COLING-ACL 2006
, 2006
"... We consider the task of unsupervised lecture segmentation. We formalize segmentation as a graph-partitioning task that optimizes the normalized cut criterion. Our approach moves beyond localized comparisons and takes into account longrange cohesion dependencies. Our results demonstrate that global a ..."
Abstract
-
Cited by 35 (7 self)
- Add to MetaCart
We consider the task of unsupervised lecture segmentation. We formalize segmentation as a graph-partitioning task that optimizes the normalized cut criterion. Our approach moves beyond localized comparisons and takes into account longrange cohesion dependencies. Our results demonstrate that global analysis improves the segmentation accuracy and is robust in the presence of speech recognition errors. 1
Meeting Structure Annotation: Data and Tools
- In Proceedings of the SIGdial Workshop on Discourse and Dialogue
, 2005
"... We present a set of annotations of hierarchical topic segmentations and action item subdialogues collected over 65 meetings from the ICSI and ISL meeting corpora, designed to support automatic meeting understanding and analysis. We describe an architecture for representing, annotating, and analyzing ..."
Abstract
-
Cited by 17 (8 self)
- Add to MetaCart
We present a set of annotations of hierarchical topic segmentations and action item subdialogues collected over 65 meetings from the ICSI and ISL meeting corpora, designed to support automatic meeting understanding and analysis. We describe an architecture for representing, annotating, and analyzing multi-party discourse, including: an ontology of multimodal discourse, a programming interface for that ontology, and an audiovisual toolkit which facilitates browsing and annotating discourse, as well as visualizing and adjusting features for machine learning tasks. 1
Bayesian Unsupervised Topic Segmentation
"... This paper describes a novel Bayesian approach to unsupervised topic segmentation. Unsupervised systems for this task are driven by lexical cohesion: the tendency of wellformed segments to induce a compact and consistent lexical distribution. We show that lexical cohesion can be placed in a Bayesian ..."
Abstract
-
Cited by 17 (3 self)
- Add to MetaCart
This paper describes a novel Bayesian approach to unsupervised topic segmentation. Unsupervised systems for this task are driven by lexical cohesion: the tendency of wellformed segments to induce a compact and consistent lexical distribution. We show that lexical cohesion can be placed in a Bayesian context by modeling the words in each topic segment as draws from a multinomial language model associated with the segment; maximizing the observation likelihood in such a model yields a lexically-cohesive segmentation. This contrasts with previous approaches, which relied on hand-crafted cohesion metrics. The Bayesian framework provides a principled way to incorporate additional features such as cue phrases, a powerful indicator of discourse structure that has not been previously used in unsupervised segmentation systems. Our model yields consistent improvements over an array of state-of-the-art systems on both text and speech datasets. We also show that both an entropy-based analysis and a well-known previous technique can be derived as special cases of the Bayesian framework. 1 1
Global models of document structure using latent permutations
- In NAACL’09
, 2009
"... We present a novel Bayesian topic model for learning discourse-level document structure. Our model leverages insights from discourse theory to constrain latent topic assignments in a way that reflects the underlying organization of document topics. We propose a global model in which both topic selec ..."
Abstract
-
Cited by 12 (1 self)
- Add to MetaCart
We present a novel Bayesian topic model for learning discourse-level document structure. Our model leverages insights from discourse theory to constrain latent topic assignments in a way that reflects the underlying organization of document topics. We propose a global model in which both topic selection and ordering are biased to be similar across a collection of related documents. We show that this space of orderings can be elegantly represented using a distribution over permutations called the generalized Mallows model. Our structureaware approach substantially outperforms alternative approaches for cross-document comparison and single-document segmentation. 1 1
Exploiting Conversation Structure in Unsupervised Topic Segmentation for
"... This work concerns automatic topic segmentation of email conversations. We present a corpus of email threads manually annotated with topics, and evaluate annotator reliability. To our knowledge, this is the first such email corpus. We show how the existing topic segmentation models (i.e., Lexical Ch ..."
Abstract
-
Cited by 4 (4 self)
- Add to MetaCart
This work concerns automatic topic segmentation of email conversations. We present a corpus of email threads manually annotated with topics, and evaluate annotator reliability. To our knowledge, this is the first such email corpus. We show how the existing topic segmentation models (i.e., Lexical Chain Segmenter (LCSeg) and Latent Dirichlet Allocation (LDA)) which are solely based on lexical information, can be applied to emails. By pointing out where these methods fail and what any desired model should consider, we propose two novel extensions of the models that not only use lexical information but also exploit finer level conversation structure in a principled way. Empirical evaluation shows that LCSeg is a better model than LDA for segmenting an email thread into topical clusters and incorporating conversation structure into these models improves the performance significantly. 1
Using Linguistically Motivated Features for Paragraph Boundary Identification
, 2006
"... In this paper we propose a machine-learning approach to paragraph boundary identification which utilizes linguistically motivated features. We investigate the relation between paragraph boundaries and discourse cues, pronominalization and information structure. We test our algorithm on German data a ..."
Abstract
-
Cited by 3 (0 self)
- Add to MetaCart
In this paper we propose a machine-learning approach to paragraph boundary identification which utilizes linguistically motivated features. We investigate the relation between paragraph boundaries and discourse cues, pronominalization and information structure. We test our algorithm on German data and report improvements over three baselines including a reimplementation of Sporleder & Lapata’s (2006) work on paragraph segmentation. An analysis of the features’ contribution suggests an interpretation of what paragraph boundaries indicate and what they depend on.
Prosodic Cues to Discourse Segment Boundaries in Human-Computer Dialogue
- In Proc. of SIGdial
, 2004
"... Theories of discourse structure hypothesize a hierarchical structure of discourse segments, typically tree-structured. While substantial work has been done on identifying and automatically recognizing the textual and prosodic correlates of discourse structure in monologue, comparable cues for dialog ..."
Abstract
-
Cited by 3 (0 self)
- Add to MetaCart
Theories of discourse structure hypothesize a hierarchical structure of discourse segments, typically tree-structured. While substantial work has been done on identifying and automatically recognizing the textual and prosodic correlates of discourse structure in monologue, comparable cues for dialogue or multiparty conversation, and in particular humancomputer dialogue remain relatively less studied. In this paper, we explore prosodic cues to discourse segmentation in humancomputer dialogue. Using data drawn from 60 hours of interactions with a voice-only conversational spoken language system, we identify pitch and intensity features that signal segment boundaries. Specifically, based on 473 pairs of segment-final and segmentinitiating utterances, we find significant increases for segment-initial utterances in maximum pitch, average pitch, and average intensity, while segment-final utterances show significantly lower minimum pitch. These results suggest that even in the artificial environment of human-computer dialogue, prosodic cues robustly signal discourse segment structure, comparably to the contrastive uses of pitch and amplitude identified in natural monologues.
User Requirements Analysis for Meeting Information Retrieval Based on Query Elicitation
"... We present a user requirements study for Question Answering on meeting records that assesses the difficulty of users questions in terms of what type of knowledge is required in order to provide the correct answer. We grounded our work on the empirical analysis of elicited user queries. We found that ..."
Abstract
-
Cited by 3 (0 self)
- Add to MetaCart
We present a user requirements study for Question Answering on meeting records that assesses the difficulty of users questions in terms of what type of knowledge is required in order to provide the correct answer. We grounded our work on the empirical analysis of elicited user queries. We found that the majority of elicited queries (around 60%) pertain to argumentative processes and outcomes. Our analysis also suggests that standard keyword-based Information Retrieval can only deal successfully with less than 20 % of the queries, and that it must be complemented with other types of metadata and inference. 1
Participant Subjectivity and Involvement as a Basis for Discourse Segmentation
"... We propose a framework for analyzing episodic conversational activities in terms of expressed relationships between the participants and utterance content. We test the hypothesis that linguistic features which express such properties, e.g. tense, aspect, and person deixis, are a useful basis for aut ..."
Abstract
-
Cited by 2 (0 self)
- Add to MetaCart
We propose a framework for analyzing episodic conversational activities in terms of expressed relationships between the participants and utterance content. We test the hypothesis that linguistic features which express such properties, e.g. tense, aspect, and person deixis, are a useful basis for automatic intentional discourse segmentation. We present a novel algorithm and test our hypothesis on a set of intentionally segmented conversational monologues. Our algorithm performs better than a simple baseline and as well as or better than well-known lexical-semantic segmentation methods. 1

