The ICSI Meeting Recorder Dialog Act (MRDA) Corpus
- In Proc. 5th SIGdial Workshop on Discourse and Dialogue, 2004
Cited by 67 (8 self)
We describe a new corpus of over 180,000 hand-annotated dialog act tags and accompanying adjacency pair annotations for roughly 72 hours of speech from 75 naturally-occurring meetings. We provide a brief summary of the annotation system and labeling procedure, inter-annotator reliability statistics, overall distributional statistics, a description of auxiliary files distributed with the corpus, and information on how to obtain the data.
Automatic Dialog Act Segmentation and Classification in Multiparty Meetings
- In Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP), 2005
Cited by 58 (10 self)
We explore the two related tasks of dialog act (DA) segmentation and DA classification for speech from the ICSI Meeting Corpus. We employ simple lexical and prosodic knowledge sources, and compare results for human-transcribed versus automatically recognized words. Since there is little previous work on DA segmentation and classification in the meeting domain, our study provides baseline performance rates for both tasks. We introduce a range of metrics for use in evaluation, each of which measures different aspects of interest. Results show that both tasks are difficult, particularly for a fully automatic system. We find that a very simple prosodic model aids performance over lexical information alone, especially for segmentation. Both tasks, but particularly word-based segmentation, are degraded by word recognition errors. Finally, while classification results for meeting data show some similarities to previous results for telephone conversations, findings also suggest a potential difference with respect to the effect of modeling DA context.
Identifying agreement and disagreement in conversational speech: Use of Bayesian networks to model pragmatic dependencies
- In Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics (ACL), 2004
Cited by 48 (3 self)
We describe a statistical approach for modeling agreements and disagreements in conversational interaction. Our approach first identifies adjacency pairs using maximum entropy ranking based on a set of lexical, durational, and structural features that look both forward and backward in the discourse. We then classify utterances as agreement or disagreement using these adjacency pairs and features that represent various pragmatic influences of previous agreement or disagreement on the current utterance. Our approach achieves 86.9% accuracy, a 4.9% increase over previous work.
Unsupervised Modeling of Twitter Conversations
- 2010
Cited by 24 (2 self)
We propose the first unsupervised approach to the problem of modeling dialogue acts in an open domain. Trained on a corpus of noisy Twitter conversations, our method discovers dialogue acts by clustering raw utterances. Because it accounts for the sequential behaviour of these acts, the learned model can provide insight into the shape of communication in a new medium. We address the challenge of evaluating the emergent model with a qualitative visualization and an intrinsic conversation ordering task. This work is inspired by a corpus of 1.3 million Twitter conversations, which will be made publicly available. This huge amount of data, available only because Twitter blurs the line between chatting and publishing, highlights the need to be able to adapt quickly to a new medium.
Multi-level Dialogue Act Tags
- In Proceedings of SIGdial ’04 (5th SIGdial Workshop on Discourse and Dialogue), 2004
Cited by 21 (2 self)
In this paper we discuss the use of multi-layered tagsets for dialogue acts, in the context of dialogue understanding for multiparty meeting recording and retrieval applications. We discuss
The ICSI Meeting Project: Resources and Research
- In Proc. of ICASSP 2004 Meeting Recognition Workshop, 2004
Cited by 20 (4 self)
This paper provides a progress report on ICSI’s Meeting Project, including both the data collected and annotated as part of the project, as well as the research lines such materials support. We include a general description of the official “ICSI Meeting Corpus”, as currently available through the Linguistic Data Consortium, discuss some of the existing and planned annotations which augment the basic transcripts provided there, and describe several research efforts that make use of these materials. The corpus supports wide-ranging efforts, from low-level processing of the audio signal (including automatic speech transcription, speaker tracking, and work on far-field acoustics) to higher-level analyses of meeting structure, content, and interactions (such as topic and sentence segmentation, and automatic detection of dialogue acts and meeting “hot spots”).
Shallow dialogue processing using machine learning algorithms (or not)
- 2005
Cited by 9 (2 self)
This paper presents a shallow dialogue analysis model, aimed at human-human dialogues in the context of staff or business meetings. Four components of the model are defined, and several machine learning techniques are used to extract features from dialogue transcripts: maximum entropy classifiers for dialogue acts, latent semantic analysis for topic segmentation, or decision tree classifiers for discourse markers. A rule-based approach is proposed for solving cross-modal references to meeting documents. The methods are trained and evaluated thanks to a common data set and annotation format. The integration of the components into an automated shallow dialogue parser opens the way to multimodal meeting processing and retrieval applications.
Overlap in Meetings: ASR Effects and Analysis by Dialog Factors, Speakers, and Collection Site
- Proc. of MLMI2006 (Springer Lecture Notes in Computer Science 4299), 2006
Cited by 7 (0 self)
We analyze speaker overlap in multiparty meetings both in terms of automatic speech recognition (ASR) performance, and in terms of distribution of overlap with respect to various factors (collection site, speakers, dialog acts, and hot spots). Unlike most previous work on overlap or crosstalk, our ASR error analysis uses an approach that allows comparison of the same foreground speech with and without naturally occurring overlap, using a state-of-the-art meeting recognition system. We examine 101 meetings. For analysis of ASR, we use 26 meetings from the NIST meeting transcription evaluations, and discover a number of interesting phenomena. First, overlaps tend to occur at high-perplexity regions in the foreground talker’s speech. Second, overlap regions tend to have higher perplexity than those in nonoverlaps, if trigrams or 4-grams are used, but unigram perplexity within overlaps is considerably lower than that of nonoverlaps. Third, word error rate (WER) after overlaps is consistently lower than that before the overlap, apparently because the foreground speaker reduces perplexity shortly after being overlapped. These appear to be robust findings, because they hold in general across meetings from different collection sites, even though meeting style and absolute rates of overlap vary by site. Further analyses of overlap with respect to speakers and meeting content were conducted on a set of 75 additional meetings collected and annotated at ICSI. These analyses reveal interesting relationships between overlap and dialog acts, as well as between overlap and “hot spots” (points of increased participant involvement). Finally, results from this larger data set show that individual speakers have widely varying rates of being overlapped.
Designing Focused and Efficient Annotation Tools
Cited by 4 (0 self)
The creation of large, richly annotated, multimodal corpora of human interactions is an expensive and time consuming task. Support from annotation tools that make the annotation process more efficient is required, especially if the annotation effort involves really large amounts of data. Therefore we investigated how different properties of specific annotation tasks can have an impact on the design of a tool focused on that general class of tasks. In this paper we present our view on the considerations that should drive the design of new tools geared to specific tasks. The main dimensions that we consider are: observation vs interpretation, explicit and implicit input layers, segmentation, feedback, constraints, relations and the content of the annotation elements.
Semi-supervised Speech Act Recognition in Emails and Forums
Cited by 3 (2 self)
In this paper, we present a semi-supervised method for automatic speech act recognition in email and forums. The major challenge of this task is due to lack of labeled data in these two genres. Our method leverages labeled data in the Switchboard-DAMSL and the Meeting Recorder Dialog Act database and applies simple domain adaptation techniques over a large amount of unlabeled email and forum data to address this problem. Our method uses automatically extracted features such as phrases and dependency trees, called subtree features, for semi-supervised learning. Empirical results demonstrate that our model is effective in email and forum speech act recognition.

