Towards empirical dialog-state modeling and its use in language modeling
- in Interspeech
, 2012
Cited by 6 (3 self)
Inspired by the goal of modeling the dialog state and the speaker’s mental state, moment by moment, we apply Principal Component Analysis to a vector of 76 prosodic features spanning 6 seconds of context. This gives a multidimensional representation of the current state. We find that word probabilities vary strongly with several of these dimensions, that the use of this information in a language model gives a 27% reduction in perplexity, and that many of the dimensions do relate to aspects of mental state and dialog state.
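As a rough illustration of the kind of analysis this abstract describes, the sketch below centers a small feature matrix and finds its leading principal component. It is a minimal sketch under stated assumptions, not the authors' implementation: the three toy columns stand in for the 76 prosodic features, power iteration is used here only to recover the first component, and the function names `center`, `covariance`, `top_component`, and `project` are hypothetical.

```python
import math


def center(rows):
    """Subtract the per-feature mean from every row."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    return centered, means


def covariance(centered):
    """Sample covariance matrix of already-centered rows."""
    n, d = len(centered), len(centered[0])
    return [
        [sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]


def top_component(cov, iters=200):
    """Leading eigenvector of cov via power iteration (first component only)."""
    d = len(cov)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            raise ValueError("covariance matrix has no variance to explain")
        v = [x / norm for x in w]
    return v


def project(row, means, component):
    """Coordinate of one feature vector along the given component."""
    return sum((x - m) * c for x, m, c in zip(row, means, component))


# Toy data: three made-up features per time point (assumption, not real prosody).
rows = [[float(t), 2.0 * t, 0.0] for t in range(10)]
centered, means = center(rows)
component = top_component(covariance(centered))
print(component, project(rows[-1], means, component))
```

In practice a full decomposition from a numerical library would return all dimensions at once; the point here is only that each moment in time becomes a point whose coordinates are projections onto the components.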
USING DIALOG-ACTIVITY SIMILARITY FOR SPOKEN INFORMATION RETRIEVAL
Cited by 4 (4 self)
We want to enable users to locate desired information in spoken audio documents using not only the words, but also dialog activities. Following previous research, we infer this information from prosodic features; however, instead of retrieving by matching against a predefined finite set of activities, we estimate similarity using a vector-space representation. Utterances close in this vector space are frequently similar not only pragmatically, but also topically. Using this, we implemented a dialog-based query-by-example function and built it into an interface for use in combination with normal lexical search. Evaluating its utility in an experiment with four searchers doing twenty tasks each, we found that searchers used the new feature and considered it helpful, but only for some search tasks.
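The query-by-example idea in this abstract can be sketched as a nearest-neighbor lookup in a vector space. This is a minimal sketch under assumptions: the abstract does not say which distance measure was used, so Euclidean distance is an assumption, and the segment labels, vectors, and the function name `query_by_example` are hypothetical.

```python
import math


def euclidean(a, b):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def query_by_example(query, index, k=3):
    """Return the k (label, vector) entries closest to the query vector.

    index is a list of (label, vector) pairs, one per audio segment.
    """
    ranked = sorted(index, key=lambda item: euclidean(query, item[1]))
    return ranked[:k]


# Hypothetical segments with made-up two-dimensional coordinates.
index = [
    ("segment_a", [0.0, 0.0]),
    ("segment_b", [1.0, 1.0]),
    ("segment_c", [5.0, 5.0]),
]
for label, _ in query_by_example([0.9, 1.2], index, k=2):
    print(label)
```

A searcher would pick a moment in the audio as the example, and the system would return the moments whose vectors lie closest to it.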
Data collection for the Similar Segments in Social Speech task.
, 2013
Cited by 3 (3 self)
Information retrieval systems rely heavily on models of similarity, but for spoken dialog such models currently use mostly standard textual-content similarity. As part of the MediaEval Benchmarking Initiative, we have created a new corpus to support development of similarity models for spoken dialog. This corpus includes 26 casual dialogs among members of two semi-cohesive groups, totaling about 5 hours, with 1889 labeled regions associated into 227 sets which annotators judged to be similar enough to share a tag. This technical report brings together information about this corpus and its intended uses, previously only available on the project website.
Patterns of Importance Variation in Spoken Dialog
Cited by 2 (1 self)
Some things people say are more important, and some less so. The ability to automatically judge this, even approximately, would be a useful front end for many applications. This paper empirically examines importance as it varies from moment to moment in spoken dialog. Contextual prosodic features are informative, and importance is frequently associated with specific patterns of interaction that involve both participants and stretch over several seconds. A simple linear regression model gave importance estimates that correlated well (0.83) with human judgments.
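To make the evaluation in this abstract concrete, the sketch below fits a one-feature least-squares line and measures how well its estimates correlate with target ratings. It is a minimal sketch under assumptions: the paper's model used contextual prosodic features that are not listed here, so the single toy feature, the toy ratings, and the function names `fit_line` and `pearson` are all hypothetical.

```python
import math


def mean(values):
    return sum(values) / len(values)


def fit_line(x, y):
    """Ordinary least squares for y = slope * x + intercept."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    ma, mb = mean(a), mean(b)
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    var_a = sum((p - ma) ** 2 for p in a)
    var_b = sum((q - mb) ** 2 for q in b)
    return cov / math.sqrt(var_a * var_b)


# Made-up feature values and human importance ratings (not data from the paper).
feature = [0.1, 0.4, 0.5, 0.9, 1.3]
ratings = [1.0, 2.0, 1.5, 3.5, 4.0]
slope, intercept = fit_line(feature, ratings)
estimates = [slope * f + intercept for f in feature]
print(pearson(estimates, ratings))
```

The correlation between model estimates and human judgments is the kind of figure the abstract reports.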
Lexical and Prosodic Indicators of Importance in Spoken Dialog
, 2013
Cited by 2 (2 self)
in Spoken Dialog [Ward and Richart-Ruiz, 2013], by providing additional evidence for the claims, additional findings, and more analysis. In particular, we report more on inter-annotator disagreement, on words that correlate with importance, on prosodic features and patterns that correlate with importance, and on how our predictive model of importance might be improved.
Evaluating Prosody-Based Similarity Models for Information Retrieval
Cited by 1 (1 self)
Prosody is important in spoken language, and especially in dialog, but its utility for search in dialog archives has remained an open question. Using prosody-based measures of similarity, which also roughly correlate with dialog-activity similarity and topic similarity, we built support for “retrieve more like this” searches. Performance on the Similar Segments in Social Speech Task at MediaEval 2013 was well above baseline, showing the value of prosody for search.
Where in Dialog Space does Uh-huh Occur?
Cited by 1 (1 self)
In what dialog situations and contexts do backchannels commonly occur? This paper examines this question using a newly developed notion of dialog space, defined by orthogonal, prosody-derived dimensions. Taking 3363 instances of uh-huh, found in the Switchboard corpus, we examine where in this space they tend to occur. While the results largely agree with previous descriptions and observations, we find several novel aspects, relating to rhythm, polarity, and the details of the low-pitch cue. Index Terms: backchannels, feedback, prosody, context, principal component analysis, dimensions, dialog activities
Challenges for robust prosody-based affect recognition
- in Proceedings of Speech Prosody
Cited by 1 (0 self)
Prosody-based affect recognition has great potential impact for building adaptive speech interfaces. For example, in intelligent systems for personalized learning, sensing a student’s level of certainty, which is often signaled prosodically, is one of the most interesting states to interpret and respond to. However, robust uncertainty recognition faces several challenges, including the lack of gold-standard labels, and differences in expressivity among speakers. In this paper we explore the intersection of these two issues. We have collected a corpus of spontaneous speech in a question-answering task. Three kinds of certainty labels are associated with each utterance. First, speakers rated their own level of certainty. Second, a panel of listeners rated how certain the speaker sounded. Third, an externally crowd-sourced difficulty score is generated for each stimulus (the question). We present a word-level prosodic analysis of individual speaking styles, as they relate to these three different measurements of certainty. Our results suggest that instead of learning one-size-fits-all prosodic models of affect, we might find improvement from learning multiple models corresponding to different speaking styles. Index Terms: Uncertainty, affect recognition, affect labels, speaking style.
A prosody-based vectorspace model of dialog activity for information retrieval
- Speech Communication
, 2015
Cited by 1 (0 self)
Search in audio archives is a challenging problem. Using prosodic information to help find relevant content has been proposed as a complement to word-based retrieval, but its utility has been an open question. We propose a new way to use prosodic information in search, based on a vector-space model, where each point in time maps to a point in a vector space whose dimensions are derived from numerous prosodic features of the local context. Point pairs that are close in this vector space are frequently similar, not only in terms of the dialog activities, but also in topic. Using proximity in this space as an indicator of similarity, we built support for a query-by-example function. Searchers were happy to use this function, and it provided value on a large test set. Prosody-based retrieval did not perform as well as word-based retrieval, but the two sources of information were often non-redundant and in combination they sometimes performed better than either separately.
Aspectual Properties of Conversational Activities
Cited by 1 (1 self)
Segmentation of spoken discourse into distinct conversational activities has been applied to broadcast news, meetings, monologs, and two-party dialogs. This paper considers the aspectual properties of discourse segments, meaning how they transpire in time. Classifiers were constructed to distinguish between segment boundaries and non-boundaries, varying both the sizes of the utterance spans used to represent data instances and the locations of segment boundaries relative to these instances. Classifier performance was better for representations that included the end of one discourse segment combined with the beginning of the next. In addition, classification accuracy was better for segments in which speakers accomplish goals with distinctive start and end points.