
A Bottom-Up Exploration of the Dimensions of Dialog State in Spoken Interaction

by Nigel G. Ward, Alejandro Vega

Results 1 - 10 of 12

Towards empirical dialog-state modeling and its use in language modeling

by Nigel G. Ward, Alejandro Vega - in Interspeech, 2012
Cited by 6 (3 self)
Inspired by the goal of modeling the dialog state and the speaker’s mental state, moment by moment, we apply Principal Component Analysis to a vector of 76 prosodic features spanning 6 seconds of context. This gives a multidimensional representation of the current state. We find that word probabilities vary strongly with several of these dimensions, that the use of this information in a language model gives a 27% reduction in perplexity, and that many of the dimensions do relate to aspects of mental state and dialog state.

Citation Context

..., to go from the raw, messily correlated features to a “latent” orthogonal set of components encoding the same information. While PCA has previously been applied to prosodic features, as discussed in [16] and the references therein, it has not before been used in language modeling. 3. The Features, the Data, and Principal Component Analysis Our feature set consisted of 76 prosodic features, taken from...
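
The step described in this entry, going from correlated prosodic features to orthogonal components, can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the random stand-in data, the per-feature z-normalization, the choice of 10 components, and the function names `pca_fit` and `pca_project` are all assumptions made for the example.

```python
import numpy as np

def pca_fit(features, n_components):
    """Fit PCA on z-normalized features; returns (mean, std, components)."""
    mean = features.mean(axis=0)
    std = features.std(axis=0)
    std[std == 0] = 1.0  # guard against constant features
    z = (features - mean) / std
    # Rows of vt are orthonormal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(z, full_matrices=False)
    return mean, std, vt[:n_components]

def pca_project(features, mean, std, components):
    """Map each feature vector (one per time point) to component scores."""
    return ((features - mean) / std) @ components.T

# Hypothetical stand-in data: 1000 time points x 76 correlated features.
rng = np.random.default_rng(0)
latent = rng.normal(size=(1000, 5))
mixing = rng.normal(size=(5, 76))
X = latent @ mixing + 0.1 * rng.normal(size=(1000, 76))

mean, std, comps = pca_fit(X, n_components=10)
scores = pca_project(X, mean, std, comps)  # one 10-dimensional point per time point
```

Each row of `scores` is then a point in the reduced space, and the columns are mutually uncorrelated by construction, which is the property the excerpt refers to as a "latent" orthogonal set.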

USING DIALOG-ACTIVITY SIMILARITY FOR SPOKEN INFORMATION RETRIEVAL

by Nigel G. Ward, Steven D. Werner
Cited by 4 (4 self)
We want to enable users to locate desired information in spoken audio documents using not only the words, but also dialog activities. Following previous research, we infer this information from prosodic features; however, instead of retrieval by matching to a predefined finite set of activities, we estimate similarity using a vector space representation. Utterances close in this vector space are frequently similar not only pragmatically, but also topically. Using this we implemented a dialog-based query-by-example function and built it into an interface for use in combination with normal lexical search. Evaluating its utility by an experiment with four searchers doing twenty tasks each, we found that searchers used the new feature and considered it helpful, but only for some search tasks.

Citation Context

... finite taxonomy of dialog activities can support most search needs. We therefore chose instead to use an empirically-derived representation of dialog activities. This representation, as described in [14], is derived by the application of Principal Component Analysis to 76 local prosodic features. While using the common features pitch height, pitch range, speaking rate, and volume, this feature set is...

Data collection for the Similar Segments in Social Speech task.

by Nigel G. Ward, Steven D. Werner, 2013
Cited by 3 (3 self)
Information retrieval systems rely heavily on models of similarity, but for spoken dialog such models currently use mostly standard textual-content similarity. As part of the MediaEval Benchmarking Initiative, we have created a new corpus to support development of similarity models for spoken dialog. This corpus includes 26 casual dialogs among members of two semi-cohesive groups, totaling about 5 hours, with 1889 labeled regions associated into 227 sets which annotators judged to be similar enough to share a tag. This technical report brings together information about this corpus and its intended uses, previously only available on the project website.

Patterns of Importance Variation in Spoken Dialog

by Nigel G. Ward, Karen A. Richart-Ruiz
Cited by 2 (1 self)
Some things people say are more important, and some less so. The ability to automatically judge this, even approximately, would be a useful front end for many applications. This paper empirically examines importance as it varies from moment to moment in spoken dialog. Contextual prosodic features are informative, and importance is frequently associated with specific patterns of interaction that involve both participants and stretch over several seconds. A simple linear regression model gave importance estimates that correlated well, 0.83, with human judgments.

Citation Context

...king importance vaguely somewhere in the area, but more precisely indicating important and unimportant moments. To explore this we used Principal Components Analysis (PCA), as described in detail in (Ward and Vega, 2012). In short, this method finds patterns of prosodic features which co-occur frequently in the data, and thus provides an unsupervised way to discover the latent structure underlying the observed regul...
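
The abstract of this entry reports a simple linear regression model whose importance estimates are scored by correlation with human judgments. A rough sketch of that evaluation pattern is below. The data is synthetic, and the feature count, the noise level, and the helper names `fit_linear`, `predict`, and `pearson` are hypothetical choices for illustration, not details taken from the paper.

```python
import numpy as np

def fit_linear(features, targets):
    """Ordinary least squares with an intercept; returns weights, bias last."""
    design = np.hstack([features, np.ones((features.shape[0], 1))])
    weights, *_ = np.linalg.lstsq(design, targets, rcond=None)
    return weights

def predict(features, weights):
    """Apply the fitted weights and intercept to new feature rows."""
    return features @ weights[:-1] + weights[-1]

def pearson(a, b):
    """Pearson correlation between two 1-D arrays."""
    return float(np.corrcoef(a, b)[0, 1])

# Hypothetical stand-in data: 200 moments, 4 features, noisy "importance" labels.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 4))
true_w = np.array([0.5, -1.0, 0.25, 2.0])
y = X @ true_w + 3.0 + 0.5 * rng.normal(size=200)

w = fit_linear(X, y)
r = pearson(predict(X, w), y)  # correlation between estimates and labels
```

In a real evaluation the correlation would be computed on held-out data rather than on the training rows used here.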

Lexical and Prosodic Indicators of Importance in Spoken Dialog

by Nigel G. Ward, Karen A. Richart-Ruiz, 2013
Cited by 2 (2 self)
in Spoken Dialog [Ward and Richart-Ruiz, 2013], by providing additional evidence for the claims, additional findings, and more analysis. In particular, we report more on inter-annotator disagreement, on words that correlate with importance, on prosodic features and patterns that correlate with importance, and on how our predictive model of importance might be improved.

Evaluating Prosody-Based Similarity Models for Information Retrieval

by Steven D. Werner, Nigel G. Ward
Cited by 1 (1 self)
Prosody is important in spoken language, and especially in dialog, but its utility for search in dialog archives has remained an open question. Using prosody-based measures of similarity, which also roughly correlate with dialog-activity similarity and topic similarity, we built support for “retrieve more like this” searches. Performance on the Similar Segments in Social Speech Task at MediaEval 2013 was well above baseline, showing the value of prosody for search.

Citation Context

...ime maps to a point in this space. This representation is obtained by applying Principal Component Analysis to 78 local prosodic features, computed every 10 ms over a 6-second sliding window [2]. This feature set was chosen for simplicity of computation and for providing coverage of most of the prosodic aspects known to be most relevant for dialog. It resembles that used in [2], but with mo...

Where in Dialog Space does Uh-huh Occur?

by Nigel G. Ward, David G. Novick, Alejandro Vega
Cited by 1 (1 self)
In what dialog situations and contexts do backchannels commonly occur? This paper examines this question using a newly developed notion of dialog space, defined by orthogonal, prosody-derived dimensions. Taking 3363 instances of uh-huh, found in the Switchboard corpus, we examine where in this space they tend to occur. While the results largely agree with previous descriptions and observations, we find several novel aspects, relating to rhythm, polarity, and the details of the low-pitch cue. Index Terms: backchannels, feedback, prosody, context, principal component analysis, dimensions, dialog activities

Citation Context

...pical dialog situations where backchannels occur, we need to start with a way to describe dialog situations. While there are many taxonomic systems to choose from, here we use a new, empirical method [7]. Reasoning that the local prosody is a good indicator of dialog activities and states, we started with 76 local prosodic features, consisting of pitch height, pitch range, speaking rate, and volume, ...

Challenges for robust prosody-based affect recognition

by Heather Pon-Barry, Arun Reddy Nelakurthi - in Proceedings of Speech Prosody
Cited by 1 (0 self)
Prosody-based affect recognition has great potential impact for building adaptive speech interfaces. For example, in intelligent systems for personalized learning, sensing a student’s level of certainty, which is often signaled prosodically, is one of the most interesting states to interpret and respond to. However, robust uncertainty recognition faces several challenges, including the lack of gold-standard labels, and differences in expressivity among speakers. In this paper we explore the intersection of these two issues. We have collected a corpus of spontaneous speech in a question-answering task. Three kinds of certainty labels are associated with each utterance. First, speakers rated their own level of certainty. Second, a panel of listeners rated how certain the speaker sounded. Third, an externally crowd-sourced difficulty score is generated for each stimulus (the question). We present a word-level prosodic analysis of individual speaking styles, as they relate to these three different measurements of certainty. Our results suggest that instead of learning one-size-fits-all prosodic models of affect, we might find improvement from learning multiple models corresponding to different speaking styles. Index Terms: Uncertainty, affect recognition, affect labels, speaking style.

Citation Context

...answers to questions that vary in the speaker’s level of certainty. Similar to recent work that has applied principal components analysis (PCA) to large sets of low-level acoustic-prosodic features [14], we identify a set of 10 principal components from a large set of word-level prosodic features. We then use the smaller set of prosodic features to learn several decision trees for each speaker and a...

A prosody-based vectorspace model of dialog activity for information retrieval

by Nigel G. Ward, Steven D. Werner, Fernando Garcia, Emilio Sanchis - Speech Communication, 2015
Cited by 1 (0 self)
Search in audio archives is a challenging problem. Using prosodic information to help find relevant content has been proposed as a complement to word-based retrieval, but its utility has been an open question. We propose a new way to use prosodic information in search, based on a vector-space model, where each point in time maps to a point in a vector space whose dimensions are derived from numerous prosodic features of the local context. Point pairs that are close in this vector space are frequently similar, not only in terms of the dialog activities, but also in topic. Using proximity in this space as an indicator of similarity, we built support for a query-by-example function. Searchers were happy to use this function, and it provided value on a large test set. Prosody-based retrieval did not perform as well as word-based retrieval, but the two sources of information were often non-redundant and in combination they sometimes performed better than either separately.

Citation Context

...mation. The specific feature set was chosen for simplicity of computation, for providing coverage of most of the prosodic aspects known to be most relevant for dialog, and for symmetry, as shown in Table 1. Surprisingly, the dimensions resulting from PCA turned out to be meaningful: each of the top couple dozen turned out to align with some known aspects of dialog, such as grounding, turn-taking, seeking and expressing sympathy, degrees of novelty and interest, topic shifts and closings, emphasis, explanations, humor versus regret, personal versus impersonal topics, and facts versus opinions (Ward and Vega, 2012; Ward, 2014). Table 2 shows our tentative interpretations of the top twenty dimensions. We therefore can refer to the space defined by these dimensions as “dialog-activity space.” In a dialog every point in time maps to a point in this 76-dimensional space. 3.2. Similarity and Proximity in this Space Proximity in this vector-space model can serve as a measure of similarity. Initially we used simple Euclidean distance, over all 76 dimensions. As a preliminary exploration, we selected a few seeds and examined what positions the model found as most similar. As ...
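
The proximity idea in this excerpt, picking a seed point and finding the positions closest to it by Euclidean distance, can be sketched as below. This is only an illustrative sketch: the function name `most_similar`, the parameter `k`, and the tiny 2-dimensional example points are assumptions, whereas the excerpt describes distances taken over all 76 dimensions.

```python
import numpy as np

def most_similar(points, seed_index, k=3):
    """Return indices and distances of the k points nearest to the seed."""
    diffs = points - points[seed_index]
    dists = np.sqrt((diffs ** 2).sum(axis=1))  # Euclidean distance to the seed
    dists[seed_index] = np.inf                 # exclude the seed itself
    order = np.argsort(dists)[:k]
    return order, dists[order]

# Hypothetical example: four time points in a 2-dimensional space.
points = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 3.0], [5.0, 5.0]])
nearest, distances = most_similar(points, seed_index=0, k=2)
```

With real data, each row of `points` would be the vector for one moment in a dialog, and the returned indices would be candidate "similar" moments for a query-by-example search.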

Aspectual Properties of Conversational Activities

by Rebecca J. Passonneau, Boxuan Guan, Cho Ho Yeung, Yuan Du, Emma Conner
Cited by 1 (1 self)
Segmentation of spoken discourse into distinct conversational activities has been applied to broadcast news, meetings, monologs, and two-party dialogs. This paper considers the aspectual properties of discourse segments, meaning how they transpire in time. Classifiers were constructed to distinguish between segment boundaries and non-boundaries, where the sizes of utterance spans to represent data instances were varied, and the locations of segment boundaries relative to these instances. Classifier performance was better for representations that included the end of one discourse segment combined with the beginning of the next. In addition, classification accuracy was better for segments in which speakers accomplish goals with distinctive start and end points.

Citation Context

... solely on acoustic information has been applied to importance prediction at a very fine granularity (Ward and Richart-Ruiz, 2013). Four basic classes of prosodic features derived from PCA were used (Ward and Vega, 2012): volume, pitch height, pitch range and speaking rate across various widths of time intervals. The data was labeled by annotators using an importance scale of 1 to 5, and linear regression was used to...
