Stability and accuracy in incremental speech recognition.
- In Proceedings of the SIGdial, 2011
Cited by 11 (2 self)
Conventional speech recognition approaches usually wait until the user has finished talking before returning a recognition hypothesis. This results in spoken dialogue systems that are unable to react while the user is still speaking. Incremental Speech Recognition (ISR), where partial phrase results are returned during user speech, has been used to create more reactive systems. However, ISR output is unstable and so prone to revision as more speech is decoded. This paper tackles the problem of stability in ISR. We first present a method that increases the stability and accuracy of ISR output, without adding delay. Given that some revisions are unavoidable, we next present a pair of methods for predicting the stability and accuracy of ISR results. Taken together, we believe these approaches give ISR more utility for real spoken dialogue systems.
Recognizing Authority in Dialogue with an Integer Linear Programming Constrained Model
Cited by 10 (6 self)
We present a novel computational formulation of speaker authority in discourse. This notion, which focuses on how speakers position themselves relative to each other in discourse, is first developed into a reliable coding scheme (0.71 agreement between human annotators). We also provide a computational model for automatically annotating text using this coding scheme, using supervised learning enhanced by constraints implemented with Integer Linear Programming. We show that this constrained model’s analyses of speaker authority correlate very strongly with expert human judgments (r² coefficient of 0.947).
Decisions about Turns in Multiparty Conversation: From Perception to Action
Cited by 9 (1 self)
We present a decision-theoretic approach for guiding turn taking in a spoken dialog system operating in multiparty settings. The proposed methodology couples inferences about multiparty conversational dynamics with assessed costs of different outcomes, to guide turn-taking decisions. Beyond considering uncertainties about outcomes arising from evidential reasoning about the state of a conversation, we endow the system with awareness and methods for handling uncertainties stemming from computational delays in its own perception and production. We illustrate via sample cases how the proposed approach makes decisions, and we investigate the behaviors of the proposed methods via a retrospective analysis on logs collected in a multiparty interaction study.
Continuously predicting and processing barge-in during a live spoken dialogue task
- In Proc. of SIGDIAL, 2013
Cited by 3 (0 self)
Barge-in enables the user to provide input during system speech, facilitating a more natural and efficient interaction. Standard methods generally focus on single-stage barge-in detection, applying the dialogue policy irrespective of the barge-in context. Unfortunately, this approach performs poorly when used in challenging environments. We propose and evaluate a barge-in processing method that uses a prediction strategy to continuously decide whether to pause, continue, or resume the prompt. This model has greater task success and efficiency than the standard approach when evaluated in a public spoken dialogue system.
Inverse Reinforcement Learning for Micro-Turn Management
Cited by 3 (0 self)
Existing spoken dialogue systems are typically not designed to provide natural interaction since they impose a strict turn-taking regime in which a dialogue consists of interleaved system and user turns. To allow more responsive and natural interaction, this paper describes a system in which turn-taking decisions are taken at a more fine-grained micro-turn level. A decision-theoretic approach is then applied to optimise turn-taking control. Inverse reinforcement learning is used to capture the complex but natural behaviours from human-human dialogues and optimise interaction without specifying a reward function manually. Using a corpus of human-human interaction, experiments show that IRL is able to learn an effective reward function which outperforms a comparable handcrafted policy.
Index Terms: dialogue management, spoken dialogue systems, inverse reinforcement learning, Markov decision processes
Learning Turn, Attention, and Utterance Decisions in a Negotiative Slot-Filling Domain
- 2011
Cited by 1 (0 self)
Mixed-initiative dialogue systems must be effective and natural, and both turn-taking and attention play an important role in meeting these goals. We present the Tau Architecture which separates Turn, Attention, and Utterance decisions, and uses Reinforcement Learning to jointly optimize them. The development of sophisticated dialogue managers using Reinforcement Learning requires a simulation domain before any human evaluation, and we describe the Negotiative Slot-Filling domain. This domain is a closer approximation to true mixed-initiative dialogue than any previously used to train dialogue managers. We then detail the Tau implementation in the domain, and demonstrate both.
Is it really worth it? Cost-based selection of system responses to speech-in-overlap
Dialog Goes Pervasive
Until recently, many dialog systems were information retrieval systems. For example, using a telephone-based interactive response system a US-based user can find flights from United (1-800-UNITED-1), get movie schedules (1-800-777-FILM), or get bus information (Black et al., 2011). These systems save companies money and help users access information 24/7. However, the interaction between user and system is tightly constrained. For the most part, each system only deals with one domain, so the task models are typically flat slot-filling models (Allen et al., 2001b). Also, the dialogs are very structured, with system initiative and short user responses, giving limited scope to study important phenomena such as coreference.
Smart phones and other mobile devices make possible pervasive human-computer spoken dialog. For example, the Vlingo system (www.vlingo.com) lets users do web searches (information retrieval), but also connects calls, opens other apps, and permits voice dictation of emails or social media updates. Siri can also help users make reservations and schedule meetings. These new dialog systems are different from traditional ones in several ways; they are multi-task, asynchronous, can involve rich context modeling, and have side effects in the “real world”:
Multi-task – The system interacts with the user to accomplish a series of (possibly related) tasks. For example, a user might use the system to order a book and then say "schedule it for book club", a different task (e.g. requiring different backend DB lookups) but related to the previous one by the book information ...
Open Dialogue Management for Relational Databases
We present open dialogue management and its application to relational databases. An open dialogue manager generates dialogue states, actions, and default strategies from the semantics of its application domain. We define three open dialogue management tasks. First, vocabulary selection finds the intelligible attributes in each database table. Second, focus discovery selects candidate dialogue foci, tables that have the most potential to address basic user goals. Third, a focus agent is instantiated for each dialogue focus with a default dialogue strategy governed by efficiency. We demonstrate the portability of open dialogue management on three very different databases. Evaluation of our system with simulated users shows that users with realistically limited domain knowledge have dialogues nearly as efficient as those of users with complete domain knowledge.
Turn-Taking Cues in a Human Tutoring Corpus
Most spoken dialogue systems are still lacking in their ability to accurately model the complex process that is human turn-taking. This research analyzes a human-human tutoring corpus in order to identify prosodic turn-taking cues, in the hope that they can be used by intelligent tutoring systems to predict student turn boundaries. Results show that while there was variation between subjects, three features were significant turn-yielding cues overall. In addition, a positive relationship between the number of cues present and the probability of a turn yield was demonstrated.