Results 1 - 10 of 99
Partially observable Markov decision processes with continuous observations for dialogue management
- Computer Speech and Language, 2005
"... This work shows how a dialogue model can be represented as a Partially Observable Markov Decision Process (POMDP) with observations composed of a discrete and continuous component. The continuous component enables the model to directly incorporate a confidence score for automated planning. Using a t ..."
Abstract
-
Cited by 217 (52 self)
- Add to MetaCart
(Show Context)
This work shows how a dialogue model can be represented as a Partially Observable Markov Decision Process (POMDP) with observations composed of a discrete and continuous component. The continuous component enables the model to directly incorporate a confidence score for automated planning. Using a testbed simulated dialogue management problem, we show how recent optimization techniques are able to find a policy for this continuous POMDP which outperforms a traditional MDP approach. Further, we present a method for automatically improving handcrafted dialogue managers by incorporating POMDP belief state monitoring, including confidence score information. Experiments on the testbed system show significant improvements for several example handcrafted dialogue managers across a range of operating conditions. 1
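A minimal sketch of the belief-monitoring step this abstract describes, where each observation pairs a recognised user act (discrete) with an ASR confidence score (continuous). The function and argument names, the array layout, and the idea of passing the confidence likelihood in as a pluggable density (conf_density) are illustrative assumptions rather than the paper's exact formulation.

```python
import numpy as np

def belief_update(belief, action, obs_act, confidence,
                  trans_prob, obs_act_prob, conf_density):
    """One step of Bayesian belief monitoring over hidden user states.

    belief       : np.array over hidden states, sums to 1
    trans_prob   : trans_prob[action][s, s_next] = P(s_next | s, action)
    obs_act_prob : obs_act_prob[s_next, obs_act] = P(obs_act | s_next)
    conf_density : conf_density(confidence, s_next, obs_act) = p(c | s_next, obs_act)
    """
    # Predict: sum_s P(s'|s,a) b(s)
    predicted = belief @ trans_prob[action]
    # Weight by the likelihood of the discrete act and the continuous confidence
    likelihood = obs_act_prob[:, obs_act] * np.array(
        [conf_density(confidence, s_next, obs_act)
         for s_next in range(len(predicted))])
    new_belief = likelihood * predicted
    return new_belief / new_belief.sum()  # normalise
```

In practice conf_density could be, for example, a density fitted separately to confidence scores observed for correct and incorrect recognitions; that choice is an assumption here, not something the abstract specifies.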
Learning User Simulations for Information State Update Dialogue Systems
- in Eurospeech, 2005
"... This paper describes and compares two methods for simulating user behaviour in spoken dialogue systems. User simulations are important for automatic dialogue strategy learning and the evaluation of competing strategies. Our methods are designed for use with "Information State Update" (ISU) ..."
Abstract
-
Cited by 68 (21 self)
- Add to MetaCart
(Show Context)
This paper describes and compares two methods for simulating user behaviour in spoken dialogue systems. User simulations are important for automatic dialogue strategy learning and the evaluation of competing strategies. Our methods are designed for use with "Information State Update" (ISU)-based dialogue systems. The first method is based on supervised learning using linear feature combination and a normalised exponential output function. The user is modelled as a stochastic process which selects user actions (〈speech act, task〉 pairs) based on features of the current dialogue state, which encodes the whole history of the dialogue. The second method uses n-grams of speech act, task pairs, restricting the length of the history considered by the order of the n-gram. Both models were trained and evaluated on a subset of the COMMUNICATOR corpus, to which we added annotations for user actions and Information States. The model based on linear feature combination has a perplexity of 2.08 whereas the best n-gram (4-gram) has a perplexity of 3.58. Each one of the user models ran against a system policy trained on the same corpus with a method similar to the one used for our linear feature combination model. The quality of the simulated dialogues produced was then measured as a function of the filled slots, confirmed slots, and number of actions performed by the system in each dialogue. In this experiment both the linear feature combination model and the best n-grams (5-gram and 4-gram) produced similar quality simulated dialogues.
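A brief sketch of the first user model described above: a user action is sampled from a normalised exponential (softmax) over linear combinations of dialogue-state features. The argument names and weight layout are placeholders, not the trained model from the paper.

```python
import numpy as np

def simulate_user_action(state_features, weights, actions, rng=None):
    """state_features: 1-D feature vector extracted from the Information State.
    weights: one weight row per candidate user action (shape: actions x features).
    actions: list of candidate user actions, e.g. (speech_act, task) pairs."""
    rng = rng or np.random.default_rng()
    scores = weights @ state_features          # linear feature combination
    probs = np.exp(scores - scores.max())      # normalised exponential output
    probs /= probs.sum()
    return actions[rng.choice(len(actions), p=probs)]
```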
User simulation for spoken dialogue systems: Learning and evaluation
- in Interspeech/ICSLP, 2006
"... We propose the “advanced ” n-grams as a new technique for simulating user behaviour in spoken dialogue systems, and we compare it with two methods used in our prior work, i.e. linear feature combination and “normal ” n-grams. All methods operate on the intention level and can incorporate speech reco ..."
Abstract
-
Cited by 53 (17 self)
- Add to MetaCart
(Show Context)
We propose the “advanced” n-grams as a new technique for simulating user behaviour in spoken dialogue systems, and we compare it with two methods used in our prior work, i.e. linear feature combination and “normal” n-grams. All methods operate on the intention level and can incorporate speech recognition and understanding errors. In the linear feature combination model user actions (lists of 〈speech act, task〉 pairs) are selected, based on features of the current dialogue state which encodes the whole history of the dialogue. The user simulation based on “normal” n-grams treats a dialogue as a sequence of lists of 〈speech act, task〉 pairs. Here the length of the history considered is restricted by the order of the n-gram. The “advanced” n-grams are a variation of the normal n-grams, where user actions are conditioned not only on speech acts and tasks but also on the current status of the tasks, i.e. whether
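A rough sketch of the n-gram style simulator contrasted above. With advanced=True the context includes the current task status in addition to the preceding dialogue acts; the status encoding and the fallback for unseen contexts are assumptions made for illustration, not the paper's implementation.

```python
from collections import defaultdict, Counter
import random

class NGramUserSimulator:
    def __init__(self, n=4, advanced=True):
        self.n, self.advanced = n, advanced
        self.counts = defaultdict(Counter)   # context -> Counter over user actions

    def _context(self, history, task_status):
        ctx = tuple(history[-(self.n - 1):])              # last n-1 dialogue acts
        if self.advanced:                                  # also condition on task status
            ctx = ctx + (tuple(sorted(task_status.items())),)
        return ctx

    def train(self, examples):
        # examples: iterable of (history, task_status, user_action) triples
        for history, task_status, user_action in examples:
            self.counts[self._context(history, task_status)][user_action] += 1

    def sample(self, history, task_status):
        dist = self.counts[self._context(history, task_status)]
        if not dist:
            return None   # unseen context; a fuller model would back off to a shorter n-gram
        actions, freqs = zip(*dist.items())
        return random.choices(actions, weights=freqs, k=1)[0]
```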
Learning the structure of task-driven human-human dialogs
- in Proceedings of ACL, 2006
"... Data-driven techniques have been used for many computational linguistics tasks. Models derived from data are generally more robust than hand-crafted systems since they better reflect the distribution of the phenomena being modeled. With the availability of large corpora of spoken dialog, dialog mana ..."
Abstract
-
Cited by 49 (4 self)
- Add to MetaCart
(Show Context)
Data-driven techniques have been used for many computational linguistics tasks. Models derived from data are generally more robust than hand-crafted systems since they better reflect the distribution of the phenomena being modeled. With the availability of large corpora of spoken dialog, dialog management is now reaping the benefits of data-driven techniques. In this paper, we compare two approaches to modeling subtask structure in dialog: a chunk-based model of subdialog sequences, and a parse-based, or hierarchical, model. We evaluate these models using customer agent dialogs from a catalog service domain. 1
Effects of the User Model on Simulation-Based Learning of Dialogue Strategies
- In Proc. of ASRU, 2005
"... Over the past decade, a variety of user models have been proposed for user simulation-based reinforcement-learning of dialogue strategies. However, the strategies learned with these models are rarely evaluated in actual user trials and it remains unclear how the choice of user model affects the qual ..."
Abstract
-
Cited by 48 (9 self)
- Add to MetaCart
(Show Context)
Over the past decade, a variety of user models have been proposed for user simulation-based reinforcement-learning of dialogue strategies. However, the strategies learned with these models are rarely evaluated in actual user trials and it remains unclear how the choice of user model affects the quality of the learned strategy. In particular, the degree to which strategies learned with a user model generalise to real user populations has not been investigated. This paper presents a series of experiments that qualitatively and quantitatively examine the effect of the user model on the learned strategy. Our results show that the performance and characteristics of the strategy are in fact highly dependent on the user model. Furthermore, a policy trained with a poor user model may appear to perform well when tested with the same model, but fail when tested with a more sophisticated user model. This raises significant doubts about the current practice of learning and evaluating strategies with the same user model. The paper further investigates a new technique for testing and comparing strategies directly on real human-machine dialogues, thereby avoiding any evaluation bias introduced by the user model.
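A small sketch of the evaluation protocol this abstract argues for: each policy, whatever user model it was trained with, is also tested against every other available user model, so good same-model performance cannot be mistaken for generality. The function names (run_dialogue and the dictionary keys) are placeholders.

```python
def cross_evaluate(policies, user_models, run_dialogue, n_dialogues=1000):
    """policies: {training_model_name: policy}, user_models: {model_name: simulator}.
    run_dialogue(policy, user_model) runs one simulated dialogue and returns its reward.
    Returns {(trained_with, tested_with): average_reward}."""
    results = {}
    for trained_with, policy in policies.items():
        for tested_with, user_model in user_models.items():
            rewards = [run_dialogue(policy, user_model) for _ in range(n_dialogues)]
            results[(trained_with, tested_with)] = sum(rewards) / len(rewards)
    return results
```

Off-diagonal entries of the resulting matrix (policy trained with one model, tested with another) are where the generalisation problems described above would show up.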
An ISU Dialogue System Exhibiting Reinforcement Learning of Dialogue Policies: Generic Slot-filling in the TALK In-car System
- In Proceedings of EACL, 2006
"... We demonstrate a multimodal dialogue system using reinforcement learning for in-car scenarios, developed at Edinburgh University and Cambridge University for the TALK project . ..."
Abstract
-
Cited by 46 (13 self)
- Add to MetaCart
(Show Context)
We demonstrate a multimodal dialogue system using reinforcement learning for in-car scenarios, developed at Edinburgh University and Cambridge University for the TALK project.
Learning more effective dialogue strategies using limited dialogue move features
- in Proceedings of ACL, 2006
"... We explore the use of restricted dialogue contexts in reinforcement learning (RL) of effective dialogue strategies for information seeking spoken dialogue systems (e.g. COMMUNICATOR (Walker et al., 2001)). The contexts we use are richer than previous research in this area, e.g. (Levin and Pieraccini ..."
Abstract
-
Cited by 26 (6 self)
- Add to MetaCart
(Show Context)
We explore the use of restricted dialogue contexts in reinforcement learning (RL) of effective dialogue strategies for information seeking spoken dialogue systems (e.g. COMMUNICATOR (Walker et al., 2001)). The contexts we use are richer than previous research in this area, e.g. (Levin and Pieraccini, 1997; Scheffler and Young, 2001; Singh et al., 2002; Pietquin, 2004), which use only slot-based information, but are much less complex than the full dialogue “Information States” explored in (Henderson et al., 2005), for which tractable learning is an issue. We explore how incrementally adding richer features allows learning of more effective dialogue strategies. We use 2 user simulations learned from COMMUNICATOR data (Walker et al., 2001; Georgila et al., 2005b) to explore the effects of different features on learned dialogue strategies. Our results show that adding the dialogue moves of the last system and user turns increases the average reward of the automatically learned strategies by 65.9% over the original (hand-coded) COMMUNICATOR systems, and by 7.8% over a baseline RL policy that uses only slot-status features. We show that the learned strategies exhibit an emergent “focus switching” strategy and effective use of the ‘give help’ action.
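A compact sketch of the kind of restricted state representation discussed here: slot-status features plus the dialogue moves of the last system and user turns, collapsed out of a fuller Information State. Slot names, move labels, and the info_state layout are illustrative assumptions, not the paper's feature set.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class RLState:
    slot_status: tuple       # e.g. ('filled', 'confirmed', 'empty', 'empty')
    last_system_move: str    # e.g. 'request_info'
    last_user_move: str      # e.g. 'provide_info'

def restrict_state(info_state):
    """Collapse a full ISU Information State into the limited feature set."""
    slots = info_state["slots"]            # assumed: {slot_name: status_string}
    moves = info_state["moves"]            # assumed: dialogue moves, oldest first
    return RLState(
        slot_status=tuple(slots[name] for name in sorted(slots)),
        last_system_move=moves[-2] if len(moves) >= 2 else "none",
        last_user_move=moves[-1] if moves else "none",
    )
```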
Automatic annotation of COMMUNICATOR dialogue data for learning dialogue strategies and user simulations
- In Ninth Workshop on the Semantics and Pragmatics of Dialogue (SEMDIAL: DIALOR), 2005
"... We present and evaluate an automatic annotation system which builds "Information State Update" (ISU) representations of dialogue context for the COMMUNICATOR (2000 and 2001) corpora of humanmachine dialogues (approx 2300 dialogues) . The purposes of this annotation are to generate ..."
Abstract
-
Cited by 26 (15 self)
- Add to MetaCart
(Show Context)
We present and evaluate an automatic annotation system which builds "Information State Update" (ISU) representations of dialogue context for the COMMUNICATOR (2000 and 2001) corpora of human-machine dialogues (approx. 2300 dialogues). The purposes of this annotation are to generate training data for reinforcement learning (RL) of dialogue policies, to generate data for building user simulations, and to evaluate different dialogue strategies against a baseline.
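For illustration only, a hypothetical example of the kind of Information State record such an annotation pass might emit for a single COMMUNICATOR turn; every field name and value below is an invented placeholder, not the paper's annotation scheme.

```python
# One annotated turn as a plain Python record (illustrative assumption).
information_state = {
    "turn": 12,
    "speaker": "user",
    "speech_act": "provide_info",
    "task": "dest_city",
    "filled_slots": {"orig_city": "boston", "dest_city": "denver"},
    "confirmed_slots": {"orig_city": "boston"},
    "history": ["greeting", "request_info", "provide_info"],
}
```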
Learning effective multimodal dialogue strategies from Wizard-Of-Oz data: bootstrapping and evaluation
"... We address two problems in the field of automatic optimization of dialogue strategies: learning effective dialogue strategies when no initial data or system exists, and evaluating the result with real users. We use Reinforcement Learning (RL) to learn multimodal dialogue strategies by interaction wi ..."
Abstract
-
Cited by 21 (6 self)
- Add to MetaCart
(Show Context)
We address two problems in the field of automatic optimization of dialogue strategies: learning effective dialogue strategies when no initial data or system exists, and evaluating the result with real users. We use Reinforcement Learning (RL) to learn multimodal dialogue strategies by interaction with a simulated environment which is “bootstrapped” from small amounts of Wizard-of-Oz (WOZ) data. This use of WOZ data allows development of optimal strategies for domains where no working prototype is available. We compare the RL-based strategy against a supervised strategy which mimics the wizards’ policies. This comparison allows us to measure relative improvement over the training data. Our results show that RL significantly outperforms Supervised Learning when interacting in simulation as well as for interactions with real users. The RL-based policy gains on average 50 times more reward when tested in simulation, and almost 18 times more reward when interacting with real users. Users also subjectively rate the RL-based policy on average 10% higher.
Reinforcement Learning for Dialog Management using Least-Squares Policy Iteration and Fast Feature Selection
"... Reinforcement learning (RL) is a promising technique for creating a dialog manager. RL accepts features of the current dialog state and seeks to find the best action given those features. Although it is often easy to posit a large set of potentially useful features, in practice, it is difficult to f ..."
Abstract
-
Cited by 21 (1 self)
- Add to MetaCart
(Show Context)
Reinforcement learning (RL) is a promising technique for creating a dialog manager. RL accepts features of the current dialog state and seeks to find the best action given those features. Although it is often easy to posit a large set of potentially useful features, in practice, it is difficult to find the subset which is large enough to contain useful information yet compact enough to reliably learn a good policy. In this paper, we propose a method for RL optimization which automatically performs feature selection. The algorithm is based on least-squares policy iteration, a state-of-the-art RL algorithm which is highly sample-efficient and can learn from a static corpus or on-line. Experiments in dialog simulation show it is more stable than a baseline RL algorithm taken from a working dialog system.
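A minimal sketch of the LSTD-Q evaluation step at the core of least-squares policy iteration, the algorithm this abstract builds on. The feature map phi, the corpus format, and the regularisation term are assumptions, and the paper's fast feature-selection step is not shown.

```python
import numpy as np

def lstdq(corpus, phi, policy, gamma=0.95, reg=1e-3):
    """Evaluate the current policy from a static corpus of transitions.

    corpus : list of (state, action, reward, next_state) tuples; next_state is
             None at the end of a dialogue.
    phi    : phi(state, action) -> feature vector (sequence of floats).
    policy : policy(state) -> greedy action under the current weights.
    Returns the weight vector w of the approximate Q-function Q(s,a) = w . phi(s,a)."""
    k = len(phi(*corpus[0][:2]))
    A = reg * np.eye(k)          # small ridge term keeps A invertible
    b = np.zeros(k)
    for s, a, r, s_next in corpus:
        f = np.asarray(phi(s, a), dtype=float)
        f_next = (np.zeros(k) if s_next is None
                  else np.asarray(phi(s_next, policy(s_next)), dtype=float))
        A += np.outer(f, f - gamma * f_next)
        b += r * f
    return np.linalg.solve(A, b)
```

Full LSPI alternates this evaluation step with greedy improvement: redefine policy(s) as the action maximising the learned weights dotted with phi(s, a), then repeat until the weights stop changing.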