Hybrid Reinforcement/Supervised Learning for Dialogue Policies from COMMUNICATOR data (2005)
Venue: IJCAI Workshop on Knowledge and Reasoning in Practical Dialogue Systems
Citations: 99 (26 self)
Citations
1232 | Reinforcement Learning - Sutton, Barto - 1998
222 | The Open Agent Architecture - Cheyer, Martin - 2001
Citation Context: ...1,683 dialogues, and 125,388 total states, two thirds of which result from system actions and one third from user actions. The annotation system is implemented using DIPPER (Bos et al. 2003) and OAA (Cheyer and Martin 2001), using several OAA agents (see Georgila, Lemon, and Henderson, 2005, and Georgila et al., submitted, for more details). Following the ISU approach, we represented states using Information States, wh...
217 | Partially observable Markov decision processes for spoken dialog systems. Computer Speech and Language, 21:393–422 - Williams, Young - 2007
210 | A stochastic model of human-machine interaction for learning dialog strategies - Levin, Pieraccini, et al. - 2000
180 | Kernel Methods for Pattern Analysis - Shawe-Taylor, Cristianini - 2004
163 | Optimizing dialogue management with reinforcement learning: experiments with the NJFun system - Singh, Litman, et al. - 2002
Citation Context: ...ns and limited sets of actions to choose among (Walker, Fromer, and Narayanan 1998; Goddeau and Pineau 2000; Levin, Pieraccini, and Eckert 2000; Roy, Pineau, and Thrun 2000; Scheffler and Young 2002; Singh et al. 2002; Williams and Young 2005; Williams, Poupart, and Young 2005a). Much of the prior work in RL for dialogue management focuses on the problem of choosing among a particular limited set of actions (e.g.,...
126 | Towards developing general models of usability with PARADISE - Walker, Kamm, et al. - 2000
104 | Reinforcement learning for spoken dialogue systems - Singh, Kearns, et al. - 1999
74 | A stochastic model of computer-human interaction for learning dialogue strategies - Levin, Pieraccini - 1997
Citation Context: ...formation States is shown in Figure 1, including filled slots, confirmed slots, and previous speech acts. Previous work has raised the question of whether dialogue management policies can be learned (Levin and Pieraccini 1997) for systems that have only a limited view of the dialogue context, for example, not including prior speech act history (see the following). One prominent representation of the set of possible system...
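The Information States mentioned in this context are feature structures recording filled slots, confirmed slots, and previous speech acts. A minimal sketch in Python of such a structure (the field and slot names are illustrative assumptions, not taken from the paper):

    from dataclasses import dataclass, field

    @dataclass
    class InformationState:
        # Slots the user has supplied values for, e.g. {"dest_city": "Boston"}
        filled_slots: dict = field(default_factory=dict)
        # Slots whose values have been confirmed by the user
        confirmed_slots: set = field(default_factory=set)
        # History of (speaker, dialogue act) pairs from prior turns
        speech_act_history: list = field(default_factory=list)

    state = InformationState()
    state.filled_slots["dest_city"] = "Boston"            # user fills a slot
    state.speech_act_history.append(("user", "provide_info"))
    state.confirmed_slots.add("dest_city")                # slot later confirmed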
73 | DIPPER: Description and Formalisation of an Information-State Update Dialogue System Architecture - Bos, Klein, et al. - 2003
Citation Context: ...1 data has eight systems, 1,683 dialogues, and 125,388 total states, two thirds of which result from system actions and one third from user actions. The annotation system is implemented using DIPPER (Bos et al. 2003) and OAA (Cheyer and Martin 2001), using several OAA agents (see Georgila, Lemon, and Henderson, 2005, and Georgila et al., submitted, for more details). Following the ISU approach, we represented st...
70 | A probabilistic framework for dialog simulation and optimal strategy learning - Pietquin, Dutoit - 2006
68 | Learning User Simulations for Information State Update Dialogue Systems - Georgila, Henderson, et al. - 2005
66 | Learning optimal dialogue strategies: a case study of a spoken dialogue agent for email - Walker, Fromer, et al. - 1998
64 | Quantitative Evaluation of User Simulation Techniques for Spoken Dialogue Systems. 6th SIGdial - Schatzmann, Georgila, et al. - 2005
Citation Context: ...construct and then evaluate simulated users are open problems. Clearly there is a dependency between the accuracy of the simulation used for training and the eventual dialogue policy that is learned (Schatzmann et al. 2005). Current research attempts to develop metrics for user simulation that are predictive of the overall quality of the final learned dialogue policy (Schatzmann, Georgila, and Young 2005; Schatzmann et...
58 | Quantitative and qualitative evaluation of DARPA Communicator spoken dialogue systems - Walker, Passonneau, et al. - 2001
Citation Context: ...he COMMUNICATOR Domain and Data Annotation To empirically evaluate our proposed learning method, we apply it to the COMMUNICATOR domain using the COMMUNICATOR corpora. The COMMUNICATOR corpora (2000 [Walker et al. 2001] and 2001 [Walker et al. 2002b]) consist of human–machine dialogues (approximately 2,300 dialogues in total). The users always try to book a flight, but they may also try to select a hotel or car ren...
55 | A Framework for Unsupervised Learning of Dialogue Strategies - Pietquin - 2004
Citation Context: ...cheffler and Young 2002; Williams, Poupart, and Young 2005a, 2005b; Williams and Young 2005; Pietquin and Dutoit 2006b), with perhaps some additional low-level information (such as acoustic features [Pietquin 2004]). Only recently have researchers experimented with using enriched representations of dialogue context (Gabsdil and Lemon 2004; Lemon et al. 2005; Frampton and Lemon 2006; Rieser and Lemon 2006c), as...
53 | User simulation for spoken dialogue systems: Learning and evaluation - Georgila, Henderson, et al. - 2006
50 | Automatic Learning of Dialogue Strategy using Dialogue Simulation and Reinforcement Learning - Scheffler, Young - 2002
48 | Effects of the User Model on Simulation-based Learning of Dialogue Strategies - Schatzmann, Weilhammer, et al. - 2005
Citation Context: ...construct and then evaluate simulated users are open problems. Clearly there is a dependency between the accuracy of the simulation used for training and the eventual dialogue policy that is learned (Schatzmann et al. 2005). Current research attempts to develop metrics for user simulation that are predictive of the overall quality of the final learned dialogue policy (Schatzmann, Georgila, and Young 2005; Schatzmann et...
46 | An ISU dialogue system exhibiting reinforcement learning of dialogue policies: generic slot-filling in the TALK in-car system - Lemon, Georgila, et al. - 2006
43 | Information state and dialogue management in the TRINDI dialogue move engine toolkit - Larsson, Traum - 2000
Citation Context: ...d how can we encode the task in a way which is appropriate for these methods? For the latter challenge, we exploit the Information State Update (ISU) approach to dialogue systems (Bohlin et al. 1999; Larsson and Traum 2000), which provides the kind of rich and flexible feature-based representations of context that are used with many recent machine learning methods, including the linear function approximation method we...
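The linear function approximation mentioned in this context estimates action values as a weighted sum of state features, Q(s, a) ≈ w_a · φ(s), which lets the policy generalize to unseen states. A minimal sketch of this idea with a TD-style update; the feature dimensions, action names, and learning constants are illustrative assumptions, not the paper's actual configuration:

    import numpy as np

    n_features = 4  # e.g. slots filled, slots confirmed, last act type, turn count
    actions = ["ask_slot", "confirm_slot", "give_info"]
    weights = {a: np.zeros(n_features) for a in actions}  # one weight vector per action

    def q_value(phi, action):
        """Estimated value of taking `action` in a state with feature vector `phi`."""
        return float(np.dot(weights[action], phi))

    def td_update(phi, action, reward, phi_next, alpha=0.1, gamma=0.95):
        """One TD(0)-style update of the linear weights for the action taken."""
        target = reward + gamma * max(q_value(phi_next, a) for a in actions)
        weights[action] += alpha * (target - q_value(phi, action)) * phi

    phi = np.array([1.0, 0.0, 1.0, 0.2])  # illustrative state features
    td_update(phi, "ask_slot", reward=-1.0, phi_next=np.array([1.0, 1.0, 0.0, 0.3]))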
40 | DARPA Communicator dialog travel planning systems: The June 2000 data collection - Walker, Aberdeen, et al. - 2001
32 | Evaluating effectiveness and portability of reinforcement learned dialogue strategies with real users: The TALK TownInfo evaluation - Lemon, Georgila, et al. - 2006
32 | Spoken dialog management for robots - Roy, Pineau, et al. - 2000
28 | DARPA Communicator: cross-system results for the 2001 evaluation - Walker, Rudnicky, et al. - 2002
26 | Automatic annotation of COMMUNICATOR dialogue data for learning dialogue strategies and user simulations - Georgila, Lemon, et al. - 2005
Citation Context: ...f which result from system actions and one third from user actions. The annotation system is implemented using DIPPER (Bos et al. 2003) and OAA (Cheyer and Martin 2001), using several OAA agents (see Georgila, Lemon, and Henderson, 2005, and Georgila et al., submitted, for more details). Following the ISU approach, we represented states using Information States, which are feature structures intended to record all the information abo...
22 | DATE: A Dialogue Act Tagging Scheme for Evaluation of Spoken Dialogue Systems - Walker, Passonneau - 2001
14 | Reinforcement Learning of Dialogue Strategies Using The User’s Last Dialogue Act - Frampton, Lemon - 2005
Citation Context: ...a; Schatzmann et al. 2006; Williams 2007). Furthermore, several approaches use simple probabilistic simulations encoded by hand, using intuitions about reasonable user behaviors (e.g., Pietquin 2004; Frampton and Lemon 2005; Pietquin and Dutoit 2006a), whereas other work (e.g., Scheffler and Young 2001, 2002; Georgila, Henderson, and Lemon 2005; Georgila, Henderson, and Lemon 2006; Rieser and Lemon 2006a) builds simulat...
14 | Using Wizard-of-Oz simulations to bootstrap Reinforcement-Learning based dialog management systems - Williams, Young - 2003
12 | Fast reinforcement learning of dialog strategies - Goddeau, Pineau - 2000
Citation Context: ...nd Walker, Fromer, and Narayanan (1998). Previous work has been restricted to limited dialogue context representations and limited sets of actions to choose among (Walker, Fromer, and Narayanan 1998; Goddeau and Pineau 2000; Levin, Pieraccini, and Eckert 2000; Roy, Pineau, and Thrun 2000; Scheffler and Young 2002; Singh et al. 2002; Williams and Young 2005; Williams, Poupart, and Young 2005a). Much of the prior work in...
11 | Deliverable D4.1: integration of learning and adaptivity with the ISU approach - Lemon, Georgila, et al. - 2005
Citation Context: ...itional low-level information (such as acoustic features [Pietquin 2004]). Only recently have researchers experimented with using enriched representations of dialogue context (Gabsdil and Lemon 2004; Lemon et al. 2005; Frampton and Lemon 2006; Rieser and Lemon 2006c), as we do in this article. From this work it is known that adding context features leads to better dialogue strategies, compared to, for example, sim...
9 | Showcase exhibiting reinforcement learning for dialogue strategies - Lemon, Georgila, et al.
Citation Context: ...itional low-level information (such as acoustic features [Pietquin 2004]). Only recently have researchers experimented with using enriched representations of dialogue context (Gabsdil and Lemon 2004; Lemon et al. 2005; Frampton and Lemon 2006; Rieser and Lemon 2006c), as we do in this article. From this work it is known that adding context features leads to better dialogue strategies, compared to, for example, sim...
7 | Hybrid Reinforcement/Supervised Learning for Dialogue Policies from COMMUNICATOR data - Henderson, Lemon, Georgila - 2005
6 | Fast reinforcement learning of dialogue policies using stable function approximation - Denecke, Dohsaka, et al. - 2005
3 | Empirical Evaluation of a Reinforcement Learning Dialogue System - Singh, Kearns, et al. - 2000
1 | Dynamic Bayesian networks for NLU simulation with application to dialog optimal strategy learning - Pietquin, Dutoit - 2006
Citation Context: ...is article performs better than a state-of-the-art hand-coded system in experiments with human users. The experiments were done using the “Town Information” multimodal dialogue system of Lemon et al. (2006) and Lemon, Georgila, and Stuttle (2005). The hybrid policy reported here (trained on the COMMUNICATOR data) was ported to this domain, and then evaluated with human subjects. The learned policy achie...
1 | Using machine learning to explore human multimodal clarification strategies - Rieser, Lemon - 2006
Citation Context: ...is article performs better than a state-of-the-art hand-coded system in experiments with human users. The experiments were done using the “Town Information” multimodal dialogue system of Lemon et al. (2006) and Lemon, Georgila, and Stuttle (2005). The hybrid policy reported here (trained on the COMMUNICATOR data) was ported to this domain, and then evaluated with human subjects. The learned policy achie...
1 | Automatic learning of dialogue strategy using dialogue simulation and reinforcement learning - Scheffler, Young - 2002
1 | Factored partially observable Markov decision processes for dialogue management - Williams, Poupart, Young - 2005
Citation Context: ...n 2004). In contrast, we use function approximation to allow generalization to states that were not in the training data. Function approximation was also applied to RL by Denecke, Dohsaka, and Nakano (2005), but they still use a relatively small state space (6 features, 972 possible states). They also only exploit data for the 50 most frequent states, using what is in effect a Gaussian kernel to compute...
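A Gaussian (RBF) kernel, as attributed to Denecke, Dohsaka, and Nakano (2005) in this context, scores the similarity of a new state to stored states: it returns 1 for identical feature vectors and decays toward 0 as they diverge. A minimal sketch of this state-similarity idea; the feature vectors and bandwidth are illustrative assumptions, not values from the cited work:

    import numpy as np

    def gaussian_similarity(x, y, sigma=1.0):
        """k(x, y) = exp(-||x - y||^2 / (2 * sigma^2)); sigma controls how fast similarity decays."""
        diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
        return float(np.exp(-np.dot(diff, diff) / (2 * sigma ** 2)))

    # Compare an unseen state against a small set of frequent training states
    frequent_states = [np.array([1, 0, 0, 1, 0, 0]), np.array([1, 1, 0, 1, 0, 0])]
    query = np.array([1, 1, 1, 1, 0, 0])
    scores = [gaussian_similarity(query, s) for s in frequent_states]
    nearest = frequent_states[int(np.argmax(scores))]  # most similar frequent state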