Hierarchical Reinforcement Learning for Spoken . . ., 2009
Abstract
This thesis focuses on the problem of scalable optimization of dialogue behaviour in speech-based conversational systems using reinforcement learning. Most previous investigations in dialogue strategy learning have proposed flat reinforcement learning methods, which are more suitable for small-scale spoken dialogue systems. This research formulates the problem in terms of Semi-Markov Decision Processes (SMDPs), and proposes two hierarchical reinforcement learning methods to optimize sub-dialogues rather than full dialogues. The first method uses a hierarchy of SMDPs, where every SMDP ignores irrelevant state variables and actions in order to optimize a sub-dialogue. The second method extends the first one by constraining every SMDP in the hierarchy with prior expert knowledge. The latter method proposes a learning algorithm called ‘HAM+HSMQ-Learning’, which combines two existing algorithms in the literature of hierarchical reinforcement learning. Whilst the first method generates fully-learnt behaviour, the second one generates semi-learnt behaviour. In addition, this research proposes a heuristic dialogue simulation environment for automatic dialogue strategy learning. Experiments were performed on simulated and real environments
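The abstract does not spell out the update rule used inside each SMDP, so the following is only a minimal sketch of the generic SMDP Q-learning update that hierarchical methods of this kind typically build on. It is not the thesis's HAM+HSMQ-Learning algorithm. The state and action names, the reward value, and the function `smdp_q_update` are hypothetical and chosen purely for illustration.

```python
from collections import defaultdict


def smdp_q_update(q, state, action, reward, tau, next_state, next_actions,
                  alpha=0.1, gamma=0.9):
    """Apply one SMDP Q-learning update for an action that lasted `tau` steps.

    `reward` is assumed to be the discounted reward accumulated while the
    (possibly composite) action was executing, e.g. a whole sub-dialogue.
    """
    # Value of the best action available once the sub-dialogue has finished.
    best_next = max((q[(next_state, a)] for a in next_actions), default=0.0)
    # Discount by gamma**tau because the action took tau time steps.
    target = reward + (gamma ** tau) * best_next
    q[(state, action)] += alpha * (target - q[(state, action)])
    return q[(state, action)]


# Hypothetical example: a sub-dialogue "get_city" that took 3 turns.
q = defaultdict(float)
q[("confirm", "close")] = 10.0
new_value = smdp_q_update(q, "ask_slot", "get_city", reward=-2.0, tau=3,
                          next_state="confirm", next_actions=["close"])
```

The only difference from flat Q-learning is the `gamma ** tau` factor, which accounts for actions that span several time steps; with `tau=1` the update reduces to the ordinary one-step rule.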
Department of Computer Science.
Abstract
The application of reinforcement learning methods to the development of dialogue strategies that support robust and efficient human–computer interaction using spoken language is a growing research area. In spoken dialogue systems, Markov Decision Processes (MDPs) provide a formal framework for making dialogue management decisions for planning. This framework enables the system to learn the value of initiating an action from each possible state, which in turn facilitates the maximization of the total reward. However, MDP systems with large state-action spaces lead to intractable solutions. The goal of this paper is thus to present a novel approximation method, based on learning automata, that uses sampling to compute an optimal solution for controlling the dialogue strategy. Compared to other baseline reinforcement learning methods, the proposed approach exhibits better performance with regard to learning speed, a good exploration/exploitation balance in its update, and robustness in the presence of uncertainty in the states obtained.
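The abstract does not say which learning-automata scheme the paper uses. As a rough illustration of the general idea only, the sketch below shows the classic linear reward-inaction update, in which an automaton keeps a probability for each action and shifts probability toward an action whenever it is rewarded. The function name `lri_update`, the learning rate, and the example probabilities are assumptions, not details from the paper.

```python
def lri_update(probs, chosen, rewarded, rate=0.1):
    """Return updated action probabilities under linear reward-inaction.

    On reward, the chosen action's probability moves toward 1 and the others
    shrink proportionally; on penalty, the probabilities are left unchanged.
    """
    if not rewarded:
        return list(probs)
    return [p + rate * (1.0 - p) if i == chosen else (1.0 - rate) * p
            for i, p in enumerate(probs)]


# Hypothetical automaton with three dialogue actions.
probs = [0.5, 0.3, 0.2]
updated = lri_update(probs, chosen=0, rewarded=True)
```

The update keeps the probabilities summing to 1, so the result can be used directly to sample the next action.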

