Results 1-10 of 57
Partially observable Markov decision processes with continuous observations for dialogue management
 Computer Speech and Language
, 2005
Cited by 160 (43 self)
This work shows how a dialogue model can be represented as a Partially Observable Markov Decision Process (POMDP) with observations composed of a discrete and continuous component. The continuous component enables the model to directly incorporate a confidence score for automated planning. Using a testbed simulated dialogue management problem, we show how recent optimization techniques are able to find a policy for this continuous POMDP which outperforms a traditional MDP approach. Further, we present a method for automatically improving handcrafted dialogue managers by incorporating POMDP belief state monitoring, including confidence score information. Experiments on the testbed system show significant improvements for several example handcrafted dialogue managers across a range of operating conditions.
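The belief state monitoring mentioned in this abstract can be illustrated, for the standard discrete case only, as a Bayes filter over hidden states. The sketch below is not the paper's method: it leaves out the continuous confidence-score component entirely, and the names `belief_update`, `T`, and `O`, as well as the toy numbers, are assumptions made for illustration.

```python
import numpy as np

def belief_update(belief, T, O, action, observation):
    """Discrete POMDP belief update (Bayes filter).

    belief: array of shape (S,), current distribution over hidden states.
    T[action]: array (S, S), T[a][s, s2] = P(s2 | s, a)   (assumed layout).
    O[action]: array (S, Z), O[a][s2, z] = P(z | s2, a)   (assumed layout).
    Returns b'(s2), proportional to O[a][s2, z] * sum_s T[a][s, s2] * b(s).
    """
    predicted = belief @ T[action]                      # prediction step
    unnormalized = O[action][:, observation] * predicted  # correction step
    total = unnormalized.sum()
    if total == 0.0:
        raise ValueError("observation has zero probability under the model")
    return unnormalized / total

# Toy two-state example (hypothetical numbers): the action leaves the state
# unchanged, and observation 0 is more likely in state 0 than in state 1.
T = {0: np.eye(2)}
O = {0: np.array([[0.8, 0.2],
                  [0.3, 0.7]])}
b = belief_update(np.array([0.5, 0.5]), T, O, action=0, observation=0)
print(b)  # belief mass shifts toward state 0
```

In a dialogue setting the hidden state would be the user's goal and the observation the recognizer output; a continuous confidence score would replace the discrete column lookup in `O` with a density evaluation, which this sketch does not attempt.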
Dynamic Programming for Partially Observable Stochastic Games
 IN PROCEEDINGS OF THE NINETEENTH NATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE
, 2004
Cited by 123 (23 self)
We develop an exact dynamic programming algorithm for partially observable stochastic games (POSGs). The algorithm is a synthesis of dynamic programming for partially observable Markov decision processes (POMDPs) and iterated elimination of dominated strategies in normal form games.
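As a rough illustration of one ingredient named in this abstract, iterated elimination of dominated strategies in a normal form game, here is a minimal sketch for a two-player game. It is a simplification and not the paper's algorithm: it checks only strict dominance by pure strategies (no mixed-strategy dominance, no POMDP dynamic programming), and the function name and payoff layout are assumptions.

```python
import numpy as np

def iterated_elimination(payoff_row, payoff_col):
    """Iteratively remove strictly dominated pure strategies.

    payoff_row[i, j]: payoff to the row player when row plays i, column plays j.
    payoff_col[i, j]: payoff to the column player for the same joint action.
    Returns the lists of surviving row and column strategy indices.
    """
    A = np.asarray(payoff_row, dtype=float)
    B = np.asarray(payoff_col, dtype=float)
    rows = list(range(A.shape[0]))
    cols = list(range(A.shape[1]))
    changed = True
    while changed:
        changed = False
        # Drop any row strictly worse than another surviving row
        # against every surviving column.
        for r in list(rows):
            if any(all(A[q, c] > A[r, c] for c in cols) for q in rows if q != r):
                rows.remove(r)
                changed = True
        # Drop any column strictly worse than another surviving column
        # against every surviving row.
        for c in list(cols):
            if any(all(B[r, d] > B[r, c] for r in rows) for d in cols if d != c):
                cols.remove(c)
                changed = True
    return rows, cols

# Prisoner's dilemma (index 0 = cooperate, 1 = defect): only defect survives.
A = [[-1, -3], [0, -2]]
B = [[-1, 0], [-3, -2]]
print(iterated_elimination(A, B))
```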
Exploiting Structure to Efficiently Solve Large Scale Partially Observable Markov Decision Processes
, 2005
Cited by 67 (6 self)
Partially observable Markov decision processes (POMDPs) provide a natural and principled framework to model a wide range of sequential decision making problems under uncertainty. To date, the use of POMDPs in real-world problems has been limited by the poor scalability of existing solution algorithms, which can only solve problems with up to ten thousand states. In fact, the complexity of finding an optimal policy for a finite-horizon discrete POMDP is PSPACE-complete. In practice, two important sources of intractability plague most solution algorithms: large policy spaces and large state spaces. On the other hand,
Finding Approximate POMDP Solutions Through Belief Compression
, 2003
Cited by 66 (2 self)
Standard value function approaches to finding policies for Partially Observable Markov Decision Processes (POMDPs) are generally considered to be intractable for large models. The intractability of these algorithms is to a large extent a consequence of computing an exact, optimal policy over the entire belief space. However, in real-world POMDP problems, computing the optimal policy for the full belief space is often unnecessary for good control even for problems with complicated policy classes. The beliefs experienced by the controller often lie near a structured, low-dimensional manifold embedded in the high-dimensional belief space. Finding a good approximation to the optimal value function for only this manifold can be much easier than computing the full value function. We introduce a new method for solving large-scale POMDPs by reducing the dimensionality of the belief space. We use Exponential family Principal Components Analysis (Collins, Dasgupta, & Schapire, 2002) to represent sparse, high-dimensional belief spaces using low-dimensional sets of learned features of the belief state. We then plan only in terms of the low-dimensional belief features. By planning in this low-dimensional space, we can find policies for POMDP models that are orders of magnitude larger than models that can be handled by conventional techniques. We demonstrate the use of this algorithm on a synthetic problem and on mobile robot navigation tasks.
Contingent Planning Under Uncertainty via Stochastic Satisfiability
 Artificial Intelligence
, 1999
Cited by 63 (10 self)
We describe two new probabilistic planning techniques, c-maxplan and zander, that generate contingent plans in probabilistic propositional domains. Both operate by transforming the planning problem into a stochastic satisfiability problem and solving that problem instead. c-maxplan encodes the problem as an E-Majsat instance, while zander encodes the problem as an SSat instance. Although SSat problems are in a higher complexity class than E-Majsat problems, the problem encodings produced by zander are substantially more compact and appear to be easier to solve than the corresponding E-Majsat encodings. Preliminary results for zander indicate that it is competitive with existing planners on a variety of problems. Introduction When planning under uncertainty, any information about the state of the world is precious. A contingent plan is one that can make action choices contingent on such information. In this paper, we present an implemented framework for contingent pl...
VDCBPI: an Approximate Scalable Algorithm for Large POMDPs
Cited by 43 (5 self)
Existing algorithms for discrete partially observable Markov decision processes can at best solve problems of a few thousand states due to two important sources of intractability: the curse of dimensionality and the policy space complexity. This paper describes a new algorithm (VDCBPI) that mitigates both sources of intractability by combining the Value Directed Compression (VDC) technique [13] with Bounded Policy Iteration (BPI) [14]. The scalability of VDCBPI is demonstrated on synthetic network management problems with up to 33 million states.
Nonapproximability Results for Partially Observable Markov Decision Processes
, 2000
Cited by 35 (0 self)
We show that for several variations of partially observable Markov decision processes, polynomial-time algorithms for finding control policies are unlikely to or simply don't have guarantees of finding policies within a constant factor or a constant summand of optimal. Here "unlikely" means "unless some complexity classes collapse," where the collapses considered are P = NP, P = PSPACE, or P = EXP. Until or unless these collapses are shown to hold, any control-policy designer must choose between such performance guarantees and efficient computation.
Synthesis of hierarchical finite-state controllers for POMDPs
 In Proceedings of ICAPS
, 2003
Cited by 29 (1 self)
We develop a hierarchical approach to planning for partially observable Markov decision processes (POMDPs) in which a policy is represented as a hierarchical finite-state controller. To provide a foundation for this approach, we discuss some extensions of the POMDP framework that allow us to formalize the process of abstraction by which a hierarchical controller is constructed. We describe a planning algorithm that uses a programmer-defined task hierarchy to constrain the search space of finite-state controllers, and prove that this algorithm converges to a hierarchical finite-state controller that is ε-optimal in a limited but well-defined sense, related to the concept of recursive optimality.
Exploiting locality of interaction in factored Dec-POMDPs
 In Proc. Int. Joint Conf. Autonomous Agents and Multi Agent Systems
, 2008
Cited by 29 (11 self)
Decentralized partially observable Markov decision processes (Dec-POMDPs) constitute an expressive framework for multiagent planning under uncertainty, but solving them is provably intractable. We demonstrate how their scalability can be improved by exploiting locality of interaction between agents in a factored representation. Factored Dec-POMDP representations have been proposed before, but only for Dec-POMDPs whose transition and observation models are fully independent. Such strong assumptions simplify the planning problem, but result in models with limited applicability. By contrast, we consider general factored Dec-POMDPs for which we analyze the model dependencies over space (locality of interaction) and time (horizon of the problem). We also present a formulation of decomposable value functions. Together, our results allow us to exploit the problem structure as well as heuristics in a single framework that is based on collaborative graphical Bayesian games (CGBGs). A preliminary experiment shows a speedup of two orders of magnitude.
Solving Factored POMDPs with Linear Value Functions
 In IJCAI-01 workshop on Planning under Uncertainty and Incomplete Information
, 2001

Cited by 26 (3 self)
 Add to MetaCart
Partially Observable Markov Decision Processes (POMDPs) provide a coherent mathematical framework for planning under uncertainty when the state of the system cannot be fully observed. However, the problem of finding an exact POMDP solution is intractable.