Dynamic Programming for Partially Observable Stochastic Games
- In Proceedings of the Nineteenth National Conference on Artificial Intelligence
, 2004
Cited by 89 (18 self)
We develop an exact dynamic programming algorithm for partially observable stochastic games (POSGs). The algorithm is a synthesis of dynamic programming for partially observable Markov decision processes (POMDPs) and iterated elimination of dominated strategies in normal form games.
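The abstract above names iterated elimination of dominated strategies in normal form games as one ingredient. As a minimal sketch of that ingredient alone (not the paper's POSG algorithm), the following removes strictly dominated pure strategies from a two-player game. The function name `iterated_elimination` and the payoff-matrix layout are assumptions made for illustration, and the sketch checks domination by pure strategies only, not by mixed strategies.

```python
def iterated_elimination(payoff_a, payoff_b):
    """Iteratively remove strictly dominated pure strategies.

    payoff_a[r][c] and payoff_b[r][c] are the payoffs to the row player
    and the column player when row r and column c are played.
    Returns the lists of surviving row and column indices.
    Simplification (assumption): only domination by another pure
    strategy is checked.
    """
    rows = list(range(len(payoff_a)))
    cols = list(range(len(payoff_a[0])))
    changed = True
    while changed:
        changed = False
        # A row is dropped if some other surviving row is strictly
        # better against every surviving column.
        for r in list(rows):
            if any(all(payoff_a[o][c] > payoff_a[r][c] for c in cols)
                   for o in rows if o != r):
                rows.remove(r)
                changed = True
        # Same test for columns, using the column player's payoffs.
        for c in list(cols):
            if any(all(payoff_b[r][o] > payoff_b[r][c] for r in rows)
                   for o in cols if o != c):
                cols.remove(c)
                changed = True
    return rows, cols


# Prisoner's dilemma: index 0 = cooperate, index 1 = defect.
pd_a = [[3, 0], [5, 1]]
pd_b = [[3, 5], [0, 1]]
survivors = iterated_elimination(pd_a, pd_b)
```

In the prisoner's dilemma example, cooperation is strictly dominated for both players, so only the defect strategies survive.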
Partially observable Markov decision processes with continuous observations for dialogue management
- Computer Speech and Language
, 2005
Cited by 79 (24 self)
This work shows how a dialogue model can be represented as a Partially Observable Markov Decision Process (POMDP) with observations composed of a discrete and continuous component. The continuous component enables the model to directly incorporate a confidence score for automated planning. Using a testbed simulated dialogue management problem, we show how recent optimization techniques are able to find a policy for this continuous POMDP which outperforms a traditional MDP approach. Further, we present a method for automatically improving handcrafted dialogue managers by incorporating POMDP belief state monitoring, including confidence score information. Experiments on the testbed system show significant improvements for several example handcrafted dialogue managers across a range of operating conditions.
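Belief state monitoring with an observation that has a discrete part and a continuous confidence score can be sketched with the standard Bayes-filter update, b'(s') ∝ p(o | s') Σ_s P(s' | s, a) b(s). The code below is an illustration under stated assumptions, not the paper's model: the two-state user goal, the recognizer accuracy `p_correct`, and the Gaussian confidence densities in `dialogue_likelihood` are all hypothetical.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a normal distribution at x."""
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))


def belief_update(belief, transition, obs_likelihood):
    """One Bayes-filter step for a fixed action.

    belief[s] is the prior probability of state s,
    transition[s][s2] is P(s2 | s, a) for the action taken,
    obs_likelihood[s2] is the likelihood of the received observation in s2
    (a probability, a density, or a product of both).
    """
    n = len(belief)
    predicted = [sum(belief[s] * transition[s][s2] for s in range(n))
                 for s2 in range(n)]
    unnormalized = [predicted[s2] * obs_likelihood[s2] for s2 in range(n)]
    total = sum(unnormalized)
    if total == 0.0:
        raise ValueError("observation has zero likelihood under the current belief")
    return [u / total for u in unnormalized]


def dialogue_likelihood(state, heard, confidence, p_correct=0.7):
    """Hypothetical observation model for a two-goal dialogue.

    Discrete part: the recognizer reports the true goal with probability
    p_correct. Continuous part (assumption): the confidence score is
    N(0.8, 0.15) when the report is correct and N(0.4, 0.15) otherwise.
    """
    correct = heard == state
    p_discrete = p_correct if correct else 1.0 - p_correct
    mean = 0.8 if correct else 0.4
    return p_discrete * gaussian_pdf(confidence, mean, 0.15)


# The user's goal does not change between turns in this toy example.
identity = [[1.0, 0.0], [0.0, 1.0]]
prior = [0.5, 0.5]
# The recognizer hears goal 0, once with high and once with low confidence.
high = belief_update(prior, identity, [dialogue_likelihood(s, 0, 0.9) for s in (0, 1)])
low = belief_update(prior, identity, [dialogue_likelihood(s, 0, 0.3) for s in (0, 1)])
```

Under these assumed densities, the same recognized word moves the belief strongly toward goal 0 when the confidence score is high and away from it when the score is low, which is the kind of information a purely discrete observation would discard.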
Contingent Planning Under Uncertainty via Stochastic Satisfiability
- Artificial Intelligence
, 1999
Cited by 49 (5 self)
We describe two new probabilistic planning techniques, c-maxplan and zander, that generate contingent plans in probabilistic propositional domains. Both operate by transforming the planning problem into a stochastic satisfiability problem and solving that problem instead. c-maxplan encodes the problem as an E-Majsat instance, while zander encodes the problem as an S-Sat instance. Although S-Sat problems are in a higher complexity class than E-Majsat problems, the problem encodings produced by zander are substantially more compact and appear to be easier to solve than the corresponding E-Majsat encodings. Preliminary results for zander indicate that it is competitive with existing planners on a variety of problems.
Introduction: When planning under uncertainty, any information about the state of the world is precious. A contingent plan is one that can make action choices contingent on such information. In this paper, we present an implemented framework for contingent pl...
Finding Approximate POMDP Solutions Through Belief Compression
, 2003
Cited by 46 (2 self)
Standard value function approaches to finding policies for Partially Observable Markov Decision Processes (POMDPs) are generally considered to be intractable for large models. The intractability of these algorithms is to a large extent a consequence of computing an exact, optimal policy over the entire belief space. However, in real-world POMDP problems, computing the optimal policy for the full belief space is often unnecessary for good control even for problems with complicated policy classes. The beliefs experienced by the controller often lie near a structured, low-dimensional manifold embedded in the high-dimensional belief space. Finding a good approximation to the optimal value function for only this manifold can be much easier than computing the full value function. We introduce a new method for solving large-scale POMDPs by reducing the dimensionality of the belief space. We use Exponential family Principal Components Analysis (Collins, Dasgupta, & Schapire, 2002) to represent sparse, high-dimensional belief spaces using low-dimensional sets of learned features of the belief state. We then plan only in terms of the low-dimensional belief features. By planning in this low-dimensional space, we can find policies for POMDP models that are orders of magnitude larger than models that can be handled by conventional techniques. We demonstrate the use of this algorithm on a synthetic problem and on mobile robot navigation tasks.
Exploiting Structure to Efficiently Solve Large Scale Partially Observable Markov Decision Processes
, 2005
Cited by 45 (4 self)
Partially observable Markov decision processes (POMDPs) provide a natural and principled framework to model a wide range of sequential decision making problems under uncertainty. To date, the use of POMDPs in real-world problems has been limited by the poor scalability of existing solution algorithms, which can only solve problems with up to ten thousand states. In fact, the complexity of finding an optimal policy for a finite-horizon discrete POMDP is PSPACE-complete. In practice, two important sources of intractability plague most solution algorithms: large policy spaces and large state spaces. On the other hand,
VDCBPI: an Approximate Scalable Algorithm for Large POMDPs
Cited by 36 (5 self)
Existing algorithms for discrete partially observable Markov decision processes can at best solve problems of a few thousand states due to two important sources of intractability: the curse of dimensionality and the policy space complexity. This paper describes a new algorithm (VDCBPI) that mitigates both sources of intractability by combining the Value Directed Compression (VDC) technique [13] with Bounded Policy Iteration (BPI) [14]. The scalability of VDCBPI is demonstrated on synthetic network management problems with up to 33 million states.
Nonapproximability Results for Partially Observable Markov Decision Processes
, 2000
Cited by 27 (0 self)
We show that for several variations of partially observable Markov decision processes, polynomial-time algorithms for finding control policies are unlikely to have, or simply do not have, guarantees of finding policies within a constant factor or a constant summand of optimal. Here "unlikely" means "unless some complexity classes collapse," where the collapses considered are P = NP, P = PSPACE, or P = EXP. Until or unless these collapses are shown to hold, any control-policy designer must choose between such performance guarantees and efficient computation.
Solving Factored POMDPs with Linear Value Functions
- In IJCAI-01 workshop on Planning under Uncertainty and Incomplete Information
, 2001
Cited by 21 (3 self)
Partially Observable Markov Decision Processes (POMDPs) provide a coherent mathematical framework for planning under uncertainty when the state of the system cannot be fully observed. However, the problem of finding an exact POMDP solution is intractable.
Value-directed belief state approximation for POMDPs
- In UAI-2000
, 2000
Cited by 21 (3 self)
We consider the problem of belief-state monitoring for the purposes of implementing a policy for a partially observable Markov decision process (POMDP), specifically how one might approximate the belief state. Other schemes for belief-state approximation (e.g., based on minimizing a measure such as KL-divergence between the true and estimated state) are not necessarily appropriate for POMDPs. Instead we propose a framework for analyzing value-directed approximation schemes, where approximation quality is determined by the expected error in utility rather than by the error in the belief state itself. We propose heuristic methods, which exhibit anytime characteristics, for finding good projection schemes for belief state estimation given a POMDP value function. We also describe several algorithms for constructing bounds on the error in decision quality (expected utility) associated with acting in accordance with a given belief state approximation.
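The contrast between belief-state error and error in utility can be illustrated with a toy calculation. The sketch below is not the paper's framework or any of its bounds: it assumes a value function given as a set of alpha vectors (one per plan, with V(b) = max over alpha of alpha · b), and the names `kl_divergence`, `value`, and `decision_loss`, along with all numbers, are made up for illustration.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) for discrete distributions; assumes q[i] > 0 wherever p[i] > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def value(alpha, belief):
    """Expected value of the plan represented by alpha under the given belief."""
    return sum(a * b for a, b in zip(alpha, belief))


def decision_loss(alpha_vectors, true_belief, approx_belief):
    """Utility lost by choosing the plan that looks best under approx_belief,
    evaluated under true_belief."""
    chosen = max(alpha_vectors, key=lambda a: value(a, approx_belief))
    best = max(value(a, true_belief) for a in alpha_vectors)
    return best - value(chosen, true_belief)


# Two states, two plans; each plan pays off in exactly one state.
alphas = [[1.0, 0.0], [0.0, 1.0]]
true_b = [0.55, 0.45]
approx_a = [0.45, 0.55]  # close to true_b, but on the other side of the decision boundary
approx_b = [0.95, 0.05]  # far from true_b, but selects the same plan

kl_a, loss_a = kl_divergence(true_b, approx_a), decision_loss(alphas, true_b, approx_a)
kl_b, loss_b = kl_divergence(true_b, approx_b), decision_loss(alphas, true_b, approx_b)
```

In this example `approx_a` has the smaller KL-divergence from the true belief yet changes the selected plan and loses utility, while `approx_b` has the larger KL-divergence and loses nothing, which is the kind of mismatch that motivates judging an approximation by its effect on decision quality.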
Synthesis of Hierarchical Finite-State Controllers for POMDPs
, 2003
Cited by 17 (0 self)
We develop a hierarchical approach to planning for partially observable Markov decision processes (POMDPs) in which a policy is represented as a hierarchical finite-state controller.

