Results 1–10 of 25
Value-function approximations for partially observable Markov decision processes
 Journal of Artificial Intelligence Research
, 2000
Abstract

Cited by 127 (0 self)
Partially observable Markov decision processes (POMDPs) provide an elegant mathematical framework for modeling complex decision and planning problems in stochastic domains in which states of the system are observable only indirectly, via a set of imperfect or noisy observations. The modeling advantage of POMDPs, however, comes at a price — exact methods for solving them are computationally very expensive and thus applicable in practice only to very simple problems. We focus on efficient approximation (heuristic) methods that attempt to alleviate the computational problem and trade off accuracy for speed. We have two objectives here. First, we survey various approximation methods, analyze their properties and relations and provide some new insights into their differences. Second, we present a number of new approximation methods and novel refinements of existing techniques. The theoretical results are supported by experiments on a problem from the agent navigation domain.
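The belief-state machinery this abstract builds on can be sketched compactly. Below is a minimal Bayes-filter belief update for a discrete POMDP; the model matrices and numbers are made up for illustration, not taken from the paper:

```python
import numpy as np

def belief_update(b, a, o, T, O):
    """Bayes-filter update of a POMDP belief state.

    b : (S,) current belief over states
    T : (A, S, S) transition model, T[a, s, s'] = P(s' | s, a)
    O : (A, S, Z) observation model, O[a, s', o] = P(o | s', a)
    """
    # Predict: push the belief through the transition model for action a.
    predicted = b @ T[a]                     # (S,)
    # Correct: weight by the likelihood of the received observation o.
    unnormalized = predicted * O[a, :, o]    # (S,)
    return unnormalized / unnormalized.sum()

# Tiny 2-state, 1-action, 2-observation example (hypothetical numbers).
T = np.array([[[0.9, 0.1], [0.2, 0.8]]])
O = np.array([[[0.8, 0.2], [0.3, 0.7]]])
b = np.array([0.5, 0.5])
b_next = belief_update(b, a=0, o=0, T=T, O=O)
```

Exact POMDP methods plan over all beliefs reachable by this update, which is what makes them so expensive; the approximation methods surveyed here cut that space down.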
Solving POMDPs by Searching in Policy Space
, 1998
Abstract

Cited by 93 (9 self)
Most algorithms for solving POMDPs iteratively improve a value function that implicitly represents a policy and are said to search in value function space. This paper presents an approach to solving POMDPs that represents a policy explicitly as a finite-state controller and iteratively improves the controller by search in policy space. Two related algorithms illustrate this approach. The first is a policy iteration algorithm that can outperform value iteration in solving infinite-horizon POMDPs. It provides the foundation for a new heuristic search algorithm that promises further speedup by focusing computational effort on regions of the problem space that are reachable, or likely to be reached, from a start state.

1 Introduction

A partially observable Markov decision process (POMDP) provides an elegant mathematical model for planning and control problems for which there can be uncertainty about the effects of actions and about the current state. It is well-known that ...
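The finite-state-controller representation described above can be illustrated with a toy executor: each controller node fixes an action, and the received observation selects the next node, so no belief tracking is needed at run time. Node, action, and observation names below are hypothetical (loosely tiger-problem flavored), not from the paper:

```python
# A policy as a finite-state controller: action_of maps each node to the
# action taken there; transition maps (node, observation) to the next node.
action_of = {"n0": "listen", "n1": "open-left"}
transition = {("n0", "growl-left"): "n1", ("n0", "growl-right"): "n0",
              ("n1", "growl-left"): "n0", ("n1", "growl-right"): "n0"}

def run_controller(start_node, observations):
    node, actions = start_node, []
    for obs in observations:
        actions.append(action_of[node])      # act according to the current node
        node = transition[(node, obs)]       # observation drives the node change
    return actions

run_controller("n0", ["growl-left", "growl-right"])  # -> ["listen", "open-left"]
```

Policy-space search improves exactly these two tables, rather than a value function over beliefs.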
Equivalence notions and model minimization in Markov decision processes
, 2003
Abstract

Cited by 92 (2 self)
Many stochastic planning problems can be represented using Markov Decision Processes (MDPs). A difficulty with using these MDP representations is that the common algorithms for solving them run in time polynomial in the size of the state space, where this size is extremely large for most real-world planning problems of interest. Recent AI research has addressed this problem by representing the MDP in a factored form. Factored MDPs, however, are not amenable to traditional solution methods that call for an explicit enumeration of the state space. One familiar way to solve MDP problems with very large state spaces is to form a reduced (or aggregated) MDP with the same properties as the original MDP by combining “equivalent” states. In this paper, we discuss applying this approach to solving factored MDP problems—we avoid enumerating the state space by describing large blocks of “equivalent” states in factored form, with the block descriptions being inferred directly from the original factored representation. The resulting reduced MDP may have exponentially fewer states than the original factored MDP, and can then be solved using traditional methods. The reduced MDP found depends on the notion of equivalence between states used in the aggregation. The notion of equivalence chosen will be fundamental in designing and analyzing
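The aggregation idea above, grouping states that agree on reward and on the probability of reaching each current block, can be sketched as one round of partition refinement. This toy version enumerates states explicitly (the paper's point is precisely to avoid that by working on factored block descriptions), so it only illustrates the equivalence test:

```python
from collections import defaultdict

def refine(states, blocks, R, T):
    """One round of stochastic-bisimulation-style splitting (illustrative sketch).

    blocks : list of frozensets partitioning `states`
    R      : dict state -> reward
    T      : dict (state, action) -> dict next_state -> prob
    Two states stay together only if they agree on reward and on the
    probability mass they send into each current block, for every action.
    """
    block_of = {s: i for i, blk in enumerate(blocks) for s in blk}

    def signature(s):
        sig = [R[s]]
        for (st, a), dist in sorted(T.items()):
            if st != s:
                continue
            mass = defaultdict(float)
            for s2, p in dist.items():
                mass[block_of[s2]] += p          # aggregate prob per block
            sig.append((a, tuple(sorted(mass.items()))))
        return tuple(sig)

    groups = defaultdict(set)
    for s in states:
        groups[signature(s)].add(s)
    return [frozenset(g) for g in groups.values()]

# Hypothetical 3-state MDP: states 0 and 1 behave identically, state 2 differs.
R = {0: 1, 1: 1, 2: 0}
T = {(0, "a"): {0: 1.0}, (1, "a"): {1: 1.0}, (2, "a"): {2: 1.0}}
blocks = refine([0, 1, 2], [frozenset({0, 1, 2})], R, T)
```

Iterating `refine` to a fixed point yields the reduced (aggregated) MDP the abstract describes.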
Spoken Dialogue Management Using Probabilistic Reasoning
, 2000
Abstract

Cited by 78 (9 self)
Spoken dialogue managers have benefited from stochastic planners such as MDPs. However, so far, MDPs do not handle well noisy and ambiguous speech utterances. We use a POMDP-style approach to generate dialogue strategies by inverting the notion of dialogue state; the state represents the user's intentions, rather than the system state. We compare the performance of MDP and POMDP dialogue managers and show that as speech recognition degrades, the POMDP dialogue manager automatically adjusts the policy and makes fewer mistakes compared to the MDP manager.

1 Introduction

The development of automatic speech recognition has made possible more natural human-computer interaction. Speech recognition and speech understanding, however, are not yet at the point where a computer can reliably extract the intended meaning from every human utterance. Human speech can be both noisy and ambiguous, and many real-world systems must also be speaker-independent. Regardless of these difficulties, any syst...
Finding Approximate POMDP Solutions Through Belief Compression
, 2003
Abstract

Cited by 62 (2 self)
Standard value function approaches to finding policies for Partially Observable Markov Decision Processes (POMDPs) are generally considered to be intractable for large models. The intractability of these algorithms is to a large extent a consequence of computing an exact, optimal policy over the entire belief space. However, in real-world POMDP problems, computing the optimal policy for the full belief space is often unnecessary for good control even for problems with complicated policy classes. The beliefs experienced by the controller often lie near a structured, low-dimensional manifold embedded in the high-dimensional belief space. Finding a good approximation to the optimal value function for only this manifold can be much easier than computing the full value function. We introduce a new method for solving large-scale POMDPs by reducing the dimensionality of the belief space. We use Exponential family Principal Components Analysis (Collins, Dasgupta, & Schapire, 2002) to represent sparse, high-dimensional belief spaces using low-dimensional sets of learned features of the belief state. We then plan only in terms of the low-dimensional belief features. By planning in this low-dimensional space, we can find policies for POMDP models that are orders of magnitude larger than models that can be handled by conventional techniques. We demonstrate the use of this algorithm on a synthetic problem and on mobile robot navigation tasks.
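The dimensionality-reduction step can be illustrated with ordinary PCA standing in for the paper's Exponential-family PCA; the beliefs below are random draws from a sparse Dirichlet, purely to show the compress/reconstruct round trip:

```python
import numpy as np

# Sketch of belief compression: collect beliefs the controller actually
# visits, fit a low-dimensional linear basis, and plan over coefficients.
# The paper uses Exponential-family PCA; plain PCA via SVD stands in here
# to illustrate the idea, and all numbers below are synthetic.
rng = np.random.default_rng(0)
beliefs = rng.dirichlet(alpha=[0.2] * 20, size=500)   # sparse beliefs on a simplex

mean = beliefs.mean(axis=0)
U, S, Vt = np.linalg.svd(beliefs - mean, full_matrices=False)
k = 3                                                  # target dimensionality
basis = Vt[:k]                                         # (k, 20) learned features

codes = (beliefs - mean) @ basis.T                     # (500, k) compressed beliefs
reconstructed = codes @ basis + mean                   # back to the full simplex
error = np.abs(reconstructed - beliefs).max()
```

Planning then happens over the k-dimensional `codes` instead of the 20-dimensional beliefs; E-PCA additionally keeps the reconstructions non-negative and normalized, which plain PCA does not guarantee.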
An Improved Grid-Based Approximation Algorithm for POMDPs
, 2001
Abstract

Cited by 45 (0 self)
Although a partially observable Markov decision process (POMDP) provides an appealing model for problems of planning under uncertainty, exact algorithms for POMDPs are intractable. This motivates work on approximation algorithms, and grid-based approximation is a widely used approach. We describe a novel approach to grid-based approximation that uses a variable-resolution regular grid, and show that it outperforms previous grid-based approaches to approximation.
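For a two-state problem the belief simplex is just the interval [0, 1], so a regular grid with interpolation is easy to picture. The grid resolution and stored values below are invented for illustration:

```python
import numpy as np

# Minimal sketch of grid-based POMDP approximation for a 2-state problem:
# the value function is kept only at grid points over P(state = 0), and the
# value of an arbitrary belief is linearly interpolated between them.
grid = np.linspace(0.0, 1.0, 5)                  # regular grid over P(state = 0)
values = np.array([0.0, 0.4, 1.0, 0.4, 0.0])     # illustrative stored values

def value_at(b0):
    # Linear interpolation between the two bracketing grid points.
    return float(np.interp(b0, grid, values))

value_at(0.375)  # midway between grid points 0.25 and 0.5 -> 0.7
```

A variable-resolution scheme, as in the paper, refines the grid only where the value function bends sharply, instead of spending points uniformly.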
Learning Low Dimensional Predictive Representations
 In ICML ’04: Twenty-First International Conference on Machine Learning
, 2004
An Improved Policy Iteration Algorithm for Partially Observable MDPs
 In Advances in Neural Information Processing Systems, 10
, 1997
Abstract

Cited by 36 (1 self)
A new policy iteration algorithm for partially observable Markov decision processes is presented that is simpler and more efficient than an earlier policy iteration algorithm of Sondik (1971, 1978). The key simplification is representation of a policy as a finite-state controller. This representation makes policy evaluation straightforward. The paper's contribution is to show that the dynamic-programming update used in the policy improvement step can be interpreted as the transformation of a finite-state controller into an improved finite-state controller. The new algorithm consistently outperforms value iteration as an approach to solving infinite-horizon problems.

1 Introduction

A partially observable Markov decision process (POMDP) is a generalization of the standard completely observable Markov decision process that allows imperfect information about the state of the system. First studied as a model of decision-making in operations research, it has recently been used as ...
High-Level Planning and Control with Incomplete Information Using POMDP's
, 1998
Abstract

Cited by 36 (7 self)
We develop an approach to planning with incomplete information that is based on three elements: 1. a high-level language for describing the effects of actions on both the world and the agent's beliefs, which we call pomdp theories; 2. a semantics that translates such theories into actual pomdps; 3. a real-time dynamic programming algorithm that produces controllers from such pomdps. We show that the resulting approach is not only clean and general but that it is practical as well. We have implemented a shell that accepts pomdp theories and produces controllers, and have tested it over a number of problems. In this paper we present the main elements of the approach and report results for the 'omelette problem', where the resulting controller exhibits a better performance than the handcrafted controller.

Introduction

Consider an agent that has a large supply of eggs and whose goal is to get three good eggs and no bad ones into one of two bowls. The eggs can be either good or bad, and at any...
Point-Based Value Iteration for Continuous POMDPs
 JOURNAL OF MACHINE LEARNING RESEARCH
, 2006
Abstract

Cited by 34 (2 self)
We propose a novel approach to optimize Partially Observable Markov Decision Processes (POMDPs) defined on continuous spaces. To date, most algorithms for model-based POMDPs are restricted to discrete states, actions, and observations, but many real-world problems such as, for instance, robot navigation, are naturally defined on continuous spaces. In this work, we demonstrate that the value function for continuous POMDPs is convex in the beliefs over continuous state spaces, and piecewise-linear convex for the particular case of discrete observations and actions but still continuous states. We also demonstrate that continuous Bellman backups are contracting and isotonic, ensuring the monotonic convergence of value-iteration algorithms. Relying on those properties, we extend the PERSEUS algorithm, originally developed for discrete POMDPs, to work in continuous state spaces by representing the observation, transition, and reward models using Gaussian mixtures, and the beliefs using Gaussian mixtures or particle sets. With these representations, the integrals that appear in the Bellman backup can be computed in closed form and, therefore, the algorithm is computationally feasible. Finally, we further extend PERSEUS to deal with continuous action and observation sets by designing effective sampling approaches.
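Of the two belief representations the abstract mentions, the particle-set one is easy to sketch: propagate particles through the dynamics, weight them by the observation likelihood, and resample. The one-dimensional dynamics and Gaussian noise models below are assumptions for illustration, not the paper's models:

```python
import numpy as np

# Particle-set belief over a continuous scalar state (illustrative only).
rng = np.random.default_rng(1)

def particle_belief_update(particles, action, observation,
                           motion_std=0.1, obs_std=0.2):
    # Predict: propagate each particle through the (assumed) additive dynamics.
    moved = particles + action + rng.normal(0.0, motion_std, size=particles.shape)
    # Correct: weight by a Gaussian observation likelihood, then resample.
    weights = np.exp(-0.5 * ((observation - moved) / obs_std) ** 2)
    weights /= weights.sum()
    idx = rng.choice(len(moved), size=len(moved), p=weights)
    return moved[idx]

particles = rng.normal(0.0, 1.0, size=1000)          # broad prior belief
updated = particle_belief_update(particles, action=0.5, observation=0.6)
```

After the update the particle cloud concentrates near the observation; a Gaussian-mixture belief, the alternative representation, would instead let the Bellman-backup integrals be computed in closed form.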