Results 1–10 of 31
Perseus: Randomized point-based value iteration for POMDPs
Journal of Artificial Intelligence Research, 2005
Cited by 144 (12 self)

Abstract:
Partially observable Markov decision processes (POMDPs) form an attractive and principled framework for agent planning under uncertainty. Point-based approximate techniques for POMDPs compute a policy based on a finite set of points collected in advance from the agent’s belief space. We present a randomized point-based value iteration algorithm called Perseus. The algorithm performs approximate value backup stages, ensuring that in each backup stage the value of each point in the belief set is improved; the key observation is that a single backup may improve the value of many belief points. Contrary to other point-based methods, Perseus backs up only a (randomly selected) subset of points in the belief set, sufficient for improving the value of each belief point in the set. We show how the same idea can be extended to dealing with continuous action spaces. Experimental results show the potential of Perseus in large-scale POMDP problems.
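The randomized backup stage described in the abstract can be sketched in a few lines. This is an illustrative sketch, not the authors' implementation: `backup` stands in for a caller-supplied, POMDP-specific point-based Bellman backup, and beliefs and alpha-vectors are plain tuples of probabilities and values.

```python
import random

def value(belief, alphas):
    """Value of a belief under a set of alpha-vectors: the best dot product."""
    return max(sum(a * b for a, b in zip(alpha, belief)) for alpha in alphas)

def perseus_stage(beliefs, alphas, backup, rng=random):
    """One approximate value-backup stage in the style of Perseus (sketch).

    Samples a not-yet-improved belief point at random and backs it up;
    a single backup may improve many points, which are then all removed
    from the set still awaiting improvement.
    """
    new_alphas = []
    todo = list(beliefs)
    while todo:
        b = rng.choice(todo)
        alpha = backup(b, alphas)  # caller-supplied point-based Bellman backup
        if sum(x * y for x, y in zip(alpha, b)) >= value(b, alphas):
            new_alphas.append(alpha)
        else:
            # keep the best old vector for b, so no point's value decreases
            new_alphas.append(max(alphas,
                                  key=lambda a: sum(x * y for x, y in zip(a, b))))
        # drop every belief whose value is already matched by the new set
        todo = [p for p in todo if value(p, new_alphas) < value(p, alphas)]
    return new_alphas
```

With a toy backup that returns a dominating vector, a single sampled backup improves all three belief points at once, so the stage terminates after one iteration.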
Solving POMDPs by Searching in Policy Space
1998
Cited by 94 (9 self)

Abstract:
Most algorithms for solving POMDPs iteratively improve a value function that implicitly represents a policy and are said to search in value function space. This paper presents an approach to solving POMDPs that represents a policy explicitly as a finite-state controller and iteratively improves the controller by search in policy space. Two related algorithms illustrate this approach. The first is a policy iteration algorithm that can outperform value iteration in solving infinite-horizon POMDPs. It provides the foundation for a new heuristic search algorithm that promises further speedup by focusing computational effort on regions of the problem space that are reachable, or likely to be reached, from a start state.
Learning finite-state controllers for partially observable environments
In Proceedings of the Fifteenth Conference on Uncertainty in Artificial Intelligence, 1999
Cited by 78 (10 self)

Abstract:
Reactive (memoryless) policies are sufficient in completely observable Markov decision processes (MDPs), but some kind of memory is usually necessary for optimal control of a partially observable MDP. Policies with finite memory can be represented as finite-state automata. In this paper, we extend Baird and Moore’s VAPS algorithm to the problem of learning general finite-state automata. Because it performs stochastic gradient descent, this algorithm can be shown to converge to a locally optimal finite-state controller. We provide the details of the algorithm and then consider the question of under what conditions stochastic gradient descent will outperform exact gradient descent. We conclude with empirical results comparing the performance of stochastic and exact gradient descent, and showing the ability of our algorithm to extract the useful information contained in the sequence of past observations to compensate for the lack of observability at each time step.
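A finite-state-automaton policy of the kind learned in this paper can be executed with a small interpreter. The sketch below is hypothetical scaffolding, not part of VAPS itself (which learns the controller parameters by stochastic gradient descent): `action_probs`, `transitions`, and the `observe` environment callback are assumed interfaces.

```python
import random

def run_fsc(actions, action_probs, transitions, observe, start_node=0,
            steps=10, rng=random):
    """Execute a stochastic finite-state controller (illustrative sketch).

    action_probs[node] weights the actions available at that node;
    transitions maps (node, observation) to the next controller node;
    observe(action) is an assumed environment callback returning the
    next observation.
    """
    node, history = start_node, []
    for _ in range(steps):
        action = rng.choices(actions, weights=action_probs[node])[0]
        obs = observe(action)
        history.append((node, action, obs))
        node = transitions[(node, obs)]
    return history
```

With degenerate (deterministic) action weights, the controller alternates between its two nodes regardless of the environment's observation.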
The Computational Complexity of Probabilistic Planning
Journal of Artificial Intelligence Research, 1998
Cited by 77 (5 self)

Abstract:
We examine the computational complexity of testing and finding small plans in probabilistic planning domains with both flat and propositional representations. The complexity of plan evaluation and existence varies with the plan type sought; we examine totally ordered plans, acyclic plans, looping plans, and partially ordered plans under three natural definitions of plan value. We show that problems of interest are complete for a variety of complexity classes: PL, P, NP, co-NP, PP, NP^PP, co-NP^PP, and PSPACE. In the process of proving that certain planning problems are complete for NP^PP, we introduce a new basic NP^PP-complete problem, E-MAJSAT, which generalizes the standard Boolean satisfiability problem to computations involving probabilistic quantities; our results suggest that the development of good heuristics for E-MAJSAT could be important for the creation of efficient algorithms for a wide variety of problems.
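A brute-force decision procedure makes the E-MAJSAT definition concrete: does some assignment to the choice variables make the formula true for a strict majority of assignments to the chance variables? The exponential running time is consistent with the problem's NP^PP-completeness; the function name and interface below are illustrative, not from the paper.

```python
from itertools import product

def e_majsat(formula, n_choice, n_chance):
    """Brute-force E-MAJSAT decision procedure (illustrative; exponential time).

    Returns True iff some assignment to the n_choice choice variables makes
    formula(choice_bits, chance_bits) hold for a strict majority of the
    2**n_chance assignments to the chance variables.
    """
    half = 2 ** n_chance / 2
    for xs in product((0, 1), repeat=n_choice):
        satisfied = sum(bool(formula(xs, ys))
                        for ys in product((0, 1), repeat=n_chance))
        if satisfied > half:
            return True
    return False
```

For example, with one choice variable x and two chance variables, the formula x AND (y1 OR y2) is a yes-instance (setting x = 1 satisfies 3 of the 4 chance assignments), while y1 AND y2 alone is a no-instance (only 1 of 4).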
Solving POMDPs by searching the space of finite policies
In Proceedings of the Fifteenth Conference on Uncertainty in Artificial Intelligence, 1999
Cited by 58 (3 self)

Abstract:
Solving partially observable Markov decision processes (POMDPs) is highly intractable in general, at least in part because the optimal policy may be infinitely large. In this paper, we explore the problem of finding the optimal policy from a restricted set of policies, represented as finite-state automata of a given size. This problem is also intractable, but we show that the complexity can be greatly reduced when the POMDP and/or policy are further constrained. We demonstrate good empirical results with a branch-and-bound method for finding globally optimal deterministic policies, and a gradient-ascent method for finding locally optimal stochastic policies.
Learning Policies with External Memory
2001
Cited by 48 (8 self)

Abstract:
In order for an agent to perform well in partially observable domains, it is usually necessary for actions to depend on the history of observations. In this paper, we explore a stigmergic approach, in which the agent’s actions include the ability to set and clear bits in an external memory, and the external memory is included as part of the input to the agent. In this case, we need to learn a reactive policy in a highly non-Markovian domain. We explore two algorithms: Sarsa(λ), which has had empirical success in partially observable domains, and VAPS, a new algorithm due to Baird and Moore with convergence guarantees in partially observable domains. We compare the performance of these two algorithms on benchmark problems.
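The stigmergic setup can be illustrated with a thin environment wrapper. The class and its `step` interface are hypothetical, sketching only the idea from the abstract: a composite action may set or clear a bit of external memory, and the memory bits are appended to the observation the agent sees.

```python
class ExternalMemoryWrapper:
    """Stigmergic environment wrapper (hypothetical interface).

    A composite action is (env_action, mem_op), where mem_op is None or
    ('set' | 'clear', bit_index); the observation returned to the agent is
    the environment's observation plus the current external memory bits.
    """

    def __init__(self, env_step, n_bits):
        self.env_step = env_step      # assumed: env_step(action) -> observation
        self.bits = [0] * n_bits

    def step(self, env_action, mem_op=None):
        if mem_op is not None:
            op, index = mem_op
            self.bits[index] = 1 if op == 'set' else 0
        obs = self.env_step(env_action)
        return (obs, tuple(self.bits))
```

A reactive policy trained on the augmented observations can then use the memory bits to disambiguate states that look identical to the raw sensors.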
An Improved Policy Iteration Algorithm for Partially Observable MDPs
In Advances in Neural Information Processing Systems 10, 1997
Cited by 36 (1 self)

Abstract:
A new policy iteration algorithm for partially observable Markov decision processes is presented that is simpler and more efficient than an earlier policy iteration algorithm of Sondik (1971, 1978). The key simplification is representation of a policy as a finite-state controller. This representation makes policy evaluation straightforward. The paper's contribution is to show that the dynamic-programming update used in the policy improvement step can be interpreted as the transformation of a finite-state controller into an improved finite-state controller. The new algorithm consistently outperforms value iteration as an approach to solving infinite-horizon problems.
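Why policy evaluation is straightforward for a finite-state controller: the controller node and world state together form a Markov chain, so the values V(n, s) satisfy a linear system, which the sketch below simply iterates to its fixed point. All names (transition model `P`, observation model `O`, rewards `R`, node-action map `act`, node-transition map `next_node`) are assumed inputs for a known POMDP and a deterministic controller, not the paper's notation.

```python
def evaluate_fsc(nodes, states, act, next_node, P, O, R, gamma=0.9, iters=200):
    """Evaluate a deterministic finite-state controller on a known POMDP.

    V(n, s) = R[s][a] + gamma * sum_{s'} P[s][a][s'] *
              sum_o O[s'][o] * V(next_node[(n, o)], s'),   with a = act[n].
    This linear system is solved here by naive fixed-point iteration.
    """
    V = {(n, s): 0.0 for n in nodes for s in states}
    for _ in range(iters):
        V = {(n, s): R[s][act[n]] + gamma * sum(
                P[s][act[n]][s2] * sum(O[s2][o] * V[(next_node[(n, o)], s2)]
                                       for o in O[s2])
                for s2 in states)
             for (n, s) in V}
    return V
```

In practice one would solve the linear system directly (e.g. by Gaussian elimination over the node-state pairs); the iteration above just makes the fixed-point structure explicit.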
Non-approximability Results for Partially Observable Markov Decision Processes
2000
Cited by 33 (0 self)

Abstract:
We show that for several variations of partially observable Markov decision processes, polynomial-time algorithms for finding control policies are unlikely to have, or simply do not have, guarantees of finding policies within a constant factor or a constant summand of optimal. Here "unlikely" means "unless some complexity classes collapse," where the collapses considered are P = NP, P = PSPACE, or P = EXP. Until or unless these collapses are shown to hold, any control-policy designer must choose between such performance guarantees and efficient computation.