Results 1 - 10 of 101
Monte-Carlo Planning in Large POMDPs
In Advances in Neural Information Processing Systems 23, 2010
Cited by 111 (8 self)
This paper introduces a Monte-Carlo algorithm for online planning in large POMDPs. The algorithm combines a Monte-Carlo update of the agent’s belief state with a Monte-Carlo tree search from the current belief state. The new algorithm, POMCP, has two important properties. First, Monte-Carlo sampling is used to break the curse of dimensionality both during belief state updates and during planning. Second, only a black box simulator of the POMDP is required, rather than explicit probability distributions. These properties enable POMCP to plan effectively in significantly larger POMDPs than has previously been possible. We demonstrate its effectiveness in three large POMDPs. We scale up a well-known benchmark problem, rocksample, by several orders of magnitude. We also introduce two challenging new POMDPs: 10 × 10 battleship and partially observable PacMan, with approximately 10^18 and 10^56 states respectively. Our Monte-Carlo planning algorithm achieved a high level of performance with no prior knowledge, and was also able to exploit simple domain knowledge to achieve better results with less search. POMCP is the first general purpose planner to achieve high performance in such large and unfactored POMDPs.
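The Monte-Carlo belief update mentioned in this abstract can be illustrated with a small particle-based sketch. This is not the paper's code: the function `update_belief`, the toy `tiger_simulator`, and all parameter names are hypothetical, and the only thing assumed from the abstract is that a black box simulator returning a next state, an observation, and a reward is available.

```python
import random

def update_belief(particles, action, observation, simulator, n_particles, rng,
                  max_tries=100_000):
    """Approximate the posterior belief by rejection sampling.

    Only a black-box `simulator(state, action, rng)` is used; no explicit
    transition or observation probabilities are needed.
    """
    new_particles = []
    tries = 0
    while len(new_particles) < n_particles and tries < max_tries:
        tries += 1
        state = rng.choice(particles)            # sample a state from the current belief
        next_state, obs, _reward = simulator(state, action, rng)
        if obs == observation:                   # keep samples that match the real observation
            new_particles.append(next_state)
    return new_particles

def tiger_simulator(state, action, rng):
    """Hypothetical toy simulator: a noisy sensor reports the hidden state."""
    other = "right" if state == "left" else "left"
    obs = state if rng.random() < 0.85 else other   # assumed 85% sensor accuracy
    return state, obs, -1.0                         # state unchanged, fixed listening cost

rng = random.Random(0)
prior = ["left", "right"] * 500                     # uniform belief as 1000 particles
posterior = update_belief(prior, "listen", "left", tiger_simulator, 1000, rng)
frac_left = posterior.count("left") / len(posterior)
```

After one "left" observation, most surviving particles should be in the "left" state. Rejection sampling is the simplest choice here; it becomes wasteful when the observed outcome is rare under the current belief.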
Online planning algorithms for POMDPs
Journal of Artificial Intelligence Research, 2008
Cited by 109 (3 self)
Partially Observable Markov Decision Processes (POMDPs) provide a rich framework for sequential decision-making under uncertainty in stochastic domains. However, solving a POMDP is often intractable except for small problems due to their complexity. Here, we focus on online approaches that alleviate the computational complexity by computing good local policies at each decision step during the execution. Online algorithms generally consist of a lookahead search to find the best action to execute at each time step in an environment. Our objectives here are to survey the various existing online POMDP methods, analyze their properties and discuss their advantages and disadvantages; and to thoroughly evaluate these online approaches in different environments under various metrics (return, error bound reduction, lower bound improvement). Our experimental results indicate that state-of-the-art online heuristic search methods can handle large POMDP domains efficiently.
Adaptive submodularity: Theory and applications in active learning and stochastic optimization
J. Artificial Intelligence Research, 2011
Cited by 64 (15 self)
Many problems in artificial intelligence require adaptively making a sequence of decisions with uncertain outcomes under partial observability. Solving such stochastic optimization problems is a fundamental but notoriously difficult challenge. In this paper, we introduce the concept of adaptive submodularity, generalizing submodular set functions to adaptive policies. We prove that if a problem satisfies this property, a simple adaptive greedy algorithm is guaranteed to be competitive with the optimal policy. In addition to providing performance guarantees for both stochastic maximization and coverage, adaptive submodularity can be exploited to drastically speed up the greedy algorithm by using lazy evaluations. We illustrate the usefulness of the concept by giving several examples of adaptive submodular objectives arising in diverse AI applications including management of sensing resources, viral marketing and active learning. Proving adaptive submodularity for these problems allows us to recover existing results in these applications as special cases, improve approximation guarantees and handle natural generalizations.
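The lazy-evaluation idea mentioned in this abstract can be sketched for the simpler, non-adaptive case of a submodular set function. This is an illustrative simplification rather than the paper's adaptive algorithm: `lazy_greedy`, `coverage`, and the toy `sets` data are hypothetical names introduced here, and the sketch assumes the objective is monotone submodular so that stale marginal gains are valid upper bounds.

```python
import heapq

def lazy_greedy(candidates, k, value):
    """Greedily select up to k items, re-evaluating marginal gains lazily.

    Assumes `value` is monotone submodular, so a gain computed in an earlier
    round can only overestimate the current gain.
    """
    selected = []
    current = value(selected)
    # Max-heap via negated gains; the third field is the round the gain was computed in.
    heap = [(-(value([c]) - current), c, 0) for c in candidates]
    heapq.heapify(heap)
    while heap and len(selected) < k:
        neg_gain, item, stamp = heapq.heappop(heap)
        if stamp == len(selected):
            # The gain is up to date and tops the heap, so no stale entry can beat it.
            selected.append(item)
            current += -neg_gain
        else:
            # Stale entry: recompute its gain against the current selection and reinsert.
            gain = value(selected + [item]) - current
            heapq.heappush(heap, (-gain, item, len(selected)))
    return selected, current

# Hypothetical toy coverage objective: number of distinct elements covered.
sets = {"a": {1, 2, 3}, "b": {3, 4}, "c": {4, 5, 6, 7}, "d": {1, 7}}

def coverage(chosen):
    return len(set().union(*(sets[name] for name in chosen)))

selected, total = lazy_greedy(list(sets), 2, coverage)
```

In this toy run only one stale gain is recomputed in the second round, instead of all three remaining candidates, which is the source of the speed-up.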
Closing the learning-planning loop with predictive state representations (Extended Abstract)
2010
Cited by 49 (12 self)
A central problem in artificial intelligence is to plan to maximize future reward under uncertainty in a partially observable environment. Models of such environments include Partially Observable Markov Decision Processes (POMDPs) [4] as well as their generalizations, Predictive State Representations (PSRs) [9] and Observable Operator Models (OOMs) [7]. POMDPs model the state of the world as a latent variable; in contrast, PSRs and OOMs represent state by tracking occurrence probabilities of a set of future events (called tests or characteristic events) conditioned on past events (called histories or indicative events). Unfortunately, exact planning algorithms such as value iteration [14] are intractable for most realistic POMDPs due to the curse of history and the curse of dimensionality [11]. However, PSRs and OOMs hold the promise of mitigating both of these curses: first, many successful approximate planning techniques designed to address
Solving POMDPs: RTDP-Bel vs. Point-based Algorithms
Cited by 33 (7 self)
Point-based algorithms and RTDP-Bel are approximate methods for solving POMDPs that replace the full updates of parallel value iteration by faster and more effective updates at selected beliefs. An important difference between the two methods is that the former adopt Sondik’s representation of the value function, while the latter uses a tabular representation and a discretization function. The algorithms, however, have not been compared up to now, because they target different POMDPs: discounted POMDPs on the one hand, and Goal POMDPs on the other. In this paper, we bridge this representational gap, showing how to transform discounted POMDPs into Goal POMDPs, and use the transformation to compare RTDP-Bel with point-based algorithms over the existing discounted benchmarks. The results appear to contradict the conventional wisdom in the area showing that RTDP-Bel is competitive, and sometimes superior to point-based algorithms in both quality and time.
Motion planning and control from temporal logic specifications with probabilistic satisfaction guarantees
In ICRA, 2010
Cited by 32 (5 self)
We present a computational framework for automatic deployment of a robot from a temporal logic specification over a set of properties of interest satisfied at the regions of a partitioned environment. We assume that, during the motion of the robot in the environment, the current region can be precisely determined, while due to sensor and actuation noise, the outcome of a control action can only be predicted probabilistically. Under these assumptions, the deployment problem translates to generating a control strategy for a Markov Decision Process (MDP) from a temporal logic formula. We propose an algorithm inspired from probabilistic Computation Tree Logic (PCTL) model checking to find a control strategy that maximizes the probability of satisfying the specification. We illustrate our method with simulation and experimental results.
Optimal Value of Information in Graphical Models
Cited by 25 (6 self)
Many real-world decision making tasks require us to choose among several expensive observations. In a sensor network, for example, it is important to select the subset of sensors that is expected to provide the strongest reduction in uncertainty. In medical decision making tasks, one needs to select which tests to administer before deciding on the most effective treatment. It has been general practice to use heuristic-guided procedures for selecting observations. In this paper, we present the first efficient optimal algorithms for selecting observations for a class of probabilistic graphical models. For example, our algorithms allow to optimally label hidden variables in Hidden Markov Models (HMMs). We provide results for both selecting the optimal subset of observations, and for obtaining an optimal conditional observation plan. Furthermore we prove a surprising result: In most graphical models tasks, if one designs an efficient algorithm for chain graphs, such as HMMs, this procedure can be generalized to polytree graphical models. We prove that the optimizing value of information is NP^PP-hard even for polytrees. It also follows from our results that just computing decision theoretic value of information objective functions, which are commonly used in practice, is a #P-complete problem even on Naive Bayes models (a simple special case of polytrees). In addition, we consider several extensions, such as using our algorithms for scheduling observation selection for multiple sensors. We demonstrate the effectiveness of our approach on several real-world datasets, including a prototype sensor network deployment for energy conservation in buildings.
Point-based backup for decentralized POMDPs: Complexity and new algorithms
In AAMAS, 2010
Cited by 21 (1 self)
Decentralized POMDPs provide an expressive framework for sequential multiagent decision making. Despite their high complexity, there has been significant progress in scaling up existing algorithms, largely due to the use of point-based methods. Performing point-based backup is a fundamental operation in state-of-the-art algorithms. We show that even a single backup step in the multiagent setting is NP-Complete. Despite this negative worst-case result, we present an efficient and scalable optimal algorithm as well as a principled approximation scheme. The optimal algorithm exploits recent advances in the weighted CSP literature to overcome the complexity of the backup operation. The polytime approximation scheme provides a constant factor approximation guarantee based on the number of belief points. In experiments on standard domains, the optimal approach provides significant speedup (up to 2 orders of magnitude) over the previous best optimal algorithm and is able to increase the number of belief points by more than a factor of 3. The approximation scheme also works well in practice, providing near-optimal solutions to the backup problem.
Continuous-state POMDPs with hybrid dynamics
In Symposium on Artificial Intelligence and Mathematics, 2008
Cited by 18 (3 self)
Continuous-state POMDPs provide a natural representation for a variety of tasks, including many in robotics. However, existing continuous-state POMDP approaches are limited by their reliance on a single linear model to represent the world dynamics. We introduce a new switching-state (hybrid) dynamics model that can represent multi-modal state-dependent dynamics. We present a new point-based POMDP planning algorithm for solving continuous-state POMDPs using this dynamics model. We also provide a constrained optimization approach for approximating the value function as a mixture of a bounded number of Gaussians. We present results on a set of example problems and demonstrate that when different degrees of state accuracy are needed to accomplish a task, our hybrid continuous-state approach outperforms a standard discrete state technique.
Inverse Reinforcement Learning in Partially Observable Environments
Cited by 17 (1 self)
Inverse reinforcement learning (IRL) is the problem of recovering the underlying reward function from the behaviour of an expert. Most of the existing algorithms for IRL assume that the expert’s environment is modeled as a Markov decision process (MDP), although they should be able to handle partially observable settings in order to widen the applicability to more realistic scenarios. In this paper, we present an extension of the classical IRL algorithm by Ng and Russell to partially observable environments. We discuss technical issues and challenges, and present the experimental results on some of the benchmark partially observable domains.