Monte-Carlo planning in large POMDPs
in Proc. Neural Inf. Process. Syst., 2010
Cited by 112 (8 self)
This paper introduces a Monte-Carlo algorithm for online planning in large POMDPs. The algorithm combines a Monte-Carlo update of the agent's belief state with a Monte-Carlo tree search from the current belief state. The new algorithm, POMCP, has two important properties. First, Monte-Carlo sampling is used to break the curse of dimensionality both during belief state updates and during planning. Second, only a black box simulator of the POMDP is required, rather than explicit probability distributions. These properties enable POMCP to plan effectively in significantly larger POMDPs than has previously been possible. We demonstrate its effectiveness in three large POMDPs. We scale up a well-known benchmark problem, rocksample, by several orders of magnitude. We also introduce two challenging new POMDPs: 10 × 10 battleship and partially observable PacMan, with approximately 10^18 and 10^56 states respectively. Our Monte-Carlo planning algorithm achieved a high level of performance with no prior knowledge, and was also able to exploit simple domain knowledge to achieve better results with less search. POMCP is the first general purpose planner to achieve high performance in such large and unfactored POMDPs.
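As an illustration of the "black box simulator" idea in the abstract above, the following is a minimal sketch of a Monte-Carlo belief update by rejection sampling. It is not taken from the paper: the function names, the two-state toy simulator, and its 0.85 observation accuracy are all assumptions made for this example.

```python
import random


def toy_simulator(state, action, rng):
    """Hypothetical black-box simulator (an assumption, not from the paper).

    The hidden state (0 or 1) never changes; the observation equals the
    state with probability 0.85. Returns (next_state, observation, reward).
    """
    observation = state if rng.random() < 0.85 else 1 - state
    return state, observation, 0.0


def update_belief(particles, action, real_obs, simulator, n_particles, rng,
                  max_tries=100_000):
    """Approximate the posterior belief as a list of state particles.

    States are sampled from the current particles and pushed through the
    simulator; a successor is kept only if the simulated observation
    matches the observation actually received.
    """
    new_particles = []
    tries = 0
    while len(new_particles) < n_particles and tries < max_tries:
        tries += 1
        state = rng.choice(particles)
        next_state, obs, _reward = simulator(state, action, rng)
        if obs == real_obs:
            new_particles.append(next_state)
    return new_particles


if __name__ == "__main__":
    rng = random.Random(0)
    prior = [0] * 500 + [1] * 500          # uniform prior over two states
    posterior = update_belief(prior, "listen", 1, toy_simulator, 1000, rng)
    print(sum(posterior) / len(posterior))  # fraction of particles in state 1
```

Only calls to the simulator are needed here; no transition or observation probabilities are ever written down explicitly, which is the property the abstract highlights.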
Online planning algorithms for POMDPs
Journal of Artificial Intelligence Research, 2008
Cited by 109 (3 self)
Partially Observable Markov Decision Processes (POMDPs) provide a rich framework for sequential decision-making under uncertainty in stochastic domains. However, solving a POMDP is often intractable except for small problems due to their complexity. Here, we focus on online approaches that alleviate the computational complexity by computing good local policies at each decision step during the execution. Online algorithms generally consist of a lookahead search to find the best action to execute at each time step in an environment. Our objectives here are to survey the various existing online POMDP methods, analyze their properties and discuss their advantages and disadvantages; and to thoroughly evaluate these online approaches in different environments under various metrics (return, error bound reduction, lower bound improvement). Our experimental results indicate that state-of-the-art online heuristic search methods can handle large POMDP domains efficiently.
Adaptive submodularity: Theory and applications in active learning and stochastic optimization
J. Artificial Intelligence Research, 2011
Cited by 70 (15 self)
Many problems in artificial intelligence require adaptively making a sequence of decisions with uncertain outcomes under partial observability. Solving such stochastic optimization problems is a fundamental but notoriously difficult challenge. In this paper, we introduce the concept of adaptive submodularity, generalizing submodular set functions to adaptive policies. We prove that if a problem satisfies this property, a simple adaptive greedy algorithm is guaranteed to be competitive with the optimal policy. In addition to providing performance guarantees for both stochastic maximization and coverage, adaptive submodularity can be exploited to drastically speed up the greedy algorithm by using lazy evaluations. We illustrate the usefulness of the concept by giving several examples of adaptive submodular objectives arising in diverse AI applications including management of sensing resources, viral marketing and active learning. Proving adaptive submodularity for these problems allows us to recover existing results in these applications as special cases, improve approximation guarantees and handle natural generalizations.
Closing the learning-planning loop with predictive state representations (Extended Abstract)
2010
Cited by 50 (12 self)
A central problem in artificial intelligence is to plan to maximize future reward under uncertainty in a partially observable environment. Models of such environments include Partially Observable Markov Decision Processes (POMDPs) [4] as well as their generalizations, Predictive State Representations (PSRs) [9] and Observable Operator Models (OOMs) [7]. POMDPs model the state of the world as a latent variable; in contrast, PSRs and OOMs represent state by tracking occurrence probabilities of a set of future events (called tests or characteristic events) conditioned on past events (called histories or indicative events). Unfortunately, exact planning algorithms such as value iteration [14] are intractable for most realistic POMDPs due to the curse of history and the curse of dimensionality [11]. However, PSRs and OOMs hold the promise of mitigating both of these curses: first, many successful approximate planning techniques designed to address
A survey of point-based POMDP solvers
AUTON AGENT MULTIAGENT SYST, 2012
Cited by 33 (5 self)
The past decade has seen a significant breakthrough in research on solving partially observable Markov decision processes (POMDPs). Where past solvers could not scale beyond perhaps a dozen states, modern solvers can handle complex domains with many thousands of states. This breakthrough was mainly due to the idea of restricting value function computations to a finite subset of the belief space, permitting only local value updates for this subset. This approach, known as point-based value iteration, avoids the exponential growth of the value function, and is thus applicable for domains with longer horizons, even with relatively large state spaces. Many extensions were suggested to this basic idea, focusing on various aspects of the algorithm, mainly the selection of the belief space subset and the order of value function updates. In this survey, we walk the reader through the fundamentals of point-based value iteration, explaining the main concepts and ideas. Then, we survey the major extensions to the basic algorithm, discussing their merits. Finally, we include an extensive empirical analysis using well known benchmarks, in order to shed light on the strengths and limitations of the various approaches.
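The "local value update" at a single belief point mentioned in the abstract above can be sketched as follows. This is a minimal illustration of the textbook point-based backup over a set of alpha vectors, not code from the survey. The array layout (`T[a, s, s']`, `O[a, s', o]`, `R[a, s]`) and the function name are assumptions chosen for this example.

```python
import numpy as np


def point_based_backup(b, alphas, T, O, R, gamma):
    """Return the alpha vector produced by one backup at belief b.

    Assumed layout (hypothetical, chosen for this sketch):
      b      : (S,)       belief over states
      alphas : (K, S)     current value function as K alpha vectors
      T      : (A, S, S)  T[a, s, s'] = P(s' | s, a)
      O      : (A, S, Z)  O[a, s', o] = P(o | s', a)
      R      : (A, S)     immediate reward R[a, s]
    """
    n_actions, _, n_obs = O.shape
    best_vec, best_val = None, -np.inf
    for a in range(n_actions):
        g_a = R[a].astype(float).copy()
        for o in range(n_obs):
            # candidates[k, s] = sum_{s'} T[a, s, s'] * O[a, s', o] * alphas[k, s']
            candidates = (alphas * O[a, :, o]) @ T[a].T
            # keep the candidate that is best at this particular belief
            g_a += gamma * candidates[np.argmax(candidates @ b)]
        value = float(g_a @ b)
        if value > best_val:
            best_vec, best_val = g_a, value
    return best_vec


if __name__ == "__main__":
    T = np.array([[[0.9, 0.1], [0.2, 0.8]], [[0.5, 0.5], [0.5, 0.5]]])
    O = np.array([[[0.7, 0.3], [0.4, 0.6]], [[0.5, 0.5], [0.5, 0.5]]])
    R = np.array([[1.0, 0.0], [0.0, 1.0]])
    b = np.array([0.2, 0.8])
    print(point_based_backup(b, np.zeros((1, 2)), T, O, R, gamma=0.9))
```

Because the backup yields one vector per belief point, the number of vectors is bounded by the size of the belief subset, which is how this family of methods avoids the exponential growth described in the abstract.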
Solving POMDPs: RTDP-Bel vs. Point-based Algorithms
Cited by 32 (7 self)
Point-based algorithms and RTDP-Bel are approximate methods for solving POMDPs that replace the full updates of parallel value iteration by faster and more effective updates at selected beliefs. An important difference between the two methods is that the former adopt Sondik’s representation of the value function, while the latter uses a tabular representation and a discretization function. The algorithms, however, have not been compared up to now, because they target different POMDPs: discounted POMDPs on the one hand, and Goal POMDPs on the other. In this paper, we bridge this representational gap, showing how to transform discounted POMDPs into Goal POMDPs, and use the transformation to compare RTDP-Bel with point-based algorithms over the existing discounted benchmarks. The results appear to contradict the conventional wisdom in the area showing that RTDP-Bel is competitive, and sometimes superior to point-based algorithms in both quality and time.
Motion planning and control from temporal logic specifications with probabilistic satisfaction guarantees
in ICRA, 2010
Cited by 32 (5 self)
We present a computational framework for automatic deployment of a robot from a temporal logic specification over a set of properties of interest satisfied at the regions of a partitioned environment. We assume that, during the motion of the robot in the environment, the current region can be precisely determined, while due to sensor and actuation noise, the outcome of a control action can only be predicted probabilistically. Under these assumptions, the deployment problem translates to generating a control strategy for a Markov Decision Process (MDP) from a temporal logic formula. We propose an algorithm inspired from probabilistic Computation Tree Logic (PCTL) model checking to find a control strategy that maximizes the probability of satisfying the specification. We illustrate our method with simulation and experimental results.
Graphical models for interactive POMDPs: representations and solutions
AUTON AGENT MULTIAGENT SYST (2009) 18:376–416, 2008
Cited by 31 (14 self)
We develop new graphical representations for the problem of sequential decision making in partially observable multiagent environments, as formalized by interactive partially observable Markov decision processes (I-POMDPs). The graphical models called interactive influence diagrams (I-IDs) and their dynamic counterparts, interactive dynamic influence diagrams (I-DIDs), seek to explicitly model the structure that is often present in real-world problems by decomposing the situation into chance and decision variables, and the dependencies between the variables. I-DIDs generalize DIDs, which may be viewed as graphical representations of POMDPs, to multiagent settings in the same way that I-POMDPs generalize POMDPs. I-DIDs may be used to compute the policy of an agent given its belief as the agent acts and observes in a setting that is populated by other interacting agents. Using several examples, we show how I-IDs and I-DIDs may be applied and demonstrate their usefulness. We also show how the models may be solved using the standard algorithms that are applicable to DIDs. Solving I-DIDs exactly involves knowing the solutions of possible models of the other agents. The space of models grows exponentially with the number of time steps. We present a method of solving I-DIDs approximately by limiting the number
Optimal value of information in graphical models
Journal of Artificial Intelligence Research (JAIR)
Cited by 28 (6 self)
Many real-world decision making tasks require us to choose among several expensive observations. In a sensor network, for example, it is important to select the subset of sensors that is expected to provide the strongest reduction in uncertainty. In medical decision making tasks, one needs to select which tests to administer before deciding on the most effective treatment. It has been general practice to use heuristic-guided procedures for selecting observations. In this paper, we present the first efficient optimal algorithms for selecting observations for a class of probabilistic graphical models. For example, our algorithms allow to optimally label hidden variables in Hidden Markov Models (HMMs). We provide results for both selecting the optimal subset of observations, and for obtaining an optimal conditional observation plan. Furthermore we prove a surprising result: In most graphical models tasks, if one designs an efficient algorithm for chain graphs, such as HMMs, this procedure can be generalized to polytree graphical models. We prove that the optimizing value of information is NP^PP-hard even for polytrees. It also follows from our results that just computing decision theoretic value of information objective functions, which are commonly used in practice, is a #P-complete problem even on Naive Bayes models (a simple special case of polytrees). In addition, we consider several extensions, such as using our algorithms for scheduling observation selection for multiple sensors. We demonstrate the effectiveness of our approach on several real-world datasets, including a prototype sensor network deployment for energy conservation in buildings.
Point-based backup for decentralized POMDPs: Complexity and new algorithms
In AAMAS, 2010
Cited by 22 (1 self)
Decentralized POMDPs provide an expressive framework for sequential multiagent decision making. Despite their high complexity, there has been significant progress in scaling up existing algorithms, largely due to the use of point-based methods. Performing point-based backup is a fundamental operation in state-of-the-art algorithms. We show that even a single backup step in the multiagent setting is NP-Complete. Despite this negative worst-case result, we present an efficient and scalable optimal algorithm as well as a principled approximation scheme. The optimal algorithm exploits recent advances in the weighted CSP literature to overcome the complexity of the backup operation. The poly-time approximation scheme provides a constant factor approximation guarantee based on the number of belief points. In experiments on standard domains, the optimal approach provides significant speedup (up to 2 orders of magnitude) over the previous best optimal algorithm and is able to increase the number of belief points by more than a factor of 3. The approximation scheme also works well in practice, providing near-optimal solutions to the backup problem.