Speeding Up the Convergence of Value Iteration in Partially Observable Markov Decision Processes
 Journal of Artificial Intelligence Research
, 2001
"... Partially observable Markov decision processes (POMDPs) have recently become popular among many AI researchers because they serve as a natural model for planning under uncertainty. Value iteration is a wellknown algorithm for finding optimal policies for POMDPs. It typically takes a large number ..."
Abstract

Cited by 61 (4 self)
Partially observable Markov decision processes (POMDPs) have recently become popular among many AI researchers because they serve as a natural model for planning under uncertainty. Value iteration is a well-known algorithm for finding optimal policies for POMDPs. It typically takes a large number of iterations to converge. This paper proposes a method for accelerating the convergence of value iteration. The method has been evaluated on an array of benchmark problems and was found to be very effective: it enabled value iteration to converge after only a few iterations on all the test problems. 1. Introduction POMDPs model sequential decision-making problems where the effects of actions are nondeterministic and the state of the world is not known with certainty. They have attracted many researchers in Operations Research and Artificial Intelligence because of their potential applications in a wide range of areas (Monahan 1982, Cassandra 1998b), one of which is planning under uncertai...
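The value iteration the abstract refers to generalizes, to belief states, the repeated Bellman backup used for fully observable MDPs. A minimal sketch of that underlying backup for the discrete MDP case (the array layout and parameter names are assumptions for illustration, not taken from the paper):

```python
import numpy as np

def value_iteration(P, R, gamma=0.95, tol=1e-6):
    """Repeat Bellman backups until the value function stops changing.

    P: transitions, shape (A, S, S) with P[a, s, s'] = Pr(s' | s, a);
    R: rewards, shape (A, S). Layout is an assumption for illustration.
    """
    V = np.zeros(P.shape[1])
    while True:
        Q = R + gamma * (P @ V)   # Q[a, s] = R[a, s] + gamma * E[V(s')]
        V_new = Q.max(axis=0)     # greedy backup over actions
        if np.max(np.abs(V_new - V)) < tol:
            return V_new
        V = V_new
```

The slow convergence the paper targets is visible here: each sweep contracts the error only by a factor of gamma, so many sweeps are needed when gamma is close to 1.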
An Improved Grid-Based Approximation Algorithm for POMDPs
, 2001
"... Although a partially observable Markov decision process (POMDP) provides an appealing model for problems of planning under uncertainty, exact algorithms for POMDPs are intractable. This motivates work on approximation algorithms, and gridbased approximation is a widelyused approach. We descri ..."
Abstract

Cited by 53 (0 self)
Although a partially observable Markov decision process (POMDP) provides an appealing model for problems of planning under uncertainty, exact algorithms for POMDPs are intractable. This motivates work on approximation algorithms, and grid-based approximation is a widely-used approach. We describe a novel approach to grid-based approximation that uses a variable-resolution regular grid, and show that it outperforms previous grid-based approaches to approximation.
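Grid-based methods approximate the value function over the continuous belief simplex; the belief itself evolves by a Bayes update after each action and observation. A minimal sketch of that update (the array layout is an assumption for illustration, not the paper's implementation):

```python
import numpy as np

def belief_update(b, a, o, T, Z):
    """One Bayes update of a POMDP belief state after acting and observing.

    b: belief over states, shape (S,); T[a]: transition matrix (S, S) with
    T[a][s, s'] = Pr(s' | s, a); Z[a]: observation matrix (S, O) with
    Z[a][s', o] = Pr(o | s', a). Layout is an assumption for illustration.
    """
    predicted = b @ T[a]                   # predict next-state distribution
    unnormalized = predicted * Z[a][:, o]  # weight by observation likelihood
    return unnormalized / unnormalized.sum()
```

A grid-based approximation stores values only at selected belief points and interpolates between them when a Bellman backup needs the value of an off-grid belief produced by this update.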
Nonapproximability Results for Partially Observable Markov Decision Processes
, 2000
"... We show that for several variations of partially observable Markov decision processes, polynomialtime algorithms for nding control policies are unlikely to or simply don't have guarantees of nding policies within a constant factor or a constant summand of optimal. Here "unlikely" ..."
Abstract

Cited by 40 (0 self)
We show that for several variations of partially observable Markov decision processes, polynomial-time algorithms for finding control policies are unlikely to or simply don't have guarantees of finding policies within a constant factor or a constant summand of optimal. Here "unlikely" means "unless some complexity classes collapse," where the collapses considered are P = NP, P = PSPACE, or P = EXP. Until or unless these collapses are shown to hold, any control-policy designer must choose between such performance guarantees and efficient computation.
Ant Colony Optimization and its Application to Adaptive Routing in Telecommunication Networks
, 2004
"... In ant societies, and, more in general, in insect societies, the activities of the individuals, as well asofthesocietyasawhole,arenotregulatedbyanyexplicit
formofcentralizedcontrol. Onthe other hand, adaptive and robust behaviors transcending the behavioral repertoire of the single individualcanbeea ..."
Abstract

Cited by 33 (13 self)
In ant societies, and, more generally, in insect societies, the activities of the individuals, as well as of the society as a whole, are not regulated by any explicit form of centralized control. On the other hand, adaptive and robust behaviors transcending the behavioral repertoire of the single individual can be easily observed at society level. These complex global behaviors are the result of self-organizing dynamics driven by local interactions and communications among a number of relatively simple individuals. The simultaneous presence of these and other fascinating and unique characteristics has made ant societies an attractive and inspiring model for building new algorithms and new multi-agent systems. In the last decade, ant societies have been taken as a reference for an ever-growing body of scientific work, mostly in the fields of robotics, operations research, and telecommunications. Among the different works inspired by ant colonies, the Ant Colony Optimization metaheuristic (ACO) is probably the most successful and popular one. The ACO metaheuristic is a multi-agent framework for combinatorial optimization whose main components are: a set of ant-like agents, the use of memory and of stochastic decisions, and strategies of collective and distributed learning. It finds its roots
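The components the abstract lists (ant-like agents, shared memory in the form of pheromone trails, stochastic decisions, distributed learning) can be sketched as a minimal ACO loop for a small symmetric TSP. This is a generic illustration under assumed parameter names, not the specific ACO variant developed in the article:

```python
import random

def aco_tour(dist, n_ants=20, n_iters=50, alpha=1.0, beta=2.0, rho=0.5, seed=0):
    """Minimal Ant Colony Optimization sketch for a symmetric TSP.

    dist: matrix of pairwise distances. Parameter names (alpha, beta, rho)
    are generic ACO conventions assumed for illustration.
    """
    rng = random.Random(seed)
    n = len(dist)
    tau = [[1.0] * n for _ in range(n)]   # pheromone trails: shared memory
    best_tour, best_len = None, float("inf")
    for _ in range(n_iters):
        tours = []
        for _ in range(n_ants):
            tour = [rng.randrange(n)]
            while len(tour) < n:
                i = tour[-1]
                cands = [j for j in range(n) if j not in tour]
                # stochastic decision biased by pheromone and inverse distance
                w = [tau[i][j] ** alpha * (1.0 / dist[i][j]) ** beta
                     for j in cands]
                tour.append(rng.choices(cands, weights=w)[0])
            length = sum(dist[tour[k]][tour[(k + 1) % n]] for k in range(n))
            tours.append((tour, length))
            if length < best_len:
                best_tour, best_len = tour, length
        # evaporation plus reinforcement: collective, distributed learning
        for i in range(n):
            for j in range(n):
                tau[i][j] *= 1.0 - rho
        for tour, length in tours:
            for k in range(n):
                i, j = tour[k], tour[(k + 1) % n]
                tau[i][j] += 1.0 / length
                tau[j][i] += 1.0 / length
    return best_tour, best_len
```

No ant computes anything global; good edges accumulate pheromone only because many short tours reinforce them, which is the self-organizing dynamic the abstract describes.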
Prioritization methods for accelerating MDP solvers
 Journal of Machine Learning Research
, 2005
"... The performance of value and policy iteration can be dramatically improved by eliminating redundant or useless backups, and by backing up states in the right order. We study several methods designed to accelerate these iterative solvers, including prioritization, partitioning, and variable reorderin ..."
Abstract

Cited by 26 (2 self)
The performance of value and policy iteration can be dramatically improved by eliminating redundant or useless backups, and by backing up states in the right order. We study several methods designed to accelerate these iterative solvers, including prioritization, partitioning, and variable reordering. We generate a family of algorithms by combining several of the methods discussed, and present extensive empirical evidence demonstrating that performance can improve by several orders of magnitude for many problems, while preserving accuracy and convergence guarantees.
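The core prioritization idea, backing up first the states whose Bellman error is largest and propagating changes only to predecessors, can be sketched as follows. The data layout and queue discipline are assumptions for illustration, not the specific algorithms studied in the paper:

```python
import heapq

def prioritized_value_iteration(P, R, gamma=0.9, tol=1e-8):
    """Back up states in order of Bellman error instead of sweeping them all.

    P maps (s, a) -> [(s_next, prob), ...]; R maps (s, a) -> reward.
    This data layout is illustrative, not the paper's implementation.
    """
    states = {s for s, _ in P}
    actions = {a for _, a in P}
    V = {s: 0.0 for s in states}
    # preds[s'] = states with an action that can reach s'
    preds = {s: set() for s in states}
    for (s, a), succ in P.items():
        for s2, p in succ:
            if p > 0:
                preds[s2].add(s)

    def backup(s):
        return max(R[(s, a)] + gamma * sum(p * V[s2] for s2, p in P[(s, a)])
                   for a in actions)

    # seed the queue with every state's initial Bellman error
    heap = [(-abs(backup(s) - V[s]), s) for s in states]
    heapq.heapify(heap)
    queued = set(states)
    while heap:
        _, s = heapq.heappop(heap)
        queued.discard(s)
        new_v = backup(s)            # recompute: queued priority may be stale
        if abs(new_v - V[s]) < tol:
            continue                 # state already converged; skip the backup
        V[s] = new_v
        for sp in preds[s]:          # only predecessors of s can gain error
            if sp not in queued and abs(backup(sp) - V[sp]) >= tol:
                heapq.heappush(heap, (-abs(backup(sp) - V[sp]), sp))
                queued.add(sp)
    return V
```

The saving comes from the predecessor sets: a backup that changes one state's value re-queues only the states whose values actually depend on it, rather than triggering another full sweep.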
Algorithms for Partially Observable Markov Decision Processes
 HONG KONG UNIVERSITY OF SCIENCE AND TECHNOLOGY
, 2001
"... Partially Observable Markov Decision Process (POMDP) is a general sequential decisionmaking model where the effects of actions are... ..."
Abstract

Cited by 20 (1 self)
Partially Observable Markov Decision Process (POMDP) is a general sequential decision-making model where the effects of actions are...
Hidden-Mode Markov Decision Processes for Nonstationary Sequential Decision Making
 In Sequence Learning: Paradigms, Algorithms, and Applications
, 2001
"... This paper mainly addresses the first and third conditions, whereas the second condition is only briefly discussed. In particular, we are interested in a special type of nonstationary environments that repeat their dynamics in a certain manner. We propose a formal model for such environments. We als ..."
Abstract

Cited by 6 (0 self)
This paper mainly addresses the first and third conditions, whereas the second condition is only briefly discussed. In particular, we are interested in a special type of nonstationary environments that repeat their dynamics in a certain manner. We propose a formal model for such environments. We also develop algorithms for learning the model parameters and for computing optimal policies. Before we proceed, let us briefly review the four categories of problems shown in Figure 1 and define the terminology that will be used in this paper...
First order Markov Decision Processes
, 2007
"... Relational Markov Decision Processes (RMDP) are a useful abstraction for complex reinforcement solutions for them that are independent of domain size or instantiation. This thesis develops compact representations for RMDPs and exact solution methods for RMDPs using such representations. One of the c ..."
Abstract

Cited by 3 (3 self)
Relational Markov Decision Processes (RMDP) are a useful abstraction for complex reinforcement learning problems, admitting solutions for them that are independent of domain size or instantiation. This thesis develops compact representations for RMDPs and exact solution methods for RMDPs using such representations. One of the core contributions of the thesis is the development of the First Order Decision Diagram (FODD), a representation that captures functions over relational structures, together with a set of operators to manipulate FODDs. FODDs offer a potentially compact representation for complex functions over relational structures and can therefore serve as an underlying engine for efficient algorithms with relational structures. The second core contribution is developing exact solution methods for RMDPs based on FODD representations. In particular, FODDs are used to represent value functions, transition probabilities, and domain dynamics of RMDPs. Special operations are developed to implement exact value iteration and a novel variant of policy iteration, and the algorithms are shown to calculate optimal solutions for RMDPs. Finally, we show how the algorithms for RMDPs using FODDs can be extended to handle relational Partially
Solving Large MDPs Quickly with Partitioned Value Iteration
, 2003
"... Value iteration is not typically considered a viable algorithm for solving largescale MDPs because it converges too slowly. However, the performance of value iteration can be dramatically improved by eliminating redundant or useless backups, and by backing up states in the right order. We presen ..."
Abstract

Cited by 2 (0 self)
Value iteration is not typically considered a viable algorithm for solving large-scale MDPs because it converges too slowly. However, the performance of value iteration can be dramatically improved by eliminating redundant or useless backups, and by backing up states in the right order. We present several methods designed to help structure value dependency, and present a systematic study of companion prioritization techniques (both atomic and hybrid) which focus computation in useful regions of the state space. We generate a family of algorithms by combining several of the methods discussed, and present empirical evidence demonstrating that performance can improve by several orders of magnitude for real-world problems, while preserving accuracy and convergence guarantees.
Solving Hidden-Mode Markov Decision Problems
, 2001
"... HiddenMode Markov decision processes (HMMDPs) are a novel mathematical framework for a subclass of nonstationary reinforcement learning problems where environment dynamics change over time according to a Markov process. HMMDPs are a special case of partially observable Markov decision proc ..."
Abstract
Hidden-Mode Markov decision processes (HM-MDPs) are a novel mathematical framework for a subclass of nonstationary reinforcement learning problems where environment dynamics change over time according to a Markov process. HM-MDPs are a special case of partially observable Markov decision processes (POMDPs), and therefore nonstationary problems of this type can in principle be addressed indirectly via existing POMDP algorithms. However, previous research has shown that such an indirect approach is inefficient compared with a direct HM-MDP approach in terms of the model learning time. In this paper, we investigate how to solve HM-MDP problems efficiently by using a direct approach. We exploit the HM-MDP structure and derive an equation for the dynamic programming update. Our equation decomposes the value function into a number of components and, as a result, substantially reduces the amount of computation in finding optimal policies. Based on the incremental pruning an...