Results 1-10 of 163
Hierarchical Reinforcement Learning with the MAXQ Value Function Decomposition
Journal of Artificial Intelligence Research, 2000
Cited by 423 (6 self)
This paper presents a new approach to hierarchical reinforcement learning based on decomposing the target Markov decision process (MDP) into a hierarchy of smaller MDPs and decomposing the value function of the target MDP into an additive combination of the value functions of the smaller MDPs. The decomposition, known as the MAXQ decomposition, has both a procedural semantics (as a subroutine hierarchy) and a declarative semantics (as a representation of the value function of a hierarchical policy). MAXQ unifies and extends previous work on hierarchical reinforcement learning by Singh, Kaelbling, and Dayan and Hinton. It is based on the assumption that the programmer can identify useful subgoals and define subtasks that achieve these subgoals. By defining such subgoals, the programmer constrains the set of policies that need to be considered during reinforcement learning. The MAXQ value function decomposition can represent the value function of any policy that is consisten...
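The abstract above gives no formulas, so the following is only a sketch of the additive decomposition as it is commonly stated for MAXQ. The symbols are assumptions made for illustration: $i$ is a subtask, $a$ a child subtask or primitive action, $\pi$ a hierarchical policy with $\pi_i$ its policy inside subtask $i$, and $C^{\pi}(i, s, a)$ the "completion" value, i.e. the expected discounted reward for finishing subtask $i$ after $a$ has been executed from state $s$.

```latex
\begin{align}
Q^{\pi}(i, s, a) &= V^{\pi}(a, s) + C^{\pi}(i, s, a), \\
V^{\pi}(i, s) &=
\begin{cases}
Q^{\pi}\bigl(i, s, \pi_i(s)\bigr) & \text{if } i \text{ is a composite subtask}, \\
\sum_{s'} P(s' \mid s, i)\, R(s' \mid s, i) & \text{if } i \text{ is a primitive action}.
\end{cases}
\end{align}
```

Unrolling the recursion from the root subtask down to a primitive action expresses the root value as a sum of one primitive-action value and one completion term per level of the hierarchy, which is the "additive combination" the abstract refers to.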
SPUDD: Stochastic planning using decision diagrams
In Proceedings of the Fifteenth Conference on Uncertainty in Artificial Intelligence, 1999
Cited by 216 (20 self)
Structured methods for solving factored Markov decision processes (MDPs) with large state spaces have recently been proposed to allow dynamic programming to be applied without the need for complete state enumeration. We propose and examine a new value iteration algorithm for MDPs that uses algebraic decision diagrams (ADDs) to represent value functions and policies, assuming an ADD input representation of the MDP. Dynamic programming is implemented via ADD manipulation. We demonstrate our method on a class of large MDPs (up to 63 million states) and show that significant gains can be had when compared to tree-structured representations (with up to a thirtyfold reduction in the number of nodes required to represent optimal value functions).
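For orientation, here is a minimal sketch of flat (tabular) value iteration, the state-enumerating baseline that structured methods such as the one above are designed to avoid. It is not the ADD-based algorithm from the paper. The function name `value_iteration`, the nested-list layout of `P` and `R`, and the two-state toy MDP are all assumptions made for illustration.

```python
def value_iteration(P, R, gamma=0.9, tol=1e-9, max_iter=10_000):
    """Flat value iteration over an explicitly enumerated state space.

    P[a][s][t] is the probability of moving from state s to state t under
    action a; R[s][a] is the immediate reward for taking a in s.
    (Both layouts are illustrative assumptions, not from the source.)
    """
    n_states = len(R)
    n_actions = len(R[0])
    V = [0.0] * n_states
    Q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(max_iter):
        # Bellman backup: one pass over every (state, action) pair.
        Q = [
            [
                R[s][a] + gamma * sum(P[a][s][t] * V[t] for t in range(n_states))
                for a in range(n_actions)
            ]
            for s in range(n_states)
        ]
        V_new = [max(row) for row in Q]
        delta = max(abs(x - y) for x, y in zip(V_new, V))
        V = V_new
        if delta < tol:
            break
    # Greedy policy with respect to the last computed Q-values.
    policy = [max(range(n_actions), key=lambda a, s=s: Q[s][a]) for s in range(n_states)]
    return V, policy


# Hypothetical toy MDP: action 0 stays put, action 1 switches state;
# only "stay in state 1" is rewarded.
P_toy = [
    [[1.0, 0.0], [0.0, 1.0]],  # action 0: stay
    [[0.0, 1.0], [1.0, 0.0]],  # action 1: switch
]
R_toy = [[0.0, 0.0], [1.0, 0.0]]

V_toy, policy_toy = value_iteration(P_toy, R_toy)
# With gamma = 0.9 the values converge to approximately [9.0, 10.0]
# and the greedy policy is [1, 0] (switch from state 0, stay in state 1).
```

The cost of each backup is proportional to the number of states times actions times successor states, which is why explicit enumeration becomes impractical at the scale mentioned in the abstract.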
Stochastic Dynamic Programming with Factored Representations
1997
Cited by 181 (10 self)
Markov decision processes (MDPs) have proven to be popular models for decision-theoretic planning, but standard dynamic programming algorithms for solving MDPs rely on explicit, state-based specifications and computations. To alleviate the combinatorial problems associated with such methods, we propose new representational and computational techniques for MDPs that exploit certain types of problem structure. We use dynamic Bayesian networks (with decision trees representing the local families of conditional probability distributions) to represent stochastic actions in an MDP, together with a decision-tree representation of rewards. Based on this representation, we develop versions of standard dynamic programming algorithms that directly manipulate decision-tree representations of policies and value functions. This generally obviates the need for state-by-state computation, aggregating states at the leaves of these trees and requiring computations only for each aggregate state. The key to these algorithms is a decision-theoretic generalization of classic regression analysis, in which we determine the features relevant to predicting expected value. We demonstrate the method empirically on several planning problems,
Multiagent Planning with Factored MDPs
In NIPS-14, 2001
Cited by 170 (16 self)
We present a principled and efficient planning algorithm for cooperative multiagent dynamic systems. A striking feature of our method is that the coordination and communication between the agents is not imposed, but derived directly from the system dynamics and function approximation architecture. We view the entire multiagent system as a single, large Markov decision process (MDP), which we assume can be represented in a factored way using a dynamic Bayesian network (DBN). The action space of the resulting MDP is the joint action space of the entire set of agents. Our approach is based on the use of factored linear value functions as an approximation to the joint value function. This factorization of the value function allows the agents to coordinate their actions at runtime using a natural message passing scheme. We provide a simple and efficient method for computing such an approximate value function by solving a single linear program, whose size is determined by the interaction between the value function structure and the DBN. We thereby avoid the exponential blowup in the state and action space. We show that our approach compares favorably with approaches based on reward sharing. We also show that our algorithm is an efficient alternative to more complicated algorithms even in the single agent case.
Sequential optimality and coordination in multiagent systems
In International Joint Conference on Artificial Intelligence, 1999
Cited by 161 (3 self)
Coordination of agent activities is a key problem in multiagent systems. Set in a larger decision-theoretic context, the existence of coordination problems leads to difficulty in evaluating the utility of a situation. This in turn makes defining optimal policies for sequential decision processes problematic. We propose a method for solving sequential multiagent decision problems by allowing agents to reason explicitly about specific coordination mechanisms. We define an extension of value iteration in which the system’s state space is augmented with the state of the coordination mechanism adopted, allowing agents to reason about the short- and long-term prospects for coordination and the long-term consequences of (mis)coordination, and to make decisions to engage or avoid coordination problems based on expected value. We also illustrate the benefits of mechanism generalization.
Efficient Solution Algorithms for Factored MDPs
2003
Cited by 161 (4 self)
This paper addresses the problem of planning under uncertainty in large Markov Decision Processes (MDPs). Factored MDPs represent a complex state space using state variables and the transition model using a dynamic Bayesian network. This representation often allows an exponential reduction in the representation size of structured MDPs, but the complexity of exact solution algorithms for such MDPs can grow exponentially in the representation size. In this paper, we present two approximate solution algorithms that exploit structure in factored MDPs. Both use an approximate value function represented as a linear combination of basis functions, where each basis function involves only a small subset of the domain variables. A key contribution of this paper is that it shows how the basic operations of both algorithms can be performed efficiently in closed form, by exploiting both additive and context-specific structure in a factored MDP. A central element of our algorithms is a novel linear program decomposition technique, analogous to variable elimination in Bayesian networks, which reduces an exponentially large LP to a provably equivalent, polynomial-sized one. One algorithm uses approximate linear programming, and the second approximate dynamic programming. Our dynamic programming algorithm is novel in that it uses an approximation based on max-norm, a technique that more directly minimizes the terms that appear in error bounds for approximate MDP algorithms. We provide experimental results on problems with over 10^40 states, demonstrating a promising indication of the scalability of our approach, and compare our algorithm to an existing state-of-the-art approach, showing, in some problems, exponential gains in computation time.
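The value-function representation described above, a weighted sum of basis functions that each look at only a few state variables, can be sketched as follows. This shows only the representation, not the paper's linear-programming or dynamic-programming algorithms. The function `linear_value`, the `(scope, function)` encoding of a basis function, and the toy variables `x1`, `x2`, `x3` are hypothetical.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# A basis function is encoded (by assumption) as the names of the state
# variables it reads, plus a callable taking exactly those variables.
Basis = Tuple[Sequence[str], Callable[..., float]]


def linear_value(state: Dict[str, int], basis: List[Basis], weights: List[float]) -> float:
    """Evaluate V(state) = sum_j w_j * h_j(state restricted to scope_j)."""
    if len(basis) != len(weights):
        raise ValueError("need exactly one weight per basis function")
    return sum(
        w * h(*(state[var] for var in scope))
        for w, (scope, h) in zip(weights, basis)
    )


# Hypothetical basis over three binary state variables.
toy_basis: List[Basis] = [
    ((), lambda: 1.0),                                   # constant term
    (("x1",), lambda x1: float(x1)),                     # depends on x1 only
    (("x2", "x3"), lambda x2, x3: float(x2 and x3)),     # depends on x2 and x3
]
toy_weights = [0.5, 2.0, 3.0]

value = linear_value({"x1": 1, "x2": 1, "x3": 0}, toy_basis, toy_weights)
# value is 0.5 + 2.0 + 0.0 = 2.5
```

Because each basis function has a small scope, the representation needs one weight per basis function rather than one value per joint state, which is what makes very large state spaces tractable to represent.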
Bayesian Map Learning in Dynamic Environments
In Neural Info. Proc. Systems (NIPS
Cited by 157 (2 self)
We show how map learning can be formulated as inference in a graphical model, which allows us to handle changing environments in a natural manner. We describe several different approximation schemes for the problem, and illustrate some results on a simulated gridworld with doors that can open and close. We close by briefly discussing how to learn more general models of (partially observed) environments, which can contain a variable number of objects with changing internal state.

1 Introduction

Mobile robots need to navigate in dynamic environments: on a short time scale, obstacles, such as people, can appear and disappear, and on longer time scales, structural changes, such as doors opening and closing, can occur. In this paper, we consider how to create models of dynamic environments. In particular, we are interested in modeling the location of objects, which we can represent using a map. This enables the robot to perform path planning, etc. We propose a Bayesian approach in ...
Symbolic Dynamic Programming for First-order MDPs
In IJCAI, 2001
Cited by 147 (4 self)
We present a dynamic programming approach for the solution of first-order Markov decision processes. This technique uses an MDP whose dynamics is represented in a variant of the situation calculus allowing for stochastic actions. It produces a logical description of the optimal value function and policy by constructing a set of first-order formulae that minimally partition state space according to distinctions made by the value function and policy. This is achieved through the use of an operation known as decision-theoretic regression. In effect, our algorithm performs value iteration without explicit enumeration of either the state or action spaces of the MDP. This allows problems involving relational fluents and quantification to be solved without requiring explicit state space enumeration or conversion to propositional form.
Decision-theoretic, high-level agent programming in the situation calculus
In: Proc. AAAI-00, AAAI Press, 2000
Cited by 124 (5 self)
We propose a framework for robot programming which allows the seamless integration of explicit agent programming with decision-theoretic planning. Specifically, the DTGolog model allows one to partially specify a control program in a high-level, logical language, and provides an interpreter that, given a logical axiomatization of a domain, will determine the optimal completion of that program (viewed as a Markov decision process). We demonstrate the utility of this model with results obtained in an office delivery robotics domain.
Planning under continuous time and resource uncertainty: A challenge for AI
In Proceedings of the Eighteenth Conference on Uncertainty in Artificial Intelligence, 2002
Cited by 118 (19 self)
... experiment is assigned a scientific value). Different observations and experiments take differing amounts of time and consume differing amounts of power and data storage. There are, in general, a number of constraints that govern the rover's activities:
- There are time, power, data storage, and positioning constraints for performing different activities. Time constraints often result from illumination requirements; that is, experiments may require that a target rock or sample be illuminated with a certain intensity, or from a certain angle.