Results 11–20 of 123
Temporal Abstraction in Reinforcement Learning
, 2000
"... Decision making usually involves choosing among different courses of action over a broad range of time scales. For instance, a person planning a trip to a distant location makes highlevel decisions regarding what means of transportation to use, but also chooses lowlevel actions, such as the moveme ..."
Abstract

Cited by 64 (2 self)
Decision making usually involves choosing among different courses of action over a broad range of time scales. For instance, a person planning a trip to a distant location makes high-level decisions regarding what means of transportation to use, but also chooses low-level actions, such as the movements for getting into a car. The problem of picking an appropriate time scale for reasoning and learning has been explored in artificial intelligence, control theory and robotics. In this dissertation we develop a framework that allows novel solutions to this problem, in the context of Markov Decision Processes (MDPs) and reinforcement learning. We present a general framework for prediction, control and learning at multipl...
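The multi-time-scale framework this dissertation describes (options over MDPs) is commonly learned with an SMDP-style Q-learning update, in which the k primitive steps taken by a temporally extended action are treated as one decision. A minimal sketch, with all state, option, and table names purely illustrative:

```python
from collections import defaultdict

def smdp_q_update(Q, s, o, reward, k, s_next, options, alpha=0.1, gamma=0.9):
    """One tabular SMDP Q-learning update after option `o` ran for k
    primitive steps from state s, where `reward` is the discounted
    return accumulated while the option executed, ending in s_next."""
    target = reward + (gamma ** k) * max(Q[(s_next, op)] for op in options)
    Q[(s, o)] += alpha * (target - Q[(s, o)])
    return Q[(s, o)]

# Hypothetical usage: one update after a 3-step option earned return 1.0.
Q = defaultdict(float)
smdp_q_update(Q, 's0', 'goto_door', reward=1.0, k=3,
              s_next='s1', options=['goto_door', 'stay'])
```

The key difference from ordinary Q-learning is the `gamma ** k` factor, which discounts by the option's actual duration rather than by a single step.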
Off-policy temporal-difference learning with function approximation
 Proceedings of the 18th International Conference on Machine Learning
, 2001
"... We introduce the first algorithm for offpolicy temporaldifference learning that is stable with linear function approximation. Offpolicy learning is of interest because it forms the basis for popular reinforcement learning methods such as Qlearning, which has been known to diverge with linear fun ..."
Abstract

Cited by 59 (12 self)
We introduce the first algorithm for off-policy temporal-difference learning that is stable with linear function approximation. Off-policy learning is of interest because it forms the basis for popular reinforcement learning methods such as Q-learning, which has been known to diverge with linear function approximation, and because it is critical to the practical utility of multi-scale, multi-goal learning frameworks such as options, HAMs, and MAXQ. Our new algorithm combines TD(λ) over state–action pairs with importance sampling ideas from our previous work. We prove that, given training under any ε-soft policy, the algorithm converges w.p.1 to a close approximation (as in Tsitsiklis and Van Roy, 1997; Tadic, 2001) to the action-value function for an arbitrary target policy. Variations of the algorithm designed to reduce variance introduce additional bias but are also guaranteed convergent. We also illustrate our method empirically on a small policy evaluation problem. Our current results are limited to episodic tasks with episodes of bounded length. Although Q-learning remains the most popular of all reinforcement learning algorithms, it has been known since about 1996 that it is unsound with linear function approximation (see Gordon, 1995; Bertsekas and Tsitsiklis, 1996). The most telling counterexample, due to Baird (1995), is a seven-state Markov decision process with linearly independent feature vectors, for which an exact solution exists, yet ...
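The paper's actual algorithm is TD(λ) over state–action pairs with importance-sampling corrections; as a much-simplified illustration of the underlying idea, here is a per-step importance-weighted tabular TD(0) update for evaluating a target policy from behavior-policy experience. All names and data layouts are illustrative:

```python
def is_td0_update(V, s, a, r, s_next, pi, b, alpha=0.1, gamma=0.9):
    """One importance-weighted TD(0) update for evaluating target policy
    pi from a transition generated by behavior policy b. pi[s][a] and
    b[s][a] are action probabilities (b must give the action nonzero
    probability, as an epsilon-soft policy does)."""
    rho = pi[s][a] / b[s][a]          # importance-sampling correction
    td_error = r + gamma * V[s_next] - V[s]
    V[s] += alpha * rho * td_error
    return V[s]

# Hypothetical usage: the target policy always takes 'a'; the behavior
# policy takes it half the time, so this transition is up-weighted (rho = 2).
V = {'s0': 0.0, 's1': 0.5}
is_td0_update(V, 's0', 'a', r=1.0, s_next='s1',
              pi={'s0': {'a': 1.0}}, b={'s0': {'a': 0.5}})
```

The ratio ρ = π(a|s)/b(a|s) re-weights each sampled transition so that, in expectation, the update evaluates the target policy rather than the behavior policy.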
Flexible Decomposition Algorithms for Weakly Coupled Markov Decision Problems
 In Proceedings of the Fourteenth Conference on Uncertainty in Artificial Intelligence
, 1998
"... This paper presents two new approaches to decomposing and solving large Markov decision problems (MDPs), a partial decoupling method and a complete decoupling method. In these approaches, a large, stochastic decision problem is divided into smaller pieces. The first approach builds a cache of polici ..."
Abstract

Cited by 52 (0 self)
This paper presents two new approaches to decomposing and solving large Markov decision problems (MDPs): a partial decoupling method and a complete decoupling method. In these approaches, a large, stochastic decision problem is divided into smaller pieces. The first approach builds a cache of policies for each part of the problem independently, and then combines the pieces in a separate, lightweight step. The second approach also divides the problem into smaller pieces, but information is communicated between the different problem pieces, allowing intelligent decisions to be made about which piece requires the most attention. Both approaches can be used to find optimal policies or approximately optimal policies with provable bounds. These algorithms also provide a framework for the efficient transfer of knowledge across problems that share similar structure. The Markov Decision Problem (MDP) framework provides a formal basis for modeling a large variety of stochastic, ...
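The per-piece solve that the partial-decoupling approach would cache is, at its core, ordinary value iteration restricted to one region of the state space. A minimal sketch, with the nested-list data layout an illustrative assumption:

```python
def solve_region(P, R, gamma=0.95, tol=1e-6):
    """Value iteration on one sub-MDP ('piece'), returning a greedy
    policy and values suitable for caching. P[a][s][s2] is a transition
    probability and R[a][s] a reward."""
    n_actions, n_states = len(R), len(R[0])
    V = [0.0] * n_states
    while True:
        Q = [[R[a][s] + gamma * sum(P[a][s][s2] * V[s2]
                                    for s2 in range(n_states))
              for s in range(n_states)] for a in range(n_actions)]
        V_new = [max(Q[a][s] for a in range(n_actions))
                 for s in range(n_states)]
        done = max(abs(V_new[s] - V[s]) for s in range(n_states)) < tol
        V = V_new
        if done:
            policy = [max(range(n_actions), key=lambda a: Q[a][s])
                      for s in range(n_states)]
            return policy, V

# Hypothetical 2-state region: action 1 deterministically reaches the
# rewarding state 1, action 0 stays put.
P = [[[1, 0], [0, 1]], [[0, 1], [0, 1]]]
R = [[0.0, 1.0], [0.0, 1.0]]
policy, V = solve_region(P, R)
```

Caching the (policy, V) pair per region is what makes the later combination step lightweight: the expensive dynamic programming happens once per piece, not once per global problem.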
An Overview of MAXQ Hierarchical Reinforcement Learning
 IN ABSTRACTION, REFORMULATION, AND APPROXIMATION
, 2000
"... . Reinforcement learning addresses the problem of learning optimal policies for sequential decisionmaking problems involving stochastic operators and numerical reward functions rather than the more traditional deterministic operators and logical goal predicates. In many ways, reinforcement lear ..."
Abstract

Cited by 40 (0 self)
Reinforcement learning addresses the problem of learning optimal policies for sequential decision-making problems involving stochastic operators and numerical reward functions, rather than the more traditional deterministic operators and logical goal predicates. In many ways, reinforcement learning research is recapitulating the development of classical research in planning and problem solving. After studying the problem of solving "flat" problem spaces, researchers have recently turned their attention to hierarchical methods that incorporate subroutines and state abstractions. This paper gives an overview of the MAXQ value function decomposition and its support for state abstraction and action abstraction. Reinforcement learning studies the problem of a learning agent that interacts with an unknown, stochastic, but fully observable environment. This problem can be formalized as a Markov decision process (MDP), and reinforcement learning research has develop...
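The MAXQ decomposition the overview covers expresses a task's value recursively: Q(i, s, a) = V(a, s) + C(i, s, a), where C is the completion function, and V(i, s) = max_a Q(i, s, a) for composite tasks. A minimal sketch of that recursion, with the hierarchy and lookup tables as illustrative stand-ins:

```python
def maxq_value(task, s, V_prim, C, children):
    """Recursive MAXQ value computation: for a primitive action the
    value is its (learned) expected one-step reward; for a composite
    task i, Q(i, s, a) = V(a, s) + C(i, s, a) and
    V(i, s) = max over a of Q(i, s, a)."""
    if task not in children:          # primitive action
        return V_prim[(task, s)]
    return max(maxq_value(a, s, V_prim, C, children) + C[(task, s, a)]
               for a in children[task])

# Hypothetical two-level hierarchy with a single composite root task.
children = {'root': ['north', 'south']}
V_prim = {('north', 's'): 1.0, ('south', 's'): 0.0}
C = {('root', 's', 'north'): 2.0, ('root', 's', 'south'): 5.0}
root_value = maxq_value('root', 's', V_prim, C, children)
```

The decomposition is what enables the state abstraction discussed in the paper: each C table only needs to depend on the state variables relevant to its own subtask.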
Hierarchical multiagent reinforcement learning
 Proceedings of the Fifth International Conference on Autonomous Agents
, 2001
"... Consider sending a team of robots to carry out reconnaissance of an indoor environment to check for intruders. ..."
Abstract

Cited by 39 (10 self)
Consider sending a team of robots to carry out reconnaissance of an indoor environment to check for intruders.
A Survey of POMDP Solution Techniques
, 2000
"... this paper, we assume all actions take one unit of discrete time at some (unspecied) time scale. If we allow actions to take variable lengths of time, we end up with a semiMarkov model; see e.g., [SPS99]. ..."
Abstract

Cited by 34 (0 self)
In this paper, we assume all actions take one unit of discrete time at some (unspecified) time scale. If we allow actions to take variable lengths of time, we end up with a semi-Markov model; see e.g., [SPS99].
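The semi-Markov model the excerpt contrasts with replaces the one-step Bellman backup with one that sums over both successor states and action durations: Q(s, a) = R(s, a) + Σ over (s', τ) of γ^τ · P(s', τ | s, a) · V(s'). A minimal sketch of that backup, with the dictionary layout an illustrative assumption:

```python
def smdp_backup(s, a, R, P, V, gamma=0.95):
    """One Bellman backup in a semi-Markov model, where an action may
    take a variable number tau of time units. P[(s, a)] maps
    (s_next, tau) pairs to probabilities; R[(s, a)] is the expected
    reward for taking a in s."""
    return R[(s, a)] + sum(gamma ** tau * p * V[s2]
                           for (s2, tau), p in P[(s, a)].items())

# Hypothetical backup: the action always takes 2 time units.
q = smdp_backup('s', 'a',
                R={('s', 'a'): 1.0},
                P={('s', 'a'): {('s1', 2): 1.0}},
                V={'s1': 10.0})
```

When every action takes exactly τ = 1, this reduces term by term to the ordinary one-step MDP backup, which is the assumption the surveyed paper adopts.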
Causal Graph Based Decomposition of Factored MDPs
 JOURNAL OF MACHINE LEARNING RESEARCH
, 2006
"... We present Variable Influence Structure Analysis, or VISA, an algorithm that performs hierarchical decomposition of factored Markov decision processes. VISA uses a dynamic Bayesian network model of actions, and constructs a causal graph that captures relationships between state variables. In tasks ..."
Abstract

Cited by 34 (5 self)
We present Variable Influence Structure Analysis, or VISA, an algorithm that performs hierarchical decomposition of factored Markov decision processes. VISA uses a dynamic Bayesian network model of actions, and constructs a causal graph that captures relationships between state variables. In tasks ...
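A causal graph of the kind the abstract describes can be read directly off the DBN structure: add an edge u → v whenever variable u appears among the parents of v in some action's DBN. A minimal sketch in that spirit, with the input layout an illustrative assumption:

```python
def causal_graph(dbn_parents):
    """Build a causal graph over state variables from per-action DBN
    structure: edge u -> v whenever u is a DBN parent of v under some
    action. dbn_parents[action][v] is the set of parent variables of v."""
    edges = set()
    for parents in dbn_parents.values():
        for v, ps in parents.items():
            edges.update((u, v) for u in ps if u != v)
    return edges

# Hypothetical domain: whether the door opens depends on holding the key.
g = causal_graph({'open_door': {'door': {'key', 'door'},
                                'key': {'key'}}})
```

Self-loops are dropped because the interesting structure for hierarchical decomposition is influence *between* variables; strongly connected components of this graph are the natural candidates for subtasks.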
Decision-Theoretic Planning with Concurrent Temporally Extended Actions
 In UAI’01
, 2001
"... We investigate a model for planning under ..."
Nearly deterministic abstractions of Markov decision processes
 In AAAI-2002
, 2002
"... We examine scaling issues for a restricted class of compactly representable Markov decision process planning problems. For one stochastic mobile robotics package delivery problem it is possible to decouple the stochastic localnavigation problem from the deterministic globalrouting one and to solv ..."
Abstract

Cited by 24 (5 self)
We examine scaling issues for a restricted class of compactly representable Markov decision process planning problems. For one stochastic mobile-robotics package delivery problem, it is possible to decouple the stochastic local-navigation problem from the deterministic global-routing one and to solve each with dedicated methods. Careful construction of macro actions allows us to effectively “hide” navigational stochasticity from the global routing problem and to approximate the latter with off-the-shelf combinatorial optimization routines for the traveling salesdroid problem, yielding a net exponential speedup in planning performance. We give analytic conditions on when the macros are close enough to deterministic for the approximation to be good, and demonstrate the performance of our method on small and large simulated navigation problems.
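Once macros make navigation near-deterministic, the global problem reduces to routing over expected macro costs. As a toy stand-in for the off-the-shelf combinatorial routines the abstract alludes to, here is a nearest-neighbor route heuristic; all names and the cost layout are illustrative:

```python
def greedy_route(costs, start, goals):
    """Nearest-neighbor tour over macro-action travel costs.
    costs[(a, b)] is the (near-deterministic) expected cost of the
    macro that moves the robot from location a to location b."""
    route, here, remaining = [start], start, set(goals)
    while remaining:
        nxt = min(remaining, key=lambda g: costs[(here, g)])
        route.append(nxt)
        remaining.remove(nxt)
        here = nxt
    return route

# Hypothetical delivery problem: depot 'd', two drop-offs 'a' and 'b'.
route = greedy_route({('d', 'a'): 3, ('d', 'b'): 1,
                      ('b', 'a'): 1, ('a', 'b'): 5},
                     start='d', goals={'a', 'b'})
```

The paper's analytic conditions matter precisely here: the routing layer treats each `costs[(a, b)]` as a deterministic number, which is only a good approximation when the macros' outcome variance is small.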