Results 1 – 10 of 39
Efficient Learning and Planning Within the Dyna Framework
Adaptive Behavior, 1993
Abstract
Cited by 94 (3 self)
Sutton's Dyna framework provides a novel and computationally appealing way to integrate learning, planning, and reacting in autonomous agents. Examined here is a class of strategies designed to enhance the learning and planning power of Dyna systems by increasing their computational efficiency. The benefit of using these strategies is demonstrated on some simple abstract learning tasks.

1 Introduction
Many problems faced by an autonomous agent in an unknown environment can be cast in the form of reinforcement learning tasks. Recent work in this area has led to a clearer understanding of the relationship between algorithms found useful for such tasks and asynchronous approaches to dynamic programming (Bertsekas & Tsitsiklis, 1989), and this understanding has led in turn to both new results relevant to the theory of dynamic programming (Barto, Bradtke, & Singh, 1991; Watkins & Dayan, 1991; Williams & Baird, 1990) and the creation of new reinforcement learning algorithms, such as Q-learn...
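The integration of learning and planning that this abstract describes follows Sutton's Dyna-Q pattern. The sketch below is a minimal tabular rendering of that pattern, not the paper's own strategies; the environment interface (`reset`, `step`, `actions`) and all parameter values are illustrative assumptions.

```python
import random

def dyna_q(env, episodes=50, planning_steps=10, alpha=0.1, gamma=0.95, eps=0.1):
    """Tabular Dyna-Q sketch: learn from real experience, then plan from
    a learned model. `env` is assumed to expose reset() -> state,
    step(a) -> (s', r, done), and actions(s) (an illustrative interface)."""
    Q, model = {}, {}                       # Q[(s, a)] value; model[(s, a)] = (r, s', done)
    def q(s, a): return Q.get((s, a), 0.0)
    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            acts = env.actions(s)
            a = random.choice(acts) if random.random() < eps else max(acts, key=lambda x: q(s, x))
            s2, r, done = env.step(a)
            # direct RL update from real experience (one-step Q-learning)
            best_next = 0.0 if done else max(q(s2, x) for x in env.actions(s2))
            Q[(s, a)] = q(s, a) + alpha * (r + gamma * best_next - q(s, a))
            model[(s, a)] = (r, s2, done)   # deterministic one-step model
            # planning: replay simulated experience drawn from the model
            for _ in range(planning_steps):
                ps, pa = random.choice(list(model))
                pr, ps2, pdone = model[(ps, pa)]
                pbest = 0.0 if pdone else max(q(ps2, x) for x in env.actions(ps2))
                Q[(ps, pa)] = q(ps, pa) + alpha * (pr + gamma * pbest - q(ps, pa))
            s = s2
    return Q
```

The planning loop is where the computational-efficiency question examined in the paper arises: each simulated backup costs the same as a real one, so choosing which model entries to replay matters.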
Learning to Solve Markovian Decision Processes
1994
Abstract
Cited by 48 (3 self)
This dissertation is about building learning control architectures for agents embedded in finite, stationary, and Markovian environments. Such architectures give embedded agents the ability to improve autonomously the efficiency with which they can achieve goals. Machine learning researchers have developed reinforcement learning (RL) algorithms based on dynamic programming (DP) that use the agent's experience in its environment to improve its decision policy incrementally. This is achieved by adapting an evaluation function in such a way that the decision policy that is "greedy" with respect to it improves with experience. This dissertation focuses on finite, stationary and Markovian environments for two reasons: it allows the develop...
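The "greedy with respect to an evaluation function" idea the abstract describes can be sketched in a few lines. The deterministic toy model `P`, reward table `R`, and function names below are illustrative assumptions, not the dissertation's notation.

```python
def greedy_policy(states, actions, P, R, V, gamma=0.9):
    """Return the policy that is greedy w.r.t. the evaluation function V.
    P[(s, a)] -> next state (deterministic toy model); R[(s, a)] -> reward."""
    return {s: max(actions, key=lambda a: R[(s, a)] + gamma * V[P[(s, a)]])
            for s in states}
```

As the value estimate V is adapted from experience, the policy returned here improves with it, which is the incremental-improvement loop the dissertation builds on.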
Continuous Case-Based Reasoning
1996
Abstract
Cited by 47 (5 self)
Case-based reasoning systems have traditionally been used to perform high-level reasoning in problem domains that can be adequately described using discrete, symbolic representations. However, many real-world problem domains, such as autonomous robotic navigation, are better characterized using continuous representations. Such problem domains also require continuous performance, such as on-line sensorimotor interaction with the environment, and continuous adaptation and learning during the performance task. This article introduces a new method for continuous case-based reasoning, and discusses its application to the dynamic selection, modification, and acquisition of robot behaviors in an autonomous navigation system, SINS (Self-Improving Navigation System). The computer program and the underlying method are systematically evaluated through statistical analysis of results from several empirical studies. The article concludes with a general discussion of case-based reasoning issues addr...
Exploration of Multi-State Environments: Local Measures and Back-Propagation of Uncertainty
1998
Abstract
Cited by 44 (1 self)
This paper presents an action selection technique for reinforcement learning in stationary Markovian environments. This technique may be used in direct algorithms such as Q-learning, or in indirect algorithms such as adaptive dynamic programming. It is based on two principles. The first is to define a local measure of the uncertainty using the theory of bandit problems. We show that such a measure suffers from several drawbacks. In particular, a direct application of it leads to algorithms of low quality that can be easily misled by particular configurations of the environment. The second basic principle was introduced to eliminate this drawback. It consists of assimilating the local measures of uncertainty to rewards, and back-propagating them with the dynamic programming or temporal difference mechanisms. This allows reproducing global-scale reasoning about the uncertainty, using only local measures of it. Numerical simulations clearly show the efficiency of these propositions. Keywords: ...
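One way to read "assimilating the local measures of uncertainty to rewards and back-propagating them" is the sketch below, which treats a count-based bonus as the reward in a DP sweep so that rarely tried regions attract the agent from a distance. The 1/sqrt(1+n) bonus and the deterministic toy model are my own assumptions, not the authors' bandit-derived measure.

```python
import math

def propagate_uncertainty(states, actions, P, counts, gamma=0.9, sweeps=50):
    """Back up local uncertainty bonuses with DP (Gauss-Seidel value iteration).
    P[(s, a)] -> next state (deterministic toy model); counts[(s, a)] = visits.
    Returns U[s], a global 'uncertainty value' built from local measures."""
    U = {s: 0.0 for s in states}
    for _ in range(sweeps):
        for s in states:
            U[s] = max(1.0 / math.sqrt(1 + counts[(s, a)]) + gamma * U[P[(s, a)]]
                       for a in actions)
    return U
```

Acting greedily on U (or on a reward-plus-U combination) then reproduces global-scale exploration reasoning at the cost of a local measure only.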
A Comparison of Direct and Model-Based Reinforcement Learning
In International Conference on Robotics and Automation, 1997
Abstract
Cited by 43 (1 self)
This paper compares direct reinforcement learning (no explicit model) and model-based reinforcement learning on a simple task: pendulum swing-up. We find that in this task model-based approaches support reinforcement learning from smaller amounts of training data and efficient handling of changing goals.

1 Introduction
Many proposed reinforcement learning algorithms require large amounts of training data before achieving acceptable performance. This paper explores the training data requirements of two kinds of reinforcement learning algorithms, direct (model-free) and indirect (model-based), when continuous actions are available. Direct reinforcement learning algorithms learn a policy or value function without explicitly representing a model of the controlled system (Sutton et al., 1992). Model-based approaches learn an explicit model of the system simultaneously with a value function and policy (Sutton, 1990, 1991a,b; Barto et al., 1995; Kaelbling et al., 1996). We find that in the p...
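The claimed advantage for changing goals can be illustrated with a tabular sketch: once a model has been learned, a new goal only requires re-running DP against a new reward function, with no fresh environment data. The deterministic tabular model and names below are illustrative assumptions; the paper itself works with continuous pendulum dynamics.

```python
def plan_with_model(model, states, actions, reward_fn, gamma=0.95, sweeps=100):
    """Re-plan for a new goal using only the learned model.
    model[(s, a)] -> next state (learned, assumed tabular and deterministic);
    reward_fn(s, a) encodes the current goal. Returns a greedy policy."""
    V = {s: 0.0 for s in states}
    for _ in range(sweeps):                 # value iteration over the model
        for s in states:
            V[s] = max(reward_fn(s, a) + gamma * V[model[(s, a)]] for a in actions)
    return {s: max(actions, key=lambda a: reward_fn(s, a) + gamma * V[model[(s, a)]])
            for s in states}
```

A direct (model-free) learner would instead have to gather new experience under the new goal before its value function or policy could adapt.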
Truncating temporal differences: On the efficient implementation of TD(λ) for reinforcement learning
Journal of Artificial Intelligence Research, 1995
Abstract
Cited by 25 (8 self)
Temporal difference (TD) methods constitute a class of methods for learning predictions in multi-step prediction problems, parameterized by a recency factor λ. Currently the most important application of these methods is to temporal credit assignment in reinforcement learning. Well-known reinforcement learning algorithms, such as AHC or Q-learning, may be viewed as instances of TD learning. This paper examines the issues of the efficient and general implementation of TD(λ) for arbitrary λ, for use with reinforcement learning algorithms optimizing the discounted sum of rewards. The traditional approach, based on eligibility traces, is argued to suffer from both inefficiency and lack of generality. The TTD (Truncated Temporal Differences) procedure is proposed as an alternative that only approximates TD(λ), but requires very little computation per action and can be used with arbitrary function representation methods. The idea from which it is derived is fairly simple and not new, but has probably been unexplored so far. Encouraging experimental results are presented, suggesting that using λ > 0 with the TTD procedure allows one to obtain a significant learning speedup at essentially the same cost as usual TD(0) learning.
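The truncation idea can be sketched with the standard backward recursion for an m-step λ-return, G_k = r_k + γ[(1−λ)V(s_{k+1}) + λG_{k+1}], bootstrapped at the truncation horizon. This is a minimal reading of the truncated return, not the paper's full TTD procedure.

```python
def truncated_lambda_return(rewards, values, gamma=0.99, lam=0.9):
    """Truncated lambda-return over an m-step window, computed backward.
    rewards[i] = r_{t+i}; values[i] = V(s_{t+i+1}). Returns the TTD-style
    update target for the first state of the window (a sketch)."""
    g = values[-1]                      # bootstrap at the truncation horizon
    for r, v in zip(reversed(rewards), reversed(values)):
        g = r + gamma * ((1 - lam) * v + lam * g)
    return g
```

With lam=0 this collapses to the one-step TD(0) target, and with lam=1 to the full m-step return, which is why a single short window per action suffices in place of eligibility traces.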
Incremental Dynamic Programming for On-Line Adaptive Optimal Control
1994
Abstract
Cited by 20 (2 self)
Reinforcement learning algorithms based on the principles of Dynamic Programming (DP) have enjoyed a great deal of recent attention both empirically and theoretically. These algorithms have been referred to generically as Incremental Dynamic Programming (IDP) algorithms. IDP algorithms are intended for use in situations where the information or computational resources needed by traditional dynamic programming algorithms are not available. IDP algorithms attempt to find a global solution to a DP problem by incrementally improving local constraint satisfaction properties as experience is gained through interaction with the environment. This class of algorithms is not new, going back at least as far as Samuel's adaptive checkers-playing programs,...
On Reinforcement Learning of Control Actions in Noisy and Non-Markovian Domains
1994
Abstract
Cited by 18 (1 self)
If reinforcement learning (RL) techniques are to be used for "real world" dynamic system control, the problems of noise and plant disturbance will have to be addressed. This study investigates the effects of noise/disturbance on five different RL algorithms: Watkins' Q-Learning (QL); Barto, Sutton and Anderson's Adaptive Heuristic Critic (AHC); Sammut and Law's modern variant of Michie and Chambers' BOXES algorithm; and two new algorithms developed during the course of this study. Both these new algorithms are conceptually related to QL; both algorithms, called P-Trace and Q-Trace respectively, provide for substantially faster learning than straight QL overall, and for dramatically faster learning (by up to a factor of 200) in the special case of learning in a noisy environment for the dynamic system studied here (a pole-and-cart simulation). As well as speeding learning, both the P-Trace and Q-Trace algorithms have been designed to preserve the "convergence with probability 1" formal properties of standard QL, i.e. that they be provably "correct" algorithms for Markovian domains under the same conditions for which QL is guaranteed to be correct. We present both arguments and experimental evidence that "trace" methods may prove to be both faster and more powerful in general than TD (Temporal Difference) methods. The potential performance improvements of trace over pure TD methods may turn out to be particularly important when learning is to occur in noisy or stochastic environments, and in cases where the domain is not well modelled by Markovian processes.
A Tutorial Survey of Reinforcement Learning
Abstract
Cited by 10 (0 self)
This paper gives a compact, self-contained tutorial survey of reinforcement learning, a tool that is increasingly finding application in the development of intelligent dynamic systems. Research on reinforcement learning during the past decade has led to the development of a variety of useful algorithms. This paper surveys the literature and presents the algorithms in a cohesive framework.