Results 1  10
of
2,133
Policy gradient methods for reinforcement learning with function approximation.
 In NIPS,
, 1999
"... Abstract Function approximation is essential to reinforcement learning, but the standard approach of approximating a value function and determining a policy from it has so far proven theoretically intractable. In this paper we explore an alternative approach in which the policy is explicitly repres ..."
Abstract

Cited by 439 (20 self)
 Add to MetaCart
Abstract Function approximation is essential to reinforcement learning, but the standard approach of approximating a value function and determining a policy from it has so far proven theoretically intractable. In this paper we explore an alternative approach in which the policy is explicitly
Simple statistical gradientfollowing algorithms for connectionist reinforcement learning
 Machine Learning
, 1992
"... Abstract. This article presents a general class of associative reinforcement learning algorithms for connectionist networks containing stochastic units. These algorithms, called REINFORCE algorithms, are shown to make weight adjustments in a direction that lies along the gradient of expected reinfor ..."
Abstract

Cited by 449 (0 self)
 Add to MetaCart
reinforcement in both immediatereinforcement tasks and certain limited forms of delayedreinforcement tasks, and they do this without explicitly computing gradient estimates or even storing information from which such estimates could be computed. Specific examples of such algorithms are presented, some
Reinforcement Learning In Continuous Time and Space
 Neural Computation
, 2000
"... This paper presents a reinforcement learning framework for continuoustime dynamical systems without a priori discretization of time, state, and action. Based on the HamiltonJacobiBellman (HJB) equation for infinitehorizon, discounted reward problems, we derive algorithms for estimating value f ..."
Abstract

Cited by 176 (7 self)
 Add to MetaCart
This paper presents a reinforcement learning framework for continuoustime dynamical systems without a priori discretization of time, state, and action. Based on the HamiltonJacobiBellman (HJB) equation for infinitehorizon, discounted reward problems, we derive algorithms for estimating value
KernelBased Reinforcement Learning
 Machine Learning
, 1999
"... We present a kernelbased approach to reinforcement learning that overcomes the stability problems of temporaldifference learning in continuous statespaces. First, our algorithm converges to a unique solution of an approximate Bellman's equation regardless of its initialization values. Second ..."
Abstract

Cited by 153 (1 self)
 Add to MetaCart
function estimate is a Gaussian process. This information is useful in studying the biasvariance tradeo in reinforcement learning. We find that all reinforcement learning approaches to estimating the value function, parametric or nonparametric, are subject to a bias. This bias is typically larger
Reinforcement Learning in the MultiRobot Domain
 Autonomous Robots
, 1997
"... This paper describes a formulation of reinforcement learning that enables learning in noisy, dynamic environemnts such as in the complex concurrent multirobot learning domain. The methodology involves minimizing the learning space through the use behaviors and conditions, and dealing with the credi ..."
Abstract

Cited by 167 (20 self)
 Add to MetaCart
with the credit assignment problem through shaped reinforcement in the form of heterogeneous reinforcement functions and progress estimators. We experimentally validate the approach on a group of four mobile robots learning a foraging task. 1 Introduction Developing effective methods for realtime learning has
Infinitehorizon policygradient estimation
 Journal of Artificial Intelligence Research
, 2001
"... Gradientbased approaches to direct policy search in reinforcement learning have received much recent attention as a means to solve problems of partial observability and to avoid some of the problems associated with policy degradation in valuefunction methods. In this paper we introduce � � , a si ..."
Abstract

Cited by 208 (5 self)
 Add to MetaCart
Gradientbased approaches to direct policy search in reinforcement learning have received much recent attention as a means to solve problems of partial observability and to avoid some of the problems associated with policy degradation in valuefunction methods. In this paper we introduce � � , a
Reward Functions for Accelerated Learning
 In Proceedings of the Eleventh International Conference on Machine Learning
, 1994
"... This paper discusses why traditional reinforcement learning methods, and algorithms applied to those models, result in poor performance in situated domains characterized by multiple goals, noisy state, and inconsistent reinforcement. We propose a methodology for designing reinforcement functions tha ..."
Abstract

Cited by 195 (14 self)
 Add to MetaCart
that take advantage of implicit domain knowledge in order to accelerate learning in such domains. The methodology involves the use of heterogeneous reinforcement functions and progress estimators, and applies to learning in domains with a single agent or with multiple agents. The methodology
A Bayesian framework for reinforcement learning
 In Proceedings of the Seventeenth International Conference on Machine Learning
, 2000
"... The reinforcement learning problem can be decomposed into two parallel types of inference: (i) estimating the parameters of a model for the underlying process; (ii) determining behavior which maximizes return under the estimated model. Following Dearden, Friedman and Andre (1999), it is proposed tha ..."
Abstract

Cited by 109 (1 self)
 Add to MetaCart
The reinforcement learning problem can be decomposed into two parallel types of inference: (i) estimating the parameters of a model for the underlying process; (ii) determining behavior which maximizes return under the estimated model. Following Dearden, Friedman and Andre (1999), it is proposed
Learning Without StateEstimation in Partially Observable Markovian Decision Processes
 In Proceedings of the Eleventh International Conference on Machine Learning
, 1994
"... Reinforcement learning (RL) algorithms provide a sound theoretical basis for building learning control architectures for embedded agents. Unfortunately all of the theory and much of the practice (see Barto et al., 1983, for an exception) of RL is limited to Markovian decision processes (MDPs). Many ..."
Abstract

Cited by 158 (5 self)
 Add to MetaCart
Reinforcement learning (RL) algorithms provide a sound theoretical basis for building learning control architectures for embedded agents. Unfortunately all of the theory and much of the practice (see Barto et al., 1983, for an exception) of RL is limited to Markovian decision processes (MDPs). Many
Average Reward Reinforcement Learning: Foundations, Algorithms, and Empirical Results
, 1996
"... This paper presents a detailed study of average reward reinforcement learning, an undiscounted optimality framework that is more appropriate for cyclical tasks than the much better studied discounted framework. A wide spectrum of average reward algorithms are described, ranging from synchronous dyna ..."
Abstract

Cited by 130 (13 self)
 Add to MetaCart
This paper presents a detailed study of average reward reinforcement learning, an undiscounted optimality framework that is more appropriate for cyclical tasks than the much better studied discounted framework. A wide spectrum of average reward algorithms are described, ranging from synchronous
Results 1  10
of
2,133