Results 1–10 of 47
Fast and Efficient Reinforcement Learning with Truncated Temporal Differences
In Proceedings of the Twelfth International Conference on Machine Learning (ML-95), 1995
Cited by 9 (5 self)
Abstract: "... The problem of temporal credit assignment in reinforcement learning is typically solved using algorithms based on the methods of temporal differences, TD(λ). Of those, Q-learning is currently best understood and most widely used. Using TD-based algorithms with λ > 0 often allows one to speed up ..."
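The Q-learning update the snippet refers to can be illustrated with a minimal tabular sketch. The three-state chain MDP, rewards, and all parameters below are invented for illustration and are not taken from the paper:

```python
# Minimal tabular Q-learning on a toy 3-state chain (hypothetical example).
import random

random.seed(0)

N_STATES, ACTIONS = 3, [0, 1]          # action 0 = step left, 1 = step right
ALPHA, GAMMA = 0.5, 0.9

Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

def step(s, a):
    """Deterministic chain: stepping right from the last state pays 1 and resets."""
    if a == 1 and s == N_STATES - 1:
        return 0, 1.0                   # (next state, reward)
    s2 = min(N_STATES - 1, s + 1) if a == 1 else max(0, s - 1)
    return s2, 0.0

s = 0
for _ in range(2000):
    a = random.choice(ACTIONS)          # random behavior policy: Q-learning is off-policy
    s2, r = step(s, a)
    # Q-learning update: bootstrap from the greedy value of the next state
    Q[(s, a)] += ALPHA * (r + GAMMA * max(Q[(s2, b)] for b in ACTIONS) - Q[(s, a)])
    s = s2

# The learned greedy policy should be "go right" in every state
policy = [max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(N_STATES)]
```

Because the environment is deterministic, the constant step size converges to the optimal action values, and the greedy policy recovers the only rewarding behavior.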
Truncating temporal differences: On the efficient implementation of TD(λ) for reinforcement learning
Journal of Artificial Intelligence Research, 1995
Cited by 29 (9 self)
Abstract: "... Temporal difference (TD) methods constitute a class of methods for learning predictions in multi-step prediction problems, parameterized by a recency factor λ. Currently the most important application of these methods is to temporal credit assignment in reinforcement learning. Well-known reinforcement ..."
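The role of the recency factor λ is easiest to see in the tabular TD(λ) algorithm with eligibility traces, which TTD approximates. The random-walk chain and parameters below are an illustrative toy, not taken from the paper:

```python
# Tabular TD(lambda) with accumulating eligibility traces on a toy
# 5-state random walk (terminate left with reward 0, right with reward 1).
import random

random.seed(1)

N, ALPHA, GAMMA, LAM = 5, 0.1, 1.0, 0.8
V = [0.0] * N                           # state-value estimates

def episode():
    z = [0.0] * N                       # eligibility traces, reset each episode
    s = N // 2
    while True:
        s2 = s + random.choice([-1, 1])
        done = s2 < 0 or s2 >= N
        r = 1.0 if s2 >= N else 0.0
        delta = r + (0.0 if done else GAMMA * V[s2]) - V[s]
        z[s] += 1.0                     # accumulating trace for the visited state
        for i in range(N):              # every traced state shares the TD error
            V[i] += ALPHA * delta * z[i]
            z[i] *= GAMMA * LAM         # recency factor decays the traces
        if done:
            return
        s = s2

for _ in range(5000):
    episode()
```

The true values of this walk are (i+1)/6 for state i, so the learned values should increase roughly linearly from left to right, with the middle state near 0.5.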
Truncated Temporal Differences with Function Approximation: Successful Examples Using CMAC
In Proceedings of the Thirteenth European Symposium on Cybernetics and Systems Research (EMCSR-96), 1996
Cited by 2 (1 self)
Abstract: "... Combining reinforcement learning algorithms with function approximators in order to generalize over the state space has recently received particular interest and is widely believed to be one of the crucial issues for scaling reinforcement learning to practically interesting domains. This paper examines the combination of the TTD procedure, a computationally efficient approximate implementation of TD(λ) methods, with CMAC, a function approximator especially suitable for reinforcement learning due to its computational efficiency and online learning capability. Most previous studies have ..."
Truncated Temporal Differences and Sequential Replay: Comparison, Integration, and Experiments
Abstract: "... This paper examines two techniques for speeding up reinforcement learning algorithms based on the methods of temporal differences (TD). The first of them, recently developed by the author and known as the TTD procedure, is an approximate implementation of TD(λ > 0), significantly more computationally ..."
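The quantity at the heart of the TTD idea is a λ-return truncated to a finite window, so that updates need only a fixed-length buffer instead of full eligibility traces. A sketch of that computation (the discount, λ, and sample numbers below are illustrative assumptions):

```python
# Truncated lambda-return over an m-step window, computed by a backward fold.
GAMMA = 0.9

def truncated_lambda_return(rewards, next_values, lam):
    """rewards[k] = r_{t+k+1}, next_values[k] = V(s_{t+k+1}), k = 0..m-1.

    The return beyond the window is approximated by next_values[-1],
    which is what makes the computation finite.
    """
    g = next_values[-1]                  # bootstrap past the end of the window
    for r, v in zip(reversed(rewards), reversed(next_values)):
        # recursion: G_k = r_{k+1} + gamma * ((1 - lam) * V(s_{k+1}) + lam * G_{k+1})
        g = r + GAMMA * ((1 - lam) * v + lam * g)
    return g

rewards, values = [1.0, 0.0, 2.0], [0.5, 0.4, 0.3]
td0 = truncated_lambda_return(rewards, values, 0.0)   # one-step TD target
mc3 = truncated_lambda_return(rewards, values, 1.0)   # 3-step return
```

The two extremes recover familiar targets: λ = 0 gives the one-step TD target r + γV(s'), while λ = 1 gives the n-step return with a bootstrap at the window's edge.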
Learning of Sequential Movements by Neural Network Model with Dopamine-Like Reinforcement Signal
1998
Cited by 60 (1 self)
Abstract: "... Dopamine neurons appear to code an error in the prediction of reward. They are activated by unpredicted rewards, are not influenced by predicted rewards, and are depressed when a predicted reward is omitted. After conditioning, they respond to reward-predicting stimuli in a similar manner. With these characteristics, the dopamine response strongly resembles the predictive reinforcement teaching signal of neural network models implementing the temporal difference learning algorithm. This study explored a neural network model that used a reward-prediction error signal strongly resembling dopamine responses ..."
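The teaching signal the abstract compares to dopamine responses is the TD error δ = r + γV(s') − V(s). A toy conditioning trial shows its hallmark behavior; the trial structure, timings, and parameters here are invented for illustration, not taken from the study:

```python
# TD error in a toy conditioning trial: cue at step 0, reward at the last step.
ALPHA = 0.2
T = 5                           # time steps per trial
V = [0.0] * (T + 1)             # V[0] is the pre-cue baseline, V[T] is terminal

deltas_first = deltas_last = None
for trial in range(200):
    deltas = []
    for t in range(T):
        r = 1.0 if t == T - 1 else 0.0
        delta = r + V[t + 1] - V[t]     # TD error (gamma = 1 for simplicity)
        if t > 0:                       # cue onset is unpredictable, so the
            V[t] += ALPHA * delta       # pre-cue baseline is never updated
        deltas.append(delta)
    if trial == 0:
        deltas_first = deltas[:]
    deltas_last = deltas
```

Before learning the error fires at reward delivery (`deltas_first` peaks at the last step); after learning it has transferred to the reward-predicting cue (`deltas_last` peaks at the first step and is near zero at reward time), mirroring the dopamine recordings described above.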
Kernel Least-Squares Temporal Difference Learning
Cited by 14 (1 self)
Abstract: "... Kernel methods have attracted much research interest recently since, by utilizing Mercer kernels, nonlinear and nonparametric versions of conventional supervised or unsupervised learning algorithms can be implemented, and usually better generalization abilities can be obtained. However, kernel methods in reinforcement learning have not been widely studied in the literature. In this paper, we present a novel kernel-based least-squares temporal-difference (TD) learning algorithm called KLSTD(λ), which can be viewed as the kernel version or nonlinear form of the previous linear LSTD ..."
Reinforcement learning of dynamic motor sequence: Learning to stand up
In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, 1998
Cited by 22 (3 self)
Abstract: "... In this paper, we propose a learning method for implementing human-like sequential movements in robots. As an example of dynamic sequential movement, we consider the "stand-up" task for a two-joint, three-link robot. In contrast to the case of steady walking or standing, the desired trajectory ... utilizing the momentum of its body. We use reinforcement learning, in particular, a continuous-time and continuous-state temporal difference (TD) learning method. For successful results, we use 1) an efficient method of value function approximation in a high-dimensional state space, and 2) a hierarchical architecture ..."
Reward Temporal
Abstract: "... Reinforcement learning is ubiquitous. Unlike other forms of learning, it involves the processing of fast yet content-poor feedback information to correct assumptions about the nature of a task or of a set of stimuli. This feedback information is often delivered as generic rewards or punishments, and has little to do with the stimulus features to be learned. How can such low-content feedback lead to such an efficient learning paradigm? Through a review of existing neurocomputational models of reinforcement learning, we suggest that the efficiency of this type of learning resides ..."
Faster Temporal Credit Assignment in Learning Classifier Systems
In Proceedings of the First Polish Conference on Evolutionary Algorithms, 1996
Cited by 1 (0 self)
Abstract: "... Classifier systems are genetics-based learning systems using the paradigm of reinforcement learning. In the most challenging case of delayed reinforcement, this involves a difficult temporal credit assignment problem. Standard classifier systems solve this problem using the bucket brigade algorithm. In this paper we show how to make the temporal credit assignment process faster by augmenting this algorithm with some refinements borrowed from a related field of reinforcement learning algorithms based on the methods of temporal differences (TD). These algorithms usually converge significantly faster ..."
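The bucket brigade passes credit backward one firing at a time: each active classifier pays a fraction of its strength to its predecessor, and only the last one receives external reward. A deliberately simplified sketch for a fixed chain (the chain length, bid ratio, and reward are invented, not taken from the paper):

```python
# Toy bucket-brigade credit assignment over a fixed 3-classifier chain.
BID_RATIO = 0.1
REWARD = 100.0
chain = [10.0, 10.0, 10.0]      # strengths of three classifiers firing in order

for _ in range(500):            # one "trial" = the whole chain fires once
    bids = [BID_RATIO * s for s in chain]
    for i in range(len(chain)):
        chain[i] -= bids[i]                 # each classifier pays its bid...
        if i + 1 < len(chain):
            chain[i] += bids[i + 1]         # ...and collects its successor's bid
    chain[-1] += REWARD                     # external reward goes to the last one
```

At the fixed point every strength approaches REWARD / BID_RATIO (here 1000): payoff information reaches the front of the chain, but only at a rate of one link per trial, which is exactly the slowness that TD-style refinements are meant to address.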
A Universal Generalization for TemporalDifference Learning Using Haar Basis Functions
In Proceedings of the Seventeenth International Conference on Machine Learning, 2000
Abstract: "... We propose an algorithm efficiently implementing TD(λ) using (the infinite tree of) Haar basis functions. The algorithm can maintain and update the information of the infinite tree of coefficients in its finitely compressed form by taking advantage of the fact that the information obtained from finite ..."