Results 1 - 10 of 116,030
Q-learning with linear function approximation
- Proceedings of the 20th Annual Conference on Learning Theory, 2007
"... In this paper, we analyze the convergence of Q-learning with linear function approximation. We identify a set of conditions that implies the convergence of this method with probability 1, when a fixed learning policy is used. We discuss the differences and similarities between our results and those ..."
Cited by 6 (1 self)
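For orientation, the update analyzed in this and several of the entries below can be sketched in a few lines: Q(s, a) is approximated as theta[a] @ phi(s) and updated from transitions generated by a fixed behavior (learning) policy, as in the setting the abstract describes. The env, phi, and behavior_policy interfaces below are assumptions for illustration, not taken from the paper.

import numpy as np

def q_learning_linear(env, phi, n_features, n_actions, behavior_policy,
                      gamma=0.99, alpha=0.1, n_steps=10_000):
    # Q(s, a) ~= theta[a] @ phi(s); actions come from a fixed learning policy.
    theta = np.zeros((n_actions, n_features))
    s = env.reset()
    for _ in range(n_steps):
        a = behavior_policy(s)
        s_next, r, done = env.step(a)
        # TD target bootstraps on the greedy (max) action value at s_next.
        target = r if done else r + gamma * np.max(theta @ phi(s_next))
        td_error = target - theta[a] @ phi(s)
        theta[a] += alpha * td_error * phi(s)   # semi-gradient update
        s = env.reset() if done else s_next
    return theta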
Optimality of reinforcement learning algorithms with linear function approximation
- In NIPS, 2002
"... There are several reinforcement learning algorithms that yield ap-proximate solutions for the problem of policy evaluation when the value function is represented with a linear function approximator. In this paper we show that each of the solutions is optimal with respect to a specific objective func ..."
Cited by 32 (2 self)
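For context, two objective functions commonly contrasted for linear policy evaluation in this literature are the Bellman residual and the projected Bellman error, whose fixed point is what TD(0) and LSTD compute. A hedged sketch in the usual notation, where \Phi is the feature matrix, T^{\pi} the Bellman operator, \Pi the projection onto the span of \Phi, and D the stationary distribution; the paper's specific objectives may be stated differently:

\min_\theta \, \| \Phi\theta - T^{\pi}\Phi\theta \|_D^2 \quad \text{(Bellman residual minimization)}
\Phi\theta = \Pi T^{\pi}\Phi\theta \;\Leftrightarrow\; \min_\theta \, \| \Phi\theta - \Pi T^{\pi}\Phi\theta \|_D^2 \quad \text{(projected fixed point, solved by TD(0)/LSTD)}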
Convergence of Q-learning with linear function approximation
"... Abstract — In this paper, we analyze the convergence properties of Q-learning using linear function approximation. This algorithm can be seen as an extension to stochastic control settings of TD-learning using linear function approximation, as described in [1]. We derive a set of conditions that imp ..."
Cited by 6 (0 self)
On the Convergence of Temporal-Difference Learning with Linear Function Approximation
"... Abstract. The asymptotic properties of temporal-difference learning algorithms with linear function approxi-mation are analyzed in this paper. The analysis is carried out in the context of the approximation of a discounted cost-to-go function associated with an uncontrolled Markov chain with an unco ..."
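The algorithm this abstract analyzes, TD(0) with linear function approximation for the discounted cost-to-go of an uncontrolled Markov chain, admits a short sketch; the chain and phi interfaces are assumptions for illustration.

import numpy as np

def td0_linear(chain, phi, n_features, gamma=0.95, alpha=0.05, n_steps=10_000):
    # V(s) ~= theta @ phi(s); the chain is uncontrolled, so no action is chosen.
    theta = np.zeros(n_features)
    s = chain.reset()
    for _ in range(n_steps):
        s_next, cost = chain.step()
        td_error = cost + gamma * theta @ phi(s_next) - theta @ phi(s)
        theta += alpha * td_error * phi(s)   # semi-gradient TD(0) update
        s = s_next
    return theta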
Convergence of Synchronous Reinforcement Learning with Linear Function Approximation
- 2004
"... Synchronous reinforcement learning (RL) algorithms with linear function approximation are representable as inhomogeneous matrix iterations of a special form (Schoknecht & Merke, 2003). In this paper we state conditions of convergence for general inhomogeneous matrix iterations and prove th ..."
Cited by 1 (0 self)
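The matrix-iteration view can be made concrete: a synchronous update of the weight vector takes the form w_{k+1} = A w_k + b, and convergence is governed by the spectrum of A. A minimal numpy sketch with placeholder A and b, not the paper's construction:

import numpy as np

def iterate(A, b, w, n_iters=1000):
    # Inhomogeneous matrix iteration w_{k+1} = A @ w_k + b.
    for _ in range(n_iters):
        w = A @ w + b
    return w

A = np.array([[0.5, 0.1], [0.0, 0.7]])          # placeholder iteration matrix
b = np.array([1.0, -0.5])
if max(abs(np.linalg.eigvals(A))) < 1:           # spectral radius < 1 implies convergence
    w_star = np.linalg.solve(np.eye(2) - A, b)   # fixed point of (I - A) w = b
    print(iterate(A, b, np.zeros(2)), w_star)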
Improved Temporal Difference Methods with Linear Function Approximation
"... This chapter considers temporal difference algorithms within the context of infinite-horizon finite-state dynamic programming problems with discounted cost and linear cost function approximation. This problem arises as a subproblem in the policy iteration method of dynamic programming. Additional d ..."
Cited by 32 (7 self)
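One least-squares style temporal difference method that appears in this policy-evaluation setting is LSTD, which solves a linear system built from sampled transitions instead of taking stochastic gradient steps. A minimal batch sketch follows; the data layout and phi are assumptions for illustration, not the chapter's own presentation.

import numpy as np

def lstd(transitions, phi, n_features, gamma=0.95, reg=1e-6):
    # Solve A theta = b, with A and b accumulated from (s, cost, s_next)
    # samples gathered under the policy being evaluated.
    A = reg * np.eye(n_features)     # small ridge term keeps A invertible
    b = np.zeros(n_features)
    for s, cost, s_next in transitions:
        f, f_next = phi(s), phi(s_next)
        A += np.outer(f, f - gamma * f_next)
        b += cost * f
    return np.linalg.solve(A, b)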
Convergent fitted value iteration with linear function approximation
- In Advances in Neural Information Processing Systems, 2011
"... Abstract Fitted value iteration (FVI) with ordinary least squares regression is known to diverge. We present a new method, "Expansion-Constrained Ordinary Least Squares" (ECOLS), that produces a linear approximation but also guarantees convergence when used with FVI. To ensure convergence ..."
Cited by 1 (0 self)
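The baseline the abstract starts from, fitted value iteration with an ordinary least squares regressor, can be sketched as below; ECOLS itself adds a constraint on the regression that is not reproduced here, and the sample layout is an assumption for illustration.

import numpy as np

def fitted_value_iteration(samples, phi, n_features, gamma=0.95, n_iters=50):
    # V(s) ~= theta @ phi(s); `samples` pairs each state with its one-step
    # outcomes [(reward, next_state), ...] over the actions (assumed layout).
    # This plain-OLS variant is the one known to diverge in general.
    theta = np.zeros(n_features)
    X = np.array([phi(s) for s, _ in samples])
    for _ in range(n_iters):
        # Bellman backup: greedy one-step lookahead at each sampled state.
        y = np.array([max(r + gamma * theta @ phi(s_next) for r, s_next in outcomes)
                      for _, outcomes in samples])
        theta, *_ = np.linalg.lstsq(X, y, rcond=None)   # OLS fit of backed-up values
    return theta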
Non-linear Functional Approximation of Heterogeneous Dynamics
- 2005
"... In modeling phenomena continuously observed and/or sampled at discrete time sequences, on problem is that often dynamics come from heterogeneous sources of uncertainty. This turns out particularly challenging with a low signal-to-noise ratio, due to the structural or experimental conditions; for ins ..."
Abstract
- Add to MetaCart
; for instance, information appears dispersed in a wide spectrum of frequency bands or resolution levels. We aim to design ad hoc approximation instruments dealing with a particularly complex class of random processes, the one that generates financial returns, or their aggregates as index returns. The underlying
The Stability of General Discounted Reinforcement Learning with Linear Function Approximation
- In Proceedings of the UK Workshop on Computational Intelligence (UKCI-02), 2002
"... This paper shows that general discounted return estimating reinforcement learning algorithms cannot diverge to infinity when a form of linear function approximator is used for approximating the value-function or Q-function. The results are significant insofar as examples of divergence of the value-f ..."
Cited by 5 (0 self)
A Logarithmic Neural Network Architecture for Unbounded Non-Linear Function Approximation
"... Multi-layer feedforward neural networks with sigmoidal activation functions have been termed "universal function approximators". Although these types of networks can approximate any continuous function to a desired degree of accuracy, this approximation may require an inordinate number of ..."
Abstract
-
Cited by 5 (0 self)
- Add to MetaCart
of hidden nodes and is only accurate over a finite interval. These short comings are due to the standard multi-layer perceptron's (MLP) architecture not being well suited to unbounded non-linear function approximation. A new architecture incorporating a logarithmic hidden layer proves to be superior
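As a rough illustration only: one way to realize a hidden layer whose output grows without bound is to replace the sigmoid with an odd-symmetric logarithm. The sketch below is an assumption about what such a layer could look like, not the architecture the paper defines.

import numpy as np

def signed_log(x):
    # Logarithmic activation: odd-symmetric, unbounded, smooth at the origin.
    return np.sign(x) * np.log1p(np.abs(x))

def log_mlp_forward(x, W1, b1, W2, b2):
    # Two-layer network with a (hypothetical) logarithmic hidden layer.
    hidden = signed_log(W1 @ x + b1)
    return W2 @ hidden + b2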