Results 1 - 10 of 116,030

Q-learning with linear function approximation

by Francisco S. Melo, M. Isabel Ribeiro - Proceedings of the 20th Annual Conference on Learning Theory, 2007
"... In this paper, we analyze the convergence of Q-learning with linear function approximation. We identify a set of conditions that implies the convergence of this method with probability 1, when a fixed learning policy is used. We discuss the differences and similarities between our results and those ..."
Cited by 6 (1 self)
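
For orientation, a minimal sketch (in Python/numpy) of the algorithm the paper analyzes: Q-learning with a linear approximator Q(s, a) ≈ theta[a] · phi(s), run under a fixed behavior policy as in the paper's convergence setting. The Gym-style environment interface, the feature map phi, and the step size are illustrative assumptions, not details taken from the paper.

    import numpy as np

    def q_learning_linear(env, phi, n_features, n_actions,
                          gamma=0.99, alpha=0.05, episodes=500):
        """Q-learning with linear function approximation (sketch).

        Assumes a Gym-style env (reset()/step()) and a feature map
        phi(state) -> ndarray of shape (n_features,); both are
        placeholders rather than details from the paper.
        """
        theta = np.zeros((n_actions, n_features))  # one weight vector per action
        for _ in range(episodes):
            s, done = env.reset(), False
            while not done:
                # Fixed (here: uniform random) learning policy, matching the
                # paper's assumption of an unchanging behavior policy.
                a = np.random.randint(n_actions)
                s_next, r, done, _ = env.step(a)
                # Off-policy TD target: max over actions at the next state.
                target = r + (0.0 if done else gamma * np.max(theta @ phi(s_next)))
                # Semi-gradient update on the taken action's weights only.
                theta[a] += alpha * (target - theta[a] @ phi(s)) * phi(s)
                s = s_next
        return theta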

Optimality of reinforcement learning algorithms with linear function approximation

by Ralf Schoknecht - In NIPS, 2002
"... There are several reinforcement learning algorithms that yield approximate solutions for the problem of policy evaluation when the value function is represented with a linear function approximator. In this paper we show that each of the solutions is optimal with respect to a specific objective function ..."
Cited by 32 (2 self)
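
For background (standard linear-function-approximation policy evaluation, not a claim about this paper's specific results): with feature matrix \Phi, Bellman operator T^{\pi}, state-weighting matrix D, and \Pi the D-weighted projection onto the span of \Phi, two objectives that common algorithms can be seen as optimizing are, in LaTeX notation:

    % Projected fixed point: TD/LSTD-style methods converge to its minimizer.
    \min_{\theta}\; \bigl\| \Phi\theta - \Pi T^{\pi} \Phi\theta \bigr\|_{D}^{2}

    % Bellman residual minimization: targeted by residual-gradient methods.
    \min_{\theta}\; \bigl\| \Phi\theta - T^{\pi} \Phi\theta \bigr\|_{D}^{2}

These generally have different minimizers, which is consistent with the abstract's point that each solution is optimal with respect to a specific objective.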

Convergence of Q-learning with linear function approximation

by Francisco S. Melo
"... In this paper, we analyze the convergence properties of Q-learning using linear function approximation. This algorithm can be seen as an extension to stochastic control settings of TD-learning using linear function approximation, as described in [1]. We derive a set of conditions that imp ..."
Cited by 6 (0 self)

On the Convergence of Temporal-Difference Learning with Linear Function Approximation

by unknown authors
"... The asymptotic properties of temporal-difference learning algorithms with linear function approximation are analyzed in this paper. The analysis is carried out in the context of the approximation of a discounted cost-to-go function associated with an uncontrolled Markov chain with an unco ..."
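
For reference, a minimal sketch (in Python/numpy) of the algorithm class analyzed here: TD(0) policy evaluation with a linear approximator V(s) ≈ theta · phi(s), following a single trajectory of an uncontrolled Markov chain. The transition sampler and feature map are illustrative placeholders.

    import numpy as np

    def td0_linear(sample_transition, phi, n_features,
                   gamma=0.95, alpha=0.05, n_steps=10000, s0=0):
        """TD(0) with linear function approximation (sketch).

        sample_transition(s) -> (s_next, reward) simulates one step of
        the uncontrolled Markov chain; it and phi are assumed interfaces.
        """
        theta = np.zeros(n_features)
        s = s0
        for _ in range(n_steps):
            s_next, r = sample_transition(s)
            # Temporal-difference error for this transition.
            delta = r + gamma * theta @ phi(s_next) - theta @ phi(s)
            # Semi-gradient update along the current feature vector.
            theta += alpha * delta * phi(s)
            s = s_next
        return theta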

Convergence of Synchronous Reinforcement Learning with Linear Function Approximation

by Artur Merke, Ralf Schoknecht, 2004
"... Synchronous reinforcement learning (RL) algorithms with linear function approximation are representable as inhomogeneous matrix iterations of a special form (Schoknecht & Merke, 2003). In this paper we state conditions of convergence for general inhomogeneous matrix iterations and prove th ..."
Cited by 1 (0 self)
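
To make the framing concrete: a synchronous linear-FA update of the kind described can be written as the affine iteration w <- A w + b. A brief numpy sketch of the iteration together with the textbook sufficient condition for convergence (spectral radius of A strictly below 1); the paper's conditions for general inhomogeneous iterations are broader than this simple check.

    import numpy as np

    def spectral_radius(A):
        """Largest eigenvalue magnitude of the iteration matrix."""
        return np.max(np.abs(np.linalg.eigvals(A)))

    def iterate_affine(A, b, w0, n_iters=1000):
        """Run the inhomogeneous matrix iteration w <- A w + b.

        If spectral_radius(A) < 1, the iterates converge to the fixed
        point (I - A)^{-1} b from any w0; this is only the classical
        sufficient condition, not the paper's more general one.
        """
        w = np.asarray(w0, dtype=float).copy()
        for _ in range(n_iters):
            w = A @ w + b
        return w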

Improved Temporal Difference Methods with Linear Function Approximation

by Dimitri P. Bertsekas, Angelia Nedich, Vivek S. Borkar
"... This chapter considers temporal difference algorithms within the context of infinite-horizon finite-state dynamic programming problems with discounted cost and linear cost function approximation. This problem arises as a subproblem in the policy iteration method of dynamic programming. Additional d ..."
Cited by 32 (7 self)
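
As one concrete least-squares instance of temporal difference policy evaluation with linear function approximation, here is a hedged LSTD-style sketch; it is standard background rather than necessarily the specific method developed in this chapter, and transitions/phi are assumed interfaces.

    import numpy as np

    def lstd(transitions, phi, n_features, gamma=0.95, ridge=1e-6):
        """LSTD: solve A theta = b built from sampled transitions.

        transitions is an iterable of (s, reward, s_next) tuples and phi
        a feature map; both are placeholders. A small ridge term keeps
        the matrix invertible when samples are few.
        """
        A = ridge * np.eye(n_features)
        b = np.zeros(n_features)
        for s, r, s_next in transitions:
            f, f_next = phi(s), phi(s_next)
            A += np.outer(f, f - gamma * f_next)
            b += r * f
        return np.linalg.solve(A, b)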

Convergent fitted value iteration with linear function approximation

by Daniel J. Lizotte - In Advances in Neural Information Processing Systems, 2011
"... Fitted value iteration (FVI) with ordinary least squares regression is known to diverge. We present a new method, "Expansion-Constrained Ordinary Least Squares" (ECOLS), that produces a linear approximation but also guarantees convergence when used with FVI. To ensure convergence ..."
Cited by 1 (0 self)
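
For contrast with ECOLS, a minimal sketch of the baseline the abstract refers to: fitted value iteration where each sweep applies a Bellman optimality backup on sampled states and then refits a linear model by unconstrained ordinary least squares, the step whose potential divergence motivates the paper. The tabular reward array, per-action transition matrices, and feature matrix are illustrative assumptions.

    import numpy as np

    def fitted_value_iteration(R, P, Phi, gamma=0.9, n_iters=50):
        """Plain FVI with an unconstrained OLS refit after each backup.

        R has shape (n_states, n_actions); P is a list of per-action
        (n_states, n_states) transition matrices; Phi is the feature
        matrix over sampled states. All are placeholders for exposition.
        """
        n_states, n_actions = R.shape
        theta = np.zeros(Phi.shape[1])
        for _ in range(n_iters):
            v = Phi @ theta
            # Bellman optimality backup at every sampled state.
            q = R + gamma * np.stack([P[a] @ v for a in range(n_actions)], axis=1)
            # Unconstrained OLS refit -- the step ECOLS constrains instead.
            theta, *_ = np.linalg.lstsq(Phi, q.max(axis=1), rcond=None)
        return theta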

Non-linear Functional Approximation of Heterogeneous Dynamics

by Enrico Capobianco, 2005
"... In modeling phenomena continuously observed and/or sampled at discrete time sequences, one problem is that the dynamics often come from heterogeneous sources of uncertainty. This turns out to be particularly challenging with a low signal-to-noise ratio, due to the structural or experimental conditions; for instance, information appears dispersed in a wide spectrum of frequency bands or resolution levels. We aim to design ad hoc approximation instruments dealing with a particularly complex class of random processes, the one that generates financial returns, or their aggregates as index returns. The underlying ..."

The Stability of General Discounted Reinforcement Learning with Linear Function Approximation

by Stuart I. Reynolds - In Proceedings of the UK Workshop on Computational Intelligence (UKCI-02), 2002
"... This paper shows that general discounted-return-estimating reinforcement learning algorithms cannot diverge to infinity when a form of linear function approximator is used for approximating the value-function or Q-function. The results are significant insofar as examples of divergence of the value-function ..."
Cited by 5 (0 self)

A Logarithmic Neural Network Architecture for Unbounded Non-Linear Function Approximation

by J. Wesley Hines
"... Multi-layer feedforward neural networks with sigmoidal activation functions have been termed "universal function approximators". Although these types of networks can approximate any continuous function to a desired degree of accuracy, this approximation may require an inordinate number of hidden nodes and is only accurate over a finite interval. These shortcomings are due to the standard multi-layer perceptron's (MLP) architecture not being well suited to unbounded non-linear function approximation. A new architecture incorporating a logarithmic hidden layer proves to be superior ..."
Cited by 5 (0 self)
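
As a rough illustration of the idea, a forward pass through a network with an unbounded, logarithmic-scale hidden layer; the activation sign(z) * log(1 + |z|) is a guessed stand-in chosen for exposition (it is unbounded and handles negative inputs), since the abstract does not spell out the paper's exact unit. The contrast it illustrates: a sigmoidal hidden layer feeding a linear output is bounded, so it cannot track targets that grow without limit.

    import numpy as np

    def log_layer_forward(x, W1, b1, W2, b2):
        """Forward pass with a logarithmic (unbounded) hidden activation.

        The activation below is an illustrative assumption, not the
        paper's exact architecture.
        """
        z = W1 @ x + b1
        h = np.sign(z) * np.log1p(np.abs(z))  # grows without bound, log scale
        return W2 @ h + b2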