Results 1–10 of 12
Reinforcement Learning with Replacing Eligibility Traces
 Machine Learning
, 1996
Abstract

Cited by 186 (11 self)
The eligibility trace is one of the basic mechanisms used in reinforcement learning to handle delayed reward. In this paper we introduce a new kind of eligibility trace, the replacing trace, analyze it theoretically, and show that it results in faster, more reliable learning than the conventional trace. Both kinds of trace assign credit to prior events according to how recently they occurred, but only the conventional trace gives greater credit to repeated events. Our analysis covers conventional and replace-trace versions of the offline TD(1) algorithm applied to undiscounted absorbing Markov chains. First, we show that these methods converge, under repeated presentations of the training set, to the same predictions as two well-known Monte Carlo methods. We then analyze the relative efficiency of the two Monte Carlo methods. We show that the method corresponding to conventional TD is biased, whereas the method corresponding to replace-trace TD is unbiased. In addition, we show that t...
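The one-line difference between the two traces the abstract contrasts can be sketched with a toy tabular TD(lambda) predictor on an invented absorbing random-walk chain (all parameters and the chain itself are illustrative, not taken from the paper):

```python
import random

def td_lambda(episodes, n_states, alpha=0.1, lam=0.9, trace="replacing", seed=0):
    """Tabular TD(lambda) prediction on an undiscounted absorbing chain.

    trace="accumulating": e[s] += 1 on a visit (conventional trace).
    trace="replacing":    e[s]  = 1 on a visit (replacing trace).
    Toy chain: from state s, advance with prob 0.9, stay with prob 0.1;
    reward 1 on absorption, so every state's true value is 1.
    """
    rng = random.Random(seed)
    V = [0.0] * n_states
    for _ in range(episodes):
        e = [0.0] * n_states
        s = 0
        while s < n_states:                      # n_states is the terminal state
            s2 = s + 1 if rng.random() < 0.9 else s
            r = 1.0 if s2 == n_states else 0.0
            v_next = 0.0 if s2 == n_states else V[s2]
            delta = r + v_next - V[s]            # TD error
            if trace == "replacing":
                e[s] = 1.0                       # replace: reset to 1 on revisit
            else:
                e[s] += 1.0                      # accumulate: grows on revisit
            for i in range(n_states):
                V[i] += alpha * delta * e[i]
                e[i] *= lam                      # undiscounted: gamma = 1
            s = s2
    return V
```

Both variants should approach the true value 1 for every state on this chain; the traces only diverge in behavior when states are revisited within an episode.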
Monte Carlo Matrix Inversion and Reinforcement Learning
 In Advances in Neural Information Processing Systems 6
, 1994
Abstract

Cited by 21 (1 self)
We describe the relationship between certain reinforcement learning (RL) methods based on dynamic programming (DP) and a class of unorthodox Monte Carlo methods for solving systems of linear equations proposed in the 1950s. These methods recast the solution of the linear system as the expected value of a statistic suitably defined over sample paths of a Markov chain. The significance of our observations lies in arguments (Curtiss, 1954) that these Monte Carlo methods scale better with respect to state-space size than do standard, iterative techniques for solving systems of linear equations. This analysis also establishes convergence rate estimates. Because methods used in RL systems for approximating the evaluation function of a fixed control policy also approximate solutions to systems of linear equations, the connection to these Monte Carlo methods establishes that algorithms very similar to TD algorithms (Sutton, 1988) are asymptotically more efficient in a precise sense than other...
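The 1950s-style estimator this abstract refers to can be illustrated with a short random-walk solver for x = Hx + b, where each component of x is the expected payoff of walks on the index set (a generic "collision"-type sketch with invented parameters; it is not the specific scheme analyzed in the paper):

```python
import random

def mc_solve_component(H, b, i, n_walks=50000, p_stop=0.3, seed=1):
    """Unbiased Monte Carlo estimate of component i of the solution of
    x = H x + b, valid when the Neumann series sum_k H^k b converges.
    Each walk moves uniformly at random over the n indices and carries an
    importance weight correcting for the sampling probability."""
    n = len(b)
    rng = random.Random(seed)
    p_move = (1.0 - p_stop) / n          # prob of any particular next index
    total = 0.0
    for _ in range(n_walks):
        s, w = i, 1.0
        while True:
            total += w * b[s]            # score b at every state visited
            if rng.random() < p_stop:    # absorb the walk
                break
            j = rng.randrange(n)         # uniform next index
            w *= H[s][j] / p_move        # importance weight for this move
            s = j
    return total / n_walks
```

For H = [[0.2, 0.1], [0.1, 0.2]] and b = (1, 1), the exact solution of (I − H)x = b has both components equal to 10/7 ≈ 1.43, which the estimator recovers to within sampling error.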
V. Alexandrov, A New Highly Convergent Monte Carlo Method for Matrix Computations
 Proc. of IMACS Monte Carlo Seminar
, 1997
Abstract

Cited by 4 (1 self)
In this paper a second degree iterative Monte Carlo method for solving Systems of Linear Algebraic Equations and Matrix Inversion is presented. Comparisons are made with iterative Monte Carlo methods of degree one. It is shown that the mean number of chains N and the chain length T required to reach a given precision can be reduced. The following estimate on N is obtained: N = Nc / (cN + b·Nc^(1/2))^2, where Nc is the number of chains in the usual degree-one method. In addition it is shown that b > 0 and that N < Nc / cN^2. This result shows that for our method the number of realizations N can be at least cN^2 times smaller than the number of realizations Nc of the existing Monte Carlo method. For parallel implementations, e.g. on regular arrays or MIMD distributed-memory architectures, these results imply faster algorithms and a reduction in the size of the arrays. They also make such methods applicable to problems of smaller size, whereas until now Monte Carlo methods have been applied mainly to large-scale problems, and only when a single component of the solution vector, or an element or row of the inverse matrix, has to be found.
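Reading the abstract's estimate as N = Nc / (cN + b·√Nc)^2, its stated consequence (N below Nc / cN^2 whenever b > 0) can be checked numerically; the values of Nc, cN and b below are invented for illustration only:

```python
def chains_needed(n_c, c_n, b):
    """Estimate from the abstract (as reconstructed above): number of chains
    needed by the second-degree method, N = Nc / (cN + b*sqrt(Nc))**2.
    With b > 0 the denominator exceeds cN**2, so N < Nc / cN**2."""
    return n_c / (c_n + b * n_c ** 0.5) ** 2

# Illustrative values, not from the paper:
n_c, c_n, b = 10_000, 1.5, 0.01
n = chains_needed(n_c, c_n, b)
```

With these numbers the second-degree estimate comes out well below the degree-one bound Nc / cN^2, consistent with the abstract's claim.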
Stabilization of Stochastic Iterative Methods for Singular and Nearly Singular Linear Systems
, 2012
Abstract

Cited by 2 (2 self)
We consider linear systems of equations, Ax = b, of various types frequently arising in large-scale applications, with an emphasis on the case where A is singular. Under certain conditions, necessary as well as sufficient, linear deterministic iterative methods generate sequences {xk} that converge to a solution, as long as there exists at least one solution. We show that this convergence property is frequently lost when these methods are implemented with simulation, as is often done in important classes of large-scale problems. We introduce additional conditions and novel algorithmic stabilization schemes under which {xk} converges to a solution when A is singular, and may also be used with substantial benefit when A is nearly singular. Moreover, we establish the mathematical foundation for related work that deals with special cases of singular systems, including some arising in approximate dynamic programming, where convergence may be obtained without a stabilization mechanism.
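The phenomenon the abstract describes can be seen on a two-dimensional toy system: for singular A, the nullspace component of the iterate is untouched by the exact iteration but performs a random walk once the residual is only estimated. The shrinkage below is merely one simple stabilization device, loosely in the spirit of the paper; A, b and all constants are invented:

```python
import random

def iterate(n_steps, noise, delta, gamma=0.5, seed=0):
    """Run x_{k+1} = (1 - delta) x_k - gamma (A x_k - b + w_k) for the
    singular system A = diag(1, 0), b = (1, 0); w_k is simulation noise.
    With delta = 0 the nullspace component x[1] does a random walk under
    noise; delta > 0 pulls it back toward the origin, at the price of a
    small bias in x[0]."""
    rng = random.Random(seed)
    x = [0.0, 0.0]
    for _ in range(n_steps):
        r = [x[0] - 1.0, 0.0]                    # exact residual A x - b
        for i in range(2):
            r[i] += noise * rng.gauss(0.0, 1.0)  # noisy residual estimate
            x[i] = (1.0 - delta) * x[i] - gamma * r[i]
    return x
```

Without noise and without stabilization the iteration converges to the solution (1, 0); with delta > 0 the first component settles at gamma / (gamma + delta) instead, which shows the bias-stability trade-off.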
A Simulation-Based Approach to Stochastic Dynamic Programming
 Applied Stochastic Models
, 2011
Abstract

Cited by 1 (1 self)
In this paper we develop a simulation-based approach to stochastic dynamic programming. To solve the Bellman equation we construct Monte Carlo estimates of Q-values. Our method is scalable to high dimensions and works in both continuous and discrete state and decision spaces whilst avoiding the discretization errors that plague traditional methods. We provide a geometric convergence rate. We illustrate our methodology with a dynamic stochastic investment problem.
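The core idea of replacing the Bellman expectation with a Monte Carlo average can be sketched on a tiny invented MDP (two states, two actions; the MDP, sample sizes and all names are illustrative and not taken from the paper):

```python
import random

def mc_q_iteration(n_iters=200, n_samples=200, gamma=0.9, seed=0):
    """Q-iteration for a toy 2-state, 2-action MDP in which the expectation
    in the Bellman operator is replaced by an average over sampled
    transitions. Action a lands in state a with prob 0.8, in state 1-a
    otherwise; reward is 1 for landing in state 1."""
    rng = random.Random(seed)
    Q = [[0.0, 0.0], [0.0, 0.0]]
    for _ in range(n_iters):
        newQ = [[0.0, 0.0], [0.0, 0.0]]
        for s in (0, 1):
            for a in (0, 1):
                acc = 0.0
                for _ in range(n_samples):
                    s2 = a if rng.random() < 0.8 else 1 - a   # sampled next state
                    r = 1.0 if s2 == 1 else 0.0
                    acc += r + gamma * max(Q[s2])             # sampled backup
                newQ[s][a] = acc / n_samples                  # MC estimate of Q
        Q = newQ
    return Q
```

For this MDP the exact fixed point is Q(s, 1) = 8 and Q(s, 0) = 7.4 for both states, so the noisy iterates should settle close to those values.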
Projected equation methods for approximate solution of large ...
 Journal of Computational and Applied Mathematics
 Mathematics and Computers in Simulation
, 2009
Abstract
Monte Carlo (MC) linear solvers can be considered stochastic realizations of deterministic stationary iterative processes. That is, they estimate the result of a stationary iterative technique for solving linear systems. There are typically two sources of error: (i) errors from the underlying deterministic iterative process and (ii) errors from the MC process that performs the estimation. Much progress has been made in reducing the stochastic errors of the MC process. However, MC linear solvers suffer from the drawback that, for efficiency reasons, they are usually stochastic realizations of the Jacobi method (a diagonal splitting), which has poor convergence properties. This has limited the application of MC linear solvers. The main goal of this paper is to show that efficient MC implementations of non-diagonal splittings are also feasible, by constructing efficient implementations for one such splitting. As a secondary objective, we also derive conditions under which this scheme can perform better than MC Jacobi, and demonstrate this experimentally. The significance of this work lies in proposing an approach that can lead to efficient MC implementations of a wider variety of deterministic iterative processes.
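Why the choice of splitting matters can be seen already at the deterministic level that the MC solvers realize. The sketch below compares the Jacobi (diagonal) splitting with Gauss-Seidel as one possible non-diagonal splitting, on an invented diagonally dominant system; it illustrates the convergence gap, not the paper's MC construction itself:

```python
def solve_by_splitting(A, b, split, tol=1e-10, max_iter=10000):
    """Stationary iteration for a splitting A = M - K, i.e. x <- M^{-1}(K x + b).
    split="jacobi":       M = diag(A)            (diagonal splitting)
    split="gauss-seidel": M = lower triangle(A)  (a non-diagonal splitting)
    Returns (solution, iterations until successive iterates differ < tol)."""
    n = len(b)
    x = [0.0] * n
    for k in range(1, max_iter + 1):
        x_new = x[:]
        for i in range(n):
            if split == "jacobi":
                s = sum(A[i][j] * x[j] for j in range(n) if j != i)
            else:  # gauss-seidel: use entries already updated in this sweep
                s = sum(A[i][j] * x_new[j] for j in range(i)) \
                  + sum(A[i][j] * x[j] for j in range(i + 1, n))
            x_new[i] = (b[i] - s) / A[i][i]
        if max(abs(x_new[i] - x[i]) for i in range(n)) < tol:
            return x_new, k
        x = x_new
    return x, max_iter
```

On a tridiagonal test matrix the non-diagonal splitting reaches the same solution in roughly half the iterations, which is the kind of gain an efficient MC realization of such a splitting could inherit.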