Results 1 - 8 of 8
The parti-game algorithm for variable resolution reinforcement learning in multidimensional state-spaces
Machine Learning, 1995
Cited by 257 (8 self)
Abstract
Parti-game is a new algorithm for learning feasible trajectories to goal regions in high-dimensional continuous state-spaces. In high dimensions it is essential that learning does not plan uniformly over a state-space. Parti-game maintains a decision-tree partitioning of state-space and applies techniques from game theory and computational geometry to efficiently and adaptively concentrate high resolution only on critical areas. The current version of the algorithm is designed to find feasible paths or trajectories to goal regions in high-dimensional spaces. Future versions will be designed to find a solution that optimizes a real-valued criterion. Many simulated problems have been tested, ranging from two-dimensional to nine-dimensional state-spaces, including mazes, path planning, nonlinear dynamics, and planar snake robots in restricted spaces. In all cases, a good solution is found in less than ten trials and a few minutes.
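The decision-tree partitioning at the heart of this idea can be sketched in a few lines. The sketch below is illustrative only and is not the Parti-game algorithm itself (the class and function names are hypothetical): leaf cells split in half along their widest axis, so repeated refinement around a critical state concentrates resolution there while the rest of the space stays coarse.

```python
class Cell:
    """One axis-aligned box in a kd-tree-style partition of state-space."""

    def __init__(self, lo, hi):
        self.lo, self.hi = list(lo), list(hi)   # box bounds per dimension
        self.children = None                    # None => this cell is a leaf

    def contains(self, x):
        return all(l <= xi <= h for l, xi, h in zip(self.lo, x, self.hi))

    def split(self):
        """Split this leaf in half along its widest dimension."""
        d = max(range(len(self.lo)), key=lambda i: self.hi[i] - self.lo[i])
        mid = 0.5 * (self.lo[d] + self.hi[d])
        left_hi, right_lo = self.hi[:], self.lo[:]
        left_hi[d], right_lo[d] = mid, mid
        self.children = (Cell(self.lo, left_hi), Cell(right_lo, self.hi))

    def leaf_for(self, x):
        """Descend the tree to the leaf cell containing state x."""
        node = self
        while node.children is not None:
            node = next(c for c in node.children if c.contains(x))
        return node


def refine_around(root, x, times):
    """Concentrate resolution near a 'critical' state x by repeated splits."""
    for _ in range(times):
        root.leaf_for(x).split()


root = Cell([0.0, 0.0], [1.0, 1.0])
refine_around(root, (0.9, 0.9), 4)       # high resolution near one corner only
leaf = root.leaf_for((0.9, 0.9))
```

After four refinements the leaf around (0.9, 0.9) has shrunk to a quarter of the unit square per axis, while the far corner is still covered by a single large cell.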
Techniques in Computational Stochastic Dynamic Programming
Control and Dynamic Systems, 1996
Cited by 12 (8 self)
Abstract
INTRODUCTION
When Bellman introduced dynamic programming in his original monograph [8], computers were not as powerful as current personal computers. Hence, his description of the extreme computational demands as the Curse of Dimensionality [9] would not have had the super and massively parallel processors of today in mind. However, massive and super computers cannot overcome the Curse of Dimensionality alone, but parallel and vector computation can permit the solution of problems of higher dimension than was previously possible and thus permit more realistic dynamic programming applications. Today such large problems are called Grand and National Challenge problems [45, 46] in high performance computing. The availability of high performance vector supercomputers and massively parallel processors has made it possible to compute optimal policies and values of control systems for much larger dimensions than was possible earlier. Advance ...
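The curse is easy to make concrete: a uniform grid with N points per axis needs N**d states in d dimensions, so storage and per-sweep work grow exponentially in the dimension regardless of hardware. A one-line illustration:

```python
def grid_states(points_per_axis, dims):
    """Number of states on a uniform grid: exponential in the dimension."""
    return points_per_axis ** dims

# With 100 points per axis, ten dimensions already needs 100**10 == 10**20
# states, far beyond any machine's memory; this is why parallelism alone
# cannot defeat the Curse of Dimensionality.
counts = [grid_states(100, d) for d in (1, 3, 6, 10)]
```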
Parallel Stochastic Dynamic Programming: Finite Element Methods
Lin. Alg. Applic., 1992
Cited by 5 (4 self)
Abstract
A finite element method for stochastic dynamic programming is developed. The computational method is valid for a general class of optimal control problems that are nonlinear and perturbed by general Markov noise in continuous time, including jump Poisson noise. Stability and convergence of the method are verified and its storage utilization efficiency over the traditional finite difference method is demonstrated. This advanced numerical technique, together with parallel computation, helps to alleviate Bellman's curse of dimensionality by permitting the solution of larger problems.
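One way to see the storage idea: in one dimension, a value function expanded in piecewise-linear ("hat") finite elements is exactly linear interpolation of its nodal coefficients, so only one coefficient per mesh node is stored and values everywhere else are reconstructed on the fly. A minimal sketch (illustrative of the local-basis representation, not the paper's multidimensional elements; the stand-in value function is an assumption):

```python
import numpy as np

nodes = np.linspace(0.0, 1.0, 5)            # coarse finite element mesh
coeffs = nodes ** 2                         # nodal values of V(x) = x**2, a stand-in
V = lambda x: np.interp(x, nodes, coeffs)   # hat-basis expansion == linear interp
```

Storage is len(coeffs) numbers; a finite-difference table at comparable pointwise accuracy would store a value at every grid point of a much finer mesh.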
Multigrid Methods for Policy Evaluation and Reinforcement Learning
Cited by 2 (0 self)
Abstract
We introduce a new class of multigrid temporal-difference learning algorithms for speeding up the estimation of the value function associated with a stationary policy, within the context of discounted-cost Markov Decision Processes with linear function approximation. The proposed scheme builds on the multigrid framework used in numerical analysis to enhance the iterative solution of linear equations. We first apply the multigrid approach to policy evaluation in the known-model case. We then extend this approach to the learning case, and propose a scheme in which the basic TD(λ) learning algorithm is applied at various resolution scales. The efficacy of the proposed algorithms is demonstrated through simulation experiments.
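The multigrid principle can be illustrated with a simple coarse-to-fine sketch for policy evaluation in the known-model case: iterate the Bellman operator V = r + γPV on an aggregated half-size chain, then prolong that coarse solution as a warm start for fine-grid iteration. This sketches the general multigrid idea only, not the paper's TD(λ) scheme; the pair-averaging restriction and copy prolongation are illustrative assumptions.

```python
import numpy as np

def policy_eval(r, P, gamma, V0, sweeps):
    """Iterate the Bellman operator V <- r + gamma * P @ V."""
    V = V0.copy()
    for _ in range(sweeps):
        V = r + gamma * P @ V
    return V

def coarse_to_fine(r, P, gamma, sweeps=100):
    """Solve an aggregated half-size model first, then warm-start fine sweeps."""
    n = len(r)
    R = np.zeros((n // 2, n))                       # restriction: average state pairs
    for i in range(n // 2):
        R[i, 2 * i] = R[i, 2 * i + 1] = 0.5
    Pro = np.kron(np.eye(n // 2), np.ones((2, 1)))  # prolongation: copy to pairs
    r_c, P_c = R @ r, R @ P @ Pro                   # coarse model (row-stochastic)
    V_c = policy_eval(r_c, P_c, gamma, np.zeros(n // 2), sweeps)
    return policy_eval(r, P, gamma, Pro @ V_c, sweeps)

# 8-state random-walk chain with reward only in the last state
n, gamma = 8, 0.9
P = np.zeros((n, n))
for s in range(n):
    P[s, max(s - 1, 0)] += 0.5
    P[s, min(s + 1, n - 1)] += 0.5
r = np.zeros(n)
r[-1] = 1.0
V = coarse_to_fine(r, P, gamma)
```

The coarse model stays a proper Markov chain because the restriction rows and the transition rows each sum to one, so every Bellman sweep on it is still a γ-contraction.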
A New Adaptive Aggregation Algorithm for Infinite Horizon Dynamic Programming
Abstract
Dynamic programming suffers the “curse of dimensionality” when it is employed for complex control systems. State aggregation is used to mitigate the problem and accelerate computation by settling for a suboptimal policy. In this paper, a new method, which converges much faster than conventional aggregated value iteration based on TD(0), is proposed for computing the value functions of the aggregated system. Preliminary results show that the new method increases the speed of convergence impressively. Aggregation inevitably introduces errors. An adaptive aggregation scheme employing the new computation method is also proposed to reduce the aggregation errors.
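A minimal sketch of the baseline being accelerated, plain aggregated value iteration: states are grouped into clusters, each action's transitions and rewards are averaged within clusters (uniform within-group weighting is an illustrative assumption), and value iteration runs on the much smaller aggregated MDP before the coarse values are mapped back to the original states.

```python
import numpy as np

def build_coarse(P_a, r_a, groups, n_groups):
    """Average each action's transitions and rewards uniformly within groups."""
    n = len(groups)
    W = np.zeros((n_groups, n))     # disaggregation weights (uniform)
    A = np.zeros((n, n_groups))     # group membership matrix
    for s, g in enumerate(groups):
        W[g, s] = 1.0
        A[s, g] = 1.0
    W /= W.sum(axis=1, keepdims=True)
    return [W @ P @ A for P in P_a], [W @ r for r in r_a]

def value_iteration(P_a, r_a, gamma, iters=300):
    """Standard value iteration with a max over actions."""
    V = np.zeros(len(r_a[0]))
    for _ in range(iters):
        V = np.max([r + gamma * P @ V for P, r in zip(P_a, r_a)], axis=0)
    return V

# 6-state chain; actions move left/right; reward 1 for being in the last state
n, gamma = 6, 0.9
left, right = np.zeros((n, n)), np.zeros((n, n))
for s in range(n):
    left[s, max(s - 1, 0)] = 1.0
    right[s, min(s + 1, n - 1)] = 1.0
r = np.zeros(n)
r[-1] = 1.0
groups = [0, 0, 1, 1, 2, 2]                  # three clusters of two states each
P_c, r_c = build_coarse([left, right], [r, r], groups, 3)
V_c = value_iteration(P_c, r_c, gamma)       # iterate on 3 states, not 6
V_approx = V_c[np.array(groups)]             # disaggregate back to 6 states
```

The coarse values are ordered toward the rewarding cluster, but the per-state detail inside each cluster is lost; that loss is the aggregation error the paper's adaptive scheme targets.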
Computational Stochastic Control: Basic Foundations, Complexity and Techniques
Abstract
Much research in control systems is purely mathematical, but advances in stochastic control problem solving can be used beyond the limits of where theoretical mathematics can help. Theoretical and computational mathematics are complementary. Computation is important where the problem is mathematically intractable, of high dimension (as in stochastic dynamic programming), or urgent (as in competitive financial engineering predictions). Many advances in solving large scale control problems have been gained through technical improvements in computing hardware, but just as many have come from the development of new and better algorithms, the theoretical side of computation. Both analysis and computation are important in solving problems. Both rely on mathematics, but in different ways. An important part of educational training is general preparation for problem solving, since the postgraduate job market is uncertain in the current world.
A. MARKOV CHAIN APPROXIMATION (excerpted section from "Techniques in Computational Stochastic Dynamic Programming")
Abstract
Another approach to finite differences is the well developed Markov Chain Approximation (MCA) of Kushner [3, 4]. Recent developments are surveyed and further advanced by Kushner [5], and by Kushner and Dupuis [6], with special attention to methods for jump and reflected diffusions. This method applies a Markov chain approximation to continuous time, continuous state stochastic control problems by renormalizing finite-difference forms as proper Markov chain transition probabilities. These transition probabilities arise when deriving finite difference versions of the dynamic programming equation. An important advantage of this method is that the Markov chain approximation facilitates convergence proofs for the numerical methods in terms of probabilistic arguments. Probabilistic interpretation of the approximation is a major motivation for the formulation of this method. Here, the MCA method is given a formal presentation, in the spirit of the SDP notation and formulation, to facilitate comparison. The reader should refer to the above references for greater detail, especially Kushner and Dupuis [6] for a multitude of variations and convergence proofs.
1. MCA Dynamic Programming Model Formulation
Consider the stochastic diffusion without Poisson jumps governed by a stochastic differential equation (SDE), where the notation is the same as in the full paper [2]. It is assumed that the drift and Gaussian coefficients ...
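The renormalization step can be sketched for a one-dimensional diffusion dX = b dt + σ dW on a grid of spacing h, using the standard upwind construction of Kushner and Dupuis: the finite-difference coefficients become nonnegative transition probabilities that sum to one, with a local interpolation interval Δt = h²/Q. The function below is an illustrative sketch of that construction, not code from the papers cited.

```python
def mca_transition(b, sigma, h):
    """Upwind MCA step for dX = b dt + sigma dW on a grid of spacing h.
    Returns (p_up, p_down, dt): probabilities of moving +h / -h and the
    local interpolation interval; nonnegative and summing to one by design."""
    b_plus, b_minus = max(b, 0.0), max(-b, 0.0)   # split drift by sign (upwinding)
    Q = sigma ** 2 + h * abs(b)                   # normalizing denominator
    p_up = (sigma ** 2 / 2 + h * b_plus) / Q
    p_down = (sigma ** 2 / 2 + h * b_minus) / Q
    dt = h ** 2 / Q                               # local interpolation interval
    return p_up, p_down, dt

p_up, p_down, dt = mca_transition(b=1.0, sigma=0.5, h=0.1)
```

By construction the chain's mean step (p_up - p_down) * h equals b * dt, so the chain is locally consistent with the drift of the diffusion, which is what drives the probabilistic convergence proofs.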
AN OPTIMAL MULTIGRID ALGORITHM FOR DISCRETE-TIME STOCHASTIC CONTROL
1989
Abstract
We consider the numerical solution of discrete-time, stationary, infinite horizon, discounted stochastic control problems, for the case where the state space is continuous and the problem is to be solved approximately, within a desired accuracy. After a discussion of problem discretization, we introduce a multigrid version of the successive approximation algorithm, and analyze its computational requirements as a function of the desired accuracy and of the discount factor. We show that the multigrid algorithm improves upon the complexity of its single-grid variant and is, in a certain sense, optimal. We also study the effects of a certain mixing (accessibility) condition on the problem's complexity.
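The complexity analysis rests on the fact that one successive-approximation sweep is an α-contraction in the sup norm, so roughly log(1/ε)/log(1/α) sweeps reach accuracy ε, and the multigrid variant wins by performing most of those sweeps on cheap coarse grids. A quick numerical check of the contraction on a random discounted chain (an illustrative setup, not the paper's model):

```python
import numpy as np

rng = np.random.default_rng(0)
n, alpha = 20, 0.8
P = rng.random((n, n))
P /= P.sum(axis=1, keepdims=True)       # row-stochastic transition matrix
r = rng.random(n)
T = lambda V: r + alpha * P @ V         # successive-approximation operator

V1, V2 = rng.random(n), rng.random(n)
before = np.max(np.abs(V1 - V2))        # sup-norm distance of two iterates
after = np.max(np.abs(T(V1) - T(V2)))   # distance after one sweep of T
```

Since each row of P is a probability vector, applying T shrinks the sup-norm distance by at least a factor of α, independent of the grid; the cost per sweep, not the number of sweeps, is what discretization level controls.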