Results 1 - 10 of 61
Local Gain Adaptation in Stochastic Gradient Descent
In Proc. Intl. Conf. Artificial Neural Networks, 1999
Cited by 57 (12 self)
Gain adaptation algorithms for neural networks typically adjust learning rates by monitoring the correlation between successive gradients. Here we discuss the limitations of this approach, and develop an alternative by extending Sutton's work on linear systems to the general, nonlinear case. The resulting online algorithms are computationally little more expensive than other acceleration techniques, do not assume statistical independence between successive training patterns, and do not require an arbitrary smoothing parameter. In our benchmark experiments, they consistently outperform other acceleration methods, and show remarkable robustness when faced with non-i.i.d. sampling of the input space.
Problem Solving With Reinforcement Learning
1995
Cited by 45 (0 self)
This dissertation is submitted for consideration for the degree of Doctor of Philosophy at the University of Cambridge. Summary: This thesis is concerned with practical issues surrounding the application of reinforcement learning techniques to tasks that take place in high-dimensional continuous state-space environments. In particular, the extension of online updating methods is considered, where the term implies systems that learn as each experience arrives, rather than storing the experiences for use in a separate offline learning phase. Firstly, the use of alternative update rules in place of standard Q-learning (Watkins 1989) is examined to provide faster convergence rates. Secondly, the use of multilayer perceptron (MLP) neural networks (Rumelhart, Hinton and Williams 1986) is investigated to provide suitable generalising function approximators. Finally, consideration is given to the combination of Adaptive Heuristic Critic (AHC) methods and Q-learning to produce systems combining the benefits of real-valued actions and discrete switching
Improving the Rprop Learning Algorithm
Proceedings of the Second International Symposium on Neural Computation (NC 2000), 2000
Cited by 41 (7 self)
The Rprop algorithm proposed by Riedmiller and Braun is one of the best-performing first-order learning methods for neural networks. We introduce modifications of the algorithm that improve its learning speed. The resulting speedup is experimentally shown for a set of neural network learning tasks as well as for artificial error surfaces.
Rprop - Description and Implementation Details
1994
Cited by 32 (0 self)
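$\Delta_{ij}^{(t)}$. This is based on a sign-dependent adaptation process, similar to the learning-rate adaptation in [4], [5].

\[
\Delta_{ij}^{(t)} =
\begin{cases}
\eta^{+} \cdot \Delta_{ij}^{(t-1)}, & \text{if } \dfrac{\partial E}{\partial w_{ij}}^{(t-1)} \cdot \dfrac{\partial E}{\partial w_{ij}}^{(t)} > 0 \\[6pt]
\eta^{-} \cdot \Delta_{ij}^{(t-1)}, & \text{if } \dfrac{\partial E}{\partial w_{ij}}^{(t-1)} \cdot \dfrac{\partial E}{\partial w_{ij}}^{(t)} < 0 \\[6pt]
\Delta_{ij}^{(t-1)}, & \text{else}
\end{cases}
\qquad (2)
\]

where $0 < \eta^{-} < 1 < \eta^{+}$. In words, the adaptation rule works as follows: Every time the partial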
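The sign-dependent rule in Eq. (2) can be illustrated with a minimal Python sketch. It covers only the step-size adaptation shown in the excerpt, not the weight update itself. The function name `adapt_step_sizes` is hypothetical, and the default values for `eta_plus` and `eta_minus` are assumptions chosen only to satisfy 0 < η⁻ < 1 < η⁺.

```python
def adapt_step_sizes(steps, grads_prev, grads_curr, eta_plus=1.2, eta_minus=0.5):
    """Apply the sign-dependent rule of Eq. (2) to each step size.

    steps       -- step sizes from iteration t-1, one per weight
    grads_prev  -- partial derivatives dE/dw at iteration t-1
    grads_curr  -- partial derivatives dE/dw at iteration t
    eta_plus, eta_minus -- illustrative factors (assumed values)
    """
    if not 0 < eta_minus < 1 < eta_plus:
        raise ValueError("require 0 < eta_minus < 1 < eta_plus")

    new_steps = []
    for step, g_prev, g_curr in zip(steps, grads_prev, grads_curr):
        product = g_prev * g_curr
        if product > 0:
            # Same gradient sign twice in a row: enlarge the step.
            new_steps.append(eta_plus * step)
        elif product < 0:
            # Gradient sign flipped: shrink the step.
            new_steps.append(eta_minus * step)
        else:
            # One of the derivatives is zero: keep the step unchanged.
            new_steps.append(step)
    return new_steps


example = adapt_step_sizes(
    steps=[0.1, 0.1, 0.1],
    grads_prev=[1.0, 1.0, 0.0],
    grads_curr=[2.0, -3.0, 5.0],
)
```

In the example call, the first weight sees two gradients of equal sign and its step grows, the second sees a sign change and its step shrinks, and the third has a zero derivative and keeps its step.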
Evaluation of policy gradient methods and variants on the cart-pole benchmark
In: ADPRL, 2007
Cited by 19 (3 self)
In this paper, we evaluate different versions from the three main kinds of model-free policy gradient methods, i.e., finite difference gradients, "vanilla" policy gradients and natural policy gradients. Each of these methods is first presented in its simple form and subsequently refined and optimized. By carrying out numerous experiments on the cart-pole regulator benchmark we aim to provide a useful baseline for future research on parameterized policy search algorithms. Portable C++ code is provided for both plant and algorithms; thus, the results in this paper can be reevaluated and reused, and new algorithms can be inserted with ease.
On the Correspondence between Neural Folding Architectures and Tree Automata
1998
Cited by 10 (1 self)
The folding architecture, together with adequate supervised training algorithms, is a special recurrent neural network model designed to solve inductive inference tasks on structured domains. Recently, the generic architecture has been proven to be a universal approximator of mappings from rooted labeled ordered trees to real vector spaces. In this article we explore formal correspondences to automata (language) theory in order to characterize the computational power (representational capabilities) of different instances of the generic folding architecture. As the main result we prove that simple instances of the folding architecture have the computational power of at least the class of deterministic bottom-up tree automata. It is shown how architectural constraints like the number of layers, the type of the activation functions (first-order vs. higher-order) and the transfer functions (threshold vs. sigmoid) influence the representational capabilities. All proofs are carried out in a c...
Application of Sequential Reinforcement Learning to Control Dynamic Systems
In IEEE International Conference on Neural Networks (ICNN '96), 1996
Cited by 9 (7 self)
The article describes the structure of a neural reinforcement learning controller, based on the approach of asynchronous dynamic programming [BBS93]. The learning controller is applied to a well-known benchmark problem, the cart-pole system. In crucial difference to previous approaches, the goal of learning is not only to avoid failure, but moreover to stabilize the cart in the middle of the track, with the pole standing in an upright position. The aim is to learn high-quality control trajectories known from conventional controller design, by providing only a minimum amount of a priori knowledge and teaching information. 1. Introduction In many tasks to be solved by learning controllers we are faced with the following situation: An unknown system has to be manipulated by an agent or, more technically, by a controller, to show a desired behavior. Often, this can only be done by a sequence of control decisions or actions, and the result of the control strategy can only be judged at the ...
Evolutionary Optimization of Neural Networks for Face Detection
2004
Cited by 8 (2 self)
For face recognition from video streams, speed and accuracy are vital aspects. The first decision, whether a preprocessed image region represents a human face or not, is often made by a neural network, e.g., in the Viisage-FaceFINDER video surveillance system. We describe the optimization of such a network by a hybrid algorithm combining evolutionary computation and gradient-based learning. The evolved solutions perform considerably faster than an expert-designed architecture without loss of accuracy.