Results 1–10 of 48
Local Gain Adaptation in Stochastic Gradient Descent
In Proc. Intl. Conf. Artificial Neural Networks, 1999
Cited by 70 (12 self)
Abstract: Gain adaptation algorithms for neural networks typically adjust learning rates by monitoring the correlation between successive gradients. Here we discuss the limitations of this approach, and develop an alternative by extending Sutton's work on linear systems to the general, nonlinear case. The resulting online algorithms are computationally little more expensive than other acceleration techniques, do not assume statistical independence between successive training patterns, and do not require an arbitrary smoothing parameter. In our benchmark experiments, they consistently outperform other acceleration methods, and show remarkable robustness when faced with non-i.i.d. sampling of the input space.
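The correlation-based adjustment this abstract critiques can be sketched as a generic sign-correlation rule (a hypothetical minimal form in the delta-bar-delta spirit, not the paper's proposed algorithm; the function name and all constants are illustrative):

```python
import numpy as np

def gain_adapted_sgd(grad_fn, w, eta, steps, up=1.05, down=0.7):
    """Per-weight learning rates adjusted by the sign correlation of
    successive gradients: same sign -> grow the rate, opposite sign -> shrink it."""
    prev_g = np.zeros_like(w)
    for _ in range(steps):
        g = grad_fn(w)
        corr = np.sign(g * prev_g)  # +1 agree, -1 disagree, 0 on the first step
        eta = np.where(corr > 0, eta * up,
              np.where(corr < 0, eta * down, eta))
        w = w - eta * g
        prev_g = g
    return w, eta

# Usage: minimize f(w) = 0.5 * w' diag(1, 100) w, an ill-conditioned quadratic.
A = np.diag([1.0, 100.0])
w, eta = gain_adapted_sgd(lambda w: A @ w, np.array([1.0, 1.0]),
                          np.full(2, 0.005), steps=500)
```

The point of the per-weight rates is visible on the ill-conditioned quadratic: the low-curvature coordinate grows its rate until progress matches the high-curvature one.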
Improving the Rprop Learning Algorithm
Proceedings of the Second International Symposium on Neural Computation (NC 2000), 2000
Cited by 57 (7 self)
Abstract: The Rprop algorithm proposed by Riedmiller and Braun is one of the best-performing first-order learning methods for neural networks. We introduce modifications of the algorithm that improve its learning speed. The resulting speedup is experimentally shown for a set of neural network learning tasks as well as for artificial error surfaces.
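A minimal sketch of the Rprop family may help here. The following implements the variant in which a gradient sign flip zeroes the stored gradient instead of reverting the step, with the standard constants η+ = 1.2 and η− = 0.5; the quadratic test function is illustrative, not from the paper:

```python
import numpy as np

def rprop(grad_fn, w, steps, delta0=0.1, eta_plus=1.2, eta_minus=0.5,
          delta_max=50.0, delta_min=1e-6):
    """Rprop: weight updates use only the *sign* of the gradient; each
    weight keeps its own step size delta, grown while the gradient sign
    is stable and shrunk when it flips."""
    delta = np.full_like(w, delta0)
    prev_g = np.zeros_like(w)
    for _ in range(steps):
        g = grad_fn(w)
        change = np.sign(g * prev_g)
        delta = np.where(change > 0, np.minimum(delta * eta_plus, delta_max),
                np.where(change < 0, np.maximum(delta * eta_minus, delta_min),
                         delta))
        g = np.where(change < 0, 0.0, g)  # skip the update right after a flip
        w = w - np.sign(g) * delta
        prev_g = g
    return w

# Usage: minimize 0.5 * (x^2 + 100 * y^2); Rprop ignores gradient magnitude,
# so the badly scaled coordinate converges just as fast as the other.
w = rprop(lambda w: np.array([w[0], 100.0 * w[1]]),
          np.array([5.0, 5.0]), steps=200)
```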
Comparison of Optimized Backpropagation Algorithms
Proc. of ESANN'93, Brussels, 1993
Cited by 44 (1 self)
Abstract: Backpropagation is one of the most famous training algorithms for multi-layer perceptrons. Unfortunately, it can be very slow in practical applications. Over the last years, many improvement strategies have been developed to speed up backpropagation. These techniques are difficult to compare, because most of them have been tested only on specific data sets, and most reported results are based on small, artificial training sets such as XOR, encoder, or decoder problems. It is doubtful whether such results carry over to more complicated practical applications. This report gives an overview of many different speedup techniques. All of them were assessed on a very hard practical classification task consisting of a large medical data set. As we show, many of these optimized algorithms fail to learn this data set.

1 Introduction
This report is intended to summarize our experience using many different speedup techniques for the backpropagation algorithm. We have...
3D Hand Tracking by Rapid Stochastic Gradient Descent using a Skinning Model
In 1st European Conference on Visual Media Production (CVMP), 2004
Cited by 30 (3 self)
Abstract: The main challenge of tracking articulated structures like hands is their large number of degrees of freedom (DOFs). A realistic 3D model of the human hand has at least 26 DOFs. The arsenal of tracking approaches that can track such structures fast and reliably is still very small. This paper proposes a tracker based on 'Stochastic Meta-Descent' (SMD) for optimization in such high-dimensional state spaces. This new algorithm is based on a gradient-descent approach with adaptive and parameter-specific step sizes. The SMD tracker facilitates the integration of constraints, and combined with a stochastic sampling technique, can escape spurious local minima. Furthermore, the integration of a deformable hand model based on linear blend skinning and anthropometric measurements reinforces the robustness of our tracker. Experiments show the efficiency of the SMD algorithm in comparison with common optimization methods.
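The SMD update behind this tracker can be sketched in its simplest form. The following is a hedged sketch of the λ = 0 special case of Stochastic Meta-Descent (dropping the curvature, i.e. Hessian-vector, term of the full algorithm), applied to a toy quadratic rather than the paper's hand-tracking objective; the function name and constants are illustrative:

```python
import numpy as np

def smd_lambda0(grad_fn, w, eta, steps, mu=0.02, floor=0.5):
    """Simplest (lambda = 0) form of SMD: each parameter's step size is
    multiplied by max(floor, 1 + mu * eta_i * g_t_i * g_prev_i), so it grows
    when successive gradients agree and shrinks when they disagree,
    scaled by the gradients' magnitudes rather than just their signs."""
    prev_g = np.zeros_like(w)
    for _ in range(steps):
        g = grad_fn(w)
        eta = eta * np.maximum(floor, 1.0 + mu * eta * g * prev_g)
        w = w - eta * g
        prev_g = g
    return w, eta

# Usage: an ill-conditioned quadratic, f(w) = 0.5 * w' diag(1, 50) w.
A = np.array([1.0, 50.0])
w, eta = smd_lambda0(lambda w: A * w, np.array([1.0, 1.0]),
                     np.full(2, 0.01), steps=500)
```

The multiplicative update keeps the step sizes positive, and the floor bounds how fast a disagreement can shrink them, which is what makes the scheme usable in the high-dimensional pose spaces the paper targets.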
On Learning by Exchanging Advice
2003
Cited by 22 (6 self)
Abstract: One of the main questions concerning learning in Multi-Agent Systems is: "(How) can agents benefit from mutual interaction during the learning process?" This paper describes the study of an interactive advice-exchange mechanism as a possible way to improve agents' learning performance. The advice-exchange technique discussed here uses supervised learning (backpropagation), where reinforcement does not come directly from the environment but is based on advice given by peers with a better performance score (higher confidence), to enhance the performance of a heterogeneous group of Learning Agents (LAs). The LAs face similar problems, in an environment where only reinforcement information is available. Each LA applies a different, well-known learning technique: Random Walk, Simulated Annealing, Evolutionary Algorithms, and Q-Learning. The problem used for evaluation is a simplified traffic-control simulation. In the following text, the reader will find a description of the traffic simulation and the Learning Agents (focused on the advice-exchange mechanism), a discussion of the first results obtained, and suggested techniques to overcome the problems that have been observed. Initial results indicate that advice exchange can improve learning speed, although "bad advice" and/or blind reliance can disturb learning performance. The use of supervised learning to incorporate advice given by non-expert peers using different learning algorithms, in problems where no supervision information is available, is, to the best of the authors' knowledge, a new concept in the area of Multi-Agent Systems learning.
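The advice-exchange loop can be illustrated with a deliberately tiny toy (all names and interfaces here are hypothetical; the paper's agents apply backpropagation-based supervised updates rather than the direct policy copy shown):

```python
import random

class Agent:
    """Toy learner: a state->action table plus a running performance score.
    (Stand-in for the paper's heterogeneous learners: Random Walk,
    Simulated Annealing, Evolutionary Algorithms, Q-Learning.)"""
    def __init__(self, name):
        self.name, self.policy, self.score = name, {}, 0.0

    def act(self, state):
        return self.policy.get(state, random.choice([0, 1]))

    def learn_from_advice(self, state, advised_action):
        # Supervised step: adopt the better-scoring peer's action outright
        # (the paper instead trains on the advice via backpropagation).
        self.policy[state] = advised_action

def exchange_advice(agents, state):
    """Every agent below the best score asks the best-scoring peer for advice
    on the current state."""
    best = max(agents, key=lambda a: a.score)
    for a in agents:
        if a is not best:
            a.learn_from_advice(state, best.act(state))
    return best

# Usage: the low-scoring novice adopts the high-scoring peer's action.
random.seed(0)
expert = Agent("expert"); expert.score = 10.0; expert.policy = {0: 1}
novice = Agent("novice"); novice.score = 1.0
best = exchange_advice([expert, novice], state=0)
```

Blind copying is exactly the failure mode the abstract warns about ("bad advice" and blind reliance); a fuller version would weight the advice by the advisor's confidence.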
Online local gain adaptation for multi-layer perceptrons
1998
Cited by 16 (2 self)
Abstract: We introduce a new method for adapting the step size of each individual weight in a multi-layer perceptron trained by stochastic gradient descent. Our technique derives from the K1 algorithm for linear systems (Sutton, 1992b), which in turn is based on a diagonalized Kalman filter. We expand upon Sutton's work in two regards: K1 is (a) extended to multi-layer perceptrons, and (b) made more efficient by linearizing an exponentiation operation. The resulting elk1 (extended, linearized K1) algorithm is computationally little more expensive than alternative proposals (Zimmermann, 1994; Almeida et al., 1997, 1998), and does not require an arbitrary smoothing parameter. In our benchmark experiments, elk1 consistently outperforms these alternatives, as well as stochastic gradient descent with momentum, even when the number of floating-point operations required per weight update is taken into account. Unlike the method of Almeida et al. (1997, 1998), elk1 does not require statistical independence between successive training patterns, and handles large initial learning rates well.
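elk1 itself is nonlinear and not reproduced here, but the linear ancestor of this family, Sutton's IDBD rule with exponentiated per-weight step sizes, can be sketched (this is IDBD, not K1 or elk1, and the regression task is illustrative):

```python
import numpy as np

def idbd(X, y, theta=0.01, beta0=np.log(0.05)):
    """Sutton's IDBD: each weight i keeps a log step size beta_i that is
    itself learned online from the gradient trace h_i."""
    n = X.shape[1]
    w = np.zeros(n)
    beta = np.full(n, beta0)
    h = np.zeros(n)
    for x, t in zip(X, y):
        delta = t - w @ x                 # prediction error
        beta += theta * delta * x * h     # meta-update of log step sizes
        alpha = np.exp(beta)              # per-weight step sizes
        w += alpha * delta * x            # LMS-style weight update
        h = h * np.maximum(0.0, 1.0 - alpha * x * x) + alpha * delta * x
    return w

# Usage: recover a linear target from online samples.
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 3))
y_true = X @ np.array([1.0, -2.0, 0.5])
w = idbd(X, y_true)
```

The exponentiation `np.exp(beta)` is the operation the abstract says elk1 linearizes for efficiency.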
An Investigation of Feedforward Neural Networks with Respect to the Detection of Spurious Patterns
1995
Cited by 11 (1 self)
Abstract: This thesis investigates feedforward neural networks in the context of classification tasks, with respect to the detection of patterns that do not belong to the categories of patterns used to train the network. This is the problem of detecting and/or rejecting spurious or novel patterns. In particular, the multi-layer perceptron (MLP) trained with the backpropagation algorithm is examined in this respect, and different strategies for improving its performance in the detection of spurious patterns are considered. The problem is investigated from different points of view, ranging from modifications of the multi-layer perceptron with different configurations that make it more intrinsically able to detect spurious information, to the introduction of novel auxiliary mechanisms which, when integrated with the MLP network, can provide an overall enhancement of the system's rejection capabilities. These different network configurations are examined with respe...
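One generic baseline mechanism for rejecting spurious patterns, shown only as an illustration and not as any of the thesis's specific configurations, is to threshold the network's top softmax output:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def classify_with_rejection(logits, threshold=0.9):
    """Return the winning class, or -1 (reject as spurious/novel) when the
    network's top softmax output falls below a confidence threshold."""
    p = softmax(logits)
    cls = p.argmax(axis=-1)
    return np.where(p.max(axis=-1) >= threshold, cls, -1)

# Usage: a confident pattern is classified, an ambiguous one is rejected.
logits = np.array([[8.0, 0.0, 0.0],    # confident -> class 0
                   [0.4, 0.3, 0.3]])   # near-uniform -> rejected
labels = classify_with_rejection(logits)
```

The known weakness of this baseline, and part of the thesis's motivation, is that MLPs can assign high confidence to inputs far from the training data, which is why purely output-based thresholds are often insufficient.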
Gradient-based manipulation of nonparametric entropy estimates
IEEE Trans. Neural Networks, 2004
Cited by 10 (0 self)
Abstract: This paper derives a family of differential learning rules that optimize the Shannon entropy at the output of an adaptive system via kernel density estimation. In contrast to parametric formulations of entropy, this nonparametric approach assumes no particular functional form of the output density. We address problems associated with quantized data and finite sample size, and implement efficient maximum-likelihood techniques for optimizing the regularizer. We also develop a normalized entropy estimate that is invariant with respect to affine transformations, facilitating optimization of the shape, rather than the scale, of the output density. Kernel density estimates are smooth and differentiable; this makes the derived entropy estimates amenable to manipulation by gradient descent. The resulting weight updates are surprisingly simple and efficient learning rules that operate on pairs of input samples. They can be tuned for data-limited or memory-limited situations, or modified to give a fully online implementation.
Index Terms: affine-invariant entropy, entropy manipulation, expectation-maximization, kernel density, maximum-likelihood kernel, over-relaxation, Parzen windows, step-size adaptation.
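The kind of pairwise rule the abstract describes can be sketched for a one-dimensional output (a hedged sketch with a fixed Gaussian kernel; the paper's full rules also cover kernel optimization, quantized data, and the normalized estimate):

```python
import numpy as np

def parzen_entropy(y, sigma=0.5):
    """Resubstitution Shannon entropy estimate from a Gaussian Parzen window:
    H ~ -(1/n) sum_i log( (1/n) sum_j K_sigma(y_i - y_j) )."""
    y = np.asarray(y, dtype=float)
    d = y[:, None] - y[None, :]
    K = np.exp(-d**2 / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))
    return -np.mean(np.log(K.mean(axis=1)))

def parzen_entropy_grad(y, sigma=0.5):
    """Analytic gradient of the estimate w.r.t. the samples: a double sum
    over sample *pairs*, matching the abstract's description."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    d = y[:, None] - y[None, :]
    K = np.exp(-d**2 / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))
    p = K.mean(axis=1)                 # density estimate at each sample
    W = K * d / sigma**2               # antisymmetric pair terms
    return (W.sum(axis=1) / p + (W / p[None, :]).sum(axis=1)) / n**2

# Gradient *ascent* on these pairwise terms pushes samples apart,
# raising the entropy estimate.
y = np.array([0.0, 0.3, 1.0])
H, g = parzen_entropy(y), parzen_entropy_grad(y)
```

Because the estimate is smooth in the samples, the same gradient can be chained through an adaptive system's Jacobian to obtain weight updates, which is the manipulation-by-gradient-descent point of the paper.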
Online Step Size Adaptation
INESC, 9 Rua Alves Redol, 1000, 1997
"... Subcategory: online learning algorithms ..."