Local Gain Adaptation in Stochastic Gradient Descent
In Proc. Intl. Conf. Artificial Neural Networks, 1999
Abstract

Cited by 58 (13 self)
Gain adaptation algorithms for neural networks typically adjust learning rates by monitoring the correlation between successive gradients. Here we discuss the limitations of this approach, and develop an alternative by extending Sutton's work on linear systems to the general, nonlinear case. The resulting online algorithms are computationally little more expensive than other acceleration techniques, do not assume statistical independence between successive training patterns, and do not require an arbitrary smoothing parameter. In our benchmark experiments, they consistently outperform other acceleration methods, and show remarkable robustness when faced with non-i.i.d. sampling of the input space.
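To make the idea of per-weight gain adaptation concrete, here is a minimal sketch in the style later called stochastic meta-descent (SMD), assuming the standard SMD update form with the smoothing factor fixed at 1. The function name `smd`, the parameters `p0`, `mu`, `rho`, and the toy quadratic are our own illustrative choices, not taken from the paper. Each weight keeps its own gain `p`; the vector `v` tracks how the weights depend on the log gains and is propagated with a curvature-matrix-vector product `H v`.

```python
from typing import Callable, List, Tuple

Vector = List[float]


def smd(
    grad: Callable[[Vector], Vector],
    hvp: Callable[[Vector, Vector], Vector],
    w: Vector,
    p0: float = 0.05,
    mu: float = 0.1,
    rho: float = 0.5,
    steps: int = 200,
) -> Tuple[Vector, Vector]:
    """Gradient descent with one adaptive gain per weight (SMD-style sketch)."""
    w = list(w)
    p = [p0] * len(w)   # per-weight gains (local learning rates)
    v = [0.0] * len(w)  # v ~ d w / d ln p: sensitivity of the weights to the log gains
    for _ in range(steps):
        g = grad(w)
        hv = hvp(w, v)  # curvature-matrix-vector product H v at the current weights
        # Gain update: linearized exponential step on ln p, floored at rho.
        p = [pi * max(rho, 1.0 - mu * gi * vi) for pi, gi, vi in zip(p, g, v)]
        # Weight update with the freshly adapted gains.
        w_new = [wi - pi * gi for wi, pi, gi in zip(w, p, g)]
        # Propagate the sensitivity one step: v <- v - p * (g + H v).
        v = [vi - pi * (gi + hvi) for vi, pi, gi, hvi in zip(v, p, g, hv)]
        w = w_new
    return w, p


# Usage on a hypothetical diagonal quadratic 0.5 * w^T A w.
A = [[1.0, 0.0], [0.0, 4.0]]


def matvec(m: List[Vector], x: Vector) -> Vector:
    return [sum(mij * xj for mij, xj in zip(row, x)) for row in m]


def loss(w: Vector) -> float:
    return 0.5 * sum(wi * ai for wi, ai in zip(w, matvec(A, w)))


w_final, gains = smd(lambda w: matvec(A, w), lambda w, v: matvec(A, v), [1.0, 1.0])
```

On this toy problem the gradient and the sensitivity `v` keep opposite signs, so `g * v` stays negative and every gain grows above its starting value while the loss shrinks toward zero.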
Fast Curvature Matrix-Vector Products for Second-Order Gradient Descent
Neural Computation, 2002
Abstract

Cited by 38 (14 self)
We propose a generic method for iteratively approximating various second-order gradient steps (Newton, Gauss-Newton, Levenberg-Marquardt, and natural gradient) in linear time per iteration, using special curvature matrix-vector products that can be computed in O(n). Two recent acceleration techniques for online learning, matrix momentum and stochastic meta-descent (SMD), in fact implement this approach. Since the two were originally derived by very different routes, this offers fresh insight into their operation, resulting in further improvements to SMD.
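The key idea, multiplying a curvature matrix by a vector without ever forming the matrix, can be illustrated on a deliberately tiny hypothetical model: a single tanh unit y = tanh(w · x) with loss 0.5 (y - target)^2. This is our own example, not one from the paper. For squared error the Hessian of the loss with respect to y is 1, so the Gauss-Newton matrix is G = J^T J, and G v is one forward pass (J v) followed by one backward pass (J^T applied to the result), each O(n). The explicit O(n^2) construction is included only for comparison.

```python
import math
from typing import List

Vector = List[float]


def gauss_newton_vp(w: Vector, x: Vector, v: Vector) -> Vector:
    """G v = J^T J v for y = tanh(w . x) with squared-error loss, in O(n)."""
    y = math.tanh(sum(wi * xi for wi, xi in zip(w, x)))
    dy = 1.0 - y * y  # derivative of tanh at the net input
    jv = dy * sum(xi * vi for xi, vi in zip(x, v))  # forward pass: J v (a scalar here)
    # The loss Hessian w.r.t. y is 1 for 0.5 * (y - target)^2, so jv passes unchanged.
    return [dy * xi * jv for xi in x]  # backward pass: J^T (H_L J v)


def explicit_gauss_newton(w: Vector, x: Vector) -> List[Vector]:
    """Form G = J^T J explicitly (O(n^2)), only to check the fast product."""
    y = math.tanh(sum(wi * xi for wi, xi in zip(w, x)))
    dy = 1.0 - y * y
    jac = [dy * xi for xi in x]
    return [[ji * jk for jk in jac] for ji in jac]
```

The fast product never allocates an n-by-n array, which is what makes such products usable inside per-iteration updates like SMD's.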
Learning Techniques in a Dataglove Based Telemanipulation System for the DLR Hand
In Proceedings of the IEEE Int. Conference on Robotics and Automation, pages 1603-1608, 1998
Abstract

Cited by 17 (2 self)
We present a setup to control a four-finger anthropomorphic robot hand using a dataglove. To use the dataglove accurately, we implemented a nonlinear learning calibration based on a novel neural network technique. Experiments show that a positioning error of at most 1.8 mm, and typically 0.5 mm, per finger can be obtained; this accuracy is sufficient for grasping tasks. Based on the dataglove calibration, we present a solution for mapping between the human and artificial hand workspaces that enables an operator to telemanipulate objects with the artificial hand intuitively and easily.
1 Introduction
The aim of our research is to provide a human operator with an interface that lets her control the DLR dextrous hand as flexibly, easily, and precisely as possible. As the kinematics and workspaces of a human hand and an artificial hand are generally different, we want to control the finger tip positions of the artificial hand such that they correspond with the finger t...
Online local gain adaptation for multi-layer perceptrons
1998
Abstract

Cited by 13 (2 self)
We introduce a new method for adapting the step size of each individual weight in a multilayer perceptron trained by stochastic gradient descent. Our technique derives from the K1 algorithm for linear systems (Sutton, 1992b), which in turn is based on a diagonalized Kalman filter. We expand upon Sutton's work in two regards: K1 is a) extended to multilayer perceptrons, and b) made more efficient by linearizing an exponentiation operation. The resulting elk1 (extended, linearized K1) algorithm is computationally little more expensive than alternative proposals (Zimmermann, 1994; Almeida et al., 1997, 1998), and does not require an arbitrary smoothing parameter. In our benchmark experiments, elk1 consistently outperforms these alternatives, as well as stochastic gradient descent with momentum, even when the number of floating-point operations required per weight update is taken into account. Unlike the method of Almeida et al. (1997, 1998), elk1 does not require statistical independence between successive training patterns, and handles large initial learning rates well.
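The "linearized exponentiation" can be pictured as replacing a multiplicative gain update p * exp(u) with its first-order approximation p * (1 + u). The sketch below assumes `u` is whatever meta-gradient term drives the gain (the abstract does not give its form), and the lower bound `rho` that keeps the gain positive when `u` is very negative is our assumption; both function names are hypothetical.

```python
import math


def gain_update_exp(p: float, u: float) -> float:
    """Multiplicative gain update via exponentiation: p * exp(u)."""
    return p * math.exp(u)


def gain_update_linearized(p: float, u: float, rho: float = 0.1) -> float:
    """First-order replacement exp(u) ~ 1 + u, floored at rho so the gain stays positive."""
    return p * max(rho, 1.0 - 0.0 + u)
```

For small `u` the two agree to first order (the error is about u^2 / 2), and since 1 + u never exceeds exp(u), the linearized form is slightly more conservative when gains grow, while saving one `exp` call per weight per update.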
Centering Neural Network Gradient Factors
Neural Networks: Tricks of the Trade, volume 1524 of Lecture Notes in Computer Science, 1997
Abstract

Cited by 5 (2 self)
It has long been known that neural networks can learn faster when their input and hidden unit activities are centered about zero; recently we have extended this approach to also encompass the centering of error signals [2]. Here we generalize this notion to all factors involved in the network's gradient, leading us to propose centering the slope of hidden unit activation functions as well. Slope centering removes the linear component of backpropagated error; this improves credit assignment in networks with shortcut connections. Benchmark results show that this can speed up learning significantly without adversely affecting the trained network's generalization ability.
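Slope centering can be sketched for a single tanh hidden unit: when backpropagating, the unit's slope f'(net) is replaced by its deviation from the mean slope. In this minimal sketch a batch mean stands in for whatever running average the paper actually uses, and the names `center` and `hidden_deltas` are hypothetical. The same `center` helper applies equally to inputs and hidden activities.

```python
import math
from typing import List


def center(values: List[float]) -> List[float]:
    """Subtract the mean so a factor is centered about zero."""
    mean = sum(values) / len(values)
    return [x - mean for x in values]


def hidden_deltas(
    nets: List[float],
    backprop_errors: List[float],
    slope_centering: bool = True,
) -> List[float]:
    """Error signal of one tanh hidden unit over a batch of patterns.

    nets[i] is the unit's net input for pattern i; backprop_errors[i] is the
    weighted error arriving from the layer above for the same pattern.
    """
    slopes = [1.0 - math.tanh(n) ** 2 for n in nets]  # f'(net) for tanh
    if slope_centering:
        slopes = center(slopes)  # use f'(net) minus its batch mean
    return [s * e for s, e in zip(slopes, backprop_errors)]
```

With centered slopes, an error signal that is constant across patterns yields hidden deltas that sum to zero, whereas uncentered tanh slopes (always positive) pass that uniform component straight through.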
Online learning with adaptive local step sizes
In M. Marinaro & R. Tagliaferri (Eds.), Neural Nets WIRN Vietri-99: Proceedings of the 11th Italian Workshop on Neural Nets, 1999
Abstract

Cited by 3 (3 self)
Almeida et al. have recently proposed online algorithms for local step size adaptation in nonlinear systems trained by gradient descent. Here we develop an alternative to their approach by extending Sutton's work on linear systems to the general, nonlinear case. The resulting algorithms are computationally little more expensive than other acceleration techniques, do not assume statistical independence between successive training patterns, and do not require an arbitrary smoothing parameter. In our benchmark experiments, they consistently outperform other acceleration methods as well as stochastic gradient descent with fixed learning rate and momentum.
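For contrast, the conventional approach that these abstracts set themselves against, adjusting each learning rate from the correlation between successive gradients, can be sketched with a simple sign-based rule. This is a generic illustration under our own assumptions (the function name and the multipliers `up` and `down` are hypothetical); it is not the specific algorithm of Almeida et al., whose details the abstract does not give.

```python
from typing import List, Tuple

Vector = List[float]


def sign_correlation_step(
    w: Vector,
    g: Vector,
    g_prev: Vector,
    p: Vector,
    up: float = 1.2,
    down: float = 0.5,
) -> Tuple[Vector, Vector]:
    """One gradient step in which each gain reacts to the sign of g * g_prev."""
    new_p = []
    for pi, gi, gpi in zip(p, g, g_prev):
        if gi * gpi > 0:
            new_p.append(pi * up)    # successive gradients agree: enlarge the step
        elif gi * gpi < 0:
            new_p.append(pi * down)  # sign flip suggests overshooting: shrink the step
        else:
            new_p.append(pi)         # no information: keep the gain
    new_w = [wi - pi * gi for wi, pi, gi in zip(w, new_p, g)]
    return new_w, new_p
```

The rule only looks at two consecutive gradients per weight, which is why it is cheap; the abstracts' alternative instead propagates how each weight depends on its gain over time.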
Slope Centering: Making Shortcut Weights Effective
Proceedings of the 8th International Conference on Artificial Neural Networks, Perspectives in Neural Computing, 1998
Abstract

Cited by 3 (2 self)
Shortcut connections are a popular architectural feature of multilayer perceptrons. It is generally assumed that by implementing a linear submapping, shortcuts assist the learning process in the remainder of the network. Here we find that this is not always the case: shortcut weights may also act as distractors that slow down convergence and can lead to inferior solutions. This problem can be addressed with slope centering, a particular form of gradient factor centering [2]. By removing the linear component of the error signal at a hidden node, slope centering effectively decouples that node from the shortcuts that bypass it. This eliminates the possibility of destructive interference from shortcut weights, and thus ensures that the benefits of shortcut connections are fully realized.
On Centering Neural Network Weight Updates
Tricks of the Trade, 1997
Abstract

Cited by 2 (0 self)
It has long been known that neural networks can learn faster when their input and hidden unit activities are centered about zero; recently we have extended this approach to also encompass the centering of error signals (Schraudolph and Sejnowski, 1996). Here we generalize this notion to all factors involved in the weight update, leading us to propose centering the slope of hidden unit activation functions as well. Slope centering removes the linear component of backpropagated error; this improves credit assignment in networks with shortcut connections. Benchmark results show that this can speed up learning significantly without adversely affecting the trained network's generalization ability.
Accelerated Gradient Descent by Factor-Centering Decomposition
1998
Abstract

Cited by 2 (0 self)
Gradient factor centering is a new methodology for decomposing neural networks into biased and centered subnets, which are then trained in parallel. The decomposition can be applied to any pattern-dependent factor in the network's gradient, and is designed such that the subnets are more amenable to optimization by gradient descent than the original network: biased subnets because of their simplified architecture, centered subnets due to a modified gradient that improves conditioning. The architectural and algorithmic modifications mandated by this approach include both familiar and novel elements, often in prescribed combinations. The framework suggests, for instance, that shortcut connections, a well-known architectural feature, should work best in conjunction with slope centering, a new technique described herein. Our benchmark experiments bear out this prediction, and show that factor-centering decomposition can speed up learning significantly without adversely affecting the train...
Locked, unlocked and semi-locked network weights for four different camera calibration problems
IEEE International Joint Conference on Neural Networks (IJCNN'2001)
Abstract

Cited by 1 (0 self)
Backpropagation training algorithms typically view network weights as a single vector of isotropic parameters to be optimized. In contrast, we present a neural network in which each weight has its own physical meaning and its own distinct role during training. The network is used to solve four different types of calibration problems found in computer vision applications. A network weight may be unlocked, locked or semi-locked during training, according to the available information about the problem. Experiments show that the network, trained with available backpropagation-based algorithms, can provide results superior to some other widely used calibration techniques.