Results 1 - 7 of 7
Fast Exact Multiplication by the Hessian
Neural Computation, 1994
Abstract

Cited by 70 (4 self)
Just storing the Hessian H (the matrix of second derivatives d^2 E/dw_i dw_j of the error E with respect to each pair of weights) of a large neural network is difficult. Since a common use of a large matrix like H is to compute its product with various vectors, we derive a technique that directly calculates Hv, where v is an arbitrary vector. This allows H to be treated as a generalized sparse matrix. To calculate Hv, we first define a differential operator R{f(w)} = (d/dr) f(w + rv)|_{r=0}, note that R{grad_w} = Hv and R{w} = v, and then apply R{} to the equations used to compute grad_w. The result is an exact and numerically stable procedure for computing Hv, which takes about as much computation, and is about as local, as a gradient evaluation. We then apply the technique to backpropagation networks, recurrent backpropagation, and stochastic Boltzmann Machines. Finally, we show that this technique can be used at the heart of many iterative techniques for computing various properties of H, obviating the need for direct methods.
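The R{} operator described in this abstract can be illustrated on a toy problem. The following is a minimal sketch, not the paper's own code: it assumes a single tanh unit with error E = 0.5 (y - t)^2, and the function names (`forward`, `grad`, `hessian_vector`, `finite_difference_hv`) are hypothetical. Applying R{} to each line of the gradient computation, with R{w} = v, yields Hv without ever forming H; a central finite difference of the gradient serves only as a check.

```python
import math

def forward(w, x):
    """Output y = tanh(w . x) of a single unit (toy model, an assumption)."""
    a = sum(wi * xi for wi, xi in zip(w, x))
    return math.tanh(a)

def grad(w, x, t):
    """Gradient of E = 0.5 * (y - t)^2 with respect to w."""
    y = forward(w, x)
    delta = (y - t) * (1.0 - y * y)          # dE/da
    return [delta * xi for xi in x]

def hessian_vector(w, v, x, t):
    """Exact Hv obtained by applying R{} to the gradient equations."""
    y = forward(w, x)
    r_a = sum(vi * xi for vi, xi in zip(v, x))   # R{a} = v . x, since R{w} = v
    r_y = (1.0 - y * y) * r_a                     # R{y} = tanh'(a) * R{a}
    # Product rule on delta = (y - t) * (1 - y^2)
    r_delta = r_y * (1.0 - y * y) + (y - t) * (-2.0 * y * r_y)
    return [r_delta * xi for xi in x]             # x is constant, so R{x} = 0

def finite_difference_hv(w, v, x, t, eps=1e-6):
    """Approximate Hv by a central difference of the gradient (check only)."""
    w_plus = [wi + eps * vi for wi, vi in zip(w, v)]
    w_minus = [wi - eps * vi for wi, vi in zip(w, v)]
    g_plus, g_minus = grad(w_plus, x, t), grad(w_minus, x, t)
    return [(gp - gm) / (2.0 * eps) for gp, gm in zip(g_plus, g_minus)]
```

The exact product costs about as much as one gradient evaluation, which matches the claim in the abstract; the finite-difference version needs two gradient evaluations and is only approximate.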
A Fast Stochastic Error-Descent Algorithm for Supervised Learning and Optimization
 In
, 1993
Abstract

Cited by 35 (7 self)
A parallel stochastic algorithm is investigated for error-descent learning and optimization in deterministic networks of arbitrary topology. No explicit information about internal network structure is needed. The method is based on the model-free distributed learning mechanism of Dembo and Kailath. A modified parameter update rule is proposed by which each individual parameter vector perturbation contributes a decrease in error. A substantially faster learning speed is hence allowed. Furthermore, the modified algorithm supports learning time-varying features in dynamical networks. We analyze the convergence and scaling properties of the algorithm, and present simulation results for dynamic trajectory learning in recurrent networks. 1 Background and Motivation We address general optimization tasks that require finding a set of constant parameter values p_i that minimize a given error functional E(p). For supervised learning, the error functional consists of some quantitativ...
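A model-free perturbative update of the kind this abstract describes might look like the sketch below. The specifics are assumptions, not taken from the abstract: every parameter is perturbed in parallel by a random +sigma or -sigma, the resulting change in error is measured, and the parameters move against the perturbation in proportion to that change. The names `perturbative_step` and `train`, the quadratic test error, and the values of `sigma` and `mu` are all hypothetical.

```python
import random

def error(p, target):
    """Toy quadratic error (an assumption; any black-box error would do)."""
    return sum((pk - tk) ** 2 for pk, tk in zip(p, target))

def perturbative_step(p, error_fn, rng, sigma=0.01, mu=200.0):
    """One update using only two error measurements, no derivatives."""
    # Random parallel perturbation: each component is +sigma or -sigma.
    pi = [sigma if rng.random() < 0.5 else -sigma for _ in p]
    perturbed = [pk + dk for pk, dk in zip(p, pi)]
    delta_e = error_fn(perturbed) - error_fn(p)   # measured change in error
    # Step against the perturbation, scaled by the measured change.
    return [pk - mu * delta_e * dk for pk, dk in zip(p, pi)]

def train(p, error_fn, steps=2000, seed=0):
    """Repeat the perturbative update with a seeded random generator."""
    rng = random.Random(seed)
    for _ in range(steps):
        p = perturbative_step(p, error_fn, rng)
    return p
```

To first order in sigma, each step changes the error by about -mu (grad E . pi)^2, which is never positive; this is one way to read the claim that every individual perturbation contributes a decrease in error.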
Analog VLSI Stochastic Perturbative Learning Architectures
J. Analog Integrated Circuits and Signal Processing, 1997
Abstract

Cited by 15 (7 self)
We present analog VLSI neuromorphic architectures for a general class of learning tasks, which include supervised learning, reinforcement learning, and temporal difference learning. The presented architectures are parallel, cellular, sparse in global interconnects, distributed in representation, and robust to noise and mismatches in the implementation. They use a parallel stochastic perturbation technique to estimate the effect of weight changes on network outputs, rather than calculating derivatives based on a model of the network. This "model-free" technique avoids errors due to mismatches in the physical implementation of the network, and more generally allows training of networks whose exact characteristics and structure are not known. With additional mechanisms of reinforcement learning, networks of fairly general structure are trained effectively from an arbitrarily supplied reward signal. No prior assumptions are required on the structure of the network nor on the specifics of the desired network response.
A Learning Analog Neural Network Chip with Continuous-Time Recurrent Dynamics
 In
, 1994
Abstract

Cited by 6 (3 self)
We present experimental results on supervised learning of dynamical features in an analog VLSI neural network chip. The recurrent network, containing six continuous-time analog neurons and 42 free parameters (connection strengths and thresholds), is trained to generate time-varying outputs approximating given periodic signals presented to the network. The chip implements a stochastic perturbative algorithm, which observes the error gradient along random directions in the parameter space for error-descent learning. In addition to the integrated learning functions and the generation of pseudorandom perturbations, the chip provides for teacher forcing and long-term storage of the volatile parameters. The network learns a 1 kHz circular trajectory in 100 sec. The chip occupies 2 mm × 2 mm in a 2 µm CMOS process, and dissipates 1.2 mW. 1 Introduction Exact gradient-descent algorithms for supervised learning in dynamic recurrent networks [13] are fairly complex and do not provide for a ...
Image Sharpness and Beam Focus VLSI Sensors for Adaptive Optics
IEEE Sensors Journal, 2002
Abstract

Cited by 4 (1 self)
High-resolution wavefront control for adaptive optics requires accurate sensing of a measure of optical quality. We present two analog very-large-scale-integration (VLSI) image-plane sensors that supply real-time metrics of image and beam quality, for applications in imaging and line-of-sight laser communication. The image metric VLSI sensor quantifies sharpness of the received image in terms of average rectified spatial gradients. The beam metric VLSI sensor returns first and second order spatial moments of the received laser beam to quantify centroid and width. Closed-loop wavefront control of a laser beam through turbulence is demonstrated using a spatial phase modulator and analog VLSI controller that performs stochastic parallel gradient descent of the beam width metric. Index Terms: Adaptive optics, analog very large scale integration (VLSI), focal-plane image processing, image sensors, optical communication.
Accurate and Precise Computation using Analog VLSI, with Applications to Computer Graphics and Neural Networks
1993
Abstract

Cited by 3 (1 self)
This thesis develops an engineering practice and design methodology to enable us to use CMOS analog VLSI chips to perform more accurate and precise computation. These techniques form the basis of an approach that permits us to build computer graphics and neural network applications using analog VLSI. The nature of the design methodology focuses on defining goals for circuit behavior to be met as part of the design process. To increase the accuracy of analog computation, we develop techniques for creating compensated circuit building blocks, where compensation implies the cancellation of device variations, offsets, and nonlinearities. These compensated building blocks can be used as components in larger and more complex circuits, which can then also be compensated. To this end, we develop techniques for automatically determining appropriate parameters for circuits, using constrained optimization. We also fabricate circuits that implement multidimensional gradient estimation for a grad...
An Investigation of the Gradient Descent Process in Neural Networks
1996
Abstract
Usually gradient descent is merely a way to find a minimum, abandoned if a more efficient technique is available. Here we investigate the detailed properties of the gradient descent process, and the related topics of how gradients can be computed, what the limitations on gradient descent are, and how the second-order information that governs the dynamics of gradient descent can be probed. To develop our intuitions, gradient descent is applied to a simple robot arm dynamics compensation problem, using backpropagation on a temporal windows architecture. The results suggest that smooth filters can be easily learned, but that the deterministic gradient descent process can be slow and can exhibit oscillations. Algorithms to compute the gradient of recurrent networks are then surveyed in a general framework, leading to some unifications, a deeper understanding of recurrent networks, and some algorithmic extensions. By regarding deterministic gradient descent as a dynamic system we obtain results concerning its convergence, and a quantitative theory of its behavior
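The remark that deterministic gradient descent can be slow or oscillate, and that it can be viewed as a dynamic system, can be illustrated on the simplest possible case. This sketch is an assumption-laden toy, not the thesis's analysis: it uses a one-dimensional quadratic error E(w) = 0.5 * curvature * w^2, for which each step multiplies w by the factor (1 - step_size * curvature). The function names are hypothetical.

```python
def gradient_descent_1d(curvature, step_size, w0=1.0, steps=20):
    """Iterate w <- w - step_size * dE/dw for E = 0.5 * curvature * w^2."""
    w = w0
    trajectory = [w]
    for _ in range(steps):
        w = w - step_size * curvature * w
        trajectory.append(w)
    return trajectory

def classify(curvature, step_size):
    """Describe the iteration from its per-step multiplier."""
    factor = 1.0 - step_size * curvature
    if abs(factor) >= 1.0:
        return "non-convergent"   # the magnitude of w never shrinks
    if factor < 0.0:
        return "oscillating"      # w changes sign every step while shrinking
    return "monotone"             # w shrinks without changing sign
```

In this toy setting the second derivative (the curvature) alone decides the behavior: small steps converge slowly and monotonically, steps between 1/curvature and 2/curvature overshoot and oscillate, and larger steps fail to converge.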