Results 1-10 of 22
Fast Exact Multiplication by the Hessian
Neural Computation, 1994
Cited by 70 (4 self)
Just storing the Hessian H (the matrix of second derivatives d^2 E/dw_i dw_j of the error E with respect to each pair of weights) of a large neural network is difficult. Since a common use of a large matrix like H is to compute its product with various vectors, we derive a technique that directly calculates Hv, where v is an arbitrary vector. This allows H to be treated as a generalized sparse matrix. To calculate Hv, we first define a differential operator R{f(w)} = (d/dr) f(w + rv)|_{r=0}, note that R{grad_w} = Hv and R{w} = v, and then apply R{} to the equations used to compute grad_w. The result is an exact and numerically stable procedure for computing Hv, which takes about as much computation, and is about as local, as a gradient evaluation. We then apply the technique to backpropagation networks, recurrent backpropagation, and stochastic Boltzmann Machines. Finally, we show that this technique can be used at the heart of many iterative techniques for computing various properties of H, obviating the need for direct methods.
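To make the R{} operator concrete, here is a minimal sketch for a single tanh unit with squared error. The model, the function names, and the finite-difference check are assumptions made for illustration and are not taken from the paper. Each line of the gradient computation is paired with the line obtained by applying R{} to it, using R{w} = v.

```python
import math

def grad_and_hv(w, v, x, t):
    """Return (grad E, Hv) for E(w) = 0.5 * (tanh(w.x) - t)**2."""
    # Forward pass, with R{} applied to each line (R{w} = v, R{x} = 0).
    a = sum(wi * xi for wi, xi in zip(w, x))
    r_a = sum(vi * xi for vi, xi in zip(v, x))
    y = math.tanh(a)
    dy = 1.0 - y * y                  # tanh'(a)
    r_y = dy * r_a
    r_dy = -2.0 * y * r_y             # R{1 - y^2}
    # Backward pass, again paired with its R{} counterpart.
    e = y - t                         # dE/dy
    delta = e * dy                    # dE/da
    r_delta = r_y * dy + e * r_dy     # product rule, using R{e} = R{y}
    grad = [delta * xi for xi in x]
    hv = [r_delta * xi for xi in x]   # R{grad_w} = Hv
    return grad, hv

def hv_finite_difference(w, v, x, t, eps=1e-5):
    """Central-difference approximation of Hv, used only as a check."""
    zero = [0.0] * len(w)
    g_plus, _ = grad_and_hv([wi + eps * vi for wi, vi in zip(w, v)], zero, x, t)
    g_minus, _ = grad_and_hv([wi - eps * vi for wi, vi in zip(w, v)], zero, x, t)
    return [(p - m) / (2.0 * eps) for p, m in zip(g_plus, g_minus)]

w, v = [0.3, -0.2, 0.5], [1.0, 0.5, -1.0]
x, t = [0.7, -1.2, 0.4], 0.25
_, hv = grad_and_hv(w, v, x, t)
approx = hv_finite_difference(w, v, x, t)
print(max(abs(h - a) for h, a in zip(hv, approx)))  # small if the two agree
```

The R{} pass costs about as much as the gradient pass it shadows, and H itself is never formed, which matches the cost claim in the abstract.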
A Fast Stochastic Error-Descent Algorithm for Supervised Learning and Optimization
 In
, 1993
Cited by 35 (7 self)
A parallel stochastic algorithm is investigated for error-descent learning and optimization in deterministic networks of arbitrary topology. No explicit information about internal network structure is needed. The method is based on the model-free distributed learning mechanism of Dembo and Kailath. A modified parameter update rule is proposed by which each individual parameter vector perturbation contributes a decrease in error. A substantially faster learning speed is hence allowed. Furthermore, the modified algorithm supports learning time-varying features in dynamical networks. We analyze the convergence and scaling properties of the algorithm, and present simulation results for dynamic trajectory learning in recurrent networks.

1 Background and Motivation

We address general optimization tasks that require finding a set of constant parameter values p_i that minimize a given error functional E(p). For supervised learning, the error functional consists of some quantitativ...
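The abstract does not state the update rule, so the following is only a sketch of one perturbation-based rule in this spirit. The specific form p <- p - mu * dE * pi with random-sign perturbations pi, the quadratic test error, and all parameter values are assumptions made for illustration. To first order, dE is the projection of the gradient onto pi, so each update changes the error by about -mu * dE^2, which is never positive.

```python
import random

def stochastic_error_descent(error, p, mu=1000.0, sigma=0.01, steps=500, seed=0):
    """Model-free descent: only values of error(p) are used, never derivatives."""
    rng = random.Random(seed)
    p = list(p)
    for _ in range(steps):
        # Perturb every parameter at once by +sigma or -sigma.
        pi = [sigma if rng.random() < 0.5 else -sigma for _ in p]
        # Observed change in error caused by the perturbation.
        d_err = error([a + b for a, b in zip(p, pi)]) - error(p)
        # Step against the perturbation, scaled by the observed change.
        # The effective learning rate is on the order of mu * sigma**2.
        p = [a - mu * d_err * b for a, b in zip(p, pi)]
    return p

def quadratic_error(p, target=(1.0, -2.0, 0.5)):
    """Hypothetical error functional used only for this demonstration."""
    return sum((a - c) ** 2 for a, c in zip(p, target))

p0 = [0.0, 0.0, 0.0]
p_final = stochastic_error_descent(quadratic_error, p0)
print(quadratic_error(p0), quadratic_error(p_final))
```

Each iteration needs two error evaluations regardless of the number of parameters, which is what makes this kind of rule attractive when the network structure is unknown.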
Alopex: a correlation-based learning algorithm for feedforward and recurrent neural networks
Neural Computation, 1994
Cited by 24 (1 self)
We present a learning algorithm for neural networks, called Alopex. Instead of error gradient, Alopex uses local correlations between changes in individual weights and changes in the global error measure. The algorithm does not make any assumptions about transfer functions of individual neurons, and does not explicitly depend on the functional form of the error measure. Hence, it can be used in networks with arbitrary transfer functions and for minimizing a large class of error measures. The learning algorithm is the same for feedforward and recurrent networks. All the weights in a network are updated simultaneously, using only local computations. This allows complete parallelization of the algorithm. The algorithm is stochastic and it uses a ‘temperature’ parameter in a manner similar to that in simulated annealing. A heuristic ‘annealing schedule’ is presented which is effective in finding global minima of error surfaces. In this paper, we report extensive simulation studies illustrating these advantages and show that learning times are comparable to those for standard gradient descent methods. Feedforward networks trained with Alopex are used to solve the MONK’s problems and symmetry problems. Recurrent networks trained with the same algorithm are used for solving temporal XOR problems. Scaling properties of the algorithm are demonstrated using encoder problems of different sizes and advantages of appropriate error measures are illustrated using a variety of problems.
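The abstract gives no equations, so the sketch below uses one plausible correlation-based rule and should be read as an assumption rather than as the paper's exact algorithm. Each weight takes a fixed-size step of +delta or -delta, and the probability of stepping down is a sigmoid of the correlation c = (last weight change) * (last error change) divided by the temperature. The fixed temperature is a simplification; the annealing schedule mentioned in the abstract is not reproduced here.

```python
import math
import random

def move_down_probability(c, temperature):
    """P(step = -delta) given correlation c = (weight change) * (error change)."""
    z = max(-60.0, min(60.0, c / temperature))   # clamp to avoid overflow
    return 1.0 / (1.0 + math.exp(-z))

def alopex(error, w, delta=0.01, temperature=1e-4, steps=1000, seed=0):
    """Run `steps` simultaneous fixed-size moves; return (weights, error history)."""
    rng = random.Random(seed)
    w = list(w)
    prev_err = error(w)
    history = [prev_err]
    # The first move is random because no correlation is available yet.
    step = [delta if rng.random() < 0.5 else -delta for _ in w]
    w = [wi + si for wi, si in zip(w, step)]
    for _ in range(steps - 1):
        err = error(w)
        history.append(err)
        d_err = err - prev_err
        # Each weight uses only its own last change and the global error change.
        new_step = []
        for si in step:
            prob_down = move_down_probability(si * d_err, temperature)
            new_step.append(-delta if rng.random() < prob_down else delta)
        prev_err, step = err, new_step
        w = [wi + si for wi, si in zip(w, step)]
    return w, history

def bowl(w):
    """Hypothetical error measure used only for this demonstration."""
    return (w[0] - 1.0) ** 2 + (w[1] + 1.0) ** 2

w_final, errors = alopex(bowl, [0.0, 0.0])
print(errors[0], min(errors))
```

A positive correlation means the last move in that weight coincided with a change in error of the same sign, so a downward step becomes more likely; a lower temperature makes that choice closer to deterministic.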
Neural Network Adaptations to Hardware Implementations
1997
Cited by 15 (1 self)
In order to take advantage of the massive parallelism offered by artificial neural networks, hardware implementations are essential. However, most standard neural network models are not very suitable for implementation in hardware and adaptations are needed. In this section an overview is given of the various issues that are encountered when mapping an ideal neural network model onto a compact and reliable neural network hardware implementation, like quantization, handling non-uniformities and non-ideal responses, and restraining computational complexity. Furthermore, a broad range of hardware-friendly learning rules is presented, which allow for simpler and more reliable hardware implementations. The relevance of these neural network adaptations to hardware is illustrated by their application in existing hardware implementations.
Analog VLSI Stochastic Perturbative Learning Architectures
J. Analog Integrated Circuits and Signal Processing, 1997
Cited by 15 (7 self)
We present analog VLSI neuromorphic architectures for a general class of learning tasks, which include supervised learning, reinforcement learning, and temporal difference learning. The presented architectures are parallel, cellular, sparse in global interconnects, distributed in representation, and robust to noise and mismatches in the implementation. They use a parallel stochastic perturbation technique to estimate the effect of weight changes on network outputs, rather than calculating derivatives based on a model of the network. This "model-free" technique avoids errors due to mismatches in the physical implementation of the network, and more generally allows training of networks whose exact characteristics and structure are not known. With additional mechanisms of reinforcement learning, networks of fairly general structure are trained effectively from an arbitrarily supplied reward signal. No prior assumptions are required on the structure of the network nor on the specifics of the desired network response.
Learning Rules for Neuro-Controller via Simultaneous Perturbation
IEEE Trans. Neural Networks, 1997
Cited by 9 (1 self)
This paper describes learning rules using simultaneous perturbation for a neurocontroller that controls an unknown plant. When we apply a direct control scheme by a neural network, the neural network must learn an inverse system of the unknown plant. In this case, we must know the sensitivity function of the plant to use a gradient method as the learning rule of the neural network. On the other hand, the learning rules described here do not require information about the sensitivity function. Some numerical simulations of a two-link planar arm and a tracking problem for a nonlinear dynamic plant are shown. Index Terms: Dynamical systems, indirect inverse modeling, neural networks, neurocontroller, simultaneous perturbation, tracking problems.
A Learning Analog Neural Network Chip with Continuous-Time Recurrent Dynamics
 In
, 1994
Cited by 6 (3 self)
We present experimental results on supervised learning of dynamical features in an analog VLSI neural network chip. The recurrent network, containing six continuous-time analog neurons and 42 free parameters (connection strengths and thresholds), is trained to generate time-varying outputs approximating given periodic signals presented to the network. The chip implements a stochastic perturbative algorithm, which observes the error gradient along random directions in the parameter space for error-descent learning. In addition to the integrated learning functions and the generation of pseudo-random perturbations, the chip provides for teacher forcing and long-term storage of the volatile parameters. The network learns a 1 kHz circular trajectory in 100 sec. The chip occupies 2 mm × 2 mm in a 2 µm CMOS process, and dissipates 1.2 mW.

1 Introduction

Exact gradient-descent algorithms for supervised learning in dynamic recurrent networks [13] are fairly complex and do not provide for a ...
Hardware-Friendly Learning Algorithms for Neural Networks: an Overview
1996
Cited by 5 (0 self)
The hardware implementation of artificial neural networks and their learning algorithms is a fascinating area of research with far-reaching applications. However, the mapping from an ideal mathematical model to compact and reliable hardware is far from evident. This paper presents an overview of various methods that simplify the hardware implementation of neural network models. Adaptations that are proper to specific learning rules or network architectures are discussed. These range from the use of perturbation in multilayer feedforward networks and local learning algorithms to quantization effects in self-organizing feature maps. Moreover, in more general terms, the problems of inaccuracy, limited precision, and robustness are treated.
Multichannel coherent detection for delay-insensitive model-free adaptive control
in Proc. Int. Symp. Circuits and Systems (ISCAS ’07), 2007
Cited by 5 (5 self)
A mixed-signal architecture for continuous-time multidimensional model-free optimization is presented. It is based on multichannel coherent modulation and detection that reliably estimates the objective function’s gradient, with respect to the system parameters, in the presence of time delays. The narrowband nature of the excitation signals reduces the unknown dynamics of the objective function to a single parameter per control channel, the phase delay. An efficient implementation of the adaptive control architecture is presented; it incorporates parallel control channels with individually selectable 6-level phase delay adjustment. Initial experimental results indicate wide operating range covering almost 7 decades of excitation frequencies.
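The idea of coherent modulation and detection can be sketched in discrete time as follows. This is an illustration under stated assumptions, not the architecture described in the paper: the function name, the integer dither frequencies, the pure time-delay model, and the quadratic objective are all hypothetical. Each parameter is dithered with its own sinusoid, the measured objective is multiplied by a phase-shifted copy of that sinusoid, and the average over one full period yields one gradient component.

```python
import math

def coherent_gradient(objective, p, amp=0.01, freqs=None, delay=0.0, n=2000):
    """Estimate the gradient of `objective` at p by sinusoidal dither and demodulation."""
    k = len(p)
    freqs = freqs or [i + 1 for i in range(k)]   # one distinct frequency per channel
    acc = [0.0] * k
    for s in range(n):
        t = 2.0 * math.pi * s / n
        # The measurement at time t reflects dithers applied at time t - delay.
        u = [amp * math.sin(f * (t - delay)) for f in freqs]
        y = objective([pi + ui for pi, ui in zip(p, u)])
        for i, f in enumerate(freqs):
            # The reference is shifted by the channel's phase delay f * delay
            # so that it stays coherent with the delayed response.
            acc[i] += y * math.sin(f * t - f * delay)
    # The mean of y * reference equals (amp / 2) * dJ/dp_i for a quadratic objective.
    return [2.0 * a / (n * amp) for a in acc]

def objective(p):
    """Hypothetical quadratic objective used only for this demonstration."""
    return (p[0] - 1.0) ** 2 + 2.0 * (p[1] + 0.5) ** 2 + p[0] * p[1]

print(coherent_gradient(objective, [0.2, 0.3], delay=0.3))
```

In this sketch the only thing the detector needs to know about the delay is the resulting phase shift per channel, which mirrors the abstract's point that narrowband excitation reduces the unknown dynamics to a single phase parameter per channel.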
Accurate and Precise Computation using Analog VLSI, with Applications to Computer Graphics and Neural Networks
1993
Cited by 3 (1 self)
This thesis develops an engineering practice and design methodology to enable us to use CMOS analog VLSI chips to perform more accurate and precise computation. These techniques form the basis of an approach that permits us to build computer graphics and neural network applications using analog VLSI. The nature of the design methodology focuses on defining goals for circuit behavior to be met as part of the design process. To increase the accuracy of analog computation, we develop techniques for creating compensated circuit building blocks, where compensation implies the cancellation of device variations, offsets, and nonlinearities. These compensated building blocks can be used as components in larger and more complex circuits, which can then also be compensated. To this end, we develop techniques for automatically determining appropriate parameters for circuits, using constrained optimization. We also fabricate circuits that implement multidimensional gradient estimation for a grad...