Results 1 - 4 of 4

Fast Exact Multiplication by the Hessian
Neural Computation, 1994
"... Just storing the Hessian H (the matrix of second derivatives d^2 E/dw_i dw_j of the error E with respect to each pair of weights) of a large neural network is difficult. Since a common use of a large matrix like H is to compute its product with various vectors, we derive a technique that directly ca ..."
Cited by 70 (4 self)

Abstract:
Just storing the Hessian H (the matrix of second derivatives d^2 E/dw_i dw_j of the error E with respect to each pair of weights) of a large neural network is difficult. Since a common use of a large matrix like H is to compute its product with various vectors, we derive a technique that directly calculates Hv, where v is an arbitrary vector. This allows H to be treated as a generalized sparse matrix. To calculate Hv, we first define a differential operator R{f(w)} = (d/dr) f(w + rv)|_{r=0}, note that R{grad_w} = Hv and R{w} = v, and then apply R{} to the equations used to compute grad_w. The result is an exact and numerically stable procedure for computing Hv, which takes about as much computation, and is about as local, as a gradient evaluation. We then apply the technique to backpropagation networks, recurrent backpropagation, and stochastic Boltzmann Machines. Finally, we show that this technique can be used at the heart of many iterative techniques for computing various properties of H, obviating the need for direct methods.
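The R{.} trick maps directly onto modern automatic differentiation: applying forward-mode differentiation to the gradient computation yields Hv exactly, without ever forming H. A minimal JAX sketch; the toy error function and five-weight vector are illustrative assumptions, not anything from the paper:

```python
# Minimal sketch of the R{.} operator: Hv is the directional derivative
# of the gradient, (d/dr) grad_w E(w + rv) at r = 0, so pushing the
# tangent v through the gradient function gives an exact Hessian-vector
# product at roughly the cost of a gradient evaluation.
import jax
import jax.numpy as jnp

def error(w):
    # Stand-in error E(w); any twice-differentiable scalar function works.
    return jnp.sum(jnp.tanh(w)) ** 2

def hvp(w, v):
    # R{grad_w E} = Hv: forward-mode derivative of the gradient in direction v.
    return jax.jvp(jax.grad(error), (w,), (v,))[1]

w = jnp.linspace(-1.0, 1.0, 5)
v = jnp.ones(5)
print(hvp(w, v))                    # exact Hv without forming H
print(jax.hessian(error)(w) @ v)    # dense check; matches the line above
```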

Computing Second Derivatives in FeedForward Networks: a Review
IEEE Transactions on Neural Networks, 1994
"... . The calculation of second derivatives is required by recent training and analyses techniques of connectionist networks, such as the elimination of superfluous weights, and the estimation of confidence intervals both for weights and network outputs. We here review and develop exact and approximate ..."
Cited by 27 (4 self)

Abstract:
The calculation of second derivatives is required by recent training and analysis techniques for connectionist networks, such as the elimination of superfluous weights and the estimation of confidence intervals both for weights and network outputs. We here review and develop exact and approximate algorithms for calculating second derivatives. For networks with |w| weights, simply writing the full matrix of second derivatives requires O(|w|^2) operations. For networks of radial basis units or sigmoid units, exact calculation of the necessary intermediate terms requires on the order of 2h + 2 backward/forward propagation passes, where h is the number of hidden units in the network. We also review and compare three approximations (ignoring some components of the second derivative, numerical differentiation, and scoring). Our algorithms apply to arbitrary activation functions, networks, and error functions (for instance, with connections that skip layers, or radial basis functions, or ...
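Of the three approximations, numerical differentiation is the simplest to sketch: each column of H can be estimated from a central difference of the gradient, at two gradient passes per column, consistent with the O(|w|^2) cost of writing out the full matrix. A hedged JAX illustration; the toy error function and step size are assumptions for the example:

```python
# Numerical-differentiation approximation: estimate column j of H by a
# central difference of the gradient. Two gradient passes per column,
# so |w| columns cost 2|w| gradient evaluations in total.
import jax
import jax.numpy as jnp

def error(w):
    return jnp.sum(jnp.tanh(w)) ** 2   # stand-in error E(w)

grad_error = jax.grad(error)

def hessian_column_fd(w, j, eps=1e-4):
    # H[:, j] ~ (grad E(w + eps*e_j) - grad E(w - eps*e_j)) / (2*eps)
    e_j = jnp.zeros_like(w).at[j].set(1.0)
    return (grad_error(w + eps * e_j) - grad_error(w - eps * e_j)) / (2.0 * eps)

w = jnp.linspace(-1.0, 1.0, 4)
H_fd = jnp.stack([hessian_column_fd(w, j) for j in range(w.size)], axis=1)
print(jnp.max(jnp.abs(H_fd - jax.hessian(error)(w))))   # small residual
```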

New millennium AI and the convergence of history
Challenges to Computational Intelligence, 2007
"... Artificial Intelligence (AI) has recently become a real formal science: the new millennium brought the first mathematically sound, asymptotically optimal, universal problem solvers, providing a new, rigorous foundation for the previously largely heuristic field of General AI and embedded agents. At ..."
Cited by 6 (4 self)

Abstract:
Artificial Intelligence (AI) has recently become a real formal science: the new millennium brought the first mathematically sound, asymptotically optimal, universal problem solvers, providing a new, rigorous foundation for the previously largely heuristic field of General AI and embedded agents. At the same time there has been rapid progress in practical methods for learning true sequence-processing programs, as opposed to traditional methods limited to stationary pattern association. Here we will briefly review some of the new results, and speculate about future developments, pointing out that the time intervals between the most notable events in over 40,000 years or 2^9 lifetimes of human history have sped up exponentially, apparently converging to zero within the next few decades. Or is this impression just a byproduct of the way humans allocate memory space to past events?

Unified Formulation for Training Recurrent Networks with Derivative Adaptive Critics
Proc. International Conference on Neural Networks (ICNN'97), 1997
"... We present a procedure for obtaining derivatives used in training a recurrent network that combines in a unified framework the techniques of backpropagation through time and derivative adaptive critics. The resulting formulation is consistent with previous descriptions, but has the advantage of allo ..."
Cited by 3 (2 self)

Abstract:
We present a procedure for obtaining derivatives used in training a recurrent network that combines, in a unified framework, the techniques of backpropagation through time and derivative adaptive critics. The resulting formulation is consistent with previous descriptions, but has the advantage of allowing the mentioned techniques to be used together in a proportion that is appropriate to a given problem.

1 Introduction

Substantial interest has been generated regarding the use of various methods that can be regarded as forms of approximate dynamic programming (ADP). Commonly used terms for such methods include adaptive critics, reinforcement learning, heuristic dynamic programming, neuro-dynamic programming, and others. By most measures, systems that involve discrete states have dominated research in this area. Recently, however, attention has been directed to systems with continuous state variables. Several examples of the application of ADP to such systems have appeared, but most if no...
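As a loose illustration of that proportion, a convex combination puts pure backpropagation through time and a pure derivative critic at the two endpoints; the blending weight and the toy derivative vectors below are assumptions for this sketch, not the paper's exact formulation:

```python
# Sketch of the "proportion" idea: blend the derivative obtained by
# backpropagation through time with the derivative critic's estimate
# of the same quantity.
import jax.numpy as jnp

def blended_derivative(d_bptt, d_critic, lam):
    # lam = 1.0 recovers pure BPTT; lam = 0.0 recovers a pure derivative critic.
    return lam * d_bptt + (1.0 - lam) * d_critic

d_bptt = jnp.array([0.12, -0.40, 0.05])    # derivative from truncated BPTT
d_critic = jnp.array([0.10, -0.35, 0.02])  # critic's estimate of that derivative
print(blended_derivative(d_bptt, d_critic, lam=0.7))
```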