Results 1 - 6 of 6
Fast Exact Multiplication by the Hessian
 Neural Computation
, 1994
"... Just storing the Hessian H (the matrix of second derivatives d^2 E/dw_i dw_j of the error E with respect to each pair of weights) of a large neural network is difficult. Since a common use of a large matrix like H is to compute its product with various vectors, we derive a technique that directly ca ..."
Abstract

Cited by 70 (4 self)
Just storing the Hessian H (the matrix of second derivatives d^2 E/dw_i dw_j of the error E with respect to each pair of weights) of a large neural network is difficult. Since a common use of a large matrix like H is to compute its product with various vectors, we derive a technique that directly calculates Hv, where v is an arbitrary vector. This allows H to be treated as a generalized sparse matrix. To calculate Hv, we first define a differential operator R{f(w)} = (d/dr)f(w + rv)_{r=0}, note that R{grad_w} = Hv and R{w} = v, and then apply R{} to the equations used to compute grad_w. The result is an exact and numerically stable procedure for computing Hv, which takes about as much computation, and is about as local, as a gradient evaluation. We then apply the technique to backpropagation networks, recurrent backpropagation, and stochastic Boltzmann Machines. Finally, we show that this technique can be used at the heart of many iterative techniques for computing various properties of H, obviating the need for direct methods.
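The abstract's central point is that Hv can be computed without ever forming H. A minimal sketch of that idea (not Pearlmutter's general R{} procedure, just the special case of a linear least-squares error E(w) = 0.5 ||Xw - y||^2, where H = X^T X and Hv reduces to two matrix-vector products, the same cost as one gradient):

```python
import numpy as np

def grad(X, y, w):
    """Gradient of E(w) = 0.5 * ||Xw - y||^2."""
    return X.T @ (X @ w - y)

def hessian_vector_product(X, v):
    """Exact Hv for the least-squares error, H = X^T X,
    computed as two matrix-vector products -- H is never formed."""
    return X.T @ (X @ v)

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 10))
y = rng.normal(size=50)
v = rng.normal(size=10)

Hv = hessian_vector_product(X, v)
H_explicit = X.T @ X          # dense Hessian, built only to verify
assert np.allclose(Hv, H_explicit @ v)
```

For a nonlinear network the same O(weights) cost is achieved by applying the R{} operator to the backpropagation equations, as the paper describes.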
Efficient Training of FeedForward Neural Networks
, 1997
"... : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 61 A.2 Introduction : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 61 A.2.1 Motivation : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 61 A.3 Optimization strategy : : : : : : : : : : : : ..."
Abstract

Cited by 12 (0 self)
A.2 Introduction 61
  A.2.1 Motivation 61
A.3 Optimization strategy 62
A.4 The Backpropagation algorithm 63
A.5 Conjugate direction methods 63
  A.5.1 Conjugate gradients 65
  A.5.2 The CGL algorithm 67
  A.5.3 The BFGS algorithm 67
A.6 The SCG algorithm 67
A.7 Test results 70
  A.7.1 Comparison metric ...
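The contents above center on conjugate-direction methods (CG, CGL, BFGS, SCG). As a reference point, here is a minimal sketch of plain linear conjugate gradients, the inner iteration these training methods build on; the function name and interface are my own choices, not taken from the thesis:

```python
import numpy as np

def conjugate_gradient(A, b, tol=1e-10, max_iter=None):
    """Solve Ax = b for symmetric positive-definite A by linear CG.
    Only products A @ p are needed, so A could equally be an implicit
    operator such as a Hessian-vector product routine."""
    n = len(b)
    max_iter = max_iter or n
    x = np.zeros(n)
    r = b - A @ x          # residual
    p = r.copy()           # search direction
    rs = r @ r
    for _ in range(max_iter):
        Ap = A @ p
        alpha = rs / (p @ Ap)        # exact line search along p
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p    # next direction, conjugate to previous
        rs = rs_new
    return x

A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
x = conjugate_gradient(A, b)
assert np.allclose(A @ x, b)
```

In exact arithmetic CG terminates in at most n iterations; SCG-style methods adapt this iteration to nonlinear error surfaces without explicit line searches.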
Exact Calculation of the Product of the Hessian Matrix of FeedForward Network Error Functions and a Vector in O(N) Time
"... Several methods for training feedforward neural networks require second order information from the Hessian matrix of the error function. Although it is possible to calculate the Hessian matrix exactly it is often not desirable because of the computation and memory requirements involved. Some learni ..."
Abstract

Cited by 5 (0 self)
Several methods for training feedforward neural networks require second-order information from the Hessian matrix of the error function. Although it is possible to calculate the Hessian matrix exactly, it is often not desirable because of the computation and memory requirements involved. Some learning techniques do, however, only need the Hessian matrix times a vector. This paper presents a method to calculate the Hessian matrix times a vector in O(N) time, where N is the number of variables in the network. This is of the same order as the calculation of the gradient of the error function. The usefulness of this algorithm is demonstrated by improving existing learning techniques. 1 Introduction The second-derivative information of the error function associated with feedforward neural networks forms an N × N matrix, which is usually referred to as the Hessian matrix. Second-derivative information is needed in several learning algorithms, e.g., in some conjugate gradient a...
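A common alternative to the exact O(N) product described here is to approximate Hv from two gradient evaluations by central differences. A sketch on a toy error function of my own choosing (E(w) = 0.25 * sum(w^4), whose Hessian is known in closed form), to show what the exact method is competing with:

```python
import numpy as np

def grad(w):
    """Gradient of the toy error E(w) = 0.25 * sum(w^4)."""
    return w ** 3

def hessian_exact(w):
    """Analytic Hessian of E: diag(3 w^2)."""
    return np.diag(3.0 * w ** 2)

def hvp_central_difference(grad_fn, w, v, eps=1e-5):
    """Approximate Hv from two gradient calls:
    Hv ~= (g(w + eps*v) - g(w - eps*v)) / (2*eps)."""
    return (grad_fn(w + eps * v) - grad_fn(w - eps * v)) / (2.0 * eps)

w = np.array([1.0, -2.0, 0.5])
v = np.array([0.3, 0.1, -0.7])
approx = hvp_central_difference(grad, w, v)
exact = hessian_exact(w) @ v
assert np.allclose(approx, exact, atol=1e-6)
```

The finite-difference version also costs about two gradients, but its accuracy depends on the step size eps; the exact method of the paper has neither truncation nor round-off trade-off, which is why it is preferred inside iterative solvers.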
Performance Evaluation of Feedforward Networks Using Computational Methods
In Proceedings of NEURAP'95
, 1996
"... . We will demonstrate that the performance evaluation of feedforward neural networks using computational methods may result in overly pessimistic estimates of the prediction error. In fact, they capture unwanted variability in the distribution of weights, introduced by local maxima in the likelihood ..."
Abstract

Cited by 1 (1 self)
. We will demonstrate that the performance evaluation of feedforward neural networks using computational methods may result in overly pessimistic estimates of the prediction error. In fact, they capture unwanted variability in the distribution of weights, introduced by local maxima in the likelihood function in connection with deficiencies of gradient based learning procedures. An analysis of the influence of local maxima is hampered due to a nontrivial algebraic structure of the weight space; we will show that typical feedforward networks exhibit a large number of symmetries due to a nontrivial symmetry group acting on the weight space. We will present an algorithm which divides out these symmetries. In the resulting much smaller effective weight space, clustering algorithms may be used to improve the assessment of prediction errors. We will demonstrate that this method can be successfully applied. 1 Introduction Feedforward networks can be interpreted as a form of nonlinear regressi...
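The symmetries the abstract refers to can be made concrete for a one-hidden-layer tanh network y = c . tanh(W x + b): negating a hidden unit's incoming weights, bias, and outgoing weight leaves the function unchanged (tanh is odd), as does permuting the hidden units. A sketch of "dividing out" these symmetries by mapping each network to a canonical orbit representative; the canonicalization rule here (make outgoing weights non-negative, then sort units) is my own simple choice, not the algorithm of the paper:

```python
import numpy as np

def canonicalize(W, b, c):
    """Canonical representative of the sign-flip/permutation orbit of
    a one-hidden-layer tanh network y = c . tanh(W x + b)."""
    W, b, c = W.copy(), b.copy(), c.copy()
    # 1. Sign flips: force c_i >= 0 by negating unit i where needed.
    flip = np.sign(c)
    flip[flip == 0] = 1.0
    W *= flip[:, None]
    b *= flip
    c *= flip
    # 2. Permutations: sort hidden units lexicographically by (c, b, W-row).
    keys = np.column_stack([c, b, W])
    order = np.lexsort(keys.T[::-1])   # last lexsort key is primary
    return W[order], b[order], c[order]

rng = np.random.default_rng(1)
W = rng.normal(size=(4, 3)); b = rng.normal(size=4); c = rng.normal(size=4)

# An equivalent network: flip the sign of two hidden units, then permute.
s = np.array([1.0, -1.0, 1.0, -1.0])
perm = rng.permutation(4)
W2, b2, c2 = (s[:, None] * W)[perm], (s * b)[perm], (s * c)[perm]

a1, a2 = canonicalize(W, b, c), canonicalize(W2, b2, c2)
assert all(np.allclose(x, y) for x, y in zip(a1, a2))
```

With H hidden units the orbit has size 2^H * H!, which is why collapsing each orbit to one point shrinks the effective weight space so dramatically before clustering.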
Two Papers on FeedForward Networks
, 1991
"... REPORT DOCUMENTATION PAGE I OMB No. 07040188 Pubtic reporting burden for this collection of information _s estimated to average 1 hour per response, including the time for reviewing instructlo=ns, searching existing data source_, gather ncj and maintaining the data needed and compieting and review ..."
Abstract
parameters such as σ and λ; that is, Pr(λ | w, σ) = Pr(λ). Posterior probabilities of network weights are as follows. For regression with Gaussian error and unknown σ,
"... This paper has covered Bayesian theory relevant to the problem of training feedforward connectionist networks. We now sketch out how this might be put together in practice, assuming a standard gradient descent algorithm as used during search ..."
Abstract
This paper has covered Bayesian theory relevant to the problem of training feedforward connectionist networks. We now sketch out how this might be put together in practice, assuming a standard gradient descent algorithm as used during search
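One standard way this Bayesian machinery meets ordinary gradient descent: a Gaussian prior on the weights turns the negative log posterior into the data error plus an L2 penalty, so each descent step just gains a weight-decay term. An illustrative sketch on a linear-Gaussian model of my own construction (names and the prior precision `alpha` are assumptions, not from the paper), where the MAP solution is also available in closed form for comparison:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(40, 5))
w_true = rng.normal(size=5)
y = X @ w_true + 0.1 * rng.normal(size=40)

alpha = 1.0   # prior precision: Gaussian prior p(w) ~ exp(-alpha/2 ||w||^2)

def map_gradient(w):
    """Gradient of the negative log posterior: Gaussian-noise data term
    plus the weight-decay term contributed by the Gaussian prior."""
    return X.T @ (X @ w - y) + alpha * w

w = np.zeros(5)
lr = 1e-2
for _ in range(2000):
    w -= lr * map_gradient(w)

# Closed-form MAP (ridge) solution, for comparison.
w_map = np.linalg.solve(X.T @ X + alpha * np.eye(5), X.T @ y)
assert np.allclose(w, w_map, atol=1e-4)
```

For nonlinear networks no closed form exists, which is exactly the setting the paper addresses: the same prior-augmented gradient is fed to a standard descent search.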