Results 1-10 of 11
First- and Second-Order Methods for Learning: Between Steepest Descent and Newton's Method
 Neural Computation
, 1992
Cited by 174 (7 self)
Online first-order backpropagation is sufficiently fast and effective for many large-scale classification problems, but for very high precision mappings, batch processing may be the method of choice. This paper reviews first- and second-order optimization methods for learning in feedforward neural networks. The viewpoint is that of optimization: many methods can be cast in the language of optimization techniques, allowing the transfer to neural nets of detailed results about computational complexity and safety procedures to ensure convergence and to avoid numerical problems. The review is not intended to deliver detailed prescriptions for the most appropriate methods in specific applications, but to illustrate the main characteristics of the different methods and their mutual relations.
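As a toy illustration of the two method families this review contrasts, the sketch below compares a single steepest-descent step with a single Newton step. The Rosenbrock test function and the fixed step size are illustrative choices, not taken from the paper:

```python
import numpy as np

# Rosenbrock function, its gradient, and its Hessian (standard test problem).
def f(x):
    return 100.0 * (x[1] - x[0]**2)**2 + (1.0 - x[0])**2

def grad(x):
    return np.array([-400.0 * x[0] * (x[1] - x[0]**2) - 2.0 * (1.0 - x[0]),
                     200.0 * (x[1] - x[0]**2)])

def hess(x):
    return np.array([[1200.0 * x[0]**2 - 400.0 * x[1] + 2.0, -400.0 * x[0]],
                     [-400.0 * x[0], 200.0]])

x = np.array([-1.2, 1.0])                    # classic starting point
sd = x - 1e-3 * grad(x)                      # first-order: steepest-descent step
nt = x - np.linalg.solve(hess(x), grad(x))   # second-order: Newton step
```

The first-order step uses only the gradient and is cheap per iteration; the Newton step solves a linear system with the Hessian, which is what the batch second-order methods surveyed here amortize in various ways.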
Computing gradients in large-scale optimization using automatic differentiation
 INFORMS J. COMPUTING
, 1997
Cited by 31 (11 self)
The accurate and efficient computation of gradients for partially separable functions is central to the solution of large-scale optimization problems, since these functions are ubiquitous in large-scale problems. We describe two approaches for computing gradients of partially separable functions via automatic differentiation. In our experiments we employ the ADIFOR (Automatic Differentiation of Fortran) tool and the SparsLinC (Sparse Linear Combination) library. We use applications from the MINPACK-2 test problem collection to compare the numerical reliability and computational efficiency of these approaches with hand-coded derivatives and approximations based on differences of function values. Our conclusion is that automatic differentiation is the method of choice, providing code for the efficient computation of the gradient without the need for tedious hand-coding.
A generalized learning paradigm exploiting the structure of feedforward neural networks
 IEEE Trans. Neural Networks
, 1996
Cited by 16 (0 self)
In this paper a general class of fast learning algorithms for feedforward neural networks is introduced and described. The approach exploits the separability of each layer into linear and nonlinear blocks and consists of two steps. The first step is the descent of the error functional in the space of the outputs of the linear blocks (descent in the neuron space), which can be performed using any preferred optimization strategy. In the second step, each linear block is optimized separately using a least-squares (LS) criterion. To demonstrate the effectiveness of the new approach, a detailed treatment of a gradient descent in the neuron space is conducted. The main properties of this approach are a higher speed of convergence with respect to methods that employ an ordinary gradient descent in the weight space (backpropagation, BP), better numerical conditioning, and a lower computational cost compared to techniques based on the Hessian matrix. Numerical stability is assured by the use of robust LS linear system solvers, operating directly on the input data of each layer. Experimental results obtained on three problems are described, which confirm the effectiveness of the new method.
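The second step above can be sketched in a few lines: once target outputs for a linear block are fixed (by the descent in the neuron space), the block's weights come from a least-squares solve on that layer's input data. The synthetic data and the use of `numpy.linalg.lstsq` as a stand-in for the paper's robust LS solvers are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 8))   # layer input data: 50 samples, 8 features
W_true = rng.standard_normal((8, 3))
T = A @ W_true                     # desired linear-block outputs (fixed by step 1)

# Step 2: optimize the linear block alone, min_W ||A W - T||_F,
# operating directly on the input data of the layer.
W, *_ = np.linalg.lstsq(A, T, rcond=None)
```

Because each linear block is solved independently, the expensive coupling through the full weight space (as in plain BP) is avoided.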
A Multidimensional Filter Algorithm for Nonlinear Equations and Nonlinear Least Squares
 SIAM J. Optim
, 2003
Cited by 13 (6 self)
We introduce a new algorithm for the solution of systems of nonlinear equations and nonlinear least-squares problems that attempts to combine the efficiency of filter techniques and the robustness of trust-region methods. The algorithm is shown, under reasonable assumptions, to globally converge to zeros of the system, or to first-order stationary points of the Euclidean norm of its residual. Preliminary numerical experience is presented that shows substantial gains in efficiency over the traditional monotone trust-region approach.
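The core idea of a filter method is to replace a monotone decrease requirement with a dominance test against a stored list of pairs of measures. The sketch below shows only that acceptance test; the pairing of measures and the margin rule are simplifications, not the paper's actual multidimensional filter:

```python
def acceptable(trial, filter_entries, gamma=1e-3):
    """Filter acceptance test: a trial pair (theta, f) of residual measures
    is accepted only if no stored filter entry dominates it (with a small
    margin gamma). Simplified two-entry sketch of the filter idea."""
    t_theta, t_f = trial
    for theta, f in filter_entries:
        # dominated: no sufficient decrease in either measure
        if t_theta >= (1.0 - gamma) * theta and t_f >= f - gamma * theta:
            return False
    return True
```

A trial step that improves even one measure sufficiently can be accepted, which is what lets filter methods avoid the conservatism of strictly monotone trust-region schemes.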
Large-Scale Nonlinear Constrained Optimization: A Current Survey
, 1994
Cited by 9 (0 self)
Much progress has been made in constrained nonlinear optimization in the past ten years, but most large-scale problems still represent a considerable obstacle. In this survey paper we will attempt to give an overview of the current approaches, including interior and exterior methods and algorithms based upon trust regions and line searches. In addition, the importance of software, numerical linear algebra and testing will be addressed. We will try to explain why the difficulties arise, how attempts are being made to overcome them and some of the problems that still remain. Although there will be some emphasis on the LANCELOT and CUTE projects, the intention is to give a broad picture of the state-of-the-art.
Author affiliations: (1) IBM T.J. Watson Research Center, P.O. Box 218, Yorktown Heights, NY 10598, USA; (2) Parallel Algorithms Team, CERFACS, 42 Ave. G. Coriolis, 31057 Toulouse Cedex, France; (3) Central Computing Department, Rutherford Appleton Laboratory, Chilton, Oxfordshire, OX11 0QX, England.
A Fast, Space-Efficient Algorithm for the Approximation of Images by an Optimal Sum of Gaussians
 In Graphics Interface
, 2000
Cited by 3 (0 self)
Gaussian decomposition of images leads to many promising applications in computer graphics. Gaussian representations can be used for image smoothing, motion analysis, and feature selection for image recognition. Furthermore, image construction from a Gaussian representation is fast, since the Gaussians only need to be added together. The best algorithms [3, 6, 7] minimize the number of Gaussians needed for decomposition, but they involve nonlinear least-squares approximations, e.g. the use of the Marquardt algorithm [10]. This presents a problem, since the Marquardt algorithm requires enormous amounts of computation and its matrices use a lot of space. In this work we offer a method, which we call the Quickstep method, that substantially reduces the number of computations and the amount of space used. Unlike the Marquardt algorithm, each iteration has linear time complexity in the number of variables and no Jacobian or Hessian matrices are formed. Yet, Quickstep produces optimal results, similar to those produced by the Marquardt algorithm.
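The reconstruction step the abstract calls fast really is just summation. The 1-D sketch below evaluates a signal from its Gaussian representation; the parameter triples (amplitude, center, width) are made-up illustrations, not from the paper:

```python
import numpy as np

def reconstruct(params, x):
    """Evaluate a sum-of-Gaussians representation on a grid x.
    params: iterable of (amplitude, center, width) triples."""
    out = np.zeros_like(x)
    for a, mu, sigma in params:
        out += a * np.exp(-(x - mu) ** 2 / (2.0 * sigma ** 2))
    return out

x = np.linspace(0.0, 1.0, 101)
signal = reconstruct([(1.0, 0.3, 0.05), (0.5, 0.7, 0.1)], x)
```

The hard part the paper addresses is the inverse direction: fitting the (a, mu, sigma) parameters, which is a nonlinear least-squares problem.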
On iterative Krylov-dogleg trust-region steps for solving neural networks nonlinear least squares problems
, 2000
Cited by 2 (1 self)
This paper describes a method of dogleg trust-region steps, or restricted Levenberg-Marquardt steps, based on a projection process onto the Krylov subspaces for neural networks nonlinear least squares problems. In particular, the linear conjugate gradient (CG) method works as the inner iterative algorithm for solving the linearized Gauss-Newton normal equation, whereas the outer nonlinear algorithm repeatedly takes so-called "Krylov-dogleg" steps, relying only on matrix-vector multiplication without explicitly forming the Jacobian matrix or the Gauss-Newton model Hessian. That is, our iterative dogleg algorithm can reduce both operation counts and memory space by a factor of O(n) (the number of parameters) in comparison with a direct linear-equation solver. This memoryless property is useful for large-scale problems.
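The matrix-free inner iteration can be sketched as CG on the Gauss-Newton normal equation J^T J p = -J^T r, touching J only through products. For clarity the products below come from an explicit matrix; in the paper's setting Jv and J^T v would come from forward and reverse network passes without ever forming J. The dogleg restriction of the step is omitted here:

```python
import numpy as np

def cg_normal_equations(Jv, JTv, r, n, iters=50, tol=1e-10):
    """Solve (J^T J) p = -J^T r by linear CG, using only the
    Jacobian-vector products Jv(v) and JTv(v)."""
    b = -JTv(r)
    p = np.zeros(n)
    res = b.copy()
    if np.linalg.norm(res) < tol:
        return p
    d = res.copy()
    for _ in range(iters):
        Ad = JTv(Jv(d))                      # (J^T J) d via two products
        alpha = (res @ res) / (d @ Ad)
        p = p + alpha * d
        new_res = res - alpha * Ad
        if np.linalg.norm(new_res) < tol:
            break
        d = new_res + ((new_res @ new_res) / (res @ res)) * d
        res = new_res
    return p
```

Only a handful of n-vectors are stored, which is the O(n) memory saving over factoring the n-by-n model Hessian.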
Improving the Decomposition of Partially Separable Functions in the Context of Large-Scale Optimization: A First Approach
, 1993
Cited by 2 (1 self)
This paper examines the question of modifying the decomposition of a partially separable function in order to improve the computational efficiency of large-scale minimization algorithms using a conjugate-gradient inner iteration. The context and motivation are given and the application of a simple strategy is discussed on examples extracted from the CUTE test problem collection.
Author affiliations: (1) IBM T.J. Watson Research Center, P.O. Box 218, Yorktown Heights, NY 10598, USA, arconn@watson.ibm.com; (2) CERFACS, 42 Avenue Gustave Coriolis, 31057 Toulouse Cedex, France, gould@cerfacs.fr or nimg@directory.rl.ac.uk; (3) Department of Mathematics, Facultés Universitaires N.D. de la Paix, 61, rue de Bruxelles, B-5000 Namur, Belgium, pht@math.fundp.ac.be.
Keywords: exploitation of structure, algorithmic efficiency, partially separable functions.
This research was supported in part by the Advanced Research Projects Agency of the Department of Defense and was monitored by the Air Fo...
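One simple way to modify a decomposition is to merge two element functions into a single element, trading sparsity in the element index sets for fewer, larger element evaluations. The sketch below shows only that mechanical merge on a made-up three-element objective; the actual merging strategy studied in the paper is not reproduced here:

```python
def merge_elements(elements, i, j):
    """Merge element functions i and j of a partially separable objective.
    elements: list of (func, index_set) pairs; the merged element sums the
    two functions and takes the union of their index sets."""
    fi, si = elements[i]
    fj, sj = elements[j]
    merged = (lambda x, fi=fi, fj=fj: fi(x) + fj(x),
              sorted(set(si) | set(sj)))
    return [e for k, e in enumerate(elements) if k not in (i, j)] + [merged]

# Made-up objective f(x) = x0^2 + x1^2 + x0*x1 as three element functions.
elements = [(lambda x: x[0] ** 2, [0]),
            (lambda x: x[1] ** 2, [1]),
            (lambda x: x[0] * x[1], [0, 1])]
merged = merge_elements(elements, 0, 1)   # three elements become two
```

The objective value is unchanged by the merge; only the bookkeeping that a conjugate-gradient inner iteration must do per element changes, which is where the efficiency question arises.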