Results 1 - 8 of 8
First and Second-Order Methods for Learning: between Steepest Descent and Newton's Method
Neural Computation, 1992
Cited by 108 (6 self)
On-line first-order backpropagation is sufficiently fast and effective for many large-scale classification problems, but for very high-precision mappings, batch processing may be the method of choice. This paper reviews first- and second-order optimization methods for learning in feedforward neural networks. The viewpoint is that of optimization: many methods can be cast in the language of optimization techniques, allowing the transfer to neural nets of detailed results about computational complexity and safety procedures to ensure convergence and to avoid numerical problems. The review is not intended to deliver detailed prescriptions for the most appropriate methods in specific applications, but to illustrate the main characteristics of the different methods and their mutual relations.
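The two extremes named in the title can be contrasted on a toy problem. The sketch below is an illustration only, not taken from the paper: it assumes an ill-conditioned quadratic f(x) = 0.5 (a x0^2 + b x1^2) with hypothetical constants a = 1 and b = 100, and compares fixed-step steepest descent with a single Newton step.

```python
def grad(x, a=1.0, b=100.0):
    """Gradient of the assumed quadratic f(x) = 0.5 * (a*x0^2 + b*x1^2)."""
    return (a * x[0], b * x[1])


def steepest_descent(x, steps, lr, a=1.0, b=100.0):
    """First-order method: repeatedly move against the gradient with a fixed step."""
    for _ in range(steps):
        g = grad(x, a, b)
        x = (x[0] - lr * g[0], x[1] - lr * g[1])
    return x


def newton_step(x, a=1.0, b=100.0):
    """Second-order method: scale the gradient by the inverse Hessian diag(a, b)."""
    g = grad(x, a, b)
    return (x[0] - g[0] / a, x[1] - g[1] / b)


if __name__ == "__main__":
    start = (1.0, 1.0)
    print("steepest descent, 100 steps:", steepest_descent(start, 100, 0.01))
    print("Newton, 1 step:", newton_step(start))
```

On a quadratic, the Newton step lands on the minimizer at once, while the fixed step size of steepest descent is limited by the largest curvature and so makes slow progress along the flat direction.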
A generalized learning paradigm exploiting the structure of feedforward neural networks
IEEE Trans. Neural Networks, 1996
Cited by 10 (0 self)
In this paper a general class of fast learning algorithms for feedforward neural networks is introduced and described. The approach exploits the separability of each layer into linear and nonlinear blocks and consists of two steps. The first step is the descent of the error functional in the space of the outputs of the linear blocks (descent in the neuron space), which can be performed using any preferred optimization strategy. In the second step, each linear block is optimized separately by using a Least Squares (LS) criterion. To demonstrate the effectiveness of the new approach, a detailed treatment of a gradient descent in the neuron space is conducted. The main properties of this approach are the higher speed of convergence with respect to methods that employ an ordinary gradient descent in the weight space (Backpropagation, BP), better numerical conditioning and lower computational cost compared to techniques based on the Hessian matrix. The numerical stability is assured by the use of robust LS linear system solvers, operating directly on the input data of each layer. Experimental results obtained in three problems are described, which confirm the effectiveness of the new method.
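The two-step idea in this abstract can be sketched for the smallest possible case. The code below is a toy illustration under stated assumptions, not the authors' algorithm: it assumes a single neuron y = tanh(w x + b), a squared-error functional, a plain gradient step on the linear-block outputs s_i, and an ordinary least-squares refit of (w, b). The function names and the step size `eta` are hypothetical.

```python
import math


def forward_error(w, b, xs, ts):
    """Squared-error functional for the assumed one-neuron model y = tanh(w*x + b)."""
    return 0.5 * sum((math.tanh(w * x + b) - t) ** 2 for x, t in zip(xs, ts))


def neuron_space_step(w, b, xs, ts, eta=1.0):
    """One iteration: descend in the space of linear outputs, then refit by least squares."""
    # Step 1: gradient step on each linear-block output s_i = w*x_i + b.
    s_new = []
    for x, t in zip(xs, ts):
        s = w * x + b
        y = math.tanh(s)
        s_new.append(s - eta * (y - t) * (1.0 - y * y))
    # Step 2: least-squares fit of (w, b) so that w*x_i + b matches the updated s_i.
    n = len(xs)
    mean_x = sum(xs) / n
    mean_s = sum(s_new) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxs = sum((x - mean_x) * (s - mean_s) for x, s in zip(xs, s_new))
    w_new = sxs / sxx
    return w_new, mean_s - w_new * mean_x


def train(xs, ts, w=0.1, b=0.0, iters=200):
    """Repeat the two-step iteration from an assumed starting point."""
    for _ in range(iters):
        w, b = neuron_space_step(w, b, xs, ts)
    return w, b


if __name__ == "__main__":
    xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
    ts = [math.tanh(0.8 * x + 0.3) for x in xs]  # synthetic, realizable targets
    print(train(xs, ts))
```

With several neurons per layer, step 2 would become one linear least-squares problem per linear block, which is where the robust LS solvers mentioned in the abstract would come in.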
Large-Scale Nonlinear Constrained Optimization: A Current Survey
1994
Cited by 7 (0 self)
Much progress has been made in constrained nonlinear optimization in the past ten years, but most large-scale problems still represent a considerable obstacle. In this survey paper we will attempt to give an overview of the current approaches, including interior and exterior methods and algorithms based upon trust regions and line searches. In addition, the importance of software, numerical linear algebra and testing will be addressed. We will try to explain why the difficulties arise, how attempts are being made to overcome them and some of the problems that still remain. Although there will be some emphasis on the LANCELOT and CUTE projects, the intention is to give a broad picture of the state-of-the-art.
Author affiliations: 1 IBM T.J. Watson Research Center, P.O. Box 218, Yorktown Heights, NY 10598, USA; 2 Parallel Algorithms Team, CERFACS, 42 Ave. G. Coriolis, 31057 Toulouse Cedex, France; 3 Central Computing Department, Rutherford Appleton Laboratory, Chilton, Oxfordshire, OX11 0QX, England ...
Improving the Decomposition of Partially Separable Functions in the Context of Large-Scale Optimization: A First Approach
1993
Cited by 3 (1 self)
This paper examines the question of modifying the decomposition of a partially separable function in order to improve computational efficiency of large-scale minimization algorithms using a conjugate-gradient inner iteration. The context and motivation are given and the application of a simple strategy discussed on examples extracted from the CUTE test problem collection.
Author affiliations: 1 IBM T.J. Watson Research Center, P.O. Box 218, Yorktown Heights, NY 10598, USA, Email: arconn@watson.ibm.com; 2 CERFACS, 42 Avenue Gustave Coriolis, 31057 Toulouse Cedex, France, EC, Email: gould@cerfacs.fr or nimg@directory.rl.ac.uk; 3 Department of Mathematics, Facultés Universitaires ND de la Paix, 61, rue de Bruxelles, B-5000 Namur, Belgium, EC, Email: pht@math.fundp.ac.be
Keywords: exploitation of structure, algorithmic efficiency, partially separable functions.
This research was supported in part by the Advanced Research Projects Agency of the Department of Defense and was monitored by the Air Fo...
A Fast, Space-Efficient Algorithm for the Approximation of Images by an Optimal Sum of Gaussians
In Graphics Interface, 2000
Cited by 3 (0 self)
Gaussian decomposition of images leads to many promising applications in computer graphics. Gaussian representations can be used for image smoothing, motion analysis, and feature selection for image recognition. Furthermore, image construction from a Gaussian representation is fast, since the Gaussians only need to be added together. The most optimal algorithms [3, 6, 7] minimize the number of Gaussians needed for decomposition, but they involve nonlinear least-squares approximations, e.g. the use of the Marquardt algorithm [10]. This presents a problem, since the Marquardt algorithm requires an enormous amount of computation and the resulting matrices use a lot of space. In this work we offer a method, which we call the Quickstep method, that substantially reduces the number of computations and the amount of space used. Unlike the Marquardt algorithm, each iteration has linear time complexity in the number of variables, and no Jacobian or Hessian matrices are formed. Yet Quickstep produces optimal results, similar to those produced by the Marquardt algorithm.
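The remark that reconstruction only requires adding the Gaussians together can be made concrete. The sketch below assumes isotropic 2D Gaussians, each given as a hypothetical tuple (amplitude, center_x, center_y, sigma); the paper's actual parameterization may differ, and the Quickstep fitting procedure itself is not shown.

```python
import math


def render_gaussians(gaussians, width, height):
    """Build an image by summing isotropic Gaussians evaluated at every pixel."""
    image = [[0.0] * width for _ in range(height)]
    for amp, cx, cy, sigma in gaussians:
        for row in range(height):
            for col in range(width):
                d2 = (col - cx) ** 2 + (row - cy) ** 2  # squared distance to center
                image[row][col] += amp * math.exp(-d2 / (2.0 * sigma ** 2))
    return image


if __name__ == "__main__":
    # Two assumed blobs on a small 8x6 grid.
    img = render_gaussians([(1.0, 2.0, 2.0, 1.5), (0.5, 6.0, 3.0, 1.0)], 8, 6)
    for line in img:
        print(" ".join(f"{v:.2f}" for v in line))
```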
A Multidimensional Filter Algorithm for Nonlinear Equations and Nonlinear Least Squares
SIAM J. Optim., 2003
Cited by 3 (2 self)
We introduce a new algorithm for the solution of systems of nonlinear equations and nonlinear least-squares problems that attempts to combine the efficiency of filter techniques and the robustness of trust-region methods. The algorithm is shown, under reasonable assumptions, to globally converge to zeros of the system, or to first-order stationary points of the Euclidean norm of its residual. Preliminary numerical experience is presented that shows substantial gains in efficiency over the traditional monotone trust-region approach.
On iterative Krylov-dogleg trust-region steps for solving neural networks nonlinear least squares problems
2000
Cited by 2 (1 self)
This paper describes a method of dogleg trust-region steps, or restricted Levenberg-Marquardt steps, based on a projection process onto the Krylov subspaces for neural networks nonlinear least squares problems. In particular, the linear conjugate gradient (CG) method works as the inner iterative algorithm for solving the linearized Gauss-Newton normal equation, whereas the outer nonlinear algorithm repeatedly takes so-called "Krylov-dogleg" steps, relying only on matrix-vector multiplication without explicitly forming the Jacobian matrix or the Gauss-Newton model Hessian. That is, our iterative dogleg algorithm can reduce both operational counts and memory space by a factor of O(n) (the number of parameters) in comparison with a direct linear-equation solver. This memory-less property is useful for large-scale problems.
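The inner iteration described here, linear CG on the Gauss-Newton normal equations using only matrix-vector products, can be sketched as follows. This is a minimal illustration, not the paper's Krylov-dogleg method: the trust-region and dogleg logic is omitted, and the callables `jvp` (computes J v) and `vjp` (computes J^T u) are hypothetical names standing in for whatever forward and backward passes supply those products.

```python
def dot(u, v):
    """Euclidean inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def cg_gauss_newton(jvp, vjp, r, n, iters=None, tol=1e-12):
    """Approximately solve (J^T J) p = -J^T r using only J v and J^T u products."""
    p = [0.0] * n
    res = [-g for g in vjp(r)]  # residual of the linear system at p = 0
    d = list(res)               # first search direction
    rs = dot(res, res)
    for _ in range(iters or n):
        if rs <= tol:
            break
        Jd = jvp(d)
        alpha = rs / dot(Jd, Jd)  # d^T (J^T J) d equals ||J d||^2
        p = [pi + alpha * di for pi, di in zip(p, d)]
        JtJd = vjp(Jd)            # (J^T J) d without ever forming J^T J
        res = [ri - alpha * qi for ri, qi in zip(res, JtJd)]
        rs_new = dot(res, res)
        d = [ri + (rs_new / rs) * di for ri, di in zip(res, d)]
        rs = rs_new
    return p


if __name__ == "__main__":
    # Toy linear residual r(theta) = A theta - y, so that J = A (used for checking only).
    A = [[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
    r0 = [-1.0, 2.0, 0.0]  # residual at theta = 0 for y = [1, -2, 0]
    jvp = lambda v: [dot(row, v) for row in A]
    vjp = lambda u: [sum(A[i][j] * u[i] for i in range(3)) for j in range(2)]
    print(cg_gauss_newton(jvp, vjp, r0, n=2))
```

Only vectors of length n and of residual length are stored, which is the source of the memory saving the abstract refers to.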
Local Convergence Of The Symmetric Rank-One Iteration
Opt. Appl, 1998
We consider conditions under which the SR1 iteration is locally convergent. We apply the result to a pointwise structured SR1 method that has been used in optimal control.
Key words. SR1 update, pointwise quasi-Newton method, optimal control
AMS(MOS) subject classifications. 47H17, 49K15, 65F10, 65H10, 49M15, 65J15, 65K10
1. Introduction. The symmetric rank-one (SR1) update [1] is a quasi-Newton method that preserves symmetry of an approximate Hessian (optimization problems) or Jacobian (nonlinear equations). The analysis in this paper is from the nonlinear equations point of view. Our purpose is to prove a local convergence result using the concept of uniform linear independence from [5], extend that result to structured updates where part of the Jacobian can be computed exactly, and then apply those results to the pointwise SR1 update considered in [14] in the context of optimal control. We begin with a nonlinear equation F(x) = 0 (1.1) in R^N. We make the standard assump...
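For reference, the standard finite-dimensional SR1 formula is B+ = B + (y - B s)(y - B s)^T / ((y - B s)^T s), where s is the step and y the corresponding change in F (or in the gradient). The sketch below implements that textbook formula, not the pointwise structured variant the paper analyzes; the skip test and its threshold `r` are a commonly used safeguard assumed here, not something stated in the abstract.

```python
def sr1_update(B, s, y, r=1e-8):
    """Return the SR1 update of a symmetric matrix B for step s and change y.

    The update is skipped (a copy of B is returned) when the denominator
    (y - B s)^T s is too small relative to the norms of s and y - B s.
    """
    n = len(s)
    Bs = [sum(B[i][j] * s[j] for j in range(n)) for i in range(n)]
    v = [y[i] - Bs[i] for i in range(n)]  # secant residual y - B s
    denom = sum(v[i] * s[i] for i in range(n))
    norm_s = sum(x * x for x in s) ** 0.5
    norm_v = sum(x * x for x in v) ** 0.5
    if abs(denom) <= r * norm_s * norm_v:
        return [row[:] for row in B]
    # Rank-one correction; symmetric because it is built from v v^T.
    return [[B[i][j] + v[i] * v[j] / denom for j in range(n)] for i in range(n)]


if __name__ == "__main__":
    B = [[1.0, 0.0], [0.0, 1.0]]
    print(sr1_update(B, s=[1.0, 0.0], y=[2.0, 1.0]))
```

After a non-skipped update the new matrix satisfies the secant equation B+ s = y, which is the property the local convergence analysis builds on.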

