Efficient training of feed-forward neural networks (1993)

by M Møller

Results 1 - 9 of 9

Gradient-based learning applied to document recognition

by Yann LeCun, Léon Bottou, Yoshua Bengio, Patrick Haffner - Proceedings of the IEEE, 1998
Cited by 487 (38 self)
Multilayer neural networks trained with the back-propagation algorithm constitute the best example of a successful gradient-based learning technique. Given an appropriate network architecture, gradient-based learning algorithms can be used to synthesize a complex decision surface that can classify high-dimensional patterns, such as handwritten characters, with minimal preprocessing. This paper reviews various methods applied to handwritten character recognition and compares them on a standard handwritten digit recognition task. Convolutional neural networks, which are specifically designed to deal with the variability of two-dimensional (2-D) shapes, are shown to outperform all other techniques. Real-life document recognition systems are composed of multiple modules including field extraction, segmentation, recognition, and language modeling. A new learning paradigm, called graph transformer networks (GTNs), allows such multimodule systems to be trained globally using gradient-based methods so as to minimize an overall performance measure. Two systems for online handwriting recognition are described. Experiments demonstrate the advantage of global training, and the flexibility of graph transformer networks. A graph transformer network for reading a bank check is also described. It uses convolutional neural network character recognizers combined with global training techniques to provide record accuracy on business and personal checks. It is deployed commercially and reads several million checks per day.

Selected Training Exemplars for Neural Network Learning

by Mark Plutowski, 1994
Cited by 8 (0 self)
The dissertation of Mark Plutowski is approved, and it is acceptable in quality and form for publication on microfilm: Co-Chair Co-Chair

Variable Selection with Neural Networks

by Tautvydas Cibas, Françoise Fogelman Soulié, Patrick Gallinari, Sarunas Raudys - Neurocomputing, 1996
Cited by 8 (3 self)
this paper a regularization approach to variable selection, where the regularization term (Gaussian and Gaussian mixture priors) allows the least useful variables to be discarded during training, and we compare performance to the stepwise procedure. When implementing these different methods, one has to decide when to start (pruning or regularizing) and how much (how many weights, or how large a regularizing factor). Techniques such as OCD require that the optimum be reached. However, extensive evidence [Finnoff et al., 93] shows that it is more efficient to use "non convergent methods", where modification is started before full convergence is reached. We thus also compare our regularization approach to the pruning technique described in [Cibas et al., 94]. We illustrate our method on two relatively small problems: prediction of a synthetic time series and classification of waveforms [Breiman et al., 84]. The paper is organized as follows: section 2 introduces notations and results from the literature; section 3, the problems used to test our methods; and section 4, variable selection by regularization.
2. Variable Selection
2.1. Definitions
Let a random variable pair $(X, Y) \in \mathbb{R}^N \times \mathbb{R}^p$ be given, with probability distribution $P$. Based on a sample $D_m = \{(x_1, y_1), \ldots, (x_m, y_m)\}$ drawn from $(X, Y)$, we train a NN $a$ and produce an estimator $F$, which depends upon $a$ and $D_m$. Different nets may have different performances: here we select nets depending upon their respective empirical errors. In this paper, we chiefly compare nets differing only in their input dimension: some components of the original input $X \in \mathbb{R}^N$ are eliminated to produce a vector x

Augmented Statistical Models for Classifying Sequence Data

by Martin Layton, 2006
Cited by 7 (0 self)
Declaration This dissertation is the result of my own work and includes nothing that is the outcome of work done in collaboration. It has not been submitted in whole or in part for a degree at any other university. Some of the work has been published previously in conference proceedings [66,69], two journal articles [36,68], two workshop papers [35,67] and a technical report [65]. The length of this thesis including appendices, bibliography, footnotes, tables and equations is approximately 60,000 words. This thesis contains 27 figures and 20 tables.

Linear-Least-Squares Initialization of Multilayer Perceptrons Through Backpropagation of the Desired Response

by Deniz Erdogmus, Oscar Fontenla-Romero, Jose C. Principe, Amparo Alonso-Betanzos, Enrique Castillo
Cited by 4 (1 self)
Abstract—Training multilayer neural networks is typically carried out using descent techniques such as the gradient-based backpropagation (BP) of error or the quasi-Newton approaches including the Levenberg–Marquardt algorithm. This is basically due to the fact that there are no analytical methods to find the optimal weights, so iterative local or global optimization techniques are necessary. The success of iterative optimization procedures is strictly dependent on the initial conditions, therefore, in this paper, we devise a principled novel method of backpropagating the desired response through the layers of a multilayer perceptron (MLP), which enables us to accurately initialize these neural networks in the minimum mean-square-error sense, using the analytic linear least squares solution. The generated solution can be used as an initial condition to standard iterative optimization algorithms. However, simulations demonstrate that in most cases, the performance achieved through the proposed initialization scheme leaves little room for further improvement in the mean-square-error (MSE) over the training set. In addition, the performance of the network optimized with the proposed approach also generalizes well to testing data. A rigorous derivation of the initialization algorithm is presented and its high performance is verified with a number of benchmark training problems including chaotic time-series prediction, classification, and nonlinear system identification with MLPs. Index Terms—Approximate least-squares training of multilayer perceptrons (MLPs), backpropagation (BP) of desired response, neural network initialization.

Another hybrid algorithm for finding a global minimum of MLP error functions

by Bruno Orsier, 1996
Cited by 1 (0 self)
This report presents P scg , a new global optimization method for training multilayered perceptrons. Instead of local minima, global minima of the error function are found. This new method is hybrid in the sense that it combines three very different optimization techniques: Random Line Search, Scaled Conjugate Gradient and a 1-dimensional minimization algorithm named P . The best points of each component are retained by the hybrid method: simplicity of Random Line Search, efficiency of Scaled Conjugate Gradient, efficiency and convergence toward a global minimum for P . P scg is empirically shown to perform better or much better than three other global random optimization methods and a global deterministic optimization method. The aim of this research is to provide easy-to-use learning methods for several research projects; in particular these methods will be employed by knowledge-based systems. Changes from previous version of January 19, 1996 : in sections 7.4.2 and 7.4...

Neural networks for financial time series prediction: Overview over recent research

by Dimitri Pissarenko, 2002
Cited by 1 (1 self)
Abstract not found

Multiple Multivariate Regression And Global Sequence Optimization: An Application To Large Scale Models Of Radiation Intensity.

by H. Zaragoza (to appear)
We investigate the strengths and weaknesses of several neural network architectures for a large-scale thermodynamical application in which sequences of measurements from gas columns must be integrated to construct the columns' spectral radiation intensity profiles. This is a problem of interest for the aeronautical industry. The approaches proposed for its solution can be applied to a wide range of signal problems. Physical models often make use of a number of fitted functions as a simplified parametric base to approximate a high-dimensional nonlinear (and usually computationally intractable) function. Realistic models of radiation contain thousands of fitted functions. The use of neural networks in applications of this scale is rare, and most effective conjunctions techniques rely on cross-validation methods or involve other heavy computational overhead that is impracticable when a very large number of models need to be trained. We have employed here two different approaches: mul...

Contents lists available at ScienceDirect Neural Networks

by Luís M. Silva, J. Marques de Sá, Luís A. Alexandre
journal homepage: www.elsevier.com/locate/neunet
