Results 1–10 of 12
Gradient-based learning applied to document recognition
 Proceedings of the IEEE
, 1998
Abstract

Cited by 731 (58 self)
Multilayer neural networks trained with the backpropagation algorithm constitute the best example of a successful gradient-based learning technique. Given an appropriate network architecture, gradient-based learning algorithms can be used to synthesize a complex decision surface that can classify high-dimensional patterns, such as handwritten characters, with minimal preprocessing. This paper reviews various methods applied to handwritten character recognition and compares them on a standard handwritten digit recognition task. Convolutional neural networks, which are specifically designed to deal with the variability of two-dimensional (2D) shapes, are shown to outperform all other techniques. Real-life document recognition systems are composed of multiple modules including field extraction, segmentation, recognition, and language modeling. A new learning paradigm, called graph transformer networks (GTNs), allows such multi-module systems to be trained globally using gradient-based methods so as to minimize an overall performance measure. Two systems for online handwriting recognition are described. Experiments demonstrate the advantage of global training, and the flexibility of graph transformer networks. A graph transformer network for reading a bank check is also described. It uses convolutional neural network character recognizers combined with global training techniques to provide record accuracy on business and personal checks. It is deployed commercially and reads several million checks per day.
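As a minimal illustration of the gradient-based learning this abstract describes, the sketch below trains a tiny multilayer perceptron on XOR with plain backpropagation. It is pure Python with no libraries; the architecture, learning rate, and epoch count are illustrative choices, not those of the paper.

```python
import math
import random

random.seed(0)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_xor(hidden=4, lr=0.5, epochs=10000):
    """Train a 2-input, one-hidden-layer MLP on XOR by online backpropagation."""
    data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
    W1 = [[random.uniform(-1, 1) for _ in range(2)] for _ in range(hidden)]
    b1 = [0.0] * hidden
    W2 = [random.uniform(-1, 1) for _ in range(hidden)]
    b2 = 0.0
    for _ in range(epochs):
        for x, t in data:
            # Forward pass.
            h = [sigmoid(W1[j][0]*x[0] + W1[j][1]*x[1] + b1[j]) for j in range(hidden)]
            y = sigmoid(sum(W2[j]*h[j] for j in range(hidden)) + b2)
            # Backward pass (cross-entropy loss, so the output delta is y - t).
            dy = y - t
            for j in range(hidden):
                dh = dy * W2[j] * h[j] * (1.0 - h[j])  # chain rule through the sigmoid
                W2[j] -= lr * dy * h[j]
                W1[j][0] -= lr * dh * x[0]
                W1[j][1] -= lr * dh * x[1]
                b1[j] -= lr * dh
            b2 -= lr * dy

    def predict(x):
        h = [sigmoid(W1[j][0]*x[0] + W1[j][1]*x[1] + b1[j]) for j in range(hidden)]
        return sigmoid(sum(W2[j]*h[j] for j in range(hidden)) + b2)

    return predict

predict = train_xor()
```

Gradient descent on the weights of every layer, exactly as above, is the common core of the far larger convolutional and graph transformer networks the paper builds.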
Augmented Statistical Models for Classifying Sequence Data
, 2006
Abstract

Cited by 18 (0 self)
Declaration This dissertation is the result of my own work and includes nothing that is the outcome of work done in collaboration. It has not been submitted in whole or in part for a degree at any other university. Some of the work has been published previously in conference proceedings [66,69], two journal articles [36,68], two workshop papers [35,67] and a technical report [65]. The length of this thesis including appendices, bibliography, footnotes, tables and equations is approximately 60,000 words. This thesis contains 27 figures and 20 tables.
Application of neural network to turbulence control for drag reduction
 Phys. Fluids
, 1997
Abstract

Cited by 14 (1 self)
A new adaptive controller based on a neural network was constructed and applied to turbulent channel flow for drag reduction. A simple control network, which employs blowing and suction at the wall based only on quantities measured at the wall, was shown to reduce the skin friction by as much as 20% in direct numerical simulations of low-Reynolds-number turbulent channel flow. Also, a pattern was observed in the distribution of weights associated with the neural network. This allowed us to derive a simple control scheme that produced the same amount of drag reduction more efficiently.
Variable Selection with Neural Networks
 Neurocomputing
, 1996
Abstract

Cited by 13 (3 self)
We present in this paper a regularization approach to variable selection, where the regularization term (Gaussian and Gaussian-mixture priors) allows us to discard, during training, the least useful variables, and we compare its performance to the stepwise procedure. When implementing these different methods, one has to decide when to start (pruning, or regularizing) and how much (how many weights, or with how large a regularizing factor). Techniques such as OCD require that the optimum be reached. However, extensive evidence [Finnoff et al., 93] shows that it is more efficient to use "non-convergent methods", where modification is started before full convergence is reached. We thus also compare our regularization approach to the pruning technique described in [Cibas et al., 94]. We illustrate our method on two relatively small problems: prediction of a synthetic time series and classification of waveforms [Breiman et al., 84]. The paper is organized as follows: section 2 introduces notation and results from the literature; section 3 describes the problems used to test our methods; and section 4 presents variable selection by regularization. 2. Variable Selection. 2.1. Definitions. Let a random variable pair (X, Y) ∈ R^N × R^p be given, with probability distribution P. Based on a sample D_m = {(x_1, y_1), ..., (x_m, y_m)} drawn from (X, Y), we train a neural network a and produce an estimator F, which depends upon a and D_m. Different nets may have different performances: here we select nets depending upon their respective empirical errors. In this paper, we chiefly compare nets differing only in their input dimension: some components of the original input X ∈ R^N are eliminated to produce a vector x
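A Gaussian prior on the weights, as in the abstract above, is equivalent to a quadratic penalty (weight decay). The minimal sketch below uses hypothetical toy data and a plain linear model rather than a full neural network, but it shows the mechanism the paper relies on: the penalty drives the weight of an irrelevant input toward zero, flagging it for discarding.

```python
import random

random.seed(1)

# Toy data: y depends only on x0; x1 is pure noise (an assumption for illustration).
data = []
for _ in range(200):
    x0 = random.uniform(-1, 1)
    x1 = random.uniform(-1, 1)
    y = 2.0 * x0 + random.gauss(0, 0.1)
    data.append(((x0, x1), y))

def fit(lam, lr=0.05, epochs=500):
    """Gradient descent on mean squared error plus a Gaussian-prior penalty."""
    w = [0.0, 0.0]
    n = len(data)
    for _ in range(epochs):
        g = [0.0, 0.0]
        for x, y in data:
            err = w[0]*x[0] + w[1]*x[1] - y
            g[0] += err * x[0]
            g[1] += err * x[1]
        # The Gaussian prior contributes the extra term lam * w (weight decay).
        w[0] -= lr * (g[0]/n + lam * w[0])
        w[1] -= lr * (g[1]/n + lam * w[1])
    return w

w = fit(lam=0.1)
```

After training, the weight on the noise input x1 ends up near zero while the weight on x0 stays large; the same shrinkage effect, applied per input of a network, is what lets the regularizer perform variable selection.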
Selected Training Exemplars for Neural Network Learning
, 1994
Abstract

Cited by 9 (0 self)
The dissertation of Mark Plutowski is approved, and it is acceptable in quality and form for publication on microfilm: Co-Chair Co-Chair
Linear-Least-Squares Initialization of Multilayer Perceptrons Through Backpropagation of the Desired Response
Abstract

Cited by 4 (1 self)
Abstract—Training multilayer neural networks is typically carried out using descent techniques such as the gradient-based backpropagation (BP) of error or the quasi-Newton approaches including the Levenberg–Marquardt algorithm. This is basically due to the fact that there are no analytical methods to find the optimal weights, so iterative local or global optimization techniques are necessary. The success of iterative optimization procedures is strictly dependent on the initial conditions; therefore, in this paper, we devise a principled novel method of backpropagating the desired response through the layers of a multilayer perceptron (MLP), which enables us to accurately initialize these neural networks in the minimum mean-square-error sense, using the analytic linear least-squares solution. The generated solution can be used as an initial condition to standard iterative optimization algorithms. However, simulations demonstrate that in most cases, the performance achieved through the proposed initialization scheme leaves little room for further improvement in the mean-square error (MSE) over the training set. In addition, the performance of the network optimized with the proposed approach also generalizes well to testing data. A rigorous derivation of the initialization algorithm is presented and its high performance is verified with a number of benchmark training problems including chaotic time-series prediction, classification, and nonlinear system identification with MLPs. Index Terms—Approximate least-squares training of multilayer perceptrons (MLPs), backpropagation (BP) of desired response, neural network initialization.
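The paper's method backpropagates the desired response through every layer; the sketch below illustrates only the core ingredient, solving the linear least-squares problem analytically for the output layer of a network whose hidden layer is held fixed at random values. The toy task and all parameters are assumptions for illustration, not taken from the paper.

```python
import math
import random

random.seed(2)

def solve(A, b):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k]*x[k] for k in range(r + 1, n))) / M[r][r]
    return x

# Toy regression task: fit y = sin(3x) on [-1, 1].
xs = [i/50 - 1 for i in range(101)]
ys = [math.sin(3*x) for x in xs]

# Fixed, randomly initialized tanh hidden layer; only the linear output layer
# is set analytically, by least squares, instead of by gradient descent.
H = 12
a = [random.uniform(-4, 4) for _ in range(H)]
b = [random.uniform(-2, 2) for _ in range(H)]
feats = [[math.tanh(a[j]*x + b[j]) for j in range(H)] + [1.0] for x in xs]  # +bias column

# Normal equations: (F^T F) w = F^T y.
n = H + 1
FtF = [[sum(f[i]*f[j] for f in feats) for j in range(n)] for i in range(n)]
Fty = [sum(f[i]*y for f, y in zip(feats, ys)) for i in range(n)]
w = solve(FtF, Fty)

mse = sum((sum(wi*fi for wi, fi in zip(w, f)) - y)**2
          for f, y in zip(feats, ys)) / len(xs)
```

Even this one-layer analytic solve typically yields a small training MSE; the paper's contribution is extending the idea to all layers by propagating the desired response backwards.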
Another hybrid algorithm for finding a global minimum of MLP error functions
, 1996
Abstract

Cited by 1 (0 self)
This report presents P_scg, a new global optimization method for training multilayered perceptrons. Instead of local minima, global minima of the error function are found. This new method is hybrid in the sense that it combines three very different optimization techniques: Random Line Search, Scaled Conjugate Gradient, and a 1-dimensional minimization algorithm named P. The strengths of each component are retained by the hybrid method: the simplicity of Random Line Search, the efficiency of Scaled Conjugate Gradient, and the efficiency and convergence toward a global minimum of P. P_scg is empirically shown to perform better or much better than three other global random optimization methods and a global deterministic optimization method. The aim of this research is to provide easy-to-use learning methods for several research projects; in particular these methods will be employed by knowledge-based systems. Changes from previous version of January 19, 1996: in sections 7.4.2 and 7.4...
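The hybrid idea of a cheap global stage followed by a local refinement stage can be sketched in miniature. The test function below is hypothetical; in one dimension Random Line Search reduces to random sampling, and the local stage here is a simple shrinking-step search standing in for Scaled Conjugate Gradient.

```python
import random

random.seed(3)

def f(x):
    # A one-dimensional multimodal function: global minimum near x ≈ -1.30,
    # plus a worse local minimum near x ≈ 1.13.
    return x**4 - 3*x**2 + x

def random_line_search(f, lo, hi, samples=200):
    """Global stage: sample candidate points uniformly and keep the best."""
    return min((random.uniform(lo, hi) for _ in range(samples)), key=f)

def refine(f, x, step=0.1, iters=100):
    """Local stage: move in whichever direction improves f, halving the
    step whenever neither direction improves."""
    for _ in range(iters):
        moved = False
        for cand in (x - step, x + step):
            if f(cand) < f(x):
                x, moved = cand, True
        if not moved:
            step /= 2
    return x

x0 = random_line_search(f, -2.0, 2.0)    # lands in the global basin
x_star = refine(f, x0)                   # polishes to the global minimum
```

The global stage keeps the local stage from being trapped in the shallow right-hand basin, which is the same division of labor the hybrid P_scg method exploits on MLP error surfaces.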
Neural networks for financial time series prediction: Overview over recent research
, 2002
"... ..."
Multiple Multivariate Regression And Global Sequence Optimization: An Application To Large Scale Models Of Radiation Intensity.
Abstract
We investigate the strengths and weaknesses of several neural network architectures for a large-scale thermodynamical application in which sequences of measurements from gas columns must be integrated to construct the columns' spectral radiation intensity profiles. This is a problem of interest for the aeronautical industry. The approaches proposed for its solution can be applied to a wide range of signal problems. Physical models often make use of a number of fitted functions as a simplified parametric base to approximate a high-dimensional nonlinear (and usually computationally intractable) function. Realistic models of radiation contain thousands of fitted functions. Uses of neural networks in applications of this scale are rare, and the most effective combination techniques rely on cross-validation methods or involve other heavy computational overhead that is impracticable when a very large number of models need to be trained. We have employed here two different approaches: mul...