Results 1  10
of
637
Gradientbased learning applied to document recognition
 Proceedings of the IEEE
, 1998
"... Multilayer neural networks trained with the backpropagation algorithm constitute the best example of a successful gradientbased learning technique. Given an appropriate network architecture, gradientbased learning algorithms can be used to synthesize a complex decision surface that can classify hi ..."
Abstract

Cited by 731 (58 self)
 Add to MetaCart
Multilayer neural networks trained with the backpropagation algorithm constitute the best example of a successful gradientbased learning technique. Given an appropriate network architecture, gradientbased learning algorithms can be used to synthesize a complex decision surface that can classify highdimensional patterns, such as handwritten characters, with minimal preprocessing. This paper reviews various methods applied to handwritten character recognition and compares them on a standard handwritten digit recognition task. Convolutional neural networks, which are specifically designed to deal with the variability of two dimensional (2D) shapes, are shown to outperform all other techniques. Reallife document recognition systems are composed of multiple modules including field extraction, segmentation, recognition, and language modeling. A new learning paradigm, called graph transformer networks (GTN’s), allows such multimodule systems to be trained globally using gradientbased methods so as to minimize an overall performance measure. Two systems for online handwriting recognition are described. Experiments demonstrate the advantage of global training, and the flexibility of graph transformer networks. A graph transformer network for reading a bank check is also described. It uses convolutional neural network character recognizers combined with global training techniques to provide record accuracy on business and personal checks. It is deployed commercially and reads several million checks per day.
Regularization Theory and Neural Networks Architectures
 Neural Computation
, 1995
"... We had previously shown that regularization principles lead to approximation schemes which are equivalent to networks with one layer of hidden units, called Regularization Networks. In particular, standard smoothness functionals lead to a subclass of regularization networks, the well known Radial Ba ..."
Abstract

Cited by 309 (31 self)
 Add to MetaCart
We had previously shown that regularization principles lead to approximation schemes which are equivalent to networks with one layer of hidden units, called Regularization Networks. In particular, standard smoothness functionals lead to a subclass of regularization networks, the well known Radial Basis Functions approximation schemes. This paper shows that regularization networks encompass a much broader range of approximation schemes, including many of the popular general additive models and some of the neural networks. In particular, we introduce new classes of smoothness functionals that lead to different classes of basis functions. Additive splines as well as some tensor product splines can be obtained from appropriate classes of smoothness functionals. Furthermore, the same generalization that extends Radial Basis Functions (RBF) to Hyper Basis Functions (HBF) also leads from additive models to ridge approximation models, containing as special cases Breiman's hinge functions, som...
A Theory of Networks for Approximation and Learning
 Laboratory, Massachusetts Institute of Technology
, 1989
"... Learning an inputoutput mapping from a set of examples, of the type that many neural networks have been constructed to perform, can be regarded as synthesizing an approximation of a multidimensional function, that is solving the problem of hypersurface reconstruction. From this point of view, t ..."
Abstract

Cited by 194 (24 self)
 Add to MetaCart
Learning an inputoutput mapping from a set of examples, of the type that many neural networks have been constructed to perform, can be regarded as synthesizing an approximation of a multidimensional function, that is solving the problem of hypersurface reconstruction. From this point of view, this form of learning is closely related to classical approximation techniques, such as generalized splines and regularization theory. This paper considers the problems of an exact representation and, in more detail, of the approximation of linear and nonlinear mappings in terms of simpler functions of fewer variables. Kolmogorov's theorem concerning the representation of functions of several variables in terms of functions of one variable turns out to be almost irrelevant in the context of networks for learning. Wedevelop a theoretical framework for approximation based on regularization techniques that leads to a class of threelayer networks that we call Generalized Radial Basis Functions (GRBF), since they are mathematically related to the wellknown Radial Basis Functions, mainly used for strict interpolation tasks. GRBF networks are not only equivalent to generalized splines, but are also closely related to pattern recognition methods suchasParzen windows and potential functions and to several neural network algorithms, suchas Kanerva's associative memory,backpropagation and Kohonen's topology preserving map. They also haveaninteresting interpretation in terms of prototypes that are synthesized and optimally combined during the learning stage. The paper introduces several extensions and applications of the technique and discusses intriguing analogies with neurobiological data.
Competitive Coevolution through Evolutionary Complexification
 Journal of Artificial Intelligence Research
, 2002
"... Two major goals in machine learning are the discovery of complex multidimensional solutions and continual improvement of existing solutions. In this paper, we argue that complexification, i.e. the incremental elaboration of solutions through adding new structure, achieves both these goals. We demons ..."
Abstract

Cited by 149 (57 self)
 Add to MetaCart
Two major goals in machine learning are the discovery of complex multidimensional solutions and continual improvement of existing solutions. In this paper, we argue that complexification, i.e. the incremental elaboration of solutions through adding new structure, achieves both these goals. We demonstrate the power of complexification through the NeuroEvolution of Augmenting Topologies (NEAT) method, which evolves increasingly complex neural network architectures. NEAT is applied to an openended coevolutionary robot duel domain where robot controllers compete head to head. Because the robot duel domain supports a wide range of sophisticated strategies, and because coevolution benefits from an escalating arms race, it serves as a suitable testbed for observing the effect of evolving increasingly complex controllers. The result is an arms race of increasingly sophisticated strategies. When compared to the evolution of networks with fixed structure, complexifying networks discover significantly more sophisticated strategies. The results suggest that in order to realize the full potential of evolution, and search in general, solutions must be allowed to complexify as well as optimize.
Nonlinear BlackBox Modeling in System Identification: a Unified Overview
 Automatica
, 1995
"... A nonlinear black box structure for a dynamical system is a model structure that is prepared to describe virtually any nonlinear dynamics. There has been considerable recent interest in this area with structures based on neural networks, radial basis networks, wavelet networks, hinging hyperplanes, ..."
Abstract

Cited by 136 (15 self)
 Add to MetaCart
A nonlinear black box structure for a dynamical system is a model structure that is prepared to describe virtually any nonlinear dynamics. There has been considerable recent interest in this area with structures based on neural networks, radial basis networks, wavelet networks, hinging hyperplanes, as well as wavelet transform based methods and models based on fuzzy sets and fuzzy rules. This paper describes all these approaches in a common framework, from a user's perspective. It focuses on what are the common features in the different approaches, the choices that have to be made and what considerations are relevant for a successful system identification application of these techniques. It is pointed out that the nonlinear structures can be seen as a concatenation of a mapping from observed data to a regression vector and a nonlinear mapping from the regressor space to the output space. These mappings are discussed separately. The latter mapping is usually formed as a basis function e...
The neural basis of cognitive development: A constructivist manifesto
 Behavioral and Brain Sciences
, 1997
"... Quartz, S. & Sejnowski, T.J. (1997). The neural basis of cognitive development: A constructivist manifesto. ..."
Abstract

Cited by 128 (2 self)
 Add to MetaCart
Quartz, S. & Sejnowski, T.J. (1997). The neural basis of cognitive development: A constructivist manifesto.
Multilayer Feedforward Networks With a Nonpolynomial Activation Function Can Approximate Any Function
, 1993
"... Several researchers characterized the activation fimction under which multilayer feedforward networks can act as universal approximators. We show that most of all the characterizations that were reported thus far in the literature are special cases of the following general result: A standard multila ..."
Abstract

Cited by 117 (2 self)
 Add to MetaCart
Several researchers characterized the activation fimction under which multilayer feedforward networks can act as universal approximators. We show that most of all the characterizations that were reported thus far in the literature are special cases of the following general result: A standard multilayer feedforward network with a locally bounded piecewise continuous activation fimction can approximate an3, continuous function to any degree of accuracy if and only if the network's activation function is not a polynomial. We also emphasize the important role of the threshold, asserting that without it the last theorem does not hold.
Ridgelets: A key to higherdimensional intermittency?
, 1999
"... In dimensions two and higher, wavelets can efficiently represent only a small range of the full diversity of interesting behavior. In effect, wavelets are welladapted for pointlike phenomena, whereas in dimensions greater than one, interesting phenomena can be organized along lines, hyperplanes, and ..."
Abstract

Cited by 112 (10 self)
 Add to MetaCart
In dimensions two and higher, wavelets can efficiently represent only a small range of the full diversity of interesting behavior. In effect, wavelets are welladapted for pointlike phenomena, whereas in dimensions greater than one, interesting phenomena can be organized along lines, hyperplanes, and other nonpointlike structures, for which wavelets are poorly adapted. We discuss in this paper a new subject, ridgelet analysis, which can effectively deal with linelike phenomena in dimension 2, planelike phenomena in dimension 3 and so on. It encompasses a collection of tools which all begin from the idea of analysis by ridge functions ψ(u1x1+...+unxn) whose ridge profiles ψ are wavelets, or alternatively from performing a wavelet analysis in the Radon domain. The paper reviews recent work on the continuous ridgelet transform (CRT), ridgelet frames, ridgelet orthonormal bases, ridgelets and edges and describes a new notion of smoothness naturally attached to this new representation.
A nonparametric approach to pricing and hedging derivative securities via learning networks
 Journal of Finance
, 1994
"... http://www.jstor.org/about/terms.html. JSTOR's Terms and Conditions of Use provides, in part, that unless you have obtained prior permission, you may not download an entire issue of a journal or multiple copies of articles, and you may use content in the JSTOR archive only for your personal, noncom ..."
Abstract

Cited by 104 (4 self)
 Add to MetaCart
http://www.jstor.org/about/terms.html. JSTOR's Terms and Conditions of Use provides, in part, that unless you have obtained prior permission, you may not download an entire issue of a journal or multiple copies of articles, and you may use content in the JSTOR archive only for your personal, noncommercial use. Please contact the publisher regarding any further use of this work. Publisher contact information may be obtained at
Networks and the Best Approximation Property
 Biological Cybernetics
, 1989
"... Networks can be considered as approximation schemes. Multilayer networks of the backpropagation type can approximate arbitrarily well continuous functions (Cybenko, 1989# Funahashi, 1989# Stinchcombe and White, 1989). Weprovethatnetworks derived from regularization theory and including Radial Bas ..."
Abstract

Cited by 95 (7 self)
 Add to MetaCart
Networks can be considered as approximation schemes. Multilayer networks of the backpropagation type can approximate arbitrarily well continuous functions (Cybenko, 1989# Funahashi, 1989# Stinchcombe and White, 1989). Weprovethatnetworks derived from regularization theory and including Radial Basis Functions (Poggio and Girosi, 1989), have a similar property.From the point of view of approximation theory, however, the property of approximating continuous functions arbitrarily well is not sufficientforcharacterizing good approximation schemes. More critical is the property of best approximation. The main result of this paper is that multilayer networks, of the type used in backpropagation, are not best approximation. For regularization networks (in particular Radial Basis Function networks) we prove existence and uniqueness of best approximation.