Results 1–10 of 16
Gradient-based learning applied to document recognition
 Proceedings of the IEEE, 1998
Abstract

Cited by 727 (58 self)
Multilayer neural networks trained with the backpropagation algorithm constitute the best example of a successful gradient-based learning technique. Given an appropriate network architecture, gradient-based learning algorithms can be used to synthesize a complex decision surface that can classify high-dimensional patterns, such as handwritten characters, with minimal preprocessing. This paper reviews various methods applied to handwritten character recognition and compares them on a standard handwritten digit recognition task. Convolutional neural networks, which are specifically designed to deal with the variability of two-dimensional (2D) shapes, are shown to outperform all other techniques. Real-life document recognition systems are composed of multiple modules including field extraction, segmentation, recognition, and language modeling. A new learning paradigm, called graph transformer networks (GTN’s), allows such multi-module systems to be trained globally using gradient-based methods so as to minimize an overall performance measure. Two systems for online handwriting recognition are described. Experiments demonstrate the advantage of global training, and the flexibility of graph transformer networks. A graph transformer network for reading a bank check is also described. It uses convolutional neural network character recognizers combined with global training techniques to provide record accuracy on business and personal checks. It is deployed commercially and reads several million checks per day.
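The core idea of gradient-based learning can be illustrated in a few lines. The sketch below is not the paper's LeNet/GTN system; it trains a logistic-regression classifier by gradient descent on an invented two-cluster toy problem, showing how a decision surface is synthesized from data (all data and hyperparameters are assumptions for the sketch):

```python
import numpy as np

# Minimal sketch of gradient-based learning (not the paper's method):
# logistic regression on a toy 2-D classification problem.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1, 0.5, (50, 2)), rng.normal(1, 0.5, (50, 2))])
y = np.array([0] * 50 + [1] * 50)

w, b, lr = np.zeros(2), 0.0, 0.5
for _ in range(200):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # sigmoid output
    grad_w = X.T @ (p - y) / len(y)         # gradient of cross-entropy loss
    grad_b = np.mean(p - y)
    w -= lr * grad_w                        # gradient descent step
    b -= lr * grad_b

acc = np.mean(((X @ w + b) > 0) == y)       # training accuracy
```

On this well-separated toy set the learned linear decision surface classifies essentially all points correctly; the paper's point is that the same gradient machinery scales to convolutional architectures and high-dimensional inputs.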
Simple statistical gradient-following algorithms for connectionist reinforcement learning
 Machine Learning, 1992
Abstract

Cited by 318 (0 self)
This article presents a general class of associative reinforcement learning algorithms for connectionist networks containing stochastic units. These algorithms, called REINFORCE algorithms, are shown to make weight adjustments in a direction that lies along the gradient of expected reinforcement in both immediate-reinforcement tasks and certain limited forms of delayed-reinforcement tasks, and they do this without explicitly computing gradient estimates or even storing information from which such estimates could be computed. Specific examples of such algorithms are presented, some of which bear a close relationship to certain existing algorithms while others are novel but potentially interesting in their own right. Also given are results that show how such algorithms can be naturally integrated with backpropagation. We close with a brief discussion of a number of additional issues surrounding the use of such algorithms, including what is known about their limiting behaviors as well as further considerations that might be used to help develop similar but potentially more powerful reinforcement learning algorithms.
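For a single Bernoulli-logistic unit, Williams' REINFORCE update takes the simple form dw = lr * r * (a - p), where (a - p) is the unit's characteristic eligibility. The sketch below applies it to an invented two-armed bandit (the task, payoff rates, and hyperparameters are assumptions, not from the paper):

```python
import random, math

# Minimal REINFORCE sketch for one Bernoulli-logistic stochastic unit on a
# two-armed bandit (illustrative only; task and rates are invented).
random.seed(0)
w, lr = 0.0, 0.1
for _ in range(2000):
    p = 1.0 / (1.0 + math.exp(-w))       # probability of choosing action 1
    a = 1 if random.random() < p else 0  # stochastic unit samples an action
    # action 1 pays off with prob 0.9, action 0 with prob 0.2
    r = 1.0 if random.random() < (0.9 if a == 1 else 0.2) else 0.0
    w += lr * r * (a - p)                # REINFORCE weight adjustment

p_final = 1.0 / (1.0 + math.exp(-w))
```

No gradient of the expected reward is ever computed explicitly: the noisy term r * (a - p) has the gradient as its expectation, which is the article's central observation.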
Deep Dyslexia: A Case Study of Connectionist Neuropsychology
1993
Abstract

Cited by 138 (27 self)
Deep dyslexia is an acquired reading disorder marked by the occurrence of semantic errors (e.g., reading RIVER as "ocean"). In addition, patients exhibit a number of other symptoms, including visual and morphological effects in their errors, a part-of-speech effect, and an advantage for concrete over abstract words. Deep dyslexia poses a distinct challenge for cognitive neuropsychology because there is little understanding of why such a variety of symptoms should co-occur in virtually all known patients. Hinton and Shallice (1991) replicated the co-occurrence of visual and semantic errors by lesioning a recurrent connectionist network trained to map from orthography to semantics. While the success of their simulations is encouraging, there is little understanding of what underlying principles are responsible for them. In this paper we evaluate and, where possible, improve on the most important design decisions made by Hinton and Shallice, relating to the task, the network architecture, the training procedure, and the testing procedure. We identify four properties of networks that underlie their ability to reproduce the deep dyslexic symptom complex: distributed orthographic and semantic representations, gradient descent learning, attractors for word meanings, and greater richness of concrete vs. abstract semantics. The first three of these are general connectionist principles and the last is based on earlier theorizing. Taken together, the results demonstrate the usefulness of a connectionist approach to understanding deep dyslexia in particular, and the viability of connectionist neuropsychology in general.
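"Lesioning" in this line of work means deliberately damaging a trained network and observing the errors that result. A hedged sketch of the mechanical part, not Hinton and Shallice's actual model: damage is simulated by zeroing a random fraction of a connection-weight matrix (sizes and the lesion rate here are invented):

```python
import numpy as np

# Hedged sketch of lesioning a connectionist network: zero a random
# fraction of the weights. Matrix size and lesion rate are assumptions.
rng = np.random.default_rng(1)
W = rng.normal(size=(20, 20))          # stand-in recurrent weight matrix

def lesion(W, fraction, rng):
    """Return a copy of W with roughly `fraction` of its weights zeroed."""
    mask = rng.random(W.shape) >= fraction
    return W * mask

W_damaged = lesion(W, 0.3, rng)
removed = np.mean(W_damaged == 0)      # fraction of connections removed
```

The interesting science lies in what such damage does to a trained orthography-to-semantics mapping with attractor dynamics; the snippet only shows the damage operation itself.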
Discovering Neural Nets With Low Kolmogorov Complexity And High Generalization Capability
 Neural Networks, 1997
Abstract

Cited by 49 (30 self)
Many neural net learning algorithms aim at finding "simple" nets to explain training data. The expectation is: the "simpler" the networks, the better the generalization on test data (Occam's razor). Previous implementations, however, use measures for "simplicity" that lack the power, universality and elegance of those based on Kolmogorov complexity and Solomonoff's algorithmic probability. Likewise, most previous approaches (especially those of the "Bayesian" kind) suffer from the problem of choosing appropriate priors. This paper addresses both issues. It first reviews some basic concepts of algorithmic complexity theory relevant to machine learning, and how the Solomonoff-Levin distribution (or universal prior) deals with the prior problem. The universal prior leads to a probabilistic method for finding "algorithmically simple" problem solutions with high generalization capability. The method is based on Levin complexity (a time-bounded generalization of Kolmogorov comple...
Continuous-Time Temporal Back-Propagation with Adaptable Time Delays
1992
Abstract

Cited by 19 (0 self)
This paper extends backpropagation to continuous-time feedforward networks with internal, adaptable time delays. The new technique is suitable for parallel hardware implementation, with continuous multidimensional training signals. The resulting networks can be used for signal prediction, signal production, and spatiotemporal pattern recognition tasks. Unlike conventional backpropagation networks, they can easily adapt while performing true signal prediction. We present simulation results for networks trained to predict future values of the Mackey-Glass chaotic signal, using its present value as an input. For this application, networks with adaptable delays had less than half the prediction error of networks with fixed delays, and about one-quarter the error of conventional networks. After training, the network can be operated in a signal production configuration, where it autonomously generates a close approximation to the Mackey-Glass signal.
1 This work was supported by the Natu...
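The Mackey-Glass benchmark mentioned above is the delay differential equation dx/dt = a*x(t-tau)/(1 + x(t-tau)^10) - b*x(t). A hedged sketch, not the paper's network: generate the signal by Euler integration (a=0.2, b=0.1, tau=17 are the usual benchmark values; the step size and the least-squares fixed-delay predictor below are assumptions standing in for the adaptable-delay network):

```python
import numpy as np

# Generate the Mackey-Glass chaotic signal (Euler discretization; the
# step size dt is an assumption for this sketch).
def mackey_glass(n, tau=17.0, a=0.2, b=0.1, dt=0.1, x0=1.2):
    hist = int(tau / dt)                 # delay expressed in time steps
    x = np.full(n + hist, x0)
    for t in range(hist, n + hist - 1):
        x_tau = x[t - hist]
        x[t + 1] = x[t] + dt * (a * x_tau / (1.0 + x_tau**10) - b * x[t])
    return x[hist:]

series = mackey_glass(2000)

# Fixed-delay linear predictor (least squares): predict x(t) from the
# D previous samples -- a stand-in for the paper's adaptable delays.
D = 4
X = np.stack([series[i:i + len(series) - D] for i in range(D)], axis=1)
y = series[D:]
w, *_ = np.linalg.lstsq(X, y, rcond=None)
mse = np.mean((X @ w - y) ** 2)
```

Even this crude fixed-delay predictor beats predicting the mean; the paper's contribution is letting the delays themselves be adapted by gradient descent.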
Discovering Predictable Classifications
 Neural Computation, 1992
Abstract

Cited by 18 (9 self)
Prediction problems are among the most common learning problems for neural networks (e.g. in the context of time series prediction, control, etc.). With many such problems, however, perfect prediction is inherently impossible. For such cases we present novel unsupervised systems that learn to classify patterns such that the classifications are predictable while still being as specific as possible. The approach can be related to the IMAX method of Hinton, Becker and Zemel (1989, 1991). Experiments include Becker's and Hinton's stereo task, which can be solved more readily by our system.
1 MOTIVATION AND BASIC APPROACH
Many neural net systems (e.g. for control, time series prediction, etc.) rely on adaptive submodules for learning to predict patterns from other patterns. Perfect prediction, however, is often inherently impossible. In this paper we study the problem of finding pattern classifications such that the classes are predictable, while still being as specific as possibl...
Discovering Problem Solutions with Low Kolmogorov Complexity and High Generalization Capability
 Machine Learning: Proceedings of the Twelfth International Conference, 1994
Abstract

Cited by 16 (8 self)
Many machine learning algorithms aim at finding "simple" rules to explain training data. The expectation is: the "simpler" the rules, the better the generalization on test data (Occam's razor). Most practical implementations, however, use measures for "simplicity" that lack the power, universality and elegance of those based on Kolmogorov complexity and Solomonoff's algorithmic probability. Likewise, most previous approaches (especially those of the "Bayesian" kind) suffer from the problem of choosing appropriate priors. This paper addresses both issues. It first reviews some basic concepts of algorithmic complexity theory relevant to machine learning, and how the Solomonoff-Levin distribution (or universal prior) deals with the prior problem. The universal prior leads to a probabilistic method for finding "algorithmically simple" problem solutions with high generalization capability. The method is based on Levin complexity (a time-bounded generalization of Kolmogorov complexity) and...
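Levin complexity charges a program both for its length and for its runtime, which yields a search that tries short, fast programs first. A toy sketch of such a Levin-style search, not the paper's method: "programs" here are invented strings over {'+', '*'} acting on an accumulator, and phase i grants each program of length l a time budget of 2**(i - l) steps:

```python
import itertools

# Toy Levin-search sketch (illustrative only). Programs over {'+', '*'}
# map an accumulator x: '+' adds 1, '*' doubles.
def run(prog, x, max_steps):
    steps = 0
    for op in prog:
        steps += 1
        if steps > max_steps:            # out of time budget
            return None, steps
        x = x + 1 if op == '+' else x * 2
    return x, steps

def levin_search(target, x0=0, max_phase=12):
    for phase in range(1, max_phase + 1):
        for length in range(1, phase + 1):
            budget = 2 ** (phase - length)   # budget shrinks with length
            for prog in itertools.product('+*', repeat=length):
                out, _ = run(prog, x0, budget)
                if out == target:
                    return ''.join(prog)
    return None

sol = levin_search(6)   # '+++*' maps 0 -> 1 -> 2 -> 3 -> 6
```

The search provably finds a shortest-by-this-tradeoff solution first, which is the sense in which the universal prior favors "algorithmically simple" solutions.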
Learning Algorithms for Networks with Internal and External Feedback
 In D. S. Touretzky, J. L. Elman, T. J. Sejnowski, and G. E. Hinton (eds.), Proceedings of the Connectionist Models Summer School, pages 52–61. San Mateo, CA: Morgan Kaufmann, 1990
Abstract

Cited by 10 (8 self)
This paper gives an overview of some novel algorithms for reinforcement learning in nonstationary, possibly reactive environments. I have decided to describe many ideas briefly rather than going into great detail on any one idea. The paper is structured as follows: In the first section some terminology is introduced. Then there follow five sections, each headed by a short abstract. The second section describes the entirely local `neural bucket brigade algorithm'. The third section applies Sutton's TD methods to fully recurrent continually running probabilistic networks. The fourth section describes an algorithm based on system identification and on two interacting fully recurrent `self-supervised' learning networks. The fifth section describes an application of adaptive control techniques to adaptive attentive vision: it demonstrates how `selective attention' can be learned. Finally, the sixth section criticizes methods based on system identification and adaptive critics, and describes ...
Accelerated Learning in Back-Propagation Nets
1989
Abstract

Cited by 7 (0 self)
Two of the most serious problems with backpropagation (bp) (Werbos, 1974) (Parker, 1985) (Rumelhart et al., 1986) (Almeida, 1987) are insufficient speed and the danger of getting stuck in local minima. We offer an approach to cope with both of these problems: instead of using bp to find zero-points of the gradient of the error surface, we look for zero-points of the error surface itself. This can be done with less computational effort than second-order methods require. Experimental results indicate that in cases where only a small fraction of units is active simultaneously (sparse coding), this method can be applied successfully. Furthermore, it can be significantly faster than conventional bp. Keywords: backpropagation, sparse coding, speed, learning rate, local minima.
1 The Method
Numerous gradient descent methods for adjusting weights in neural nets are described in the literature (see e.g. articles by Parker, Dahl, and Watrous in IEEE 1st Int. Conf. on Neural Networks, V...
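The abstract does not spell out the exact update rule, but one standard way to seek a zero-point of a scalar error surface E(w), rather than a zero of its gradient, is the minimum-norm Newton step for the equation E(w) = 0: w <- w - E(w) * grad / ||grad||^2. The sketch below applies that step to an invented quadratic error surface; it is an illustration of the idea, not necessarily the paper's rule:

```python
import numpy as np

# Hedged sketch: step toward a zero-point of E itself (minimum-norm
# Newton step for the scalar equation E(w) = 0). Toy error surface and
# starting point are assumptions.
TARGET = np.array([1.0, -2.0])

def E(w):
    return 0.5 * np.sum((w - TARGET) ** 2)   # toy error surface

def gradE(w):
    return w - TARGET

w = np.array([4.0, 3.0])
for _ in range(50):
    g = gradE(w)
    gg = g @ g
    if gg < 1e-12:                 # gradient (and here the error) vanished
        break
    w = w - (E(w) / gg) * g        # step toward the zero-point of E

final_error = E(w)
```

On this surface each step halves the distance to the solution, so the error drops to numerical zero in a few dozen iterations; the cost per step is a single gradient evaluation, cheaper than the Hessian work second-order methods need.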
Planning Simple Trajectories Using Neural Subgoal Generators
 From Animals to Animats 2: Proceedings of the Second International Conference on Simulation of Adaptive Behavior, 1992
Abstract

Cited by 7 (1 self)
We consider the problem of reaching a given goal state from a given start state by letting an `animat' produce a sequence of actions in an environment with multiple obstacles. Simple trajectory planning tasks are solved with the help of `neural' gradient-based algorithms for learning without a teacher to generate sequences of appropriate subgoals in response to novel start/goal combinations. Relevant topic areas: problem solving and planning, goal-directed behavior, action selection and behavioral sequences, hierarchical and parallel organizations, neural correlates of behavior, perception and motor control.
1 INTRODUCTION
Many researchers in neurocontrol and reinforcement learning believe that some `compositional' method for learning to reach new goals by combining familiar action sequences into more complex new action sequences is necessary to overcome scaling problems associated with non-compositional algorithms. The few previous ideas for attacking `compositional neural seque...