Gradient-based learning applied to document recognition
Proceedings of the IEEE, 1998
Cited by 487 (38 self)
Multilayer neural networks trained with the back-propagation algorithm constitute the best example of a successful gradient-based learning technique. Given an appropriate network architecture, gradient-based learning algorithms can be used to synthesize a complex decision surface that can classify high-dimensional patterns, such as handwritten characters, with minimal preprocessing. This paper reviews various methods applied to handwritten character recognition and compares them on a standard handwritten digit recognition task. Convolutional neural networks, which are specifically designed to deal with the variability of two-dimensional (2-D) shapes, are shown to outperform all other techniques. Real-life document recognition systems are composed of multiple modules including field extraction, segmentation, recognition, and language modeling. A new learning paradigm, called graph transformer networks (GTN’s), allows such multimodule systems to be trained globally using gradient-based methods so as to minimize an overall performance measure. Two systems for online handwriting recognition are described. Experiments demonstrate the advantage of global training, and the flexibility of graph transformer networks. A graph transformer network for reading a bank check is also described. It uses convolutional neural network character recognizers combined with global training techniques to provide record accuracy on business and personal checks. It is deployed commercially and reads several million checks per day.
Simple statistical gradient-following algorithms for connectionist reinforcement learning
Machine Learning, 1992
Cited by 262 (0 self)
Abstract. This article presents a general class of associative reinforcement learning algorithms for connectionist networks containing stochastic units. These algorithms, called REINFORCE algorithms, are shown to make weight adjustments in a direction that lies along the gradient of expected reinforcement in both immediate-reinforcement tasks and certain limited forms of delayed-reinforcement tasks, and they do this without explicitly computing gradient estimates or even storing information from which such estimates could be computed. Specific examples of such algorithms are presented, some of which bear a close relationship to certain existing algorithms while others are novel but potentially interesting in their own right. Also given are results that show how such algorithms can be naturally integrated with backpropagation. We close with a brief discussion of a number of additional issues surrounding the use of such algorithms, including what is known about their limiting behaviors as well as further considerations that might be used to help develop similar but potentially more powerful reinforcement learning algorithms.
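The weight adjustment described in this abstract can be illustrated with a minimal sketch for a single Bernoulli-logistic unit. The update has the form alpha * (r - baseline) * (y - p) * x, where (y - p) * x is the derivative of the log-probability of the sampled output with respect to the weights. The function names, the learning rate, the zero baseline, and the toy reward (1 when the unit fires, 0 otherwise) are assumptions chosen for illustration, not details taken from the listing.

```python
import math
import random


def sigmoid(s):
    """Logistic squashing function."""
    return 1.0 / (1.0 + math.exp(-s))


def reinforce_step(w, x, reward_fn, alpha=0.1, baseline=0.0, rng=random):
    """One REINFORCE-style update for a single stochastic (Bernoulli) unit.

    reward_fn, alpha and baseline are illustrative assumptions.
    """
    # Probability that the unit emits 1 for input x.
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
    # Sample the stochastic output.
    y = 1 if rng.random() < p else 0
    # Scalar reinforcement for the sampled output.
    r = reward_fn(y)
    # (y - p) * x is d ln Pr(y) / dw for a Bernoulli-logistic unit.
    new_w = [wi + alpha * (r - baseline) * (y - p) * xi for wi, xi in zip(w, x)]
    return new_w, y, r


def train(steps=2000, seed=0):
    """Toy task (assumed): the unit is rewarded only when it outputs 1."""
    rng = random.Random(seed)
    w = [0.0]
    x = [1.0]  # constant input, so w[0] acts as a bias
    for _ in range(steps):
        w, _, _ = reinforce_step(w, x, lambda y: 1.0 if y == 1 else 0.0, rng=rng)
    return sigmoid(w[0])  # final probability of emitting 1
```

Calling `train()` returns the unit's final firing probability, which should move toward 1 on this toy task. No gradient of the expected reward is computed explicitly; the update uses only the sampled output, its probability, and the received reward.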
Deep Dyslexia: A Case Study of Connectionist Neuropsychology
1993
Cited by 110 (25 self)
Deep dyslexia is an acquired reading disorder marked by the occurrence of semantic errors (e.g., reading RIVER as "ocean"). In addition, patients exhibit a number of other symptoms, including visual and morphological effects in their errors, a part-of-speech effect, and an advantage for concrete over abstract words. Deep dyslexia poses a distinct challenge for cognitive neuropsychology because there is little understanding of why such a variety of symptoms should co-occur in virtually all known patients. Hinton and Shallice (1991) replicated the co-occurrence of visual and semantic errors by lesioning a recurrent connectionist network trained to map from orthography to semantics. While the success of their simulations is encouraging, there is little understanding of what underlying principles are responsible for them. In this paper we evaluate and, where possible, improve on the most important design decisions made by Hinton and Shallice, relating to the task, the network architecture, the training procedure, and the testing procedure. We identify four properties of networks that underlie their ability to reproduce the deep dyslexic symptom-complex: distributed orthographic and semantic representations, gradient descent learning, attractors for word meanings, and greater richness of concrete vs. abstract semantics. The first three of these are general connectionist principles and the last is based on earlier theorizing. Taken together, the results demonstrate the usefulness of a connectionist approach to understanding deep dyslexia in particular, and the viability of connectionist neuropsychology in general.
Discovering Neural Nets With Low Kolmogorov Complexity And High Generalization Capability
Neural Networks, 1997
Cited by 41 (23 self)
Many neural net learning algorithms aim at finding "simple" nets to explain training data. The expectation is: the "simpler" the networks, the better the generalization on test data (→ Occam's razor). Previous implementations, however, use measures for "simplicity" that lack the power, universality and elegance of those based on Kolmogorov complexity and Solomonoff's algorithmic probability. Likewise, most previous approaches (especially those of the "Bayesian" kind) suffer from the problem of choosing appropriate priors. This paper addresses both issues. It first reviews some basic concepts of algorithmic complexity theory relevant to machine learning, and how the Solomonoff-Levin distribution (or universal prior) deals with the prior problem. The universal prior leads to a probabilistic method for finding "algorithmically simple" problem solutions with high generalization capability. The method is based on Levin complexity (a time-bounded generalization of Kolmogorov comple...
Continuous-Time Temporal Back-Propagation with Adaptable Time Delays
1992
Cited by 17 (0 self)
This paper extends back-propagation to continuous-time feed-forward networks with internal, adaptable time delays. The new technique is suitable for parallel hardware implementation, with continuous multidimensional training signals. The resulting networks can be used for signal prediction, signal production, and spatiotemporal pattern recognition tasks. Unlike conventional back-propagation networks, they can easily adapt while performing true signal prediction. We present simulation results for networks trained to predict future values of the Mackey-Glass chaotic signal, using its present value as an input. For this application, networks with adaptable delays had less than half the prediction error of networks with fixed delays, and about one-quarter the error of conventional networks. After training, the network can be operated in a signal production configuration, where it autonomously generates a close approximation to the Mackey-Glass signal.
1 This work was supported by the Natu...
Discovering Problem Solutions with Low Kolmogorov Complexity and High Generalization Capability
Machine Learning: Proceedings of the Twelfth International Conference, 1994
Cited by 17 (9 self)
Many machine learning algorithms aim at finding "simple" rules to explain training data. The expectation is: the "simpler" the rules, the better the generalization on test data (→ Occam's razor). Most practical implementations, however, use measures for "simplicity" that lack the power, universality and elegance of those based on Kolmogorov complexity and Solomonoff's algorithmic probability. Likewise, most previous approaches (especially those of the "Bayesian" kind) suffer from the problem of choosing appropriate priors. This paper addresses both issues. It first reviews some basic concepts of algorithmic complexity theory relevant to machine learning, and how the Solomonoff-Levin distribution (or universal prior) deals with the prior problem. The universal prior leads to a probabilistic method for finding "algorithmically simple" problem solutions with high generalization capability. The method is based on Levin complexity (a time-bounded generalization of Kolmogorov complexity) and...
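For orientation, the quantities named in this abstract are usually written as follows. The notation is an assumption (the listing gives no formulas): U is a fixed universal machine, l(p) is the length in bits of program p, and t(p, x) is the number of steps U takes to output x when running p.

```latex
% Kolmogorov complexity: length of the shortest program that outputs x
K(x) = \min_{p \,:\, U(p) = x} l(p)

% Solomonoff-Levin distribution (universal prior): short programs dominate
P_U(x) = \sum_{p \,:\, U(p) = x} 2^{-l(p)}

% Levin complexity: Kolmogorov complexity with a logarithmic runtime penalty
Kt(x) = \min_{p \,:\, U(p) = x} \bigl\{\, l(p) + \log_2 t(p, x) \,\bigr\}
```

Under these definitions, a solution with low Levin complexity is one that some short program computes quickly, which is what makes the measure usable in a search procedure.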
Discovering Predictable Classifications
Neural Computation, 1992
Cited by 17 (9 self)
Prediction problems are among the most common learning problems for neural networks (e.g. in the context of time series prediction, control, etc.). With many such problems, however, perfect prediction is inherently impossible. For such cases we present novel unsupervised systems that learn to classify patterns such that the classifications are predictable while still being as specific as possible. The approach can be related to the IMAX method of Hinton, Becker and Zemel (1989, 1991). Experiments include Becker's and Hinton's stereo task, which can be solved more readily by our system.
1 MOTIVATION AND BASIC APPROACH
Many neural net systems (e.g. for control, time series prediction, etc.) rely on adaptive submodules for learning to predict patterns from other patterns. Perfect prediction, however, is often inherently impossible. In this paper we study the problem of finding pattern classifications such that the classes are predictable, while still being as specific as possibl...
Accelerated Learning in Back-Propagation Nets
1989
Cited by 7 (0 self)
Two of the most serious problems with back-propagation (bp) (Werbos, 1974; Parker, 1985; Rumelhart et al., 1986; Almeida, 1987) are insufficient speed and the danger of getting stuck in local minima. We offer an approach to cope with both of these problems: instead of using bp to find zero-points of the gradient of the error surface, we look for zero-points of the error surface itself. This can be done with less computational effort than second-order methods require. Experimental results indicate that in cases where only a small fraction of units is active simultaneously (sparse coding), this method can be applied successfully. Furthermore, it can be significantly faster than conventional bp.
Keywords: Back-propagation, sparse coding, speed, learning rate, local minima.
1 The Method
Numerous gradient descent methods for adjusting weights in neural nets are described in the literature (see e.g. articles by Parker, Dahl, and Watrous in IEEE 1st Int. Conf. on Neural Networks, V...
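One plausible reading of "looking for zero-points of the error surface itself" is a Newton-style root-finding step taken along the gradient direction: linearize the error E around the current weights w and choose the step length so that the linearized error becomes zero, giving w ← w − E(w) · g / ||g||², with g the gradient. The sketch below illustrates that reading only; the function names and the toy quadratic error are assumptions, and the truncated listing does not confirm that this is the paper's exact rule.

```python
def zero_seeking_step(w, error_fn, grad_fn, eps=1e-12):
    """One root-seeking step on the error surface (illustrative sketch).

    Solves E(w) + g . dw = 0 for a step dw = -rate * g,
    which gives rate = E(w) / ||g||^2.
    """
    e = error_fn(w)
    g = grad_fn(w)
    norm_sq = sum(gi * gi for gi in g)
    if norm_sq < eps:
        # Gradient has (almost) vanished: no usable direction.
        return list(w)
    rate = e / norm_sq
    return [wi - rate * gi for wi, gi in zip(w, g)]


# Toy error with a zero at w = 3 (assumed for the example only).
def toy_error(w):
    return (w[0] - 3.0) ** 2


def toy_grad(w):
    return [2.0 * (w[0] - 3.0)]


def run(steps=20):
    """Apply the step repeatedly from w = 0 and return the final weights."""
    w = [0.0]
    for _ in range(steps):
        w = zero_seeking_step(w, toy_error, toy_grad)
    return w
```

On this quadratic the step halves the distance to the zero at every iteration, so the effective learning rate adapts to the current error without any second-order information. The step only makes sense when the error can actually reach zero, which fits the abstract's remark that the method works in restricted settings such as sparse coding.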
Planning Simple Trajectories Using Neural Subgoal Generators
From Animals to Animats 2: Proceedings of the Second International Conference on Simulation of Adaptive Behavior, 1992
Cited by 6 (0 self)
We consider the problem of reaching a given goal state from a given start state by letting an 'animat' produce a sequence of actions in an environment with multiple obstacles. Simple trajectory planning tasks are solved with the help of 'neural' gradient-based algorithms for learning without a teacher to generate sequences of appropriate subgoals in response to novel start/goal combinations.
Relevant topic areas: Problem solving and planning, goal-directed behavior, action selection and behavioral sequences, hierarchical and parallel organizations, neural correlates of behavior, perception and motor control.
1 INTRODUCTION
Many researchers in neuro-control and reinforcement learning believe that some 'compositional' method for learning to reach new goals by combining familiar action sequences into more complex new action sequences is necessary to overcome scaling problems associated with non-compositional algorithms. The few previous ideas for attacking 'compositional neural seque...
A Hypothesis-driven Constructive Induction Approach to Expanding Neural Networks
Proceedings of ML-COLT'94, 1994
Cited by 3 (0 self)
With most machine learning methods, if the given knowledge representation space is inadequate then the learning process will fail. This is also true of methods that use neural networks as the form of the representation space. To overcome this limitation, an automatic construction method for a neural network is proposed. This paper describes the BP-HCI method for hypothesis-driven constructive induction in a neural network trained by the backpropagation algorithm. The method searches for a better representation space by analyzing the hypotheses generated in each step of an iterative learning process. The method was applied to ten problems, which include, in particular, the exclusive-or, MONK2, parity-6BIT and inverse parity-6BIT problems. All problems were successfully solved with the same initial set of parameters; the representation space was extended no more than necessary for each problem.
1 INTRODUCTION
Most research on inductive learning from examples has been concerne...

