Gradient calculation for dynamic recurrent neural networks: a survey
- IEEE Transactions on Neural Networks
, 1995
Cited by 119 (1 self)
We survey learning algorithms for recurrent neural networks with hidden units, and put the various techniques into a common framework. We discuss fixed-point learning algorithms, namely recurrent backpropagation and deterministic Boltzmann Machines, and non-fixed-point algorithms, namely backpropagation through time, Elman's history cutoff, and Jordan's output feedback architecture. Forward propagation, an online technique that uses adjoint equations, and variations thereof, are also discussed. In many cases, the unified presentation leads to generalizations of various sorts. We discuss advantages and disadvantages of temporally continuous neural networks in contrast to clocked ones, continue with some "tricks of the trade" for training, using, and simulating continuous time and recurrent neural networks. We present some simulations, and at the end, address issues of computational complexity and learning speed.
Neural Net Architectures for Temporal Sequence Processing
, 1994
Cited by 103 (0 self)
I present a general taxonomy of neural net architectures for processing time-varying patterns. This taxonomy subsumes many existing architectures in the literature, and points to several promising architectures that have yet to be examined. Any architecture that processes time-varying patterns requires two conceptually distinct components: a short-term memory that holds on to relevant past events and an associator that uses the short-term memory to classify or predict. My taxonomy is based on a characterization of short-term memory models along the dimensions of form, content, and adaptability. Experiments on predicting future values of a financial time series (US dollar-Swiss franc exchange rates) are presented using several alternative memory models. The results of these experiments serve as a baseline against which more sophisticated architectures can be compared. Neural networks have proven to be a promising alternative to traditional techniques for nonlinear temporal prediction t...
Making Working Memory Work: A Computational Model of Learning in the Prefrontal Cortex and Basal Ganglia
, 2005
Cited by 63 (4 self)
The prefrontal cortex has long been thought to subserve both working memory (the holding of information online for processing) and executive functions (deciding how to manipulate working memory and perform processing). Although many computational models of working memory have been developed, the mechanistic basis of executive function remains elusive, often amounting to a homunculus. This article presents an attempt to deconstruct this homunculus through powerful learning mechanisms that allow a computational model of the prefrontal cortex to control both itself and other brain areas in a strategic, task-appropriate manner. These learning mechanisms are based on subcortical structures in the midbrain, basal ganglia, and amygdala, which together form an actor-critic architecture. The critic system learns which prefrontal representations are task relevant and trains the actor, which in turn provides a dynamic gating mechanism for controlling working memory updating. Computationally, the learning mechanism is designed to simultaneously solve the temporal and structural credit assignment problems. The model’s performance compares favorably with standard backpropagation-based temporal learning mechanisms on the challenging 1-2-AX working memory task and other benchmark working memory tasks.
Learning Factorial Codes By Predictability Minimization
- Neural Computation
, 1991
Cited by 47 (22 self)
I propose a novel general principle for unsupervised learning of distributed non-redundant internal representations of input patterns. The principle is based on two opposing forces. For each representational unit there is an adaptive predictor which tries to predict the unit from the remaining units. In turn, each unit tries to react to the environment such that it minimizes its predictability. This encourages each unit to filter 'abstract concepts' out of the environmental input such that these concepts are statistically independent of those upon which the other units focus. I discuss various simple yet potentially powerful implementations of the principle which aim at finding binary factorial codes (Barlow et al., 1989), i.e. codes where the probability of the occurrence of a particular input is simply the product of the probabilities of the corresponding code symbols. Such codes are potentially relevant for (1) segmentation tasks, (2) speeding up supervised learning, (3) novelty detection. Methods for finding factorial codes automatically implement Occam's razor for finding codes using a minimal number of units. Unlike previous methods, the novel principle has a potential for removing not only linear but also non-linear output redundancy. Illustrative experiments show that algorithms based on the principle of predictability minimization are practically feasible. The final part of this paper describes an entirely local algorithm that has a potential for learning unique representations of extended input sequences.
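To make the factorial-code definition in the abstract above concrete, here is a minimal sketch that checks how far a binary code is from being factorial. It is not the paper's predictability-minimization algorithm; it only compares the empirical probability of each code word with the product of its per-unit probabilities. The function name `factorial_gap` and the two toy codes are assumptions introduced for this illustration.

```python
from collections import Counter
from itertools import product


def factorial_gap(codes):
    """Return the largest |P(word) - prod_i P(unit_i)| over all binary code words.

    `codes` is a list of equal-length tuples of 0/1, one per input pattern,
    with every pattern treated as equally likely (an assumption of this sketch).
    A gap of 0 means the code is factorial.
    """
    n = len(codes)
    width = len(codes[0])
    joint = Counter(codes)
    # Marginal probability that each code unit is on.
    p_on = [sum(c[i] for c in codes) / n for i in range(width)]
    gap = 0.0
    for word in product((0, 1), repeat=width):
        expected = 1.0
        for i, bit in enumerate(word):
            expected *= p_on[i] if bit else 1.0 - p_on[i]
        gap = max(gap, abs(joint[word] / n - expected))
    return gap


# Four equally likely inputs coded with two independent bits: factorial.
independent = [(0, 0), (0, 1), (1, 0), (1, 1)]
# Two units that always agree: fully redundant, hence not factorial.
redundant = [(0, 0), (0, 0), (1, 1), (1, 1)]
```

In the redundant code each unit can be predicted perfectly from the other, which is exactly the situation the abstract says each unit should try to avoid.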
Learning long-term dependencies in NARX recurrent neural networks
, 1996
Cited by 40 (5 self)
It has recently been shown that gradient-descent learning algorithms for recurrent neural networks can perform poorly on tasks that involve long-term dependencies, i.e. those problems for which the desired output depends on inputs presented at times far in the past. We show that the long-term dependencies problem is lessened for a class of architectures called NARX recurrent neural networks, which have powerful representational capabilities. We have previously reported that gradient descent learning can be more effective in NARX networks than in recurrent neural network architectures that have "hidden states" on problems including grammatical inference and nonlinear system identification. Typically, the network converges much faster and generalizes better than other networks. The results in this paper are consistent with this phenomenon. We present some experimental results which show that NARX networks can often retain information for two to three times as long as conventi...
Neural network music composition by prediction: Exploring the benefits of psychoacoustic constraints and multiscale processing
- Connection Science
, 1994
Cited by 33 (0 self)
In algorithmic music composition, a simple technique involves selecting notes sequentially according to a transition table that specifies the probability of the next note as a function of the previous context. I describe an extension of this transition table approach using a recurrent autopredictive connectionist network called CONCERT. CONCERT is trained on a set of pieces with the aim of extracting stylistic regularities. CONCERT can then be used to compose new pieces. A central ingredient of CONCERT is the incorporation of psychologically-grounded representations of pitch, duration, and harmonic structure. CONCERT was tested on sets of examples artificially generated according to simple rules and was shown to learn the underlying structure, even where other approaches failed. In larger experiments, CONCERT was trained on sets of J. S. Bach pieces and traditional European folk melodies and was then allowed to compose novel melodies. Although the compositions are occasionally pleasa...
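The simple transition-table technique that the abstract above starts from can be sketched as follows. This is only the baseline idea (the next note sampled from a table conditioned on the previous note), not the CONCERT network; the function names, the first-order context, and the toy corpus are all assumptions made for illustration.

```python
import random
from collections import Counter, defaultdict


def build_transition_table(melodies):
    """Estimate P(next note | previous note) from a list of note sequences."""
    counts = defaultdict(Counter)
    for melody in melodies:
        for prev, nxt in zip(melody, melody[1:]):
            counts[prev][nxt] += 1
    table = {}
    for prev, nxt_counts in counts.items():
        total = sum(nxt_counts.values())
        table[prev] = {note: c / total for note, c in nxt_counts.items()}
    return table


def compose(table, start, length, rng):
    """Select notes one at a time by sampling from the transition table."""
    melody = [start]
    # Stop early if the last note was never followed by anything in training.
    while len(melody) < length and melody[-1] in table:
        options = table[melody[-1]]
        notes = list(options)
        weights = [options[n] for n in notes]
        melody.append(rng.choices(notes, weights=weights, k=1)[0])
    return melody


# Hypothetical toy corpus; the note names are placeholders.
corpus = [["C", "D", "E", "C"], ["C", "D", "C", "E"]]
table = build_transition_table(corpus)
tune = compose(table, "C", 8, random.Random(0))
```

A table like this only sees the immediately preceding note, which is the kind of limited context a recurrent predictive network is meant to extend.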
Extracting Regularities in Space and Time Through a Cascade of Prediction Networks: The Case of a Mobile Robot Navigating in a Structured Environment
, 1999
Cited by 30 (6 self)
We propose that the ability to extract regularities from time series through prediction learning can be enhanced if we use a hierarchical architecture in which higher layers are trained to predict the internal state of lower layers when such states change significantly. This hierarchical organization has two functions: (a) it forces the system to progressively re-code sensory information so as to enhance useful regularities and filter out useless information; (b) it progressively reduces the length of the sequences which should be predicted going from lower to higher layers. This, in turn, allows higher levels to extract higher level regularities which are hidden at the sensory level. By training an architecture of this type to predict the next sensory state of a robot navigating in an environment divided into two rooms, we show how the first level prediction layer extracts low level regularities such as 'walls', 'corners', and 'corridors' while the second level prediction laye...
Learning Sequential Tasks by Incrementally Adding Higher Orders
- Advances in Neural Information Processing Systems 5
, 1993
Cited by 26 (4 self)
An incremental, higher-order, non-recurrent network combines two properties found to be useful for learning sequential tasks: higher-order connections and incremental introduction of new units. The network adds higher orders when needed by adding new units that dynamically modify connection weights. Since the new units modify the weights at the next time-step with information from the previous step, temporal tasks can be learned without the use of feedback, thereby greatly simplifying training. Furthermore, a theoretically unlimited number of units can be added to reach into the arbitrarily distant past. Experiments with the Reber grammar have demonstrated speedups of two orders of magnitude over recurrent networks.
1 INTRODUCTION
Second-order recurrent networks have proven to be very powerful [8], especially when trained using complete back propagation through time [1, 6, 14]. It has also been demonstrated by Fahlman that a recurrent network that incrementally adds nodes during traini...
Self-Segmentation of Sequences: Automatic Formation of Hierarchies of Sequential Behaviors
- IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics
, 2000
Cited by 18 (1 self)
The paper presents an approach for hierarchical reinforcement learning that does not rely on a priori domain-specific knowledge regarding hierarchical structures. Thus this work deals with a more difficult problem compared with existing work. It involves learning to segment action sequences to create hierarchical structures (for example, for the purpose of dealing with partially observable Markov decision processes, with multiple limited-memory or memoryless modules). Segmentation is based on reinforcement received during task execution, with different levels of control communicating with each other through sharing reinforcement estimates obtained by each other. The algorithm segments action sequences to reduce non-Markovian temporal dependencies, and seeks out proper configurations of long- and short-range dependencies, to facilitate the learning of the overall task. Developing hierarchies also facilitates the extraction of explicit hierarchical plans. The initial experiments demonstrate the promise of the approach.
Sequential Neural Text Compression
, 1996
Cited by 18 (4 self)
The purpose of this paper is to show that neural networks may be promising tools for data compression without loss of information. We combine predictive neural nets and statistical coding techniques to compress text files. We apply our methods to certain short newspaper articles and obtain compression ratios exceeding those of widely used Lempel-Ziv algorithms (which build the basis of the UNIX functions "compress" and "gzip"). The main disadvantage of our methods is that they are about three orders of magnitude slower than standard methods.
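The combination of a predictor with statistical coding described above rests on a standard fact: a symbol predicted with probability p costs about -log2(p) bits under an ideal entropy coder. The sketch below totals that cost for a byte string. As a loudly labeled assumption, it swaps the paper's predictive neural net for a simple adaptive frequency count conditioned on the previous byte, and it only measures the ideal code length rather than producing a compressed file; the function name is hypothetical.

```python
import math
from collections import defaultdict


def ideal_code_length_bits(data, alphabet_size=256):
    """Bits an ideal entropy coder would spend on `data` (a bytes object).

    Stand-in predictor (an assumption, not the paper's neural net): adaptive
    counts conditioned on the previous byte, with add-one smoothing.
    """
    counts = defaultdict(lambda: defaultdict(int))
    totals = defaultdict(int)
    bits = 0.0
    context = None  # no previous byte before the first symbol
    for symbol in data:
        p = (counts[context][symbol] + 1) / (totals[context] + alphabet_size)
        bits += -math.log2(p)
        # Update only after coding, so a decoder could keep identical counts.
        counts[context][symbol] += 1
        totals[context] += 1
        context = symbol
    return bits


text = b"abababababababababababababababab"
bits = ideal_code_length_bits(text)
```

The better the predictor, the larger the probabilities it assigns to the symbols that actually occur and the fewer bits are needed, which is why a stronger predictive model can translate into a better compression ratio.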

