Results 1 - 10 of 454
Making working memory work: A computational model of learning in the prefrontal cortex and basal ganglia
, 2003
"... The prefrontal cortex has long been thought to subserve both working memory (the holding of information online for processing) and “executive ” functions (deciding how to manipulate working memory and perform processing). Although many computational models of working memory have been developed, the ..."
Abstract - Cited by 174 (19 self)
The prefrontal cortex has long been thought to subserve both working memory (the holding of information online for processing) and “executive” functions (deciding how to manipulate working memory and perform processing). Although many computational models of working memory have been developed, the mechanistic basis of executive function remains elusive, often amounting to a homunculus. This paper presents an attempt to deconstruct this homunculus through powerful learning mechanisms that allow a computational model of the prefrontal cortex to control both itself and other brain areas in a strategic, task-appropriate manner. These learning mechanisms are based on subcortical structures in the midbrain, basal ganglia and amygdala, which together form an actor/critic architecture. The critic system learns which prefrontal representations are task-relevant and trains the actor, which in turn provides a dynamic gating mechanism for controlling working memory updating. Computationally, the learning mechanism is designed to simultaneously solve the temporal and structural credit assignment problems. The model’s performance compares favorably with standard backpropagation-based temporal learning mechanisms on the challenging 1-2-AX working memory task, and other benchmark working memory tasks.
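To make the gating idea concrete, here is a minimal sketch (in NumPy, not the authors' PBWM implementation) of a critic whose reward-prediction error trains both itself and a gating actor that decides, per stimulus, whether to update or maintain a working-memory slot; the tabular setup, variable names and toy task are illustrative assumptions.

    import numpy as np

    rng = np.random.default_rng(0)
    n_stimuli, alpha = 4, 0.1
    critic_w = np.zeros(n_stimuli)   # critic: value estimate per stimulus
    actor_w = np.zeros(n_stimuli)    # actor: gating propensity per stimulus

    def trial(memory, stimulus_id, reward):
        # Actor: decide whether to gate the new stimulus into working memory.
        p_update = 1.0 / (1.0 + np.exp(-actor_w[stimulus_id]))
        gate_open = rng.random() < p_update
        if gate_open:
            memory = stimulus_id                 # rapid, selective update
        # Critic: reward-prediction error for this stimulus.
        delta = reward - critic_w[stimulus_id]
        critic_w[stimulus_id] += alpha * delta   # train the critic
        # The same error signal reinforces or punishes the gating decision.
        actor_w[stimulus_id] += alpha * delta * (1.0 if gate_open else -1.0)
        return memory

    memory = None
    for _ in range(200):                         # toy task: storing stimulus 2 pays off
        s = int(rng.integers(n_stimuli))
        memory = trial(memory, s, reward=1.0 if s == 2 else 0.0)

Over repeated trials, stimuli whose storage tends to precede reward acquire a high gating propensity, which is the intuition behind the dynamic gating mechanism described above.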
Interactions Between Frontal Cortex and Basal Ganglia in Working Memory: A Computational Model
, 2000
"... The frontal cortex and basal ganglia interact via a relatively well-understood and elaborate system of interconnections. In the context of motor function, these interconnections can be understood as disinhibiting or "releasing the brakes" on frontal motor action plans --- the basal ganglia ..."
Abstract - Cited by 152 (18 self)
The frontal cortex and basal ganglia interact via a relatively well-understood and elaborate system of interconnections. In the context of motor function, these interconnections can be understood as disinhibiting or "releasing the brakes" on frontal motor action plans --- the basal ganglia detect appropriate contexts for performing motor actions, and enable the frontal cortex to execute such actions at the appropriate time. We build on this idea in the domain of working memory through the use of computational neural network models of this circuit. In our model, the frontal cortex exhibits robust active maintenance, while the basal ganglia contribute a selective, dynamic gating function that enables frontal memory representations to be rapidly updated in a task-relevant manner. We apply the model to a novel version of the continuous performance task (CPT) that requires subroutine-like selective working memory updating, and compare and contrast our model with other existing models and th...
Learning to Perceive the World as Articulated: An Approach for Hierarchical Learning in Sensory-Motor Systems
- NEURAL NETWORKS
, 1999
"... This paper describes how agents can learn an internal model of the world structurally by focusing on the problem of behavior-based articulation. We develop an on-line learning scheme -- the so-called mixture of recurrent neural net (RNN) experts -- in which a set of RNN modules becomes self-organ ..."
Abstract - Cited by 141 (31 self)
This paper describes how agents can learn an internal model of the world structurally by focusing on the problem of behavior-based articulation. We develop an on-line learning scheme -- the so-called mixture of recurrent neural net (RNN) experts -- in which a set of RNN modules becomes self-organized as experts on multiple levels in order to account for the different categories of sensory-motor flow which the robot experiences. Autonomous switching of activated modules in the lower level actually represents the articulation of the sensory-motor flow. Meanwhile, a set of RNNs in the higher level competes to learn the sequences of module switching in the lower level, by which articulation at a further, more abstract level can be achieved. The proposed scheme was examined through simulation experiments involving the navigation learning problem. Our dynamical systems analysis clarified the mechanism of the articulation; the possible correspondence between the articulation...
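A hedged sketch of the gating principle behind such a mixture of experts: mixing weights follow each expert's recent prediction error through a softmax, so the module that currently explains the sensory-motor flow best dominates. This is an illustrative simplification, not the paper's exact on-line scheme; the temperature parameter and the fixed "expert predictions" below are assumptions.

    import numpy as np

    def soft_gate(prediction_errors, temperature=0.1):
        # Mixing weights favour the experts with the smallest recent error.
        scores = -np.asarray(prediction_errors) / temperature
        scores -= scores.max()                   # numerical stability
        w = np.exp(scores)
        return w / w.sum()

    # Toy use: three "experts" (fixed predictions here, RNN modules in the paper)
    # compete to explain the current observation; the gate mixes them by accuracy.
    expert_predictions = np.array([0.2, 0.9, 0.5])
    target = 0.85
    errors = (expert_predictions - target) ** 2
    weights = soft_gate(errors)
    mixture = float(weights @ expert_predictions)  # gated combination of the experts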
Speech recognition with deep recurrent neural networks
, 2013
"... Recurrent neural networks (RNNs) are a powerful model for sequential data. End-to-end training methods such as Connectionist Temporal Classification make it possible to train RNNs for sequence labelling problems where the input-output alignment is unknown. The combination of these methods with the L ..."
Abstract - Cited by 104 (8 self)
Recurrent neural networks (RNNs) are a powerful model for sequential data. End-to-end training methods such as Connectionist Temporal Classification make it possible to train RNNs for sequence labelling problems where the input-output alignment is unknown. The combination of these methods with the Long Short-term Memory RNN architecture has proved particularly fruitful, delivering state-of-the-art results in cursive handwriting recognition. However, RNN performance in speech recognition has so far been disappointing, with better results returned by deep feedforward networks. This paper investigates deep recurrent neural networks, which combine the multiple levels of representation that have proved so effective in deep networks with the flexible use of long range context that empowers RNNs. When trained end-to-end with suitable regularisation, we find that deep Long Short-term Memory RNNs achieve a test set error of 17.7% on the TIMIT phoneme recognition benchmark, which to our knowledge is the best recorded score.
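As an illustration of the "deep" part only, the following sketch stacks plain tanh recurrent layers so that each layer re-reads the hidden sequence produced by the layer below; the paper itself stacks LSTM layers and trains them end-to-end with CTC, so the tanh cells, layer sizes and 13-dimensional frames here are assumptions for illustration.

    import numpy as np

    def rnn_layer(xs, W_in, W_rec, b):
        # Run one tanh recurrent layer over a sequence; return its hidden states.
        h, out = np.zeros(W_rec.shape[0]), []
        for x in xs:
            h = np.tanh(W_in @ x + W_rec @ h + b)
            out.append(h)
        return out

    def deep_rnn(xs, layers):
        # Stack layers: each layer consumes the hidden sequence of the previous one.
        seq = xs
        for (W_in, W_rec, b) in layers:
            seq = rnn_layer(seq, W_in, W_rec, b)
        return seq

    rng = np.random.default_rng(0)
    def make_layer(n_in, n_hid):
        return (rng.normal(0, 0.3, (n_hid, n_in)),
                rng.normal(0, 0.3, (n_hid, n_hid)),
                np.zeros(n_hid))

    xs = [rng.normal(size=13) for _ in range(50)]   # e.g. 13-dim acoustic frames
    states = deep_rnn(xs, [make_layer(13, 32), make_layer(32, 32), make_layer(32, 32)])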
Short term memory in echo state networks. GMD-Report 152
- GMD - German National Research Institute for Computer Science (2002), http://www.faculty.jacobs-university.de/hjaeger/pubs/STMEchoStatesTechRep.pdf
"... Echo State Networks (ESNs) is an approach to design and train recur-rent neural networks in supervised learning tasks. An important objective in many such tasks is to learn to exploit long-time dependencies in the pro-cessed signals (“long short-term memory ” performance). Here we expose ESNs to a s ..."
Abstract - Cited by 97 (3 self)
Echo State Networks (ESNs) are an approach to design and train recurrent neural networks in supervised learning tasks. An important objective in many such tasks is to learn to exploit long-time dependencies in the processed signals (“long short-term memory” performance). Here we expose ESNs to a series of synthetic benchmark tasks that have been used in the literature to study the learnability of long-range temporal dependencies. This report provides all the detail necessary to replicate these experiments. It is intended to serve as the technical companion to a journal submission paper where the findings are analysed and compared to results obtained elsewhere with other learning paradigms.
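The standard ESN recipe the report builds on can be sketched as follows: a fixed random reservoir rescaled to spectral radius below one, driven by the input, with only a linear readout trained (here by ridge regression) to recall the input after a delay. Reservoir size, delay and regularisation constant below are illustrative assumptions, not the report's settings.

    import numpy as np

    rng = np.random.default_rng(0)
    N, delay, T = 200, 10, 2000                    # reservoir size, recall delay, steps

    # Fixed random reservoir, rescaled to spectral radius < 1 (echo state property).
    W = rng.normal(size=(N, N))
    W *= 0.9 / max(abs(np.linalg.eigvals(W)))
    W_in = rng.uniform(-0.5, 0.5, size=N)

    u = rng.uniform(-1, 1, size=T)                 # input signal
    x = np.zeros(N)
    states = np.zeros((T, N))
    for t in range(T):
        x = np.tanh(W_in * u[t] + W @ x)           # reservoir update (never trained)
        states[t] = x

    # Train only the linear readout (ridge regression) to reproduce u(t - delay).
    X, y = states[delay:], u[:-delay]
    W_out = np.linalg.solve(X.T @ X + 1e-6 * np.eye(N), X.T @ y)
    recalled = X @ W_out                           # delayed recall of the input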
Framewise phoneme classification with bidirectional LSTM and other neural network architectures
- NEURAL NETWORKS
, 2005
"... In this paper, we apply bidirectional training to a Long Short Term Memory (LSTM) network for the first time. We also present a modified, full gradient version of the LSTM learning algorithm. On the TIMIT speech database, we measure the framewise phoneme classification ability of bidirectional and ..."
Abstract - Cited by 92 (24 self)
In this paper, we apply bidirectional training to a Long Short Term Memory (LSTM) network for the first time. We also present a modified, full gradient version of the LSTM learning algorithm. On the TIMIT speech database, we measure the framewise phoneme classification ability of bidirectional and unidirectional variants of both LSTM and conventional Recurrent Neural Networks (RNNs). We find that the LSTM architecture outperforms conventional RNNs and that bidirectional networks outperform unidirectional ones.
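A minimal sketch of the bidirectional wiring, using plain tanh cells in place of the paper's LSTM units: the sequence is processed once forward and once backward in time, and the two hidden states for each frame are concatenated before classification. Dimensions and weight scales are assumptions for illustration.

    import numpy as np

    def run_rnn(xs, W_in, W_rec):
        # Simple tanh recurrence over a list of frames; returns one state per frame.
        h, out = np.zeros(W_rec.shape[0]), []
        for x in xs:
            h = np.tanh(W_in @ x + W_rec @ h)
            out.append(h)
        return out

    def bidirectional(xs, fwd, bwd):
        # Concatenate forward-in-time and backward-in-time states per frame.
        f = run_rnn(xs, *fwd)
        b = run_rnn(xs[::-1], *bwd)[::-1]          # reverse the sequence, then re-align
        return [np.concatenate([hf, hb]) for hf, hb in zip(f, b)]

    rng = np.random.default_rng(0)
    fwd = (rng.normal(0, 0.3, (16, 13)), rng.normal(0, 0.3, (16, 16)))
    bwd = (rng.normal(0, 0.3, (16, 13)), rng.normal(0, 0.3, (16, 16)))
    frames = [rng.normal(size=13) for _ in range(40)]
    features = bidirectional(frames, fwd, bwd)     # 32-dim per frame, fed to a classifier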
Learning to Forget: Continual Prediction with LSTM
- NEURAL COMPUTATION
, 1999
"... Long Short-Term Memory (LSTM, Hochreiter & Schmidhuber, 1997) can solve numerous tasks not solvable by previous learning algorithms for recurrent neural networks (RNNs). We identify a weakness of LSTM networks processing continual input streams that are not a priori segmented into subsequenc ..."
Abstract - Cited by 86 (25 self)
Long Short-Term Memory (LSTM, Hochreiter & Schmidhuber, 1997) can solve numerous tasks not solvable by previous learning algorithms for recurrent neural networks (RNNs). We identify a weakness of LSTM networks processing continual input streams that are not a priori segmented into subsequences with explicitly marked ends at which the network's internal state could be reset. Without resets, the state may grow indefinitely and eventually cause the network to break down. Our remedy is a novel, adaptive “forget gate” that enables an LSTM cell to learn to reset itself at appropriate times, thus releasing internal resources. We review illustrative benchmark problems on which standard LSTM outperforms other RNN algorithms. All algorithms (including LSTM) fail to solve continual versions of these problems. LSTM with forget gates, however, easily solves them in an elegant way.
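For reference, a single step of an LSTM cell with input, forget and output gates in the now-standard form can be sketched as below; peephole connections and the paper's exact notation are omitted, and the weight initialisation is an assumption.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def lstm_step(x, h, c, p):
        # One LSTM step with input, forget and output gates.
        z = np.concatenate([x, h])
        i = sigmoid(p["W_i"] @ z + p["b_i"])       # input gate
        f = sigmoid(p["W_f"] @ z + p["b_f"])       # forget gate: can reset the cell
        o = sigmoid(p["W_o"] @ z + p["b_o"])       # output gate
        g = np.tanh(p["W_g"] @ z + p["b_g"])       # candidate cell content
        c = f * c + i * g                          # f near 0 releases the old state
        h = o * np.tanh(c)
        return h, c

    rng = np.random.default_rng(0)
    n_in, n_hid = 8, 16
    p = {f"W_{k}": rng.normal(0, 0.3, (n_hid, n_in + n_hid)) for k in "ifog"}
    p.update({f"b_{k}": np.zeros(n_hid) for k in "ifog"})
    h = c = np.zeros(n_hid)
    for x in [rng.normal(size=n_in) for _ in range(20)]:   # a continual input stream
        h, c = lstm_step(x, h, c, p)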
openEAR - Introducing the Munich Open-Source Emotion and Affect Recognition Toolkit
- In ACII
, 2009
"... Various open-source toolkits exist for speech recognition and speech processing. These toolkits have brought a great benefit to the research community, i.e. speeding up research. Yet, no such freely available toolkit exists for automatic affect recognition from speech. We herein introduce a novel op ..."
Abstract - Cited by 85 (31 self)
Various open-source toolkits exist for speech recognition and speech processing. These toolkits have brought a great benefit to the research community by speeding up research. Yet, no such freely available toolkit exists for automatic affect recognition from speech. We herein introduce a novel open-source affect and emotion recognition engine, which integrates all necessary components in one highly efficient software package. The components include audio recording and audio file reading, state-of-the-art paralinguistic feature extraction and pluggable classification modules. In this paper we introduce the engine and extensive baseline results. Pre-trained models for four affect recognition tasks are included in the openEAR distribution. The engine is tailored for multi-threaded, incremental on-line processing of live input in real time; however, it can also be used for batch processing of databases.
Gradient Flow in Recurrent Nets: the Difficulty of Learning Long-Term Dependencies
, 2001
"... Recurrent networks (crossreference Chapter 12) can, in principle, use their feedback connections to store representations of recent input events in the form of activations. The most widely used algorithms for learning what to put in short-term memory, however, take too much time to be feasible or d ..."
Abstract - Cited by 83 (25 self)
Recurrent networks (crossreference Chapter 12) can, in principle, use their feedback connections to store representations of recent input events in the form of activations. The most widely used algorithms for learning what to put in short-term memory, however, take too much time to be feasible or do not work well at all, especially when minimal time lags between inputs and corresponding teacher signals are long. Although theoretically fascinating, they do not provide clear practical advantages over, say, backprop in feedforward networks with limited time windows (see crossreference Chapters 11 and 12). With conventional "algorithms based on the computation of the complete gradient", such as "Back-Propagation Through Time" (BPTT, e.g., [22, 27, 26]) or "Real-Time Recurrent Learning" (RTRL, e.g., [21]), error signals "flowing backwards in time" tend to either (1) blow up or (2) vanish: the temporal evolution of the backpropagated error exponentially depends on the size of the weights.
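A small numeric illustration (not from the chapter) of why this happens: the error flowing back through T steps is multiplied by one Jacobian per step, so its norm shrinks or grows roughly geometrically with the scale of the recurrent weights. Network size, horizon and weight scales below are assumptions.

    import numpy as np

    rng = np.random.default_rng(0)
    n, T = 20, 60

    def gradient_norm(weight_scale):
        # Norm of d h_T / d h_0 for a tanh RNN, i.e. a product of T step Jacobians.
        W = rng.normal(0, weight_scale / np.sqrt(n), (n, n))
        h = rng.normal(size=n)
        J = np.eye(n)
        for _ in range(T):
            pre = W @ h
            h = np.tanh(pre)
            J = (np.diag(1.0 - h ** 2) @ W) @ J    # tanh derivative times W, chained
        return np.linalg.norm(J)

    for s in (0.5, 1.0, 2.0):
        print(f"weight scale {s}: |dh_T/dh_0| ~ {gradient_norm(s):.2e}")
    # Small recurrent weights drive the norm toward zero (vanishing error);
    # sufficiently large ones make it grow over the steps (exploding error).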