
Actor-critic models of the basal ganglia: New anatomical and computational perspectives. Neural Netw (2002)

by D Joel, Y Niv, E Ruppin

Results 1 - 10 of 19

Making Working Memory Work: A Computational Model of Learning in the Prefrontal Cortex and Basal Ganglia

by Randall C. O’Reilly, Michael J. Frank, 2005
Cited by 63 (4 self)
The prefrontal cortex has long been thought to subserve both working memory (the holding of information online for processing) and executive functions (deciding how to manipulate working memory and perform processing). Although many computational models of working memory have been developed, the mechanistic basis of executive function remains elusive, often amounting to a homunculus. This article presents an attempt to deconstruct this homunculus through powerful learning mechanisms that allow a computational model of the prefrontal cortex to control both itself and other brain areas in a strategic, task-appropriate manner. These learning mechanisms are based on subcortical structures in the midbrain, basal ganglia, and amygdala, which together form an actor-critic architecture. The critic system learns which prefrontal representations are task relevant and trains the actor, which in turn provides a dynamic gating mechanism for controlling working memory updating. Computationally, the learning mechanism is designed to simultaneously solve the temporal and structural credit assignment problems. The model’s performance compares favorably with standard backpropagation-based temporal learning mechanisms on the challenging 1-2-AX working memory task and other benchmark working memory tasks.
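
The actor-critic arrangement described in this abstract can be illustrated with a generic tabular sketch. This is not the authors' prefrontal/basal ganglia model: the function names, the table layout, and the learning rates below are illustrative assumptions, and the update is the textbook temporal-difference form in which the critic's error signal also trains the actor.

```python
import math


def softmax(preferences):
    """Turn action preferences into selection probabilities."""
    peak = max(preferences)
    exps = [math.exp(p - peak) for p in preferences]
    total = sum(exps)
    return [e / total for e in exps]


def actor_critic_step(value, prefs, state, action, reward, next_state,
                      alpha_critic=0.1, alpha_actor=0.1, gamma=0.9,
                      terminal=False):
    """Apply one update to the critic (value) and the actor (prefs).

    value: list of state-value estimates, indexed by state.
    prefs: list of per-state lists of action preferences.
    Returns the TD error that drove both updates.
    """
    bootstrap = 0.0 if terminal else gamma * value[next_state]
    td_error = reward + bootstrap - value[state]
    value[state] += alpha_critic * td_error          # critic learns the value
    prefs[state][action] += alpha_actor * td_error   # critic trains the actor
    return td_error
```

A positive TD error raises both the value of the visited state and the preference for the action just taken, so `softmax(prefs[state])` shifts toward that action on later visits.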

Temporal sequence learning, prediction and control - a review of different models and their relation to biological mechanisms

by Florentin Wörgötter, Bernd Porr - Neural Computation, 2004
Cited by 17 (3 self)
In this article we compare methods for temporal sequence learning (TSL) across the disciplines machine-control, classical conditioning, neuronal models for TSL as well as spike-timing-dependent plasticity. This review will briefly introduce the most influential models and focus on two questions: 1) To what degree are reward-based (e.g. TD-learning) and correlation-based (Hebbian) learning related? and 2) How do the different models correspond to possibly underlying biological mechanisms of synaptic plasticity? We will first compare the different models in an open-loop condition, where behavioral feedback does not alter the learning. Here we observe that reward-based and correlation-based learning are indeed very similar. Machine-control is then used to introduce the problem of closed-loop control (e.g. “actor-critic architectures”). Here the problem of evaluative (“rewards”) versus nonevaluative (“correlations”) feedback from the environment will be discussed showing that both learning approaches are fundamentally different in the closed-loop condition. In trying to answer the second question we will compare neuronal versions of the different learning architectures to the anatomy of the involved brain structures (basal-ganglia, thalamus and

PVLV: the primary value and learned value Pavlovian learning algorithm

by Randall C. O’Reilly, Thomas E. Hazy, On Watz, Michael J. Frank - Behav. Neurosci, 2007
Cited by 10 (2 self)
The authors present their primary value learned value (PVLV) model for understanding the reward-predictive firing properties of dopamine (DA) neurons as an alternative to the temporal-differences (TD) algorithm. PVLV is more directly related to underlying biology and is also more robust to variability in the environment. The primary value (PV) system controls performance and learning during primary rewards, whereas the learned value (LV) system learns about conditioned stimuli. The PV system is essentially the Rescorla–Wagner/delta-rule and comprises the neurons in the ventral striatum/nucleus accumbens that inhibit DA cells. The LV system comprises the neurons in the central nucleus of the amygdala that excite DA cells. The authors show that the PVLV model can account for critical aspects of the DA firing data, making a number of clear predictions about lesion effects, several of which are consistent with existing data. For example, first- and second-order conditioning can be anatomically dissociated, which is consistent with PVLV and not TD. Overall, the model provides a biologically plausible framework for understanding the neural basis of reward learning.
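
The abstract characterizes the PV system as essentially the Rescorla–Wagner/delta rule. A minimal sketch of that generic rule for a single conditioned stimulus follows. It is not the PVLV implementation, and the function name, the learning rate, and the reward coding (1.0 for reward, 0.0 for omission) are illustrative assumptions.

```python
def rescorla_wagner(rewards, learning_rate=0.1):
    """Return the associative strength V after each trial.

    rewards: sequence of outcomes following one conditioned stimulus.
    Each trial applies V <- V + learning_rate * (reward - V).
    """
    value = 0.0
    history = []
    for reward in rewards:
        prediction_error = reward - value   # delta = r - V
        value += learning_rate * prediction_error
        history.append(value)
    return history
```

With repeated reward the prediction error shrinks on every trial, so the learned value approaches the reward magnitude and the error signal fades once the reward is fully predicted.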

Reinforcement learning in the brain

by Yael Niv
Cited by 8 (4 self)
A wealth of research focuses on the decision-making processes that animals and humans employ when selecting actions in the face of reward and punishment. Initially such work stemmed from psychological investigations of conditioned behavior, and explanations of these in terms of computational models. Increasingly, analysis at the computational level has drawn on ideas from reinforcement learning, which provide a normative framework within which decision-making can be analyzed. More recently, the fruits of these extensive lines of research have made contact with investigations into the neural basis of decision making. Converging evidence now links reinforcement learning to specific neural substrates, assigning them precise computational roles. Specifically, electrophysiological recordings in behaving animals and functional imaging of human decision-making have revealed in the brain the existence of a key reinforcement learning signal, the temporal difference reward prediction error. Here, we first introduce the formal reinforcement learning framework. We then review the multiple lines of evidence linking reinforcement learning to the function of dopaminergic neurons in the mammalian midbrain and

Conditional Routing of Information to the Cortex: A Model of the Basal Ganglia’s Role in Cognitive Coordination

by Andrea Stocco, Christian Lebiere, John R. Anderson
Cited by 6 (3 self)
The basal ganglia play a central role in cognition and are involved in such general functions as action selection and reinforcement learning. Here, we present a model exploring the hypothesis that the basal ganglia implement a conditional information-routing system. The system directs the transmission of cortical signals between pairs of regions by manipulating separately the selection of sources and destinations of information transfers. We suggest that such a mechanism provides an account for several cognitive functions of the basal ganglia. The model also incorporates a possible mechanism by which subsequent transfers of information control the release of dopamine. This signal is used to produce novel stimulus–response associations by internalizing transferred cortical representations in the striatum. We discuss how the model is related to production systems and cognitive architectures. A series of simulations is presented to illustrate how the model can perform simple stimulus–response tasks, develop automatic behaviors, and provide an account of impairments in Parkinson’s and Huntington’s diseases.

Actor-critic models of reinforcement learning in the basal ganglia: From natural to artificial rats

by Mehdi Khamassi, Loïc Lachèze, Benoît Girard, Alain Berthoz, Agnès Guillot - Adapt. Behav, 2005
Cited by 5 (1 self)

Actor-Critic models of animal control -- a critique of reinforcement learning

by Florentin Wörgötter - Proceedings of the Fourth International ICSC Symposium on Engineering of Intelligent Systems, 2004
Cited by 2 (0 self)
In this article we will compare traditional reinforcement learning techniques with a novel correlation-based algorithm. We will discuss several problems which occur in reward-based reinforcement learning and outline alternative solutions. An example of a robot control task shown at the end will support our claims.

A Model of Reaching That Integrates Reinforcement Learning and Population Encoding of Postures

by Dimitri Ognibene, Angelo Rega, Gianluca Baldassarre
Cited by 1 (0 self)
When monkeys tackle novel complex behavioral tasks by trial-and-error they select actions from repertoires of sensorimotor primitives that allow them to search for solutions in a space which is coarser than the space of fine movements. Neuroscientific findings suggested that upper-limb sensorimotor primitives might be encoded, in terms of the final goal-postures they pursue, in premotor cortex. A previous work by the authors reproduced these results in a model based on the idea that cortical pathways learn sensorimotor primitives while basal ganglia learn to assemble and trigger them to pursue complex reward-based goals. This paper extends that model in several directions: a) it uses a Kohonen network to create a neural map with population encoding of postural primitives; b) it proposes an actor-critic reinforcement learning algorithm capable of learning to select those primitives in a biologically plausible fashion (i.e., through a dynamic competition between postures); c) it proposes a procedure to pre-train the actor to select promising primitives when tackling novel reinforcement learning tasks. Some tests (obtained with a task used for studying monkeys engaged in learning reaching-action sequences) show that the model is computationally sound and capable of learning to select sensorimotor primitives from the postures’ continuous space on the basis of their population encoding.

Connections Between Computational and Neurobiological Perspectives on Decision Making -- decision theory, . . .

by Peter Dayan, et al., 2008
Cited by 1 (0 self)
Abstract not found

In search of the neural circuits of intrinsic motivation

by Frederic Kaplan, Pierre-Yves Oudeyer
Cited by 1 (1 self)
Children seem to acquire new know-how in a continuous and open-ended manner. In this paper, we hypothesize that an intrinsic motivation to progress in learning is at the origins of the remarkable structure of children’s developmental trajectories. In this view, children engage in exploratory and playful activities for their own sake, not as steps toward other extrinsic goals. The central hypothesis of this paper is that intrinsically motivating activities correspond to expected decrease in prediction error. This motivation system pushes the infant to avoid both predictable and unpredictable situations in order to focus on the ones that are expected to maximize progress in learning. Based on a computational model and a series of robotic experiments, we show how this principle can lead to organized sequences of behavior of increasing complexity characteristic of several behavioral and developmental patterns observed in humans. We then discuss the putative circuitry underlying such an intrinsic motivation system in the brain and formulate two novel hypotheses. The first one is that tonic dopamine acts as a learning progress signal. The second is that this progress signal is directly computed through a hierarchy of microcortical circuits that act both as prediction and metaprediction systems.
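
The central hypothesis in this abstract equates intrinsic motivation with an expected decrease in prediction error. One hedged way to sketch such a learning-progress signal is to compare the mean error over two consecutive windows. The window size, the function names, and the windowed estimate itself are illustrative assumptions, not the authors' model.

```python
def learning_progress(errors, window=5):
    """Decrease in mean prediction error between the previous and latest windows.

    Positive values mean predictions for this activity are improving.
    Returns 0.0 until two full windows of errors are available.
    """
    if len(errors) < 2 * window:
        return 0.0
    older = errors[-2 * window:-window]
    recent = errors[-window:]
    return sum(older) / window - sum(recent) / window


def choose_activity(error_histories, window=5):
    """Pick the activity whose prediction error is dropping fastest."""
    return max(
        error_histories,
        key=lambda name: learning_progress(error_histories[name], window),
    )
```

Under this estimate an already mastered activity (error constantly low) and an unlearnable one (error constantly high) both show zero progress, so the selection favors the activity where the error is currently falling.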