Results 1 - 10 of 66
Intrinsically motivated learning of hierarchical collections of skills
2004
Cited by 80 (15 self)
Abstract
Humans and other animals often engage in activities for their own sakes rather than as steps toward solving practical problems. Psychologists call these intrinsically motivated behaviors. What we learn during intrinsically motivated behavior is essential for our development as competent autonomous entities able to efficiently solve a wide range of practical problems as they arise. In this paper we present initial results from a computational study of intrinsically motivated learning aimed at allowing artificial agents to construct and extend hierarchies of reusable skills that are needed for competent autonomy. At the core of the model are recent theoretical and algorithmic advances in computational reinforcement learning, specifically, new concepts related to skills and new learning algorithms for learning with skill hierarchies.
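The options framework of Sutton, Precup and Singh (1999) is one common way to formalize reusable "skills" in reinforcement learning; that this paper uses exactly this formalism is an assumption here. A minimal sketch of an option as an initiation set, a policy, and a termination condition, run in a hypothetical 10-state corridor (`Option`, `corridor_step`, `execute`, `go_to_door` are illustrative names, not from the paper):

```python
import random
from dataclasses import dataclass
from typing import Callable, FrozenSet, List


@dataclass(frozen=True)
class Option:
    """A temporally extended skill: where it may start, how it acts, when it stops."""
    initiation: FrozenSet[int]           # states in which the option may be invoked
    policy: Callable[[int], int]         # maps a state to a primitive action (-1 or +1)
    termination: Callable[[int], float]  # probability of stopping on reaching a state


def corridor_step(state: int, action: int, n_states: int = 10) -> int:
    """Hypothetical 1-D corridor: move by `action`, clamped at the walls."""
    return min(max(state + action, 0), n_states - 1)


def execute(option: Option, state: int, rng: random.Random, max_steps: int = 100) -> List[int]:
    """Follow the option's policy until its termination condition fires."""
    if state not in option.initiation:
        raise ValueError(f"option cannot be initiated in state {state}")
    trajectory = [state]
    for _ in range(max_steps):
        state = corridor_step(state, option.policy(state))
        trajectory.append(state)
        if rng.random() < option.termination(state):
            break
    return trajectory


def go_right(state: int) -> int:
    return +1


def at_door(state: int) -> float:
    return 1.0 if state == 5 else 0.0


# Hypothetical "walk to the door at state 5" skill.
go_to_door = Option(initiation=frozenset(range(5)), policy=go_right, termination=at_door)

print(execute(go_to_door, 0, random.Random(0)))  # [0, 1, 2, 3, 4, 5]
```

Because an option's policy may itself choose among other options rather than primitive actions, the same structure nests naturally into skill hierarchies.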
Making Working Memory Work: A Computational Model of Learning in the Prefrontal Cortex and Basal Ganglia
2005
Cited by 63 (4 self)
Abstract
The prefrontal cortex has long been thought to subserve both working memory (the holding of information online for processing) and executive functions (deciding how to manipulate working memory and perform processing). Although many computational models of working memory have been developed, the mechanistic basis of executive function remains elusive, often amounting to a homunculus. This article presents an attempt to deconstruct this homunculus through powerful learning mechanisms that allow a computational model of the prefrontal cortex to control both itself and other brain areas in a strategic, task-appropriate manner. These learning mechanisms are based on subcortical structures in the midbrain, basal ganglia, and amygdala, which together form an actor-critic architecture. The critic system learns which prefrontal representations are task relevant and trains the actor, which in turn provides a dynamic gating mechanism for controlling working memory updating. Computationally, the learning mechanism is designed to simultaneously solve the temporal and structural credit assignment problems. The model’s performance compares favorably with standard backpropagation-based temporal learning mechanisms on the challenging 1-2-AX working memory task and other benchmark working memory tasks.
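The actor-critic division of labor can be illustrated with a generic, textbook single-state version (not the authors' prefrontal/basal ganglia model): the critic's prediction error `delta` updates both the critic's value estimate and the actor's action preferences. The learning rates, the two-action reward table, and all function names are illustrative assumptions.

```python
import math
import random
from typing import List, Sequence, Tuple


def softmax_choice(prefs: Sequence[float], rng: random.Random) -> int:
    """Sample an action index with probability proportional to exp(preference)."""
    top = max(prefs)
    weights = [math.exp(p - top) for p in prefs]  # shift for numerical stability
    return rng.choices(range(len(prefs)), weights=weights)[0]


def actor_critic(rewards: Sequence[float], n_trials: int = 2000,
                 alpha: float = 0.1, beta: float = 0.1,
                 seed: int = 0) -> Tuple[float, List[float]]:
    """Single-state actor-critic: the critic's prediction error trains both itself and the actor."""
    rng = random.Random(seed)
    value = 0.0                     # critic: predicted reward in the (only) state
    prefs = [0.0] * len(rewards)    # actor: one preference per action
    for _ in range(n_trials):
        a = softmax_choice(prefs, rng)
        delta = rewards[a] - value  # TD error for a one-step episode (no successor state)
        value += alpha * delta      # critic moves toward the obtained reward
        prefs[a] += beta * delta    # actor reinforces actions that beat the critic's prediction
    return value, prefs


value, prefs = actor_critic(rewards=[0.0, 1.0])
p_best = 1.0 / (1.0 + math.exp(prefs[0] - prefs[1]))
print(round(p_best, 2))  # probability of choosing the rewarded action, close to 1
```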
Interactions Between Frontal Cortex and Basal Ganglia in Working Memory: A Computational Model
2000
Cited by 58 (8 self)
Abstract
The frontal cortex and basal ganglia interact via a relatively well-understood and elaborate system of interconnections. In the context of motor function, these interconnections can be understood as disinhibiting or "releasing the brakes" on frontal motor action plans: the basal ganglia detect appropriate contexts for performing motor actions, and enable the frontal cortex to execute such actions at the appropriate time. We build on this idea in the domain of working memory through the use of computational neural network models of this circuit. In our model, the frontal cortex exhibits robust active maintenance, while the basal ganglia contribute a selective, dynamic gating function that enables frontal memory representations to be rapidly updated in a task-relevant manner. We apply the model to a novel version of the continuous performance task (CPT) that requires subroutine-like selective working memory updating, and compare and contrast our model with other existing models and th...
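The contrast between maintenance and gated updating can be shown with a deliberately simple one-slot memory: the slot is overwritten only when the gate opens and otherwise keeps its contents through distractors. In the model the gate is provided by the basal ganglia circuit; here the gating rule `is_context` and the stimulus stream are hand-written, hypothetical stand-ins.

```python
from typing import Callable, Iterable, List, Optional


def gated_memory(stimuli: Iterable[str], gate: Callable[[str], bool]) -> List[Optional[str]]:
    """One memory slot that is updated only when the gate opens."""
    memory: Optional[str] = None
    trace: List[Optional[str]] = []
    for stimulus in stimuli:
        if gate(stimulus):
            memory = stimulus   # gate open: rapid, selective update
        trace.append(memory)    # gate closed: contents are actively maintained
    return trace


def is_context(stimulus: str) -> bool:
    """Hypothetical gating rule: only the context cues "1" and "2" are stored."""
    return stimulus in {"1", "2"}


print(gated_memory(["1", "A", "X", "2", "B", "Y"], is_context))
# ['1', '1', '1', '2', '2', '2']
```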
Temporal Difference Model Reproduces Anticipatory Neural Activity
2000
Cited by 31 (1 self)
Abstract
Introduction
In a famous experiment by Pavlov (1927), a dog was trained with the ringing of a bell (stimulus) followed by food delivery (reinforcer). In the first trial, the animal salivated when food was presented. After several trials, salivation started when the bell was rung. This finding suggests that the salivation response following the bell ring reflects anticipation of food delivery. A large body of experimental evidence led to the hypothesis that Pavlovian learning is dependent upon the degree of unpredictability of the reinforcer (Rescorla & Wagner, 1972; Dickinson, 1980). According to this hypothesis, reinforcers become progressively less efficient for behavioral adaptation as their predictability grows during the course of learning. The difference between the actual occurrence and the prediction of the reinforcer is usually referred to as the "error" in the reinforcer prediction. This concept has been employed in the temporal-difference model (TD model) of Pavlovi...
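The transfer of the prediction error from the reinforcer to the stimulus can be reproduced with a tabular TD(0) sketch. It assumes one weight per time step after stimulus onset (a "complete serial compound" representation), a bell at step 5, food at step 10, and no discounting; these settings and the function name are illustrative, not the paper's parameters.

```python
from typing import List


def td_conditioning(n_trials: int = 500, T: int = 15, cs: int = 5, us: int = 10,
                    alpha: float = 0.2, gamma: float = 1.0) -> List[List[float]]:
    """TD(0) on repeated trials; returns the TD error at every time step of every trial."""
    w = [0.0] * (T - cs)        # one weight per time step since stimulus (bell) onset
    reward = [0.0] * (T + 1)
    reward[us] = 1.0            # food arrives at time step `us`

    def value(t: int) -> float:
        # Before the bell (and after the trial ends) nothing predicts food.
        return w[t - cs] if cs <= t < T else 0.0

    history = []
    for _ in range(n_trials):
        deltas = []
        for t in range(T):
            delta = reward[t + 1] + gamma * value(t + 1) - value(t)
            deltas.append(delta)
            if t >= cs:
                w[t - cs] += alpha * delta   # only stimulus-driven states learn
        history.append(deltas)
    return history


h = td_conditioning()
# The transition 9 -> 10 is food delivery; the transition 4 -> 5 is bell onset.
print(round(h[0][9], 3), round(h[0][4], 3))    # 1.0 0.0  (first trial: error at food)
print(round(h[-1][9], 3), round(h[-1][4], 3))  # 0.0 1.0  (after learning: error at bell)
```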
Intuition: a social cognitive neuroscience approach
Psychological Bulletin, 2000
Cited by 29 (7 self)
Abstract
This review proposes that implicit learning processes are the cognitive substrate of social intuition. This hypothesis is supported by (a) the conceptual correspondence between implicit learning and social intuition (nonverbal communication) and (b) a review of relevant neuropsychological (Huntington's and Parkinson's disease), neuroimaging, neurophysiological, and neuroanatomical data. It is concluded that the caudate and putamen, in the basal ganglia, are central components of both intuition and implicit learning, supporting the proposed relationship. Parallel, but distinct, processes of judgment and action are demonstrated at each of the social, cognitive, and neural levels of analysis. Additionally, explicit attempts to learn a sequence can interfere with implicit learning. The possible relevance of the computations of the basal ganglia to emotional appraisal, automatic evaluation, script processing, and decision making is discussed.
These "feelings" have an efficiency of operation which it is impossible for thought to match. Even our most highly intellectualized operations depend upon them as a "fringe" by which to guide our inferential movements. They give us our sense of rightness and wrongness, of what to select and emphasize and follow up, and what...
A computational model of how the basal ganglia produce sequences
Journal of Cognitive Neuroscience, 1998
Cited by 26 (0 self)
Abstract
We propose a systems-level computational model of the basal ganglia based closely on known anatomy and physiology. First, we assume that the thalamic targets, which relay ascending information to cortical action and planning areas, are tonically inhibited by the basal ganglia. Second, we assume that the output stage of the basal ganglia, the internal segment of the globus pallidus (GPi), selects a single action from several competing actions via lateral interactions. Third, we propose that a form of local working memory exists in the form of reciprocal connections between the external globus pallidus (GPe) and the subthalamic nucleus (STN). As a test of the model, the system was trained to learn a sequence of states that required the context of previous actions. The striatum, which was assumed to represent a conjunction of cortical states, directly selected the action in the GP during training. The STN-to-GP connection strengths were modified by an associative learning...
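Selecting a single action through lateral interactions can be sketched with generic rate units that inhibit one another. This is a standard winner-take-all circuit, not the equations of the proposed model; the inhibition weight, step size, and input values are assumptions.

```python
from typing import List, Sequence


def winner_take_all(inputs: Sequence[float], w_inh: float = 2.0,
                    dt: float = 0.1, steps: int = 500) -> List[float]:
    """Rate units with mutual (lateral) inhibition, integrated with Euler steps."""
    x = [0.0] * len(inputs)
    for _ in range(steps):
        total = sum(x)
        # Each unit is driven by its input minus inhibition from all the other units.
        drive = [max(0.0, inp - w_inh * (total - xi)) for inp, xi in zip(inputs, x)]
        x = [xi + dt * (d - xi) for xi, d in zip(x, drive)]
    return x


activity = winner_take_all([0.5, 0.9, 0.7])
print([round(a, 3) for a in activity])  # [0.0, 0.9, 0.0]
```

With inhibition stronger than self-excitation (`w_inh > 1`), states with more than one active unit are unstable, so the unit with the largest input ends up as the only active one.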
Temporal Difference Learning in Continuous Time and Space
Advances in Neural Information Processing Systems 8, 1996
Cited by 24 (6 self)
Abstract
A continuous-time, continuous-state version of the temporal difference (TD) algorithm is derived in order to facilitate the application of reinforcement learning to real-world control tasks and neurobiological modeling. An optimal nonlinear feedback control law was also derived using the derivatives of the value function. The performance of the algorithms was tested in a task of swinging up a pendulum with limited torque. Both the "critic" that specifies the paths to the upright position and the "actor" that works as a nonlinear feedback controller were successfully implemented by radial basis function (RBF) networks.
1 INTRODUCTION
The temporal-difference (TD) algorithm (Sutton, 1988) for delayed reinforcement learning has been applied to a variety of tasks, such as robot navigation, board games, and biological modeling (Houk et al., 1994). Elucidation of the relationship between TD learning and dynamic programming (DP) has provided good theoretical insights (Barto et al., 1995). How...
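In Doya's continuous-time formulation the TD error is commonly written as delta(t) = r(t) - V(t)/tau + dV/dt, where tau is the time constant for discounting future reward. A sketch using a backward difference for dV/dt follows; the exact discretization and the function name are assumptions, not a transcription of this paper.

```python
def continuous_td_error(r: float, v_now: float, v_prev: float, dt: float, tau: float) -> float:
    """Backward-difference estimate of delta(t) = r(t) - V(t)/tau + dV/dt.

    v_now is V(t), v_prev is V(t - dt), tau is the discounting time constant.
    """
    dv_dt = (v_now - v_prev) / dt
    return r - v_now / tau + dv_dt


# With a constant reward r, the discounted value integral of exp(-s/tau) * r ds equals tau * r,
# so a correct value function gives (numerically) zero TD error.
r, tau, dt = 1.0, 2.0, 0.1
print(abs(continuous_td_error(r, tau * r, tau * r, dt, tau)) < 1e-9)  # True
```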
Models of the cerebellum and motor learning
Behavioral and Brain Sciences, 1996
Cited by 23 (5 self)
Abstract
Houk, J.C., Buckingham, J.T., & Barto, A.G. (1996). Models of the cerebellum and motor learning. Behavioral and Brain Sciences 19
From recurrent choice to skill learning: A reinforcement-learning model
Journal of Experimental Psychology: General, 2006
Cited by 22 (6 self)
Abstract
The authors propose a reinforcement-learning mechanism as a model for recurrent choice and extend it to account for skill learning. The model was inspired by recent research in neurophysiological studies of the basal ganglia and provides an integrated explanation of recurrent choice behavior and skill learning. The behavior includes effects of differential probabilities, magnitudes, variabilities, and delay of reinforcement. The model can also produce the violation of independence, preference reversals, and the goal gradient of reinforcement in maze learning. An experiment was conducted to study learning of action sequences in a multistep task. The fit of the model to the data demonstrated its ability to account for complex skill learning. The advantages of incorporating the mechanism into a larger cognitive architecture are discussed.
Layered Control Architectures in Robots and Vertebrates
Adaptive Behavior, 1998
Cited by 20 (5 self)
Abstract
We review recent research in robotics, neuroscience, evolutionary neurobiology, and ethology with the aim of highlighting some points of agreement and convergence. Specifically, we compare Brooks' (1986) subsumption architecture for robot control with research in neuroscience demonstrating layered control systems in vertebrate brains, and with research in ethology that emphasizes the decomposition of control into multiple, intertwined behavior systems. From this perspective we then describe interesting parallels between the subsumption architecture and the natural layered behavior system that determines defense reactions in the rat. We then consider the action selection problem for robots and vertebrates and argue that, in addition to subsumption-like conflict resolution mechanisms, the vertebrate nervous system employs specialized selection mechanisms located in a group of central brain structures termed the basal ganglia. We suggest that similar specialized switching mechanisms might...
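A subsumption-like conflict resolution scheme can be sketched as fixed-priority arbitration: each layer either proposes a command or stays silent, and the highest-priority layer that speaks suppresses those below it. This simplifies Brooks' architecture, which wires suppression and inhibition into specific connections, and the layer names and sensor keys are hypothetical.

```python
from typing import Callable, Dict, List, Optional

Behavior = Callable[[Dict[str, bool]], Optional[str]]


def arbitrate(layers: List[Behavior], sensors: Dict[str, bool]) -> Optional[str]:
    """Fixed-priority arbitration: the first (highest) layer that wants control wins."""
    for behavior in layers:
        command = behavior(sensors)
        if command is not None:
            return command  # this layer suppresses every layer below it
    return None


def avoid(sensors: Dict[str, bool]) -> Optional[str]:
    return "turn_away" if sensors.get("obstacle") else None


def seek_food(sensors: Dict[str, bool]) -> Optional[str]:
    return "approach_food" if sensors.get("food_visible") else None


def wander(sensors: Dict[str, bool]) -> Optional[str]:
    return "wander"


layers = [avoid, seek_food, wander]  # highest priority first
print(arbitrate(layers, {"obstacle": True, "food_visible": True}))  # turn_away
print(arbitrate(layers, {"food_visible": True}))                    # approach_food
print(arbitrate(layers, {}))                                        # wander
```

The abstract's point is that vertebrates appear to add a dedicated, central selection mechanism (the basal ganglia) on top of this kind of distributed, priority-based resolution.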

