Results 1 - 10 of 48
Learning to generate articulated behavior through the bottom-up and the top-down interaction processes
- NEURAL NETW 16: 11–23
, 2003
Cited by 33 (16 self)
A novel hierarchical neural network architecture for sensory-motor learning and behavior generation is proposed. Two levels of forward model neural networks are operated on different time scales while parametric interactions are allowed between the two network levels in the bottom-up and top-down directions. The models are examined through experiments of behavior learning and generation using a real robot arm equipped with a vision system. The results of the learning experiments showed that the behavioral patterns are learned by self-organizing the behavioral primitives in the lower level and combining the primitives sequentially in the higher level. The results contrast with prior work.
Multiple model-based reinforcement learning
- Neural Computation
, 2002
Cited by 32 (1 self)
We propose a modular reinforcement learning architecture for non-linear, non-stationary control tasks, which we call multiple model-based reinforcement learning (MMRL). The basic idea is to decompose a complex task into multiple domains in space and time based on the predictability of the environmental dynamics. The system is composed of multiple modules, each of which consists of a state prediction model and a reinforcement learning controller. The “responsibility signal,” which is given by the softmax function of the prediction errors, is used to weight the outputs of multiple modules as well as to gate the learning of the prediction models and the reinforcement learning controllers. We formulate MMRL for both discrete-time, finite state case and continuous-time, continuous state case. The performance of MMRL was demonstrated for discrete case in a non-stationary hunting task in a grid world and for continuous case in a non-linear, non-stationary control task of swinging up a pendulum with variable physical parameters.
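The responsibility signal described in this abstract can be illustrated with a minimal sketch. It assumes squared prediction errors per module and a fixed scale parameter `sigma`; the function names `responsibility` and `blend_outputs` and the example numbers are hypothetical and are not taken from the paper.

```python
import math

def responsibility(squared_errors, sigma=1.0):
    """Softmax over negative scaled prediction errors (sigma is an assumed scale)."""
    scores = [-e / (2.0 * sigma ** 2) for e in squared_errors]
    top = max(scores)                      # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [x / total for x in exps]

def blend_outputs(module_outputs, weights):
    """Weight each module's scalar control output by its responsibility."""
    return sum(w * u for w, u in zip(weights, module_outputs))

# Hypothetical example: module 0 predicts the current dynamics best.
lam = responsibility([0.1, 2.0, 5.0])
action = blend_outputs([1.0, -1.0, 0.5], lam)
```

The same weights could also scale each module's learning rate, which is one way to read the "gating" of learning mentioned in the abstract.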
Self-Organization of Distributedly Represented Multiple Behavior Schemata in a Mirror System: . . .
, 2004
Cited by 30 (7 self)
The current paper reviews a connectionist model, the recurrent neural network with parametric biases (RNNPB), in which multiple behavior schemata can be learned by the network in a distributed manner. The parametric biases in the network play an essential role in both generating and recognizing behavior patterns. They act as a mirror system by means of self-organizing adequate memory structures. Three different robot experiments are reviewed: robot and user interactions; learning and generating different types of dynamic patterns; and linguistic-behavior binding. The hallmark of this study is explaining how self-organizing internal structures can contribute to generalization in learning, and diversity in behavior generation, in the proposed distributed representation scheme.
Reinforcement learning by reward-weighted regression for operational space control
- In: Proceedings of the International Conference on Machine Learning (ICML
, 2007
Cited by 17 (11 self)
Many robot control problems of practical importance, including operational space control, can be reformulated as immediate reward reinforcement learning problems. However, few of the known optimization or reinforcement learning algorithms can be used in online learning control for robots, as they are either prohibitively slow, do not scale to interesting domains of complex robots, or require trying out policies generated by random search, which are infeasible for a physical system. Using a generalization of the EM-based reinforcement learning framework suggested by Dayan & Hinton, we reduce the problem of learning with immediate rewards to a reward-weighted regression problem with an adaptive, integrated reward transformation for faster convergence. The resulting algorithm is efficient, learns smoothly without dangerous jumps in solution space, and works well in applications of complex high degree-of-freedom robots.
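The idea of reducing immediate-reward learning to a weighted regression can be sketched for a one-parameter linear policy u = theta * x. This is only an illustration under stated assumptions: the exponential reward transformation with a fixed `tau` stands in for the adaptive transformation the abstract mentions, and the function name `reward_weighted_fit` and the sample data are hypothetical.

```python
import math

def reward_weighted_fit(xs, us, rewards, tau=1.0):
    """Weighted least squares for u = theta * x, with weights exp(tau * reward).

    The fixed exponential transformation is an assumption made for this sketch.
    """
    top = max(rewards)                                  # shift rewards to avoid overflow
    weights = [math.exp(tau * (r - top)) for r in rewards]
    num = sum(w * x * u for w, x, u in zip(weights, xs, us))
    den = sum(w * x * x for w, x in zip(weights, xs))
    return num / den

# Two tried actions in the same state; the second one earned far more reward.
theta_equal = reward_weighted_fit([1.0, 1.0], [0.0, 2.0], [0.0, 0.0])
theta_skewed = reward_weighted_fit([1.0, 1.0], [0.0, 2.0], [0.0, 10.0])
```

With equal rewards the fit is an ordinary least-squares average of the tried actions; with unequal rewards it is pulled toward the better-rewarded action, so the policy moves without a discrete jump.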
Relativized Options: Choosing the Right Transformation
- Proceedings of the Twentieth International Conference on Machine Learning
, 2003
Cited by 15 (6 self)
Relativized options combine model minimization methods and a hierarchical reinforcement learning framework to derive compact reduced representations of a related family of tasks. Relativized options are defined without an absolute frame of reference, and an option's policy is transformed suitably based on the circumstances under which the option is invoked. In earlier work we addressed the issue of learning the option policy online. In this article we develop an algorithm for choosing, from among a set of candidate transformations, the right transformation for each member of the family of tasks.
Isotropic Sequence Order Learning
, 2003
Cited by 12 (8 self)
In this article, we present an isotropic unsupervised algorithm for temporal sequence learning. No special reward signal is used such that all inputs are completely isotropic. All input signals are bandpass filtered before converging onto a linear output neuron. All synaptic weights change according to the correlation of bandpass-filtered inputs with the derivative of the output. We investigate the algorithm in an open- and a closed-loop condition, the latter being defined by embedding the learning system into a behavioral feedback loop. In the open-loop condition, we find that the linear structure of the algorithm allows analytically calculating the shape of the weight change, which is strictly heterosynaptic and follows the shape of the weight change curves found in spike-time-dependent plasticity. Furthermore, we show that synaptic weights stabilize automatically when no more temporal differences exist between the inputs without additional normalizing measures. In the second part of this study, the algorithm is placed in an environment that leads to a closed sensorimotor loop. To this end, a robot is programmed with a prewired retraction reflex reaction in response to collisions. Through isotropic sequence order (ISO) learning, the robot achieves collision avoidance by learning the correlation between its early range-finder signals and the later occurring collision signal. Synaptic weights stabilize at the end of learning as theoretically predicted. Finally, we discuss the relation of ISO learning with other drive reinforcement models and with the commonly used temporal difference learning algorithm. This study is followed up by a mathematical analysis of the closed-loop situation in the companion article in this issue, “ISO Learning Approximates a Solution to the Inverse-Controller Problem in an Unsupervised Behavioral Paradigm” (pp. 865–884).
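The weight update described in this abstract (each weight changes with the product of its filtered input and the derivative of the output) can be sketched in discrete time. This is a rough illustration, not the authors' implementation: the bandpass filters are replaced by simple exponentially decaying traces, the derivative is a one-step finite difference, and the names `iso_learning`, `decaying_trace`, and the learning rate `mu` are hypothetical.

```python
def iso_learning(filtered_inputs, weights, mu=0.01):
    """Apply w_j += mu * u_j[t] * (v[t] - v[t-1]) with v[t] = sum_j w_j * u_j[t]."""
    w = list(weights)
    prev_v = 0.0
    for t in range(len(filtered_inputs[0])):
        u_t = [u[t] for u in filtered_inputs]
        v = sum(wj * uj for wj, uj in zip(w, u_t))    # linear output neuron
        dv = v - prev_v                               # finite-difference derivative
        w = [wj + mu * uj * dv for wj, uj in zip(w, u_t)]
        prev_v = v
    return w

def decaying_trace(onset, length=60, decay=0.9):
    """Stand-in for a bandpass-filtered pulse (an assumption of this sketch)."""
    return [decay ** (t - onset) if t >= onset else 0.0 for t in range(length)]

# Input 0 plays the role of the late reflex signal, input 1 the early predictor.
w_early_first = iso_learning([decaying_trace(5), decaying_trace(0)], [1.0, 0.0])
# Reversed order: the second input now arrives after the reflex signal.
w_late_second = iso_learning([decaying_trace(0), decaying_trace(5)], [1.0, 0.0])
```

In this toy setting the weight of the second input grows when it precedes the reflex signal and shrinks when it follows it, which mirrors the order sensitivity the abstract compares to spike-time-dependent plasticity curves.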
Action Coordination in Groups and Individuals: Learning Anticipatory Control
- J. Experimental Psychology: Learning, Memory, and Cognition
, 2003
Cited by 10 (1 self)
When individuals act alone, they can internally coordinate the actions at hand. Such coordination is not feasible when individuals act together in a group. The present research examines to what extent groups encounter specific challenges when acting jointly and whether these challenges impede extending planning into the future. Individuals and groups carried out a tracking task that required learning a new anticipatory control strategy. The results show that groups face additional demands that are harder to overcome when planning needs to be extended into the future. Information about others' actions is a necessary condition for groups to effectively learn to extend their plans. Possible mechanisms for exerting and learning anticipatory control are discussed. Researchers in the area of action planning and action control use a host of diverse tasks to investigate the cognitive functions that enable one to coordinate action alternatives. Examples include selecting and programming arbitrary actions in response to arbitrary stimuli (Hommel & Prinz, 1997), switching between arbitrary tasks (Allport, 1993; Mayr & Keele, 2000; Rogers & Monsell, 1995), and carrying out two tasks at the same time (Meyer &
Understanding mirror neurons: a bio-robotic approach
- INTERACTION STUDIES
, 2006
Cited by 8 (3 self)
This paper reports about our investigation on action understanding in the brain. We review recent results of the neurophysiology of the mirror system in the monkey. Based on these observations we propose a model of the brain systems responsible for action recognition, in which the link between object affordances and action understanding is explicitly considered. To support our hypothesis we describe two experiments where some aspects of the model have been implemented. In the first experiment an action recognition system is trained by using data recorded from human movements which include kinesthetic, tactile, and visual information. In the second experiment, the model is partially implemented on a humanoid robot which learns to mimic simple actions performed by a human subject on different objects. These experiments show that motor information can have a significant role in interpretation of actions and that a mirror-like representation can be developed autonomously as a result of the interaction between an individual and the environment.
Integration of Action and Language Knowledge: A Roadmap for Developmental Robotics
, 2010
Cited by 7 (2 self)
This position paper proposes that the study of embodied cognitive agents, such as humanoid robots, can advance our understanding of the cognitive development of complex sensorimotor, linguistic and social learning skills. This in turn will benefit the design of cognitive robots capable of learning to handle and manipulate objects and tools autonomously, to cooperate and communicate with other robots and humans, and to adapt their abilities to changing internal, environmental, and social conditions. Four key areas of research challenges are discussed, specifically for the issues related to the understanding of: (i) how agents learn and represent compositional actions; (ii) how agents learn and represent compositional lexicons; (iii) the dynamics of social interaction and learning; and (iv) how compositional action and language representations are integrated to bootstrap the cognitive system. The review of specific issues and progress in these areas is then translated into a practical roadmap based on a series of milestones. These milestones provide a possible set of cognitive robotics goals and test-scenarios, thus acting as a research roadmap for future work on cognitive developmental robotics.
ISO-learning approximates a solution to the inverse-controller problem in an unsupervised behavioural paradigm
, 2003
Cited by 6 (5 self)
In this article we will analytically demonstrate that this process can be understood in terms of control theory, showing that the system learns the inverse controller of its own reflex. Thereby this system is able to learn a simple form of feed-forward motor control.