Results 1 - 10 of 281
Combining Manual Feedback with Subsequent MDP Reward Signals for Reinforcement Learning
Cited by 58 (9 self)
As learning agents move from research labs to the real world, it is increasingly important that human users, including those without programming skills, be able to teach agents desired behaviors. Recently, the TAMER framework was introduced for designing agents that can be interactively shaped by human trainers who give only positive and negative feedback signals. Past work on TAMER showed that shaping can greatly reduce the sample complexity required to learn a good policy, can enable lay users to teach agents the behaviors they desire, and can allow agents to learn within a Markov Decision Process (MDP) in the absence of a coded reward function. However, TAMER does not allow this human training to be combined with autonomous learning based on such a coded reward function.
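The combination this abstract points toward can be sketched as action selection over a weighted sum of an MDP action-value function and a learned human-feedback model. Everything below (the `Q` and `H` tables, the `weight` parameter) is an illustrative assumption, not the paper's implementation:

```python
# Hedged sketch: combine a learned human-feedback model H(s, a) with an
# MDP action-value function Q(s, a) when choosing actions.
# The tables and the weighting scheme are illustrative, not the paper's API.

def combined_action(state, actions, Q, H, weight=1.0):
    """Pick the action maximizing Q(s, a) + weight * H(s, a)."""
    return max(actions, key=lambda a: Q[(state, a)] + weight * H[(state, a)])

# Toy tables: the human's feedback disagrees with Q about action 'b'.
Q = {("s0", "a"): 1.0, ("s0", "b"): 0.9}
H = {("s0", "a"): -1.0, ("s0", "b"): 1.0}

print(combined_action("s0", ["a", "b"], Q, H, weight=1.0))  # → 'b'
print(combined_action("s0", ["a", "b"], Q, H, weight=0.0))  # → 'a' (pure MDP)
```

With `weight=0` the agent ignores the trainer entirely; raising `weight` lets human shaping override the coded reward, which is the tension the paper addresses.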
An Effective Personal Mobile Robot Agent Through Symbiotic Human-Robot Interaction
Cited by 56 (25 self)
Several researchers, present authors included, envision personal mobile robot agents that can assist humans in their daily tasks. Despite many advances in robotics, such mobile robot agents still face many limitations in their perception, cognition, and action capabilities. In this work, we propose a symbiotic interaction between robot agents and humans to overcome the robot limitations while allowing robots to also help humans. We introduce a visitor's companion robot agent as a natural task for such symbiotic interaction. The visitor lacks knowledge of the environment but can easily open a door or read a door label, while the mobile robot with no arms cannot open a door and may be confused about its exact location, but can plan paths well through the building and can provide useful relevant information to the visitor. We present this visitor companion task in detail with an enumeration and formalization of the actions of the robot agent in its interaction with the human. We briefly describe the WiFi-based robot localization algorithm and show results of the different levels of human help to the robot during its navigation. We then test the value of robot help to the visitor during the task to understand the tradeoffs in this relationship. Our work has been fully implemented in a mobile robot agent, CoBot, which has successfully navigated for several hours and continues to navigate in our indoor environment.
R-IAC: Robust Intrinsically Motivated Exploration and Active Learning
IEEE Transactions on Autonomous Mental Development, 2009
Cited by 47 (12 self)
Intelligent adaptive curiosity (IAC) was initially introduced as a developmental mechanism allowing a robot to self-organize developmental trajectories of increasing complexity without preprogramming the particular developmental stages. In this paper, we argue that IAC and other intrinsically motivated learning heuristics can be viewed as active learning algorithms that are particularly suited for learning forward models in unprepared sensorimotor spaces with large unlearnable subspaces. We then introduce a novel formulation of IAC, called robust intelligent adaptive curiosity (R-IAC), and show that its performance as an intrinsically motivated active learning algorithm is far superior to IAC in a complex sensorimotor space where only a small subspace is neither unlearnable nor trivial. We also show results in which the learnt forward model is reused in a control scheme. Finally, open source software containing these algorithms, as well as tools to reproduce all the experiments presented in this paper, is made publicly available. Index Terms: Active learning, artificial curiosity, developmental robotics, exploration, intrinsic motivation, sensorimotor learning.
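The intrinsic-motivation rule at the heart of IAC and R-IAC, preferring regions of sensorimotor space where prediction error is dropping fastest, can be sketched as follows. The region bookkeeping, window size, and error histories below are illustrative assumptions, not the paper's formulation:

```python
# Hedged sketch: estimate learning progress per region as the recent drop
# in prediction error, and prefer the region where progress is highest.
# This naturally ignores both trivial (mastered) and unlearnable (noisy)
# subspaces, since neither shows falling error.

def learning_progress(errors, window=2):
    """Mean error over the older window minus mean error over the recent one."""
    recent = errors[-window:]
    older = errors[-2 * window:-window]
    return sum(older) / len(older) - sum(recent) / len(recent)

def choose_region(error_history):
    """Pick the region whose forward model is improving fastest."""
    return max(error_history, key=lambda r: learning_progress(error_history[r]))

history = {
    "learnable": [0.9, 0.8, 0.5, 0.3],   # error falling: high progress
    "trivial":   [0.1, 0.1, 0.1, 0.1],   # already mastered: no progress
    "noise":     [0.9, 0.9, 0.9, 0.9],   # unlearnable: no progress
}
print(choose_region(history))  # → 'learnable'
```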
Efficient reduction for imitation learning
In AISTATS, 2010
Cited by 44 (8 self)
Imitation learning, while applied successfully to many large real-world problems, is typically addressed as a standard supervised learning problem, where the training and testing data are assumed to be i.i.d. This is not true in imitation learning, as the learned policy influences the future test inputs (states) upon which it will be evaluated. We show that this leads to compounding errors and a regret bound that grows quadratically in the time horizon of the task. We propose two alternative algorithms for imitation learning where training occurs over several episodes of interaction. These two approaches share in common that the learner's policy is slowly modified from executing the expert's policy to the learned policy. We show that this leads to stronger performance guarantees, and we demonstrate the improved performance on two challenging problems: training a learner to play (1) a 3D racing game (Super Tux Kart) and (2) Mario Bros., given input images from the games and the corresponding actions taken by a human expert and a near-optimal planner, respectively.
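The idea the two algorithms share, gradually handing control from the expert to the learner, can be sketched as a probabilistic policy mixture. The decay schedule and policy names below are illustrative assumptions, not the paper's exact algorithms:

```python
import random

# Hedged sketch: execute a mixture policy that shifts control from the
# expert to the learner across episodes. The geometric decay schedule
# beta = beta0**episode is an assumption for illustration only.

def mixed_action(state, expert, learner, beta, rng=random):
    """With probability beta act like the expert, otherwise like the learner."""
    return expert(state) if rng.random() < beta else learner(state)

expert = lambda s: "expert_move"
learner = lambda s: "learner_move"

rng = random.Random(0)
beta0 = 0.5
for episode in range(3):
    beta = beta0 ** episode          # episode 0: pure expert, then decays
    action = mixed_action(None, expert, learner, beta, rng)
```

Because early episodes are dominated by the expert, the learner is trained on states the expert visits, which is what breaks the quadratic compounding of errors.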
Robot Learning from Demonstration by Constructing Skill Trees
Cited by 34 (5 self)
We describe CST, an online algorithm for constructing skill trees from demonstration trajectories. CST segments a demonstration trajectory into a chain of component skills, where each skill has a goal and is assigned a suitable abstraction from an abstraction library. These properties permit skills to be improved efficiently using a policy learning algorithm. Chains from multiple demonstration trajectories are merged into a skill tree. We show that CST can be used to acquire skills from human demonstration in a dynamic continuous domain, and from both expert demonstration and learned control sequences on the uBot-5 mobile manipulator.
An Overview of 3D Object Grasp Synthesis Algorithms
2011
Cited by 31 (3 self)
This overview presents computational algorithms for generating 3D object grasps with autonomous multi-fingered robotic hands. Robotic grasping has been an active research subject for decades, and a great deal of effort has been spent on grasp synthesis algorithms. Existing surveys focus on the mechanics of grasping and the finger-object contact interactions [7] or on robot hand design and control [1]. Robot grasp synthesis algorithms were reviewed in [63], but since then important progress has been made toward applying learning techniques to the grasping problem. This overview covers analytical as well as empirical grasp synthesis approaches.
Constructing skill trees for reinforcement learning agents from demonstration trajectories
In Advances in Neural Information Processing Systems (NIPS), 2010
Cited by 30 (7 self)
We introduce CST, an algorithm for constructing skill trees from demonstration trajectories in continuous reinforcement learning domains. CST uses a changepoint detection method to segment each trajectory into a skill chain by detecting either a change in the appropriate abstraction or that a segment is too complex to model as a single skill. The skill chains from each trajectory are then merged to form a skill tree. We demonstrate that CST constructs an appropriate skill tree that can be further refined through learning in a challenging continuous domain, and that it can be used to segment demonstration trajectories on a mobile manipulator into chains of skills where each skill is assigned an appropriate abstraction.
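The segmentation step can be illustrated with a much simpler stand-in: scan a 1-D trajectory and start a new segment whenever a running mean model stops explaining the data. CST itself uses Bayesian changepoint detection over abstraction-specific models; the threshold rule below is only a toy analogue:

```python
# Hedged sketch: split a 1-D trajectory into segments when the next sample
# deviates from the current segment's running mean by more than a threshold.
# A stand-in for changepoint detection, not CST's actual statistical model.

def segment(trajectory, threshold):
    segments, current = [], [trajectory[0]]
    for x in trajectory[1:]:
        mean = sum(current) / len(current)
        if abs(x - mean) > threshold:   # model no longer explains the data
            segments.append(current)
            current = [x]               # begin a new "skill" segment
        else:
            current.append(x)
    segments.append(current)
    return segments

traj = [0.0, 0.1, -0.1, 5.0, 5.2, 4.9]
print(segment(traj, threshold=1.0))  # → [[0.0, 0.1, -0.1], [5.0, 5.2, 4.9]]
```

In CST, the per-segment model is not a running mean but a value-function fit under one of several candidate abstractions, and segments from multiple trajectories are then merged into a tree.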
Learning to select and generalize striking movements in robot table tennis
In Proceedings of the AAAI 2012 Fall Symposium on Robots that Learn Interactively from Human Teachers, 2012
Cited by 28 (12 self)
Learning new motor tasks from physical interactions is an important goal for both robotics and machine learning. However, when moving beyond basic skills, most monolithic machine learning approaches fail to scale. For more complex skills, methods tailored for the domain of skill learning are needed. In this paper, we take the task of learning table tennis as an example and present a new framework that allows a robot to learn cooperative table tennis from physical interaction with a human. The robot first learns a set of elementary table tennis hitting movements from a human table tennis teacher by kinesthetic teach-in; these are compiled into a set of motor primitives represented by dynamical systems. The robot subsequently generalizes these movements to a wider range of situations using our mixture of motor primitives approach. The resulting policy enables the robot to select appropriate motor primitives as well as to generalize between them. Finally, the robot plays with a human table tennis partner and learns online to improve its behavior. We show that the resulting setup is capable of playing table tennis using an anthropomorphic robot arm.
Learning Trajectory Preferences for Manipulators via Iterative Improvement
Cited by 25 (7 self)
We consider the problem of learning good trajectories for manipulation tasks. This is challenging because the criterion defining a good trajectory varies with users, tasks, and environments. In this paper, we propose a co-active online learning framework for teaching robots the preferences of their users for object manipulation tasks. The key novelty of our approach lies in the type of feedback expected from the user: the human user does not need to demonstrate optimal trajectories as training data, but merely needs to iteratively provide trajectories that slightly improve over the trajectory currently proposed by the system. We argue that this co-active preference feedback can be more easily elicited from the user than demonstrations of optimal trajectories, while the theoretical regret bounds of our algorithm nevertheless match the asymptotic rates of optimal-trajectory algorithms. We demonstrate the generalization ability of our algorithm on a variety of tasks for which the preferences were influenced not only by the object being manipulated but also by the surrounding environment.
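The co-active update behind such frameworks is typically a preference-perceptron step: move the weight vector toward the features of the user's improved trajectory and away from those of the robot's proposed one. The feature vectors and linear scoring function below are illustrative assumptions, not the paper's exact feature design:

```python
# Hedged sketch of a co-active preference-perceptron update: the user
# returns a slightly better trajectory, and the weight vector shifts
# toward its features. Features here are hand-made toy vectors.

def score(w, features):
    """Linear utility of a trajectory's feature vector under weights w."""
    return sum(wi * fi for wi, fi in zip(w, features))

def coactive_update(w, proposed_feats, improved_feats):
    """w <- w + (phi(improved) - phi(proposed))."""
    return [wi + (imp - pro)
            for wi, pro, imp in zip(w, proposed_feats, improved_feats)]

w = [0.0, 0.0]
proposed = [1.0, 0.0]   # features of the robot's current trajectory
improved = [0.0, 1.0]   # features of the user's slightly better trajectory
w = coactive_update(w, proposed, improved)
# After one update the improved trajectory outscores the proposed one.
```

Notably, the user never has to supply an optimal trajectory: each update only needs a direction of improvement, which is what makes the feedback cheap to elicit.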
Designing robot learners that ask good questions
In Proc. of the Intl. Conf. on Human-Robot Interaction (HRI), 2012
Cited by 22 (1 self)
Programming new skills on a robot should take minimal time and effort. One approach to achieving this goal is to allow the robot to ask questions. This idea, called Active Learning, has recently caught a lot of attention in the robotics community. However, it has not been explored from a human-robot interaction perspective. In this paper, we identify three types of questions (label, demonstration, and feature queries) and discuss how a robot can use these while learning new skills. Then, we present an experiment on human question asking which characterizes the extent to which humans use these question types. Finally, we evaluate the three question types within a human-robot teaching interaction. We investigate the ease with which different types of questions are answered and whether there is a general preference for one type of question over another. Based on our findings from both experiments, we provide guidelines for designing question-asking behaviors on a robot learner.