Results 1–10 of 20
Transfer in variable-reward hierarchical reinforcement learning
In: Proc. of the Inductive Transfer Workshop at NIPS, 2005
Cited by 33 (3 self)
Abstract:
We consider the problem of transferring learned knowledge among Markov Decision Processes that share the same transition dynamics but different reward functions. In particular, we assume that reward functions are described as linear combinations of reward features, and that only the feature weights vary among MDPs. We introduce Variable-Reward Hierarchical Reinforcement Learning (VRHRL), which leverages a cache of learned policies to speed up learning in this setting. With suitable design of the task hierarchy, VRHRL can achieve better transfer than its non-hierarchical counterpart.
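The abstract above assumes rewards that are linear in a fixed set of features, with only the weights varying across MDPs. A minimal sketch of why that assumption enables transfer (all names and numbers below are illustrative, not the paper's implementation): if r(s, a) = w · φ(s, a), then Q-values decompose as Q(s, a) = w · ψ(s, a), where the discounted feature expectations ψ depend only on the shared dynamics, so a new task reuses ψ with new weights.

```python
import numpy as np

def q_values(psi, w):
    """psi: (n_actions, n_features) discounted feature expectations; w: (n_features,) weights."""
    return psi @ w

psi = np.array([[1.0, 0.0],   # action 0 mostly accumulates feature 0
                [0.0, 1.0]])  # action 1 mostly accumulates feature 1

w_old = np.array([1.0, 0.2])  # source task weights feature 0 highly
w_new = np.array([0.1, 1.0])  # target task weights feature 1 highly

print(np.argmax(q_values(psi, w_old)))  # 0: action 0 is best in the source task
print(np.argmax(q_values(psi, w_new)))  # 1: same psi, new weights flip the choice
```

The cached quantity (here `psi`) is learned once; transfer to a new reward function is a single matrix-vector product rather than learning from scratch.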
A Survey of Multi-Objective Sequential Decision-Making
Cited by 16 (5 self)
Abstract:
Sequential decision-making problems with multiple objectives arise naturally in practice and pose unique challenges for research in decision-theoretic planning and learning, which has largely focused on single-objective settings. This article surveys algorithms designed for sequential decision-making problems with multiple objectives. Though there is a growing body of literature on this subject, little of it makes explicit under what circumstances special methods are needed to solve multi-objective problems. Therefore, we identify three distinct scenarios in which converting such a problem to a single-objective one is impossible, infeasible, or undesirable. Furthermore, we propose a taxonomy that classifies multi-objective methods according to the applicable scenario, the nature of the scalarization function (which projects multi-objective values to scalar ones), and the type of policies considered. We show how these factors determine the nature of an optimal solution, which can be a single policy, a convex hull, or a Pareto front. Using this taxonomy, we survey the literature on multi-objective methods for planning and learning. Finally, we discuss key applications of such methods and outline opportunities for future work.
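The scalarization function mentioned in the survey's taxonomy can be sketched concretely. A minimal illustration (the policies and values are made up): a linear scalarization turns a vector-valued problem into a scalar one, but can only ever select policies on the convex hull, which is one reason the survey distinguishes convex-hull solutions from full Pareto fronts.

```python
import numpy as np

def scalarize(values, w):
    """Linear scalarization: values is (n_policies, n_objectives), w is (n_objectives,)."""
    return values @ w

policy_values = np.array([[4.0, 1.0],   # policy A: strong on objective 0
                          [1.0, 4.0],   # policy B: strong on objective 1
                          [2.4, 2.4]])  # policy C: Pareto-optimal but inside the convex hull

# Sweep weights over the simplex: policy C (index 2) is never selected,
# even though no other policy dominates it.
for w0 in (1.0, 0.75, 0.5, 0.25, 0.0):
    w = np.array([w0, 1.0 - w0])
    print(w0, np.argmax(scalarize(policy_values, w)))
```

A nonlinear scalarization or an explicit Pareto-front method would be needed to recover policy C, which is the kind of scenario the survey's taxonomy is built around.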
Linear fitted-Q iteration with multiple reward functions
In: Journal of Machine Learning Research
Cited by 9 (1 self)
Abstract:
We present a general and detailed development of an algorithm for finite-horizon fitted-Q iteration with an arbitrary number of reward signals and linear value function approximation using an arbitrary number of state features. This includes a detailed treatment of the 3-reward function case using triangulation primitives from computational geometry and a method for identifying globally dominated actions. We also present an example of how our methods can be used to construct a real-world decision aid by considering symptom reduction, weight gain, and quality of life in sequential treatments for schizophrenia. Finally, we discuss future directions in which to take this work that will further enable our methods to make a positive impact on the field of evidence-based clinical decision support.
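The "globally dominated actions" idea above can be sketched as follows. This is not the paper's triangulation-based method; it is a rough sampling approximation under the same premise: with D reward signals, each action's Q-value is linear in the preference weights, Q_a(w) = w · q_a, and an action is globally dominated if it is optimal for no weight vector on the simplex.

```python
import numpy as np

def non_dominated_actions(q_vectors, n_samples=10_000, seed=0):
    """q_vectors: (n_actions, n_rewards). Returns actions optimal for some sampled weight."""
    rng = np.random.default_rng(seed)
    w = rng.dirichlet(np.ones(q_vectors.shape[1]), size=n_samples)  # points on the simplex
    winners = np.argmax(w @ q_vectors.T, axis=1)                    # best action per weight
    return sorted(set(winners.tolist()))

q = np.array([[3.0, 0.0],
              [0.0, 3.0],
              [1.0, 1.0]])  # action 2 would need w0 <= 1/3 AND w1 <= 1/3, impossible on the simplex

print(non_dominated_actions(q))  # [0, 1]: action 2 is globally dominated
```

The exact geometric test the paper develops replaces this sampling with computational-geometry primitives, which is what makes the 3-reward case tractable.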
Multi-Agent Inverse Reinforcement Learning
Cited by 7 (1 self)
Abstract:
Learning the reward function of an agent by observing its behavior is termed inverse reinforcement learning, and has applications in learning from demonstration or apprenticeship learning. We introduce the problem of multi-agent inverse reinforcement learning, where the reward functions of multiple agents are learned by observing their uncoordinated behavior. A centralized controller then learns to coordinate their behavior by optimizing a weighted sum of the reward functions of all the agents. We evaluate our approach on a traffic-routing domain, in which a controller coordinates the actions of multiple traffic signals to regulate traffic density. We show that the learner is able not only to match but to significantly outperform the expert.
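The centralized coordination step above can be sketched minimally. All names, rewards, and the toy state are illustrative assumptions, not the paper's domain model: once per-agent reward functions have been recovered by IRL, the controller optimizes the single scalar formed by their weighted sum.

```python
def combined_reward(reward_fns, weights, state, action):
    """Weighted sum of per-agent rewards; each reward_fns[i](state, action) -> float."""
    return sum(w * r(state, action) for w, r in zip(weights, reward_fns))

# Two hypothetical traffic-signal agents, each penalizing its own queue length
# plus a small cost for switching phase.
r_north = lambda s, a: -s["queue_north"] - (1 if a == "switch" else 0)
r_east = lambda s, a: -s["queue_east"] - (1 if a == "switch" else 0)

state = {"queue_north": 4, "queue_east": 1}
actions = ["keep", "switch"]
best = max(actions, key=lambda a: combined_reward([r_north, r_east], [0.5, 0.5], state, a))
print(best)  # "keep": switching costs both agents and clears neither queue here
```

In the paper this scalarized reward drives a full RL learner rather than a one-step greedy choice; the sketch only shows how the per-agent rewards are combined.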
Hypervolume indicator and dominance reward based multi-objective Monte-Carlo tree search
In: Machine Learning, 2013
Cited by 4 (0 self)
Abstract:
Concerned with multi-objective reinforcement learning (MORL), this paper presents MOMCTS, an extension of Monte-Carlo Tree Search to multi-objective sequential decision making, embedding two decision rules respectively based on the hypervolume indicator and the Pareto dominance reward. The MOMCTS approaches are first compared with the MORL state of the art on two artificial problems, the two-objective Deep Sea Treasure problem and the three-objective Resource Gathering problem. The scalability of MOMCTS is also examined in the context of the NP-hard grid scheduling problem, showing that the MOMCTS performance matches the (non-RL-based) state of the art, albeit with a higher computational cost.
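The hypervolume indicator used as a decision rule above is a standard multi-objective quality measure: the volume of objective space dominated by a point set relative to a reference point. A minimal two-objective sketch (maximization; the point set and reference are illustrative):

```python
def hypervolume_2d(points, ref):
    """points: (x, y) pairs to maximize; ref: reference point dominated by all points."""
    front = sorted(points, key=lambda p: p[0], reverse=True)  # descending on objective 0
    volume, prev_y = 0.0, ref[1]
    for x, y in front:
        if y > prev_y:                            # dominated points contribute nothing
            volume += (x - ref[0]) * (y - prev_y)  # new rectangle above the running y level
            prev_y = y
    return volume

print(hypervolume_2d([(3, 1), (2, 2), (1, 3)], ref=(0, 0)))  # 6.0
```

In MOMCTS-style methods, a candidate's contribution to this volume (how much it would grow the dominated region) can rank actions where a scalar UCB value no longer applies; beyond two objectives the computation becomes substantially more expensive, consistent with the higher computational cost the abstract reports.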
Multi-objective Monte-Carlo Tree Search
In: Asian Conference on Machine Learning (ACML 2012)
Cited by 2 (1 self)
Abstract:
Concerned with multi-objective reinforcement learning (MORL), this paper presents MOMCTS, an extension of Monte-Carlo Tree Search to multi-objective sequential decision making. The known multi-objective indicator referred to as the hypervolume indicator is used to define an action selection criterion, replacing the UCB criterion in order to deal with multi-dimensional rewards. MOMCTS is first compared with an existing MORL algorithm on the artificial Deep Sea Treasure problem. Then a scalability study of MOMCTS is made on the NP-hard problem of grid scheduling, showing that the performance of MOMCTS matches the non-RL-based state of the art, albeit with a higher computational cost.
Dynamic priority broadcasting channels: a multi-objective planning problem
Abstract:
This article explores the broadcasting channel problem, focusing on dynamic priority planning. We show that using the Pareto-optimal set of solutions significantly reduces the set size and computation time, but that it remains infeasible. We suggest a method for reducing the computation time of generating the Pareto-optimal set and show that pruning is necessary.
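The pruning step the abstract says is necessary can be sketched generically (this is a standard incremental Pareto filter, not this article's specific method, and the value vectors are made up): rather than generating all solutions and filtering afterwards, each candidate is checked against the current front as it arrives, and anything it dominates is discarded immediately.

```python
def dominates(u, v):
    """u Pareto-dominates v: at least as good everywhere, strictly better somewhere."""
    return all(a >= b for a, b in zip(u, v)) and any(a > b for a, b in zip(u, v))

def prune_insert(front, candidate):
    """Insert candidate into a Pareto front, discarding anything it dominates."""
    if any(dominates(p, candidate) or p == candidate for p in front):
        return front                                  # candidate adds nothing
    return [p for p in front if not dominates(candidate, p)] + [candidate]

front = []
for v in [(1, 5), (2, 2), (3, 3), (2, 4), (0, 6)]:    # (2, 2) gets pruned by (3, 3)
    front = prune_insert(front, v)
print(front)  # [(1, 5), (3, 3), (2, 4), (0, 6)]
```

Keeping the working set small during generation is what bounds the memory and comparison cost, which is the article's motivation for pruning.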
Using Reinforcement Learning for Multi-Policy Optimization in Decentralized Autonomic Systems: An Experimental Evaluation
Abstract:
Large-scale autonomic systems are required to self-optimize with respect to high-level policies that can differ in terms of their priority, as well as their spatial and temporal scope. Decentralized multi-agent systems represent one approach to implementing the required self-optimization capabilities. However, the presence of multiple heterogeneous policies leads to heterogeneity of the agents that implement them. In this paper we evaluate the use of Reinforcement Learning techniques to support the self-optimization of heterogeneous agents towards multiple policies in decentralized systems. We evaluate these techniques in an Urban Traffic Control simulation and compare two approaches to supporting multiple policies. Our results suggest that approaches based on W-learning, which learn separately for each policy and then select between nominated actions based on current action importance, perform better than combining policies into a single learning process over a single state space. The results also indicate that explicitly supporting multiple policies simultaneously can improve waiting times over policies dedicated to optimizing for a single vehicle type.
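The W-learning-style selection described above can be sketched as follows. The tables, state names, and values are illustrative assumptions: each policy maintains its own Q-values plus a W-value expressing how much it loses in the current state if it does not get its way, and the policy with the highest W nominates the action.

```python
def select_action(policies, state):
    """policies: list of dicts with per-state Q tables and W values (assumed already learned)."""
    winner = max(policies, key=lambda p: p["W"][state])  # most-affected policy wins
    q = winner["Q"][state]
    return max(q, key=q.get)                             # winner nominates its best action

# Two hypothetical traffic policies: cars care a lot right now (high W), buses less so.
p_cars = {"W": {"s0": 0.9}, "Q": {"s0": {"green_ns": 1.0, "green_ew": 0.2}}}
p_buses = {"W": {"s0": 0.4}, "Q": {"s0": {"green_ns": 0.1, "green_ew": 0.8}}}

print(select_action([p_cars, p_buses], "s0"))  # "green_ns"
```

Because each policy learns over its own state space and only competes at action-selection time, this avoids the combinatorial blow-up of a single learner over a joint state space, which is the comparison the paper's results address.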
Human-Robot Collaborative Learning of a Bag Shaking Trajectory
Abstract:
This paper presents a collaborative reinforcement learning algorithm, CQ(λ), designed to accelerate learning by integrating a human operator into the learning process. The CQ(λ)-learning algorithm enables collaboration of knowledge between the robot and a human; the human, responsible for remotely monitoring the robot, suggests solutions when intervention is required. Based on the robot's learning performance, the algorithm switches between fully autonomous operation and the integration of human commands. The CQ(λ)-learning algorithm was tested on a Motoman UP6 fixed-arm robot required to empty the contents of a suspicious bag. We present experimental results that support our hypothesis that learning is faster when human collaboration is triggered than when the system functions autonomously. Index Terms: robot learning, reinforcement learning, human-robot collaboration.
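The performance-based switching described above can be sketched at a high level. This is a simplified illustration of the switching idea only, not the CQ(λ) algorithm itself; the function names, actions, and threshold are all hypothetical.

```python
def choose_action(recent_returns, autonomous_policy, human_suggestion, threshold=0.5):
    """Ask the human for a suggestion only when the rolling mean return is poor."""
    mean_return = sum(recent_returns) / len(recent_returns)
    if mean_return < threshold:
        return human_suggestion()      # intervention requested
    return autonomous_policy()         # performance is fine: act autonomously

act = choose_action([0.1, 0.2, 0.3],                         # recent returns are low
                    autonomous_policy=lambda: "shake_fast",
                    human_suggestion=lambda: "rotate_then_shake")
print(act)  # "rotate_then_shake"
```

The hypothesis the paper tests is precisely that triggering the human branch at the right moments yields faster learning than letting the autonomous branch run unaided.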