Combining Multiple Correlated Reward and Shaping Signals by Measuring Confidence
Cited by 3 (3 self)
Multi-objective problems with correlated objectives are a class of problems that deserve specific attention. In contrast to typical multi-objective problems, they do not require the identification of trade-offs between the objectives, as (near) optimal solutions for any objective are (near) optimal for every objective. Intelligently combining the feedback from these objectives, instead of only looking at a single one, can improve optimization. This class of problems is very relevant in reinforcement learning, as any single-objective reinforcement learning problem can be framed as such a multi-objective problem using multiple reward shaping functions. After discussing this problem class, we propose a solution technique for such reinforcement learning problems, called adaptive objective selection. This technique makes a temporal-difference learner estimate the Q-function for each objective in parallel, and introduces a way of measuring confidence in these estimates. This confidence metric is then used to choose which objective’s estimates to use for action selection. We show significant improvements in performance over other plausible techniques on two problem domains. Finally, we provide an intuitive analysis of the technique’s decisions, yielding insights into the nature of the problems being solved.
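As a rough illustration of the idea in the abstract above, the following Python sketch keeps one tabular Q-function per objective, updates all of them in parallel from the same transition, and acts on whichever objective currently looks most confident. The confidence measure used here (the gap between the best and second-best estimate in a state) is a placeholder assumption, not the metric proposed in the paper; the function names and the nested-list table layout are likewise hypothetical.

```python
def td_update(q, s, a, rewards, s_next, alpha=0.1, gamma=0.9):
    """Apply one Q-learning update per objective.

    q[o][s][a] is objective o's estimate; rewards[o] is its reward signal.
    """
    for o, r in enumerate(rewards):
        target = r + gamma * max(q[o][s_next])
        q[o][s][a] += alpha * (target - q[o][s][a])


def confidence(q_row):
    """Placeholder confidence (an assumption): gap between the two best estimates."""
    ordered = sorted(q_row)
    return ordered[-1] - ordered[-2]


def select_action(q, s):
    """Pick the most confident objective in state s, then act greedily on it."""
    best_obj = max(range(len(q)), key=lambda o: confidence(q[o][s]))
    row = q[best_obj][s]
    return best_obj, row.index(max(row))


# Two objectives, two states, two actions (illustrative numbers only).
q = [
    [[0.0, 0.1], [0.0, 0.0]],  # objective 0 barely separates the actions
    [[1.0, 0.0], [0.0, 0.0]],  # objective 1 separates them clearly
]
objective, action = select_action(q, 0)
td_update(q, 0, 1, rewards=[1.0, 0.0], s_next=1)
```

In this toy table the second objective is selected in state 0 because its estimates separate the two actions more clearly; any other confidence measure can be swapped in by replacing `confidence`.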
Linear Support for Multi-Objective Coordination Graphs
Cited by 2 (2 self)
Many real-world decision problems require making trade-offs among multiple objectives. However, in some cases, the relative importance of these objectives is not known when the problem is solved, precluding the use of single-objective methods. Instead, multi-objective methods, which compute the set of all potentially useful solutions, are required. This paper proposes variable elimination linear support (VELS), a new multi-objective algorithm for multi-agent coordination that exploits loose couplings to compute the convex coverage set (CCS): the set of optimal solutions for all possible weights for linearly weighted objectives. Unlike existing methods, VELS exploits insights from POMDP solution methods to build the CCS incrementally. We prove the correctness of VELS and show that for moderate numbers of objectives its complexity is better than that of previous methods. Furthermore, we present empirical results showing that VELS can tackle both random and realistic problems with many more agents than was previously feasible. The incremental nature of VELS also makes it an anytime algorithm, i.e., its intermediate results constitute ε-optimal approximations of the CCS, with ε decreasing the longer it runs. Our empirical results show that, by allowing even very small ε, VELS can enable large additional speedups.
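To make the notion of a convex coverage set concrete, here is a minimal Python sketch that builds the CCS incrementally for two objectives over an explicit, finite list of candidate value vectors. It is a generic linear-support-style search and not VELS itself: there is no coordination graph or variable elimination, and the function names, the candidate vectors, and the single-weight parameterization `w` (weight on the first objective, `1 - w` on the second) are assumptions made for illustration.

```python
def scalarize(v, w):
    """Linearly weighted value of a two-objective vector v for weight w in [0, 1]."""
    return w * v[0] + (1.0 - w) * v[1]


def best_for(vectors, w):
    """Maximizer of the scalarized value; ties are broken toward the larger sum."""
    return max(vectors, key=lambda v: (scalarize(v, w), v[0] + v[1]))


def convex_coverage_set(vectors, eps=1e-9):
    """Build the CCS incrementally, checking only the weights at which two
    already-found vectors have equal scalarized value."""
    left, right = best_for(vectors, 0.0), best_for(vectors, 1.0)
    ccs = {left, right}
    pending = [(left, right)]
    while pending:
        a, b = pending.pop()
        if a == b:
            continue
        # Weight at which a and b are equally good.
        w = (b[1] - a[1]) / ((a[0] - a[1]) - (b[0] - b[1]))
        c = best_for(vectors, w)
        if scalarize(c, w) > scalarize(a, w) + eps:
            # c improves on both neighbours here, so it belongs to the CCS.
            ccs.add(c)
            pending.extend([(a, c), (c, b)])
    return sorted(ccs)


# Hypothetical candidate value vectors (tuples so they can be stored in a set).
candidates = [(0, 10), (10, 0), (6, 6), (4, 4), (1, 9.5)]
ccs = convex_coverage_set(candidates)
```

For these candidates the vector `(4, 4)` is left out, since no weight makes it the best choice. Stopping the loop early leaves a partial set, which gives an informal picture of the anytime behaviour the abstract describes.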
Bounded approximations for linear multi-objective planning under uncertainty
In ICAPS, 2014
Cited by 2 (2 self)
Planning under uncertainty poses a complex problem in which multiple objectives often need to be balanced. When dealing with multiple objectives, it is often assumed that the relative importance of the objectives is known a priori. However, in practice human decision makers often find it hard to specify such preferences, and would prefer a decision support system that presents a range of possible alternatives. We propose two algorithms for computing these alternatives for the case of linearly weighted objectives. First, we propose an anytime method, approximate optimistic linear support (AOLS), that incrementally builds up a complete set of optimal plans, exploiting the piecewise-linear and convex shape of the value function. Second, we propose an approximate anytime method, scalarised sample incremental improvement (SSII), that employs weight sampling to focus on the most interesting regions in weight space, as suggested by a prior over preferences. We show empirically that our methods are able to produce (near-)optimal alternative sets orders of magnitude faster than existing techniques.
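The weight-sampling idea mentioned in the abstract above can be pictured with a small Python sketch: draw weight vectors from a prior over preferences, solve the scalarised problem for each draw, and keep every plan that wins at least once. This is only an illustration of sampling in weight space, not the SSII algorithm; the flat prior over the simplex, the explicit list of candidate plan values, and all function names are assumptions.

```python
import random


def sample_weight(n_objectives, rng):
    """Draw a weight vector uniformly from the simplex (a flat prior, assumed here)."""
    draws = [rng.expovariate(1.0) for _ in range(n_objectives)]
    total = sum(draws)
    return [d / total for d in draws]


def sampled_alternatives(plans, n_samples=200, seed=0):
    """Collect the plans that are optimal for at least one sampled weight vector."""
    rng = random.Random(seed)
    found = set()
    for _ in range(n_samples):
        w = sample_weight(len(plans[0]), rng)
        # Scalarised problem: pick the plan with the highest weighted value.
        found.add(max(plans, key=lambda v: sum(wi * vi for wi, vi in zip(w, v))))
    return sorted(found)


# Hypothetical plan values for two objectives.
plans = [(0, 10), (10, 0), (6, 6), (4, 4)]
alternatives = sampled_alternatives(plans)
```

A non-uniform prior would concentrate the samples, and therefore the computation, on the regions of weight space a decision maker is most likely to care about.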
Accelerating Learning in Multi-Objective Systems through Transfer Learning
In a Special Session on Learning and Optimization in Multi-Criteria Dynamic and Uncertain Environments at the International Joint Conference on Neural Networks 2014 (IEEE IJCNN 2014). IEEE, 2014
Cited by 2 (2 self)
Large-scale, multi-agent systems are too complex for optimal control strategies to be known at design time and as a result good strategies must be learned at runtime. Learning in such systems, particularly those with multiple objectives, takes a considerable amount of time because of the size of the environment and dependencies between goals. Transfer Learning (TL) has been shown to reduce learning time in single-agent, single-objective applications. It is the process of sharing knowledge between two learning tasks called the source and target. The source is required to have been completed prior to the target task. This work proposes extending TL to multi-agent, multi-objective applications. To achieve this, an online version of TL called Parallel Transfer Learning (PTL) is presented. The issues involved in extending this algorithm to a multi-objective form are discussed. The effectiveness of this approach is evaluated in a smart grid scenario. When using PTL in this scenario learning is significantly accelerated. PTL achieves comparable performance to the baseline in one third of the time.
Model-Based Multi-Objective Reinforcement Learning
This paper describes a novel multi-objective reinforcement learning algorithm. The proposed algorithm first learns a model of the multi-objective sequential decision making problem, after which this learned model is used by a multi-objective dynamic programming method to compute Pareto optimal policies. The advantage of this model-based multi-objective reinforcement learning method is that once an accurate model has been estimated from the experiences of an agent in some environment, the dynamic programming method will compute all Pareto optimal policies. Therefore it is important that the agent explores the environment in an intelligent way by using a good exploration strategy. In this paper we have supplied the agent with two different exploration strategies and compare their effectiveness in estimating accurate models within a reasonable amount of time. The experimental results show that our method with the best exploration strategy is able to quickly learn all Pareto optimal policies for the Deep Sea Treasure problem.
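For readers unfamiliar with Pareto optimality as used in the abstract above, the following Python sketch filters a list of policy value vectors down to the non-dominated ones. It illustrates only the dominance test, not the paper's model learning or dynamic programming; the function names and the example values (a treasure value paired with a negated time cost) are hypothetical and are not taken from the Deep Sea Treasure benchmark.

```python
def dominates(a, b):
    """True if a is at least as good as b on every objective and better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(points):
    """Keep the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical (treasure value, negative time) returns of five policies.
returns = [(1, -1), (2, -3), (1, -3), (3, -5), (2, -6)]
front = pareto_front(returns)
```

Here `(1, -3)` and `(2, -6)` are dropped because another policy reaches the same treasure value in less time.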
Self-Organising Algorithms for Residential Demand Response
Residential Demand Response has shown promising results in smart grid applications. It can be achieved manually or autonomously. The various algorithms applied to achieve autonomous Demand Response have lacked a common baseline. The selection of testing environment is typically skewed by the desire to find one in which a particular algorithm performs well. This work evaluates several algorithms in a common evaluation environment, which has been designed to encompass the range of conditions in a typical Residential Demand Response application. The environments described exhibit the following characteristics: non-stationary, dynamic, multi-actor, multi-objective. These characteristics will then be used to provide heuristics for algorithm selection. The algorithms used were selected to cover the spectrum of possible approaches to Demand Response. Some are centralised, others distributed. There are collaborative approaches and non-collaborative ones. Some are learning based, others require no training. This work provides criteria for which particular algorithms should be applied to a given application.
On Fairness in Decision-Making under Uncertainty: Definitions, Computation, and Comparison
The utilitarian solution criterion, which has been extensively studied in multi-agent decision making under uncertainty, aims to maximize the sum of individual utilities. However, as the utilitarian solution often discriminates against some agents, it is not desirable for many practical applications where agents have their own interests and fairness is expected. To address this issue, this paper introduces egalitarian solution criteria for sequential decision-making under uncertainty, which are based on the maximin principle. Motivated by different application domains, we propose four maximin fairness criteria and develop corresponding algorithms for computing their optimal policies. Furthermore, we analyze the connections between these criteria and discuss and compare their characteristics.
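The contrast between the utilitarian and the maximin principle described above can be shown in a few lines of Python. This sketch compares the two criteria over a fixed table of per-agent utilities; it does not implement any of the paper's four fairness criteria or their algorithms, and the policy names and utility values are hypothetical.

```python
def utilitarian_choice(policies):
    """Policy with the largest sum of individual utilities."""
    return max(policies, key=lambda name: sum(policies[name]))


def maximin_choice(policies):
    """Policy whose worst-off agent does best (maximin principle)."""
    return max(policies, key=lambda name: min(policies[name]))


# Hypothetical utilities of two agents under two candidate policies.
policies = {"A": (10, 0), "B": (4, 4)}
utilitarian = utilitarian_choice(policies)
egalitarian = maximin_choice(policies)
```

Policy `A` has the larger total (10 against 8) but leaves one agent with nothing, so the utilitarian criterion selects `A` while the maximin criterion selects `B`.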
Fairness in Multi-Agent Sequential Decision-Making
We define a fairness solution criterion for multi-agent decision-making problems, where agents have local interests. This new criterion aims to maximize the worst performance of agents while also taking the overall performance into consideration. We develop a simple linear programming approach and a more scalable game-theoretic approach for computing an optimal fairness policy. This game-theoretic approach formulates this fairness optimization as a two-player zero-sum game and employs an iterative algorithm for finding a Nash equilibrium, corresponding to an optimal fairness policy. We scale up this approach by exploiting problem structure and value function approximation. Our experiments on resource allocation problems show that this fairness criterion provides a more favorable solution than the utilitarian criterion, and that our game-theoretic approach is significantly faster than linear programming.
Multi-Objective MDPs with Conditional Lexicographic Reward Preferences
Sequential decision problems that involve multiple objectives are prevalent. Consider for example a driver of a semi-autonomous car who may want to optimize competing objectives such as travel time and the effort associated with manual driving. We introduce a rich model called Lexicographic MDP (LMDP) and a corresponding planning algorithm called LVI that generalize previous work by allowing for conditional lexicographic preferences with slack. We analyze the convergence characteristics of LVI and establish its game theoretic properties. The performance of LVI in practice is tested within a realistic benchmark problem in the domain of semi-autonomous driving. Finally, we demonstrate how GPU-based optimization can improve the scalability of LVI and other value iteration algorithms for MDPs.
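A lexicographic preference with slack, as mentioned in the abstract above, can be illustrated at the level of a single decision. The Python sketch below filters actions objective by objective, keeping those within a given slack of the best remaining value before moving on to the next objective. It is a one-step illustration under assumed inputs, not the LVI algorithm: the action names, the value table, and the additive form of the slack are all hypothetical.

```python
def lexicographic_choice(q_values, slack):
    """Filter actions objective by objective, most important objective first.

    q_values maps each action to a tuple of per-objective values;
    slack[i] is how far below the best value on objective i an action may fall.
    """
    candidates = list(q_values)
    for i, delta in enumerate(slack):
        best = max(q_values[a][i] for a in candidates)
        candidates = [a for a in candidates if q_values[a][i] >= best - delta]
    return candidates


# Hypothetical values: (negated travel time, driver comfort) per driving mode.
q_values = {"fast": (10.0, 1.0), "relaxed": (9.5, 5.0), "slow": (6.0, 9.0)}
with_slack = lexicographic_choice(q_values, slack=(1.0, 0.0))
strict = lexicographic_choice(q_values, slack=(0.0, 0.0))
```

With a slack of 1.0 on the first objective, `relaxed` survives the first filter and then wins on the second objective; with no slack, only `fast` remains.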