Results 1-10 of 23
Collaborative Multiagent Reinforcement Learning by Payoff Propagation
Journal of Machine Learning Research, 2006
Cited by 32 (2 self)
In this article we describe a set of scalable techniques for learning the behavior of a group of agents in a collaborative multiagent setting. As a basis we use the framework of coordination graphs of Guestrin, Koller, and Parr (2002a), which exploits the dependencies between agents to decompose the global payoff function into a sum of local terms. First, we deal with the single-state case and describe a payoff propagation algorithm that computes the individual actions that approximately maximize the global payoff function. The method can be viewed as the decision-making analogue of belief propagation in Bayesian networks. Second, we focus on learning the behavior of the agents in sequential decision-making tasks. We introduce different model-free reinforcement-learning techniques, collectively called Sparse Cooperative Q-learning, which approximate the global action-value function based on the topology of a coordination graph, and perform updates using the contribution of the individual agents to the maximal global action value. The combined use of an edge-based decomposition of the action-value function and the payoff propagation algorithm for efficient action selection results in an approach that scales only linearly in the problem size. We provide experimental evidence that our method outperforms related multiagent reinforcement-learning methods based on temporal differences.
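The payoff propagation step described in this abstract can be sketched in a few lines. Below is an illustrative max-plus message-passing loop on a hypothetical 3-agent chain with invented binary-action payoff tables (not the paper's implementation); on a tree-structured graph such as this chain, the fixed point recovers the exact maximizing joint action.

```python
# Illustrative max-plus / payoff propagation sketch; payoff tables,
# graph, and iteration count are made up for this example.
actions = (0, 1)
f01 = {(a0, a1): [[2.0, 0.0], [0.0, 1.0]][a0][a1] for a0 in actions for a1 in actions}
f12 = {(a1, a2): [[0.0, 3.0], [1.0, 0.0]][a1][a2] for a1 in actions for a2 in actions}
edges = {(0, 1): f01, (1, 2): f12}  # chain: agent 0 -- agent 1 -- agent 2

def max_plus(edges, n_agents, n_iters=10):
    neighbours = {i: set() for i in range(n_agents)}
    for (i, j) in edges:
        neighbours[i].add(j)
        neighbours[j].add(i)
    # mu[(i, j)][a_j]: best payoff agent i can contribute given that j plays a_j
    mu = {(i, j): {a: 0.0 for a in actions}
          for e in edges for (i, j) in (e, e[::-1])}
    for _ in range(n_iters):
        new_mu = {}
        for (i, j) in mu:
            # local payoff table for directed edge (i, j), flipping if needed
            f = edges.get((i, j)) or {(b, a): v for (a, b), v in edges[(j, i)].items()}
            new_mu[(i, j)] = {
                aj: max(f[(ai, aj)]
                        + sum(mu[(k, i)][ai] for k in neighbours[i] if k != j)
                        for ai in actions)
                for aj in actions}
        mu = new_mu
    # each agent picks the action maximizing the sum of its incoming messages
    return tuple(max(actions, key=lambda a: sum(mu[(k, i)][a] for k in neighbours[i]))
                 for i in range(n_agents))

joint = max_plus(edges, n_agents=3)
```

Here the best joint action is (0, 0, 1), since f01(0,0) + f12(0,1) = 2 + 3 dominates every other combination, and the message passing finds it without enumerating the joint action space.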
Sparse Cooperative Q-learning
Proceedings of the International Conference on Machine Learning, 2004
Cited by 20 (6 self)
Learning in multiagent systems suffers from the fact that both the state and the action space scale exponentially with the number of agents. In this paper we are interested in using Q-learning to learn the coordinated actions of a group of cooperative agents, using a sparse representation of the joint state-action space of the agents. We first examine a compact representation in which the agents need to explicitly coordinate their actions only in a predefined set of states. Next, we use a coordination-graph approach in which we represent the Q-values by value rules that specify the coordination dependencies of the agents at particular states. We show how Q-learning can be efficiently applied to learn a coordinated policy for the agents in the above framework. We demonstrate the proposed method on the predator-prey domain, and we compare it with other related multiagent Q-learning methods.
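A stripped-down sketch of the edge-based Q-decomposition idea behind this line of work: a single-state problem, gamma = 0, and per-edge rewards observed directly, so each edge table simply learns its local payoff. The paper's value-rule machinery is omitted and all numbers are invented.

```python
# Toy sketch of an edge-decomposed Q-function on a 3-agent chain.
# Reward tables, learning rate, and sweep count are illustrative.
import itertools

reward = {  # hypothetical per-edge reward tables
    (0, 1): {(0, 0): 2.0, (0, 1): 0.0, (1, 0): 0.0, (1, 1): 1.0},
    (1, 2): {(0, 0): 0.0, (0, 1): 3.0, (1, 0): 1.0, (1, 1): 0.0},
}
Q = {e: {a: 0.0 for a in itertools.product((0, 1), repeat=2)} for e in reward}
alpha = 0.5

for _ in range(50):                          # sweep all joint actions repeatedly
    for joint in itertools.product((0, 1), repeat=3):
        for (i, j), table in reward.items():
            a_e = (joint[i], joint[j])
            # TD(0) update with gamma = 0: each edge learns its local payoff
            Q[(i, j)][a_e] += alpha * (table[a_e] - Q[(i, j)][a_e])

# Greedy joint action under the decomposed Q. Brute force is fine at this
# size; the papers use variable elimination or payoff propagation instead.
greedy = max(itertools.product((0, 1), repeat=3),
             key=lambda a: sum(Q[e][(a[e[0]], a[e[1]])] for e in Q))
```

The point of the decomposition is that the tables grow with the number of edges and local action pairs, not with the exponential joint action space.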
Utile coordination: Learning interdependencies among cooperative agents
Proceedings of the IEEE Symposium on Computational Intelligence and Games (CIG’05), 2005
Cited by 19 (0 self)
We present a method that allows a multiagent system to learn where and how to coordinate. The method starts with uncoordinated learners and maintains statistics on expected returns. Coordination dependencies are dynamically added if the statistics indicate a statistically significant benefit. This results in a compact state representation because only the necessary coordination is modeled. We apply our method within the framework of coordination graphs, in which value rules represent the coordination dependencies between the agents for a specific context. The algorithm is first applied to a small illustrative problem, and next to a large predator-prey problem in which two predators have to capture a single prey.
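The "statistics on expected returns" step can be illustrated with a crude stand-in for the paper's statistical test: compare the best joint action's mean return against the pooled rest, standardized by a pooled deviation. The sample values, the pooling scheme, and the threshold below are all invented for illustration.

```python
# Hypothetical return samples observed in one state, keyed by the joint
# action (a1, a2) the two agents happened to pick while acting independently.
import statistics

samples = {
    (0, 0): [10.1, 9.8, 10.3, 10.0],   # one coordinated combination pays off
    (0, 1): [2.0, 1.9, 2.2],
    (1, 0): [2.1, 2.0],
    (1, 1): [2.0, 2.2, 1.8],
}

def coordination_benefit(samples):
    """Standardized gap between the best joint action's mean return and the
    pooled mean of the others; a crude proxy for a significance test."""
    best = max(samples, key=lambda a: statistics.mean(samples[a]))
    best_returns = samples[best]
    rest = [r for a, v in samples.items() if a != best for r in v]
    gap = statistics.mean(best_returns) - statistics.mean(rest)
    pooled_sd = statistics.stdev(best_returns + rest)
    return best, gap / pooled_sd

best, score = coordination_benefit(samples)
add_dependency = score > 1.0   # illustrative threshold, not the paper's
```

A large standardized gap suggests the agents' choices interact in this state, so a coordination dependency (a value rule over the joint action) is worth adding; otherwise the agents keep learning independently.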
Solving Factored MDPs with Hybrid State and Action Variables
Journal of Artificial Intelligence Research, 2006
Cited by 17 (4 self)
Efficient representations and solutions for large decision problems with continuous and discrete variables are among the most important challenges faced by the designers of automated decision support systems. In this paper, we describe a novel hybrid factored Markov decision process (MDP) model that allows for a compact representation of these problems, and a new hybrid approximate linear programming (HALP) framework that permits their efficient solution. The central idea of HALP is to approximate the optimal value function by a linear combination of basis functions and to optimize its weights by linear programming.
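The LP at the heart of this approach can be stated compactly. This is the standard discrete-state ALP form (the HALP framework generalizes the sums over states to mixed sums and integrals over the hybrid state space); here $\alpha(x)$ denotes a state-relevance weighting.

```latex
% Standard approximate-linear-programming (ALP) formulation:
% V(x) is approximated by \sum_i w_i f_i(x) over basis functions f_i.
\begin{align*}
\min_{w}\quad & \sum_{x} \alpha(x) \sum_{i} w_i f_i(x) \\
\text{s.t.}\quad & \sum_{i} w_i f_i(x) \;\ge\; R(x,a)
  + \gamma \sum_{x'} P(x' \mid x, a) \sum_{i} w_i f_i(x')
  \qquad \forall\, (x,a).
\end{align*}
```

The weights $w$ are the only decision variables, so the LP stays small even when the state space does not; the difficulty moves into handling the exponentially (or uncountably) many constraints, which is what factored and hybrid ALP methods exploit structure to manage.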
A Cost-Shaping Linear Program for Average-Cost Approximate Dynamic Programming with Performance Guarantees
2006
Symmetric Primal-Dual Approximate Linear Programming for Factored MDPs
2006
Cited by 6 (2 self)
A weakness of classical Markov decision processes is that they scale very poorly due to the flat state-space representation. Factored MDPs address this representational problem by exploiting problem structure to specify the transition and reward functions of an MDP in a compact manner. However, in general, solutions to factored MDPs do not retain the structure and compactness of the problem representation, forcing approximate solutions, with approximate linear programming (ALP) emerging as a very promising MDP-approximation technique. To date, most ALP work has focused on the primal LP formulation, while the dual LP, which forms the basis for solving constrained Markov problems, has received much less attention. We show that a straightforward linear approximation of the dual optimization variables is problematic, because some of the required computations cannot be carried out efficiently. Nonetheless, we develop a composite approach that symmetrically approximates the primal and dual optimization variables (effectively approximating both the objective function and the feasible region of the LP) that is computationally feasible and suitable for solving constrained MDPs. We empirically show that this new ALP formulation also performs well on unconstrained problems.
Resource allocation among agents with MDP-induced preferences
 Journal of Artificial Intelligence Research
Cited by 6 (2 self)
Allocating scarce resources among agents to maximize global utility is, in general, computationally challenging. We focus on problems where resources enable agents to execute actions in stochastic environments, modeled as Markov decision processes (MDPs), such that the value of a resource bundle is defined as the expected value of the optimal MDP policy realizable given these resources. We present an algorithm that simultaneously solves the resource-allocation and the policy-optimization problems. This allows us to avoid explicitly representing utilities over exponentially many resource bundles, leading to drastic (often exponential) reductions in computational complexity. We then use this algorithm in the context of self-interested agents to design a combinatorial auction for allocating resources. We empirically demonstrate the effectiveness of our approach by showing that it can, in minutes, optimally solve problems for which a straightforward combinatorial resource-allocation technique would require the agents to enumerate up to 2^100 resource bundles and the auctioneer to solve an NP-complete problem with an input of that size.
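The "up to 2^100 resource bundles" point is easy to see in code: valuing bundles flat means solving one policy-optimization problem per bundle. A toy sketch with invented numbers, where each resource unlocks one action in a single-state discounted MDP, so the optimal policy simply repeats the best unlocked action:

```python
# Why flat bundle enumeration explodes: one MDP solve per bundle.
# Resource names, rewards, and discount are hypothetical.
import itertools

gamma = 0.9
action_reward = {"drill": 5.0, "scanner": 1.0, "truck": 3.0}

def bundle_value(bundle):
    # Infinite-horizon value of repeating the best action the bundle unlocks.
    if not bundle:
        return 0.0
    return max(action_reward[r] for r in bundle) / (1 - gamma)

values = {b: bundle_value(b)
          for k in range(len(action_reward) + 1)
          for b in itertools.combinations(sorted(action_reward), k)}
# 2**3 = 8 bundles here; with 100 resources this loop would run 2**100
# times, which is exactly what the paper's joint formulation avoids.
```

The algorithm in the paper sidesteps this table entirely by folding the policy-optimization variables into the allocation problem, so bundle values are never enumerated explicitly.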
Multi-Robot Coordination and Competition Using Mixed Integer and Linear Programs
2004
Cited by 6 (0 self)
Markov decision processes (MDPs) and partially observable Markov decision processes (POMDPs) are preferred methods for representing complex, uncertain dynamic systems and for determining an optimal control policy to manipulate the system in the desired manner. Until recently, controlling a system composed of multiple agents using the MDP methodology was impossible due to an exponential increase in the size of the MDP problem representation. In this thesis, a novel method for solving large multiagent MDP systems is presented which avoids this exponential size increase while still providing optimal policies for a large class of useful problems. This thesis provides the following main contributions: A novel description language for multiagent MDPs: we develop two different modeling techniques for representing multiagent MDP (MAMDP) coordination problems. The first phrases the problem as a linear program which avoids the exponential ...
Model transfer for Markov decision tasks via parameter matching
Workshop of the UK Planning and Scheduling Special Interest Group, 2006
Cited by 5 (1 self)
Model transfer refers to the process of adjusting a model that was previously identified for one task (source) so that it can be used for a new, related task (target). For decision tasks in unknown Markov environments, we profit from model transfer by using information from related tasks, e.g. transition knowledge and solution (policy) knowledge, to quickly determine an appropriate model of the new task environment. A difficulty with such transfer is typically the nonlinear and indirect relationship between the available source knowledge and the target’s working prior model of the unknown environment, provided through a complex multidimensional transfer function. In this paper, we take a Bayesian view and present a probability perturbation method that conditions the target’s model parameters on a variety of source knowledge types. The method relies on preposterior distributions, which specify the distribution of the target’s parameter set given each individual knowledge type. The preposteriors are then combined to obtain a posterior distribution for the parameter set that matches all the available knowledge. The method is illustrated with an example.
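One simple stand-in for combining preposteriors (not the paper's probability-perturbation method) is log-linear pooling over a discrete parameter set: multiply the per-knowledge-type distributions and renormalize. The candidate-model probabilities below are invented for illustration.

```python
# Pooling two "preposterior" distributions over 3 hypothetical candidate
# models of the target environment; all probabilities are made up.
import math

prior = [1 / 3, 1 / 3, 1 / 3]
p_given_transitions = [0.7, 0.2, 0.1]   # preposterior from transition knowledge
p_given_policy = [0.5, 0.4, 0.1]        # preposterior from policy knowledge

def pool(*dists):
    """Elementwise product of distributions, renormalized (log-linear pooling)."""
    unnorm = [math.prod(ps) for ps in zip(*dists)]
    z = sum(unnorm)
    return [u / z for u in unnorm]

posterior = pool(prior, p_given_transitions, p_given_policy)
```

A candidate model favored by every knowledge type dominates the pooled posterior, while one ruled out by any single source is suppressed, which is the qualitative behavior the combination step is after.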
Resource allocation among agents with preferences induced by factored MDPs
Proceedings of the Fifth International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS06), 2006
Cited by 4 (3 self)
Distributing scarce resources among agents in a way that maximizes the social welfare of the group is a computationally hard problem when the value of a resource bundle is not linearly decomposable. Furthermore, the problem of determining the value of a resource bundle can be a significant computational challenge in itself, such as for an agent operating in a stochastic environment, where the value of a resource bundle is the expected payoff of the optimal policy realizable given these resources. Recent work has shown that the structure in agents’ preferences induced by stochastic policy-optimization problems (modeled as MDPs) can be exploited to solve the resource-allocation and the policy-optimization problems simultaneously, leading to drastic (often exponential) improvements in computational efficiency. However, previous work used a flat MDP model that scales very poorly. In this work, we present and empirically evaluate a resource-allocation mechanism that achieves much better scaling by using factored MDP models, thus exploiting both the structure in agents’ MDP-induced preferences and the structure within the agents’ MDPs.