Results 1-10 of 18
Collaborative Multiagent Reinforcement Learning by Payoff Propagation
Journal of Machine Learning Research, 2006
Cited by 21 (2 self)
Abstract
In this article we describe a set of scalable techniques for learning the behavior of a group of agents in a collaborative multiagent setting. As a basis we use the framework of coordination graphs of Guestrin, Koller, and Parr (2002a), which exploits the dependencies between agents to decompose the global payoff function into a sum of local terms. First, we deal with the single-state case and describe a payoff propagation algorithm that computes the individual actions that approximately maximize the global payoff function. The method can be viewed as the decision-making analogue of belief propagation in Bayesian networks. Second, we focus on learning the behavior of the agents in sequential decision-making tasks. We introduce different model-free reinforcement-learning techniques, collectively called Sparse Cooperative Q-learning, which approximate the global action-value function based on the topology of a coordination graph and perform updates using the contribution of the individual agents to the maximal global action value. The combined use of an edge-based decomposition of the action-value function and the payoff propagation algorithm for efficient action selection results in an approach that scales only linearly in the problem size. We provide experimental evidence that our method outperforms related multiagent reinforcement-learning methods based on temporal differences.
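The payoff propagation idea can be illustrated with a generic max-plus (max-sum) message-passing sketch, the usual decision-making counterpart of belief propagation. The three-agent chain, its payoff tables, and all function names below are hypothetical. On a tree-structured graph like this one the messages converge to the exact optimum, which the brute-force check confirms; on graphs with cycles max-plus is only approximate, and implementations typically normalize the messages each iteration, which this sketch omits.

```python
import itertools

# Hypothetical chain coordination graph 0 - 1 - 2; each agent has two actions.
# EDGE_PAYOFF[(i, j)][a_i][a_j] is the local payoff f_ij(a_i, a_j).
ACTIONS = (0, 1)
N_AGENTS = 3
EDGE_PAYOFF = {
    (0, 1): [[2.0, 0.0], [0.0, 1.0]],
    (1, 2): [[0.0, 3.0], [1.0, 0.0]],
}


def payoff(i, j, ai, aj):
    """Local payoff f_ij(a_i, a_j), looked up in either edge orientation."""
    if (i, j) in EDGE_PAYOFF:
        return EDGE_PAYOFF[(i, j)][ai][aj]
    return EDGE_PAYOFF[(j, i)][aj][ai]


def neighbors(i):
    return [j for edge in EDGE_PAYOFF if i in edge for j in edge if j != i]


def max_plus(iterations=10):
    # mu[(i, j)][a_j]: message sent from agent i to its neighbor j.
    mu = {(i, j): [0.0] * len(ACTIONS) for i in range(N_AGENTS) for j in neighbors(i)}
    for _ in range(iterations):
        # mu_ij(a_j) = max over a_i of f_ij(a_i, a_j) + messages into i from all neighbors except j.
        mu = {
            (i, j): [
                max(
                    payoff(i, j, ai, aj)
                    + sum(mu[(k, i)][ai] for k in neighbors(i) if k != j)
                    for ai in ACTIONS
                )
                for aj in ACTIONS
            ]
            for (i, j) in mu
        }
    # Each agent picks the action that maximizes the sum of its incoming messages.
    return tuple(
        max(ACTIONS, key=lambda ai: sum(mu[(j, i)][ai] for j in neighbors(i)))
        for i in range(N_AGENTS)
    )


def brute_force():
    """Exhaustive maximization of the global payoff, for checking only."""
    def total(joint):
        return sum(payoff(i, j, joint[i], joint[j]) for (i, j) in EDGE_PAYOFF)
    return max(itertools.product(ACTIONS, repeat=N_AGENTS), key=total)


print(max_plus(), brute_force())  # (0, 0, 1) (0, 0, 1)
```

Each agent only needs messages from its own neighbors, so the per-iteration cost grows with the number of edges rather than with the joint action space, which is the property the abstract's linear-scaling claim relies on.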
Sparse Cooperative Q-learning
Proceedings of the International Conference on Machine Learning, 2004
Cited by 15 (6 self)
Abstract
Learning in multiagent systems suffers from the fact that both the state and the action space scale exponentially with the number of agents. In this paper we are interested in using Q-learning to learn the coordinated actions of a group of cooperative agents, using a sparse representation of the joint state-action space of the agents. We first examine a compact representation in which the agents need to explicitly coordinate their actions only in a predefined set of states. Next, we use a coordination-graph approach in which we represent the Q-values by value rules that specify the coordination dependencies of the agents at particular states. We show how Q-learning can be efficiently applied to learn a coordinated policy for the agents in the above framework. We demonstrate the proposed method on the predator-prey domain, and we compare it with other related multiagent Q-learning methods.
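A rough feel for Q-learning over value rules can be had from the sketch below, assuming a Q-value is the sum of the values of all rules whose context (a state plus a partial joint action) matches. The rule format, the states "far" and "near", and the rewards are hypothetical. The update hands the full TD error to every matching rule, which is a plain linear-function-approximation Q-learning step and not necessarily the exact per-agent update derived in the paper; action selection is brute force, which only works for a handful of agents.

```python
import itertools

ACTIONS = ("left", "right")
N_AGENTS = 2

# Hypothetical value rules: each pairs a context (a state plus a partial joint-action
# assignment {agent: action}) with a learned value. The agents act independently in
# state "far" (one rule per agent action) and coordinate only in state "near".
rules = [{"state": "far", "actions": {i: a}, "value": 0.0}
         for i in range(N_AGENTS) for a in ACTIONS]
rules += [{"state": "near", "actions": {0: a0, 1: a1}, "value": 0.0}
          for a0 in ACTIONS for a1 in ACTIONS]


def matching(state, joint):
    """All rules whose context is consistent with the state and joint action."""
    return [r for r in rules
            if r["state"] == state
            and all(joint[i] == a for i, a in r["actions"].items())]


def q_value(state, joint):
    # Q(s, a) is the sum of the values of the rules consistent with (s, a).
    return sum(r["value"] for r in matching(state, joint))


def greedy(state):
    # Brute force over joint actions; only feasible for very few agents.
    return max(itertools.product(ACTIONS, repeat=N_AGENTS),
               key=lambda joint: q_value(state, joint))


def update(state, joint, reward, next_state, alpha=0.1, gamma=0.9):
    """One Q-learning step; next_state=None marks the end of the episode."""
    target = reward
    if next_state is not None:
        target += gamma * q_value(next_state, greedy(next_state))
    td_error = target - q_value(state, joint)
    # Every matching rule receives the full TD error (gradient of the sum is 1 per rule).
    for r in matching(state, joint):
        r["value"] += alpha * td_error


# Hypothetical task: "far" always leads to "near", after which the episode ends.
def reward_far(joint):
    return float(joint[0] == "left") + float(joint[1] == "right")


def reward_near(joint):
    return 5.0 if joint == ("right", "right") else 0.0


for _ in range(500):
    for joint in itertools.product(ACTIONS, repeat=N_AGENTS):
        update("far", joint, reward_far(joint), "near")
        update("near", joint, reward_near(joint), None)

print(greedy("far"), greedy("near"))  # ('left', 'right') ('right', 'right')
```

Only the "near" state carries pairwise rules, so the table grows with the number of coordinating states rather than with the full joint state-action space.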
Solving Factored MDPs with Hybrid State and Action Variables
Journal of Artificial Intelligence Research, 2006
Cited by 13 (2 self)
Abstract
Efficient representations and solutions for large decision problems with continuous and discrete variables are among the most important challenges faced by the designers of automated decision support systems. In this paper, we describe a novel hybrid factored Markov decision process (MDP) model that allows for a compact representation of these problems, and a new hybrid approximate linear programming (HALP) framework that permits their efficient solution. The central idea of HALP is to approximate the optimal value function by a linear combination of basis functions and optimize its weights by linear programming.
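The central idea of approximate linear programming can be shown on a tiny discrete MDP. This is a minimal sketch of plain discrete-state ALP, not HALP itself (which also handles continuous variables). The three-state chain MDP, the two basis functions (a constant and the state index), and the uniform state-relevance weights are all hypothetical. To keep the block self-contained, it solves the two-variable LP by checking every vertex; a real implementation would call an LP solver instead.

```python
GAMMA = 0.9
STATES = (0, 1, 2)


def step(s, a):
    """Hypothetical deterministic chain MDP: 'stay' earns s, 'right' earns 0 and moves right."""
    if a == "stay":
        return float(s), s
    return 0.0, min(s + 1, 2)


def features(s):
    # Two basis functions: a constant and the state index.
    return (1.0, float(s))


# ALP: minimize sum_s c(s) * V_w(s) subject to V_w(s) >= r + gamma * V_w(s') for every (s, a),
# where V_w(s) = w0 * phi0(s) + w1 * phi1(s). Each constraint is stored as (coef0, coef1, rhs).
constraints = []
for s in STATES:
    for a in ("stay", "right"):
        r, s_next = step(s, a)
        phi, phi_next = features(s), features(s_next)
        constraints.append((phi[0] - GAMMA * phi_next[0], phi[1] - GAMMA * phi_next[1], r))

# Uniform state-relevance weights c(s) = 1/3 give these objective coefficients.
objective = [sum(features(s)[k] for s in STATES) / len(STATES) for k in range(2)]


def solve_2d_lp(constraints, objective, tol=1e-9):
    """Minimize objective . w over {w : a0*w0 + a1*w1 >= b} by checking every vertex.

    Valid only because this LP has two variables and a bounded, vertex-attained optimum.
    """
    best, best_w = float("inf"), None
    for i in range(len(constraints)):
        for j in range(i + 1, len(constraints)):
            a0, a1, b = constraints[i]
            c0, c1, d = constraints[j]
            det = a0 * c1 - a1 * c0
            if abs(det) < tol:
                continue  # parallel constraints do not define a vertex
            w = ((b * c1 - a1 * d) / det, (a0 * d - b * c0) / det)  # Cramer's rule
            if all(p * w[0] + q * w[1] >= rhs - 1e-7 for p, q, rhs in constraints):
                value = objective[0] * w[0] + objective[1] * w[1]
                if value < best:
                    best, best_w = value, w
    return best_w


w = solve_2d_lp(constraints, objective)
approx_v = [w[0] * features(s)[0] + w[1] * features(s)[1] for s in STATES]
print(w, approx_v)
```

Because every constraint forces V_w to dominate its own Bellman backup, any feasible weight vector gives an upper bound on the optimal value function; the LP objective then pushes that bound down as far as the basis allows.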
Utile coordination: Learning interdependencies among cooperative agents
Proceedings of the IEEE Symposium on Computational Intelligence and Games (CIG’05), 2005
Cited by 11 (0 self)
Abstract
... a multiagent system to learn where and how to coordinate. The method starts with uncoordinated learners and maintains statistics on expected returns. Coordination dependencies are dynamically added if the statistics indicate a statistically significant benefit. This results in a compact state representation because only necessary coordination is modeled. We apply our method within the framework of coordination graphs, in which value rules represent the coordination dependencies between the agents for a specific context. The algorithm is first applied to a small illustrative problem and then to a large predator-prey problem in which two predators have to capture a single prey.
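The "add a dependency only when the statistics show a significant benefit" step can be sketched as a two-sample test on sampled returns. This is a simplified stand-in, not the paper's exact statistic: it assumes that, for a fixed state and a fixed own action, agent i's returns are grouped by the action agent j took, and it adds a coordination edge if Welch's t statistic between any two groups exceeds a hypothetical threshold. No correction for multiple comparisons is applied.

```python
import math
import statistics


def welch_t(sample_a, sample_b):
    """Welch's t statistic for the difference in mean return between two samples."""
    mean_a, mean_b = statistics.mean(sample_a), statistics.mean(sample_b)
    se = math.sqrt(statistics.variance(sample_a) / len(sample_a)
                   + statistics.variance(sample_b) / len(sample_b))
    if se == 0.0:
        # Identical samples within each group: any mean difference counts as decisive.
        return math.inf if mean_a != mean_b else 0.0
    return (mean_a - mean_b) / se


def should_coordinate(returns_by_other_action, threshold=3.0):
    """Return True if agent i's return depends significantly on agent j's action.

    returns_by_other_action maps each action of agent j to a list (length >= 2) of
    returns agent i observed in one state with one fixed own action.
    """
    groups = list(returns_by_other_action.values())
    return any(abs(welch_t(groups[x], groups[y])) > threshold
               for x in range(len(groups)) for y in range(x + 1, len(groups)))


print(should_coordinate({"left": [10, 11, 9, 10, 10], "right": [2, 3, 1, 2, 2]}))  # True
print(should_coordinate({"left": [5, 6, 4, 5, 5], "right": [5, 4, 6, 5, 5]}))  # False
```

Starting with no edges and adding them only when a test like this fires is what keeps the learned representation compact: states where the other agent's action makes no measurable difference never get pairwise rules.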
A Cost-Shaping Linear Program for Average-Cost Approximate Dynamic Programming with Performance Guarantees
2006
Symmetric Primal-Dual Approximate Linear Programming for Factored MDPs
2006
Cited by 6 (2 self)
Abstract
A weakness of classical Markov decision processes is that they scale very poorly due to the flat state-space representation. Factored MDPs address this representational problem by exploiting problem structure to specify the transition and reward functions of an MDP in a compact manner. However, in general, solutions to factored MDPs do not retain the structure and compactness of the problem representation, forcing approximate solutions, with approximate linear programming (ALP) emerging as a very promising MDP-approximation technique. To date, most ALP work has focused on the primal-LP formulation, while the dual LP, which forms the basis for solving constrained Markov problems, has received much less attention. We show that a straightforward linear approximation of the dual optimization variables is problematic, because some of the required computations cannot be carried out efficiently. Nonetheless, we develop a composite approach that symmetrically approximates the primal and dual optimization variables (effectively approximating both the objective function and the feasible region of the LP) that is computationally feasible and suitable for solving constrained MDPs. We empirically show that this new ALP formulation also performs well on unconstrained problems.
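To make the dual side of the MDP linear program concrete: the dual variables of the exact LP can be read as discounted state-action occupancy measures. The sketch below illustrates that reading only; it does not implement the paper's symmetric primal-dual approximation. It uses a hypothetical three-state chain MDP and a fixed deterministic policy, computes the occupancy measure x from the dual flow constraints, and checks that the dual objective (expected reward under x) equals the primal value weighted by the initial-state distribution.

```python
GAMMA = 0.9
STATES = (0, 1, 2)
ALPHA = {s: 1.0 / len(STATES) for s in STATES}  # initial-state distribution


def step(s, a):
    """Hypothetical deterministic chain MDP: 'stay' earns s, 'right' earns 0 and moves right."""
    if a == "stay":
        return float(s), s
    return 0.0, min(s + 1, 2)


policy = {0: "right", 1: "right", 2: "stay"}  # a fixed deterministic policy

# Primal side: policy evaluation, V(s) = r(s, pi(s)) + gamma * V(s').
v = {s: 0.0 for s in STATES}
for _ in range(2000):
    v = {s: step(s, policy[s])[0] + GAMMA * v[step(s, policy[s])[1]] for s in STATES}

# Dual side: occupancy x(s') = alpha(s') + gamma * sum of x(s) over states s that the
# policy moves into s'. Only the chosen action of each state has nonzero occupancy.
x = {s: 0.0 for s in STATES}
for _ in range(2000):
    new_x = dict(ALPHA)
    for s in STATES:
        new_x[step(s, policy[s])[1]] += GAMMA * x[s]
    x = new_x

primal_value = sum(ALPHA[s] * v[s] for s in STATES)
dual_value = sum(x[s] * step(s, policy[s])[0] for s in STATES)
print(primal_value, dual_value)
```

The two numbers agree because x = (I - gamma P^T)^(-1) alpha, so x^T r = alpha^T (I - gamma P)^(-1) r = alpha^T V. Approximating x with a linear combination of basis functions is what the abstract calls problematic, since the flow constraints then range over the full state space.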
Multi-Robot Coordination and Competition Using Mixed Integer and Linear Programs
2004
Cited by 5 (0 self)
Abstract
Markov decision processes (MDPs) and partially observable Markov decision processes (POMDPs) are preferred methods for representing complex, uncertain, dynamic systems and determining an optimal control policy to manipulate the system in the desired manner. Until recently, controlling a system composed of multiple agents using the MDP methodology was impossible due to an exponential increase in the size of the MDP problem representation. In this thesis, a novel method for solving large multi-agent MDP systems is presented which avoids this exponential size increase while still providing optimal policies for a large class of useful problems. This thesis provides the following main contributions: A novel description language for multi-agent MDPs: We develop two different modeling techniques for representing multi-agent MDP (MAMDP) coordination problems. The first phrases the problem using a linear program which avoids the exponential ...
Model transfer for Markov decision tasks via parameter matching
Workshop of the UK Planning and Scheduling Special Interest Group, 2006
Cited by 5 (1 self)
Abstract
Model transfer refers to the process of adjusting a model that was previously identified for one task (source) so that it can be used for a new, related task (target). For decision tasks in unknown Markov environments, we benefit from model transfer by using information from related tasks, e.g. transition knowledge and solution (policy) knowledge, to quickly determine an appropriate model of the new task environment. A difficulty with such transfer is typically the non-linear and indirect relationship between the available source knowledge and the target’s working prior model of the unknown environment, provided through a complex multidimensional transfer function. In this paper, we take a Bayesian view and present a probability perturbation method that conditions the target’s model parameters on a variety of source knowledge types. The method relies on pre-posterior distributions, which specify the distribution of the target’s parameter set given each individual knowledge type. The pre-posteriors are then combined to obtain a posterior distribution for the parameter set that matches all the available knowledge. The method is illustrated with an example.
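One standard way to combine pre-posteriors into a single posterior is shown below, assuming the knowledge sources are conditionally independent given the parameter. Under that assumption p(theta | K_1..K_n) is proportional to the product of the pre-posteriors divided by the prior raised to the power n - 1. The sketch uses a hypothetical Bernoulli transition parameter with Beta distributions, where the combination reduces to arithmetic on the Beta parameters; the paper's probability perturbation method may weight or combine knowledge types differently.

```python
def combine_beta(prior, pre_posteriors):
    """Combine Beta pre-posteriors p(theta | K_k) into p(theta | K_1, ..., K_n).

    prior and each pre-posterior are (a, b) parameter pairs of Beta densities.
    Assumes the knowledge sources K_k are conditionally independent given theta:
        p(theta | all K) ∝ prod_k p(theta | K_k) / p(theta) ** (n - 1)
    For Beta densities the exponents of theta and (1 - theta) simply add up.
    """
    a0, b0 = prior
    n = len(pre_posteriors)
    a = sum(pa for pa, _ in pre_posteriors) - (n - 1) * a0
    b = sum(pb for _, pb in pre_posteriors) - (n - 1) * b0
    if a <= 0 or b <= 0:
        raise ValueError("combined parameters are not a valid Beta distribution")
    return a, b


# Hypothetical: uniform prior, one pre-posterior from transition knowledge and one
# from policy knowledge about the same transition probability.
posterior = combine_beta((1.0, 1.0), [(3.0, 1.0), (2.0, 2.0)])
print(posterior)  # (4.0, 2.0)
```

Dividing out the prior n - 1 times matters: each pre-posterior already contains the prior once, and multiplying them without the correction would count it n times.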
Resource allocation among agents with preferences induced by factored MDPs
Proceedings of the Fifth International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS-06), 2006
Cited by 3 (2 self)
Abstract
Distributing scarce resources among agents in a way that maximizes the social welfare of the group is a computationally hard problem when the value of a resource bundle is not linearly decomposable. Furthermore, determining the value of a resource bundle can be a significant computational challenge in itself, for example for an agent operating in a stochastic environment, where the value of a resource bundle is the expected payoff of the optimal policy realizable given these resources. Recent work has shown that the structure in agents' preferences induced by stochastic policy-optimization problems (modeled as MDPs) can be exploited to solve the resource-allocation and the policy-optimization problems simultaneously, leading to drastic (often exponential) improvements in computational efficiency. However, previous work used a flat MDP model that scales very poorly. In this work, we present and empirically evaluate a resource-allocation mechanism that achieves much better scaling by using factored MDP models, thus exploiting both the structure in agents' MDP-induced preferences and the structure within agents' MDPs.
Resource allocation among agents with MDP-induced preferences
Journal of Artificial Intelligence Research
Cited by 3 (1 self)
Abstract
Allocating scarce resources among agents to maximize global utility is, in general, computationally challenging. We focus on problems where resources enable agents to execute actions in stochastic environments, modeled as Markov decision processes (MDPs), such that the value of a resource bundle is defined as the expected value of the optimal MDP policy realizable given these resources. We present an algorithm that simultaneously solves the resource-allocation and the policy-optimization problems. This allows us to avoid explicitly representing utilities over exponentially many resource bundles, leading to drastic (often exponential) reductions in computational complexity. We then use this algorithm in the context of self-interested agents to design a combinatorial auction for allocating resources. We empirically demonstrate the effectiveness of our approach by showing that it can, in minutes, optimally solve problems for which a straightforward combinatorial resource-allocation technique would require the agents to enumerate up to 2^100 resource bundles and the auctioneer to solve an NP-complete problem with an input of that size.
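The "straightforward combinatorial resource-allocation technique" that the abstract contrasts against can be sketched as follows. This is the naive baseline, not the paper's algorithm: each agent's value for a bundle is obtained by solving its MDP with only the actions that bundle enables, and the allocator tries every assignment of resources to agents. The two agents, their single-state MDPs, the resources, and the discount factor are all hypothetical.

```python
import itertools

GAMMA = 0.5
RESOURCES = ("r1", "r2")

# Hypothetical agent MDPs: state -> list of (required resource or None, reward, next state).
# An action is usable only if its resource is in the agent's bundle; every state keeps a
# resource-free action so that a value is always defined.
AGENT_MDPS = {
    "A": {"s": [(None, 0.0, "s"), ("r1", 4.0, "s")]},
    "B": {"s": [(None, 0.0, "s"), ("r1", 3.0, "s"), ("r2", 2.0, "s")]},
}


def bundle_value(mdp, bundle, start="s", sweeps=200):
    """Optimal value from `start` using only actions enabled by `bundle` (value iteration)."""
    v = {s: 0.0 for s in mdp}
    for _ in range(sweeps):
        v = {s: max(r + GAMMA * v[nxt] for res, r, nxt in acts if res is None or res in bundle)
             for s, acts in mdp.items()}
    return v[start]


def allocate_by_enumeration():
    """Naive baseline: try all len(agents) ** len(RESOURCES) assignments of resources."""
    agents = list(AGENT_MDPS)
    best_welfare, best_assignment = float("-inf"), None
    for owners in itertools.product(agents, repeat=len(RESOURCES)):
        assignment = dict(zip(RESOURCES, owners))
        welfare = sum(
            bundle_value(AGENT_MDPS[ag], {res for res, o in assignment.items() if o == ag})
            for ag in agents)
        if welfare > best_welfare:
            best_welfare, best_assignment = welfare, assignment
    return best_assignment, best_welfare


print(allocate_by_enumeration())  # ({'r1': 'A', 'r2': 'B'}, 12.0)
```

With 2 resources this is 4 assignments and 4 small MDP solves per agent; the number of bundles each agent must value doubles with every added resource, which is the blow-up the abstract's 2^100 figure refers to.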

