Results 1–10 of 38
Multiagent Planning with Factored MDPs
In NIPS 14, 2001
Abstract

Cited by 176 (15 self)
We present a principled and efficient planning algorithm for cooperative multiagent dynamic systems. A striking feature of our method is that the coordination and communication between the agents is not imposed, but derived directly from the system dynamics and function approximation architecture. We view the entire multiagent system as a single, large Markov decision process (MDP), which we assume can be represented in a factored way using a dynamic Bayesian network (DBN). The action space of the resulting MDP is the joint action space of the entire set of agents. Our approach is based on the use of factored linear value functions as an approximation to the joint value function. This factorization of the value function allows the agents to coordinate their actions at runtime using a natural message passing scheme. We provide a simple and efficient method for computing such an approximate value function by solving a single linear program, whose size is determined by the interaction between the value function structure and the DBN. We thereby avoid the exponential blowup in the state and action space. We show that our approach compares favorably with approaches based on reward sharing. We also show that our algorithm is an efficient alternative to more complicated algorithms even in the single agent case.
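The runtime coordination this abstract describes — agents agreeing on a joint action via message passing over a factored value function — can be illustrated with a minimal sketch. The payoff tables and two binary-action agents below are hypothetical, not from the paper; the point is that when the joint value decomposes into local components, variable elimination recovers the optimal joint action, which we check against brute force.

```python
import itertools

# Hypothetical local value components on a 2-agent coordination graph:
# Q(a1, a2) = Q1(a1) + Q12(a1, a2) + Q2(a2).
Q1 = {0: 1.0, 1: 0.0}
Q2 = {0: 0.5, 1: 2.0}
Q12 = {(0, 0): 0.0, (0, 1): 3.0, (1, 0): 1.0, (1, 1): 0.0}

def eliminate_agent1():
    # Agent 1 is eliminated first: for each choice of agent 2 it reports
    # the best-response value max_{a1} [Q1(a1) + Q12(a1, a2)] -- the "message".
    msg, best = {}, {}
    for a2 in (0, 1):
        vals = {a1: Q1[a1] + Q12[(a1, a2)] for a1 in (0, 1)}
        best[a2] = max(vals, key=vals.get)
        msg[a2] = vals[best[a2]]
    # Agent 2 then optimizes its own component plus the incoming message.
    a2 = max((0, 1), key=lambda a: Q2[a] + msg[a])
    return best[a2], a2

def brute_force():
    # Exhaustive search over the joint action space, for comparison.
    return max(itertools.product((0, 1), repeat=2),
               key=lambda a: Q1[a[0]] + Q12[a] + Q2[a[1]])

assert eliminate_agent1() == brute_force()
```

With many agents the elimination scheme touches only the local tables, avoiding the exponential joint enumeration that `brute_force` performs.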
An introduction to collective intelligence
Handbook of Agent Technology, AAAI, 1999
Coordinated Reinforcement Learning
In Proceedings of the Nineteenth International Conference on Machine Learning (ICML-2002), 2002
Abstract

Cited by 113 (6 self)
We present several new algorithms for multiagent reinforcement learning. A common feature of these algorithms is a parameterized, structured representation of a policy or value function. This structure is leveraged in an approach we call coordinated reinforcement learning, by which agents coordinate both their action selection activities and their parameter updates. Within the limits of our parametric representations, the agents will determine a jointly optimal action without explicitly considering every possible action in their exponentially large joint action space. Our methods differ from many previous reinforcement learning approaches to multiagent coordination in that structured communication and coordination between agents appears at the core of both the learning algorithm and the execution architecture. Our experimental results, comparing our approach to other RL methods, illustrate both the quality of the policies obtained and the additional benefits of coordination.
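The coordinated parameter update the abstract describes — a single joint TD error shared among agents, each of which adjusts only its own local component — can be sketched in one step. The two-agent, stateless setup, action values, and reward below are illustrative assumptions, not the paper's experiments.

```python
# Each agent holds a local Q-component; the joint value is their sum.
q = [{0: 0.0, 1: 0.0}, {0: 0.0, 1: 0.0}]   # per-agent local components
alpha, gamma = 0.5, 0.9

def joint_q(act):
    return q[0][act[0]] + q[1][act[1]]

# One coordinated TD(0) update on a sampled joint action and team reward.
a, r = (0, 1), 1.0
next_best = max(((i, j) for i in (0, 1) for j in (0, 1)), key=joint_q)
delta = r + gamma * joint_q(next_best) - joint_q(a)   # single shared TD error
for i in (0, 1):
    q[i][a[i]] += alpha * delta   # each agent updates only its own parameters
```

After the update each agent's table reflects its share of the same global error, which is what lets learning stay decentralized while the target stays global.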
Solving transition independent decentralized Markov decision processes
JAIR, 2004
Abstract

Cited by 107 (13 self)
Formal treatment of collaborative multiagent systems has been lagging behind the rapid progress in sequential decision making by individual agents. Recent work in the area of decentralized Markov Decision Processes (MDPs) has contributed to closing this gap, but the computational complexity of these models remains a serious obstacle. To overcome this complexity barrier, we identify a specific class of decentralized MDPs in which the agents' transitions are independent. The class consists of independent collaborating agents that are tied together through a structured global reward function that depends on all of their histories of states and actions. We present a novel algorithm for solving this class of problems and examine its properties, both as an optimal algorithm and as an anytime algorithm. To the best of our knowledge, this is the first algorithm to optimally solve a nontrivial subclass of decentralized MDPs. It lays the foundation for further work in this area on both exact and approximate algorithms.
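Transition independence, the structural property this paper exploits, means the joint transition kernel factors into per-agent kernels: P((s1', s2') | (s1, s2), (a1, a2)) = P1(s1' | s1, a1) · P2(s2' | s2, a2). A tiny sketch with made-up two-state agents and toy probabilities:

```python
# Per-agent transition kernels: (state, action) -> {next_state: probability}.
# All states, actions, and probabilities here are illustrative.
P1 = {(0, 'a'): {0: 0.9, 1: 0.1}, (1, 'a'): {0: 0.2, 1: 0.8}}
P2 = {(0, 'b'): {0: 0.5, 1: 0.5}, (1, 'b'): {0: 0.0, 1: 1.0}}

def joint_transition(s, a, s_next):
    # Under transition independence the joint probability is a product
    # of local probabilities -- no joint table is ever materialized.
    (s1, s2), (a1, a2), (n1, n2) = s, a, s_next
    return P1[(s1, a1)][n1] * P2[(s2, a2)][n2]

# The factored kernel is still a proper distribution over joint next states:
total = sum(joint_transition((0, 1), ('a', 'b'), (n1, n2))
            for n1 in (0, 1) for n2 in (0, 1))
assert abs(total - 1.0) < 1e-12
```

The factorization is what keeps the coupling confined to the global reward function, which is where the paper's algorithm does its work.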
Decentralized control of cooperative systems: Categorization and complexity analysis
Journal of Artificial Intelligence Research, 2004
Abstract

Cited by 89 (9 self)
Decentralized control of cooperative systems captures the operation of a group of decision makers that share a single global objective. The difficulty in solving such problems optimally arises when the agents lack full observability of the global state of the system when they operate. The general problem has been shown to be NEXP-complete. In this paper, we identify classes of decentralized control problems whose complexity ranges between NEXP and P. In particular, we study problems characterized by independent transitions, independent observations, and goal-oriented objective functions. Two algorithms are shown to solve optimally useful classes of goal-oriented decentralized processes in polynomial time. This paper also studies information sharing among the decision makers, which can improve their performance. We distinguish between three ways in which agents can exchange information: indirect communication, direct communication, and sharing state features that are not controlled by the agents. Our analysis shows that for every class of problems we consider, introducing direct or indirect communication does not change the worst-case complexity. The results provide a better understanding of the complexity of decentralized control problems that arise in practice and facilitate the development of planning algorithms for these problems.
A survey of collectives
In Collectives and the Design of Complex Systems, 2004
Abstract

Cited by 28 (12 self)
Due to the increasing sophistication and miniaturization of computational components, complex, distributed systems of interacting agents are becoming ubiquitous. Such systems, where each agent aims to optimize its own performance, but where there is a well-defined set of system-level performance criteria, are called collectives. The fundamental problem in analyzing/designing such systems is determining how the combined actions of a large number of agents lead to “coordinated” behavior on the global scale. Examples of artificial systems which exhibit such behavior include packet routing across a data network, control of an array of communication satellites, coordination of multiple rovers, and dynamic job scheduling across a distributed computer grid. Examples of natural systems include ecosystems, economies, and the organelles within a living cell. No current scientific discipline provides a thorough understanding of the relation between the structure of collectives and how well they meet their overall performance criteria. Although still very young, research on collectives has resulted in successes both in understanding and designing such systems. It is expected that as it matures and draws upon other disciplines related to collectives, this field will greatly expand the range of computationally addressable tasks. Moreover, in addition to drawing on them, such a fully developed field of collective intelligence may provide insight into already established scientific fields, such as mechanism design, economics, game theory, and population biology. This chapter provides a survey of the emerging science of collectives.
Wireless sensor networks for commercial lighting control: Decision making with multiagent systems
In AAAI Workshop on Sensor Networks, 2004
Abstract

Cited by 24 (1 self)
The application of wireless sensor networks to commercial lighting control provides a practical application that can benefit directly from artificial intelligence techniques. This application requires decision making in the face of
Multiagent systems by incremental gradient reinforcement learning
In Proceedings of the Seventeenth International Joint Conference on Artificial Intelligence (IJCAI-01), 2001
Abstract

Cited by 21 (8 self)
A new reinforcement learning (RL) methodology is proposed to design multiagent systems. In the realistic setting of situated agents with local perception, the task of automatically building a coordinated system is of crucial importance. We use simple reactive agents which learn their own behavior in a decentralized way. To cope with the difficulties inherent in RL used in this framework, we have developed an incremental learning algorithm in which agents face progressively more complex tasks. We illustrate this general framework on a computer experiment where agents have to coordinate to reach a global goal.
Adaptivity in agent-based routing for data networks
In Proceedings of the Fourth International Conference on Autonomous Agents (Agents 2000), 2000
Abstract

Cited by 15 (5 self)
Adaptivity, both of the individual agents and of the interaction structure among the agents, seems indispensable for scaling up multiagent systems (MAS’s) in noisy environments. One important consideration in designing adaptive agents is choosing their action spaces to be as amenable as possible to machine learning techniques, especially to reinforcement learning (RL) techniques [22]. One important way to make the interaction structure connecting agents itself adaptive is to have the intentions and/or actions of the agents be in the input spaces of the other agents, much as in Stackelberg games [2, 16, 15, 18]. We consider both kinds of adaptivity in the design of a MAS to control network packet routing [21, 6, 17, 12]. We demonstrate on the OPNET
Adaptive, distributed control of constrained multiagent systems
In Proceedings of AAMAS-04, 2004
Abstract

Cited by 15 (11 self)
Product Distribution (PD) theory was recently developed as a broad framework for analyzing and optimizing distributed systems. Here we demonstrate its use for adaptive distributed control of Multi-Agent Systems (MASs), i.e., for distributed stochastic optimization using MASs. First we review one motivation of PD theory, as the information-theoretic extension of conventional full-rationality game theory to the case of bounded rational agents. In this extension the equilibrium of the game is the optimizer of a Lagrangian of the probability distribution of the joint state of the agents. When the game in question is a team game with constraints, that equilibrium optimizes the expected value of the team game utility, subject to those constraints. One common way to find that equilibrium is to have each agent run a Reinforcement Learning (RL) algorithm. PD theory reveals this to be a particular type of search algorithm for minimizing the Lagrangian. Typically that algorithm is quite inefficient. A more principled alternative is to use a variant of Newton's method to minimize the Lagrangian. Here we compare this alternative to RL-based search in three sets of computer experiments: the N-Queens problem and the bin-packing problem from the optimization literature, and the Bar problem from the distributed RL literature. Our results confirm that the PD-theory-based approach outperforms the RL-based scheme in all three domains.
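For a single agent considered in isolation, the bounded-rational equilibrium the abstract sketches is the Boltzmann distribution minimizing a free-energy-style Lagrangian: expected cost minus temperature times entropy. The minimal illustration below uses made-up costs and follows the standard maximum-entropy argument, not the paper's exact notation or multi-agent coupling.

```python
import math

# For expected costs E[G | a], the Lagrangian
#   L(q) = sum_a q(a) E[G|a] + T * sum_a q(a) ln q(a)
# is minimized by the Boltzmann distribution q(a) ∝ exp(-E[G|a] / T).
def boltzmann(costs, T):
    w = [math.exp(-c / T) for c in costs]
    z = sum(w)
    return [x / z for x in w]

def lagrangian(q, costs, T):
    expected_cost = sum(p * c for p, c in zip(q, costs))
    neg_entropy = sum(p * math.log(p) for p in q if p > 0)
    return expected_cost + T * neg_entropy

costs, T = [1.0, 2.0, 0.5], 0.5   # illustrative costs and temperature
q_star = boltzmann(costs, T)

# The Boltzmann distribution attains a lower Lagrangian than, e.g., uniform:
uniform = [1 / 3, 1 / 3, 1 / 3]
assert lagrangian(q_star, costs, T) <= lagrangian(uniform, costs, T)
```

As T → 0 the distribution concentrates on the cheapest action (full rationality); larger T spreads probability mass, which is the sense in which the agents are boundedly rational.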