## Sequential optimality and coordination in multiagent systems (1999)

Venue: International Joint Conference on Artificial Intelligence (IJCAI-99)

Citations: 149 (3 self)

### BibTeX

```bibtex
@inproceedings{Boutilier99sequentialoptimality,
  author    = {Craig Boutilier},
  title     = {Sequential optimality and coordination in multiagent systems},
  booktitle = {Proceedings of the Sixteenth International Joint Conference on Artificial Intelligence (IJCAI-99)},
  year      = {1999},
  pages     = {478--485}
}
```


### Abstract

Coordination of agent activities is a key problem in multiagent systems. Set in a larger decision theoretic context, the existence of coordination problems leads to difficulty in evaluating the utility of a situation. This in turn makes defining optimal policies for sequential decision processes problematic. We propose a method for solving sequential multiagent decision problems by allowing agents to reason explicitly about specific coordination mechanisms. We define an extension of value iteration in which the system’s state space is augmented with the state of the coordination mechanism adopted, allowing agents to reason about the short and long term prospects for coordination, the long term consequences of (mis)coordination, and make decisions to engage or avoid coordination problems based on expected value. We also illustrate the benefits of mechanism generalization.
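The abstract's last idea, deciding whether to engage or avoid a coordination problem on expected-value grounds, can be illustrated with a small back-of-the-envelope sketch. The numbers below (a safe action worth 5, a coordination attempt worth 10 with a 1/2 chance of coordinating per try) are invented for this illustration and do not come from the paper:

```python
# Hypothetical choice between a guaranteed "safe" action and a risky,
# retryable coordination attempt under a uniform-randomization mechanism.
# All quantities here are illustrative assumptions, not the paper's example.

def value_of_attempting(reward, p_coord, gamma):
    """Expected discounted value of repeatedly attempting a coordination
    problem: V = p*reward + (1-p)*gamma*V, solved for V."""
    return p_coord * reward / (1 - (1 - p_coord) * gamma)

safe_value = 5.0
gamma = 0.9
v_attempt = value_of_attempting(10.0, 0.5, gamma)  # 5 / 0.55, about 9.09
engage = v_attempt > safe_value  # worth risking miscoordination here
```

Because failed attempts can be retried (at a discount), attempting dominates the safe action in this instance; with a lower coordination probability or discount factor the comparison can flip, which is precisely the kind of trade-off the augmented value iteration is meant to capture.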

### Citations

1311 | Markov Decision Processes: Discrete Stochastic Dynamic Programming
- Puterman
- 2005

Citation Context: ...tions in Section 6. 2 Multiagent MDPs 2.1 Markov Decision Processes We begin by presenting standard (single-agent) Markov decision processes (MDPs) and describe their multiagent extensions below (see [3, 13] for further details on MDPs). A fully observable MDP M = ⟨S, A, Pr, R⟩ comprises the following components. S is a finite set of states of the system being controlled. The agent has a finite set of ac...
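The snippet defines a fully observable MDP as a tuple M = ⟨S, A, Pr, R⟩. As background, standard value iteration over such a tuple can be sketched as follows; the two-state MDP is a hypothetical toy, not an example from the paper:

```python
# Minimal value-iteration sketch for a fully observable MDP M = (S, A, Pr, R).
# Pr[(s, a)] maps successor states to probabilities; R[(s, a)] is the
# expected immediate reward. The toy MDP below is purely illustrative.

def value_iteration(S, A, Pr, R, gamma=0.9, eps=1e-6):
    """Compute an (approximately) optimal value function via Bellman backups."""
    V = {s: 0.0 for s in S}
    while True:
        V_new = {
            s: max(R[(s, a)] + gamma * sum(p * V[s2] for s2, p in Pr[(s, a)].items())
                   for a in A)
            for s in S
        }
        if max(abs(V_new[s] - V[s]) for s in S) < eps:
            return V_new
        V = V_new

S = ["s0", "s1"]
A = ["stay", "go"]
Pr = {
    ("s0", "stay"): {"s0": 1.0}, ("s0", "go"): {"s1": 1.0},
    ("s1", "stay"): {"s1": 1.0}, ("s1", "go"): {"s0": 1.0},
}
R = {("s0", "stay"): 0.0, ("s0", "go"): 0.0,
     ("s1", "stay"): 1.0, ("s1", "go"): 0.0}

V = value_iteration(S, A, Pr, R)
# V(s1) converges toward 1/(1-0.9) = 10, and V(s0) toward 0.9 * V(s1) = 9.
```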

875 | The theory of learning in games
- Fudenberg, Levine
- 1998

Citation Context: ...coordinated actions [9, 15]; (b) allowing communication among agents before action selection [16]; and (c) the use of learning methods, whereby agents learn to coordinate through repeated interaction [5, 6, 8, 11]. Unfortunately, none of these approaches explicitly considers the impact of coordination problems in the context of larger sequential decision problems. If the agents run the risk of miscoordination ...

525 | Markov games as a framework for multi-agent reinforcement learning
- Littman
- 1994
Citation Context: ...coordinated actions [9, 15]; (b) allowing communication among agents before action selection [16]; and (c) the use of learning methods, whereby agents learn to coordinate through repeated interaction [5, 6, 8, 11]. Unfortunately, none of these approaches explicitly considers the impact of coordination problems in the context of larger sequential decision problems. If the agents run the risk of miscoordination ...

478 | Convention: A philosophical study
- Lewis
- 1969

Citation Context: ...agents. This is often infeasible. Approaches to dealing with "independent" decision makers include: (a) the design of conventions or social laws that restrict agents to selecting coordinated actions [9, 15]; (b) allowing communication among agents before action selection [16]; and (c) the use of learning methods, whereby agents learn to coordinate through repeated interaction [5, 6, 8, 11]. Unfortunatel...

252 | Stochastic games
- Shapley
- 1953

Citation Context: ...on problem, Ṽᵗ(s₁) is given by 10⌊(t+1)/3⌋. MMDPs, while a natural extension of MDPs to cooperative multiagent settings, can also be viewed as a type of stochastic game as formulated by Shapley [14]. Stochastic games were originally formulated for zero-sum games only (and as we will see, the zero-sum assumption alleviates certain difficulties), whereas we focus on the (equally special) case of c...

243 | Rational Learning Leads to Nash Equilibrium
- Kalai, Lehrer
- 1993

Citation Context: ...coordinated actions [9, 15]; (b) allowing communication among agents before action selection [16]; and (c) the use of learning methods, whereby agents learn to coordinate through repeated interaction [5, 6, 8, 11]. Unfortunately, none of these approaches explicitly considers the impact of coordination problems in the context of larger sequential decision problems. If the agents run the risk of miscoordination ...

187 | On the synthesis of useful social laws for artificial agent societies
- Shoham, Tennenholtz
- 1992

Citation Context: ...agents. This is often infeasible. Approaches to dealing with "independent" decision makers include: (a) the design of conventions or social laws that restrict agents to selecting coordinated actions [9, 15]; (b) allowing communication among agents before action selection [16]; and (c) the use of learning methods, whereby agents learn to coordinate through repeated interaction [5, 6, 8, 11]. Unfortunatel...

166 | Iterative solutions of games by fictitious play
- Brown
- 1951

Citation Context: ...ensures eventual coordination, at a rate dictated by the number of agents and number of choices available to them. Fictitious play (FP) is a related learning technique commonly studied in game theory [4, 6] where each agent i observes the actions played in the past by other agents and plays a best response given the empirical distribution observed. We refer to [6] for details, but note that the state of ...
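The fictitious-play process described in the snippet, where each agent tracks the empirical distribution of the others' past actions and best-responds to it, can be sketched for two agents as follows. The 2x2 identical-interest game and all names below are a hypothetical illustration, not an example from the paper:

```python
# Fictitious play in a 2-agent, 2-action identical-interest coordination
# game: both agents receive payoff[a][b] when one plays a and the other b.
# The symmetric matrix lets both agents reuse the same best-response code.

import random
random.seed(0)

payoff = [[10, 0],
          [0, 10]]
actions = [0, 1]

def best_response(opp_counts):
    """Best response to the empirical distribution of the opponent's past
    actions, randomizing uniformly over tied best responses."""
    total = sum(opp_counts) or 1  # avoid division by zero in round one
    values = [sum(payoff[a][b] * opp_counts[b] / total for b in actions)
              for a in actions]
    best = max(values)
    return random.choice([a for a in actions if values[a] == best])

counts1 = [0, 0]   # agent 1's counts of agent 2's past actions
counts2 = [0, 0]   # agent 2's counts of agent 1's past actions
history = []
for t in range(50):
    a1, a2 = best_response(counts1), best_response(counts2)
    counts1[a2] += 1
    counts2[a1] += 1
    history.append((a1, a2))
```

This toy run exhibits the property quoted in the Monderer-Shapley context below: with randomization over tied best responses, the agents eventually play a coordinated joint action, and once they do, the empirical counts lock it in forever.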

154 | Decision theoretic planning: Structural assumptions and computational leverage
- Boutilier, Dean, et al.
- 1999

Citation Context: ...tions in Section 6. 2 Multiagent MDPs 2.1 Markov Decision Processes We begin by presenting standard (single-agent) Markov decision processes (MDPs) and describe their multiagent extensions below (see [3, 13] for further details on MDPs). A fully observable MDP M = ⟨S, A, Pr, R⟩ comprises the following components. S is a finite set of states of the system being controlled. The agent has a finite set of ac...

114 | Fictitious play property for games with identical interests
- Monderer, Shapley
- 1996

Citation Context: ...thus FP has an infinite number of states. For fully cooperative games, FP converges to an optimal joint action if attention is restricted to PIO-actions and agents randomize over tied best responses [2, 12]. It also has the property that once a coordinated action is played, it is played forever. Unlike randomization, FP tends to lead to faster coordination as the number of agents and actions increase ...

62 | Decentralized supervisory control of discrete-event systems
- Lin, Wonham
- 1988

Citation Context: ...ing can be viewed as one of designing social laws [15]. It is also related to the issues faced in the design of protocols for distributed systems and the distributed control of discrete-event systems [10]. But rather than designing protocols for specific situations, metaprotocols that increase value over a wide variety of CPs would be the target. The framework developed here can also help decide wheth...

50 | Steady State Learning and Nash Equilibrium
- Fudenberg, Levine
- 1993

20 | Learning conventions in multiagent stochastic domains using likelihood estimates
- Boutilier
- 1996
Citation Context: ...thus FP has an infinite number of states. For fully cooperative games, FP converges to an optimal joint action if attention is restricted to PIO-actions and agents randomize over tied best responses [2, 12]. It also has the property that once a coordinated action is played, it is played forever. Unlike randomization, FP tends to lead to faster coordination as the number of agents and actions increase ...

20 | Learning to coordinate actions in multi-agent systems
- Weiß
- 1993

Citation Context: ..." decision makers include: (a) the design of conventions or social laws that restrict agents to selecting coordinated actions [9, 15]; (b) allowing communication among agents before action selection [16]; and (c) the use of learning methods, whereby agents learn to coordinate through repeated interaction [5, 6, 8, 11]. Unfortunately, none of these approaches explicitly considers the impact of coordin...


4 | A General Theory of Equilibrium Selection in Games
- Harsanyi, Selten
- 1988

Citation Context: ...ten be "reduced" by eliminating certain PIO-actions due to considerations such as dominance, risk (e.g., see the notions of risk-dominance and tracing used by Harsanyi and Selten to select equilibria [7]), or focusing on certain PIO-actions due to certain asymmetries. These reductions, if embodied in protocols commonly known by all agents, can limit choices making the CP "smaller" (thus potentially m...