## Solving Transition Independent Decentralized Markov Decision Processes (2004)

### Download Links

- [www.aaai.org]
- [www.cs.cmu.edu]
- [anytime.cs.umass.edu]
- [www-2.cs.cmu.edu]
- [www.jair.org]
- [rbr.cs.umass.edu]
- DBLP

### Other Repositories/Bibliography

Venue: Journal of Artificial Intelligence Research

Citations: 73 (11 self)

### BibTeX

@ARTICLE{Becker04solvingtransition,
  author  = {Raphen Becker and Shlomo Zilberstein and Victor Lesser and Claudia V. Goldman},
  title   = {Solving Transition Independent Decentralized {Markov} Decision Processes},
  journal = {Journal of Artificial Intelligence Research},
  year    = {2004},
  volume  = {22},
  pages   = {423--455}
}

### Abstract

Formal treatment of collaborative multi-agent systems has been lagging behind the rapid progress in sequential decision making by individual agents. Recent work in the area of decentralized Markov Decision Processes (MDPs) has contributed to closing this gap, but the computational complexity of these models remains a serious obstacle. To overcome this complexity barrier, we identify a specific class of decentralized MDPs in which the agents' transitions are independent. The class consists of independent collaborating agents that are tied together through a structured global reward function that depends on all of their histories of states and actions. We present a novel algorithm for solving this class of problems and examine its properties, both as an optimal algorithm and as an anytime algorithm. To the best of our knowledge, this is the first algorithm to optimally solve a non-trivial subclass of decentralized MDPs. It lays the foundation for further work in this area on both exact and approximate algorithms.
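To make the abstract's problem class concrete, the following is a minimal sketch of a two-agent, transition-independent decentralized MDP: each agent's state transitions depend only on its own state and action, while a structured global reward couples the two agents. All states, actions, dynamics, and reward values here are illustrative toy choices, not taken from the paper.

```python
# Hypothetical two-agent transition-independent DEC-MDP sketch.
# Transition independence: each agent's next state depends only on
# its OWN state and action. The agents are coupled only through a
# joint reward, here paid when both reach their goal state.

STATES = [0, 1]          # 0 = start, 1 = goal (per agent)
ACTIONS = ["stay", "go"]

def transition(s, a):
    # Deterministic for simplicity: "go" reaches the goal, "stay" stays.
    return 1 if a == "go" else s

def local_reward(s, a):
    # Small local cost for acting.
    return -1 if a == "go" else 0

def joint_reward(s1, s2):
    # Structured global reward: depends on both agents' final states.
    return 10 if (s1 == 1 and s2 == 1) else 0

def evaluate(policy1, policy2, horizon=2):
    """Value of a joint policy: local rewards plus the joint bonus."""
    s1 = s2 = 0
    total = 0
    for _ in range(horizon):
        a1, a2 = policy1(s1), policy2(s2)
        total += local_reward(s1, a1) + local_reward(s2, a2)
        # Each agent transitions independently of the other.
        s1, s2 = transition(s1, a1), transition(s2, a2)
    return total + joint_reward(s1, s2)

go_once = lambda s: "go" if s == 0 else "stay"
never_go = lambda s: "stay"

best = evaluate(go_once, go_once)   # both pay -1 once, then earn the bonus
```

Note that neither agent can evaluate its own policy in isolation: `go_once` costs 1 locally and only pays off if the other agent also reaches its goal, which is exactly the coupling-through-reward structure the paper's algorithm exploits.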

### Citations

834 | Planning and acting in partially observable stochastic domains - Kaelbling, Littman, et al. - 1998 |

309 | The complexity of Markov decision processes - Papadimitriou, Tsitsiklis - 1987 |
Citation Context: ...EC-POMDP and DEC-MDP is NEXP-complete, even when only two agents are involved (Bernstein et al., 2002). This is in contrast to the best known bounds for MDPs (P-complete) and POMDPs (PSPACE-complete) (Papadimitriou & Tsitsiklis, 1987; Mundhenk, Goldsmith, Lusena, & Allender, 2000). The few recent studies of decentralized control problems (with or without communication between the agents) confirm that solving even simple problem i...

289 | The complexity of decentralized control of Markov decision processes - Bernstein, Zilberstein, et al. - 2002 |
Citation Context: ...omplexity study of decentralized control shows that solving such problems is extremely difficult. The complexity of both DEC-POMDP and DEC-MDP is NEXP-complete, even when only two agents are involved (Bernstein et al., 2002). This is in contrast to the best known bounds for MDPs (P-complete) and POMDPs (PSPACE-complete) (Papadimitriou & Tsitsiklis, 1987; Mundhenk, Goldsmith, Lusena, & Allender, 2000). The few recent stu...

183 | The communicative multi-agent team decision problem: Analyzing teamwork theories and models - Pynadath, Tambe - 2002 |
Citation Context: ...a, & Allender, 2000). The few recent studies of decentralized control problems (with or without communication between the agents) confirm that solving even simple problem instances is extremely hard (Pynadath & Tambe, 2002; Xuan & Lesser, 2002). However, there are certain goal-oriented DEC-MDPs for which there are optimal, polynomial algorithms (Goldman & Zilberstein, 2004). For the general DEC-POMDP, the only known op...

165 | Solving distributed constraint optimization problems using cooperative mediation - Mailler, Lesser - 2004 |

155 | Taming decentralized POMDPs: Towards efficient policy computation for multiagent settings - Nair, Tambe, et al. - 2003 |
Citation Context: ...ement learning online to find high quality but non-optimal solutions. Other research has approached this complexity barrier through approximations of the general problem (Goldman & Zilberstein, 2003; Nair, Tambe, Yokoo, Pynadath, & Marsella, 2003; Peshkin et al., 2000; Shen, Lesser, & Carver, 2003). However, the approach taken in this paper is two-fold. First, we exploit the structure of the domain offered by some special classes of DEC-MDPs ...

144 | Sequential optimality and coordination in multiagent systems - Boutilier - 1999 |
Citation Context: ...pproximate solutions, such as heuristic policy search and gradient descent (e.g., Peshkin et al., 2000), or assumed complete communication at every step or when the optimal action is ambiguous (e.g., Boutilier, 1999; Xuan & Lesser, 2002). The former are not guaranteed to converge on the optimal solution and the latter are not practical when communication is not possible or very expensive. While the formal proble...

130 | Learning to cooperate via policy search - Peshkin, Kim, et al. - 2000 |
Citation Context: ...on-optimal solutions. Other research has approached this complexity barrier through approximations of the general problem (Goldman & Zilberstein, 2003; Nair, Tambe, Yokoo, Pynadath, & Marsella, 2003; Peshkin et al., 2000; Shen, Lesser, & Carver, 2003). However, the approach taken in this paper is two-fold. First, we exploit the structure of the domain offered by some special classes of DEC-MDPs to reduce the complexi...

119 | Dynamic programming for partially observable stochastic games - Hansen, Bernstein, et al. - 2004 |

110 | An asynchronous complete method for distributed constraint optimization - Modi, Shen, et al. - 2001 |

95 | Optimizing information exchange in cooperative multi-agent systems - Goldman, Zilberstein - 2003 |
Citation Context: ... while they focus on reinforcement learning online to find high quality but non-optimal solutions. Other research has approached this complexity barrier through approximations of the general problem (Goldman & Zilberstein, 2003; Nair, Tambe, Yokoo, Pynadath, & Marsella, 2003; Peshkin et al., 2000; Shen, Lesser, & Carver, 2003). However, the approach taken in this paper is two-fold. First, we exploit the structure of the dom...

76 | Quantitative modeling of complex environments - Decker, Lesser - 1994 |
Citation Context: ...ts So far we have focused on global reward structures that do not impose any temporal constraints on the agents. Other types of constraints are soft temporal constraints like facilitates and hinders (Decker & Lesser, 1993). A facilitates constraint between activities A and B means that if A is finished before B is started, then execution of B is somehow facilitated. Facilitation can take many forms like reduced consum...

68 | Decentralized control of cooperative systems: Categorization and complexity analysis - Goldman, Zilberstein - 2004 |
Citation Context: ...ng even simple problem instances is extremely hard (Pynadath & Tambe, 2002; Xuan & Lesser, 2002). However, there are certain goal-oriented DEC-MDPs for which there are optimal, polynomial algorithms (Goldman & Zilberstein, 2004). For the general DEC-POMDP, the only known optimal algorithm is a new dynamic programming algorithm developed by Hansen, Bernstein, and Zilberstein (2004). Xuan, Lesser, and Zilberstein (2001) use a...

64 | Communication decisions in multi-agent cooperation: Model and experiments - Xuan, Lesser, et al. - 2001 |

61 | Transition-independent decentralized Markov decision processes - Becker, Zilberstein, et al. - 2003 |
Citation Context: ...Problem Description In this section we formalize the n-agent control problem as a transition independent, cooperative, decentralized decision problem. This formalism is an extension of previous work (Becker, Zilberstein, Lesser, & Goldman, 2003) to n agents. The domain involves any number of agents operating in a decentralized manner, choosing actions based upon their own local and incomplete view of the world. The agents are cooperative in...

55 | Context-Specific Multiagent Coordination and Planning with Factored MDPs - Guestrin, Venkataraman, et al. - 2002 |

39 | Exploiting problem structure for distributed constraint optimization - Liu, Sycara - 1995 |
Citation Context: ...al value reduces to solving the corresponding augmented MDP for each of the remaining agents. As a distributed search process it maps nicely into a DCOP (Distributed Constraint Optimization Problem) (Liu & Sycara, 1995; Yokoo & Durfee, 1991). As a DCOP, each agent has one variable, which represents the policy that agent adopts. The domain of that variable is the optimal coverage set for that agent. There is a cost ...

39 | Complexity of finite-horizon Markov decision processes - Mundhenk, Goldsmith, et al. - 2000 |
Citation Context: ...te, even when only two agents are involved (Bernstein et al., 2002). This is in contrast to the best known bounds for MDPs (P-complete) and POMDPs (PSPACE-complete) (Papadimitriou & Tsitsiklis, 1987; Mundhenk, Goldsmith, Lusena, & Allender, 2000). The few recent studies of decentralized control problems (with or without communication between the agents) confirm that solving even simple problem instances is extremely hard (Pynadath & Tambe, 2...

36 | General principles of learning-based multi-agent systems - Wolpert, Wheeler, et al. - 1999 |

34 | Intractable problems in control theory - Papadimitriou, Tsitsiklis - 1985 |

33 | Autonomous rovers for Mars exploration - Washington, Golden, et al. - 1999 |
Citation Context: ...ss of problems with two examples. The first example is the problem of controlling the operation of multiple planetary exploration rovers, such as the ones used by NASA to explore the surface of Mars (Washington, Golden, Bresina, Smith, Anderson, & Smith, 1999). Periodically, the rovers are in communication with a ground control center. During that time, the rovers transmit the scientific data they have collected and receive a new mission for the next peri...

32 | Multi-agent policies: From centralized ones to decentralized ones - Xuan, Lesser - 2002 |
Citation Context: ...e few recent studies of decentralized control problems (with or without communication between the agents) confirm that solving even simple problem instances is extremely hard (Pynadath & Tambe, 2002; Xuan & Lesser, 2002). However, there are certain goal-oriented DEC-MDPs for which there are optimal, polynomial algorithms (Goldman & Zilberstein, 2004). For the general DEC-POMDP, the only known optimal algorithm is a ...

23 | Decentralized control of a multiple access broadcast channel: performance bounds - Ooi, Wornell - 1996 |

21 | Decentralized control of finite state Markov processes - Hsu, Marcus - 1982 |

19 | Minimizing communication cost in a distributed Bayesian network using a decentralized MDP - Shen, Lesser, et al. - 2003 |
Citation Context: ...Other research has approached this complexity barrier through approximations of the general problem (Goldman & Zilberstein, 2003; Nair, Tambe, Yokoo, Pynadath, & Marsella, 2003; Peshkin et al., 2000; Shen, Lesser, & Carver, 2003). However, the approach taken in this paper is two-fold. First, we exploit the structure of the domain offered by some special classes of DEC-MDPs to reduce the complexity of the model. Then we prese...

17 | On the complexity of designing distributed protocols - Papadimitriou, Tsitsiklis - 1982 |
Citation Context: ...y such policies for which this should be done. Therefore, the upper bound for the decision problem stated in this theorem is NP. To prove the lower bound we will reduce the NP-complete problem DTEAM (Papadimitriou & Tsitsiklis, 1982, 1986) to this problem for two agents, which is sufficient for a lower bound. DTEAM is a single-step discrete team decision problem. There are two agents. Agent i, i = 1, 2, observes a random integer...

9 | Distributed constraint optimization as a formal model of partially adversarial cooperation - Yokoo, Durfee - 1991 |
Citation Context: ...solving the corresponding augmented MDP for each of the remaining agents. As a distributed search process it maps nicely into a DCOP (Distributed Constraint Optimization Problem) (Liu & Sycara, 1995; Yokoo & Durfee, 1991). As a DCOP, each agent has one variable, which represents the policy that agent adopts. The domain of that variable is the optimal coverage set for that agent. There is a cost function for each of t...

3 | A multiagent reinforcement learning algorithm by dynamically merging Markov decision processes - Ghavamzadeh, Mahadevan - 2002 |

2 | Decentralized Markov decision processes with structured transitions - Becker, Zilberstein, et al. - 2004 |
Citation Context: ... tractably and optimally solve a significant subclass of DEC-MDPs. We have also applied it to a different class of problems in which the agents were reward independent but not transition independent (Becker, Zilberstein, & Lesser, 2004), which demonstrates that it is not limited to the class of problems described in this paper. Most other work on distributed problems have used approximate solutions, such as heuristic policy search ...

1 | Dynamic programming for partially observable stochastic games - Becker, Lesser, et al. - 2004 |

1 | Transition Independent DEC-MDPs - Tumer, Agogino, et al. - 2002 |