## Decentralized control of cooperative systems: Categorization and complexity analysis (2004)

Venue: Journal of Artificial Intelligence Research

Citations: 70 (8 self)

### BibTeX

```bibtex
@ARTICLE{Goldman04decentralizedcontrol,
  author  = {Claudia V. Goldman and Shlomo Zilberstein},
  title   = {Decentralized control of cooperative systems: Categorization and complexity analysis},
  journal = {Journal of Artificial Intelligence Research},
  year    = {2004},
  volume  = {22},
  pages   = {143--174}
}
```

### Abstract

Decentralized control of cooperative systems captures the operation of a group of decision-makers that share a single global objective. The difficulty in solving such problems optimally arises when the agents lack full observability of the global state of the system when they operate. The general problem has been shown to be NEXP-complete. In this paper, we identify classes of decentralized control problems whose complexity ranges between NEXP and P. In particular, we study problems characterized by independent transitions, independent observations, and goal-oriented objective functions. Two algorithms are shown to solve optimally useful classes of goal-oriented decentralized processes in polynomial time. This paper also studies information sharing among the decision-makers, which can improve their performance. We distinguish between three ways in which agents can exchange information: indirect communication, direct communication, and sharing state features that are not controlled by the agents. Our analysis shows that for every class of problems we consider, introducing direct or indirect communication does not change the worst-case complexity. The results provide a better understanding of the complexity of decentralized control problems that arise in practice and facilitate the development of planning algorithms for these problems.
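The classes analyzed in the abstract hinge on independent transitions: the joint transition model of a two-agent Dec-MDP factors into per-agent transition functions. A minimal sketch of that factorization, where the function name, the dictionary encoding, and the toy numbers are all illustrative assumptions rather than anything from the paper:

```python
# Sketch of the independent-transitions assumption for a two-agent Dec-MDP:
#   P((s1', s2') | (s1, s2), (a1, a2)) = P1(s1' | s1, a1) * P2(s2' | s2, a2)
# All names and numbers here are illustrative, not taken from the paper.

def joint_transition(P1, P2, state, action, next_state):
    """Joint transition probability under independent transitions."""
    (s1, s2), (a1, a2), (t1, t2) = state, action, next_state
    return P1[(s1, a1)].get(t1, 0.0) * P2[(s2, a2)].get(t2, 0.0)

# Toy local models: each agent has states {0, 1} and a single action 'go'.
P1 = {(0, 'go'): {0: 0.2, 1: 0.8}, (1, 'go'): {1: 1.0}}
P2 = {(0, 'go'): {0: 0.5, 1: 0.5}, (1, 'go'): {1: 1.0}}

p = joint_transition(P1, P2, (0, 0), ('go', 'go'), (1, 1))  # 0.8 * 0.5
```

The paper's independent-observations property is the analogous factorization of the observation model, with each agent's observations depending only on its own local state and action.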

### Citations

7537 | Probabilistic Reasoning in Intelligent Systems - Pearl - 1988 |

1120 | The Contract Net Protocol: High-Level Communication and Control in a Distributed Problem Solver - Smith - 1980 |
Citation Context: ...agents in the system, who have full observability. Coordination and cooperation have been studied extensively by the distributed artificial intelligence community (Durfee, 1988; Grosz & Kraus, 1996; Smith, 1988) assuming a known and fixed language of communication. KQML (Finin, Labrou, & Mayfield, 1997) is an example of one standard designed to specify the possible communication between the agents. Balch an...

1102 | KQML as an agent communication language - Finin, Fritzson, et al. - 1994 |

901 | Planning and acting in partially observable stochastic domains - Kaelbling, Littman, et al. - 1998 |
Citation Context: ...lar, single-agent planning problems in stochastic domains were modeled as partially observable Markov decision processes (POMDPs) or fully-observable MDPs (Dean, Kaelbling, Kirman, & Nicholson, 1995; Kaelbling, Littman, & Cassandra, 1998; Boutilier, Dearden, & Goldszmidt, 1995). Borrowing from Operations Research techniques, optimal plans can be computed for these planning problems by solving the corresponding Markov decision problem...

494 | Collaborative plans for complex group action - Grosz, Kraus - 1996 |
Citation Context: ...entity, or by all the agents in the system, who have full observability. Coordination and cooperation have been studied extensively by the distributed artificial intelligence community (Durfee, 1988; Grosz & Kraus, 1996; Smith, 1988) assuming a known and fixed language of communication. KQML (Finin, Labrou, & Mayfield, 1997) is an example of one standard designed to specify the possible communication between the age...

473 | Games and Decisions - Luce, Raiffa - 1957 |
Citation Context: ...systems and communication was free. Our aim is to find optimal policies of communication and action off-line, taking into account information that agents can acquire on-line. Game theory researchers (Luce & Raiffa, 1957; Aumann & Hart, 1994) have also looked at communication, although the approaches and questions are somewhat different from ours. For example, Wärneryd (1993), and Blume and Sobel (1995) study how the...

325 | The complexity of Markov decision processes - Papadimitriou, Tsitsiklis - 1987 |
Citation Context: ...ents will benefit most by communicating constantly. This results in a fully observable decentralized process, which is equivalent to an MMDP (Boutilier, 1999). This problem is known to be P-complete (Papadimitriou & Tsitsiklis, 1987). In real-world scenarios, it is reasonable to assume that direct communication has indeed an additional cost associated with it; the cost may reflect the risk of revealing information to competitive...

324 | The dynamics of reinforcement learning in cooperative multiagent systems - Claus, Boutilier - 1998 |
Citation Context: ...ses serve as a formal framework to study the foundations of multi-agent systems (e.g., Becker et al., 2003; Hansen et al., 2004; Guestrin & Gordon, 2002; Peshkin et al., 2000; Pynadath & Tambe, 2002; Claus & Boutilier, 1998). Our study focuses on computing off-line decentralized policies of control for cooperative systems. This paper analyzes the complexity of solving these problems optimally for certain classes of dece...

316 | The optimal control of partially observable Markov processes over a finite horizon - Smallwood, Sondik - 1973 |
Citation Context: ...I1_t as all the information about the Dec-MDP process available to agent 1 at the end of the control interval t. This is done similarly to Smallwood and Sondik's original proof for classical POMDPs (Smallwood & Sondik, 1973). I1_t is given by the action a1_t that agent 1 chose to perform at time t and the current resulting state s1_t, which is fully observable by agent 1. We assume a certain policy for agent 1 (s1_t = i1), and...

310 | The complexity of decentralized control of Markov decision processes - Bernstein, Zilberstein, et al. - 2002 |
Citation Context: ...decentralized partially-observable Markov decision processes (Dec-POMDPs) or decentralized Markov decision processes (Dec-MDPs). The complexity of solving these problems has been studied recently (Bernstein, Givan, Immerman, & Zilberstein, 2002; Pynadath & Tambe, 2002). Bernstein et al. have shown that solving optimally a Dec-MDP is NEXP-complete by reducing the control problem to the tiling problem. Rabinovich et al. (2003) have shown that...

233 | Communication in reactive multiagent robotic systems - Balch, Arkin - 1994 |

232 | Exploiting structure in policy construction - Boutilier, Dearden, et al. - 1995 |
Citation Context: ...stochastic domains were modeled as partially observable Markov decision processes (POMDPs) or fully-observable MDPs (Dean, Kaelbling, Kirman, & Nicholson, 1995; Kaelbling, Littman, & Cassandra, 1998; Boutilier, Dearden, & Goldszmidt, 1995). Borrowing from Operations Research techniques, optimal plans can be computed for these planning problems by solving the corresponding Markov decision problem. There has been a vast amount of progre...

193 | The communicative multiagent team decision problem: Analyzing teamwork theories and models - Pynadath, Tambe - 2002 |
Citation Context: ...on processes (Dec-POMDPs) or decentralized Markov decision processes (Dec-MDPs). The complexity of solving these problems has been studied recently (Bernstein, Givan, Immerman, & Zilberstein, 2002; Pynadath & Tambe, 2002). Bernstein et al. have shown that solving optimally a Dec-MDP is NEXP-complete by reducing the control problem to the tiling problem. Rabinovich et al. (2003) have shown that even approximating the...

174 | Planning under time constraints in stochastic domains - Dean, Kaelbling, et al. - 1995 |
Citation Context: ...ion-making in stochastic domains. In particular, single-agent planning problems in stochastic domains were modeled as partially observable Markov decision processes (POMDPs) or fully-observable MDPs (Dean, Kaelbling, Kirman, & Nicholson, 1995; Kaelbling, Littman, & Cassandra, 1998; Boutilier, Dearden, & Goldszmidt, 1995). Borrowing from Operations Research techniques, optimal plans can be computed for these planning problems by solving th...

174 | Principles of metareasoning - Russell, Wefald - 1989 |

167 | Coordination of Distributed Problem Solvers - Durfee - 1988 |
Citation Context: ...by a central entity, or by all the agents in the system, who have full observability. Coordination and cooperation have been studied extensively by the distributed artificial intelligence community (Durfee, 1988; Grosz & Kraus, 1996; Smith, 1988) assuming a known and fixed language of communication. KQML (Finin, Labrou, & Mayfield, 1997) is an example of one standard designed to specify the possible communic...

165 | Taming decentralized POMDPs: Towards efficient policy computation for multiagent settings - Nair, Tambe, et al. - 2003 |

152 | Sequential optimality and coordination in multiagent systems - Boutilier - 1999 |
Citation Context: ...model, partial observability is assumed and the scenarios studied are more complex and include multiple states. Centralized multi-agent systems (MAS) were also studied in the framework of MDPs (e.g., Boutilier, 1999), where both the off-line planning stage and the on-line stage are controlled by a central entity, or by all the agents in the system, who have full observability. Coordination and cooperation have b...

149 | Multi-agent planning with factored MDPs - Guestrin, Koller, et al. - 2001 |
Citation Context: ...an approximation algorithm for online learning that does not guarantee convergence. Agents in this model may communicate freely. Guestrin et al. study off-line approximations (a centralized approach (Guestrin, Koller, & Parr, 2001) and a distributed approach (Guestrin & Gordon, 2002)), where a known structure of the agents' action dependencies induces a message passing structure. In this context, agents choose their actions in...

136 | Efficient solution algorithms for factored mdps - Guestrin, Koller, et al. - 2003 |

134 | Learning to cooperate via policy search - Peshkin, Kim, et al. - 2000 |

123 | Dynamic programming for partially observable stochastic games - Hansen, Bernstein, et al. - 2004 |
Citation Context: ...ntralized Control with No Information Sharing So far, the only known algorithms for solving optimally decentralized control problems are the generalized version of dynamic programming for Dec-POMDPs (Hansen et al., 2004) and the Coverage-set algorithm (Becker et al., 2003) for Dec-MDPs with independent transitions and observations. The first algorithm solves optimally a general Dec-POMDP. Its practicality is restric...

96 | Generalizing the partial global planning algorithm - Decker, Lesser - 1992 |
Citation Context: ...ith a predefined language of communication, which typically does not incur any costs, overlooking the fact that dependent observations offer yet another form of communication (Pynadath & Tambe, 2002; Decker & Lesser, 1992; Grosz & Kraus, 1996; Durfee, 1988; Roth, Vail, & Veloso, 2003). The problem of combining communication acts into the decision problem of a group of cooperative agents was addressed by Xuan et al. (2...

96 | Optimizing Information Exchange in Cooperative Multi-agent Systems (Second International Joint Conference on Autonomous Agents and Multiagent Systems) - Goldman, Zilberstein - 2003 |

66 | Transition-independent decentralized Markov decision processes - Becker, Zilberstein, et al. - 2003 |
Citation Context: ...the only known algorithms for solving optimally decentralized control problems are the generalized version of dynamic programming for Dec-POMDPs (Hansen et al., 2004) and the Coverage-set algorithm (Becker et al., 2003) for Dec-MDPs with independent transitions and observations. The first algorithm solves optimally a general Dec-POMDP. Its practicality is restricted by the complexity of these problems (NEXP-complet...

56 | Symbolic heuristic search for factored Markov Decision Processes - Feng, Hansen - 2002 |
Citation Context: ...anning problems by solving the corresponding Markov decision problem. There has been a vast amount of progress in solving individual MDPs by exploiting domain structure (e.g., Boutilier et al., 1995; Feng & Hansen, 2002). Approximations of MDPs have also been studied, for example, by Guestrin et al. (2003), assuming that the reward function can be decomposed into local reward functions each depending on only a small...

56 | Distributed value functions - Schneider, Wong, et al. - 1999 |

37 | General Principles of Learning-based Multi-agent Systems - Wolpert, Wheeler, et al. - 1999 |

35 | Intractable problems in control theory - Papadimitriou, Tsitsiklis - 1986 |

26 | Distributed planning in hierarchical factored MDPs - Guestrin, Gordon - 2002 |
Citation Context: ...guarantee convergence. Agents in this model may communicate freely. Guestrin et al. study off-line approximations (a centralized approach (Guestrin, Koller, & Parr, 2001) and a distributed approach (Guestrin & Gordon, 2002)), where a known structure of the agents' action dependencies induces a message passing structure. In this context, agents choose their actions in turns and communication is free. The solution is bas...

26 | Cheap Talk, Coordination, and Evolutionary Stability - Wärneryd - 1993 |

22 | The Complexity of Multiagent Systems: The Price of Silence - Rabinovich, Goldman, et al. - 2003 |
Citation Context: ...rlapping tasks. In other cases, the system may benefit when both robots perform the same tasks. For example, both agents run the same... [the snippet then runs into a flattened complexity-summary table from the paper: the general Dec-MDP is NEXP-complete (Bernstein et al., 2002), and even approximating it is NEXP-complete (Rabinovich et al., 2003); goal-oriented classes with independent transitions and observations range from NP-complete down to P-complete]

19 | A real-time world model for multi-robot teams with high-latency communication - Roth, Vail, et al. - 2003 |
Citation Context: ...y does not incur any costs, overlooking the fact that dependent observations offer yet another form of communication (Pynadath & Tambe, 2002; Decker & Lesser, 1992; Grosz & Kraus, 1996; Durfee, 1988; Roth, Vail, & Veloso, 2003). The problem of combining communication acts into the decision problem of a group of cooperative agents was addressed by Xuan et al. (2001). Their framework is similar to ours but their approach is...

17 | On the complexity of designing distributed protocols - Papadimitriou, Tsitsiklis - 1982 |
Citation Context: ...fully observable. Figure 2: Exponential vs. Polynomial Sized Policies. It is already known that a simple decentralized decision-making problem for two agents is NP-hard (where |Ai| ≥ 2 and |Aj| ≥ 3) (Papadimitriou & Tsitsiklis, 1982, 1986). Therefore, the lower bound for the problem class stated in the lemma is also NP. ✷ It is an open question whether a Dec-POMDP with independent transitions and observations (without joint full...

8 | Decentralized language learning through acting - Goldman, Allen, et al. - 2004 |
Citation Context: ...a separate line of research, we are addressing the question of agents controlling a decentralized process where the agents develop a mutual understanding of the messages exchanged along the process (Goldman, Allen, & Zilberstein, 2004). Direct communication is the only means of achieving full observability when the observations are independent and when there are no common uncontrollable features (Assumption 1). We define a Dec-MDP-...

7 | Communication-Proof Equilibria in Cheap-Talk Games - Blume, Sobel - 1995 |

5 | Goal-oriented Dec-MDPs with direct communication - Goldman, Zilberstein - 2004 |
Citation Context: ...[flattened rows of Table 2, a summary of the known algorithms for controlling decentralized MDPs optimally: goal-oriented Dec-MDPs with independent transitions and observations (|G| ≥ 1) and no information sharing are solved by OptNGoals (Section 4.2), assuming uniform cost and the NBCLG property; the same class with direct communication is treated in (Goldman & Zilberstein, 2004)] function Opt1Goal(Dec-MDP) returns the optimal joint policy δ*, inputs: Dec-MDP = <S, A1, A2, P, R>, G /* the ...
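The Opt1Goal routine that this snippet begins to quote plans for the case of a single global goal. As a hedged illustration of the kind of single-agent planning step such an algorithm can rely on under independent transitions and observations (the function, the chain example, and all names below are illustrative, not the paper's code), here is plain value iteration on one agent's local MDP with an absorbing goal state:

```python
# Illustrative sketch, not the paper's Opt1Goal: solve one agent's local
# MDP toward an absorbing goal state with in-place value iteration.

def value_iteration(states, actions, P, R, goal, gamma=1.0, eps=1e-6):
    """P maps (state, action) to {next_state: prob}; R maps (state, action)
    to immediate reward; the goal state is absorbing with value 0."""
    V = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            if s == goal:
                continue
            best = max(
                R.get((s, a), 0.0)
                + gamma * sum(p * V[t] for t, p in P[(s, a)].items())
                for a in actions
            )
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < eps:
            return V
```

Under the paper's independence assumptions the local solutions can be combined into a joint policy; this sketch covers only the per-agent planning step, not the goal-selection or communication reasoning.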

1 | Handbook of Game Theory with Economic Applications - Aumann, Hart - 1994 |
Citation Context: ...tion was free. Our aim is to find optimal policies of communication and action off-line, taking into account information that agents can acquire on-line. Game theory researchers (Luce & Raiffa, 1957; Aumann & Hart, 1994) have also looked at communication, although the approaches and questions are somewhat different from ours. For example, Wärneryd (1993), and Blume and Sobel (1995) study how the receiver of a messag...