Transition-Independent Decentralized Markov Decision Processes
2003
Cited by 47 (11 self)
Abstract
There has been substantial progress with formal models for sequential decision making by individual agents using the Markov decision process (MDP). However, similar treatment of multi-agent systems is lacking. A recent complexity result, showing that solving decentralized MDPs is NEXP-hard, provides a partial explanation. To overcome this complexity barrier, we identify a general class of transition-independent decentralized MDPs that is widely applicable. The class consists of independent collaborating agents that are tied together through a global reward function that depends upon both of their histories. We present a novel algorithm for solving this class of problems and examine its properties. The result is the first effective technique for optimally solving a class of decentralized MDPs. This lays the foundation for further work in this area on both exact and approximate solutions.
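A minimal sketch of what transition independence buys, under made-up assumptions: two agents with hypothetical two-state local models `P1` and `P2`, hypothetical local action costs, and a global bonus paid only if both agents end in their local state 1 (a simple reward that depends on both agents' histories). Because each agent's transitions depend only on its own state and action, the probability of the joint event factors into local probabilities. The search is plain brute force over stationary local policies for illustration; it is not the algorithm the abstract refers to.

```python
import itertools

ACTIONS = ("stay", "go")
COST = {"stay": 0.0, "go": 1.0}  # hypothetical local action costs

# Hypothetical local models: P[s][a] -> {next_state: probability}.
# State 1 is each agent's local "goal" and is absorbing.
P1 = {0: {"stay": {0: 1.0}, "go": {0: 0.2, 1: 0.8}},
      1: {"stay": {1: 1.0}, "go": {1: 1.0}}}
P2 = {0: {"stay": {0: 1.0}, "go": {0: 0.5, 1: 0.5}},
      1: {"stay": {1: 1.0}, "go": {1: 1.0}}}


def joint_transition(state, action):
    """Transition independence: P((s1', s2') | (s1, s2), (a1, a2))
    = P1(s1' | s1, a1) * P2(s2' | s2, a2)."""
    (s1, s2), (a1, a2) = state, action
    return {(n1, n2): q1 * q2
            for (n1, q1), (n2, q2) in itertools.product(P1[s1][a1].items(),
                                                        P2[s2][a2].items())}


def local_stats(P, policy, horizon, start=0):
    """Probability of ending in local state 1 and expected local cost when one
    agent follows a local (state -> action) policy for `horizon` steps."""
    dist, expected_cost = {start: 1.0}, 0.0
    for _ in range(horizon):
        nxt = {}
        for s, p in dist.items():
            a = policy[s]
            expected_cost += p * COST[a]
            for s2, q in P[s][a].items():
                nxt[s2] = nxt.get(s2, 0.0) + p * q
        dist = nxt
    return dist.get(1, 0.0), expected_cost


def joint_value(pi1, pi2, horizon, bonus):
    """Global value: bonus if BOTH agents end in state 1, minus local costs.
    The agents' local processes are independent, so the joint event's
    probability is the product of the local probabilities."""
    p1, c1 = local_stats(P1, pi1, horizon)
    p2, c2 = local_stats(P2, pi2, horizon)
    return bonus * p1 * p2 - c1 - c2


def best_joint_policy(horizon, bonus):
    """Brute-force search over pairs of deterministic stationary local policies."""
    policies = [dict(zip((0, 1), acts)) for acts in itertools.product(ACTIONS, repeat=2)]
    return max(((pi1, pi2) for pi1 in policies for pi2 in policies),
               key=lambda pair: joint_value(pair[0], pair[1], horizon, bonus))


pi1, pi2 = best_joint_policy(horizon=2, bonus=10.0)
print(pi1, pi2, joint_value(pi1, pi2, 2, 10.0))
```

With `horizon=2` and `bonus=10.0`, both agents should choose `go` in state 0 (expected value about 4.5); lowering the bonus to 3.0 makes the shared reward too small to cover the costs, so both stay. The coupling through the global reward is what makes each agent's best local policy depend on the other agent's.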
Risk-sensitive planning with one-switch utility functions: Value iteration
In AAAI, 2005
Cited by 9 (5 self)
Abstract
Decision-theoretic planning with nonlinear utility functions is important since decision makers are often risk-sensitive in high-stakes planning situations. One-switch utility functions are an important class of nonlinear utility functions that can model decision makers whose decisions change with their wealth level. We study how to maximize the expected utility of a Markov decision problem for a given one-switch utility function, which is difficult since the resulting planning problem is not decomposable. We first study an approach that augments the states of the Markov decision problem with the wealth level. The properties of the resulting infinite-state Markov decision problem then allow us to generalize the standard risk-neutral version of value iteration from manipulating values to manipulating functions that map wealth levels to values. We use a probabilistic blocks-world example to demonstrate that the resulting risk-sensitive version of value iteration is practical.
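The wealth-augmentation idea can be illustrated with a minimal sketch. It assumes a one-switch utility of the linear-plus-exponential form U(w) = C*w - D*GAMMA**w (parameters chosen arbitrarily), integer rewards, and a finite horizon, and it uses plain backward induction over (state, wealth) pairs rather than the function-valued value iteration the abstract describes for the infinite-state problem. The toy MDP and the names `MDP`, `value`, and `best_action` are hypothetical.

```python
from functools import lru_cache

# Hypothetical one-switch utility parameters (linear plus exponential form):
# U(w) = C*w - D*GAMMA**w with C > 0, D > 0, 0 < GAMMA < 1.
C, D, GAMMA = 1.0, 10.0, 0.9


def utility(wealth):
    return C * wealth - D * GAMMA ** wealth


# Hypothetical toy MDP: MDP[state][action] -> list of (prob, reward, next_state);
# next_state None means the episode has terminated.
MDP = {
    "play": {
        "stop": [(1.0, 0, None)],
        "gamble": [(0.5, 30, "play"), (0.5, -20, "play")],
    },
}


def q_value(state, action, wealth, steps_left):
    """Expected utility of taking `action`, then acting optimally."""
    return sum(p * value(nxt, wealth + r, steps_left - 1)
               for p, r, nxt in MDP[state][action])


@lru_cache(maxsize=None)
def value(state, wealth, steps_left):
    """Optimal expected utility of final wealth from (state, wealth).
    Wealth is part of the state, so the backup applies U to accumulated
    wealth instead of adding rewards linearly."""
    if state is None or steps_left == 0:
        return utility(wealth)
    return max(q_value(state, a, wealth, steps_left) for a in MDP[state])


def best_action(state, wealth, steps_left):
    return max(MDP[state], key=lambda a: q_value(state, a, wealth, steps_left))


for w in (0, 60):
    print(w, best_action("play", w, 3))
```

With these parameters the sketch should choose `stop` at wealth 0 and `gamble` at wealth 60: the same gamble is declined when the decision maker is poor and accepted when it is rich, which is the wealth-dependent behavior one-switch utilities are meant to capture.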
Existence and finiteness conditions for risk-sensitive planning: Results and conjectures
In Proceedings of the Twentieth Annual Conference on Uncertainty in Artificial Intelligence (UAI-05)
Cited by 6 (3 self)
Abstract
Decision-theoretic planning with risk-sensitive planning objectives is important for building autonomous agents or decision-support systems for real-world applications. However, this line of research has been largely ignored in the artificial intelligence and operations research communities since planning with risk-sensitive planning objectives is more complicated than planning with risk-neutral planning objectives. To remedy this situation, we derive conditions that guarantee that the optimal expected utilities of the total plan-execution reward exist and are finite for fully observable Markov decision process models with non-linear utility functions. In the case of Markov decision process models with both positive and negative rewards, most of our results hold for stationary policies only, but we conjecture that they can be generalized to nonstationary policies.
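A hypothetical one-state example (not taken from the paper) shows why finiteness is not automatic even when the expected total reward is finite. Assume a risk-averse exponential utility U(w) = -γ^w with 0 < γ < 1, a reward of -1 per step, and a process that continues after each step with probability p and stops otherwise, so the number of steps N satisfies P(N = n) = (1 - p) p^(n-1) for n ≥ 1 and the total reward is W = -N:

```latex
\mathbb{E}[U(W)]
  = -\sum_{n=1}^{\infty} (1-p)\,p^{n-1}\,\gamma^{-n}
  = -\frac{1-p}{\gamma}\sum_{k=0}^{\infty}\left(\frac{p}{\gamma}\right)^{k}
  = \begin{cases}
      -\dfrac{1-p}{\gamma-p}, & p < \gamma,\\[6pt]
      -\infty, & p \ge \gamma.
    \end{cases}
```

The expected total reward, -1/(1 - p), is finite for every p < 1, yet the expected utility is finite only when p < γ, so some condition linking the transition probabilities to the utility function is needed before the optimal expected utility can be guaranteed to exist and be finite.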

