Results 1–10 of 22
Planning, learning and coordination in multiagent decision processes
 In Proceedings of the Sixth Conference on Theoretical Aspects of Rationality and Knowledge (TARK-96), 1996
Abstract

Cited by 96 (1 self)
There has been a growing interest in AI in the design of multiagent systems, especially in multiagent cooperative planning. In this paper, we investigate the extent to which methods from single-agent planning and learning can be applied in multiagent settings. We survey a number of different techniques from decision-theoretic planning and reinforcement learning and describe a number of interesting issues that arise with regard to coordinating the policies of individual agents. To this end, we describe multiagent Markov decision processes as a general model in which to frame this discussion. These are special n-person cooperative games in which agents share the same utility function. We discuss coordination mechanisms based on imposed conventions (or social laws) as well as learning methods for coordination. Our focus is on the decomposition of sequential decision processes so that coordination can be learned (or imposed) locally, at the level of individual states. We also discuss the use of structured problem representations and their role in the generalization of learned conventions and in approximation.
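The multiagent MDP described above is, for planning purposes, an ordinary MDP over joint actions. A minimal sketch, using a hypothetical two-agent, two-state domain in which agents are rewarded only for choosing matching actions:

```python
# A minimal sketch (hypothetical domain): value iteration over *joint*
# actions for two agents sharing one utility function. Coordination shows
# up in which joint action attains the max at each state.
import itertools

GAMMA = 0.9
STATES = ["s0", "s1"]
ACTIONS = ["left", "right"]                      # per-agent action set
JOINT = list(itertools.product(ACTIONS, ACTIONS))

def reward(state, joint_action):
    a1, a2 = joint_action
    return 1.0 if a1 == a2 else 0.0              # reward only for matching

def transition(state, joint_action):
    a1, a2 = joint_action
    return "s1" if a1 == a2 else "s0"            # toy deterministic dynamics

V = {s: 0.0 for s in STATES}
for _ in range(200):                             # value iteration to near-convergence
    V = {s: max(reward(s, ja) + GAMMA * V[transition(s, ja)] for ja in JOINT)
         for s in STATES}
```

Note that both ("left", "left") and ("right", "right") attain the maximum at each state: the value function alone does not tell the agents which convention to adopt, which is exactly the coordination problem the paper studies.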
APRICODD: Approximate policy construction using decision diagrams
 In Proceedings of the Conference on Neural Information Processing Systems, 2000
Abstract

Cited by 49 (2 self)
We propose a method of approximate dynamic programming for Markov decision processes (MDPs) using algebraic decision diagrams (ADDs). We produce near-optimal value functions and policies with much lower time and space requirements than exact dynamic programming. Our method reduces the sizes of the intermediate value functions generated during value iteration by replacing the values at the terminals of the ADD with ranges of values. Our method is demonstrated on a class of large MDPs (with up to 34 billion states), and we compare the results with the optimal value functions.
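The core approximation idea, replacing individual terminal values with ranges, can be shown without the decision-diagram machinery. A minimal sketch (the values and the `epsilon` parameter are hypothetical): leaf values are grouped into ranges of width at most epsilon and each is replaced by its range's midpoint, so the introduced error is at most epsilon / 2.

```python
# A minimal sketch of the value-range idea only; the ADD machinery is
# omitted and the values below are illustrative.

def compress_values(values, epsilon):
    out = {}
    lo = None
    for v in sorted(set(values)):
        if lo is None or v - lo > epsilon:
            lo = v                        # open a new range at this value
        out[v] = lo + epsilon / 2.0       # midpoint of [lo, lo + epsilon]
    return out

vals = [0.0, 0.1, 0.15, 5.0, 5.05, 9.9]
mapping = compress_values(vals, 0.2)      # six values collapse to three
```

Merging distinct terminals is what shrinks the diagram: paths that previously led to different leaves now share one, so nodes above them can merge too.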
Approximating value trees in structured dynamic programming
1996
Abstract

Cited by 36 (13 self)
We propose and examine a method of approximate dynamic programming for Markov decision processes based on structured problem representations. We assume an MDP is represented using a dynamic Bayesian network, and construct value functions using decision trees as our function representation. The size of the representation is kept within acceptable limits by pruning these value trees so that leaves represent possible ranges of values, thus approximating the value functions produced during optimization. We propose a method for detecting convergence, prove error bounds on the resulting approximately optimal value functions and policies, and describe some preliminary experimental results.
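The pruning step described above can be sketched directly on a toy tree encoding (the encoding and values are illustrative, not the paper's data structures): any subtree whose leaf values span at most a tolerance `delta` is collapsed into a single range leaf.

```python
# A minimal sketch: a value tree is either a float leaf or a
# ('split', var, low_subtree, high_subtree) node. Pruning collapses any
# subtree whose leaf values span at most `delta` into a ('range', mn, mx)
# leaf, which is exactly how approximation error enters.

def prune(tree, delta):
    if isinstance(tree, float):
        return ('range', tree, tree)             # a leaf is its own range
    _, var, lo_t, hi_t = tree
    lo_r, hi_r = prune(lo_t, delta), prune(hi_t, delta)
    if lo_r[0] == 'range' and hi_r[0] == 'range':
        mn = min(lo_r[1], hi_r[1])
        mx = max(lo_r[2], hi_r[2])
        if mx - mn <= delta:
            return ('range', mn, mx)             # collapse close leaves
    return ('split', var, lo_r, hi_r)

# The close values under y merge; the distant value 5.0 keeps its split.
pruned = prune(('split', 'x', ('split', 'y', 1.0, 1.2), 5.0), 0.5)
```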
Structured solution methods for non-Markovian decision processes
 In Proceedings of the Fourteenth National Conference on Artificial Intelligence (AAAI-97), 1997
Abstract

Cited by 25 (1 self)
Markov Decision Processes (MDPs), currently a popular method for modeling and solving decision-theoretic planning problems, are limited by the Markovian assumption: rewards and dynamics depend on the current state only, and not on previous history. Non-Markovian decision processes (NMDPs) can also be defined, but then the more tractable solution techniques developed for MDPs cannot be directly applied. In this paper, we show how an NMDP, in which temporal logic is used to specify history dependence, can be automatically converted into an equivalent MDP by adding appropriate temporal variables. The resulting MDP can be represented in a structured fashion and solved using structured policy construction methods. In many cases, this offers significant computational advantages over previous proposals for solving NMDPs.
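The temporal-variable construction can be illustrated on the simplest case (a hypothetical reward, not one of the paper's examples): the history-dependent reward "pay 1 once p has held at some earlier step" becomes Markovian after the state is augmented with a latching bit.

```python
# A minimal sketch: augmenting the state with a temporal variable
# `seen_p` makes a history-dependent reward depend on the current
# (augmented) state only.

def step(state, p_now):
    base, seen_p = state
    return (base, seen_p or p_now)               # temporal variable latches

def reward(state):
    return 1.0 if state[1] else 0.0              # depends on state only

s = ("s0", False)
s = step(s, p_now=False)
r_before = reward(s)                             # p has never held yet
s = step(s, p_now=True)                          # p holds; bit latches on
s = step(s, p_now=False)
r_after = reward(s)                              # reward persists afterward
```

More expressive temporal-logic conditions require more temporal variables, but the pattern is the same: each variable summarizes exactly the aspect of history the formula needs.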
Structured reachability analysis for Markov decision processes
 In Proceedings of the Fourteenth Conference on Uncertainty in Artificial Intelligence, 1998
Abstract

Cited by 20 (3 self)
Recent research in decision-theoretic planning has focused on making the solution of Markov decision processes (MDPs) more feasible. We develop a family of algorithms for structured reachability analysis of MDPs that are suitable when an initial state (or set of states) is known. Using compact, structured representations of MDPs (e.g., Bayesian networks), our methods, which vary in the tradeoff between complexity and accuracy, produce structured descriptions of (estimated) reachable states that can be used to eliminate variables or variable values from the problem description, reducing the size of the MDP and making it easier to solve. One contribution of our work is the extension of ideas from GRAPHPLAN to deal with the distributed nature of action representations typically embodied within Bayes nets and the problem of correlated action effects. We also demonstrate that our algorithm can be made more complete by using k-ary constraints instead of binary constraints. Another contribution is the illustration of how the compact representation of reachability constraints can be exploited by several existing (exact and approximate) abstraction algorithms for MDPs.
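The basic forward pass behind such an analysis can be sketched at the level of variable values (the domain, actions, and precondition/effect encoding here are illustrative): an effect's values become reachable once all of its precondition values are, iterated to a fixed point.

```python
# A minimal sketch of value-level reachability over a factored action set.

ACTIONS = [
    ({"loc": "home"},   {"loc": "office"}),      # (preconditions, effects)
    ({"loc": "office"}, {"has_coffee": True}),
]

def reachable_values(initial):
    reach = {var: {val} for var, val in initial.items()}
    changed = True
    while changed:                               # iterate to a fixed point
        changed = False
        for pre, eff in ACTIONS:
            if all(v in reach[var] for var, v in pre.items()):
                for var, v in eff.items():
                    if v not in reach[var]:
                        reach[var].add(v)
                        changed = True
    return reach

reach = reachable_values({"loc": "home", "has_coffee": False})
```

Any variable value that never enters a reachable set could be deleted from the problem description, which is the reduction the paper's abstraction algorithms exploit.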
Learning Conventions in Multiagent Stochastic Domains using Likelihood Estimates
 In Proceedings of the Twelfth Conference on Uncertainty in Artificial Intelligence, 1996
Abstract

Cited by 18 (2 self)
Fully cooperative multiagent systems, those in which agents share a joint utility model, are of special interest in AI. A key problem is that of ensuring that the actions of individual agents are coordinated, especially in settings where the agents are autonomous decision makers. We investigate approaches to learning coordinated strategies in stochastic domains where an agent's actions are not directly observable by others. Much recent work in game theory has adopted a Bayesian learning perspective on the more general problem of equilibrium selection, but tends to assume that actions can be observed. We discuss the special problems that arise when actions are not observable, including effects on rates of convergence, and the effect of action failure probabilities and asymmetries. We also use likelihood estimates as a means of generalizing fictitious play learning models in our setting. Finally, we propose the use of maximum likelihood as a means of removing strategies from consideration, with the aim of convergence to a conventional equilibrium, at which point learning and deliberation can cease.
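The observable baseline that the paper generalizes can be sketched directly (the game and the seeded counts are hypothetical; the likelihood-estimate machinery for unobserved actions is not reproduced here): in fictitious play, each agent best-responds to the empirical frequency of the other's past actions.

```python
# A minimal sketch of fictitious play in a 2x2 identical-interest game.

PAYOFF = {("a", "a"): 1.0, ("b", "b"): 1.0,
          ("a", "b"): 0.0, ("b", "a"): 0.0}

def best_response(counts):
    total = sum(counts.values())
    freq = {act: c / total for act, c in counts.items()}
    return max("ab", key=lambda mine: sum(
        freq[other] * PAYOFF[(mine, other)] for other in "ab"))

def play(rounds=50):
    c1 = {"a": 1, "b": 0}        # agent 2's counts of agent 1 (seeded)
    c2 = {"a": 1, "b": 0}        # agent 1's counts of agent 2 (seeded)
    act1 = act2 = None
    for _ in range(rounds):
        act1, act2 = best_response(c2), best_response(c1)
        c1[act1] += 1
        c2[act2] += 1
    return act1, act2

convention = play()              # both agents settle on one equilibrium
```

When actions are unobservable, the counts above are no longer available; the paper's contribution is to drive the same style of update from likelihood estimates instead.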
Towards stochastic constraint programming: A study of online multi-choice knapsack with deadlines
 In Proc. of CP'01, 2001
Abstract

Cited by 18 (0 self)
Constraint Programming (CP) is a very general programming paradigm that has proved its efficiency in solving complex industrial problems. Most real-life problems are stochastic in nature, which is usually taken into account through different compromises, such as applying a deterministic algorithm to the average values of the input, or performing multiple runs of simulation. Our goal in this paper is to analyze different techniques taken either from practical CP applications or from stochastic optimization approaches. We propose a benchmark drawn from our industrial experience, which may be described as an Online Multi-Choice Knapsack with Deadlines. This benchmark is used to test a framework with four different dynamic strategies that utilize different combinations of the stochastic and combinatorial aspects of the problem. To evaluate the expected future state of the reservations at the time horizon, we use either simulation, average values, systematic study of the most probable scenarios, or yield management techniques.
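The online decision structure can be sketched with a deliberately simple policy (the request data and the greedy density rule are illustrative, not the paper's benchmark or any of its four strategies): each arriving request offers several (value, weight) options or may be declined.

```python
# A minimal sketch of an online multi-choice knapsack policy: greedily
# take the feasible option with the best value density.

def decide(choices, remaining):
    feasible = [(v, w) for v, w in choices if w <= remaining]
    if not feasible:
        return None                              # decline the request
    return max(feasible, key=lambda c: c[0] / c[1])

capacity = 10
taken = []
for request in [[(6, 4), (9, 8)], [(5, 5), (2, 1)], [(8, 7)]]:
    choice = decide(request, capacity)
    if choice is not None:
        taken.append(choice)
        capacity -= choice[1]
```

The paper's strategies differ precisely in what replaces this myopic rule: simulation, average-value projections, enumeration of probable scenarios, or yield-management thresholds for the capacity still to come.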
An MCMC Approach to Solving Hybrid Factored MDPs
 In Proceedings of the 19th International Joint Conference on Artificial Intelligence, 2005
Abstract

Cited by 16 (8 self)
Hybrid approximate linear programming (HALP) has recently emerged as a promising framework for solving large factored Markov decision processes (MDPs) with discrete and continuous state and action variables. Our work addresses its major computational bottleneck: constraint satisfaction in large structured domains of discrete and continuous variables. We analyze this problem and propose a novel Markov chain Monte Carlo (MCMC) method for finding the most violated constraint of a relaxed HALP. This method does not require the discretization of continuous variables, searches the space of constraints intelligently based on the structure of factored MDPs, and its space complexity is linear in the number of variables. We test the method on a set of large control problems and demonstrate improvements over alternative approaches.
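The search idea, without any of the HALP formulation, can be sketched as a Metropolis-style walk over variable assignments biased toward larger violation (the `violation` function below is a stand-in objective, not the paper's constraint residual, and the parameters are arbitrary).

```python
# A minimal sketch of MCMC search for a "most violated" point: propose
# local flips, accept with a Metropolis rule, remember the best so far.
import math
import random

random.seed(0)

def violation(x):
    return sum(x)                                 # stand-in: set bits = worse

def mcmc_search(n_vars=10, steps=1000, temp=0.5):
    x = [0] * n_vars
    best = list(x)
    for _ in range(steps):
        y = list(x)
        y[random.randrange(n_vars)] ^= 1          # flip one variable
        delta = violation(y) - violation(x)
        if delta >= 0 or random.random() < math.exp(delta / temp):
            x = y                                 # Metropolis acceptance
        if violation(x) > violation(best):
            best = list(x)
    return best

best = mcmc_search()
```

The appeal in the HALP setting is that such a walk never discretizes continuous variables and keeps only the current and best assignments, so memory stays linear in the number of variables.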
Correlated action effects in decision theoretic regression
 In Proceedings of the Thirteenth Conference on Uncertainty in Artificial Intelligence (UAI-97), 1997
Abstract

Cited by 16 (6 self)
Much recent research in decision-theoretic planning has adopted Markov decision processes (MDPs) as the model of choice, and has attempted to make their solution more tractable by exploiting problem structure. One particular algorithm, structured policy construction, achieves this by means of a decision-theoretic analog of goal regression, using action descriptions based on Bayesian networks with tree-structured conditional probability tables. The algorithm as presented is not able to deal with actions with correlated effects. We describe a new decision-theoretic regression operator that corrects this weakness. While conceptually straightforward, this extension requires a somewhat more complicated technical approach.
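The simplest instance of decision-theoretic regression can be sketched for a single boolean variable (the numbers are illustrative, and the correlated-effect handling that this paper adds is omitted): a value function over the post-action variable x' is turned into a Q-value over the current state by taking expectations under the action's conditional probabilities.

```python
# A minimal sketch of single-variable decision-theoretic regression:
# Q(x) = E[V(x') | x] under the action's conditional probability table.

def regress(v_true, v_false, p_true_given):
    return {x: p_true_given[x] * v_true + (1 - p_true_given[x]) * v_false
            for x in (True, False)}

# An action that usually preserves x = true (0.9) and only rarely
# achieves it from x = false (0.2); V(x') is 10 if x' is true, else 0.
q = regress(v_true=10.0, v_false=0.0, p_true_given={True: 0.9, False: 0.2})
```

With several effect variables this expectation factors over independent CPTs; correlated effects break that factorization, which is the case the paper's new operator handles.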
First-order decision-theoretic planning in structured relational environments
2008
Abstract

Cited by 10 (3 self)
We consider the general framework of first-order decision-theoretic planning in structured relational environments. Most traditional solution approaches to these planning problems ground the relational specification w.r.t. a specific domain instantiation and apply a solution approach directly to the resulting ground Markov decision process (MDP). Unfortunately, the space and time complexity of these solution algorithms scale linearly with the domain size in the best case and exponentially in the worst case. An alternate approach to grounding a relational planning problem is to lift it to a first-order MDP (FOMDP) specification. This FOMDP can then be solved directly, resulting in a domain-independent solution whose space and time complexity either do not scale with domain size or can scale sublinearly in the domain size. However, such generality does not come without its own set of challenges, and the first purpose of this thesis is to explore exact and approximate solution techniques for practically solving FOMDPs. The second purpose of this thesis is to extend the FOMDP specification to succinctly capture factored actions and additive rewards while extending the exact and approximate solution techniques to directly exploit this structure. In addition, we provide a proof of correctness of the first-order symbolic dynamic programming approach w.r.t. its well-studied ground MDP counterpart.