Results 1  10
of
113
Multiagent Planning with Factored MDPs
 In NIPS14
, 2001
"... We present a principled and efficient planning algorithm for cooperative multiagent dynamic systems. A striking feature of our method is that the coordination and communication between the agents is not imposed, but derived directly from the system dynamics and function approximation architecture ..."
Abstract

Cited by 143 (16 self)
 Add to MetaCart
We present a principled and efficient planning algorithm for cooperative multiagent dynamic systems. A striking feature of our method is that the coordination and communication between the agents is not imposed, but derived directly from the system dynamics and function approximation architecture. We view the entire multiagent system as a single, large Markov decision process (MDP), which we assume can be represented in a factored way using a dynamic Bayesian network (DBN). The action space of the resulting MDP is the joint action space of the entire set of agents. Our approach is based on the use of factored linear value functions as an approximation to the joint value function. This factorization of the value function allows the agents to coordinate their actions at runtime using a natural message passing scheme. We provide a simple and efficient method for computing such an approximate value function by solving a single linear program, whose size is determined by the interaction between the value function structure and the DBN. We thereby avoid the exponential blowup in the state and action space. We show that our approach compares favorably with approaches based on reward sharing. We also show that our algorithm is an efficient alternative to more complicated algorithms even in the single agent case.
Efficient Solution Algorithms for Factored MDPs
, 2003
"... This paper addresses the problem of planning under uncertainty in large Markov Decision Processes (MDPs). Factored MDPs represent a complex state space using state variables and the transition model using a dynamic Bayesian network. This representation often allows an exponential reduction in the re ..."
Abstract

Cited by 129 (4 self)
 Add to MetaCart
This paper addresses the problem of planning under uncertainty in large Markov Decision Processes (MDPs). Factored MDPs represent a complex state space using state variables and the transition model using a dynamic Bayesian network. This representation often allows an exponential reduction in the representation size of structured MDPs, but the complexity of exact solution algorithms for such MDPs can grow exponentially in the representation size. In this paper, we present two approximate solution algorithms that exploit structure in factored MDPs. Both use an approximate value function represented as a linear combination of basis functions, where each basis function involves only a small subset of the domain variables. A key contribution of this paper is that it shows how the basic operations of both algorithms can be performed efficiently in closed form, by exploiting both additive and contextspecific structure in a factored MDP. A central element of our algorithms is a novel linear program decomposition technique, analogous to variable elimination in Bayesian networks, which reduces an exponentially large LP to a provably equivalent, polynomialsized one. One algorithm uses approximate linear programming, and the second approximate dynamic programming. Our dynamic programming algorithm is novel in that it uses an approximation based on maxnorm, a technique that more directly minimizes the terms that appear in error bounds for approximate MDP algorithms. We provide experimental results on problems with over 10^40 states, demonstrating a promising indication of the scalability of our approach, and compare our algorithm to an existing stateoftheart approach, showing, in some problems, exponential gains in computation time.
On constraint sampling in the linear programming approach to approximate dynamic programming
 Mathematics of Operations Research
, 2004
"... doi 10.1287/moor.1040.0094 ..."
Protovalue functions: A laplacian framework for learning representation and control in markov decision processes
 Journal of Machine Learning Research
, 2006
"... This paper introduces a novel spectral framework for solving Markov decision processes (MDPs) by jointly learning representations and optimal policies. The major components of the framework described in this paper include: (i) A general scheme for constructing representations or basis functions by d ..."
Abstract

Cited by 67 (9 self)
 Add to MetaCart
This paper introduces a novel spectral framework for solving Markov decision processes (MDPs) by jointly learning representations and optimal policies. The major components of the framework described in this paper include: (i) A general scheme for constructing representations or basis functions by diagonalizing symmetric diffusion operators (ii) A specific instantiation of this approach where global basis functions called protovalue functions (PVFs) are formed using the eigenvectors of the graph Laplacian on an undirected graph formed from state transitions induced by the MDP (iii) A threephased procedure called representation policy iteration comprising of a sample collection phase, a representation learning phase that constructs basis functions from samples, and a final parameter estimation phase that determines an (approximately) optimal policy within the (linear) subspace spanned by the (current) basis functions. (iv) A specific instantiation of the RPI framework using leastsquares policy iteration (LSPI) as the parameter estimation method (v) Several strategies for scaling the proposed approach to large discrete and continuous state spaces, including the Nyström extension for outofsample interpolation of eigenfunctions, and the use of Kronecker sum factorization to construct compact eigenfunctions in product spaces such as factored MDPs (vi) Finally, a series of illustrative discrete and continuous control tasks, which both illustrate the concepts and provide a benchmark for evaluating the proposed approach. Many challenges remain to be addressed in scaling the proposed framework to large MDPs, and several elaboration of the proposed framework are briefly summarized at the end.
Robust Dynamic Programming
 Math. Oper. Res
, 2004
"... In this paper we propose a robust formulation for discrete time dynamic programming (DP). The objective of the robust formulation is to systematically mitigate the sensitivity of the DP optimal policy to ambiguity in the underlying transition probabilities. The ambiguity is modeled by associating ..."
Abstract

Cited by 38 (1 self)
 Add to MetaCart
In this paper we propose a robust formulation for discrete time dynamic programming (DP). The objective of the robust formulation is to systematically mitigate the sensitivity of the DP optimal policy to ambiguity in the underlying transition probabilities. The ambiguity is modeled by associating a set of conditional measures with each stateaction pair. Consequently, in the robust formulation each policy has a set of measures associated with it. We prove that when this set of measures has a certain "Rectangularity" property all the main results for finite and infinite horizon DP extend to natural robust counterparts. We identify families of sets of conditional measures for which the computational complexity of solving the robust DP is only modestly larger than solving the DP, typically logarithmic in the size of the state space. These families of sets are constructed from the confidence regions associated with density estimation, and therefore, can be chosen to guarantee any desired level of confidence in the robust optimal policy. Moreover, the sets can be easily parameterized from historical data. We contrast the performance of robust and nonrobust DP on small numerical examples.
Greedy linear valueapproximation for factored Markov decision processes
 In Proceedings of the 18th National Conference on Artificial Intelligence
, 2002
"... Significant recent work has focused on using linear representations to approximate value functions for factored Markov decision processes (MDPs). Current research has adopted linear programming as an effective means to calculate approximations for a given set of basis functions, tackling very la ..."
Abstract

Cited by 30 (7 self)
 Add to MetaCart
Significant recent work has focused on using linear representations to approximate value functions for factored Markov decision processes (MDPs). Current research has adopted linear programming as an effective means to calculate approximations for a given set of basis functions, tackling very large MDPs as a result. However, a number of issues remain unresolved: How accurate are the approximations produced by linear programs? How hard is it to produce better approximations ? and Where do the basis functions come from? To address these questions, we first investigate the complexity of minimizing the Bellman error of a linear value function approximationshowing that this is an inherently hard problem.
A pricedirected approach to stochastic inventory/routing
 Operations Research
, 2002
"... informs ® doi 10.1287/opre.1040.0114 © 2004 INFORMS We consider a new approach to stochastic inventory/routing that approximates the future costs of current actions using optimal dual prices of a linear program. We obtain two such linear programs by formulating the control problem as a Markov decisi ..."
Abstract

Cited by 30 (2 self)
 Add to MetaCart
informs ® doi 10.1287/opre.1040.0114 © 2004 INFORMS We consider a new approach to stochastic inventory/routing that approximates the future costs of current actions using optimal dual prices of a linear program. We obtain two such linear programs by formulating the control problem as a Markov decision process and then replacing the optimal value function with the sum of singlecustomer inventory value functions. The resulting approximation yields statewise lower bounds on optimal infinitehorizon discounted costs. We present a linear program that takes into account inventory dynamics and economics in allocating transportation costs for stochastic inventory routing. On test instances we find that these allocations do not introduce any error in the value function approximations relative to the best approximations that can be achieved without them. Also, unlike other approaches, we do not restrict the set of allowable vehicle itineraries in any way. Instead, we develop an efficient algorithm to both generate and eliminate itineraries during solution of the linear programs and control policy. In simulation experiments, the pricedirected policy outperforms other policies from the literature. Subject classifications: dynamic programming/optimal control, discounted infinitehorizon: separable functional
Solving Factored MDPs with Continuous and Discrete Variables
 IN PROCEEDINGS OF THE 20TH CONFERENCE ON UNCERTAINTY IN ARTIFICIAL INTELLIGENCE
, 2004
"... Although many realworld stochastic planning problems are more naturally formulated by hybrid models with both discrete and continuous variables, current stateoftheart methods cannot adequately address these problems. We present the first framework that can exploit problem structure for modeling ..."
Abstract

Cited by 27 (8 self)
 Add to MetaCart
Although many realworld stochastic planning problems are more naturally formulated by hybrid models with both discrete and continuous variables, current stateoftheart methods cannot adequately address these problems. We present the first framework that can exploit problem structure for modeling and solving hybrid problems efficiently. We formulate these problems as hybrid Markov decision processes (MDPs with continuous and discrete state and action variables), which we assume can be represented in a factored way using a hybrid dynamic Bayesian network (hybrid DBN). This formulation also allows us to apply our methods to collaborative multiagent settings. We present a new linear program approximation method that exploits the structure of the hybrid MDP and lets us compute approximate value functions more efficiently. In particular, we describe a new factored discretization of continuous variables that avoids the exponential blowup of traditional approaches. We provide theoretical bounds on the quality of such an approximation and on its scaleup potential. We support our theoretical arguments with experiments on a set of control problems with up to 28dimensional continuous state space and 22dimensional action space.