Results 1  10
of
225
Multiagent Planning with Factored MDPs
 In NIPS14
, 2001
"... We present a principled and efficient planning algorithm for cooperative multiagent dynamic systems. A striking feature of our method is that the coordination and communication between the agents is not imposed, but derived directly from the system dynamics and function approximation architecture ..."
Abstract

Cited by 176 (15 self)
 Add to MetaCart
(Show Context)
We present a principled and efficient planning algorithm for cooperative multiagent dynamic systems. A striking feature of our method is that the coordination and communication between the agents is not imposed, but derived directly from the system dynamics and function approximation architecture. We view the entire multiagent system as a single, large Markov decision process (MDP), which we assume can be represented in a factored way using a dynamic Bayesian network (DBN). The action space of the resulting MDP is the joint action space of the entire set of agents. Our approach is based on the use of factored linear value functions as an approximation to the joint value function. This factorization of the value function allows the agents to coordinate their actions at runtime using a natural message passing scheme. We provide a simple and efficient method for computing such an approximate value function by solving a single linear program, whose size is determined by the interaction between the value function structure and the DBN. We thereby avoid the exponential blowup in the state and action space. We show that our approach compares favorably with approaches based on reward sharing. We also show that our algorithm is an efficient alternative to more complicated algorithms even in the single agent case.
Efficient Solution Algorithms for Factored MDPs
, 2003
"... This paper addresses the problem of planning under uncertainty in large Markov Decision Processes (MDPs). Factored MDPs represent a complex state space using state variables and the transition model using a dynamic Bayesian network. This representation often allows an exponential reduction in the re ..."
Abstract

Cited by 172 (3 self)
 Add to MetaCart
This paper addresses the problem of planning under uncertainty in large Markov Decision Processes (MDPs). Factored MDPs represent a complex state space using state variables and the transition model using a dynamic Bayesian network. This representation often allows an exponential reduction in the representation size of structured MDPs, but the complexity of exact solution algorithms for such MDPs can grow exponentially in the representation size. In this paper, we present two approximate solution algorithms that exploit structure in factored MDPs. Both use an approximate value function represented as a linear combination of basis functions, where each basis function involves only a small subset of the domain variables. A key contribution of this paper is that it shows how the basic operations of both algorithms can be performed efficiently in closed form, by exploiting both additive and contextspecific structure in a factored MDP. A central element of our algorithms is a novel linear program decomposition technique, analogous to variable elimination in Bayesian networks, which reduces an exponentially large LP to a provably equivalent, polynomialsized one. One algorithm uses approximate linear programming, and the second approximate dynamic programming. Our dynamic programming algorithm is novel in that it uses an approximation based on maxnorm, a technique that more directly minimizes the terms that appear in error bounds for approximate MDP algorithms. We provide experimental results on problems with over 10^40 states, demonstrating a promising indication of the scalability of our approach, and compare our algorithm to an existing stateoftheart approach, showing, in some problems, exponential gains in computation time.
On constraint sampling in the linear programming approach to approximate dynamic programming
 Mathematics of Operations Research
, 2004
"... doi 10.1287/moor.1040.0094 ..."
Protovalue functions: A laplacian framework for learning representation and control in markov decision processes
 Journal of Machine Learning Research
, 2006
"... This paper introduces a novel spectral framework for solving Markov decision processes (MDPs) by jointly learning representations and optimal policies. The major components of the framework described in this paper include: (i) A general scheme for constructing representations or basis functions by d ..."
Abstract

Cited by 92 (10 self)
 Add to MetaCart
This paper introduces a novel spectral framework for solving Markov decision processes (MDPs) by jointly learning representations and optimal policies. The major components of the framework described in this paper include: (i) A general scheme for constructing representations or basis functions by diagonalizing symmetric diffusion operators (ii) A specific instantiation of this approach where global basis functions called protovalue functions (PVFs) are formed using the eigenvectors of the graph Laplacian on an undirected graph formed from state transitions induced by the MDP (iii) A threephased procedure called representation policy iteration comprising of a sample collection phase, a representation learning phase that constructs basis functions from samples, and a final parameter estimation phase that determines an (approximately) optimal policy within the (linear) subspace spanned by the (current) basis functions. (iv) A specific instantiation of the RPI framework using leastsquares policy iteration (LSPI) as the parameter estimation method (v) Several strategies for scaling the proposed approach to large discrete and continuous state spaces, including the Nyström extension for outofsample interpolation of eigenfunctions, and the use of Kronecker sum factorization to construct compact eigenfunctions in product spaces such as factored MDPs (vi) Finally, a series of illustrative discrete and continuous control tasks, which both illustrate the concepts and provide a benchmark for evaluating the proposed approach. Many challenges remain to be addressed in scaling the proposed framework to large MDPs, and several elaboration of the proposed framework are briefly summarized at the end.
Robust Dynamic Programming
 Math. Oper. Res
, 2004
"... In this paper we propose a robust formulation for discrete time dynamic programming (DP). The objective of the robust formulation is to systematically mitigate the sensitivity of the DP optimal policy to ambiguity in the underlying transition probabilities. The ambiguity is modeled by associating ..."
Abstract

Cited by 70 (1 self)
 Add to MetaCart
In this paper we propose a robust formulation for discrete time dynamic programming (DP). The objective of the robust formulation is to systematically mitigate the sensitivity of the DP optimal policy to ambiguity in the underlying transition probabilities. The ambiguity is modeled by associating a set of conditional measures with each stateaction pair. Consequently, in the robust formulation each policy has a set of measures associated with it. We prove that when this set of measures has a certain "Rectangularity" property all the main results for finite and infinite horizon DP extend to natural robust counterparts. We identify families of sets of conditional measures for which the computational complexity of solving the robust DP is only modestly larger than solving the DP, typically logarithmic in the size of the state space. These families of sets are constructed from the confidence regions associated with density estimation, and therefore, can be chosen to guarantee any desired level of confidence in the robust optimal policy. Moreover, the sets can be easily parameterized from historical data. We contrast the performance of robust and nonrobust DP on small numerical examples.
Relative Entropy Policy Search
"... Policy search is a successful approach to reinforcement learning. However, policy improvements often result in the loss of information. Hence, it has been marred by premature convergence and implausible solutions. As first suggested in the context of covariant policy gradients (Bagnell and Schneider ..."
Abstract

Cited by 48 (19 self)
 Add to MetaCart
Policy search is a successful approach to reinforcement learning. However, policy improvements often result in the loss of information. Hence, it has been marred by premature convergence and implausible solutions. As first suggested in the context of covariant policy gradients (Bagnell and Schneider 2003), many of these problems may be addressed by constraining the information loss. In this paper, we continue this path of reasoning and suggest the Relative Entropy Policy Search (REPS) method. The resulting method differs significantly from previous policy gradient approaches and yields an exact update step. It works well on typical reinforcement learning benchmark problems.
A pricedirected approach to stochastic inventory/routing
 Operations Research
, 2002
"... informs ® doi 10.1287/opre.1040.0114 © 2004 INFORMS We consider a new approach to stochastic inventory/routing that approximates the future costs of current actions using optimal dual prices of a linear program. We obtain two such linear programs by formulating the control problem as a Markov decisi ..."
Abstract

Cited by 42 (2 self)
 Add to MetaCart
informs ® doi 10.1287/opre.1040.0114 © 2004 INFORMS We consider a new approach to stochastic inventory/routing that approximates the future costs of current actions using optimal dual prices of a linear program. We obtain two such linear programs by formulating the control problem as a Markov decision process and then replacing the optimal value function with the sum of singlecustomer inventory value functions. The resulting approximation yields statewise lower bounds on optimal infinitehorizon discounted costs. We present a linear program that takes into account inventory dynamics and economics in allocating transportation costs for stochastic inventory routing. On test instances we find that these allocations do not introduce any error in the value function approximations relative to the best approximations that can be achieved without them. Also, unlike other approaches, we do not restrict the set of allowable vehicle itineraries in any way. Instead, we develop an efficient algorithm to both generate and eliminate itineraries during solution of the linear programs and control policy. In simulation experiments, the pricedirected policy outperforms other policies from the literature. Subject classifications: dynamic programming/optimal control, discounted infinitehorizon: separable functional
B (2004) Linear program approximations for factored continuousstate Markov decision processes
"... Approximate linear programming (ALP) has emerged recently as one of the most promising methods for solving complex factored MDPs with nite state spaces. In this work we show that ALP solutions are not limited only to MDPs with nite state spaces, but that they can also be applied successfully to fact ..."
Abstract

Cited by 41 (12 self)
 Add to MetaCart
(Show Context)
Approximate linear programming (ALP) has emerged recently as one of the most promising methods for solving complex factored MDPs with nite state spaces. In this work we show that ALP solutions are not limited only to MDPs with nite state spaces, but that they can also be applied successfully to factored continuousstate MDPs (CMDPs). We show how one can build an ALPbased approximation for such a model and contrast it to existing solution methods. We argue that this approach offers a robust alternative for solving high dimensional continuousstate space problems. The point is supported by experiments on three CMDP problems with 2425 continuous state factors. 1