Results 1–10 of 54
Decision-Theoretic Planning: Structural Assumptions and Computational Leverage
 Journal of Artificial Intelligence Research, 1999
Cited by 417 (4 self)
Abstract
Planning under uncertainty is a central problem in the study of automated sequential decision making, and has been addressed by researchers in many different fields, including AI planning, decision analysis, operations research, control theory and economics. While the assumptions and perspectives adopted in these areas often differ in substantial ways, many planning problems of interest to researchers in these fields can be modeled as Markov decision processes (MDPs) and analyzed using the techniques of decision theory. This paper presents an overview and synthesis of MDP-related methods, showing how they provide a unifying framework for modeling many classes of planning problems studied in AI. It also describes structural properties of MDPs that, when exhibited by particular classes of problems, can be exploited in the construction of optimal or approximately optimal policies or plans. Planning problems commonly possess structure in the reward and value functions used to de...
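To make the MDP framework surveyed above concrete, here is a minimal value iteration sketch on a hypothetical two-state, two-action MDP. The transition table, rewards, and discount factor are invented for illustration; this is not an example from the paper.

```python
# Hypothetical two-state, two-action MDP: P[s][a] lists (next_state, prob),
# R[s][a] is the immediate reward. All numbers are invented for illustration.
states, actions, gamma = [0, 1], [0, 1], 0.9
P = {0: {0: [(0, 0.9), (1, 0.1)], 1: [(1, 1.0)]},
     1: {0: [(0, 1.0)],           1: [(0, 0.5), (1, 0.5)]}}
R = {0: {0: 0.0, 1: 1.0}, 1: {0: 0.0, 1: 2.0}}

def backup(V, s, a):
    """One-step Bellman lookahead for taking action a in state s."""
    return R[s][a] + gamma * sum(p * V[t] for t, p in P[s][a])

V = {s: 0.0 for s in states}
for _ in range(200):                      # value iteration to (near) convergence
    V = {s: max(backup(V, s, a) for a in actions) for s in states}
policy = {s: max(actions, key=lambda a: backup(V, s, a)) for s in states}
```

The Bellman backup contracts at rate gamma, so 200 sweeps are far more than needed on a toy problem of this size; the explicit state-by-state enumeration here is exactly what the structured methods in the entries below try to avoid.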
Stochastic Dynamic Programming with Factored Representations, 1997
Cited by 145 (10 self)
Abstract
Markov decision processes (MDPs) have proven to be popular models for decision-theoretic planning, but standard dynamic programming algorithms for solving MDPs rely on explicit, state-based specifications and computations. To alleviate the combinatorial problems associated with such methods, we propose new representational and computational techniques for MDPs that exploit certain types of problem structure. We use dynamic Bayesian networks (with decision trees representing the local families of conditional probability distributions) to represent stochastic actions in an MDP, together with a decision-tree representation of rewards. Based on this representation, we develop versions of standard dynamic programming algorithms that directly manipulate decision-tree representations of policies and value functions. This generally obviates the need for state-by-state computation, aggregating states at the leaves of these trees and requiring computations only for each aggregate state. The key to these algorithms is a decision-theoretic generalization of classical goal regression, in which we determine the features relevant to predicting expected value. We demonstrate the method empirically on several planning problems,
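The state-aggregation idea described above can be illustrated with a toy decision-tree value function. The tree encoding, feature names, and leaf values below are hypothetical, not the paper's actual representation:

```python
def tree_value(tree, state):
    """Walk a decision tree given as nested (feature, false_subtree, true_subtree)
    tuples; leaves are floats holding the value of an aggregate state."""
    while isinstance(tree, tuple):
        feature, false_branch, true_branch = tree
        tree = true_branch if state[feature] else false_branch
    return tree

# A value function over three boolean features that ignores 'battery' entirely:
# every state with has_key=False aggregates to a single leaf.
V = ('has_key', 0.0, ('door_open', 5.0, 10.0))
```

Eight underlying states collapse to three leaves, so a dynamic programming backup needs one computation per leaf rather than one per state; that is the saving the tree representation buys.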
Efficient Solution Algorithms for Factored MDPs, 2003
Cited by 129 (4 self)
Abstract
This paper addresses the problem of planning under uncertainty in large Markov Decision Processes (MDPs). Factored MDPs represent a complex state space using state variables and the transition model using a dynamic Bayesian network. This representation often allows an exponential reduction in the representation size of structured MDPs, but the complexity of exact solution algorithms for such MDPs can grow exponentially in the representation size. In this paper, we present two approximate solution algorithms that exploit structure in factored MDPs. Both use an approximate value function represented as a linear combination of basis functions, where each basis function involves only a small subset of the domain variables. A key contribution of this paper is that it shows how the basic operations of both algorithms can be performed efficiently in closed form, by exploiting both additive and context-specific structure in a factored MDP. A central element of our algorithms is a novel linear program decomposition technique, analogous to variable elimination in Bayesian networks, which reduces an exponentially large LP to a provably equivalent, polynomial-sized one. One algorithm uses approximate linear programming, and the second approximate dynamic programming. Our dynamic programming algorithm is novel in that it uses an approximation based on max-norm, a technique that more directly minimizes the terms that appear in error bounds for approximate MDP algorithms. We provide experimental results on problems with over 10^40 states, demonstrating a promising indication of the scalability of our approach, and compare our algorithm to an existing state-of-the-art approach, showing, in some problems, exponential gains in computation time.
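The approximate linear programming formulation can be sketched as follows. On a tiny two-state MDP (invented for illustration) with basis functions h0(s) = 1 and h1(s) = s, the linear value function is fully expressive, so the LP recovers the exact optimal values; in a genuinely factored MDP this LP would have exponentially many constraints until reduced by a decomposition like the paper's. The sketch assumes SciPy is available:

```python
from scipy.optimize import linprog

states, gamma = [0, 1], 0.9
# P[(s, a)] maps next state -> probability; R[(s, a)] is the immediate reward
P = {(0, 0): {0: 0.9, 1: 0.1}, (0, 1): {1: 1.0},
     (1, 0): {0: 1.0},         (1, 1): {0: 0.5, 1: 0.5}}
R = {(0, 0): 0.0, (0, 1): 1.0, (1, 0): 0.0, (1, 1): 2.0}
h = [lambda s: 1.0, lambda s: float(s)]            # basis functions

# Objective: minimize sum_s V_w(s) (uniform state-relevance weights)
c = [sum(hi(s) for s in states) for hi in h]
A_ub, b_ub = [], []
for (s, a), nxt in P.items():
    # Constraint V_w(s) >= R(s,a) + gamma * E[V_w(s')], as A_ub @ w <= b_ub
    row = [-(hi(s) - gamma * sum(p * hi(t) for t, p in nxt.items())) for hi in h]
    A_ub.append(row)
    b_ub.append(-R[(s, a)])

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(None, None)] * len(h))
V = {s: sum(wi * hi(s) for wi, hi in zip(res.x, h)) for s in states}
```

The `bounds=(None, None)` entries matter: basis weights may be negative, whereas `linprog` defaults to non-negative variables.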
Planning under continuous time and resource uncertainty: A challenge for AI
 In Proceedings of the Eighteenth Conference on Uncertainty in Artificial Intelligence, 2002
Cited by 102 (16 self)
Abstract
... experiment is assigned a scientific value). Different observations and experiments take differing amounts of time and consume differing amounts of power and data storage. There are, in general, a number of constraints that govern the rover's activities: there are time, power, data storage, and positioning constraints for performing different activities. Time constraints often result from illumination requirements; that is, experiments may require that a target rock or sample be illuminated with a certain intensity, or from a certain angle.
Computing factored value functions for policies in structured MDPs
 In Proceedings of the Sixteenth International Joint Conference on Artificial Intelligence, 1999
Cited by 95 (11 self)
Abstract
Many large Markov decision processes (MDPs) can be represented compactly using a structured representation such as a dynamic Bayesian network. Unfortunately, the compact representation does not help standard MDP algorithms, because the value function for the MDP does not retain the structure of the process description. We argue that in many such MDPs, structure is approximately retained. That is, the value functions are nearly additive: closely approximated by a linear function over factors associated with small subsets of problem features. Based on this idea, we present a convergent, approximate value determination algorithm for structured MDPs. The algorithm maintains an additive value function, alternating dynamic programming steps with steps that project the result back into the restricted space of additive functions. We show that both the dynamic programming and the projection steps can be computed efficiently, despite the fact that the number of states is exponential in the numbe...
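A minimal sketch of the alternate-and-project scheme, assuming a toy two-variable domain with invented dynamics and rewards. The real algorithm never enumerates states; enumeration here just keeps the projection step transparent:

```python
import itertools
import numpy as np

states = list(itertools.product([0, 1], repeat=2))     # four states (x1, x2)
idx = {s: i for i, s in enumerate(states)}
gamma = 0.5

def reward(s):                    # additive reward: one term per variable
    return 1.0 * s[0] + 0.5 * s[1]

def step(s, a):                   # deterministic toy dynamics: action a flips x_a
    s = list(s)
    s[a] ^= 1
    return tuple(s)

A = np.array([[1.0, s[0], s[1]] for s in states])      # additive basis [1, x1, x2]
w = np.zeros(3)
for _ in range(100):
    V = A @ w
    # Dynamic programming step: exact Bellman backup on the enumerated toy space
    target = np.array([max(reward(s) + gamma * V[idx[step(s, a)]] for a in (0, 1))
                       for s in states])
    # Projection step: least-squares fit back into the space of additive functions
    w, *_ = np.linalg.lstsq(A, target, rcond=None)
V = A @ w
```

The backup of an additive function is not itself additive (the max over actions introduces interactions), which is exactly why the projection step is needed; the least-squares fit discards the non-additive residual each iteration.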
Solving very large weakly coupled Markov decision processes
 In Proceedings of the Fifteenth National Conference on Artificial Intelligence, 1998
Cited by 81 (11 self)
Abstract
We present a technique for computing approximately optimal solutions to stochastic resource allocation problems modeled as Markov decision processes (MDPs). We exploit two key properties to avoid explicitly enumerating the very large state and action spaces associated with these problems. First, the problems are composed of multiple tasks whose utilities are independent. Second, the actions taken with respect to (or resources allocated to) a task do not influence the status of any other task. We can therefore view each task as an MDP. However, these MDPs are weakly coupled by resource constraints: actions selected for one MDP restrict the actions available to others. We describe heuristic techniques for dealing with several classes of constraints that use the solutions for individual MDPs to construct an approximate global solution. We demonstrate this technique on problems involving thousands of tasks, approximating the solution to problems that are far beyond the reach of standard methods.
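One simple heuristic in the spirit described above: solve each task MDP on its own to get its value as a function of the resources it receives, then allocate the shared budget greedily by marginal value. The task value tables below are invented; greedy allocation happens to be optimal here only because the returns are diminishing:

```python
# value_with[k][r]: value of task k's MDP when granted r units of resource
# (in the real method these come from solving each task MDP independently;
# the numbers here are invented and exhibit diminishing returns)
value_with = [
    [0.0, 4.0, 6.0, 7.0],
    [0.0, 3.0, 5.5, 7.5],
    [0.0, 5.0, 5.5, 6.0],
]
budget = 4

alloc = [0] * len(value_with)
for _ in range(budget):
    # Marginal value of granting one more unit to each task
    gains = [vw[a + 1] - vw[a] if a + 1 < len(vw) else float('-inf')
             for vw, a in zip(value_with, alloc)]
    best = max(range(len(gains)), key=gains.__getitem__)
    alloc[best] += 1

total = sum(vw[a] for vw, a in zip(value_with, alloc))
```

The coupling lives entirely in the shared budget: each task's value table is computed independently, and only the allocation loop ever considers the tasks together.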
Distributed Value Functions
 In Proceedings of the Sixteenth International Conference on Machine Learning, 1999
Cited by 51 (1 self)
Abstract
Many interesting problems, such as power grids, network switches, and traffic flow, that are candidates for solving with reinforcement learning (RL), also have properties that make distributed solutions desirable. We propose an algorithm for distributed reinforcement learning based on distributing the representation of the value function across nodes. Each node in the system only has the ability to sense state locally, choose actions locally, and receive reward locally (the goal of the system is to maximize the sum of the rewards over all nodes and over all time). However each node is allowed to give its neighbors the current estimate of its value function for the states it passes through. We present a value function learning rule, using that information, that allows each node to learn a value function that is an estimate of a weighted sum of future rewards for all the nodes in the network. With this representation, each node can choose actions to improve the performance of the overall...
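The learning rule described above can be sketched for a three-node chain. The topology, mixing weights, rewards, and the collapse of each node's state to a single value are all illustrative assumptions:

```python
alpha, gamma = 0.5, 0.9
# Three nodes in a chain; W[i] maps neighbor j -> mixing weight (rows sum to 1)
W = {0: {0: 0.5, 1: 0.5},
     1: {0: 0.25, 1: 0.5, 2: 0.25},
     2: {1: 0.5, 2: 0.5}}
rewards = {0: 1.0, 1: 0.0, 2: 0.0}     # only node 0 is rewarded directly
V = {i: 0.0 for i in W}

for _ in range(500):
    # Each node mixes its local reward with its neighbors' value estimates
    V = {i: (1 - alpha) * V[i]
            + alpha * (rewards[i] + gamma * sum(f * V[j] for j, f in W[i].items()))
         for i in W}
```

At the fixed point each node's value solves V = r + gamma * W V, an estimate of a weighted sum of future rewards across the whole network: node 2 ends up with a positive value even though it never receives reward directly, purely through its neighbor's estimates.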
A hybrid reinforcement learning approach to autonomic resource allocation
 In Proc. of ICAC'06, 2006
Cited by 35 (5 self)
Abstract
Reinforcement Learning (RL) provides a promising new approach to systems performance management that differs radically from standard queuing-theoretic approaches making use of explicit system performance models. In principle, RL can automatically learn high-quality management policies without an explicit performance model or traffic model, and with little or no built-in system-specific knowledge. In our original work [1], [2], [3] we showed the feasibility of using online RL to learn resource valuation estimates (in lookup table form) which can be used to make high-quality server allocation decisions in a multi-application prototype Data Center scenario. The present work shows how to combine the strengths of both RL and queuing models in a hybrid approach, in which RL trains offline on data collected while a queuing model policy controls the system. By training offline we avoid suffering potentially poor performance in live online training. We also now use RL to train nonlinear function approximators (e.g. multilayer perceptrons) instead of lookup tables; this enables scaling to substantially larger state spaces. Our results now show that, in both open-loop and closed-loop traffic, hybrid RL training can achieve significant performance improvements over a variety of initial model-based policies. We also find that, as expected, RL can deal effectively with both transients and switching delays, which lie outside the scope of traditional steady-state queuing theory.
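The offline hybrid training loop can be sketched as follows: the queuing-model policy controls the live system and logs an episode, and RL then fits a value approximator to the logged discounted returns entirely offline. The episode data and the linear least-squares fit (a stand-in for the multilayer perceptrons mentioned above) are illustrative assumptions:

```python
import numpy as np

gamma = 0.9
# One logged episode under the model-based policy: (n_servers, load, reward)
episode = [(1, 8.0, -6.0), (2, 8.0, -4.0), (3, 8.0, -2.0), (4, 8.0, 0.0)]

# Monte-Carlo targets: discounted return from each step to the end of the episode
returns, g = [], 0.0
for _, _, r in reversed(episode):
    g = r + gamma * g
    returns.append(g)
returns.reverse()

# Offline least-squares fit of a linear value approximator to the logged batch
X = np.array([[1.0, n, load] for n, load, _ in episode])
w, *_ = np.linalg.lstsq(X, np.array(returns), rcond=None)

def value(n, load):
    return float(np.array([1.0, n, load]) @ w)
```

No exploratory action is ever taken on the live system: all training data comes from the model-based controller, which is the safety property the hybrid approach is after.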
Distributed Planning in Hierarchical Factored MDPs
 In Proceedings of the Eighteenth Conference on Uncertainty in Artificial Intelligence, 2002
Piecewise Linear Value Function Approximation for Factored MDPs
 In Proceedings of the Eighteenth National Conference on AI, 2002
Cited by 24 (6 self)
Abstract
A number of proposals have been put forth in recent years for the solution of Markov decision processes (MDPs) whose state (and sometimes action) spaces are factored.