Results 1–10 of 74
From One to Many: Planning for Loosely Coupled Multi-Agent Systems
Abstract

Cited by 31 (5 self)
Loosely coupled multi-agent systems are perceived as easier to plan for because they require less coordination between agent subplans. In this paper we set out to formalize this intuition. We establish an upper bound on the complexity of multi-agent planning problems that depends exponentially on two parameters quantifying the level of agents' coupling, and on these parameters only. The first parameter is problem-independent: it measures the inherent level of coupling within the system. The second is problem-specific: it captures the min-max number of action commitments per agent required to solve the problem. Most importantly, the direct dependence on the number of agents, on the overall size of the problem, and on the length of the agents' plans is only polynomial. This result is obtained using a new algorithmic methodology which we call "planning as CSP+planning". We believe this to be one of the first formal results to both quantify the notion of agents' coupling and to demonstrate a multi-agent planning algorithm that, for fixed coupling levels, scales polynomially with the size of the problem.
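The "CSP+planning" decomposition can be illustrated loosely: treat each agent's public action commitment as a CSP variable, search over consistent joint commitments, and leave the rest of each subplan to a single-agent planner. The sketch below shows only the CSP half, with a hypothetical two-rover instance; the variable domains, constraint encoding, and agent names are all illustrative, not the paper's formulation.

```python
# Minimal backtracking search over agents' public action commitments.
# The per-agent planning step that would flesh out each subplan around
# the chosen commitments is omitted; this shows only the CSP layer.

def solve_commitments(domains, constraints):
    """Find one consistent assignment of a commitment to every agent.

    domains: dict agent -> list of candidate commitments
    constraints: dict (agent_a, agent_b) -> set of allowed (value_a, value_b) pairs
    """
    agents = list(domains)
    assignment = {}

    def consistent(agent, value):
        for (a, b), allowed in constraints.items():
            if a == agent and b in assignment and (value, assignment[b]) not in allowed:
                return False
            if b == agent and a in assignment and (assignment[a], value) not in allowed:
                return False
        return True

    def backtrack(i):
        if i == len(agents):
            return dict(assignment)
        agent = agents[i]
        for value in domains[agent]:
            if consistent(agent, value):
                assignment[agent] = value
                result = backtrack(i + 1)
                if result is not None:
                    return result
                del assignment[agent]
        return None

    return backtrack(0)

# Hypothetical instance: two rovers share one dock, so only one may
# commit to "load" at a time.
domains = {"r1": ["load", "wait"], "r2": ["load", "wait"]}
constraints = {("r1", "r2"): {("load", "wait"), ("wait", "load")}}
plan = solve_commitments(domains, constraints)
```

The point of the decomposition is that the exponential search lives only in the commitment space, whose size is governed by the coupling parameters rather than by the full joint state space.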
Prottle: A probabilistic temporal planner
 In AAAI’05
, 2005
Abstract

Cited by 25 (5 self)
Planning with concurrent durative actions and probabilistic effects, or probabilistic temporal planning, is a relatively new area of research. The challenge is to replicate the success of modern temporal and probabilistic planners with domains that exhibit an interaction between time and uncertainty. We present a general framework for probabilistic temporal planning in which effects, the time at which they occur, and action durations are all probabilistic. This framework includes a search space that is designed for solving probabilistic temporal planning problems via heuristic search, an algorithm that has been tailored to work with it, and an effective heuristic based on an extension of the planning graph data structure. Prottle is a planner that implements this framework, and can solve problems expressed in an extension of PDDL.
Concurrent probabilistic temporal planning
 In Proc. ICAPS
, 2005
Abstract

Cited by 22 (2 self)
Probabilistic planning problems are often modeled as Markov decision processes (MDPs), which assume that a single action is executed per decision epoch and that actions take unit time. However, in the real world it is common to execute several actions in parallel, and the durations of these actions may differ. This paper presents efficient methods for solving probabilistic planning problems with concurrent, durative actions. We adapt the formulation of concurrent MDPs, MDPs which allow multiple instantaneous actions to be executed simultaneously. We add explicit action durations into the concurrent MDP model by encoding the problem as a concurrent MDP in an augmented state space. We present two novel admissible heuristics and one inadmissible heuristic to speed up the basic concurrent MDP algorithm. We also develop a novel notion of hybridizing an optimal and an approximate algorithm to yield a hybrid algorithm, which quickly generates high-quality policies. Experiments show that all our heuristics speed up policy construction significantly. Furthermore, our approximate hybrid algorithm runs up to two orders of magnitude faster than other methods, while producing policies whose makespans are typically within 5% of optimal.
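The augmented-state encoding can be sketched in a few lines: pair each base MDP state with the set of actions still executing and their remaining durations, and let a time step decrement those counters. The state fields, integer durations, and `advance` helper below are illustrative assumptions, not the paper's exact construction.

```python
from dataclasses import dataclass

# Sketch of the augmented state space: a base MDP state plus the set of
# in-flight actions with their remaining (integer) durations. Advancing
# one time unit decrements counters and reports completed actions.

@dataclass(frozen=True)
class AugmentedState:
    base: str              # state of the underlying MDP
    executing: frozenset   # pairs (action, remaining_duration)

def advance(state):
    """Advance one time unit; return (next_state, completed_actions)."""
    still_running = set()
    completed = []
    for action, remaining in state.executing:
        if remaining <= 1:
            completed.append(action)
        else:
            still_running.add((action, remaining - 1))
    return AugmentedState(state.base, frozenset(still_running)), completed
```

Because `AugmentedState` is frozen and hashable, it can serve directly as a key in value tables, which is what makes a concurrent MDP over this space solvable with standard MDP machinery, at the cost of a much larger state space.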
Solving Concurrent Markov Decision Processes
, 2004
Abstract

Cited by 20 (2 self)
Typically, Markov decision problems (MDPs) assume a single action is executed per decision epoch, but in the real world one may frequently execute certain actions in parallel. This paper explores concurrent MDPs, MDPs which allow multiple non-conflicting actions to be executed simultaneously, and presents two new algorithms. Our first approach exploits two provably sound pruning rules, and thus guarantees solution optimality. Our second technique is a fast, sampling-based algorithm, which produces close-to-optimal solutions extremely quickly. Experiments show that our approaches outperform the existing algorithms, producing up to two orders of magnitude speedup.
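The combinatorial core of a concurrent MDP is that the effective action set is the set of non-conflicting action subsets, which can grow exponentially. The toy enumeration below uses shared-resource disjointness as the conflict test; the paper's mutex conditions (precondition/effect interactions) are richer, but the blow-up it illustrates is the same.

```python
from itertools import combinations

# Illustrative only: actions "conflict" here when they claim the same
# resource. Enumerating all pairwise-disjoint subsets shows why the
# joint action space of a concurrent MDP explodes combinatorially.

def non_conflicting_sets(actions, resources):
    """Enumerate all non-empty action subsets with pairwise-disjoint resources."""
    result = []
    names = list(actions)
    for r in range(1, len(names) + 1):
        for combo in combinations(names, r):
            used = []
            for a in combo:
                used.extend(resources[a])
            if len(used) == len(set(used)):   # no resource claimed twice
                result.append(set(combo))
    return result

# Hypothetical rover actions and the resources they occupy.
resources = {"a": ["arm"], "b": ["arm", "camera"], "c": ["wheels"]}
sets = non_conflicting_sets(["a", "b", "c"], resources)
```

Pruning rules and sampling, as in the paper, are ways to avoid evaluating every one of these subsets during Bellman backups.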
Optimal resource allocation and policy formulation in loosely-coupled Markov decision processes
 In Proceedings of the Fourteenth International Conference on Automated Planning and Scheduling (ICAPS-04)
, 2004
Abstract

Cited by 19 (12 self)
The problem of optimal policy formulation for teams of resource-limited agents in stochastic environments is composed of two strongly-coupled subproblems: a resource allocation problem and a policy optimization problem. We show how to combine the two problems into a single constrained optimization problem that yields optimal resource allocations and policies that are optimal under these allocations. We model the system as a multiagent Markov decision process (MDP), with the social welfare of the group as the optimization criterion. The straightforward approach of modeling both the resource allocation and the actual operation of the agents as a multiagent MDP on the joint state and action spaces of all agents is not feasible because of the exponential increase in the size of the state space. As an alternative, we describe a technique that exploits problem structure by recognizing that agents are only loosely coupled via the shared resource constraints. This allows us to formulate a constrained policy optimization problem that yields optimal policies among the class of realizable ones given the shared resource limitations. Although our complexity analysis shows the constrained optimization problem to be NP-complete, our results demonstrate that, by exploiting problem structure and via a reduction to a mixed integer program, we are able to solve problems orders of magnitude larger than what is possible using a traditional multiagent MDP formulation.
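The single-agent building block behind such formulations can be shown concretely: an MDP written as a linear program over discounted state-action occupancies, with the resource limit added as one extra linear constraint. This is an LP relaxation, not the paper's mixed-integer program, and the two-state MDP, rewards, costs, and budget below are invented for illustration.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical 2-state, 2-action MDP: "fast" earns reward 2 but costs 1
# unit of a shared resource; "slow" earns 1 and costs nothing.
gamma = 0.9
n_s, n_a = 2, 2
P = np.array([[[0.0, 1.0], [0.0, 1.0]],    # fast: always move to state 1
              [[1.0, 0.0], [0.0, 1.0]]])   # slow: stay put        P[a][s] -> dist
R = np.array([[2.0, 2.0], [1.0, 1.0]])     # R[a][s]
C = np.array([[1.0, 1.0], [0.0, 0.0]])     # C[a][s]: resource cost
mu = np.array([1.0, 0.0])                  # start distribution
budget = 4.0

# Variables x[s, a] (flattened s-major): discounted occupancy measures.
# Flow conservation: sum_a x[s',a] - gamma * sum_{s,a} P[a][s][s'] x[s,a] = mu[s'].
A_eq = np.zeros((n_s, n_s * n_a))
for s2 in range(n_s):
    for s in range(n_s):
        for a in range(n_a):
            A_eq[s2, s * n_a + a] = (s == s2) - gamma * P[a][s][s2]

obj = np.array([-R[a][s] for s in range(n_s) for a in range(n_a)])      # maximize reward
cost_row = np.array([C[a][s] for s in range(n_s) for a in range(n_a)])  # resource usage

res = linprog(obj, A_ub=[cost_row], b_ub=[budget], A_eq=A_eq, b_eq=mu)
```

Here the unconstrained optimum would use "fast" everywhere; the budget row caps the discounted amount of "fast" at 4, and the LP trades off reward against the shared resource exactly the way the paper's per-agent subproblems do before they are tied together.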
A unifying framework for computational reinforcement learning theory
, 2009
Abstract

Cited by 18 (6 self)
Computational learning theory studies mathematical models that allow one to formally analyze and compare the performance of supervised-learning algorithms, such as their sample complexity. While existing models such as PAC (Probably Approximately Correct) have played an influential role in understanding the nature of supervised learning, they have not been as successful in reinforcement learning (RL). Here, the fundamental barrier is the need for active exploration in sequential decision problems. An RL agent tries to maximize long-term utility by exploiting its knowledge about the problem, but this knowledge has to be acquired by the agent itself through exploration, which may reduce short-term utility. The need for active exploration is common in many problems in daily life, engineering, and science. For example, a Backgammon program strives to take good moves to maximize the probability of winning a game, but sometimes it may try novel and possibly harmful moves to discover how the opponent reacts, in the hope of discovering a better game-playing strategy. It has been known since the early days of RL that a good tradeoff between exploration and exploitation is critical for the agent to learn fast (i.e., to reach near-optimal strategies quickly).
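The exploration/exploitation trade-off the abstract describes is easiest to see in a bandit setting. The sketch below uses epsilon-greedy, a classic scheme chosen here purely for illustration (the paper's framework analyzes more refined, PAC-style exploration); the arm means and parameters are made up.

```python
import random

# Toy two-armed bandit: the agent must occasionally try the apparently
# worse arm (explore) to discover it is actually better (then exploit).

def epsilon_greedy(arm_means, epsilon=0.1, steps=5000, seed=0):
    rng = random.Random(seed)
    counts = [0] * len(arm_means)
    estimates = [0.0] * len(arm_means)
    total = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(len(arm_means))                   # explore
        else:
            arm = max(range(len(arm_means)),
                      key=lambda i: estimates[i])                 # exploit
        reward = rng.gauss(arm_means[arm], 1.0)
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]  # running mean
        total += reward
    return estimates, total / steps

# Arm 1 is truly better; exploration is what lets the agent find out.
est, avg = epsilon_greedy([0.2, 1.0])
```

A purely greedy agent (epsilon = 0) can lock onto the inferior arm forever, which is precisely the short-term/long-term utility tension the survey formalizes.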
Integrated demonstration of instrument placement, robust execution and contingent planning
 In Proc. of iSAIRAS
, 2003
Abstract

Cited by 17 (8 self)
This paper describes an integrated demonstration of autonomous instrument placement, robust execution and ground-based contingent planning for the efficient exploration of a site by a prototype Mars rover.
Solving Factored MDPs with Hybrid State and Action Variables
 In Journal of Artificial Intelligence Research
, 2006
Abstract

Cited by 17 (4 self)
Efficient representations and solutions for large decision problems with continuous and discrete variables are among the most important challenges faced by the designers of automated decision support systems. In this paper, we describe a novel hybrid factored Markov decision process (MDP) model that allows for a compact representation of these problems, and a new hybrid approximate linear programming (HALP) framework that permits their efficient solution. The central idea of HALP is to approximate the optimal value function by a linear combination of basis functions and optimize its weights by linear programming.
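The basis-function idea can be shown on a purely discrete stand-in (the paper's contribution is handling hybrid state/action spaces, which this sketch does not attempt): approximate V(s) by w0 + w1·s on a 5-state chain and fit the weights with the approximate-LP constraints V(s) ≥ R(s) + γV(next(s, a)). The chain, rewards, and basis are invented for illustration.

```python
import numpy as np
from scipy.optimize import linprog

# Approximate linear programming on a 5-state chain with reward 1 at the
# rightmost state. Value function ansatz: V(s) ~ w0 * 1 + w1 * s.
gamma = 0.9
states = range(5)
phi = np.array([[1.0, s] for s in states])                 # basis features
reward = np.array([1.0 if s == 4 else 0.0 for s in states])

# One constraint per (state, action): V(s) >= R(s) + gamma * V(s').
A_ub, b_ub = [], []
for s in states:
    for a in (-1, +1):                                     # move left / right, clipped
        s_next = min(max(s + a, 0), 4)
        A_ub.append(-(phi[s] - gamma * phi[s_next]))
        b_ub.append(-reward[s])

c = phi.sum(axis=0)            # uniform state-relevance weights in the objective
res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(None, None)] * 2)
w = res.x
v_hat = phi @ w                # pointwise upper bound on the optimal values
```

Because every Bellman inequality is enforced, the fitted `v_hat` upper-bounds the true optimal value function; HALP's technical work lies in making this constraint set tractable when states and actions are continuous.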
A fast analytical algorithm for solving Markov decision processes with real-valued resources
 In Proceedings of the 20th International Joint Conference on Artificial Intelligence (IJCAI-07)
, 2007
Abstract

Cited by 16 (6 self)
Agents often have to construct plans that obey deadlines or, more generally, resource limits for real-valued resources whose consumption can only be characterized by probability distributions, such as execution time or battery power. These planning problems can be modeled with continuous state Markov decision processes (MDPs), but existing solution methods are either inefficient or provide no guarantee on the quality of the resulting policy. We therefore present CPH, a novel solution method that solves the planning problems by first approximating, with any desired accuracy, the probability distributions over the resource consumptions with phase-type distributions, which use exponential distributions as building blocks. It then uses value iteration to solve the resulting MDPs by exploiting properties of exponential distributions to calculate the necessary convolutions accurately and efficiently, while providing strong guarantees on the quality of the resulting policy. Our experimental feasibility study in a Mars rover domain demonstrates a substantial speedup over Lazy Approximation, which is currently the leading algorithm for solving continuous state MDPs with quality guarantees.
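The phase-type idea rests on a simple fact: chaining k exponential phases of rate k/d yields an Erlang distribution with mean d that concentrates around d as k grows, so even a near-deterministic duration can be built from exponential blocks. The sketch below just evaluates the Erlang CDF; the numbers are illustrative, not CPH's actual fitting procedure.

```python
import math

# P(T <= t) for an Erlang(k, rate) variable, i.e. the sum of k
# independent exponential phases of the given rate. Computed via the
# Poisson-tail identity, accumulating terms iteratively for stability.

def erlang_cdf(t, k, rate):
    lam_t = rate * t
    term = math.exp(-lam_t)        # n = 0 Poisson term
    total = term
    for n in range(1, k):
        term *= lam_t / n
        total += term
    return 1.0 - total

# Approximate a (hypothetical) deterministic duration d = 5 with
# k = 100 exponential phases of rate k/d, so the mean is exactly d.
d, k = 5.0, 100
rate = k / d
```

With these values, `erlang_cdf(4.0, k, rate)` is small and `erlang_cdf(6.0, k, rate)` is close to 1: the mass is tightly concentrated around d, while every phase remains memoryless, which is what lets CPH compute its convolutions in closed form.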
Planning with Durative Actions in Stochastic Domains
Abstract

Cited by 9 (1 self)
Probabilistic planning problems are typically modeled as a Markov Decision Process (MDP). MDPs, while an otherwise expressive model, allow only for sequential, non-durative actions. This poses severe restrictions in modeling and solving a real world planning problem. We extend the MDP model to incorporate: 1) simultaneous action execution, 2) durative actions, and 3) stochastic durations. We develop several algorithms to combat the computational explosion introduced by these features. The key theoretical ideas used in building these algorithms are: modeling a complex problem as an MDP in an extended state/action space, pruning of irrelevant actions, sampling of relevant actions, using informed heuristics to guide the search, hybridizing different planners to achieve the benefits of both, and approximating the problem and replanning. Our empirical evaluation illuminates the different merits of the various algorithms, viz., optimality, empirical closeness to optimality, theoretical error bounds, and speed.