Results 1-10 of 19
Value-function approximations for partially observable Markov decision processes
 Journal of Artificial Intelligence Research
, 2000
"... Partially observable Markov decision processes (POMDPs) provide an elegant mathematical framework for modeling complex decision and planning problems in stochastic domains in which states of the system are observable only indirectly, via a set of imperfect or noisy observations. The modeling advanta ..."
Abstract

Cited by 127 (0 self)
Partially observable Markov decision processes (POMDPs) provide an elegant mathematical framework for modeling complex decision and planning problems in stochastic domains in which states of the system are observable only indirectly, via a set of imperfect or noisy observations. The modeling advantage of POMDPs, however, comes at a price: exact methods for solving them are computationally very expensive and thus applicable in practice only to very simple problems. We focus on efficient approximation (heuristic) methods that attempt to alleviate the computational problem and trade off accuracy for speed. We have two objectives here. First, we survey various approximation methods, analyze their properties and relations, and provide some new insights into their differences. Second, we present a number of new approximation methods and novel refinements of existing techniques. The theoretical results are supported by experiments on a problem from the agent navigation domain.
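The indirect observability the abstract describes means a POMDP agent maintains a probability distribution over states (a belief) and updates it by Bayes' rule after each observation. A minimal sketch of that update, with a hypothetical 2-state model (the matrices T and O below are made up for illustration, not taken from the paper):

```python
# Minimal belief-state update for a POMDP: the agent cannot see the state,
# so it tracks a distribution over states and revises it after observing.
# T and O are a hypothetical 2-state model, for illustration only.

def belief_update(belief, T, O, obs):
    """Bayes update: predict through T, then reweight by P(obs | state)."""
    n = len(belief)
    # prediction step: P(s') = sum_s b(s) * T[s][s']
    predicted = [sum(belief[s] * T[s][s2] for s in range(n)) for s2 in range(n)]
    # correction step: multiply by the observation likelihood, then normalize
    weighted = [predicted[s2] * O[s2][obs] for s2 in range(n)]
    total = sum(weighted)
    return [w / total for w in weighted]

T = [[0.9, 0.1],   # P(next state | current state), for some fixed action
     [0.2, 0.8]]
O = [[0.7, 0.3],   # P(observation | state)
     [0.4, 0.6]]

b = belief_update([0.5, 0.5], T, O, obs=0)  # belief after seeing observation 0
```

Exact POMDP methods plan over this continuous belief space, which is exactly where the computational expense the abstract mentions comes from.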
A Survey of Computational Complexity Results in Systems and Control
, 2000
"... The purpose of this paper is twofold: (a) to provide a tutorial introduction to some key concepts from the theory of computational complexity, highlighting their relevance to systems and control theory, and (b) to survey the relatively recent research activity lying at the interface between these fi ..."
Abstract

Cited by 116 (21 self)
The purpose of this paper is twofold: (a) to provide a tutorial introduction to some key concepts from the theory of computational complexity, highlighting their relevance to systems and control theory, and (b) to survey the relatively recent research activity lying at the interface between these fields. We begin with a brief introduction to models of computation, the concepts of undecidability, polynomial time algorithms, NP-completeness, and the implications of intractability results. We then survey a number of problems that arise in systems and control theory, some of them classical, some of them related to current research. We discuss them from the point of view of computational complexity and also point out many open problems. In particular, we consider problems related to stability or stabilizability of linear systems with parametric uncertainty, robust control, time-varying linear systems, nonlinear and hybrid systems, and stochastic optimal control.
On the Undecidability of Probabilistic Planning and Related Stochastic Optimization Problems
 Artificial Intelligence
, 2003
"... Automated planning, the problem of how an agent achieves a goal given a repertoire of actions, is one of the foundational and most widely studied problems in the AI literature. The original formulation of the problem makes strong assumptions regarding the agent's knowledge and control over the world ..."
Abstract

Cited by 48 (0 self)
Automated planning, the problem of how an agent achieves a goal given a repertoire of actions, is one of the foundational and most widely studied problems in the AI literature. The original formulation of the problem makes strong assumptions regarding the agent's knowledge and control over the world, namely that its information is complete and correct, and that the results of its actions are deterministic and known.
Nonapproximability Results for Partially Observable Markov Decision Processes
, 2000
"... We show that for several variations of partially observable Markov decision processes, polynomialtime algorithms for nding control policies are unlikely to or simply don't have guarantees of nding policies within a constant factor or a constant summand of optimal. Here "unlikely" means \unless s ..."
Abstract

Cited by 32 (0 self)
We show that for several variations of partially observable Markov decision processes, polynomial-time algorithms for finding control policies are unlikely to, or simply don't, have guarantees of finding policies within a constant factor or a constant summand of optimal. Here "unlikely" means "unless some complexity classes collapse," where the collapses considered are P = NP, P = PSPACE, or P = EXP. Until or unless these collapses are shown to hold, any control-policy designer must choose between such performance guarantees and efficient computation.
Complexity results for Infinite-Horizon Markov Decision Processes
, 2000
"... Markov decision processes (MDPs) are models of dynamic decision making under uncertainty. These models arise in diverse applications and have been developed extensively in fields such as operations research, control engineering, and the decision sciences in general. Recent research, especially in a ..."
Abstract

Cited by 15 (3 self)
Markov decision processes (MDPs) are models of dynamic decision making under uncertainty. These models arise in diverse applications and have been developed extensively in fields such as operations research, control engineering, and the decision sciences in general. Recent research, especially in artificial intelligence, has highlighted the significance of studying the computational properties of MDP problems. We address ...
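For the fully observable, infinite-horizon discounted case that this paper analyzes, the standard computational object is the optimal value function, computable by value iteration. A minimal sketch (the 2-state, 2-action instance below is hypothetical, chosen only to show the recursion):

```python
# Value iteration for an infinite-horizon discounted MDP: repeatedly apply
# the Bellman optimality backup until the value function stops changing.
# The concrete T and R below are a hypothetical toy instance.

def value_iteration(T, R, gamma=0.9, tol=1e-9):
    """T[a][s][s2]: transition probabilities; R[a][s]: expected reward."""
    n_actions, n_states = len(T), len(T[0])
    V = [0.0] * n_states
    while True:
        # Bellman backup: best action's immediate reward plus discounted future value
        V_new = [
            max(R[a][s] + gamma * sum(T[a][s][s2] * V[s2] for s2 in range(n_states))
                for a in range(n_actions))
            for s in range(n_states)
        ]
        if max(abs(V_new[s] - V[s]) for s in range(n_states)) < tol:
            return V_new
        V = V_new

T = [[[0.8, 0.2], [0.1, 0.9]],   # transitions under action 0
     [[0.5, 0.5], [0.6, 0.4]]]   # transitions under action 1
R = [[1.0, 0.0],                 # rewards for action 0 in states 0, 1
     [0.5, 0.5]]                 # rewards for action 1 in states 0, 1
V = value_iteration(T, R)
```

Each iteration is polynomial in the number of states and actions; the complexity questions the paper studies concern how the work scales as the horizon, discount, and representation vary.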
Complexity Issues in Markov Decision Processes
 In Proc. IEEE conference on Computational Complexity
, 1998
"... We survey the complexity of computational problems about Markov decision processes: evaluating policies, finding good and best policies, approximating best policies, and related decision problems. 1 Introduction Partiallyobservable Markov decision processes (POMDPs) model sequential decision maki ..."
Abstract

Cited by 14 (2 self)
We survey the complexity of computational problems about Markov decision processes: evaluating policies, finding good and best policies, approximating best policies, and related decision problems. Partially observable Markov decision processes (POMDPs) model sequential decision making when outcomes are uncertain and the state of the system cannot be completely observed. They consist of decision epochs, states, observations, actions, transition probabilities, and rewards. At each decision epoch, the process is in some state, from which a "signal" is sent out which can be observed from outside. (Note that different states may send equal signals.) Choosing an action in a state generates a reward or possibly a cost (negative reward) and determines the state at the next decision epoch through a transition probability function. Policies are prescriptions of which action to take under any eventuality (i.e. any sequence of observations made in the previous decision epochs). De...
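The policy notion in this abstract (a map from observation histories to actions) can be evaluated directly for short horizons by enumerating observation sequences. A sketch of that evaluation, with a hypothetical tiny model and policy (none of the numbers below come from the paper):

```python
# Finite-horizon policy evaluation for the POMDP ingredients listed above
# (states, observations, actions, transition probabilities, rewards).
# A policy maps the observation history so far to an action; its value is
# the expected total reward. Model and policy are hypothetical.

def evaluate_policy(policy, horizon, belief, T, O, R, t=0, history=()):
    if t == horizon:
        return 0.0
    n = len(belief)
    a = policy(history)
    # expected immediate reward under the current belief
    value = sum(belief[s] * R[a][s] for s in range(n))
    for obs in range(len(O[0])):
        # joint weight of landing in state s' and then observing obs
        w = [sum(belief[s] * T[a][s][s2] for s in range(n)) * O[s2][obs]
             for s2 in range(n)]
        p_obs = sum(w)
        if p_obs > 0:
            b2 = [x / p_obs for x in w]  # posterior belief given obs
            value += p_obs * evaluate_policy(policy, horizon, b2, T, O, R,
                                             t + 1, history + (obs,))
    return value

T = [[[0.9, 0.1], [0.2, 0.8]],   # transitions under action 0
     [[0.5, 0.5], [0.5, 0.5]]]   # transitions under action 1
O = [[0.8, 0.2],                 # P(observation | state)
     [0.3, 0.7]]
R = [[1.0, 0.0],                 # rewards for action 0
     [0.2, 0.2]]                 # rewards for action 1

# hypothetical policy: keep taking action 0 while the last observation was 0
policy = lambda h: 0 if not h or h[-1] == 0 else 1
v = evaluate_policy(policy, horizon=3, belief=[0.5, 0.5], T=T, O=O, R=R)
```

The enumeration branches once per observation per epoch, so its cost grows exponentially in the horizon, which is one source of the hardness results this survey covers.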
Algorithms for Partially Observable Markov Decision Processes
 Hong Kong University of Science and Technology
, 2001
"... Partially Observable Markov Decision Process (POMDP) is a general sequential decisionmaking model where the effects of actions are... ..."
Abstract

Cited by 11 (1 self)
Partially Observable Markov Decision Process (POMDP) is a general sequential decision-making model where the effects of actions are...
The complexity of policy evaluation for finite-horizon partially-observable Markov decision processes
 Proc. MFCS '97
, 1996
"... A partiallyobservable Markov decision process (POMDP) is a generalization of a Markov decision process that allows for incomplete information regarding the state of the system. POMDPs are used to model controlled stochastic processes, from health care to manufacturing control processes (see [19] fo ..."
Abstract

Cited by 8 (3 self)
A partially-observable Markov decision process (POMDP) is a generalization of a Markov decision process that allows for incomplete information regarding the state of the system. POMDPs are used to model controlled stochastic processes, from health care to manufacturing control processes (see [19] for more examples). We consider several flavors of finite-horizon POMDPs. Our results concern the complexity of the policy evaluation and policy existence problems, which are characterized in terms of completeness for complexity classes. Although a large body of literature in mathematics, operations research, and engineering deals with optimization and approximation strategies for POMDPs, there has been little work aimed at characterizing the complexity of these problems and proving lower bounds. We prove a new upper bound for the policy evaluation problem for POMDPs, showing it is probabilistic-logspace complete. From this, we prove policy existence problems for several variants of unobservabl...
Formalizing Multi-Agent POMDP's in the context of network routing
 Proceedings of the 36th Hawaii International Conference on System Sciences (HICSS'03)
, 2003
"... This paper uses partially observable Markov decision processes (POMDP's) as a basic framework for MultiAgent planning. We distinguish three perspectives: first one is that of an omniscient agent that has access to the global state of the system, second one is the perspective of an individual agent t ..."
Abstract

Cited by 6 (0 self)
This paper uses partially observable Markov decision processes (POMDPs) as a basic framework for multi-agent planning. We distinguish three perspectives: the first is that of an omniscient agent that has access to the global state of the system, the second is that of an individual agent that has access only to its local state, and the third is that of an agent that models the information states of the other agents. We detail how the first perspective differs from the other two due to partial observability. POMDPs allow us to formally define the notion of optimal actions in each perspective, to quantify the loss of performance due to partial observability, and to quantify the possible gain in performance due to intelligent information exchange between the agents. As an example we consider the domain of agents in a distributed information network, where agents have to decide how to route packets and how to share information with other agents. Though almost all routing protocols have been formulated based on a detailed study of the functional parameters in the system, there has been no clear formal representation of optimality. We argue that the various routing protocols should fall out as different approximations to policies (optimal solutions) in such a framework. Our approach also proves critical and useful for computing error bounds due to the approximations used in practical routing algorithms. Each routing protocol is a conditional plan that involves physical actions, which change the physical state of the system, and actions that explicitly exchange information.