## On the Complexity of Partially Observed Markov Decision Processes (1996)

Venue: Theoretical Computer Science

Citations: 27 (2 self)

### BibTeX

```bibtex
@ARTICLE{Burago96onthe,
  author  = {Dima Burago and Michel De Rougemont and Anatol Slissenko},
  title   = {On the Complexity of Partially Observed Markov Decision Processes},
  journal = {Theoretical Computer Science},
  year    = {1996},
  volume  = {157},
  pages   = {161--183}
}
```

### Abstract

In this paper we consider the complexity of constructing optimal policies (strategies) for a certain type of partially observed Markov decision process. This particular case of the classical problem deals with finite stationary processes, and can be represented as constructing optimal strategies for reaching target vertices from a starting vertex in a graph with colored vertices and probabilistic deviations from the edge chosen to follow. The colors of the visited vertices are the only information available to a strategy. The complexity of Markov decision in the case of perfect information (bijective coloring of vertices) is known and is briefly surveyed at the beginning of the paper. For the unobservable case (all colors are equal) we give an improvement of the result of Papadimitriou and Tsitsiklis: namely, we show that constructing even a very weak approximation to an optimal strategy is NP-hard. Our main results concern the case of a fixed bound on the multiplicity of the coloring, ...
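The model in the abstract can be sketched concretely. The following is a minimal illustration with hypothetical names (it is not the paper's formalism): vertices carry colors, an action selects an edge, the process deviates probabilistically, and the strategy maps the observed color history to the next action.

```python
import random

# Toy CU-graph sketch (hypothetical example, not from the paper):
# each vertex has a color; an (action, vertex) pair gives a probability
# distribution over successor vertices -- the "probabilistic deviation".
color = {0: "red", 1: "red", 2: "blue", 3: "green"}   # coloring clr: V -> C
transition = {
    ("a", 0): [(1, 0.8), (2, 0.2)],
    ("a", 1): [(3, 0.7), (0, 0.3)],
    ("a", 2): [(3, 1.0)],
    ("a", 3): [(3, 1.0)],          # vertex 3 (the target) is absorbing
    ("b", 0): [(2, 1.0)],
    ("b", 1): [(2, 0.5), (3, 0.5)],
    ("b", 2): [(0, 1.0)],
    ("b", 3): [(3, 1.0)],
}

def run(strategy, start=0, target=3, horizon=10, rng=random.Random(0)):
    """Follow `strategy`, which sees only the colors of visited vertices,
    and report whether the target is reached within `horizon` steps."""
    v, observed = start, [color[start]]
    for _ in range(horizon):
        if v == target:
            return True
        action = strategy(observed)                  # decision from colors only
        succs, probs = zip(*transition[(action, v)])
        v = rng.choices(succs, weights=probs)[0]     # probabilistic deviation
        observed.append(color[v])
    return v == target

# A memoryless color-based strategy: act on the last observed color.
policy = {"red": "a", "blue": "a", "green": "a"}
reached = sum(run(lambda obs: policy[obs[-1]], rng=random.Random(s))
              for s in range(1000))
print(reached / 1000)  # empirical probability of reaching the target
```

Estimating the reachability probability of a fixed strategy is easy; the paper's subject is the complexity of *finding* a (near-)optimal strategy when the coloring is not injective.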

### Citations

11488 | Computers and Intractability: A Guide to the Theory of NP-Completeness - Garey, Johnson - 1979

Citation context: "... 4 together with Claim 3 immediately imply R^{P_i}_K(σ) ≥ F_{K,i}(P_i) − ε, which completes the proof of Theorem 8. 5.7 Proof of Theorem 7. Our proof is based on a reduction of the Partition Problem [10], A3.2: given a set {z_a}_{a∈A} of natural numbers indexed by natural numbers from A, find whether there exists a subset A′ ⊆ A such that Σ_{a∈A′} z_a = Σ_{a∈A∖A′} z_a. If such a subset A′ exists ..."
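The Partition Problem quoted above can be checked by brute force on small instances. A minimal sketch (illustrative only, not part of the paper's reduction):

```python
from itertools import combinations

def has_partition(zs):
    """Return True if some subset of zs sums to exactly half the total
    (Partition, Garey-Johnson problem A3.2). Brute force: O(2^n)."""
    total = sum(zs)
    if total % 2:               # odd total: no equal split exists
        return False
    half = total // 2
    return any(sum(c) == half
               for r in range(len(zs) + 1)
               for c in combinations(zs, r))

print(has_partition([3, 1, 1, 2, 2, 1]))  # True: {3, 2} vs {1, 1, 2, 1}
print(has_partition([2, 3, 4]))           # False: total 9 is odd
```

The exponential search is what the NP-hardness reduction exploits: deciding Partition is NP-complete, so any problem it reduces to is NP-hard as well.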

2316 | An Introduction to Probability Theory and its Applications, volume II - Feller - 1971

Citation context: "... (G, r, (v, m)) respectively. It is clear that M-strategies correspond to stationary Markov chains, and T-strategies to non-stationary ones. Sufficient information on Markov chains can be found in [13], [9]. In this section we review the case of bijective colouring (that is called the case of perfect information in the theory of Markov decision processes), and assume C = V and clr = id. 3.2 Optimal M-stra..."

851 | Finite Markov Chains - Kemeny, Snell - 1976

Citation context: "... or (G, r, (v, m)) respectively. It is clear that M-strategies correspond to stationary Markov chains, and T-strategies to non-stationary ones. Sufficient information on Markov chains can be found in [13], [9]. In this section we review the case of bijective colouring (that is called the case of perfect information in the theory of Markov decision processes), and assume C = V and clr = id. 3.2 Optimal M..."

645 | Markov Decision Processes - Puterman - 1994

Citation context: "... contract 91/1061. † St-Petersburg Institute for Informatics and Automation of the Academy of Sciences of Russia. 1 Introduction 1.1 We consider a particular case of Markov decision processes (e.g. see [17]) from the point of view of computational complexity. This case concerns stationary processes with imperfect information (partially observed), with a finite number of states and actions, and under a conc..."

446 | The complexity of enumeration and reliability problems - Valiant - 1979

326 | A Catalog of Complexity Classes - Johnson - 1990

Citation context: "... probability not less than exp(−√k). We prove Theorem 6 below. 4.2 Notation for the 3SAT problem. The proof is based on a polytime reduction of the 3SAT problem, which is a classical NP-complete problem; see [11]. Let F = ⋀_{1≤i≤m} ⋁_{1≤j≤3} z_{i,j} (2) be a 3CNF formula over n variables x_1, …, x_n, where the z_{i,j} are literals, i.e. elements of the set Z = {x_1, …, x_n, ¬x_1, …, ¬x_n}, n ≤ 3m. To visualize..."
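The 3SAT problem referenced in this context can be stated executably. Below is a brute-force satisfiability check over a 3CNF formula, an illustrative sketch only (it is not the paper's reduction, which goes in the opposite direction, from 3SAT to the strategy problem):

```python
from itertools import product

def sat3(clauses, n):
    """Brute-force 3SAT: each clause is a triple of literals, where
    literal +i means x_i and -i means its negation (1-based indices).
    Tries all 2^n truth assignments."""
    for bits in product([False, True], repeat=n):
        def value(lit):
            b = bits[abs(lit) - 1]
            return b if lit > 0 else not b
        if all(any(value(lit) for lit in clause) for clause in clauses):
            return True
    return False

# (x1 or x2 or not x3) and (not x1 or x3 or x2)
print(sat3([(1, 2, -3), (-1, 3, 2)], n=3))   # True
print(sat3([(1, 1, 1), (-1, -1, -1)], n=1))  # False: x1 and (not x1)
```

A polytime reduction from 3SAT, as used in the proof of Theorem 6, transfers this NP-completeness to the target problem.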

320 | The Complexity of Markov Decision Processes - Papadimitriou, Tsitsiklis - 1987

Citation context: "... section 3 the complexity of the case of perfect information (bijective coloring) is briefly surveyed. In the short section 4, for the case of total uncertainty (unobservability), we strengthen Corollary 2 from [15], and show that even very weak approximations to optimal strategies are NP-hard. The main results are contained in section 5, where we treat the case of unobservability bounded by a fixed parameter. In..."

266 | Computing Network Reliability - Provan, Ball - 1984

Citation context: "... strategies optimal in different classes, and, as one of the further goals, to look at the complexity of optimal strategies for situations with more diverse uncertainty. Different models of uncertainty, e.g. [4], [18], [14], [16], [7], [19], remain separated. 1.2 In the next section 2 we give the basic notions from the field of Markov decision processes related to the problems under consideration, and then sp..."

152 | Dynamic Programming and Stochastic Control - Bertsekas - 1976

Citation context: "... under stronger constraints) is as powerful as the original one. 2.4 Criteria of Quality of Strategies. General definitions of criteria can be found in texts on Markov decision processes, e.g. [17], [3]. Here, by a criterion we mean a function from the set of strategies to the real numbers that depends only on the semantics of strategies (i.e. on the probability distribution defined by a strategy). We ..."

137 | Shortest Paths Without a Map - Papadimitriou, Yannakakis - 1991

Citation context: "... different classes, and, as one of the further goals, to look at the complexity of optimal strategies for situations with more diverse uncertainty. Different models of uncertainty, e.g. [4], [18], [14], [16], [7], [19], remain separated. 1.2 In the next section 2 we give the basic notions from the field of Markov decision processes related to the problems under consideration, and then specify the criteria..."

107 | Games Against Nature - Papadimitriou - 1985

Citation context: "... optimal in different classes, and, as one of the further goals, to look at the complexity of optimal strategies for situations with more diverse uncertainty. Different models of uncertainty, e.g. [4], [18], [14], [16], [7], [19], remain separated. 1.2 In the next section 2 we give the basic notions from the field of Markov decision processes related to the problems under consideration, and then specify the cr..."

47 | How to Learn an Unknown Environment - Deng, Kameda, et al. - 1991

Citation context: "... different classes, and, as one of the further goals, to look at the complexity of optimal strategies for situations with more diverse uncertainty. Different models of uncertainty, e.g. [4], [18], [14], [16], [7], [19], remain separated. 1.2 In the next section 2 we give the basic notions from the field of Markov decision processes related to the problems under consideration, and then specify the criteria of o..."

27 | Linear Programming and Finite Markovian Control Problems - Kallenberg - 1983

Citation context: "... (that is called the case of perfect information in the theory of Markov decision processes), and assume C = V and clr = id. 3.2 Optimal M-strategies. The following theorem is known (see [17], Theorem 7.7, or [12]) even for the general case of positive/negative gains. In our case it can be proven by a direct combinatorial argument. Theorem 1. For every CU-graph with bijective coloring an R^{s,T}_∞-optimal strate..."

5 | The Complexity of the Max Word Problem - Condon - 1991

Citation context: "... vectors V, W with positive coordinates and an integer k in unary notation, the problem is to find a sequence M_{i_1}, …, M_{i_k} which maximizes the product ⟨V, (∏_{j=1}^{k} W M_{i_j})⟩. It was shown in [5] that the Max Word Problem for stochastic matrices is NP-hard, as is its approximation version up to any multiplicative factor. The Max Word Problem for stochastic m × m matrices can be reduced t..."
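The Max Word Problem described here can be written out directly for tiny instances. The sketch below reads the garbled inner product as V^T (M_{i_1} ··· M_{i_k}) W, which is one plausible reconstruction; the instance and names are illustrative, not taken from [5]:

```python
from itertools import product

def mat_mul(A, B):
    """Multiply two matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def max_word(V, W, mats, k):
    """Brute force over every length-k sequence of matrices from `mats`,
    returning the best value of V^T (M_{i_1} ... M_{i_k}) W.
    Exponential in k -- this only illustrates the problem statement."""
    best = None
    for seq in product(range(len(mats)), repeat=k):
        P = [[1.0 if i == j else 0.0 for j in range(len(V))]
             for i in range(len(V))]                     # identity matrix
        for i in seq:
            P = mat_mul(P, mats[i])
        val = sum(V[a] * P[a][b] * W[b]
                  for a in range(len(V)) for b in range(len(W)))
        best = val if best is None or val > best else best
    return best

# Two 2x2 stochastic matrices (rows sum to 1): a toy instance.
M0 = [[0.5, 0.5], [0.1, 0.9]]
M1 = [[1.0, 0.0], [0.6, 0.4]]
print(max_word([1.0, 1.0], [1.0, 2.0], [M0, M1], k=2))
```

Condon's result says that even approximating this maximum up to any multiplicative factor is NP-hard, so nothing essentially better than such exhaustive search is expected.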

4 | A Theory of Robust Planning - de Rougemont, Diaz-Frias - 1992

Citation context: "... to construct a strategy fulfilling some task. One of the simplest tasks is to reach a target vertex from a source vertex with maximum probability. Our specific motivations go back to robotics (e.g. [6], [8]) and to the analysis of some probabilistic models. The first goal was to analyze the complexity of constructing strategies optimal in different classes, and, as one of the further goals, to look at t..."

2 | First Decisions of an Optimal T-Strategy Can Be Non-Periodic - Beauquier, Burago, et al. - 1994

Citation context: "... clr⁻¹(c), c ∈ C, which characterizes the uncertainty of determining the current state. In traditional terms the coloring defines partial observability of the process. • s: D × V × V → [0, 1], where D is a finite set, is such that for all d ∈ D and u ∈ V, Σ_{v∈V} s(d, u, v) = 1. (1) For brevity s(d, uv) is used for s(d, u, v). The set D may be interpreted as a set of actions (or moves or even decisi..."

2 | Vérification probabiliste de problèmes de graphes: applications à la robotique mobile - Diaz-Frias - 1993

Citation context: "... construct a strategy fulfilling some task. One of the simplest tasks is to reach a target vertex from a source vertex with maximum probability. Our specific motivations go back to robotics (e.g. [6], [8]) and to the analysis of some probabilistic models. The first goal was to analyze the complexity of constructing strategies optimal in different classes, and, as one of the further goals, to look at the co..."

1 | On the Complexity of Finite Memory Strategies for Control under Probabilistic Deviations - Beauquier, Burago, et al. - 1994

Citation context: "... 2) Probability of realizing a given behavior. Let L be a set of paths interpreted as a set of allowed realizations. The criterion R^L_k(σ) is the probability of following only realizations from L (cf. [2], where finite-automaton L's are studied). For the criterion R^{s,T}_k(σ) one can also consider its limit version R^{s,T}_∞(σ) = lim_{k→∞} R^{s,T}_k(σ). Clearly, the criterion R^{s,T}_k(σ) is non-decreasing in..."

1 | A Discussion of One Question of Bellman - Zalgaller - 1992

Citation context: "... classes, and, as one of the further goals, to look at the complexity of optimal strategies for situations with more diverse uncertainty. Different models of uncertainty, e.g. [4], [18], [14], [16], [7], [19], remain separated. 1.2 In the next section 2 we give the basic notions from the field of Markov decision processes related to the problems under consideration, and then specify the criteria of optimal..."