## Value-directed Compression of POMDPs (2002)

### Download Links

- [www.cs.toronto.edu]
- [www.cs.utoronto.ca]
- DBLP

### Other Repositories/Bibliography

Venue: NIPS 15

Citations: 71 (4 self)

### BibTeX

@INPROCEEDINGS{Poupart02value-directedcompression,
  author    = {Pascal Poupart and Craig Boutilier},
  title     = {Value-directed Compression of {POMDP}s},
  booktitle = {Advances in Neural Information Processing Systems 15 (NIPS)},
  year      = {2002},
  pages     = {1547--1554},
  publisher = {MIT Press}
}

### Abstract

We examine the problem of generating state-space compressions of POMDPs in a way that minimally impacts decision quality. We analyze the impact of compressions on decision quality, observing that compressions that allow accurate policy evaluation (prediction of expected future reward) will not affect decision quality.

### Citations

1518 | Iterative methods for sparse linear systems
- Saad
- 2003

Citation Context: ...compressed POMDP (solid arrows) where the next compressed belief state is accurately predicted. 2.3 Invariant and Krylov Subspaces: We briefly review several linear algebraic concepts used later (see [15] for more details). Let S be a vector subspace. We say S is invariant with respect to matrix M if it is closed under multiplication by M (i.e., Mx ∈ S for all x ∈ S). A Krylov subspace Kr(M, x) is the smalle...
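The Krylov-subspace construction referenced in this excerpt can be illustrated with a short sketch: repeatedly multiply by M and orthogonalize, stopping when a new vector already lies in the span (at which point the subspace is invariant). This is generic linear algebra in plain Python; the toy matrix and starting vector are invented here, not taken from the paper.

```python
# Build a basis for the Krylov subspace Kr(M, x) = span{x, Mx, M^2 x, ...}
# via repeated multiplication plus Gram-Schmidt orthogonalization; stop
# when a new vector lies in the span of the previous ones, i.e. the
# subspace has become invariant under M.

def matvec(M, x):
    return [sum(Mij * xj for Mij, xj in zip(row, x)) for row in M]

def krylov_basis(M, x, tol=1e-10):
    basis = []
    v = x[:]
    for _ in range(len(x)):          # the ambient dimension bounds the subspace size
        for b in basis:              # Gram-Schmidt: subtract existing components
            c = sum(vi * bi for vi, bi in zip(v, b))
            v = [vi - c * bi for vi, bi in zip(v, b)]
        norm = sum(vi * vi for vi in v) ** 0.5
        if norm < tol:               # v already in span -> invariant, stop
            break
        basis.append([vi / norm for vi in v])
        v = matvec(M, basis[-1])
    return basis

# Toy example: M never produces a third coordinate, so Kr(M, x) is at
# most the plane spanned by the first two axes.
M = [[2.0, 1.0, 0.0],
     [0.0, 3.0, 0.0],
     [0.0, 0.0, 0.0]]
x = [0.0, 1.0, 0.0]
B = krylov_basis(M, x)
print(len(B))  # 2: a 2-dimensional invariant subspace
```

The stopping test is exactly the invariance condition from the text: once Mx stays inside the span of the vectors collected so far, adding further powers of M yields nothing new.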

434 | The information bottleneck method
- Tishby, Pereira, et al.
- 1999

Citation Context: ...ction that maps each belief state b into some lower-dimensional compressed belief state b̃ (see Figure 1(a)). Here b̃ can be viewed as a bottleneck (e.g., in the sense of the information bottleneck [17]) that filters the information contained in b before it's used to estimate future rewards. We desire a compression f such that b̃ corresponds to the smallest statistic sufficient for accurately predi...

179 | SPUDD: Stochastic Planning using Decision Diagrams
- Hoey, St-Aubin, et al.
- 1999

Citation Context: ...ctions are represented using DBNs and structured CPTs (e.g., decision trees or algebraic decision diagrams), then the matrix operations required by the Krylov algorithm can be implemented effectively [1, 7]. Although this approach can offer substantial savings, the DTs or ADDs that represent the basis vectors of the Krylov subspace may still be much larger than the dimensionality of the compressed state...

147 | Stochastic Dynamic Programming with Factored Representations
- Boutilier, Dearden, et al.
- 2000

Citation Context: ...nction can be additively separated when it decomposes into a sum of smaller terms. For instance, Pr(Z|XY) is separable if there exist conditional distributions Pr_X(Z|X) and Pr_Y(Z|Y), and some α ∈ [0, 1], such that Pr(Z|XY) = α Pr_X(Z|X) + (1 − α) Pr_Y(Z|Y). This ensures that one need only know the marginals of X and Y (instead of their joint distribution) to infer Z. Pfeffer shows how additive se...
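The claim in this excerpt — that additive separability lets one infer Z from the marginals of X and Y alone — can be checked numerically. The distributions and mixing weight below are made up for illustration; only the algebraic identity is from the text.

```python
# Toy check of additive separability: if Pr(Z|X,Y) = a*PrX(Z|X) + (1-a)*PrY(Z|Y),
# then Pr(Z) depends only on the marginals of X and Y, not on their joint.
# All numbers here are invented for illustration.

a = 0.4
PrX = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}}  # PrX(Z=z | X=x)
PrY = {0: {0: 0.6, 1: 0.4}, 1: {0: 0.3, 1: 0.7}}  # PrY(Z=z | Y=y)

def pr_z_given_xy(z, x, y):
    return a * PrX[x][z] + (1 - a) * PrY[y][z]

def pr_z_from_joint(z, joint):            # joint[(x, y)] = Pr(X=x, Y=y)
    return sum(p * pr_z_given_xy(z, x, y) for (x, y), p in joint.items())

def pr_z_from_marginals(z, px, py):       # px[x] = Pr(X=x), py[y] = Pr(Y=y)
    return (a * sum(px[x] * PrX[x][z] for x in px)
            + (1 - a) * sum(py[y] * PrY[y][z] for y in py))

# Two different joints with identical marginals Pr(X=1) = Pr(Y=1) = 0.5:
joint_indep = {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.25, (1, 1): 0.25}
joint_corr  = {(0, 0): 0.45, (0, 1): 0.05, (1, 0): 0.05, (1, 1): 0.45}
px, py = {0: 0.5, 1: 0.5}, {0: 0.5, 1: 0.5}

for z in (0, 1):
    v1 = pr_z_from_joint(z, joint_indep)
    v2 = pr_z_from_joint(z, joint_corr)
    v3 = pr_z_from_marginals(z, px, py)
    assert abs(v1 - v3) < 1e-12 and abs(v2 - v3) < 1e-12
```

The correlated and independent joints give the same Pr(Z) precisely because the mixture form makes the cross terms collapse to the two marginals.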

113 | Computing Optimal Policies for Partially Observable Decision Processes Using Compact Representations
- Boutilier, Poole
- 1996

Citation Context: ...ely discarded. A number of schemes have been proposed for either directly or indirectly compressing POMDPs. For example, approaches using bounded memory [8, 10] and state aggregation, either dynamic [2] or static [5, 9], can be viewed in this light. In this paper, we study the effect of static state-space compression on decision quality. We first characterize lossless compressions, those that do n...

93 | Equivalence notions and model minimization in Markov decision processes
- Givan, Dean, et al.

Citation Context: ...Finally, we can solve the POMDP in the compressed state space by using R̃ and T̃^{a,z}. Note that this technique can be viewed as a generalization of Givan et al.'s MDP model minimization technique [3]. It is interesting to note that Littman et al. [9] proposed a similar iterative algorithm to compress POMDPs based on predicting future observations. (Footnote 2: Assuming that rewards are functions of the ob...)
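The idea that solving in the compressed space preserves values can be checked on a toy case. The sketch below uses a generic state-aggregation example with one action (a small Markov chain invented here, with the linear conditions R = F R̃ and T F = F T̃), not the paper's algorithm or notation:

```python
# Toy check: when the compression F satisfies R = F*Rc and T*F = F*Tc,
# value iteration on the compressed model (Tc, Rc) reproduces the full
# value function via V = F*Vc.  The 3-state chain below is made up;
# states 1 and 2 are exact duplicates, so they aggregate into one state.

g = 0.9  # discount factor

T  = [[0.5, 0.25, 0.25], [0.0, 0.5, 0.5], [0.0, 0.5, 0.5]]  # full transitions
R  = [0.0, 1.0, 1.0]                                        # full rewards
F  = [[1, 0], [0, 1], [0, 1]]      # lifts compressed values to full values
Tc = [[0.5, 0.5], [0.0, 1.0]]      # compressed transitions
Rc = [0.0, 1.0]                    # compressed rewards

def value_iteration(T, R, iters=200):
    V = [0.0] * len(R)
    for _ in range(iters):
        V = [R[i] + g * sum(T[i][j] * V[j] for j in range(len(V)))
             for i in range(len(V))]
    return V

V  = value_iteration(T, R)         # solve the full model
Vc = value_iteration(Tc, Rc)       # solve the 2-state compressed model
lifted = [sum(F[i][k] * Vc[k] for k in range(2)) for i in range(3)]
assert all(abs(V[i] - lifted[i]) < 1e-6 for i in range(3))
```

Because the duplicate states have identical rewards and outgoing dynamics, nothing the value function needs is lost, which is the "lossless compression" condition the abstract describes in the linear, one-action special case.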

93 | Memoryless policies: Theoretical limitations and practical results
- Littman
- 1994

Citation Context: ...relevant information from that which can be safely discarded. A number of schemes have been proposed for either directly or indirectly compressing POMDPs. For example, approaches using bounded memory [8, 10] and state aggregation, either dynamic [2] or static [5, 9], can be viewed in this light. In this paper, we study the effect of static state-space compression on decision quality. We first character...

67 | Max-norm projections for factored MDPs
- Guestrin, Koller, et al.
- 2001

Citation Context: ...several techniques that allow one to exploit problem structure to find an acceptable lossy compression without state space enumeration. One approach is related to the basis function model proposed in [4], in which we restrict F to be functions over some small set of factors (subsets of state variables). This ensures that the number of unknown parameters in any column of F (which we optimize in Table...

35 | Hidden state and reinforcement learning with instance-based state identification
- McCallum
- 1996

Citation Context: ...relevant information from that which can be safely discarded. A number of schemes have been proposed for either directly or indirectly compressing POMDPs. For example, approaches using bounded memory [8, 10] and state aggregation, either dynamic [2] or static [5, 9], can be viewed in this light. In this paper, we study the effect of static state-space compression on decision quality. We first character...

30 | Greedy linear value-approximation for factored Markov decision processes
- Patrascu, Poupart, et al.
- 2002

Citation Context: ...it is possible to efficiently solve the optimization program in Table 1. The question of factor selection remains: on what factors should F be defined? A version of this question has been tackled in [12, 14] in the context of selecting a basis to approximately solve MDPs. The techniques proposed in those papers could be adapted to our optimization program. An alternative method for structuring the comput...

29 | Direct value-approximation for factored MDPs
- Schuurmans, Patrascu
- 2001

Citation Context: ...sed on F and the DBN structure to reduce the number of constraints to something (in many cases) polynomial in the number of state variables. This can be achieved using the techniques described in [4, 16] to rewrite an LP with many fewer constraints or to generate small subsets of constraints incrementally. These techniques are rather involved, so we refer to the cited papers for details. By searching...

26 | Solving factored POMDPs with linear value functions
- Guestrin, Koller, et al.
- 2001

Citation Context: ...A number of schemes have been proposed for either directly or indirectly compressing POMDPs. For example, approaches using bounded memory [8, 10] and state aggregation, either dynamic [2] or static [5, 9], can be viewed in this light. In this paper, we study the effect of static state-space compression on decision quality. We first characterize lossless compressions, those that do not lead to any er...

25 | Piecewise Linear Value Function Approximation for Factored MDPs
- Poupart, Boutilier, et al.
- 2002

Citation Context: ...it is possible to efficiently solve the optimization program in Table 1. The question of factor selection remains: on what factors should F be defined? A version of this question has been tackled in [12, 14] in the context of selecting a basis to approximately solve MDPs. The techniques proposed in those papers could be adapted to our optimization program. An alternative method for structuring the comput...

21 | A Survey of POMDP Solution Techniques
- Murphy
- 2000

Citation Context: ...policy π mapping belief states to actions. The value V of a policy is the expected sum of discounted rewards and is defined as: V(b) = R(b) + γ Σ_z V(T^{π(b),z}(b))  (1). A number of techniques [11] based on value iteration or policy iteration can be used to compute optimal or approximately optimal policies for POMDPs. 2.2 Conditional Independence and Additive Separability: When our state space i...
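The policy-evaluation recursion quoted in this excerpt can be sketched directly: the value at a belief is the immediate belief reward plus the discounted values of the successor beliefs, one per observation, weighted by each observation's probability. The tiny two-state, two-observation POMDP below is invented for illustration and the recursion is truncated at a finite horizon; it is not the paper's solution method.

```python
# Finite-horizon sketch of the POMDP policy-evaluation recursion:
#   V(b) = R(b) + g * sum_z Pr(z|b) * V(b_z)
# where b_z is the belief after taking the policy's action and seeing z.
# The 2-state model below (one action, for a fixed policy) is made up.

g = 0.9  # discount factor

T = [[0.9, 0.1], [0.2, 0.8]]   # T[s][s'] under the policy's action
O = [[0.8, 0.2], [0.3, 0.7]]   # O[s'][z]: observation probabilities
R = [0.0, 1.0]                 # per-state rewards

def belief_reward(b):
    return sum(bi * ri for bi, ri in zip(b, R))

def successor(b, z):
    """Normalized next belief after observation z, and Pr(z | b)."""
    nb = [sum(b[s] * T[s][s2] for s in range(2)) * O[s2][z]
          for s2 in range(2)]
    pz = sum(nb)
    return [x / pz for x in nb], pz

def value(b, depth):
    if depth == 0:
        return belief_reward(b)
    total = belief_reward(b)
    for z in range(2):
        nb, pz = successor(b, z)        # branch on each observation
        total += g * pz * value(nb, depth - 1)
    return total

v = value([0.5, 0.5], depth=10)
```

Starting nearer the rewarding state yields a higher value, e.g. `value([0.0, 1.0], 10) > value([1.0, 0.0], 10)`, which is the behavior the recursion should exhibit.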

10 | Sufficiency, separability and temporal probabilistic models
- Pfeffer
- 2001

Citation Context: ...ansitions associated with each variable depend only on a small subset of variables. These representations can often be exploited to solve POMDPs without state space enumeration [2]. Recently, Pfeffer [13] showed that conditional independence combined with some form of additive separability can enable efficient inference in many DBNs. Roughly, a function can be additively separated when it decomposes i...

2 | Information-theoretic features for reinforcement learning. Unpublished manuscript
- Guestrin, Ormoneit