## Perseus: Randomized point-based value iteration for POMDPs (2005)


Venue: Journal of Artificial Intelligence Research

Citations: 142 (11 self)

### BibTeX

```bibtex
@ARTICLE{Spaan05perseus:randomized,
  author  = {Matthijs T. J. Spaan and Nikos Vlassis},
  title   = {Perseus: Randomized point-based value iteration for POMDPs},
  journal = {Journal of Artificial Intelligence Research},
  year    = {2005},
  volume  = {24},
  pages   = {195--220}
}
```


### Abstract

Partially observable Markov decision processes (POMDPs) form an attractive and principled framework for agent planning under uncertainty. Point-based approximate techniques for POMDPs compute a policy based on a finite set of points collected in advance from the agent’s belief space. We present a randomized point-based value iteration algorithm called Perseus. The algorithm performs approximate value backup stages, ensuring that in each backup stage the value of each point in the belief set is improved; the key observation is that a single backup may improve the value of many belief points. Contrary to other point-based methods, Perseus backs up only a (randomly selected) subset of points in the belief set, sufficient for improving the value of each belief point in the set. We show how the same idea can be extended to dealing with continuous action spaces. Experimental results show the potential of Perseus in large scale POMDP problems.
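The randomized backup stage the abstract describes can be sketched as follows. This is a minimal illustration of the idea (back up randomly chosen points until every point in the belief set has improved), not the authors' implementation; the model arrays `T`, `O`, `R` and their shapes are assumptions.

```python
import numpy as np

def backup(b, V, T, O, R, gamma):
    """Point-based backup: return the alpha-vector maximizing the backed-up value at b.
    T[a] is |S|x|S'|, O[a] is |S'|x|Omega|, R is |S|x|A|; V is a list of alpha-vectors."""
    best_alpha, best_val = None, -np.inf
    for a in range(R.shape[1]):
        alpha_a = R[:, a].copy()
        for o in range(O[a].shape[1]):
            # back-project each vector through the model and keep the best for this observation
            g = [gamma * T[a] @ (O[a][:, o] * alpha) for alpha in V]
            alpha_a += g[int(np.argmax([b @ gi for gi in g]))]
        if b @ alpha_a > best_val:
            best_alpha, best_val = alpha_a, b @ alpha_a
    return best_alpha

def perseus_backup_stage(B, V, T, O, R, gamma, seed=0):
    """One Perseus stage: back up random points from B until every b in B has improved."""
    rng = np.random.default_rng(seed)
    V_new, todo = [], list(B)
    while todo:
        b = todo[rng.integers(len(todo))]
        alpha = backup(b, V, T, O, R, gamma)
        # never let the value at b decrease: fall back to the previous best vector
        if b @ alpha < max(b @ a for a in V):
            alpha = max(V, key=lambda a: b @ a).copy()
        V_new.append(alpha)
        # key observation: a single backup may improve many belief points at once
        todo = [bp for bp in todo
                if max(bp @ a for a in V_new) < max(bp @ a for a in V)]
    return V_new
```

Termination of the stage is guaranteed because the backed-up point itself is always removed from the to-do list, and the fallback copy ensures no point's value ever decreases.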

### Citations

3773 | Reinforcement Learning: An Introduction
- Sutton, Barto
- 1998
Citation Context: ...is uncertain about the exact effect of executing a certain action. This uncertainty is captured by a probabilistic transition model as is the case in a fully observable Markov decision process (MDP) (Sutton & Barto, 1998; Bertsekas & Tsitsiklis, 1996). An MDP defines a transition model which specifies the probabilistic effect of how each action changes the state. Extending the MDP setting, a POMDP...

3697 | Artificial Intelligence: A Modern Approach
- Russell, Norvig
- 1995
Citation Context: ...ith continuous action spaces. Experimental results show the potential of Perseus in large scale POMDP problems. 1. Introduction A major goal of Artificial Intelligence is to build intelligent agents (Russell & Norvig, 2003). An intelligent agent, whether physical or simulated, should be able to autonomously perform a given task, and is often characterized by its sense–think–act loop: it uses sensors to observe the envi...

2611 | Dynamic Programming
- Bellman
- 1957
Citation Context: ...that satisfies the Bellman optimality equation V∗ = HV∗: V∗(b) = max_{a∈A} [ ∑_{s∈S} r(s, a)b(s) + γ ∑_{o∈O} p(o|a, b) V∗(b_a^o) ], (3) with b_a^o given by (1), and H is the Bellman backup operator (Bellman, 1957). When (3) holds for every b ∈ Δ we are ensured the solution is optimal. V∗ can be approximated by iterating a number of stages, as we will see in the next section, at each stage considering a step...

825 | Planning and acting in partially observable stochastic domains
- Kaelbling, Littman, et al.
- 1998
Citation Context: ...ses (POMDPs) provide a rich mathematical framework for acting optimally in such partially observable and stochastic environments (Dynkin, 1965; Åström, 1965; Aoki, 1965; Sondik, 1971; Lovejoy, 1991; Kaelbling, Littman, & Cassandra, 1998). The POMDP defines a sensor model specifying the probability of observing a particular sensor reading in a specific state and a stochastic transition model which captures the uncertain outcome of ex...
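The sensor and transition models mentioned in this excerpt combine in the standard Bayes belief update (equation (1) of the paper). A minimal sketch, with array names and shapes as assumptions:

```python
import numpy as np

def belief_update(b, a, o, T, O):
    """Posterior belief b_a^o after taking action a and observing o.
    T[a][s, s'] = p(s'|s, a); O[a][s', o] = p(o|s', a); b is a distribution over S."""
    b_next = O[a][:, o] * (b @ T[a])   # unnormalized: p(o|s',a) * sum_s p(s'|s,a) b(s)
    return b_next / b_next.sum()       # normalize by p(o|a, b)
```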

336 | The Optimal Control of Partially Observable Markov Processes - Sondik - 1971

314 | Prioritized sweeping: reinforcement learning with less data and less time
- Moore, Atkeson
- 1993
Citation Context: ...e backed up is arbitrary. This allows an algorithm to focus on backups which may have a high potential impact, as for instance in the prioritized sweeping algorithm for solving fully observable MDPs (Moore & Atkeson, 1993; Andre, Friedman, & Parr, 1998). A drawback is that the notion of an exact planning horizon is somewhat lost: in general, after performing n backup stages the computed plan will not be considering n...

307 | The complexity of Markov decision processes
- Papadimitriou, Tsitsiklis
- 1987
Citation Context: ...eded in order to ensure that the new value function is an upper bound to the previous one over the complete belief simplex. In general, computing exact solutions for POMDPs is an intractable problem (Papadimitriou & Tsitsiklis, 1987; Madani, Hanks, & Condon, 1999), calling for approximate solution techniques (Lovejoy, 1991; Hauskrecht, 2000). In practical tasks one would like to compute solutions only for those parts of the beli...

298 | Least-squares policy iteration
- Lagoudakis, Parr
Citation Context: ...e additional computational cost of sampling multiple sets B. We note that similar ‘off-policy’ learning using a fixed set of sampled states has also been adopted by other recent algorithms like LSPI (Lagoudakis & Parr, 2003) and PSDP (Bagnell, Kakade, Ng, & Schneider, 2004). The backups of Perseus on a fixed set B can be viewed as a particular instance of asynchronous dynamic programming (Bertsekas & Tsitsiklis, 1989). ...

295 | The optimal control of partially observable Markov processes over a finite horizon
- Smallwood, Sondik
- 1973
Citation Context: ...er of stages, as we will see in the next section, at each stage considering a step further into the future. For problems with a finite planning horizon V∗ will be piecewise linear and convex (PWLC) (Smallwood & Sondik, 1973), and for infinite horizon tasks V∗ can be approximated arbitrarily well by a PWLC value function. We parameterize a value function Vn at stage n by a finite set of vectors (hyperplanes) {α_n^i}, i = 1...
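Under this PWLC parameterization, evaluating the value function at any belief reduces to a maximum over inner products with the alpha-vectors; a one-line sketch (function and argument names are assumptions):

```python
import numpy as np

def value(b, alphas):
    """PWLC value function: V_n(b) = max_i (alpha_i . b) over the vector set {alpha_i}."""
    return max(float(b @ alpha) for alpha in alphas)

# e.g. value(np.array([0.5, 0.5]), [np.array([1.0, 0.0]), np.array([0.0, 2.0])])  -> 1.0
```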

245 | Point-based value iteration: An anytime algorithm for POMDPs
- Pineau, Gordon, et al.
- 2003
Citation Context: ...ecently motivated the use of approximate solution techniques which focus on the use of a sampled set of belief points on which planning is performed (Hauskrecht, 2000; Poon, 2001; Roy & Gordon, 2003; Pineau, Gordon, & Thrun, 2003; Spaan & Vlassis, 2004), a possibility already mentioned by Lovejoy (1991). The idea is that instead of planning over the complete belief space of the agent (which is intractable for large state spac...

207 | Pegasus: A policy search method for large MDPs and POMDPs
- Ng, Jordan
- 2000
Citation Context: ...MDP solution techniques targets discrete action spaces; exceptions include the application of a particle filter to a continuous state and action space (Thrun, 2000) and certain policy search methods (Ng & Jordan, 2000; Baxter & Bartlett, 2001). We report on experiments in a domain in which an agent equipped with proximity sensors can move at a continuous heading and distance, and we present experimental results fr...

191 | A survey of partially observable Markov decision processes: Theory, models, and algorithms
- Monahan
- 1982
Citation Context: ...on. Another set of exact POMDP value iteration algorithms do not focus on searching in the belief space, but instead consider enumerating all possible vectors of HVn and then pruning useless vectors (Monahan, 1982; Cassandra, Littman, & Zhang, 1997). As an example of exact value iteration let us consider the most straightforward way of computing HVn due to Monahan (1982). This involves calculating all possible...

159 | Incremental Pruning: A Simple, Fast, Exact Method for Partially Observable Markov Decision Processes
- Cassandra, Littman, et al.
- 1997
Citation Context: ...of exact POMDP value iteration algorithms do not focus on searching in the belief space, but instead consider enumerating all possible vectors of HVn and then pruning useless vectors (Monahan, 1982; Cassandra, Littman, & Zhang, 1997). As an example of exact value iteration let us consider the most straightforward way of computing HVn due to Monahan (1982). This involves calculating all possible ways HVn could be constructed, exp...

153 | Infinite-horizon policy-gradient estimation
- Baxter, Bartlett
- 2001
Citation Context: ...ques targets discrete action spaces; exceptions include the application of a particle filter to a continuous state and action space (Thrun, 2000) and certain policy search methods (Ng & Jordan, 2000; Baxter & Bartlett, 2001). We report on experiments in a domain in which an agent equipped with proximity sensors can move at a continuous heading and distance, and we present experimental results from a navigation task invo...

111 | Approximate planning in large POMDPs via reusable trajectories
- Kearns, Mansour, et al.
- 2000
Citation Context: ...unded-size) stochastic finite state controllers by performing policy iteration steps. Other options for searching the policy space include gradient ascent (Meuleau, Kim, Kaelbling, & Cassandra, 1999; Kearns, Mansour, & Ng, 2000; Ng & Jordan, 2000; Baxter & Bartlett, 2001; Aberdeen & Baxter, 2002) and heuristic methods like stochastic local search (Braziunas & Boutilier, 2004). In particular, the Pegasus method (Ng & Jordan,...

99 | Monte Carlo POMDPs
- Thrun
- 2000
- 2000
Citation Context: ...ilt camera at any desired angle. Most work on POMDP solution techniques targets discrete action spaces; exceptions include the application of a particle filter to a continuous state and action space (Thrun, 2000) and certain policy search methods (Ng & Jordan, 2000; Baxter & Bartlett, 2001). We report on experiments in a domain in which an agent equipped with proximity sensors can move at a continuous headin...

98 | Heuristic search value iteration for POMDPs
- Smith, Simmons
- 2004
Citation Context: ...effect on the approximation accuracy as dictated by the bounds of Pineau et al. (2003). Furthermore, compared with other methods that build the set B based on various heuristics (Pineau et al., 2003; Smith & Simmons, 2004), our build-up of B is cheap as it only requires sampling random trajectories starting from b0. Moreover, duplicate entries in B will only affect the probability that a particular b will be sampled i...

94 | Solving POMDPs by Searching in Policy Space
- Hansen
- 1998
Citation Context: ...outines. An alternative to computing an (approximate) value function is policy search: these methods search for a good policy within a restricted class of controllers. For instance, policy iteration (Hansen, 1998b) and bounded policy iteration (BPI) (Poupart & Boutilier, 2004) search through the space of (bounded-size) stochastic finite state controllers by performing policy iteration steps. Other options for...

94 | Computationally feasible bounds for partially observed Markov decision processes
- Lovejoy
- 1991
Citation Context: ...decision processes (POMDPs) provide a rich mathematical framework for acting optimally in such partially observable and stochastic environments (Dynkin, 1965; Åström, 1965; Aoki, 1965; Sondik, 1971; Lovejoy, 1991; Kaelbling, Littman, & Cassandra, 1998). The POMDP defines a sensor model specifying the probability of observing a particular sensor reading in a specific state and a stochastic transition model whi...

82 | On the Undecidability of Probabilistic Planning and Infinite-horizon Partially Observable Markov Decision Problems
- Madani, Hanks, et al.
- 1999
Citation Context: ...ew value function is an upper bound to the previous one over the complete belief simplex. In general, computing exact solutions for POMDPs is an intractable problem (Papadimitriou & Tsitsiklis, 1987; Madani, Hanks, & Condon, 1999), calling for approximate solution techniques (Lovejoy, 1991; Hauskrecht, 2000). In practical tasks one would like to compute solutions only for those parts of the belief simplex that are reachable, ...

78 | Bounded finite state controllers
- Poupart, Boutilier
- 2003
Citation Context: ...e) value function is policy search: these methods search for a good policy within a restricted class of controllers. For instance, policy iteration (Hansen, 1998b) and bounded policy iteration (BPI) (Poupart & Boutilier, 2004) search through the space of (bounded-size) stochastic finite state controllers by performing policy iteration steps. Other options for searching the policy space include gradient ascent (Meuleau, Ki...

74 | Algorithms for Partially Observable Markov Decision Processes
- Cheng
- 1988
Citation Context: ...ed vectors will be useless in the sense that their maximizing region is empty, but identifying and subsequently pruning them is an expensive operation. Exact value iteration algorithms (Sondik, 1971; Cheng, 1988; Kaelbling et al., 1998) search in each value iteration step the complete belief simplex for a minimal set of belief points that generate the necessary set of vectors for the next horizon value funct...

73 | Solving Transition Independent Decentralized Markov Decision Processes
- Becker, Zilberstein, et al.
- 2004
Citation Context: ...& Parr, 2001; Theocharous, Murphy, & Kaelbling, 2004), as well as applying Perseus to cooperative multiagent domains, extending recent approaches (Emery-Montemerlo, Gordon, Schneider, & Thrun, 2004; Becker, Zilberstein, Lesser, & Goldman, 2004; Paquet, Tobin, & Chaib-draa, 2005). Acknowledgments We would like to thank Bruno Scherrer, Geoff Gordon, Pascal Poupart, and the anonymous reviewers for their comments. This research is supported by...

72 | Approximate Solutions for Partially Observable Stochastic Games with Common Payoffs
- Emery-Montemerlo, Gordon, et al.
- 2004
Citation Context: ...ternative compact representations (Guestrin, Koller, & Parr, 2001; Theocharous, Murphy, & Kaelbling, 2004), as well as applying Perseus to cooperative multiagent domains, extending recent approaches (Emery-Montemerlo, Gordon, Schneider, & Thrun, 2004; Becker, Zilberstein, Lesser, & Goldman, 2004; Paquet, Tobin, & Chaib-draa, 2005). Acknowledgments We would like to thank Bruno Scherrer, Geoff Gordon, Pascal Poupart, and the anonymous reviewers for...

71 | Value-Directed Compression of POMDPs
- Poupart, Boutilier
- 2002
Citation Context: ...d, one can compute an approximate solution for the original POMDP using, e.g., MDP value iteration. Alternatively linear compression techniques can be used which preserve the shape of the value function (Poupart & Boutilier, 2003). Such a property is desirable as it allows one to exploit the existing POMDP machinery. For instance, linear compression has been applied as a preprocessing step for BPI (Poupart & Boutilier, 2005)...

66 | Exponential family PCA for belief compression in POMDPs - Roy, Gordon - 2003

63 | Exploiting Structure to Efficiently Solve Large Scale Partially Observable Markov Decision Processes
- Poupart
- 2005
Citation Context: ...s desirable as it allows one to exploit the existing POMDP machinery. For instance, linear compression has been applied as a preprocessing step for BPI (Poupart & Boutilier, 2005) as well as Perseus (Poupart, 2005). The literature on POMDPs with continuous actions is still relatively sparse (Thrun, 2000; Ng & Jordan, 2000; Baxter & Bartlett, 2001). Thrun (2000) applies real-time dynamic programming on a POMDP...

63 | Finding Approximate POMDP Solutions Through Belief Compression - Roy, Gordon, et al. - 2005

62 | A decision-theoretic approach to task assistance for persons with dementia. IJCAI-05
- Boger, Poupart, et al.
- 2005
Citation Context: ...vision. We analyzed a number of different action sampling schemes and compared with discretized action spaces. Perseus has been recently extended to deal with structured state spaces (Poupart, 2005; Boger, Poupart, Hoey, Boutilier, Fernie, & Mihailidis, 2005), continuous observation spaces (Hoey & Poupart, 2005), and continuous state spaces (Porta, Spaan, & Vlassis, 2005). As future work we would like to explore alternative compact representations (Guest...

58 | Solving POMDPs by searching the space of finite policies
- Meuleau, Kim, et al.
- 1999
Citation Context: ...ilier, 2004) search through the space of (bounded-size) stochastic finite state controllers by performing policy iteration steps. Other options for searching the policy space include gradient ascent (Meuleau, Kim, Kaelbling, & Cassandra, 1999; Kearns, Mansour, & Ng, 2000; Ng & Jordan, 2000; Baxter & Bartlett, 2001; Aberdeen & Baxter, 2002) and heuristic methods like stochastic local search (Braziunas & Boutilier, 2004). In particular, the...

55 | A heuristic variable grid solution method for POMDPs
- Brafman
- 1997
Citation Context: ...gathering is necessary. One way to sidestep the intractability of exact POMDP value iteration is to grid the belief simplex, using either a fixed grid (Lovejoy, 1991; Bonet, 2002) or a variable grid (Brafman, 1997; Zhou & Hansen, 2001). Value backups are performed for every grid point, but only the value of each grid point is preserved and the gradient is ignored. The value of non-grid points is defined by an...

52 | Policy Search by Dynamic Programming
- Bagnell, Kakade, et al.
- 2003
Citation Context: ...sampling multiple sets B. We note that similar ‘off-policy’ learning using a fixed set of sampled states has also been adopted by other recent algorithms like LSPI (Lagoudakis & Parr, 2003) and PSDP (Bagnell, Kakade, Ng, & Schneider, 2004). The backups of Perseus on a fixed set B can be viewed as a particular instance of asynchronous dynamic programming (Bertsekas & Tsitsiklis, 1989). In asynchronous dynamic programming algorithms no...

52 | Speeding Up the Convergence Of Value Iteration in Partially Observable Markov Decision Processes
- Zhang, Zhang
- 2001
Citation Context: ...set B of reachable belief points, which remains fixed throughout the complete algorithm. We initialize the value function V0 as a single vector with all its components equal to (1/(1−γ)) min_{s,a} r(s, a) (Zhang & Zhang, 2001). Starting with V0, Perseus performs a number of backup stages until some convergence criterion is met. Each backup stage is defined as follows (where B̃ is an auxiliary set containing the non-impro...
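The initialization quoted here, a single vector with every component equal to (1/(1−γ)) min_{s,a} r(s, a), is a one-liner to compute; a sketch assuming the rewards are stored in an |S|×|A| array `R`:

```python
import numpy as np

def initial_value_function(R, gamma):
    """Single alpha-vector lower bound on V*: each component is min_{s,a} r(s,a) / (1 - gamma)."""
    n_states = R.shape[0]
    return [np.full(n_states, R.min() / (1.0 - gamma))]
```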

41 | Optimal control of Markov processes with incomplete state information - Åström - 1965

40 | VDCBPI: an Approximate Scalable Algorithm for Large Scale POMDPs
- Poupart, Boutilier
- 2005
Citation Context: ...(Poupart & Boutilier, 2003). Such a property is desirable as it allows one to exploit the existing POMDP machinery. For instance, linear compression has been applied as a preprocessing step for BPI (Poupart & Boutilier, 2005) as well as Perseus (Poupart, 2005). The literature on POMDPs with continuous actions is still relatively sparse (Thrun, 2000; Ng & Jordan, 2000; Baxter & Bartlett, 2001). Thrun (2000) applies real-t...

37 | An Online POMDP Algorithm for Complex Multiagent Environments
- Paquet, Bernier, et al.
- 2005
Citation Context: ...g, 2004), as well as applying Perseus to cooperative multiagent domains, extending recent approaches (Emery-Montemerlo, Gordon, Schneider, & Thrun, 2004; Becker, Zilberstein, Lesser, & Goldman, 2004; Paquet, Tobin, & Chaib-draa, 2005). Acknowledgments We would like to thank Bruno Scherrer, Geoff Gordon, Pascal Poupart, and the anonymous reviewers for their comments. This research is supported by PROGRESS, the embedded systems res...

36 | Solving POMDPs with continuous or large discrete observation spaces
- Hoey, Poupart
- 2005
Citation Context: ...zed action spaces. Perseus has been recently extended to deal with structured state spaces (Poupart, 2005; Boger, Poupart, Hoey, Boutilier, Fernie, & Mihailidis, 2005), continuous observation spaces (Hoey & Poupart, 2005), and continuous state spaces (Porta, Spaan, & Vlassis, 2005). As future work we would like to explore alternative compact representations (Guestrin, Koller, & Parr, 2001; Theocharous, Murphy, & Kael...

33 | Finite-memory control of partially observable systems
- Hansen
- 1998
Citation Context: ...outines. An alternative to computing an (approximate) value function is policy search: these methods search for a good policy within a restricted class of controllers. For instance, policy iteration (Hansen, 1998b) and bounded policy iteration (BPI) (Poupart & Boutilier, 2004) search through the space of (bounded-size) stochastic finite state controllers by performing policy iteration steps. Other options for...

32 | A point-based POMDP algorithm for robot planning
- Spaan, Vlassis
- 2004
Citation Context: ...proximate solution techniques which focus on the use of a sampled set of belief points on which planning is performed (Hauskrecht, 2000; Poon, 2001; Roy & Gordon, 2003; Pineau, Gordon, & Thrun, 2003; Spaan & Vlassis, 2004), a possibility already mentioned by Lovejoy (1991). The idea is that instead of planning over the complete belief space of the agent (which is intractable for large state spaces), planning is carrie...

31 | A fast heuristic algorithm for decision-theoretic planning
- Poon
- 2001
Citation Context: ...As we ensure that the value of a particular belief point never decreases, we are guaranteed that Perseus will converge: the proof only requires observing that every added vector is always below V∗ (Poon, 2001; Vlassis & Spaan, 2004). Moreover, as we explained above, Perseus can handle large belief sets B, thus obviating the use of dynamic belief point selection strategies like those proposed by Hauskrecht...

29 | Stochastic local search for POMDP controllers
- Braziunas, Boutilier
- 2004
Citation Context: ...cent (Meuleau, Kim, Kaelbling, & Cassandra, 1999; Kearns, Mansour, & Ng, 2000; Ng & Jordan, 2000; Baxter & Bartlett, 2001; Aberdeen & Baxter, 2002) and heuristic methods like stochastic local search (Braziunas & Boutilier, 2004). In particular, the Pegasus method (Ng & Jordan, 2000) estimates the value of a policy by simulating a (bounded) number of trajectories from the POMDP using a fixed random seed, and then takes steps...

28 | Representing hierarchical POMDPs as DBNs for multi-scale robot localization
- Theocharous, Murphy, et al.
- 2004
Citation Context: ...paces (Hoey & Poupart, 2005), and continuous state spaces (Porta, Spaan, & Vlassis, 2005). As future work we would like to explore alternative compact representations (Guestrin, Koller, & Parr, 2001; Theocharous, Murphy, & Kaelbling, 2004), as well as applying Perseus to cooperative multiagent domains, extending recent approaches (Emery-Montemerlo, Gordon, Schneider, & Thrun, 2004; Becker, Zilberstein, Lesser, & Goldman, 2004; Paquet,...

27 | Generalized prioritized sweeping
- Andre, Friedman, et al.
- 1998
Citation Context: ...y. This allows an algorithm to focus on backups which may have a high potential impact, as for instance in the prioritized sweeping algorithm for solving fully observable MDPs (Moore & Atkeson, 1993; Andre, Friedman, & Parr, 1998). A drawback is that the notion of an exact planning horizon is somewhat lost: in general, after performing n backup stages the computed plan will not be considering n steps into the future, but less...

26 | Solving factored POMDPs with linear value functions
- Guestrin, Koller, et al.
- 2001
Citation Context: ...2005), continuous observation spaces (Hoey & Poupart, 2005), and continuous state spaces (Porta, Spaan, & Vlassis, 2005). As future work we would like to explore alternative compact representations (Guestrin, Koller, & Parr, 2001; Theocharous, Murphy, & Kaelbling, 2004), as well as applying Perseus to cooperative multiagent domains, extending recent approaches (Emery-Montemerlo, Gordon, Schneider, & Thrun, 2004; Becker, Zilbe...

24 | Generalized Markov decision processes: Dynamic-programming and reinforcement-learning algorithms
- Szepesvari, Littman
- 1996
Citation Context: ...nd one has to resort to sampling-based techniques. The idea here is to replace the full maximization over actions with a sampled max operator that performs the maximization over a random subset of A (Szepesvári & Littman, 1996). This also means that one has to compute the above models ‘on the fly’ for each sampled action, which requires an algorithm (a parameterized model family) that can generate all needed models for any...
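The sampled max operator mentioned in this excerpt replaces a maximization over all of A with one over a random subset; a minimal sketch, where the action-value function `Q` is an assumed callable rather than anything from the paper:

```python
import random

def sampled_max(Q, actions, k, rng=random):
    """Approximate argmax_a Q(a) by maximizing over k actions drawn uniformly from A."""
    subset = rng.sample(actions, min(k, len(actions)))
    return max(subset, key=Q)
```

With k equal to |A| this reduces to the exact maximization; smaller k trades accuracy for speed, which is what makes continuous or very large action sets tractable.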

18 | Scaling Internal-State Policy-Gradient Methods for POMDPs
- Aberdeen, Baxter
- 2002
Citation Context: ...ation steps. Other options for searching the policy space include gradient ascent (Meuleau, Kim, Kaelbling, & Cassandra, 1999; Kearns, Mansour, & Ng, 2000; Ng & Jordan, 2000; Baxter & Bartlett, 2001; Aberdeen & Baxter, 2002) and heuristic methods like stochastic local search (Braziunas & Boutilier, 2004). In particular, the Pegasus method (Ng & Jordan, 2000) estimates the value of a policy by simulating a (bounded) numb...

17 | An epsilon-Optimal Grid-Based Algorithm for Partially Observable Markov Decision Processes
- Bonet
- 2002
Citation Context: ...mains where repeated information gathering is necessary. One way to sidestep the intractability of exact POMDP value iteration is to grid the belief simplex, using either a fixed grid (Lovejoy, 1991; Bonet, 2002) or a variable grid (Brafman, 1997; Zhou & Hansen, 2001). Value backups are performed for every grid point, but only the value of each grid point is preserved and the gradient is ignored. The value o...

16 | Robot planning in partially observable continuous domains
- Porta, Spaan, et al.
- 2003
Citation Context: ...d to deal with structured state spaces (Poupart, 2005; Boger, Poupart, Hoey, Boutilier, Fernie, & Mihailidis, 2005), continuous observation spaces (Hoey & Poupart, 2005), and continuous state spaces (Porta, Spaan, & Vlassis, 2005). As future work we would like to explore alternative compact representations (Guestrin, Koller, & Parr, 2001; Theocharous, Murphy, & Kaelbling, 2004), as well as applying Perseus to cooperative mult...

14 | A fast point-based algorithm for POMDPs
- Vlassis, Spaan
- 2004
Citation Context: ...ction by updating the value and its gradient only at the points of a (dynamically growing) belief set. In this work we describe Perseus, a randomized point-based value iteration algorithm for POMDPs (Vlassis & Spaan, 2004; Spaan & Vlassis, 2004). Perseus operates on a large set of beliefs which are gathered by simulating random interactions of the agent with the POMDP environment. On this belief set a number of value...

13 | Value function approximations for partially observable Markov decision processes
- Hauskrecht
Citation Context: ...general, computing exact solutions for POMDPs is an intractable problem (Papadimitriou & Tsitsiklis, 1987; Madani, Hanks, & Condon, 1999), calling for approximate solution techniques (Lovejoy, 1991; Hauskrecht, 2000). In practical tasks one would like to compute solutions only for those parts of the belief simplex that are reachable, i.e., that can be actually encountered by interacting with the environment. Thi...