## Online Planning in Continuous POMDPs with Open-Loop Information-Gathering Plans

### BibTeX

```bibtex
@misc{Hauser_onlineplanning,
  author = {Kris Hauser},
  title  = {Online Planning in Continuous POMDPs with Open-Loop Information-Gathering Plans},
  year   = {}
}
```

### Abstract

This paper studies the convergence properties of a receding-horizon information-gathering strategy used in the recently presented RBSR planner for continuous POMDPs. The planner uses a combination of randomized exploration, particle filtering, and goal-seeking heuristic policies to achieve scalability to high-dimensional continuous spaces. We show that convergence is ensured in a subclass of problems where the rate of information gain exceeds the rate of information loss through process noise. Because these rates are not defined myopically, RBSR is able to perform long open-loop information-gathering plans. The technique is demonstrated on a variety of discrete planning benchmarks as well as target-finding and localization problems in up to 7D continuous state spaces.
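The receding-horizon strategy the abstract describes (plan against the current belief, commit only to the first action, fold the observation back into the belief, repeat) can be sketched as below. This is a minimal illustration, not the paper's actual API; every function argument is a hypothetical placeholder.

```python
def receding_horizon_control(belief, plan_open_loop, execute, update, done,
                             max_steps=100):
    """Receding-horizon (replanning) control loop:
    1) plan an open-loop action sequence against the current belief,
    2) execute only the first action of that plan,
    3) update the belief with the resulting observation, and repeat."""
    for _ in range(max_steps):
        if done(belief):
            break
        actions = plan_open_loop(belief)   # candidate open-loop plan
        obs = execute(actions[0])          # commit to the first action only
        belief = update(belief, actions[0], obs)
    return belief
```

A toy scalar instance: with `belief = 0`, a planner that always proposes `[1]`, an `execute` that echoes the action as the observation, an `update` that accumulates observations, and `done = lambda b: b >= 5`, the loop terminates with a final belief of 5.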

### Citations

890 | Probabilistic roadmaps for path planning in high-dimensional configuration spaces
- Kavraki, Svestka, et al.
- 1996
Citation Context: ...optimality for reduction in computational expense. Several recently developed algorithms attempt to address continuous spaces by leveraging the success of probabilistic roadmaps (PRMs) in motion planning [11], which build a network of states sampled at random from the configuration space. Alterovitz et al (2007) present a Stochastic Motion... [Fig. 1 caption: An execution trace of a robot (large circle) searching for...]

657 | On sequential Monte Carlo sampling methods for Bayesian filtering
- Doucet, Godsill, et al.
- 2000
Citation Context: ...the following scalable implementation [7]: 1) Belief states are represented as a weighted set of state hypotheses {(w_i, s_i)}_{i=1}^n and the update in (2) is computed using particle filtering techniques [4]. 2) For open-loop exploration RBSR uses a Voronoi exploration strategy to distribute N belief states sparsely. 3) The MDP value function (5) is approximated by sampling the state space in precomputat...
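The weighted-particle belief update this excerpt refers to can be sketched as follows. This is a generic particle-filter step under the usual propagate/reweight/resample scheme, not RBSR's actual implementation; the function names and arguments are illustrative placeholders.

```python
import math
import random

def particle_filter_update(particles, action, observation,
                           transition_sample, obs_likelihood):
    """One Bayes update of a belief {(w_i, s_i)}: propagate each particle
    through a sample of the (noisy) process model, reweight by the sensor
    model p(o | s'), normalize, then resample to fight weight degeneracy."""
    # 1) Propagate each hypothesis through the process model.
    propagated = [(w, transition_sample(s, action)) for w, s in particles]
    # 2) Reweight by the observation likelihood and normalize.
    weights = [w * obs_likelihood(observation, s) for w, s in propagated]
    total = sum(weights)
    weights = [w / total for w in weights]
    # 3) Resample; weights reset to uniform.
    states = random.choices([s for _, s in propagated], weights=weights,
                            k=len(particles))
    return [(1.0 / len(states), s) for s in states]
```

In a toy 1-D localization, a hypothesis far from the observation receives near-zero likelihood and is eliminated by resampling, collapsing a bimodal belief onto the consistent mode.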

245 | Point-based value iteration: An anytime algorithm for pomdps
- Pineau, Gordon, et al.
- 2003
Citation Context: ...spaces [17]. Approximate planning in discrete spaces is a field of active research, yielding several techniques based on the point-based algorithms devised by Kearns et al [12] and Pineau et al (2003) [18]. For example, the SARSOP algorithm developed by Kurniawati et al (2008) has solved problems with thousands of discrete states in seconds [14]. One approach to continuous problems is to discretize sta...

232 | Learning Policies for Partially Observable Environments: Scaling Up
- Littman, Cassandra, et al.
- 1995
Citation Context: ...space of policies that consist of open-loop actions followed by the closed-loop QMDP policy (Figure 1). These components are complementary; QMDP provides excellent goal guidance under low uncertainty [16], while the open-loop exploration enables active sensing actions to be taken as long as they aid progress toward the goal. Though the policy space is relatively limited, the system exhibits rich behav...
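The QMDP heuristic this excerpt mentions scores each action by the belief-weighted, fully observable Q-value, sum_i w_i Q(s_i, a), and so assumes all uncertainty vanishes after one step; it therefore never takes actions purely to gain information. A minimal sketch, with a toy example whose states and Q-function are illustrative assumptions:

```python
def qmdp_action(particles, actions, q_mdp):
    """QMDP action selection: pick the action maximizing the
    belief-weighted fully observable Q-value, sum_i w_i * Q(s_i, a)."""
    return max(actions, key=lambda a: sum(w * q_mdp(s, a)
                                          for w, s in particles))

# Toy 1-D example (illustrative only): states are positions, the goal is
# the origin, and actions are displacements added to the state.
belief = [(0.7, -2.0), (0.3, 2.0)]   # bimodal belief, mostly on the left
q = lambda s, a: -abs(s + a)         # ending nearer the origin is better
best = qmdp_action(belief, [-1.0, 0.0, 1.0], q)
```

Here moving right scores 0.7·(−1) + 0.3·(−3) = −1.6, the best of the three actions, so QMDP drives toward the goal under the dominant hypothesis without attempting to disambiguate the two modes.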

228 | Rapidly-exploring random trees: Progress and prospects
- Lavalle, Kuffner
- 2000
Citation Context: [Fig. 1 caption: ...in order to localize its x coordinate (left room).] A. Voronoi-Biased Exploration Strategy. The Voronoi-biased exploration strategy is much like the Rapidly-Exploring Random Tree (RRT) motion planner [15] and is designed to cover the space of reachable open-loop motions quickly. To expand the tree, we sample a random target point s_tgt from the state space S, and sample a set of representative particle...
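One Voronoi-biased expansion step of the RRT style this excerpt describes can be sketched as below. Sampling a uniform random target and extending from its nearest tree node means nodes with large Voronoi regions are selected more often, biasing growth toward unexplored space. The helper names and the 2-D point instance are illustrative assumptions, not the planner's actual code.

```python
import math
import random

def voronoi_biased_expand(tree, sample_state, distance, extend):
    """One RRT-style expansion: sample a random target, find the nearest
    tree node, extend from it toward the target, and add the new node."""
    target = sample_state()
    nearest = min(tree, key=lambda node: distance(node, target))
    new_node = extend(nearest, target)
    tree.append(new_node)
    return new_node

# Toy 2-D instance: nodes are points; extension takes a bounded step.
def step_toward(a, b, step=0.5):
    d = math.hypot(b[0] - a[0], b[1] - a[1])
    t = min(1.0, step / d) if d > 0 else 0.0
    return (a[0] + t * (b[0] - a[0]), a[1] + t * (b[1] - a[1]))

random.seed(1)
dist = lambda a, b: math.hypot(b[0] - a[0], b[1] - a[1])
tree = [(0.0, 0.0)]
for _ in range(50):
    voronoi_biased_expand(
        tree, lambda: (random.uniform(-5, 5), random.uniform(-5, 5)),
        dist, step_toward)
```

After 50 expansions the tree holds 51 nodes spread through the sampling region, each reached by steps of at most 0.5 from its parent.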

111 | Approximate planning in large POMDPs via reusable trajectories
- Kearns, Mansour, et al.
- 2000
Citation Context: ...even for small discrete state spaces [17]. Approximate planning in discrete spaces is a field of active research, yielding several techniques based on the point-based algorithms devised by Kearns et al [12] and Pineau et al (2003) [18]. For example, the SARSOP algorithm developed by Kurniawati et al (2008) has solved problems with thousands of discrete states in seconds [14]. One approach to continuous ...

107 | Point-based POMDP Algorithms: Improved Analysis and Implementation
- Smith, Simmons
- 2005
Citation Context: ...tracking over long horizons. VI. EXPERIMENTAL RESULTS. A. Discrete POMDP Benchmarks. We evaluated RBSR against QMDP [16] and several modern offline, point-based POMDP solvers: PBVI [18], HSVI [23], HSVI2 [24], and SARSOP [14]. The purpose of this comparison is not to treat RBSR in competition with these techniques but rather to explore how the heuristic information gathering... [Table header: Problem, Planner, Return, Time (s)]

99 | Monte Carlo POMDPs
- Thrun
- 2000
Citation Context: ...states as particles or mixtures of Gaussians [20]. Thrun (2000) presented a technique that also works with continuous spaces by combining particle filtering with reinforcement learning on belief states [25]. For both of these methods, the need to approximate the value function over the infinite-dimensional belief space (either using alpha-vector or Q-value representations, respectively) comes at a high c...

98 | Heuristic search value iteration for POMDPs
- Smith, Simmons
- 2004
Citation Context: ...belief state tracking over long horizons. VI. EXPERIMENTAL RESULTS. A. Discrete POMDP Benchmarks. We evaluated RBSR against QMDP [16] and several modern offline, point-based POMDP solvers: PBVI [18], HSVI [23], HSVI2 [24], and SARSOP [14]. The purpose of this comparison is not to treat RBSR in competition with these techniques but rather to explore how the heuristic information gathering... [Table header: Problem, Planner, Return, Time (s)]

87 | SARSOP: Efficient Point-based POMDP Planning by Approximating Optimally Reachable Belief Spaces
- Kurniawati, Hsu, et al.
- 2008
Citation Context: ...algorithms devised by Kearns et al [12] and Pineau et al (2003) [18]. For example, the SARSOP algorithm developed by Kurniawati et al (2008) has solved problems with thousands of discrete states in seconds [14]. One approach to continuous problems is to discretize state/action/observation spaces. But because of the “curse of dimensionality”, any regular discretization of a high-dimensional space will require...

77 | The computational complexity of probabilistic planning
- Littman, Goldsmith, et al.
- 1998
Citation Context: ...up to 7D state spaces. II. RELATED WORK. Optimal planning in partially-observable problems is extremely computationally complex and is generally considered intractable even for small discrete state spaces [17]. Approximate planning in discrete spaces is a field of active research, yielding several techniques based on the point-based algorithms devised by Kearns et al [12] and Pineau et al (2003) [18]. For ...

70 | Rollout Algorithms for Stochastic Scheduling Problems
- Bertsekas, Castanon
- 1999
Citation Context: ...observations received. Recently we discovered a similarity between RBSR and the “Rollout” approach of Bertsekas and Castañon (1999) presented in the context of a sequential stochastic decision problem [2]. The major difference is that RBSR uses a Monte-Carlo search, biased by a Voronoi distance heuristic, to generate a much deeper rollout search tree. Our analysis applies to much more general POMDPs a...

69 | Online Planning Algorithms for POMDPs
- Ross, Pineau, et al.
- 2008
Citation Context: ...heuristics that perform well in wide subclasses of POMDPs. In prior work [7] we presented a Randomized Belief-Space Replanning (RBSR) technique that addressed continuous POMDPs using an online approach [22], where each step maximizes expected return over a reduced space of policies that consist of open-loop actions followed by the closed-loop QMDP policy (Figure 1). These components are complementary; QM...

46 | The Stochastic Motion Roadmap: A sampling framework for planning with Markov motion uncertainty
- Alterovitz, Simeon, et al.
- 2007
Citation Context: [Fig. 1 caption: ...and the current plan (orange) is updated by replanning.] ...Roadmap planner for continuous spaces with motion uncertainty, which solves an MDP using the discretization of state space induced by a PRM [1]. The techniques of Burns and Brock (2007) and Guibas et al (2008) augment roadmaps with edge costs for motions that have high probability of being in collision, and respectively address the problems ...

35 | Point-based value iteration for continuous pomdps
- Porta, Vlassis, et al.
- 2006
Citation Context: ...ably large number of states. Porta et al (2006) have made progress in extending point-based value iteration to the continuous setting by representing belief states as particles or mixtures of Gaussians [20]. Thrun (2000) presented a technique that also works with continuous spaces by combining particle filtering with reinforcement learning on belief states [25]. For both of these methods, the need to ap...

32 | The Belief Roadmap: Efficient Planning in Belief Space by Factoring the Covariance
- Prentice, Roy
- 2009
Citation Context: ...of Prentice and Roy (2009) computes a roadmap of belief states under both motion and sensing uncertainty, under the assumptions of Gaussian uncertainty and linear transition and observation functions [21]. van den Berg et al (2010) consider path planning while optimizing the likelihood that a path is collision-free, under the assumption that a Linear-Quadratic-Gaussian feedback controller is used to fo...

26 | Belief space planning assuming maximum likelihood observations
- Platt, Tedrake, et al.
- 2010
Citation Context: ...used to follow the path. Platt et al (2010) and du Toit and Burdick (2010) construct plans using a maximum-likelihood observation assumption, correcting for observation errors by replanning [5], [19]. RBSR also uses a replanning strategy, but uses a particle-based uncertainty representation that is better at handling nonlinear and multimodal distributions, and makes no assumptions on the type of ...

23 | Planning under uncertainty with macro-actions
- He, Brunskill, et al.
- 2010
Citation Context: ...information-gathering macro-actions on the fly. Two other recent works have also tackled the problem of constructing macro-actions automatically and with increasing granularity during forward planning [8], [13]. These approaches are limited to macro-actions that reach subgoal states, and we suspect that RBSR constructs better information-gathering macro-actions using belief space criteria; on the other...

19 | Motion planning under uncertainty for robotic tasks with long time horizons
- Kurniawati, Du, et al.
- 2011
Citation Context: ...information-gathering macro-actions on the fly. Two other recent works have also tackled the problem of constructing macro-actions automatically and with increasing granularity during forward planning [8], [13]. These approaches are limited to macro-actions that reach subgoal states, and we suspect that RBSR constructs better information-gathering macro-actions using belief space criteria; on the other hand ...

17 | Sampling-based motion planning with sensing uncertainty
- Burns, Brock
- 2007
Citation Context: ...Guibas et al (2008) augment roadmaps with edge costs for motions that have high probability of being in collision, and respectively address the problems of localization errors and environment sensing errors [3], [6]. Huang and Gupta (2009) address planning for manipulators under base uncertainty by associating probabilistic roadmaps with particles representing state hypotheses and searching for a short path...

16 | Robotic motion planning in dynamic, cluttered, uncertain environments
- du Toit, Burdick
- 2010
Citation Context: ...controller is used to follow the path. Platt et al (2010) and du Toit and Burdick (2010) construct plans using a maximum-likelihood observation assumption, correcting for observation errors by replanning [5], [19]. RBSR also uses a replanning strategy, but uses a particle-based uncertainty representation that is better at handling nonlinear and multimodal distributions, and makes no assumptions on the ty...

11 | Bounded uncertainty roadmaps for path planning
- Guibas, Hsu, et al.
- 2008
Citation Context: ...Guibas et al (2008) augment roadmaps with edge costs for motions that have high probability of being in collision, and respectively address the problems of localization errors and environment sensing errors [3], [6]. Huang and Gupta (2009) address planning for manipulators under base uncertainty by associating probabilistic roadmaps with particles representing state hypotheses and searching for a short path that...

10 | Randomized belief-space replanning in partially-observable continuous spaces
- Hauser
- 2011
Citation Context: ...based on passive sensing or explicit information-gathering rewards. Therefore it is highly valuable to identify scalable, general heuristics that perform well in wide subclasses of POMDPs. In prior work [7] we presented a Randomized Belief-Space Replanning (RBSR) technique that addressed continuous POMDPs using an online approach [22], where each step maximizes expected return over a reduced space of poli...

8 | Robust belief-based execution of manipulation programs
- Hsiao, Kaelbling, et al.
- 2008
Citation Context: ...POMDPs by reducing exploration breadth and depth [16]. Hsiao et al. (2008) addressed a robot grasping problem using specially constructed macro-actions that either provide information or seek the goal [9]. They demonstrate that if uncertainty grows slowly during information-gathering, then forward planning can be limited to depth one. RBSR can also be interpreted as depth-one forward planning, using t...

7 | Collision-probability constrained PRM for a manipulator with base pose uncertainty
- Huang, Gupta
- 2009
Citation Context: ...planning for manipulators under base uncertainty by associating probabilistic roadmaps with particles representing state hypotheses and searching for a short path that is likely to be collision-free [10]. Another set of related approaches use assumptions of Gaussian observation and process noise, which makes planning much faster because probabilistic inference can be performed in closed form. The Bel...