## Acting under Uncertainty: Discrete Bayesian Models for Mobile-Robot Navigation (1996)

Venue: Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems

Citations: 190 (12 self)

### BibTeX

```bibtex
@INPROCEEDINGS{Cassandra96actingunder,
  author    = {A. Cassandra and L. Kaelbling and J. Kurien},
  title     = {Acting under Uncertainty: Discrete Bayesian Models for Mobile-Robot Navigation},
  booktitle = {Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems},
  year      = {1996},
  pages     = {963--972}
}
```

### Abstract

Discrete Bayesian models have been used to model uncertainty for mobile-robot navigation, but the question of how actions should be chosen remains largely unexplored. This paper presents the optimal solution to the problem, formulated as a partially observable Markov decision process (POMDP). Since solving for the optimal control policy is intractable in general, the paper goes on to explore a variety of heuristic control strategies. The control strategies are compared experimentally, both in simulation and in runs on a real robot.

### Citations

632 |
Markov Decision Processes
- Puterman
- 1994
Citation Context: ...s ∈ S, V_π(s) ≥ V_π′(s). A policy is optimal if it is not dominated by any other policy. Given a Markov decision process and a value for γ, it is possible to compute the optimal policy fairly efficiently [16]. We shall use π(s) to refer to an optimal policy for an mdp, and express the optimal value and Q functions as V(s) and Q(s, a). 3.2 Adding Partial Observability When the state is not completely o...
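
The context above notes that, given a discount factor γ, an optimal MDP policy can be computed fairly efficiently. A minimal value-iteration sketch over a hypothetical two-state, two-action MDP (all transition and reward numbers are invented for illustration):

```python
# Value iteration for a toy 2-state, 2-action MDP (all numbers hypothetical).
# T[s][a] lists (next_state, probability); R[s][a] is the immediate reward.
T = {0: {0: [(0, 0.9), (1, 0.1)], 1: [(1, 0.8), (0, 0.2)]},
     1: {0: [(0, 0.5), (1, 0.5)], 1: [(1, 1.0)]}}
R = {0: {0: 0.0, 1: 1.0}, 1: {0: 0.0, 1: 2.0}}
gamma = 0.9  # the discount factor the context calls "a value for gamma"

def value_iteration(T, R, gamma, eps=1e-6):
    """Return (V, policy): optimal value function and greedy policy."""
    V = {s: 0.0 for s in T}
    while True:
        Q = {s: {a: R[s][a] + gamma * sum(p * V[s2] for s2, p in T[s][a])
                 for a in T[s]} for s in T}
        V_new = {s: max(Q[s].values()) for s in T}
        if max(abs(V_new[s] - V[s]) for s in T) < eps:
            return V_new, {s: max(Q[s], key=Q[s].get) for s in T}
        V = V_new

V, policy = value_iteration(T, R, gamma)  # here action 1 is optimal in both states
```

The fixed point gives V(s) and Q(s, a) directly, matching the notation in the excerpt.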

598 |
Use of the Hough transformation to detect lines and curves in pictures
- Duda, Hart
- 1972
Citation Context: ...d axis is estimated, a simple search for doors, openings and walls in the robot's vicinity is made to the left, front and right of the robot. If there is a rough line segment (using a Hough transform [4]) in the occupancy grid that is parallel or perpendicular to the estimated axis, then a wall is observed. Similarly, if the occupancy grid is largely clear along an axis, an open observation results. A...
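
As a rough illustration of the line test described above, a minimal Hough accumulator over a toy binary occupancy grid (the grid size, bin count, and peak-picking rule are invented for illustration, not the paper's implementation):

```python
import math

def hough_peak(grid, n_theta=18):
    """Vote each occupied cell of a binary occupancy grid into (theta, rho)
    bins and return the bin with the most votes (a candidate line segment)."""
    votes = {}
    for y, row in enumerate(grid):
        for x, occ in enumerate(row):
            if not occ:
                continue
            for t in range(n_theta):
                theta = math.pi * t / n_theta
                rho = round(x * math.cos(theta) + y * math.sin(theta))
                votes[(t, rho)] = votes.get((t, rho), 0) + 1
    return max(votes, key=votes.get)

# A vertical "wall" at x == 2 in a 5x5 grid peaks at theta bin 0, rho == 2.
grid = [[1 if x == 2 else 0 for x in range(5)] for _ in range(5)]
```

Checking whether the peak's theta is parallel or perpendicular to the estimated corridor axis then decides between a wall and an open observation.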

428 |
High resolution maps from wide angle sonar
- Moravec, Elfes
- 1985
Citation Context: ... grid, which is the basis of high-level feature detection. Occupancy Grid The pilot fuses ultrasonic measurements in an occupancy grid using a simplified variant of the algorithm of Moravec and Elfes [12]. Since the pilot is responsible solely for local navigation, the occupancy grid only maps features whose distance from the robot is within a small range. As the robot moves forward, the occupancy gri...
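
The fusion step described above can be sketched as a per-cell Bayesian log-odds update; the inverse-sensor probability below is an invented placeholder, not the Moravec and Elfes model itself:

```python
import math

def update_cell(log_odds, p_occ):
    """Fuse one range reading into a cell's occupancy log-odds.  p_occ is the
    (assumed) probability the cell is occupied given the reading; this
    placeholder stands in for a real inverse sensor model."""
    return log_odds + math.log(p_occ / (1.0 - p_occ))

def prob(log_odds):
    """Convert log-odds back to a probability of occupancy."""
    return 1.0 - 1.0 / (1.0 + math.exp(log_odds))

cell = 0.0                 # prior log-odds 0, i.e. P(occupied) = 0.5
for _ in range(3):         # three sonar readings that each suggest "occupied"
    cell = update_cell(cell, 0.7)
```

Repeated consistent readings drive the cell's probability toward 1, which is what makes the fused grid usable for feature detection.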

350 |
The optimal control of partially observable Markov decision processes over the infinite horizon: Discounted cost. Operations Research 12:282–304
- Sondik
- 1978
Citation Context: ...ptimal policies in mdps work only in finite state spaces and the existing exact pomdp solution procedures are computationally intractable [14]. A number of algorithms exist for solving the belief mdp [19, 3, 11, 9], but even the most efficient of these would do well to solve problems with 10 states and 10 observations. 5 Heuristic Control Strategies Since it is computationally intractable to compute the optimal...

319 |
The complexity of Markov decision processes
- Papadimitriou, Tsitsiklis
- 1987
Citation Context: ...rocess is continuous; the established algorithms for finding optimal policies in mdps work only in finite state spaces and the existing exact pomdp solution procedures are computationally intractable [14]. A number of algorithms exist for solving the belief mdp [19, 3, 11, 9], but even the most efficient of these would do well to solve problems with 10 states and 10 observations. 5 Heuristic Control S...

311 |
The optimal control of partially observable Markov processes over a finite horizon
- Smallwood, Sondik
- 1973
Citation Context: ...where Pr(o | a, b) is defined above. The reward function, ρ, is constructed from R by taking expectations according to the belief state; that is, ρ(b, a) = Σ_{s ∈ S} b(s) R(s, a). The belief mdp is Markov [18], that is, having information about previous belief states cannot improve the choice of action. Most importantly, if an agent adopts the optimal policy for the belief mdp, the resulting behavior will ...
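
The expected-reward construction ρ(b, a) = Σ_s b(s) R(s, a) quoted above can be written directly; the belief and reward numbers below are invented for illustration:

```python
def belief_reward(b, R, a):
    """rho(b, a) = sum_s b(s) R(s, a): expected one-step reward under belief b."""
    return sum(b[s] * R[s][a] for s in b)

# Hypothetical two-state belief and reward table.
b = {0: 0.25, 1: 0.75}
R = {0: {"stay": 1.0}, 1: {"stay": 3.0}}
```

With this ρ, the belief MDP has the same action set as the original POMDP but a continuous (belief-simplex) state space, which is why the excerpt stresses its Markov property.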

304 |
Switching and Finite Automata Theory
- Kohavi
- 1979
Citation Context: ...ue of being confused. We used a sequence of length 20, which consisted of a repetition of 5 move-forward actions followed by a single turn-left action. This sequence was based upon the homing sequence [6] for a determinized version of our domain. We define the weighted entropy value as EV(b) = H̃(b′)(b′ · V_L) + (1 − H̃(b′))(b′ · V), and the associated Q value as EQ(b, a) = ...
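
A sketch of the weighted-entropy blend quoted above, assuming H̃ is belief entropy normalized by log |S|, V_L is the value vector of the localizing policy, and V is the optimal MDP value vector (all vectors below are invented placeholders):

```python
import math

def normalized_entropy(b):
    """H~(b): entropy of belief b, normalized to [0, 1] by log |S|."""
    H = -sum(p * math.log(p) for p in b if p > 0)
    return H / math.log(len(b))

def ev(b, V_L, V):
    """EV(b) = H~(b)(b . V_L) + (1 - H~(b))(b . V): when the belief is diffuse,
    score by the localizing values V_L; when sharp, by the MDP values V."""
    dot = lambda u, v: sum(x * y for x, y in zip(u, v))
    h = normalized_entropy(b)
    return h * dot(b, V_L) + (1.0 - h) * dot(b, V)
```

At a uniform belief the heuristic scores purely by V_L (act to localize); at a point-mass belief it reduces to the underlying MDP value.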

289 | Acting optimally in partially observable stochastic domains
- Cassandra, Kaelbling, et al.
- 1994
Citation Context: ...partially observable Markov decision processes (pomdps). This model was developed in the operations research community [10, 20] and has been recently introduced to artificial intelligence researchers [2]. We start by describing the simpler class of Markov decision processes. 3.1 Markov Decision Processes An mdp is defined by the tuple ⟨S, A, T, R⟩, where S is a finite set of environment states that c...

241 | Learning policies for partially observable environments: Scaling up
- Littman, Cassandra, et al.
- 1995
Citation Context: ...7]. The only difference is that they used a deterministic planning algorithm to choose the best action for each world state, rather than finding the best action in the underlying mdp. The QMDP method [8] is a more refined version of the voting method, in which the votes of each state are apportioned among the actions according to their Q value: Q_mdp(b) = argmax_a Σ_s b(s) · Q(s, a). This i...
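
The Q_MDP rule quoted above (argmax_a Σ_s b(s) Q(s, a)) can be sketched directly; the state names and Q values below are hypothetical:

```python
def qmdp_action(b, Q):
    """Q_MDP: argmax_a sum_s b(s) Q(s, a) -- each state's vote for an action
    is weighted by both its belief mass and its Q value."""
    actions = next(iter(Q.values())).keys()
    return max(actions, key=lambda a: sum(b[s] * Q[s][a] for s in b))

# Hypothetical belief over two rooms and Q values from the underlying MDP.
b = {"roomA": 0.6, "roomB": 0.4}
Q = {"roomA": {"left": 5.0, "right": 1.0},
     "roomB": {"left": 0.0, "right": 4.0}}
```

Unlike plain voting, a state that mildly prefers one action cannot outvote a state that strongly prefers another, since the Q magnitudes enter the sum.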

204 |
A survey of partially observable Markov decision processes: Theory, models, and algorithms
- Monahan
- 1982
Citation Context: ...ptimal policies in mdps work only in finite state spaces and the existing exact pomdp solution procedures are computationally intractable [14]. A number of algorithms exist for solving the belief mdp [19, 3, 11, 9], but even the most efficient of these would do well to solve problems with 10 states and 10 observations. 5 Heuristic Control Strategies Since it is computationally intractable to compute the optimal...

188 |
A survey of algorithmic methods for partially observed Markov decision processes
- Lovejoy
- 1991
Citation Context: ... 3 POMDP Model In this section, we briefly describe the class of models known as partially observable Markov decision processes (pomdps). This model was developed in the operations research community [10, 20] and has been recently introduced to artificial intelligence researchers [2]. We start by describing the simpler class of Markov decision processes. 3.1 Markov Decision Processes An mdp is defined by ...

170 |
DERVISH: An office-navigating robot
- Nourbakhsh, Powers, et al.
- 1995
Citation Context: ...le, that the robot will know that it is in a corner of the building, but not which one. This is by no means the first project to use discrete belief models. The Dervish project at Stanford University [13] used a topological map combined with robust low-level behaviors. Their belief-state update was heuristic, based only on a model of observational error (but not error due to actions). The Xavier proje...

121 | Approximating Optimal Policies for Partially Observable Stochastic Domains
- Parr, Russell
- 1995
Citation Context: ...tainty. None of them is able to take a long string of actions in order to disambiguate its belief state. We are beginning to apply some more sophisticated methods from the machine learning literature [8, 15] that find approximations to the true value function of the pomdp. We expect that they will improve performance when the starting location is extremely uncertain. The simple domains explored in this p...

113 | The Mobile Robot Rhino
- Buhmann, Burgard, et al.
- 1995
Citation Context: ... model involving the spread of the ultrasonic wave over distance. Restricting the grid to a local region allows numerous simplifications over global occupancy-grid methods such as those used in RHINO [1]. First, even when the robot is operating in very narrow corridors there is no preprocessing of ultrasonic data to eliminate higher-order specular reflection. Any ultrasonic reading greater than 1.5 m...

84 | Unsupervised learning of probabilistic models for robot navigation - Koenig, Simmons - 1996

84 | Probabilistic navigation in partially observable environments
- Simmons, Koenig
- 1995
Citation Context: ...with robust low-level behaviors. Their belief-state update was heuristic, based only on a model of observational error (but not error due to actions). The Xavier project at Carnegie-Mellon University [17] used a full belief-state update. Both of these projects used fairly ad hoc action strategies, based on planning paths as if the domain were deterministic. The major contribution of this paper is to o...

74 |
Algorithms for Partially Observable Markov Decision Processes
- Cheng
- 1988
Citation Context: ...ptimal policies in mdps work only in finite state spaces and the existing exact pomdp solution procedures are computationally intractable [14]. A number of algorithms exist for solving the belief mdp [19, 3, 11, 9], but even the most efficient of these would do well to solve problems with 10 states and 10 observations. 5 Heuristic Control Strategies Since it is computationally intractable to compute the optimal...

27 |
Partially observed Markov decision processes: A survey
- White
- 1991
Citation Context: ... 3 POMDP Model In this section, we briefly describe the class of models known as partially observable Markov decision processes (pomdps). This model was developed in the operations research community [10, 20] and has been recently introduced to artificial intelligence researchers [2]. We start by describing the simpler class of Markov decision processes. 3.1 Markov Decision Processes An mdp is defined by ...

6 |
Mobile-robot localization by tracking geometric beacons
- Leonard, Durrant-Whyte
- 1991
Citation Context: ...for taking the robot's current beliefs and combining them with uncertain information gained from sensing and acting. The Bayesian framework has been used for a long time in robotics with good results [7]. In this paper, we develop a two-level architecture, with Bayesian modeling done at the top level only. In addition, we explore sub-optimal control strategies given the Bayesian belief state. The sta...

4 |
An efficient algorithm for dynamic programming in partially observable Markov decision processes
- Littman, Cassandra, et al.
- 1995