## Xavier: A Robot Navigation Architecture Based on Partially Observable Markov Decision Process Models (1998)

Venue: Artificial Intelligence Based Mobile Robotics: Case Studies of Successful Robot Systems

Citations: 107 (9 self)

### BibTeX

```bibtex
@INPROCEEDINGS{Koenig98xavier:a,
  author    = {Sven Koenig and Reid G. Simmons},
  title     = {Xavier: A Robot Navigation Architecture Based on Partially Observable Markov Decision Process Models},
  booktitle = {Artificial Intelligence Based Mobile Robotics: Case Studies of Successful Robot Systems},
  year      = {1998},
  pages     = {91--122},
  publisher = {MIT Press}
}
```

### Abstract

Autonomous mobile robots need very reliable navigation capabilities in order to operate unattended for long periods of time. We present a technique for achieving this goal that uses partially observable Markov decision process models (POMDPs) to explicitly model navigation uncertainty, including actuator and sensor uncertainty and approximate knowledge of the environment. This allows the robot to maintain a probability distribution over its current pose. Thus, while the robot rarely knows exactly where it is, it always has some belief as to what its true pose is, and is never completely lost. We present a navigation architecture based on POMDPs that provides a uniform framework with an established theoretical foundation for pose estimation, path planning, robot control during navigation, and learning. Our experiments show that this architecture indeed leads to robust corridor navigation for an actual indoor mobile robot.
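The pose distribution described in the abstract can be maintained with the standard discrete Bayes-filter (POMDP belief) update. A minimal sketch, assuming a discrete pose set and placeholder actuator/sensor models; the dictionary-based models below are illustrative, not Xavier's actual ones:

```python
def belief_update(belief, action, observation, p_trans, p_obs):
    """One step of discrete Bayes filtering over poses.

    belief:  dict pose -> probability
    p_trans: p_trans[(s2, s1, a)] = p(s2 | s1, a)   (assumed actuator model)
    p_obs:   p_obs[(o, s)] = p(o | s)               (assumed sensor model)
    """
    # Prediction: push the belief through the actuator model.
    predicted = {}
    for s2 in belief:
        predicted[s2] = sum(p_trans.get((s2, s1, action), 0.0) * p
                            for s1, p in belief.items())
    # Correction: weight by the sensor model and renormalize.
    posterior = {s: p_obs.get((observation, s), 0.0) * p
                 for s, p in predicted.items()}
    total = sum(posterior.values())
    return {s: p / total for s, p in posterior.items()}
```

Because the belief is renormalized each step, the robot always has *some* distribution over poses, even when no single pose is certain.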

### Citations

2856 |
Dynamic Programming
- Bellman
- 1957
Citation Context ...licy assigns to the current state of the process. Such an optimal policy can be determined by solving the following system of |S| equations for the variables v(s), that is known as Bellman’s Equation [1]: v(s) = max_{a∈A(s)} [r(s, a) + γ Σ_{s′∈S} p(s′|s, a) v(s′)] for all s ∈ S. (3) v(s) is the expected total reward if the process starts in state s and the decision maker acts optimally. The optima...

1262 |
Error bounds for convolution codes and an asymptotically optimal decoding algorithm
- Viterbi
- 1967
Citation Context ...e POMDP process at each point in time, merely connecting these states might not result in a continuous path. The Viterbi algorithm uses dynamic programming to compute the most likely path efficiently [36]. Its first three steps are Steps A1. to A3. in Section 3.3, except that the summations on Lines A3.(a) and A3.(b) are replaced by maximizations...
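The maximization variant described in this excerpt is standard Viterbi decoding. A generic sketch over discrete states, in the action-free HMM form for brevity (the paper's Steps A1 to A3 also condition on motion reports):

```python
def viterbi(states, start, p_trans, p_obs, observations):
    """Most likely state sequence given observations (generic HMM Viterbi).

    start:   dict state -> prior probability
    p_trans: p_trans[(s2, s1)] = p(s2 | s1)
    p_obs:   p_obs[(o, s)] = p(o | s)
    """
    # delta[s] = probability of the best path ending in s.
    delta = {s: start[s] * p_obs.get((observations[0], s), 0.0) for s in states}
    back = []
    for o in observations[1:]:
        ptr, new_delta = {}, {}
        for s2 in states:
            # Replace the filtering sum over predecessors by a maximization.
            best_s1 = max(states, key=lambda s1: delta[s1] * p_trans.get((s2, s1), 0.0))
            new_delta[s2] = (delta[best_s1] * p_trans.get((s2, best_s1), 0.0)
                             * p_obs.get((o, s2), 0.0))
            ptr[s2] = best_s1
        delta, back = new_delta, back + [ptr]
    # Trace the best path backwards through the stored pointers.
    path = [max(states, key=lambda s: delta[s])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return list(reversed(path))
```

Connecting the per-step most likely states would not guarantee a continuous path; the backpointers are what make the decoded sequence globally consistent.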

900 | An introduction to hidden Markov models
- Rabiner, Juang
- 1986
Citation Context ...p, actuator, and sensor models. While there is no known technique for doing this efficiently, there exist efficient algorithms that approximate the optimal POMDP [4, 22, 34]. The Baum-Welch algorithm [26] is one such algorithm. This iterative expectation-maximization algorithm does not require control of the POMDP process and thus can be used by an observer to learn the POMDP. It overcomes the problem...

350 |
The optimal control of partially observable Markov decision processes over the infinite horizon: Discounted cost. Operations Research 12:282–304
- Sondik
- 1978
Citation Context ...α(s′)]] for all s ∈ S. Any mapping from states to actions that is optimal for this completely observable Markov decision process model is also optimal for the POMDP (under reasonable assumptions) [33]. This means that for POMDPs, there always exists a POMDP policy (a mapping from state distributions to actions) that maximizes the expected total reward. This policy can be pre-computed. During act...

321 |
On the representation and estimation of spatial uncertainty
- Smith, Cheeseman
- 1987
Citation Context ...example, blockages can change over time as people close and open doors and block and unblock corridors. Previously reported approaches that maintain pose distributions often use either Kalman filters [16, 32] or temporal Bayesian networks [5]. Both approaches can utilize motion and sensor reports to update the pose distribution. Kalman filters model only restricted pose distributions in continuous pose sp...

319 |
The complexity of Markov decision processes
- Papadimitriou, Tsitsiklis
- 1987
Citation Context ...finite. Thus, the completely observable Markov decision process model is infinite, and an optimal policy cannot be found efficiently. In fact, the POMDP planning problem is PSPACE-complete in general [24]. However, there are POMDP planning algorithms that trade off solution quality for speed, but that usually do not provide quality guarantees [19, 25]. The SPOVA-RL algorithm [25], for example, can det...

316 | Using occupancy grids for mobile robot perception and navigation. Computer 22(6)
- Elfes
- 1991
Citation Context ...ection 4.2.2). Table 1 lists the sensors that we currently use, together with the features that they report on. The sensor reports are derived from the raw sensor data by using a small occupancy grid [8] in the coordinates of the robot that is centered around the robot (Figure 3). The occupancy grid combines the raw data from all sonar sensors and integrates them over the recent past. The occupancy g...
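The robot-centered grid described here can be illustrated with a tiny evidence grid that integrates new sonar hits while fading old ones. This is a toy sketch; the decay constant and cell-update rule are assumptions, not the paper's actual scheme:

```python
def update_grid(grid, sonar_hits, decay=0.9):
    """Integrate one batch of sonar readings into a small robot-centered
    occupancy grid (illustrative only; 'decay' is an assumed parameter).

    grid:       2D list of occupancy evidence values
    sonar_hits: list of (row, col) cells where a sonar detected an obstacle
    """
    # Fade older evidence so the grid tracks only the recent past.
    grid = [[cell * decay for cell in row] for row in grid]
    for r, c in sonar_hits:
        if 0 <= r < len(grid) and 0 <= c < len(grid[0]):
            grid[r][c] += 1.0  # accumulate evidence from this reading
    return grid
```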

293 | Hidden Markov Models for Speech Recognition
- Huang, Ariki, et al.
- 1990
Citation Context ...terministic effects. Properties of POMDPs have been studied extensively in Operations Research. In Artificial Intelligence and Robotics, POMDPs have been applied to speech and handwriting recognition [11] and the interpretation of tele-operation commands [10, 37]. They have also gained popularity in the Artificial Intelligence community as a formal model for planning under uncertainty [3, 12]. Consequ...

289 | Acting optimally in partially observable stochastic domains
- Cassandra, Kaelbling, et al.
- 1994
Citation Context ...recognition [11] and the interpretation of tele-operation commands [10, 37]. They have also gained popularity in the Artificial Intelligence community as a formal model for planning under uncertainty [3, 12]. Consequently, standard algorithms are available to solve tasks that are typically encountered by observers and decision makers. In the following, we describe some of these algorithms. 3.1 State Esti...

263 | Probabilistic robot navigation in partially observable environments
- Simmons, Koenig
- 1995
Citation Context ... complete the policy as follows: • The “Most Likely State” Strategy [23] executes the action that is assigned to the most likely state, that is, the action a(arg max_{s∈S} α(s)). • The “Voting” Strategy [31] executes the action with the highest probability mass according to α, that is, the action arg max_{a∈A} Σ_{s∈S|a(s)=a} α(s). • The “Completely Observable after the First Step” Strategy [4, 35] execute...

241 | Learning policies for partially observable environments: Scaling up
- Littman, Cassandra, et al.
- 1995
Citation Context ...e POMDP planning problem is PSPACE-complete in general [24]. However, there are POMDP planning algorithms that trade off solution quality for speed, but that usually do not provide quality guarantees [19, 25]. The SPOVA-RL algorithm [25], for example, can determine approximate policies for POMDPs with about a hundred states in a reasonable amount of time. Further performance improvements are anticipated, ...

200 | Reinforcement learning with perceptual aliasing: The perceptual distinctions approach
- Chrisman
- 1992
Citation Context ...” Strategy [31] executes the action with the highest probability mass according to α, that is, the action arg max_{a∈A} Σ_{s∈S|a(s)=a} α(s). • The “Completely Observable after the First Step” Strategy [4, 35] executes the action arg max_{a∈A} Σ_{s∈S} [α(s)(r(s, a) + γ Σ_{s′∈S} p(s′|s, a) v(s′))]. This approach allows one, for example, to choose the second best action if all states disagree on the best action but...
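The three greedy strategies quoted in this and the neighboring excerpts can be sketched side by side, assuming a precomputed optimal MDP policy a(s), value function v(s), and models r and p (all hypothetical placeholders here):

```python
def most_likely_state(alpha, policy):
    """Execute the action assigned to the most likely state."""
    s = max(alpha, key=alpha.get)
    return policy[s]

def voting(alpha, policy, actions):
    """Execute the action with the highest probability mass under alpha."""
    mass = {a: sum(p for s, p in alpha.items() if policy[s] == a) for a in actions}
    return max(mass, key=mass.get)

def completely_observable_after_first_step(alpha, actions, r, p_trans, v, gamma=0.95):
    """Maximize the belief-weighted one-step lookahead over the MDP values:
    arg max_a sum_s alpha(s) * (r(s,a) + gamma * sum_s' p(s'|s,a) v(s'))."""
    def q(a):
        return sum(alpha[s] * (r[(s, a)] + gamma * sum(
                   p_trans.get((s2, s, a), 0.0) * v[s2] for s2 in v))
                   for s in alpha)
    return max(actions, key=q)
```

Only the last strategy can prefer the second-best action of every individual state when the states disagree on the best one, because it scores actions by expected value rather than by votes.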

190 | Acting under uncertainty: Discrete Bayesian models for mobile-robot navigation
- Cassandra, Kaelbling, et al.
- 1996
Citation Context ...ion of the robot is known with certainty, but do not utilize any metric information (the states of the robot are either at a topological node or somewhere in a connecting corridor). Cassandra et al. [2] build on our work, and consequently use Markov models similar to ours, including modeling distance information, but assume that the distances are known with certainty. 3 An Introduction to POMDPs Thi...

181 | Algorithms for Sequential Decision Making
- Littman
- 1996
Citation Context ...te policies for POMDPs with about a hundred states in a reasonable amount of time. Further performance improvements are anticipated, since POMDP planning algorithms are the object of current research [18] and researchers are starting to investigate, for example, how to exploit the restricted topology of some POMDPs. We describe here three greedy POMDP planning approaches that can find policies for lar...

174 |
Structured control for autonomous robots
- Simmons
- 1994
Citation Context ...Architecture. The Task Control Architecture provides facilities for interprocess communication, task decomposition and sequencing, execution monitoring and exception handling, and resource management [28]. Finally, interaction with the robot is via the World Wide Web, which provides pages for both commanding the robot and monitoring its progress. In the following, Section 2 contrasts our POMDP-based n...

171 |
DERVISH: An office-navigating robot
- Nourbakhsh, Powers, et al.
- 1995
Citation Context ... that use Markov models for robot navigation: Dean et al. [6] use Markov models, but, different from our approach, assume that the location of the robot is always known precisely. Nourbakhsh et al. [23] use Markov models that do not assume that the location of the robot is known with certainty, but do not utilize any metric information (the states of the robot are either at a topological node or som...

143 | Planning with deadlines in stochastic domains
- Dean, Kaelbling, et al.
- 1993
Citation Context ...but do not suffer from the limited horizon problem for planning, since their lookahead is unlimited. There have been several other approaches that use Markov models for robot navigation: Dean et al. [6] use Markov models, but, different from our approach, assume that the location of the robot is always known precisely. Nourbakhsh et al. [23] use Markov models that do not assume that the location of...

Hidden Markov model induction by Bayesian model merging
- Stolcke, Omohundro
- 1993
Citation Context ...odel from experience, including the map, actuator, and sensor models. While there is no known technique for doing this efficiently, there exist efficient algorithms that approximate the optimal POMDP [4, 22, 34]. The Baum-Welch algorithm [26] is one such algorithm. This iterative expectation-maximization algorithm does not require control of the POMDP process and thus can be used by an observer to learn the ...

On the complexity of solving Markov decision problems
- Littman, Dean, et al.
- 1995
Citation Context ...The optimal action to execute in state s is a(s) = arg max_{a∈A(s)} [r(s, a) + γ Σ_{s′∈S} p(s′|s, a) v(s′)]. The system of equations can be solved in polynomial time using dynamic programming methods [20]. A popular dynamic programming method is value iteration [1] (we leave the termination criterion unspecified): 1. Set v_1(s) := 0 for all s ∈ S. Set t := 1. 2. Set v_{t+1}(s) := max_{a∈A(s)} [r(s, a) + γ Σ_{s′∈S} p(s′|s, a) v_t(s′)] for all s ∈ S. 3...
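The value-iteration scheme sketched in this excerpt, with one common termination test filled in (the excerpt deliberately leaves it unspecified), and the greedy policy extracted at the end:

```python
def value_iteration(states, actions, r, p_trans, gamma=0.95, eps=1e-6):
    """Value iteration for the completely observable MDP (generic sketch).

    r:       r[(s, a)] = immediate reward
    p_trans: p_trans[(s2, s, a)] = p(s2 | s, a)
    """
    v = {s: 0.0 for s in states}                          # step 1: v_1(s) := 0
    while True:
        v_new = {s: max(r[(s, a)] + gamma * sum(          # step 2: Bellman backup
                        p_trans.get((s2, s, a), 0.0) * v[s2] for s2 in states)
                        for a in actions[s])
                 for s in states}
        if max(abs(v_new[s] - v[s]) for s in states) < eps:
            break                                         # assumed stopping rule
        v = v_new
    # Greedy extraction: a(s) = arg max_a [r(s,a) + gamma * sum p(s'|s,a) v(s')]
    policy = {s: max(actions[s], key=lambda a: r[(s, a)] + gamma * sum(
                     p_trans.get((s2, s, a), 0.0) * v[s2] for s2 in states))
              for s in states}
    return v, policy
```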

121 | Approximating Optimal Policies for Partially Observable Stochastic Domains
- Parr, Russell
- 1995
Citation Context ...e POMDP planning problem is PSPACE-complete in general [24]. However, there are POMDP planning algorithms that trade off solution quality for speed, but that usually do not provide quality guarantees [19, 25]. The SPOVA-RL algorithm [25], for example, can determine approximate policies for POMDPs with about a hundred states in a reasonable amount of time. Further performance improvements are anticipated, ...

115 | The curvature-velocity method for local obstacle avoidance
- Simmons
- 1996
Citation Context ...m include a servo-control layer that controls the motors of the robot, an obstacle avoidance layer that keeps the robot moving smoothly in a goal direction while avoiding static and dynamic obstacles [29], a path planning layer that reasons about uncertainty to choose paths that have high expected utility [13], and a multiple-task planning layer that uses PRODIGY, a symbolic, non-linear planner, to in...

108 |
Fast vision-guided mobile robot navigation using model-based reasoning and prediction of uncertainties
- Kosaka, Kak
- 1992
Citation Context ...example, blockages can change over time as people close and open doors and block and unblock corridors. Previously reported approaches that maintain pose distributions often use either Kalman filters [16, 32] or temporal Bayesian networks [5]. Both approaches can utilize motion and sensor reports to update the pose distribution. Kalman filters model only restricted pose distributions in continuous pose sp...

89 | A layered architecture for office delivery robots
- Simmons, Goodwin, et al.
- 1997
Citation Context ... pose information and the plans) can be utilized by higher-level planning modules, such as task planners. The POMDP-based navigation architecture is one layer of our office delivery system (Figure 1) [30]. Besides the navigation layer described here, the layers of the system include a servo-control layer that controls the motors of the robot, an obstacle avoidance layer that keeps the robot moving smo...

84 | Unsupervised learning of probabilistic models for robot navigation
- Koenig, Simmons
- 1996
Citation Context ... := Σ_{t=1...T | o_t=o} γ_t(s) / Σ_{t=1...T} γ_t(s) for all s ∈ S and all o ∈ O. To apply the Baum-Welch algorithm to real-world problems, there exist standard techniques for dealing with the following issues [15]: when to stop iterating the algorithm, with which initial POMDP to start the algorithm and how often to apply it to different initial POMDPs, how to handle transition or observation probabilities tha...
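The ratio in this excerpt (with its summation indices restored) has the shape of the Baum-Welch observation-model reestimate: expected visits to s that emitted o, divided by expected visits to s. A minimal sketch under that reading, given precomputed gamma values:

```python
def reestimate_obs_probs(gamma, observations, obs_set):
    """Baum-Welch M-step for a sensor model, under the reading
    q(o|s) := sum_{t: o_t = o} gamma_t(s) / sum_{t=1..T} gamma_t(s).

    gamma:        list over t of dict state -> p(s_t = s | all data)
    observations: list over t of the observation made at step t
    """
    states = gamma[0].keys()
    q = {}
    for s in states:
        total = sum(g[s] for g in gamma)           # expected visits to s
        for o in obs_set:
            numer = sum(g[s] for g, ot in zip(gamma, observations) if ot == o)
            q[(o, s)] = numer / total if total > 0 else 0.0
    return q
```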

82 | A Robust, Qualitative Method for Robot Spatial Learning
- Kuipers, Byun
- 1988
Citation Context ... While some landmark-based approaches use motion reports, mostly to resolve topological ambiguities, and some metric-based approaches use sensor reports to continuously realign the robot with the map [17, 21], the two sources of information are treated differently. We want an approach that seamlessly integrates both sources of information, and is amenable to adding new sources such as a-priori information...

64 | Interleaving planning and robot execution for asynchronous user requests
- Haigh, Veloso
- 1996
Citation Context ...aths that have high expected utility [13], and a multiple-task planning layer that uses PRODIGY, a symbolic, non-linear planner, to integrate and schedule delivery requests that arrive asynchronously [9]. The layers, which are implemented as a number of distributed, concurrent processes operating on several processors, are integrated using the Task Control Architecture. The Task Control Architecture ...

63 | Hidden Markov model approach to skill learning and its application in telerobotics
- Yang, Xu, et al.
- 1994
Citation Context ...died extensively in Operations Research. In Artificial Intelligence and Robotics, POMDPs have been applied to speech and handwriting recognition [11] and the interpretation of tele-operation commands [10, 37]. They have also gained popularity in the Artificial Intelligence community as a formal model for planning under uncertainty [3, 12]. Consequently, standard algorithms are available to solve tasks tha...

51 | Passive distance learning for robot navigation
- Koenig, Simmons
- 1996
Citation Context ...we have extended the Baum-Welch algorithm to address the issues of limited memory and the cost of collecting training data [15] and augmented it so that it is able to change the structure of the POMDP [14]. 3.4 Most Likely Path: Determining the State Sequence from Observations Assume that an observer wants to determine the most likely sequence of states that the POMDP process was in. This corresponds t...

43 |
Baum's forward-backward algorithm revisited
- Devijver
- 1985
Citation Context ...s the trace. The Baum-Welch algorithm estimates the improved POMDP in three steps. First Step: A dynamic programming approach (“forward-backward algorithm”) is used that applies Bayes rule repeatedly [7]. The forward phase calculates scaling factors scale_t and alpha values α_t(s) = p(s_t = s | o_{1...t}, a_{1...t−1}) for all s ∈ S and t = 1...T. The alpha values are the state distributions calculated in...
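The scaled forward phase described here can be sketched as repeated predict-and-correct steps with per-step normalization (a generic implementation, not the paper's exact code):

```python
def forward_pass(states, start, p_trans, p_obs, actions, observations):
    """Scaled forward phase of the forward-backward algorithm:
    alpha_t(s) = p(s_t = s | o_1..t, a_1..t-1), via repeated Bayes-rule
    updates with per-step scaling factors.
    """
    # t = 1: condition the prior on the first observation.
    alpha = {s: start[s] * p_obs.get((observations[0], s), 0.0) for s in states}
    scale = sum(alpha.values())
    alphas, scales = [{s: a / scale for s, a in alpha.items()}], [scale]
    for a_t, o in zip(actions, observations[1:]):
        # Predict through the actuator model, then correct with the sensor model.
        alpha = {s2: p_obs.get((o, s2), 0.0) * sum(
                     p_trans.get((s2, s1, a_t), 0.0) * alphas[-1][s1]
                     for s1 in states)
                 for s2 in states}
        scale = sum(alpha.values())
        alphas.append({s: a / scale for s, a in alpha.items()})
        scales.append(scale)
    return alphas, scales
```

The scaling keeps every alpha a proper distribution (avoiding numerical underflow on long traces), and the product of the scale factors recovers the likelihood of the observation sequence.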

43 |
Hidden Markov Model Analysis of Force/Torque Information in Telemanipulation
- Hannaford, Lee
- 1991
Citation Context ...died extensively in Operations Research. In Artificial Intelligence and Robotics, POMDPs have been applied to speech and handwriting recognition [11] and the interpretation of tele-operation commands [10, 37]. They have also gained popularity in the Artificial Intelligence community as a formal model for planning under uncertainty [3, 12]. Consequently, standard algorithms are available to solve tasks tha...

40 | Instance-based state identification for reinforcement learning
- McCallum, Tesauro, et al.
- 1995
Citation Context ...odel from experience, including the map, actuator, and sensor models. While there is no known technique for doing this efficiently, there exist efficient algorithms that approximate the optimal POMDP [4, 22, 34]. The Baum-Welch algorithm [26] is one such algorithm. This iterative expectation-maximization algorithm does not require control of the POMDP process and thus can be used by an observer to learn the ...

27 |
Environmental learning using a distributed representation. IEEE
- Mataric
- 1990
Citation Context ... While some landmark-based approaches use motion reports, mostly to resolve topological ambiguities, and some metric-based approaches use sensor reports to continuously realign the robot with the map [17, 21], the two sources of information are treated differently. We want an approach that seamlessly integrates both sources of information, and is amenable to adding new sources such as a-priori information...

24 |
Becoming increasingly reliable
- Simmons
- 1994
Citation Context ...us navigation in office environments (with corridors, foyers, and rooms) for an actual indoor mobile robot, significantly outperforming the landmark-based navigation technique that we used previously [27]. The POMDP-based navigation architecture uses a compiler that automatically produces POMDPs from topological maps, actuator and sensor models, and uncertain knowledge of the environment. The resultin...

23 |
Coping with uncertainty in a control system for navigation and exploration
- Dean, Basye, et al.
- 1990
Citation Context ... as people close and open doors and block and unblock corridors. Previously reported approaches that maintain pose distributions often use either Kalman filters [16, 32] or temporal Bayesian networks [5]. Both approaches can utilize motion and sensor reports to update the pose distribution. Kalman filters model only restricted pose distributions in continuous pose space. In the simplest case, these a...

23 |
Optimal probabilistic and decision-theoretic planning using Markovian decision theory
- Koenig
- 1992
Citation Context ...recognition [11] and the interpretation of tele-operation commands [10, 37]. They have also gained popularity in the Artificial Intelligence community as a formal model for planning under uncertainty [3, 12]. Consequently, standard algorithms are available to solve tasks that are typically encountered by observers and decision makers. In the following, we describe some of these algorithms. 3.1 State Esti...

22 |
Robot navigation with Markov models: A framework for path planning and learning with limited computational resources
- Koenig, Goodwin, et al.
- 1995
Citation Context ...ps the robot moving smoothly in a goal direction while avoiding static and dynamic obstacles [29], a path planning layer that reasons about uncertainty to choose paths that have high expected utility [13], and a multiple-task planning layer that uses PRODIGY, a symbolic, non-linear planner, to integrate and schedule delivery requests that arrive asynchronously [9]. The layers, which are implemented as...

12 |
Learning via task decomposition
- Tenenberg, Karlsson, et al.
- 1992
Citation Context ...” Strategy [31] executes the action with the highest probability mass according to α, that is, the action arg max_{a∈A} Σ_{s∈S|a(s)=a} α(s). • The “Completely Observable after the First Step” Strategy [4, 35] executes the action arg max_{a∈A} Σ_{s∈S} [α(s)(r(s, a) + γ Σ_{s′∈S} p(s′|s, a) v(s′))]. This approach allows one, for example, to choose the second best action if all states disagree on the best action but...