Results 1 - 10 of 74
LQG-MP: Optimized Path Planning for Robots with Motion Uncertainty and Imperfect State Information
, 2010
"... This paper presents LQG-MP (linear-quadratic Gaussian motion planning), a new approach to robot motion planning that takes into account the sensors and the controller that will be used during execution of the robot’s path. LQG-MP is based on the linear-quadratic controller with Gaussian models of un ..."
Abstract
-
Cited by 48 (12 self)
- Add to MetaCart
(Show Context)
This paper presents LQG-MP (linear-quadratic Gaussian motion planning), a new approach to robot motion planning that takes into account the sensors and the controller that will be used during execution of the robot’s path. LQG-MP is based on the linear-quadratic controller with Gaussian models of uncertainty, and explicitly characterizes in advance (i.e., before execution) the a priori probability distributions of the state of the robot along its path. These distributions can be used to assess the quality of the path, for instance by computing the probability of avoiding collisions. Many methods can be used to generate the needed ensemble of candidate paths from which the best path is selected; in this paper we report results using Rapidly-Exploring Random Trees (RRT). We study the performance of LQG-MP with simulation experiments in three scenarios: A) a kinodynamic car-like robot, B) multi-robot planning with differential-drive robots, and C) a 6-DOF serial manipulator. We also apply Kalman smoothing to make paths C^k-continuous while avoiding obstacles and apply LQG-MP to precomputed roadmaps using a variant of Dijkstra’s algorithm to efficiently find near-optimal paths.
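The core computation LQG-MP performs before execution can be summarized as propagating the joint covariance of the true state and its estimate along a candidate path under the closed-loop LQG system. Below is a minimal sketch of that propagation, assuming the dynamics and observation models have already been linearized along the path and the LQR gains L[t] and Kalman gains K[t] precomputed; the function name and argument layout are illustrative, not the authors' code.

```python
import numpy as np

def lqg_mp_apriori_covariances(A, B, C, M, N, L, K, Sigma0):
    """Propagate the a priori state covariance along a nominal path.

    All arguments are lists indexed by time step t (entry t holds the
    matrices used in the transition from step t to t+1):
      A[t], B[t], C[t] -- linearized dynamics and observation matrices
      M[t], N[t]       -- process and measurement noise covariances
      L[t]             -- LQR feedback gains (control deviation = -L[t] @ estimate)
      K[t]             -- Kalman gains
      Sigma0           -- covariance of the initial state deviation
    Returns the marginal covariance of the true-state deviation at each step.
    """
    n = Sigma0.shape[0]
    # Joint covariance of (true-state deviation, state-estimate deviation).
    R = np.zeros((2 * n, 2 * n))
    R[:n, :n] = Sigma0
    covs = [Sigma0]
    for t in range(len(A)):
        m_obs = C[t].shape[0]
        # Closed-loop transition of the joint (deviation, estimate) system.
        F = np.block([
            [A[t],               -B[t] @ L[t]],
            [K[t] @ C[t] @ A[t],  A[t] - B[t] @ L[t] - K[t] @ C[t] @ A[t]],
        ])
        # Process noise enters the true state directly; both noise sources
        # reach the estimate through the Kalman gain.
        G = np.block([
            [np.eye(n),    np.zeros((n, m_obs))],
            [K[t] @ C[t],  K[t]],
        ])
        Q = np.block([
            [M[t],                  np.zeros((n, m_obs))],
            [np.zeros((m_obs, n)),  N[t]],
        ])
        R = F @ R @ F.T + G @ Q @ G.T
        covs.append(R[:n, :n])  # marginal over the true-state deviation
    return covs
```

The marginal covariances returned here are what a planner would use to score the candidate RRT paths mentioned above, for instance by estimating the probability of avoiding collisions.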
Reinforcement learning with limited reinforcement: Using Bayes risk for active learning in POMDPs
, 2008
"... Partially Observable Markov Decision Processes (POMDPs) have succeeded in planning domains that require balancing actions that increase an agent’s knowledge and actions that increase an agent’s reward. Unfortunately, most POMDPs are defined with a large number of parameters which are difficult to sp ..."
Abstract
-
Cited by 39 (8 self)
- Add to MetaCart
Partially Observable Markov Decision Processes (POMDPs) have succeeded in planning domains that require balancing actions that increase an agent’s knowledge and actions that increase an agent’s reward. Unfortunately, most POMDPs are defined with a large number of parameters which are difficult to specify from domain knowledge alone. In this paper, we present an approximation approach that allows us to treat the POMDP model parameters as additional hidden state in a “model-uncertainty” POMDP. Coupled with model-directed queries, our planner actively learns good policies. We demonstrate our approach on several POMDP problems.
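To make the query-versus-act trade-off concrete, here is a hedged sketch of one way a Bayes-risk test could look, assuming a finite set of candidate models and per-model action values Q[m] already available from planning; the function, its arguments, and the thresholding rule are illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np

def should_query(belief, Q, query_cost):
    """Decide between acting and asking an oracle, via a Bayes-risk test.

    belief     : dict mapping model index -> probability (marginal over models)
    Q[m]       : array of action values under model m (illustrative assumption)
    query_cost : cost of a model-directed query

    Returns (query?, best_action_under_the_mean_model).
    """
    models = sorted(belief)
    probs = np.array([belief[m] for m in models])
    Qs = np.stack([Q[m] for m in models])   # shape (num_models, num_actions)

    mean_Q = probs @ Qs                     # expected value of each action
    a_star = int(np.argmax(mean_Q))

    # Bayes risk: expected loss of committing to a_star now instead of the
    # per-model best action we could take if the model were revealed.
    risk = float(probs @ (Qs.max(axis=1) - Qs[:, a_star]))
    return risk > query_cost, a_star
```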
Model-based Bayesian reinforcement learning in partially observable domains
- ISAIM
, 2008
"... Bayesian reinforcement learning in partially observable domains is notoriously difficult, in part due to the unknown form of the beliefs and the optimal value function. We show that beliefs represented by mixtures of products of Dirichlet distributions are closed under belief updates for factored do ..."
Abstract
-
Cited by 34 (0 self)
- Add to MetaCart
(Show Context)
Bayesian reinforcement learning in partially observable domains is notoriously difficult, in part due to the unknown form of the beliefs and the optimal value function. We show that beliefs represented by mixtures of products of Dirichlet distributions are closed under belief updates for factored domains. Belief monitoring algorithms that use this mixture representation are proposed. We also show that the optimal value function is a linear combination of products of Dirichlets for factored domains. Finally, we extend BEETLE, which is a point-based value iteration algorithm for Bayesian RL in fully observable domains, to partially observable domains.
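The closure property claimed above rests on Dirichlet conjugacy. The following is a simplified sketch in illustrative notation (suppressing the joint dependence on the hidden physical state) of why an update over an unobserved successor state keeps the belief in the mixture-of-products-of-Dirichlets family.

```latex
\begin{align*}
b(\theta) &= \sum_i c_i \prod_{s,a} \mathrm{Dir}\!\left(\theta_{s,a};\, n^i_{s,a}\right)
  && \text{mixture of products of Dirichlets} \\
b'(\theta) &\propto \sum_{s'} \Pr(z \mid s')\, \theta_{s,a}(s')\, b(\theta)
  && \text{after taking $a$ in $s$ and observing $z$} \\
  &= \sum_{s'} \sum_i c_i \Pr(z \mid s')\, \theta_{s,a}(s')
     \prod_{\bar s,\bar a} \mathrm{Dir}\!\left(\theta_{\bar s,\bar a};\, n^i_{\bar s,\bar a}\right).
\end{align*}
```

Because $\theta_{s,a}(s')\,\mathrm{Dir}(\theta_{s,a}; n)$ is proportional to $\mathrm{Dir}(\theta_{s,a}; n + e_{s'})$, each term above is again a product of Dirichlets, so the updated belief stays in the family, with one new mixture component per possible successor state.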
A survey of point-based POMDP solvers
- AUTON AGENT MULTI-AGENT SYST
, 2012
"... The past decade has seen a significant breakthrough in research on solving partially observable Markov decision processes (POMDPs). Where past solvers could not scale beyond perhaps a dozen states, modern solvers can handle complex domains with many thousands of states. This breakthrough was mainly ..."
Abstract
-
Cited by 33 (5 self)
- Add to MetaCart
The past decade has seen a significant breakthrough in research on solving partially observable Markov decision processes (POMDPs). Where past solvers could not scale beyond perhaps a dozen states, modern solvers can handle complex domains with many thousands of states. This breakthrough was mainly due to the idea of restricting value function computations to a finite subset of the belief space, permitting only local value updates for this subset. This approach, known as point-based value iteration, avoids the exponential growth of the value function, and is thus applicable for domains with longer horizons, even with relatively large state spaces. Many extensions were suggested to this basic idea, focusing on various aspects of the algorithm—mainly the selection of the belief space subset, and the order of value function updates. In this survey, we walk the reader through the fundamentals of point-based value iteration, explaining the main concepts and ideas. Then, we survey the major extensions to the basic algorithm, discussing their merits. Finally, we include an extensive empirical analysis using well known benchmarks, in order to shed light on the strengths and limitations of the various approaches.
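For readers unfamiliar with the basic operation the survey builds on, here is a minimal sketch of a single point-based backup at one belief point for a flat, discrete POMDP; the array layout and names are illustrative, not any particular solver's API.

```python
import numpy as np

def point_based_backup(b, Gamma, R, T, O, gamma):
    """One point-based Bellman backup at belief point b.

    b     : belief, shape (S,)
    Gamma : current set of alpha-vectors, shape (K, S)
    R     : rewards, R[s, a], shape (S, A)
    T     : transitions, T[s, a, s'] = P(s'|s,a), shape (S, A, S)
    O     : observations, O[s', a, o] = P(o|s',a), shape (S, A, Z)
    gamma : discount factor

    Returns the alpha-vector (shape (S,)) that maximizes b . alpha among the
    candidate vectors generated for this belief point.
    """
    A, Z = R.shape[1], O.shape[2]
    best_alpha, best_val = None, -np.inf
    for a in range(A):
        alpha_a = R[:, a].copy()
        for o in range(Z):
            # g[k, s] = sum_{s'} P(o|s',a) P(s'|s,a) Gamma[k, s']
            g = Gamma @ (T[:, a, :] * O[:, a, o][None, :]).T
            # pick the alpha-vector that is best for this (a, o) at belief b
            k = int(np.argmax(g @ b))
            alpha_a += gamma * g[k]
        val = float(alpha_a @ b)
        if val > best_val:
            best_val, best_alpha = val, alpha_a
    return best_alpha
```

A point-based solver applies this backup only at the beliefs in its sampled subset of belief space, which is exactly what keeps the number of alpha-vectors, and hence the cost of each iteration, bounded.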
A Scalable Method for Solving High-Dimensional Continuous POMDPs Using Local Approximation
- Conf. on Uncertainty in Artificial Intelligence
, 2010
"... Partially-Observable Markov Decision Processes (POMDPs) are typically solved by finding an approximate global solution to a corresponding belief-MDP. In this paper, we offer a new planning algorithm for POMDPs with continuous state, action and observation spaces. Since such domains have an inherent ..."
Abstract
-
Cited by 30 (1 self)
- Add to MetaCart
(Show Context)
Partially-Observable Markov Decision Processes (POMDPs) are typically solved by finding an approximate global solution to a corresponding belief-MDP. In this paper, we offer a new planning algorithm for POMDPs with continuous state, action and observation spaces. Since such domains have an inherent notion of locality, we can find an approximate solution using local optimization methods. We parameterize the belief distribution as a Gaussian mixture, and use the Extended Kalman Filter (EKF) to approximate the belief update. Since the EKF is a first-order filter, we can marginalize over the observations analytically. By using feedback control and state estimation during policy execution, we recover a behavior that is effectively conditioned on incoming observations despite the unconditioned planning. Local optimization provides no guarantees of global optimality, but it allows us to tackle domains that are at least an order of magnitude larger than the current state-of-the-art. We demonstrate the scalability of our algorithm by considering a simulated hand-eye coordination domain with 16 continuous state dimensions and 6 continuous action dimensions.
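As a point of reference for the belief parameterization described above, here is a single-Gaussian sketch of an EKF belief step, written to highlight the property the paper exploits: the posterior covariance does not depend on the realized observation, so it can be rolled forward deterministically during planning. The names and interface are assumptions; the paper itself uses Gaussian mixtures.

```python
import numpy as np

def ekf_belief_step(mu, Sigma, u, f, F_jac, h, H_jac, Q, R):
    """One EKF belief update with an observation-independent covariance recursion.

    mu, Sigma    : current belief mean and covariance
    u            : control input
    f, h         : (possibly non-linear) dynamics and observation functions
    F_jac, H_jac : their Jacobians, evaluated at the relevant means
    Q, R         : process and observation noise covariances
    """
    # Prediction step.
    mu_pred = f(mu, u)
    F = F_jac(mu, u)
    Sigma_pred = F @ Sigma @ F.T + Q

    # Measurement update: the gain and posterior covariance depend only on
    # the predicted covariance, not on the observation itself.
    H = H_jac(mu_pred)
    S = H @ Sigma_pred @ H.T + R
    K = Sigma_pred @ H.T @ np.linalg.inv(S)
    Sigma_new = (np.eye(len(mu)) - K @ H) @ Sigma_pred

    # During planning, the expected observation is h(mu_pred), which leaves the
    # mean at its prediction; during execution, an actual measurement z would
    # shift it: mu_new = mu_pred + K @ (z - h(mu_pred)).
    mu_new = mu_pred
    return mu_new, Sigma_new
```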
A decision-theoretic model of assistance
- In IJCAI
, 2007
"... There is a growing interest in intelligent assistants for a variety of applications from organizing tasks for knowledge workers to helping people with dementia. In this paper, we present and evaluate a decision-theoretic framework that captures the general notion of assistance. The objective is to o ..."
Abstract
-
Cited by 24 (3 self)
- Add to MetaCart
(Show Context)
There is a growing interest in intelligent assistants for a variety of applications, from organizing tasks for knowledge workers to helping people with dementia. In this paper, we present and evaluate a decision-theoretic framework that captures the general notion of assistance. The objective is to observe a goal-directed agent and to select assistive actions in order to minimize the overall cost. We model the problem as an assistant POMDP where the hidden state corresponds to the agent’s unobserved goals. This formulation allows us to exploit domain models for both estimating the agent’s goals and selecting assistive actions. In addition, the formulation naturally handles uncertainty, varying action costs, and customization to specific agents via learning. We argue that in many domains myopic heuristics will be adequate for selecting actions in the assistant POMDP and present two such heuristics. We evaluate our approach in two domains where human subjects perform tasks in game-like computer environments. The results show that the assistant substantially reduces user effort at only a modest computational cost.
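A hedged sketch of how one myopic loop of this kind could be structured: maintain a posterior over the agent's goal from its observed actions, then choose the assistive action with lowest expected cost under that posterior. The function names, the form of the likelihood, and the cost estimate are illustrative assumptions, not the paper's exact heuristics.

```python
def assistant_step(goal_belief, action_likelihood, expected_cost, assist_actions):
    """One myopic step of a decision-theoretic assistant.

    goal_belief        : dict goal -> probability (current posterior)
    action_likelihood  : dict goal -> P(agent's last action | goal)
    expected_cost(a,g) : estimated remaining cost if the assistant does a and the goal is g
    assist_actions     : candidate assistive actions (including a no-op)
    """
    # Bayesian update of the goal posterior from the agent's observed behavior.
    post = {g: p * action_likelihood[g] for g, p in goal_belief.items()}
    z = sum(post.values()) or 1.0
    post = {g: p / z for g, p in post.items()}

    # Myopic action selection: minimize expected cost under the posterior.
    best = min(assist_actions,
               key=lambda a: sum(p * expected_cost(a, g) for g, p in post.items()))
    return post, best
```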
Bayesian Reinforcement Learning in Continuous POMDPs with Application to Robot Navigation
"... We consider the problem of optimal control in continuous and partially observable environments when the parameters of the model are not known exactly. Partially Observable Markov Decision Processes (POMDPs) provide a rich mathematical model to handle such environments but require a known model to be ..."
Abstract
-
Cited by 23 (3 self)
- Add to MetaCart
We consider the problem of optimal control in continuous and partially observable environments when the parameters of the model are not known exactly. Partially Observable Markov Decision Processes (POMDPs) provide a rich mathematical model to handle such environments, but most approaches require a known model to solve them. This is a limitation in practice, as the exact model parameters are often difficult to specify. We adopt a Bayesian approach where a posterior distribution over the model parameters is maintained and updated through experience with the environment. We propose a particle filter algorithm to maintain the posterior distribution and an online planning algorithm, based on trajectory sampling, to plan the best action to perform under the current posterior. The resulting approach selects control actions which optimally trade off between 1) exploring the environment to learn the model, 2) identifying the system’s state, and 3) exploiting its knowledge in order to maximize long-term rewards. Our preliminary results on a simulated robot navigation problem show that our approach is able to learn good models of the sensors and actuators, and performs as well as if it had the true model.
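A minimal sketch of the kind of particle filter step described above, with each particle carrying a joint (state, model-parameter) hypothesis; the helper functions and their signatures are assumptions for illustration, not the paper's implementation.

```python
import numpy as np

def particle_filter_step(particles, weights, u, z, sample_dynamics, obs_likelihood, rng):
    """One update of a particle filter over joint (state, model-parameter) hypotheses.

    particles        : list of (state, params) tuples
    weights          : importance weights, shape (N,)
    u, z             : applied control and received observation
    sample_dynamics  : (state, params, u, rng) -> sampled next state
    obs_likelihood   : (z, state, params) -> likelihood of z
    """
    # Propagate each particle under its own model hypothesis (params stay fixed).
    new_particles = [(sample_dynamics(s, th, u, rng), th) for (s, th) in particles]

    # Re-weight by how well each hypothesis explains the observation.
    w = weights * np.array([obs_likelihood(z, s, th) for (s, th) in new_particles])
    w_sum = w.sum()
    if w_sum == 0:                      # degenerate case: fall back to uniform
        w = np.full(len(w), 1.0 / len(w))
    else:
        w = w / w_sum

    # Resample to avoid weight degeneracy.
    idx = rng.choice(len(new_particles), size=len(new_particles), p=w)
    return [new_particles[i] for i in idx], np.full(len(w), 1.0 / len(w))
```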
Monte Carlo Value Iteration for Continuous-State POMDPs
- WORKSHOP ON THE ALGORITHMIC FOUNDATIONS OF ROBOTICS
, 2010
"... Partially observable Markov decision processes (POMDPs) have been successfully applied to various robot motion planning tasks under uncertainty. However, most existing POMDP algorithms assume a discrete state space, while the natural state space of a robot is often continuous. This paper presents Mo ..."
Abstract
-
Cited by 22 (4 self)
- Add to MetaCart
Partially observable Markov decision processes (POMDPs) have been successfully applied to various robot motion planning tasks under uncertainty. However, most existing POMDP algorithms assume a discrete state space, while the natural state space of a robot is often continuous. This paper presents Monte Carlo Value Iteration (MCVI) for continuous-state POMDPs. MCVI samples both a robot’s state space and the corresponding belief space, and avoids inefficient a priori discretization of the state space as a grid. Both theoretical results and preliminary experimental results indicate that MCVI is a promising new approach for robot motion planning under uncertainty.
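The core trick in MCVI, replacing exact expectations over a continuous state space with sampled rollouts, can be sketched as a Monte Carlo estimate of an action's backed-up value at a belief. The sketch below abstracts the policy graph that MCVI actually constructs into a single policy_graph_value callback; all names and signatures are illustrative assumptions.

```python
import numpy as np

def mc_backup_value(belief_sampler, a, policy_graph_value, step, reward, obs_fn,
                    gamma, n=100, rng=None):
    """Monte Carlo estimate of the value of taking action a at a belief, then
    continuing with an existing policy (one continuation per observation).

    belief_sampler()         : draws a state from the current belief
    step(s, a, rng)          : samples a successor state
    obs_fn(s_next, a, rng)   : samples an observation
    policy_graph_value(o, s) : value of continuing from s after observing o
    """
    rng = rng or np.random.default_rng()
    total = 0.0
    for _ in range(n):
        s = belief_sampler()                 # sample a state from the belief
        s_next = step(s, a, rng)             # simulate the dynamics
        o = obs_fn(s_next, a, rng)           # simulate the sensor
        total += reward(s, a) + gamma * policy_graph_value(o, s_next)
    return total / n
```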
Motion Planning under Uncertainty using Iterative Local Optimization in Belief Space
, 2012
"... We present a new approach to motion planning under sensing and motion uncertainty by computing a locally optimal solution to a continuous partially observable Markov decision process (POMDP). Our approach represent beliefs (the distributions of the robot’s state estimate) by Gaussian distributions a ..."
Abstract
-
Cited by 21 (7 self)
- Add to MetaCart
We present a new approach to motion planning under sensing and motion uncertainty by computing a locally optimal solution to a continuous partially observable Markov decision process (POMDP). Our approach represents beliefs (the distributions of the robot’s state estimate) by Gaussian distributions and is applicable to robot systems with non-linear dynamics and observation models. The method follows the general POMDP solution framework in which we approximate the belief dynamics using an extended Kalman filter and represent the value function by a quadratic function that is valid in the vicinity of a nominal trajectory through belief space. Using a belief-space variant of iterative LQG (iLQG), our approach iterates with second-order convergence towards a linear control policy over the belief space that is locally optimal with respect to a user-defined cost function. Unlike previous work, our approach does not assume maximum-likelihood observations, does not assume fixed estimator or control gains, takes into account obstacles in the environment, and does not require discretization of the state and action spaces. The running time of the algorithm is polynomial, O(n^6), in the dimension n of the state space. We demonstrate the potential of our approach in simulation for holonomic and nonholonomic robots maneuvering through environments with obstacles, with noisy and partial sensing and with non-linear dynamics and observation models.
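The belief-space optimal-control problem described above can be summarized as follows; the symbols are illustrative and not necessarily the authors' exact notation.

```latex
\begin{align*}
\mathbf{b}_t &= (\hat{\mathbf{x}}_t, \Sigma_t)
  && \text{belief: state estimate and covariance} \\
\mathbf{b}_{t+1} &= g(\mathbf{b}_t, \mathbf{u}_t) + W(\mathbf{b}_t, \mathbf{u}_t)\,\boldsymbol{\xi}_t,
  \quad \boldsymbol{\xi}_t \sim \mathcal{N}(0, I)
  && \text{stochastic belief dynamics (EKF update)} \\
\min_{\mathbf{u}_{0:T-1}} \;& \mathbb{E}\Big[ c_T(\mathbf{b}_T) + \sum_{t=0}^{T-1} c_t(\mathbf{b}_t, \mathbf{u}_t) \Big]
  && \text{user-defined cost over beliefs and controls}
\end{align*}
```

Iterative LQG linearizes $g$ and $W$ and quadratizes $c_t$ around a nominal belief-space trajectory, yielding a locally optimal linear feedback policy $\mathbf{u}_t = \mathbf{u}^*_t + L_t(\mathbf{b}_t - \mathbf{b}^*_t)$; repeating the linearize/solve/rollout loop gives the second-order convergence noted above.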
Optimal design of sequential real-time communication systems
- IEEE Trans. Inf. Theory
, 2009
"... Abstract—Optimal design of sequential real-time communication of a Markov source over a noisy channel is investigated. In such a system, the delay between the source output and its reconstruction at the receiver should equal a fixed prespecified amount. An optimal communication strategy must minimiz ..."
Abstract
-
Cited by 21 (5 self)
- Add to MetaCart
Optimal design of sequential real-time communication of a Markov source over a noisy channel is investigated. In such a system, the delay between the source output and its reconstruction at the receiver should equal a fixed, prespecified amount. An optimal communication strategy must minimize the total expected symbol-by-symbol distortion between the source output and its reconstruction. Design techniques or performance bounds for such real-time communication systems are unknown. In this paper, a systematic methodology based on the concepts of information structures and information states is presented for finding an optimal real-time communication strategy. This methodology trades off complexity in communication length (linear in contrast to doubly exponential) with complexity in alphabet sizes (doubly exponential in contrast to exponential). As the communication length is usually orders of magnitude bigger …
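As a compact restatement of the objective described above, in illustrative notation: with $X_t$ the Markov source output, $\hat{X}_{t+d}$ its reconstruction after the fixed delay $d$, and $\rho$ a per-symbol distortion measure, the design problem is

```latex
\[
  \min_{\text{encoder},\;\text{decoder}} \;
  \mathbb{E}\!\left[\, \sum_{t=1}^{T} \rho\!\left( X_t, \hat{X}_{t+d} \right) \right],
\]
```

where the minimization ranges over sequential (real-time) encoding and decoding rules, and the information-state methodology recasts this search as a step-by-step dynamic program rather than a search over entire strategy sequences.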