Results 1-10 of 74
LQG-MP: Optimized Path Planning for Robots with Motion Uncertainty and Imperfect State Information
, 2010
Abstract
Cited by 48 (12 self)
This paper presents LQG-MP (linear-quadratic Gaussian motion planning), a new approach to robot motion planning that takes into account the sensors and the controller that will be used during execution of the robot’s path. LQG-MP is based on the linear-quadratic controller with Gaussian models of uncertainty, and explicitly characterizes in advance (i.e., before execution) the a priori probability distributions of the state of the robot along its path. These distributions can be used to assess the quality of the path, for instance by computing the probability of avoiding collisions. Many methods can be used to generate the needed ensemble of candidate paths from which the best path is selected; in this paper we report results using Rapidly-exploring Random Trees (RRTs). We study the performance of LQG-MP with simulation experiments in three scenarios: A) a kinodynamic car-like robot, B) multi-robot planning with differential-drive robots, and C) a 6-DOF serial manipulator. We also apply Kalman smoothing to make paths C^k-continuous while avoiding obstacles and apply LQG-MP to precomputed roadmaps using a variant of Dijkstra’s algorithm to efficiently find near-optimal paths.
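The a priori distributions that LQG-MP characterizes can be sketched for the simplest linear-Gaussian case: under a fixed LQG feedback gain, the state covariance along the nominal path evolves by a plain recursion. The sketch below is a minimal illustration under assumed linear dynamics, not the paper's full formulation (which also accounts for the Kalman filter's estimation error):

```python
import numpy as np

def propagate_covariances(A, B, K, Q, Sigma0, T):
    """A priori state covariances along a nominal path for the
    closed-loop linear-Gaussian system x' = (A - B K) x + w,
    w ~ N(0, Q). These are known in advance, before execution."""
    Acl = A - B @ K          # closed-loop dynamics under LQG gain K
    Sigmas = [Sigma0]
    for _ in range(T):
        Sigmas.append(Acl @ Sigmas[-1] @ Acl.T + Q)
    return Sigmas
```

From each covariance one can then assess path quality, e.g. by bounding the probability that the robot strays beyond a clearance distance at each step.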
Reinforcement learning with limited reinforcement: Using Bayes risk for active learning in POMDPs
, 2008
Abstract
Cited by 39 (8 self)
Partially Observable Markov Decision Processes (POMDPs) have succeeded in planning domains that require balancing actions that increase an agent’s knowledge and actions that increase an agent’s reward. Unfortunately, most POMDPs are defined with a large number of parameters which are difficult to specify only from domain knowledge. In this paper, we present an approximation approach that allows us to treat the POMDP model parameters as additional hidden state in a “model-uncertainty” POMDP. Coupled with model-directed queries, our planner actively learns good policies. We demonstrate our approach on several POMDP problems.
Model-based Bayesian reinforcement learning in partially observable domains. ISAIM
, 2008
Abstract
Cited by 34 (0 self)
Bayesian reinforcement learning in partially observable domains is notoriously difficult, in part due to the unknown form of the beliefs and the optimal value function. We show that beliefs represented by mixtures of products of Dirichlet distributions are closed under belief updates for factored domains. Belief monitoring algorithms that use this mixture representation are proposed. We also show that the optimal value function is a linear combination of products of Dirichlets for factored domains. Finally, we extend BEETLE, which is a point-based value iteration algorithm for Bayesian RL in fully observable domains, to partially observable domains.
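The closure property at the heart of this abstract comes from Dirichlet conjugacy: observing a transition simply increments a pseudo-count, so a Dirichlet belief over an unknown model distribution stays Dirichlet. A minimal single-Dirichlet sketch (the paper handles mixtures of products of Dirichlets for factored domains):

```python
def dirichlet_update(alpha, outcome):
    """Conjugate update of a Dirichlet belief over one unknown
    categorical distribution (e.g. one row of a POMDP transition
    model): observing an outcome increments its pseudo-count, so
    the posterior is again a Dirichlet. This is the closure
    property the paper extends to mixtures of products."""
    alpha = dict(alpha)  # copy; beliefs are immutable snapshots
    alpha[outcome] = alpha.get(outcome, 0) + 1
    return alpha

def posterior_mean(alpha):
    """Expected outcome probabilities under Dirichlet(alpha)."""
    total = sum(alpha.values())
    return {k: v / total for k, v in alpha.items()}
```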
A survey of point-based POMDP solvers
 AUTON AGENT MULTIAGENT SYST
, 2012
Abstract
Cited by 33 (5 self)
The past decade has seen a significant breakthrough in research on solving partially observable Markov decision processes (POMDPs). Where past solvers could not scale beyond perhaps a dozen states, modern solvers can handle complex domains with many thousands of states. This breakthrough was mainly due to the idea of restricting value function computations to a finite subset of the belief space, permitting only local value updates for this subset. This approach, known as point-based value iteration, avoids the exponential growth of the value function, and is thus applicable for domains with longer horizons, even with relatively large state spaces. Many extensions were suggested to this basic idea, focusing on various aspects of the algorithm—mainly the selection of the belief space subset, and the order of value function updates. In this survey, we walk the reader through the fundamentals of point-based value iteration, explaining the main concepts and ideas. Then, we survey the major extensions to the basic algorithm, discussing their merits. Finally, we include an extensive empirical analysis using well-known benchmarks, in order to shed light on the strengths and limitations of the various approaches.
A Scalable Method for Solving High-Dimensional Continuous POMDPs Using Local Approximation
 Conf. on Uncertainty in Artificial Intelligence
, 2010
Abstract
Cited by 30 (1 self)
Partially Observable Markov Decision Processes (POMDPs) are typically solved by finding an approximate global solution to a corresponding belief-MDP. In this paper, we offer a new planning algorithm for POMDPs with continuous state, action and observation spaces. Since such domains have an inherent notion of locality, we can find an approximate solution using local optimization methods. We parameterize the belief distribution as a Gaussian mixture, and use the Extended Kalman Filter (EKF) to approximate the belief update. Since the EKF is a first-order filter, we can marginalize over the observations analytically. By using feedback control and state estimation during policy execution, we recover a behavior that is effectively conditioned on incoming observations despite the unconditioned planning. Local optimization provides no guarantees of global optimality, but it allows us to tackle domains that are at least an order of magnitude larger than the current state-of-the-art. We demonstrate the scalability of our algorithm by considering a simulated hand-eye coordination domain with 16 continuous state dimensions and 6 continuous action dimensions.
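The Gaussian belief update this abstract relies on is the standard EKF step: linearize the dynamics f and observation model h around the mean and apply the Kalman equations. A generic sketch, with the Jacobians F and H passed in rather than derived:

```python
import numpy as np

def ekf_update(mu, Sigma, u, z, f, h, F, H, Q, R):
    """One extended-Kalman-filter belief update (mu, Sigma) ->
    (mu', Sigma') for nonlinear dynamics f and observation model h,
    with process noise Q and observation noise R. F(mu, u) and
    H(mu) are the Jacobians of f and h."""
    # predict through the (linearized) dynamics
    mu_bar = f(mu, u)
    Fk = F(mu, u)
    Sigma_bar = Fk @ Sigma @ Fk.T + Q
    # correct with the observation
    Hk = H(mu_bar)
    S = Hk @ Sigma_bar @ Hk.T + R
    K = Sigma_bar @ Hk.T @ np.linalg.inv(S)
    mu_new = mu_bar + K @ (z - h(mu_bar))
    Sigma_new = (np.eye(len(mu)) - K @ Hk) @ Sigma_bar
    return mu_new, Sigma_new
```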
A decision-theoretic model of assistance
 In IJCAI
, 2007
Abstract
Cited by 24 (3 self)
There is a growing interest in intelligent assistants for a variety of applications, from organizing tasks for knowledge workers to helping people with dementia. In this paper, we present and evaluate a decision-theoretic framework that captures the general notion of assistance. The objective is to observe a goal-directed agent and to select assistive actions in order to minimize the overall cost. We model the problem as an assistant POMDP where the hidden state corresponds to the agent’s unobserved goals. This formulation allows us to exploit domain models for both estimating the agent’s goals and selecting assistive actions. In addition, the formulation naturally handles uncertainty, varying action costs, and customization to specific agents via learning. We argue that in many domains myopic heuristics will be adequate for selecting actions in the assistant POMDP and present two such heuristics. We evaluate our approach in two domains where human subjects perform tasks in game-like computer environments. The results show that the assistant substantially reduces user effort with only a modest computational effort.
Bayesian Reinforcement Learning in Continuous POMDPs with Application to Robot Navigation
Abstract
Cited by 23 (3 self)
We consider the problem of optimal control in continuous and partially observable environments when the parameters of the model are not known exactly. Partially Observable Markov Decision Processes (POMDPs) provide a rich mathematical model to handle such environments, but most approaches to solving them require a known model. This is a limitation in practice, as the exact model parameters are often difficult to specify. We adopt a Bayesian approach where a posterior distribution over the model parameters is maintained and updated through experience with the environment. We propose a particle filter algorithm to maintain the posterior distribution and an online planning algorithm, based on trajectory sampling, to plan the best action to perform under the current posterior. The resulting approach selects control actions which optimally trade off between 1) exploring the environment to learn the model, 2) identifying the system’s state, and 3) exploiting its knowledge in order to maximize long-term rewards. Our preliminary results on a simulated robot navigation problem show that our approach is able to learn good models of the sensors and actuators, and performs as well as if it had the true model.
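The posterior over model parameters described here can be maintained with a standard reweight-and-resample particle step, where each particle is a candidate model. A minimal sketch with an illustrative `likelihood` function; the paper's algorithm additionally interleaves this with trajectory-sampling planning:

```python
import numpy as np

def particle_filter_step(particles, weights, likelihood, rng):
    """One reweight-and-resample step of a particle filter over
    model parameters: each particle is a candidate model, and
    likelihood(particle) gives p(latest observation | particle).
    Resampling concentrates particles on models that explain the
    data, fighting weight degeneracy."""
    w = weights * np.array([likelihood(p) for p in particles])
    w = w / w.sum()
    n = len(particles)
    idx = rng.choice(n, size=n, p=w)
    return [particles[i] for i in idx], np.full(n, 1.0 / n)
```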
Monte Carlo Value Iteration for Continuous-State POMDPs
 WORKSHOP ON THE ALGORITHMIC FOUNDATIONS OF ROBOTICS
, 2010
Abstract
Cited by 22 (4 self)
Partially observable Markov decision processes (POMDPs) have been successfully applied to various robot motion planning tasks under uncertainty. However, most existing POMDP algorithms assume a discrete state space, while the natural state space of a robot is often continuous. This paper presents Monte Carlo Value Iteration (MCVI) for continuous-state POMDPs. MCVI samples both a robot’s state space and the corresponding belief space, and avoids inefficient a priori discretization of the state space as a grid. Both theoretical results and preliminary experimental results indicate that MCVI is a promising new approach for robot motion planning under uncertainty.
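The sampling idea can be sketched at its simplest: instead of discretizing the continuous state space, estimate each candidate policy's value at a belief by averaging simulated returns over states drawn from that belief. All names below are illustrative; MCVI itself builds a policy graph rather than comparing a fixed policy set:

```python
import random

def mc_best_policy(sample_state, simulate, policies, n=100):
    """Monte Carlo value estimates at a belief over a continuous
    state space: draw states from the belief (sample_state) and
    average simulated returns (simulate), avoiding any a priori
    grid discretization of the state space."""
    best, best_val = None, float('-inf')
    for pi in policies:
        val = sum(simulate(sample_state(), pi) for _ in range(n)) / n
        if val > best_val:
            best, best_val = pi, val
    return best, best_val
```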
Motion Planning under Uncertainty using Iterative Local Optimization in Belief Space
, 2012
Abstract
Cited by 21 (7 self)
We present a new approach to motion planning under sensing and motion uncertainty by computing a locally optimal solution to a continuous partially observable Markov decision process (POMDP). Our approach represents beliefs (the distributions of the robot’s state estimate) by Gaussian distributions and is applicable to robot systems with nonlinear dynamics and observation models. The method follows the general POMDP solution framework in which we approximate the belief dynamics using an extended Kalman filter and represent the value function by a quadratic function that is valid in the vicinity of a nominal trajectory through belief space. Using a belief space variant of iterative LQG (iLQG), our approach iterates with second-order convergence towards a linear control policy over the belief space that is locally optimal with respect to a user-defined cost function. Unlike previous work, our approach does not assume maximum-likelihood observations, does not assume fixed estimator or control gains, takes into account obstacles in the environment, and does not require discretization of the state and action spaces. The running time of the algorithm is polynomial (O(n^6)) in the dimension n of the state space. We demonstrate the potential of our approach in simulation for holonomic and nonholonomic robots maneuvering through environments with obstacles with noisy and partial sensing and with nonlinear dynamics and observation models.
Optimal design of sequential real-time communication systems
 IEEE Trans. Inf. Theory
, 2009
Abstract
Cited by 21 (5 self)
Optimal design of sequential real-time communication of a Markov source over a noisy channel is investigated. In such a system, the delay between the source output and its reconstruction at the receiver should equal a fixed pre-specified amount. An optimal communication strategy must minimize the total expected symbol-by-symbol distortion between the source output and its reconstruction. Design techniques or performance bounds for such real-time communication systems are unknown. In this paper a systematic methodology, based on the concepts of information structures and information states, to search for an optimal real-time communication strategy is presented. This methodology trades off complexity in communication length (linear in contrast to doubly exponential) with complexity in alphabet sizes (doubly exponential in contrast to exponential). As the communication length is usually orders of magnitude bigger ...