Results 1–10 of 17
Adaptive Multi-Robot Wide-Area Exploration and Mapping
"... The exploration problem is a central issue in mobile robotics. A complete terrain coverage is not practical if the environment is large with only a few small hotspots. This paper presents an adaptive multirobot exploration strategy that is novel in performing both widearea coverage and hotspot sam ..."
Cited by 14 (11 self)
Abstract:
The exploration problem is a central issue in mobile robotics. Complete terrain coverage is not practical if the environment is large with only a few small hotspots. This paper presents an adaptive multi-robot exploration strategy that is novel in performing both wide-area coverage and hotspot sampling using non-myopic path planning. As a result, the environmental phenomena can be accurately mapped. It is based on a dynamic programming formulation, which we call the Multi-robot Adaptive Sampling Problem (MASP). A key feature of MASP is in covering the entire adaptivity spectrum, thus allowing strategies of varying adaptivity to be formed and theoretically analyzed in their performance; a more adaptive strategy improves mapping accuracy. We apply MASP to sampling the Gaussian and log
Computational Approaches to Reachability Analysis of Stochastic Hybrid Systems
"... Abstract. This work investigates some of the computational issues involved in the solution of probabilistic reachability problems for discretetime, controlled stochastic hybrid systems. It is first argued that, under rather weak continuity assumptions on the stochastic kernels that characterize the ..."
Cited by 8 (2 self)
Abstract:
This work investigates some of the computational issues involved in the solution of probabilistic reachability problems for discrete-time, controlled stochastic hybrid systems. It is first argued that, under rather weak continuity assumptions on the stochastic kernels that characterize the dynamics of the system, the numerical solution of a discretized version of the probabilistic reachability problem is guaranteed to converge to the optimal one as the discretization level decreases. With reference to a benchmark problem, it is then discussed how some of the structural properties of the hybrid system under study can be exploited to solve the probabilistic reachability problem more efficiently. Possible techniques that can increase the scale-up potential of the proposed numerical approximation scheme are suggested.
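
The discretization argument above lends itself to a compact numeric sketch. The following is a minimal illustration only, assuming a 1-D state, a small finite action set, and an additive-Gaussian transition kernel; the grid, the dynamics, and the names reach_probability and gaussian_kernel are hypothetical and not the paper's benchmark system.

# A minimal sketch of a discretized probabilistic-reachability recursion.
# All names, dynamics, and the 1-D Gaussian kernel are illustrative assumptions.
import numpy as np

def reach_probability(grid, actions, kernel, target, horizon):
    """Max probability of hitting `target` within `horizon` steps.

    grid    : 1-D array of discretized states
    actions : iterable of action values
    kernel  : kernel(x, u) -> array of transition probabilities over `grid`
    target  : boolean array, True where the state is in the target set
    """
    V = target.astype(float)                      # V_N(x) = 1 if x is in the target set
    for _ in range(horizon):
        Q = np.stack([
            np.array([kernel(x, u) @ V for x in grid]) for u in actions
        ])                                        # Q[u, x] = E[V_{k+1} | x, u]
        V = np.where(target, 1.0, Q.max(axis=0))  # absorb in target, else best action
    return V

# Illustrative use: x' = x + u + noise, target set (0.9, 1.1), 10 steps.
grid = np.linspace(-2.0, 2.0, 201)

def gaussian_kernel(x, u, sigma=0.1):
    p = np.exp(-0.5 * ((grid - (x + u)) / sigma) ** 2)
    return p / p.sum()                            # normalize over the finite grid

target = (grid > 0.9) & (grid < 1.1)
V0 = reach_probability(grid, actions=[-0.2, 0.0, 0.2], kernel=gaussian_kernel,
                       target=target, horizon=10)
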
A Heuristic Search Approach to Planning with Continuous Resources in Stochastic Domains
"... We consider the problem of optimal planning in stochastic domains with resource constraints, where the resources are continuous and the choice of action at each step depends on resource availability. We introduce the HAO * algorithm, a generalization of the AO * algorithm that performs search in a h ..."
Cited by 7 (0 self)
Abstract:
We consider the problem of optimal planning in stochastic domains with resource constraints, where the resources are continuous and the choice of action at each step depends on resource availability. We introduce the HAO* algorithm, a generalization of the AO* algorithm that performs search in a hybrid state space that is modeled using both discrete and continuous state variables, where the continuous variables represent monotonic resources. Like other heuristic search algorithms, HAO* leverages knowledge of the start state and an admissible heuristic to focus computational effort on those parts of the state space that could be reached from the start state by following an optimal policy. We show that this approach is especially effective when resource constraints limit how much of the state space is reachable. Experimental results demonstrate its effectiveness in the domain that motivates our research: automated planning for planetary exploration rovers.
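
As a rough illustration of why start-state knowledge, an admissible heuristic, and a monotonic resource prune the reachable space, here is a much-simplified deterministic best-first search over (state, remaining-resource) nodes. HAO* itself searches stochastic AND/OR graphs; the function resource_bounded_search and its arguments are hypothetical stand-ins, not the authors' algorithm.

# A much-simplified, deterministic stand-in for resource-bounded heuristic search:
# expand only nodes reachable from the start without exhausting the resource,
# guided by an admissible heuristic. Everything named here is an illustrative assumption.
import heapq
from itertools import count

def resource_bounded_search(start, goal_test, successors, heuristic, budget):
    """successors(state) yields (next_state, step_cost, resource_cost)."""
    tie = count()
    frontier = [(heuristic(start), next(tie), 0.0, budget, start, [start])]
    best = {}
    while frontier:
        f, _, g, remaining, state, path = heapq.heappop(frontier)
        if goal_test(state):
            return path, g
        key = (state, round(remaining, 6))
        if best.get(key, float("inf")) <= g:
            continue                          # already reached at least as cheaply
        best[key] = g
        for nxt, cost, resource in successors(state):
            if resource > remaining:
                continue                      # prune: not reachable within the budget
            heapq.heappush(frontier, (g + cost + heuristic(nxt), next(tie),
                                      g + cost, remaining - resource,
                                      nxt, path + [nxt]))
    return None, float("inf")                 # no goal reachable with this budget
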
Factored Value Iteration Converges
, 2008
"... Abstract. In this paper we propose a novel algorithm, factored value iteration (FVI), for the approximate solution of factored Markov decision processes (fMDPs). The traditional approximate value iteration algorithm is modified in two ways. For one, the leastsquares projection operator is modified ..."
Cited by 3 (2 self)
Abstract:
In this paper we propose a novel algorithm, factored value iteration (FVI), for the approximate solution of factored Markov decision processes (fMDPs). The traditional approximate value iteration algorithm is modified in two ways. For one, the least-squares projection operator is modified so that it does not increase max-norm, and thus preserves convergence. The other modification is that we uniformly sample polynomially many samples from the (exponentially large) state space. This way, the complexity of our algorithm becomes polynomial in the size of the fMDP description length. We prove that the algorithm is convergent. We also derive an upper bound on the difference between our approximate solution and the optimal one, and also on the error introduced by sampling. We analyze various projection operators with respect to their computational complexity and their convergence when combined with approximate value iteration.
Keywords: factored Markov decision process, value iteration, reinforcement learning.
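
The two modifications described above can be sketched on a flat (non-factored) MDP: rescale the least-squares projection so it cannot increase max-norm, and run value iteration only on a uniform sample of states. The sizes, the random stand-in model, and the variable names below are illustrative assumptions.

# Minimal sketch: sampled approximate value iteration with a max-norm-safe projection.
import numpy as np

rng = np.random.default_rng(0)
n_actions, n_samples, n_features, gamma = 4, 50, 10, 0.95

# Transition/reward model restricted to the uniformly sampled states (generated at
# random here as a stand-in for evaluating an fMDP on those states).
P = rng.dirichlet(np.ones(n_samples), size=(n_actions, n_samples))  # P[a, i, j]
R = rng.random((n_actions, n_samples))                              # R[a, i]
H = rng.random((n_samples, n_features))                             # basis values on samples

# Least-squares projection G = H (H^T H)^{-1} H^T, rescaled row-wise so that
# ||G||_inf <= 1; the projected Bellman operator then stays a max-norm contraction.
G = H @ np.linalg.pinv(H)
G = G / np.maximum(np.abs(G).sum(axis=1, keepdims=True), 1.0)

V = np.zeros(n_samples)
for _ in range(500):
    Q = R + gamma * P @ V            # Bellman backup at the sampled states
    V_new = G @ Q.max(axis=0)        # project back onto span(H)
    if np.max(np.abs(V_new - V)) < 1e-8:
        break
    V = V_new
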
Symbolic dynamic programming for discrete and continuous state MDPs
 In UAI 2011
, 2011
"... Many realworld decisiontheoretic planning problems can be naturally modeled with discrete and continuous state Markov decision processes (DCMDPs). While previous work has addressed automated decisiontheoretic planning for DCMDPs, optimal solutions have only been defined so far for limited setti ..."
Cited by 3 (2 self)
Abstract:
Many real-world decision-theoretic planning problems can be naturally modeled with discrete and continuous state Markov decision processes (DC-MDPs). While previous work has addressed automated decision-theoretic planning for DC-MDPs, optimal solutions have only been defined so far for limited settings, e.g., DC-MDPs having hyper-rectangular piecewise linear value functions. In this work, we extend symbolic dynamic programming (SDP) techniques to provide optimal solutions for a vastly expanded class of DC-MDPs. To address the inherent combinatorial aspects of SDP, we introduce the XADD, a continuous-variable extension of the algebraic decision diagram (ADD) that maintains compact representations of the exact value function. Empirically, we demonstrate an implementation of SDP with XADDs on various DC-MDPs, showing the first optimal automated solutions to DC-MDPs with linear and nonlinear piecewise partitioned value functions and showing the advantages of constraint-based pruning for XADDs.
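
A toy 1-D sketch of the underlying "case" representation may help: a piecewise linear value function stored as explicit (interval, linear-expression) pieces, with the symbolic maximum of two such functions computed piece by piece. XADDs store the same kind of information as decision diagrams with shared substructure; the flat-list casemax below is only an illustrative assumption, not the paper's data structure.

# Toy 1-D "case" representation: V(x) = a*x + b on [lo, hi), stored as (lo, hi, a, b).
# The pointwise max of two piecewise-linear functions is again piecewise linear.

def casemax(f, g):
    """Pointwise max of two piecewise-linear functions over the same domain."""
    cuts = sorted({lo for lo, _, _, _ in f + g} | {hi for _, hi, _, _ in f + g})
    out = []
    for lo, hi in zip(cuts, cuts[1:]):
        mid = (lo + hi) / 2.0
        a1, b1 = next((a, b) for l, h, a, b in f if l <= mid < h)
        a2, b2 = next((a, b) for l, h, a, b in g if l <= mid < h)
        # Split at the crossing point if the two lines intersect inside (lo, hi).
        x = (b2 - b1) / (a1 - a2) if a1 != a2 else None
        pts = [lo] + ([x] if x is not None and lo < x < hi else []) + [hi]
        for l, h in zip(pts, pts[1:]):
            m = (l + h) / 2.0
            out.append((l, h, a1, b1) if a1 * m + b1 >= a2 * m + b2 else (l, h, a2, b2))
    return out

# max(x, 2 - x) on [0, 2): the pieces meet at x = 1.
f = [(0.0, 2.0, 1.0, 0.0)]        # V1(x) = x
g = [(0.0, 2.0, -1.0, 2.0)]       # V2(x) = 2 - x
print(casemax(f, g))              # [(0.0, 1.0, -1.0, 2.0), (1.0, 2.0, 1.0, 0.0)]
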
Kernel-based reinforcement learning on representative states
 In Proceedings of the Twenty-Sixth AAAI Conference on Artificial Intelligence (AAAI)
"... Markov decision processes (MDPs) are an established framework for solving sequential decisionmaking problems under uncertainty. In this work, we propose a new method for batchmode reinforcement learning (RL) with continuous state variables. The method is an approximation to kernelbased RL on a set ..."
Cited by 3 (1 self)
Abstract:
Markov decision processes (MDPs) are an established framework for solving sequential decision-making problems under uncertainty. In this work, we propose a new method for batch-mode reinforcement learning (RL) with continuous state variables. The method is an approximation to kernel-based RL on a set of k representative states. Similarly to kernel-based RL, our solution is a fixed point of a kernelized Bellman operator and can approximate the optimal solution to an arbitrary level of granularity. Unlike kernel-based RL, our method is fast. In particular, our policies can be computed in O(n) time, where n is the number of training examples. The time complexity of kernel-based RL is Ω(n²). We introduce our method, analyze its convergence, and compare it to existing work. The method is evaluated on two existing control problems with 2 to 4 continuous variables and a new problem with 64 variables. In all cases, we outperform state-of-the-art results and offer simpler solutions.
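
A minimal sketch of the idea of kernel-based value iteration restricted to k representative states: Q-values are maintained only at the representative states, and every sampled next state is evaluated by normalized kernel interpolation from them. The Gaussian kernel, the data layout, and the names kernel_vi and normalized_kernel are assumptions for illustration, not the paper's exact estimator.

# Sketch: kernel-based value iteration on k representative states.
import numpy as np

def normalized_kernel(X, Z, bw=0.5):
    """Gaussian kernel weights; rows are points in X, columns are states in Z, rows sum to 1."""
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    K = np.exp(-d2 / (2 * bw ** 2))
    return K / K.sum(axis=1, keepdims=True)

def kernel_vi(samples, Z, gamma=0.95, iters=200):
    """samples[a] = (S, R, S_next) arrays for action a; Z = representative states."""
    n_actions, k = len(samples), len(Z)
    W = [normalized_kernel(Z, S) for S, R, S_next in samples]             # backup weights
    K_next = [normalized_kernel(S_next, Z) for S, R, S_next in samples]   # interpolation weights
    Q = np.zeros((n_actions, k))
    for _ in range(iters):
        # Value of each sampled next state, interpolated from representative states.
        V_next = [np.max(Kn @ Q.T, axis=1) for Kn in K_next]
        Q = np.stack([W[a] @ (samples[a][1] + gamma * V_next[a])
                      for a in range(n_actions)])
    return Q   # Q[a, j]: value of action a at representative state Z[j]

# Illustrative use with random 2-D data: 3 actions, 200 transitions each, 25 representative states.
rng = np.random.default_rng(0)
Z = rng.random((25, 2))
samples = [(rng.random((200, 2)), rng.random(200), rng.random((200, 2))) for _ in range(3)]
Q = kernel_vi(samples, Z)
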
Intelligent Planning for Autonomous Underwater Vehicles
, 2007
"... The aim of my PhD is to develop novel algorithms to allow an Autonomous Underwater Vehicle (AUV) to locate hydrothermal vents on the ocean floor. Hydrothermal vents are tectonicallydriven outgassings of mineralrich superheated water, and they produce a chemicaladvecting plume that can be detected ..."
Cited by 2 (2 self)
Abstract:
The aim of my PhD is to develop novel algorithms to allow an Autonomous Underwater Vehicle (AUV) to locate hydrothermal vents on the ocean floor. Hydrothermal vents are tectonically driven outgassings of mineral-rich superheated water, and they produce a chemical-advecting plume that can be detected from kilometres away. Finding vents is challenging firstly because detecting a chemical tracer from a plume gives very little information on the bearing or range to the source, and secondly because tracers from different vents combine in an additive way, and there is no a priori way of telling how many vents have contributed to a measured signal. I have decomposed the task of finding vents into a mapping problem, where a probabilistic map of nearby vents is constructed, and a planning problem, which uses the uncertain map to determine actions the AUV should take to allow it to find as many vents as possible on a mission, subject to the limited power resources it has. Both problems will require the development of new methods to solve them. The mapping problem is novel because sensors do not provide even an approximate range to their target, there are potentially multiple targets, and
Incremental Structure Learning in Factored MDPs with Continuous States and Actions
"... Learning factored transition models of structured environments has been shown to provide significant leverage when computing optimal policies for tasks within those environments. Previous work has focused on learning the structure of factored Markov Decision Processes (MDPs) with finite sets of stat ..."
Cited by 2 (0 self)
Abstract:
Learning factored transition models of structured environments has been shown to provide significant leverage when computing optimal policies for tasks within those environments. Previous work has focused on learning the structure of factored Markov Decision Processes (MDPs) with finite sets of states and actions. In this work we present an algorithm for online incremental learning of transition models of factored MDPs that have continuous, multidimensional state and action spaces. We use incremental density estimation techniques and information-theoretic principles to learn a factored model of the transition dynamics of an FMDP online from a single, continuing trajectory of experience.
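
One way to picture the information-theoretic ingredient is a per-edge score maintained online: from the single continuing trajectory, track how informative each candidate parent variable is about a next-state variable and keep only the informative ones as parents in the factored model. The bivariate-Gaussian mutual-information formula, the RunningGaussianMI class, and the thresholding rule below are simplifying assumptions, not the paper's method.

# Illustrative sketch: streaming estimate of MI(X; Y) under a bivariate-Gaussian assumption,
# usable as a score for deciding whether X should be a parent of Y in the factored model.
import numpy as np

class RunningGaussianMI:
    def __init__(self):
        self.n = 0
        self.mean = np.zeros(2)
        self.m2 = np.zeros((2, 2))    # running comoment matrix (Welford-style)

    def update(self, x, y):
        self.n += 1
        z = np.array([x, y], dtype=float)
        delta = z - self.mean
        self.mean += delta / self.n
        self.m2 += np.outer(delta, z - self.mean)

    def mi(self):
        if self.n < 3:
            return 0.0
        c = self.m2 / (self.n - 1)
        rho2 = c[0, 1] ** 2 / (c[0, 0] * c[1, 1] + 1e-12)
        return -0.5 * np.log(max(1.0 - rho2, 1e-12))   # MI of a bivariate Gaussian

# Keep one tracker per (candidate parent, next-state variable) pair and add the edge
# to the factored transition model once its estimated MI exceeds a chosen threshold.
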
Hierarchical Strategy Learning with Hybrid Representations
"... Good problem solving knowledge for real life domains is hard to define in a single representation. In some situations, a direct policy is a better choice while in others, value function is better. Typically, direct policy representation is better suited to strategic level plans, while value function ..."
Cited by 2 (1 self)
Abstract:
Good problem-solving knowledge for real-life domains is hard to define in a single representation. In some situations a direct policy is the better choice, while in others a value function is better. Typically, a direct policy representation is better suited to strategic-level plans, while a value function representation is better suited to tactical-level plans. We propose a hybrid hierarchical representation machine (HHRM) in which direct policy representations and value-function-based representations can coexist in a level-wise fashion. We provide simple learning and planning algorithms for our new representation and discuss their application to the Airspace Deconfliction domain. In our experiments, we provided our system LSP with a two-level HHRM for the domain. LSP successfully learned from a limited number of experts' solution traces and showed superior performance compared to the average of human novice learners.
Symbolic Dynamic Programming for Continuous State and Action MDPs
"... Many realworld decisiontheoretic planning problems are naturally modeled using both continuous state and action (CSA) spaces, yet little work has provided exact solutions for the case of continuous actions. In this work, we propose a symbolic dynamic programming (SDP) solution to obtain the optima ..."
Cited by 2 (1 self)
Abstract:
Many real-world decision-theoretic planning problems are naturally modeled using both continuous state and action (CSA) spaces, yet little work has provided exact solutions for the case of continuous actions. In this work, we propose a symbolic dynamic programming (SDP) solution to obtain the optimal closed-form value function and policy for CSA-MDPs with multivariate continuous state and actions, discrete noise, piecewise linear dynamics, and piecewise linear (or restricted piecewise quadratic) reward. Our key contribution over previous SDP work is to show how the continuous action maximization step in the dynamic programming backup can be evaluated optimally and symbolically, a task which amounts to symbolic constrained optimization subject to unknown state parameters; we further integrate this technique to work with an efficient and compact data structure for SDP, the extended algebraic decision diagram (XADD). We demonstrate empirical results on a didactic nonlinear planning example and two domains from operations research to show the first automated exact solution to these problems.
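
A small numeric illustration of why the continuous-action maximization step is tractable for piecewise linear Q-functions: on each action interval the Q-value is linear in the action, so the maximum over a bounded action range is attained at an interval endpoint. The symbolic version keeps the state as a free parameter; the sketch below, with the hypothetical helper max_over_action, fixes a single numeric state.

# Sketch: maximize a piecewise-linear-in-action Q over a bounded action range by
# checking the endpoints of each piece (optimum of a linear function on an interval).

def max_over_action(pieces):
    """pieces: list of (a_lo, a_hi, slope, intercept) with Q(a) = slope*a + intercept."""
    best_a, best_q = None, float("-inf")
    for a_lo, a_hi, slope, intercept in pieces:
        for a in (a_lo, a_hi):                 # linear piece: optimum at an endpoint
            q = slope * a + intercept
            if q > best_q:
                best_a, best_q = a, q
    return best_a, best_q

# Q(x, a) = 2a on [0, 1] and 3 - a on [1, 2] for one fixed state x: maximized at a = 1, Q = 2.
print(max_over_action([(0.0, 1.0, 2.0, 0.0), (1.0, 2.0, -1.0, 3.0)]))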