Results 1-10 of 29
Adaptive Multi-Robot Wide-Area Exploration and Mapping
Cited by 34 (22 self)
The exploration problem is a central issue in mobile robotics. A complete terrain coverage is not practical if the environment is large with only a few small hotspots. This paper presents an adaptive multi-robot exploration strategy that is novel in performing both wide-area coverage and hotspot sampling using non-myopic path planning. As a result, the environmental phenomena can be accurately mapped. It is based on a dynamic programming formulation, which we call the Multi-robot Adaptive Sampling Problem (MASP). A key feature of MASP is in covering the entire adaptivity spectrum, thus allowing strategies of varying adaptivity to be formed and theoretically analyzed in their performance; a more adaptive strategy improves mapping accuracy. We apply MASP to sampling the Gaussian and log
Computational Approaches to Reachability Analysis of Stochastic Hybrid Systems
Cited by 21 (8 self)
This work investigates some of the computational issues involved in the solution of probabilistic reachability problems for discrete-time, controlled stochastic hybrid systems. It is first argued that, under rather weak continuity assumptions on the stochastic kernels that characterize the dynamics of the system, the numerical solution of a discretized version of the probabilistic reachability problem is guaranteed to converge to the optimal one, as the discretization level decreases. With reference to a benchmark problem, it is then discussed how some of the structural properties of the hybrid system under study can be exploited to solve the probabilistic reachability problem more efficiently. Possible techniques that can increase the scale-up potential of the proposed numerical approximation scheme are suggested.
A Heuristic Search Approach to Planning with Continuous Resources in Stochastic Domains
Cited by 12 (0 self)
We consider the problem of optimal planning in stochastic domains with resource constraints, where the resources are continuous and the choice of action at each step depends on resource availability. We introduce the HAO* algorithm, a generalization of the AO* algorithm that performs search in a hybrid state space that is modeled using both discrete and continuous state variables, where the continuous variables represent monotonic resources. Like other heuristic search algorithms, HAO* leverages knowledge of the start state and an admissible heuristic to focus computational effort on those parts of the state space that could be reached from the start state by following an optimal policy. We show that this approach is especially effective when resource constraints limit how much of the state space is reachable. Experimental results demonstrate its effectiveness in the domain that motivates our research: automated planning for planetary exploration rovers.
Factored value iteration converges
 Acta Cyb
Cited by 12 (2 self)
In this paper we propose a novel algorithm, factored value iteration (FVI), for the approximate solution of factored Markov decision processes (fMDPs). The traditional approximate value iteration algorithm is modified in two ways. For one, the least-squares projection operator is modified so that it does not increase max-norm, and thus preserves convergence. The other modification is that we uniformly sample polynomially many samples from the (exponentially large) state space. This way, the complexity of our algorithm becomes polynomial in the size of the fMDP description length. We prove that the algorithm is convergent. We also derive an upper bound on the difference between our approximate solution and the optimal one, and also on the error introduced by sampling. We analyze various projection operators with respect to their computation complexity and their convergence when combined with approximate value iteration.
Keywords: factored Markov decision process, value iteration, reinforcement learning
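For orientation, the baseline that approximate schemes such as FVI start from is exact tabular value iteration, whose Bellman backup is a contraction in max-norm. The sketch below is not FVI itself (it has no factored representation, projection, or sampling); it is plain value iteration on a hypothetical two-state, two-action MDP, and the names `value_iteration`, `P`, and `R` are illustrative assumptions.

```python
def value_iteration(P, R, gamma=0.9, tol=1e-8, max_iter=10_000):
    """Exact tabular value iteration (not the factored variant).

    P[a][s][t]: probability of moving from state s to t under action a.
    R[a][s]:    expected immediate reward for taking action a in state s.
    Returns the value estimate and a greedy policy (one action per state).
    """
    n_actions, n_states = len(P), len(P[0])
    V = [0.0] * n_states
    for _ in range(max_iter):
        # Bellman backup: Q(a, s) = R(a, s) + gamma * sum_t P(a, s, t) V(t)
        Q = [[R[a][s] + gamma * sum(P[a][s][t] * V[t] for t in range(n_states))
              for s in range(n_states)]
             for a in range(n_actions)]
        V_new = [max(Q[a][s] for a in range(n_actions)) for s in range(n_states)]
        # Max-norm change between sweeps; it shrinks by at least gamma each time.
        delta = max(abs(x - y) for x, y in zip(V_new, V))
        V = V_new
        if delta < tol:
            break
    policy = [max(range(n_actions), key=lambda a: Q[a][s]) for s in range(n_states)]
    return V, policy


# Hypothetical toy MDP: action 0 stays put, action 1 switches state;
# being in state 1 pays reward 1 regardless of the action taken.
P = [[[1.0, 0.0], [0.0, 1.0]],
     [[0.0, 1.0], [1.0, 0.0]]]
R = [[0.0, 1.0],
     [0.0, 1.0]]
V, policy = value_iteration(P, R)
print(V, policy)  # V is close to [9, 10]; the greedy policy is [1, 0]
```

The stopping rule relies on the max-norm contraction property; the abstract's point is that a projection step must not increase max-norm if this convergence argument is to survive approximation.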
Symbolic Dynamic Programming for Continuous State and Action MDPs
Cited by 8 (3 self)
Many real-world decision-theoretic planning problems are naturally modeled using both continuous state and action (CSA) spaces, yet little work has provided exact solutions for the case of continuous actions. In this work, we propose a symbolic dynamic programming (SDP) solution to obtain the optimal closed-form value function and policy for CSA-MDPs with multivariate continuous state and actions, discrete noise, piecewise linear dynamics, and piecewise linear (or restricted piecewise quadratic) reward. Our key contribution over previous SDP work is to show how the continuous action maximization step in the dynamic programming backup can be evaluated optimally and symbolically, a task which amounts to symbolic constrained optimization subject to unknown state parameters; we further integrate this technique to work with an efficient and compact data structure for SDP, the extended algebraic decision diagram (XADD). We demonstrate empirical results on a didactic nonlinear planning example and two domains from operations research to show the first automated exact solution to these problems.
Kernel-based reinforcement learning on representative states
 In Proceedings of the Twenty-Sixth AAAI Conference on Artificial Intelligence, (AAAI
Cited by 6 (1 self)
Markov decision processes (MDPs) are an established framework for solving sequential decision-making problems under uncertainty. In this work, we propose a new method for batch-mode reinforcement learning (RL) with continuous state variables. The method is an approximation to kernel-based RL on a set of k representative states. Similarly to kernel-based RL, our solution is a fixed point of a kernelized Bellman operator and can approximate the optimal solution to an arbitrary level of granularity. Unlike kernel-based RL, our method is fast. In particular, our policies can be computed in O(n) time, where n is the number of training examples. The time complexity of kernel-based RL is Ω(n^2). We introduce our method, analyze its convergence, and compare it to existing work. The method is evaluated on two existing control problems with 2 to 4 continuous variables and a new problem with 64 variables. In all cases, we outperform state-of-the-art results and offer simpler solutions.
Symbolic dynamic programming for discrete and continuous state MDPs
 In UAI-2011
, 2011
Cited by 5 (4 self)
Many real-world decision-theoretic planning problems can be naturally modeled with discrete and continuous state Markov decision processes (DC-MDPs). While previous work has addressed automated decision-theoretic planning for DC-MDPs, optimal solutions have only been defined so far for limited settings, e.g., DC-MDPs having hyper-rectangular piecewise linear value functions. In this work, we extend symbolic dynamic programming (SDP) techniques to provide optimal solutions for a vastly expanded class of DC-MDPs. To address the inherent combinatorial aspects of SDP, we introduce the XADD (a continuous variable extension of the algebraic decision diagram, ADD) that maintains compact representations of the exact value function. Empirically, we demonstrate an implementation of SDP with XADDs on various DC-MDPs, showing the first optimal automated solutions to DC-MDPs with linear and nonlinear piecewise partitioned value functions and showing the advantages of constraint-based pruning for XADDs.
Ergodic control and polyhedral approaches to PageRank optimization
 IEEE Transactions on Automatic Control
, 2013
Cited by 5 (0 self)
We study a general class of PageRank optimization problems which consist in finding an optimal outlink strategy for a web site subject to design constraints. We consider both a continuous problem, in which one can choose the intensity of a link, and a discrete one, in which in each page, there are obligatory links, facultative links and forbidden links. We show that the continuous problem, as well as its discrete variant when there are no constraints coupling different pages, can both be modeled by constrained Markov decision processes with ergodic reward, in which the webmaster determines the transition probabilities of websurfers. Although the number of actions turns out to be exponential, we show that an associated polytope of transition measures has a concise representation, from which we deduce that the continuous problem is solvable in polynomial time, and that the same is true for the discrete problem when there are no coupling constraints. We also provide efficient algorithms, adapted to very large networks. Then, we investigate the qualitative features of optimal outlink strategies, and identify in particular assumptions under which there exists a “master” page to which all controlled pages should point. We report numerical results on fragments of the real web graph.
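To make the object being optimized concrete, the sketch below computes PageRank by power iteration under the usual random-surfer model and compares two outlink choices for one controlled page. It is an illustration only and not the paper's optimization method; the damping factor 0.85, the uniform teleportation, the function name `pagerank`, and the three-page graphs are all assumptions.

```python
def pagerank(out_links, damping=0.85, tol=1e-12, max_iter=1000):
    """PageRank by power iteration.

    out_links[p] lists the pages that page p links to. A surfer follows a
    uniformly chosen outlink with probability `damping` and otherwise jumps
    to a uniformly random page. Pages without outlinks spread their rank
    uniformly over all pages.
    """
    n = len(out_links)
    rank = [1.0 / n] * n
    for _ in range(max_iter):
        new = [(1.0 - damping) / n] * n
        for page, targets in enumerate(out_links):
            if targets:
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                for t in range(n):
                    new[t] += damping * rank[page] / n
        # L1 change between successive iterates, used as the stopping rule.
        delta = sum(abs(a - b) for a, b in zip(new, rank))
        rank = new
        if delta < tol:
            break
    return rank


# Page 0 plays the role of the controlled page; pages 1 and 2 link back to it.
links_a = [[1, 2], [0], [0]]  # page 0 links to both other pages
links_b = [[1], [0], [0]]     # page 0 links only to page 1
print(pagerank(links_a))
print(pagerank(links_b))
```

Changing a page's outlinks changes the surfer's transition probabilities from that page, which is the sense in which the webmaster's choice acts as a control in a Markov decision process.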
Probabilistic reachability for stochastic hybrid systems: Theory, computations, and applications
, 2007
Cited by 4 (0 self)
Copyright © 2007 by Alessandro Abate. Probabilistic Reachability for Stochastic Hybrid Systems:
Approximate dynamic programming via sum of squares programming
 In Proc. of the European Control Conference
, 2013
Cited by 3 (1 self)
We describe an approximate dynamic programming method for stochastic control problems on infinite state and input spaces. The optimal value function is approximated by a linear combination of basis functions with coefficients as decision variables. By relaxing the Bellman equation to an inequality, one obtains a linear program in the basis coefficients with an infinite set of constraints. We show that a recently introduced method, which obtains convex quadratic value function approximations, can be extended to higher order polynomial approximations via sum of squares programming techniques. An approximate value function can then be computed offline by solving a semidefinite program, without having to sample the infinite constraint. The policy is evaluated online by solving a polynomial optimization problem, which also turns out to be convex in some cases. We experimentally validate the method on an autonomous helicopter testbed using a 10-dimensional helicopter model.
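The relaxation mentioned in this abstract can be written out as follows. This is a sketch under assumptions the abstract does not state: a discounted cost-minimization problem with stage cost $\ell$, discount factor $\gamma \in (0,1)$, dynamics $x^+ = f(x,u,w)$ with random disturbance $w$, basis functions $\phi_i$, and a state-relevance weighting $c$. All of these symbols are illustrative.

```latex
\begin{align*}
  V^*(x) &= \min_{u} \Big\{ \ell(x,u) + \gamma\, \mathbb{E}_w\big[ V^*(f(x,u,w)) \big] \Big\}
    && \text{(Bellman equation)} \\
  \hat V(x) &= \sum_{i=1}^{K} \alpha_i\, \phi_i(x)
    && \text{(approximation, linear in } \alpha\text{)} \\
  \max_{\alpha} \; & \int \hat V(x)\, c(dx)
    \quad \text{s.t.} \quad
    \hat V(x) \le \ell(x,u) + \gamma\, \mathbb{E}_w\big[ \hat V(f(x,u,w)) \big]
    \;\; \forall (x,u)
    && \text{(relaxed linear program)}
\end{align*}
```

Each constraint is linear in the coefficients $\alpha$, and there is one constraint per state-input pair, hence infinitely many. Under standard boundedness assumptions, any feasible $\hat V$ is a pointwise lower bound on $V^*$, which is why the objective pushes $\hat V$ upward.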