Planning with Incomplete Information as Heuristic Search in Belief Space
, 2000
Cited by 174 (23 self)
The formulation of planning as heuristic search with heuristics derived from problem representations has turned out to be a fruitful approach for classical planning. In this paper, we pursue a similar idea in the context of planning with incomplete information. Planning with incomplete information can be formulated as a problem of search in belief space, where belief states can be either sets of states or, more generally, probability distributions over states. While the formulation (like the formulation of classical planning as heuristic search) is not particularly novel, the contribution of this paper is to make it explicit, to test it over a number of domains, and to extend it to tasks like planning with sensing where the standard search algorithms do not apply. The resulting planner appears to be competitive with the most recent conformant and contingent planners (e.g., cgp, sgp, and cmbp) while at the same time being more general, as it can handle probabilistic actions and se...
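As a rough illustration of search in belief space, the sketch below treats a belief state as a set of possible states and searches for a sequence of actions that reaches the goal from every state in the set. It is a minimal sketch under stated assumptions, not the planner from the paper: it uses plain breadth-first search rather than a heuristic, assumes deterministic actions and no sensing, and the names `progress`, `conformant_plan`, and the corridor domain are hypothetical.

```python
from collections import deque

def progress(belief, action, transition):
    """Apply a deterministic action to every state in a belief state (a set of states)."""
    return frozenset(transition(s, action) for s in belief)

def conformant_plan(initial_belief, actions, transition, is_goal):
    """Breadth-first search over belief states; returns a list of actions or None."""
    start = frozenset(initial_belief)
    frontier = deque([(start, [])])
    seen = {start}
    while frontier:
        belief, plan = frontier.popleft()
        # The plan succeeds only if every state still considered possible is a goal.
        if all(is_goal(s) for s in belief):
            return plan
        for a in actions:
            nxt = progress(belief, a, transition)
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, plan + [a]))
    return None

# Hypothetical toy domain: an agent somewhere on cells 0..3 of a corridor,
# with unknown position; moving into a wall leaves it in place.
def move(state, action):
    step = -1 if action == "left" else 1
    return min(3, max(0, state + step))

plan = conformant_plan({0, 1, 2, 3}, ["left", "right"], move, lambda s: s == 0)
print(plan)
```

Replacing the FIFO queue with a priority queue ordered by a heuristic over belief states would move this sketch closer to the heuristic-search formulation the abstract describes.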
A Robust and Fast Action Selection Mechanism for Planning
- In Proceedings of AAAI-97
, 1997
Cited by 127 (17 self)
The ability to plan and react in dynamic environments is central to intelligent behavior, yet few algorithms have managed to combine fast planning with robust execution. In this paper we develop one such algorithm by looking at planning as real-time search. For that we develop a variation of Korf's Learning Real-Time A* algorithm together with a suitable heuristic function. The resulting algorithm interleaves lookahead with execution and never builds a plan. It is an action selection mechanism that decides at each time point what to do next. Yet it solves hard planning problems faster than any domain-independent planning algorithm known to us, including the powerful SAT planner recently introduced by Kautz and Selman. It also works in the presence of perturbations and noise, and can be given a fixed time window to operate. We illustrate each of these features by running the algorithm on a number of benchmark problems.
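To make the idea of interleaving lookahead with execution concrete, here is a minimal sketch of plain LRTA* with one-step lookahead. It is not the variation developed in the paper; the function name `lrta_star`, its parameters, and the line-graph example are assumptions chosen for illustration.

```python
def lrta_star(start, is_goal, successors, heuristic, max_steps=1000):
    """Plain LRTA* with one-step lookahead.

    successors(s) yields (next_state, cost) pairs.
    Returns the list of visited states, or None if no goal is reached.
    """
    h = {}  # learned heuristic values, initialised lazily from `heuristic`

    def value(s):
        if s not in h:
            h[s] = heuristic(s)
        return h[s]

    state, path = start, [start]
    for _ in range(max_steps):
        if is_goal(state):
            return path
        # Lookahead: pick the successor minimising step cost plus estimated cost-to-go.
        best_next, best_f = None, float("inf")
        for nxt, cost in successors(state):
            f = cost + value(nxt)
            if f < best_f:
                best_next, best_f = nxt, f
        if best_next is None:
            return None  # dead end
        h[state] = best_f  # learning step: store the lookahead value for this state
        state = best_next  # execution step: act immediately, no plan is built
        path.append(state)
    return None

# Hypothetical example: states 0..4 on a line, unit step costs, goal at 4.
def line_successors(s):
    return [(n, 1) for n in (s - 1, s + 1) if 0 <= n <= 4]

path = lrta_star(0, lambda s: s == 4, line_successors, lambda s: 0)
print(path)
```

Each loop iteration selects and executes a single action, which is why a fixed time budget per decision fits this scheme naturally.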
Average Reward Reinforcement Learning: Foundations, Algorithms, and Empirical Results
, 1996
Cited by 80 (12 self)
This paper presents a detailed study of average reward reinforcement learning, an undiscounted optimality framework that is more appropriate for cyclical tasks than the much better studied discounted framework. A wide spectrum of average reward algorithms is described, ranging from synchronous dynamic programming methods to several (provably convergent) asynchronous algorithms from optimal control and learning automata. A general sensitive discount optimality metric called n-discount-optimality is introduced and used to compare the various algorithms. The overview identifies a key similarity across several asynchronous algorithms that is crucial to their convergence, namely independent estimation of the average reward and the relative values. The overview also uncovers a surprising limitation shared by the different algorithms: while several algorithms can provably generate gain-optimal policies that maximize average reward, none of them can reliably filter these to produce bias-optimal (or T-optimal) policies that also maximize the finite reward to absorbing goal states. This paper also presents a detailed empirical study of R-learning, an average reward reinforcement learning method, using two empirical testbeds: a stochastic grid world domain and a simulated robot environment. A detailed sensitivity analysis of R-learning is carried out to test its dependence on learning rates and exploration levels. The results suggest that R-learning is quite sensitive to exploration strategies and can fall into sub-optimal limit cycles. The performance of R-learning is also compared with that of Q-learning, the best studied discounted RL method. Here, the results suggest that R-learning can be fine-tuned to give better performance than Q-learning in both domains.
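The separate estimation of the average reward and the relative values can be sketched with a single R-learning update step. This is a minimal sketch of the commonly stated update rule, not code from the paper: the function name, the learning-rate values, and the choice to compute the greedy check and both maxima before updating the table are assumptions.

```python
from collections import defaultdict

def r_learning_update(R, rho, s, a, r, s_next, actions, beta=0.1, alpha=0.01):
    """One R-learning step.

    R maps (state, action) to a relative value; rho is the current
    average-reward estimate. Mutates R in place and returns the new rho.
    """
    best_here = max(R[(s, b)] for b in actions)
    best_next = max(R[(s_next, b)] for b in actions)
    was_greedy = R[(s, a)] == best_here  # assumption: checked before the update
    # Relative-value update: reward is measured against the average reward rho.
    R[(s, a)] += beta * (r - rho + best_next - R[(s, a)])
    # Average-reward update: applied only when the action taken was greedy.
    if was_greedy:
        rho += alpha * (r + best_next - best_here - rho)
    return rho

# Hypothetical one-state, one-action example with reward 1.
R = defaultdict(float)
rho = r_learning_update(R, 0.0, s=0, a="stay", r=1.0, s_next=0, actions=["stay"])
print(R[(0, "stay")], rho)
```

Using two learning rates, `beta` for the relative values and `alpha` for the average reward, is what keeps the two estimates independent; the sensitivity to these rates is one of the things the abstract says the paper examines.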
Auto-exploratory Average Reward Reinforcement Learning
- Artificial Intelligence
, 1996
Cited by 28 (8 self)
We introduce a model-based average reward Reinforcement Learning method called H-learning and compare it with its discounted counterpart, Adaptive Real-Time Dynamic Programming, in a simulated robot scheduling task. We also introduce an extension to H-learning, which automatically explores the unexplored parts of the state space, while always choosing greedy actions with respect to the current value function. We show that this "Auto-exploratory H-learning" performs better than the original H-learning under previously studied exploration methods such as random, recency-based, or counter-based exploration.
Introduction
Reinforcement Learning (RL) is the study of learning agents that improve their performance at some task by receiving rewards and punishments from the environment. Most approaches to reinforcement learning, including Q-learning (Watkins and Dayan 92) and Adaptive Real-Time Dynamic Programming (ARTDP) (Barto, Bradtke, & Singh 95), optimize the total discounted reward the ...
Scaling Up Average Reward Reinforcement Learning by Approximating the Domain Models and the Value Function
- In Saitta
, 1996
Cited by 25 (1 self)
Almost all the work in Average-reward Reinforcement Learning (ARL) so far has focused on table-based methods that do not scale to domains with large state spaces. In this paper, we propose two extensions to a model-based ARL method called H-learning to address the scale-up problem. We extend H-learning to learn action models and reward functions in the form of Bayesian networks, and approximate its value function using local linear regression. We test our algorithms on several scheduling tasks for a simulated Automatic Guided Vehicle (AGV) and show that they are effective in significantly reducing the space requirement of H-learning and making it converge faster. To the best of our knowledge, our results are the first in applying function approximation to ARL.
1 Introduction
Most Reinforcement Learning (RL) methods optimize the discounted total reward received by an agent (Barto, Bradtke, & Singh, 1995; Watkins & Dayan, 1992). However, in many real-world domains, the natural criterio...
Greedy linear value-approximation for factored Markov decision processes
- In Proceedings of the 18th National Conference on Artificial Intelligence
, 2002
Cited by 24 (7 self)
Significant recent work has focused on using linear representations to approximate value functions for factored Markov decision processes (MDPs). Current research has adopted linear programming as an effective means to calculate approximations for a given set of basis functions, tackling very large MDPs as a result. However, a number of issues remain unresolved: How accurate are the approximations produced by linear programs? How hard is it to produce better approximations? And where do the basis functions come from? To address these questions, we first investigate the complexity of minimizing the Bellman error of a linear value function approximation, showing that this is an inherently hard problem.
Relational reinforcement learning: An overview
- In Proceedings of the ICML’04 Workshop on Relational Reinforcement Learning
, 2004
Cited by 23 (3 self)
Relational reinforcement learning (RRL) is both a young and an old field. In this paper, we trace the history of the field to related disciplines, outline some current work and promising new directions, and survey the research issues and opportunities that lie ahead.
On-line Decision-Theoretic Golog for Unpredictable Domains
Cited by 17 (7 self)
DTGolog was proposed by Boutilier et al. as an integration of decision-theoretic (DT) planning and the programming language Golog. Advantages include the ability to handle large state spaces and to limit the search space during planning with explicit programming. Soutchanski developed a version of DTGolog, where a program is executed on-line and DT planning can be applied to parts of a program only. One of the limitations is that DT planning generally cannot be applied to programs containing sensing actions. In order to deal with robotic scenarios in unpredictable domains, where certain kinds of sensing like measuring one's own position are ubiquitous, we propose a strategy where sensing during deliberation is replaced by suitable models like computed trajectories so that DT planning remains applicable. In the paper we discuss the necessary changes to DTGolog entailed by this strategy and an application of our approach in the ROBOCUP domain.
Machine Learning for Robots: A Comparison of Different Paradigms
- In Workshop on Towards Real Autonomy, IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS-96)
, 1996
Cited by 13 (1 self)
For robots to be truly flexible, they need to be able to learn to adapt to partially known or dynamic environments, to teach themselves new tasks, and to compensate for sensor and effector defects. The problem of robot learning has been an intensively studied research topic over the last decade. In this paper we critically examine four major formulations of the robot learning problem: inductive concept learning, explanation-based learning, reinforcement learning, and evolutionary learning. We describe some well-known examples of systems that fit under each formulation, and discuss their strengths and limitations.
The NSF Workshop on Reinforcement Learning: Summary and Observations
- AI Magazine
, 1996
Cited by 10 (3 self)
Reinforcement learning (RL) has become one of the most actively studied learning

