On planning and exploration in non-discrete environments (1991)

by Sebastian B. Thrun, Knut Möller

Results 1 - 6 of 6

Efficient Exploration In Reinforcement Learning

by Sebastian B. Thrun, 1992
Cited by 115 (4 self)
Exploration plays a fundamental role in any active learning system. This study evaluates the role of exploration in active learning and describes several local techniques for exploration in finite, discrete domains, embedded in a reinforcement learning framework (delayed reinforcement). This paper distinguishes between two families of exploration schemes: undirected and directed exploration. While the former family is closely related to random walk exploration, directed exploration techniques memorize exploration-specific knowledge which is used for guiding the exploration search. In many finite deterministic domains, any learning technique based on undirected exploration is inefficient in terms of learning time, i.e. learning time is expected to scale exponentially with the size of the state space (Whitehead, 1991b). We prove that for all these domains, reinforcement learning using a directed technique can always be performed in polynomial time, demonstrating the important role of e...
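
The gap between undirected and directed exploration described in this abstract can be illustrated with a small sketch. Everything below is an assumption made for illustration and is not taken from the paper: the "reset chain" domain (one action advances, the other jumps back to the start), the function names, and the particular directed rule (remember observed transitions and head for the nearest state that still has an untried action). It is one simple instance of a directed scheme, not the specific techniques the paper analyses.

```python
from collections import deque

def step(state, action, n):
    """Hypothetical reset chain: action 0 advances one state, action 1 jumps back to state 0."""
    return min(state + 1, n) if action == 0 else 0

def random_walk_expected_steps(n):
    """Exact expected steps for a uniform random walk to first reach state n.

    Solving E[s] = 1 + 0.5*E[s+1] + 0.5*E[0] with E[n] = 0 gives 2**(n+1) - 2,
    i.e. exponential growth in the length of the chain.
    """
    return 2 ** (n + 1) - 2

def first_action_toward_untried(model, start, actions):
    """Breadth-first search in the learned model for the closest state with an untried action."""
    queue = deque([(start, None)])
    seen = {start}
    while queue:
        s, first = queue.popleft()
        if any((s, a) not in model for a in actions):
            return first
        for a in actions:
            nxt = model[(s, a)]
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, a if first is None else first))
    raise RuntimeError("every reachable state is fully explored")

def directed_exploration_steps(n, actions=(1, 0)):
    """Count steps until state n is reached when exploration is guided by memory."""
    model = {}  # (state, action) -> next state, filled in from experience
    state, steps = 0, 0
    while state != n:
        untried = [a for a in actions if (state, a) not in model]
        if untried:
            action = untried[0]  # with the default order, 'reset' is tried first (unfavourable case)
        else:
            action = first_action_toward_untried(model, state, actions)
        nxt = step(state, action, n)
        model[(state, action)] = nxt
        state, steps = nxt, steps + 1
    return steps

if __name__ == "__main__":
    for n in (5, 10, 20):
        print(n, random_walk_expected_steps(n), directed_exploration_steps(n))
```

In this toy domain the directed explorer's step count grows quadratically with the chain length, while the random walk's expected step count doubles with every added state.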

Curious Model-Building Control Systems

by Jürgen Schmidhuber - In Proc. International Joint Conference on Neural Networks, Singapore, 1991
Cited by 77 (19 self)
A controller is a device which receives inputs from a (dynamic) environment and produces outputs that manipulate the environmental state. A model-building control system is a controller with an additional module (the `world model') which is trained to predict future inputs from previous input/action pairs. The novel curious model-building control system described in this paper is a model-building control system which actively tries to provoke situations for which it learned to expect to learn something about the environment. Such a system has been implemented as a 4-network system based on Watkins' Q-learning algorithm which can be used to maximize the expectation of the temporal derivative of the adaptive assumed reliability of future predictions. An experiment with an artificial non-deterministic environment demonstrates that the system can be superior to previous model-building control systems (the latter do not address the problem of modelling the reliability of the world model's p...

Exploration of Multi-State Environments: Local Measures and Back-Propagation of Uncertainty

by Nicolas Meuleau, Sridhar Mahadevan, 1998
Cited by 39 (1 self)
This paper presents an action selection technique for reinforcement learning in stationary Markovian environments. This technique may be used in direct algorithms such as Q-learning, or in indirect algorithms such as adaptive dynamic programming. It is based on two principles. The first is to define a local measure of the uncertainty using the theory of bandit problems. We show that such a measure suffers from several drawbacks. In particular, a direct application of it leads to algorithms of low quality that can be easily misled by particular configurations of the environment. The second basic principle was introduced to eliminate this drawback. It consists of assimilating the local measures of uncertainty to rewards, and back-propagating them with the dynamic programming or temporal difference mechanisms. This allows reproducing global-scale reasoning about the uncertainty, using only local measures of it. Numerical simulations clearly show the efficiency of these propositions. Keywords: ...
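
The second principle in this abstract, treating local uncertainty as a reward and backing it up through the state space, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's algorithm: the count-based measure `1 / sqrt(1 + count)` stands in for the bandit-based measure the authors define, and the function names, table layout, step size `alpha`, and discount `gamma` are all hypothetical.

```python
import math

def local_uncertainty(count):
    """Hypothetical local measure: shrinks as a state-action pair is tried more often."""
    return 1.0 / math.sqrt(1 + count)

def backup_uncertainty(E, counts, s, a, s_next, alpha=0.5, gamma=0.9):
    """Treat the local uncertainty of (s, a) as a reward and apply a Q-learning-style backup."""
    counts[s][a] += 1
    bonus = local_uncertainty(counts[s][a])
    target = bonus + gamma * max(E[s_next])  # uncertainty reachable from the successor state
    E[s][a] += alpha * (target - E[s][a])
    return bonus

def select_action(Q, E, s, weight=1.0):
    """Greedy choice on task value plus weighted exploration value."""
    scores = [q + weight * e for q, e in zip(Q[s], E[s])]
    return scores.index(max(scores))

def demo():
    """Three states, two actions; state 2 is uncertain, state 1 is locally well explored."""
    Q = [[0.0, 0.0] for _ in range(3)]
    E = [[0.0, 0.0] for _ in range(3)]
    counts = [[0, 0] for _ in range(3)]
    E[2] = [1.0, 1.0]
    counts[1] = [100, 100]
    bonus = backup_uncertainty(E, counts, s=1, a=0, s_next=2)
    return bonus, Q, E

if __name__ == "__main__":
    bonus, Q, E = demo()
    print("local bonus:", bonus)
    print("backed-up exploration value:", E[1][0])
    print("chosen action in state 1:", select_action(Q, E, 1))
```

In the demo, the local bonus for the well-tried action in state 1 is small, yet its backed-up exploration value is large because the action leads to the uncertain state 2. That is the global-scale effect the abstract attributes to back-propagation.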

Adaptive Confidence And Adaptive Curiosity

by Jürgen Schmidhuber - Institut für Informatik, Technische Universität München, Arcisstr. 21, 8000 München 2, 1991
Cited by 14 (0 self)
Much of the recent research on adaptive neuro-control and reinforcement learning focusses on systems with adaptive `world models'. Previous approaches, however, do not address the problem of modelling the reliability of the world model's predictions in uncertain environments. Furthermore, with previous approaches usually some ad-hoc method (like random search) is used to train the world model to predict future environmental inputs from previous inputs and control outputs of the system. This paper introduces ways for modelling the reliability of the outputs of adaptive predictors, and it describes more sophisticated and sometimes more efficient methods for their adaptive construction by on-line state space exploration: For instance, a 4-network reinforcement learning system is described which tries to maximize the expectation of the temporal derivative of the adaptive assumed reliability of future predictions. The system is `curious' in the sense that it actively tries to provoke situat...

Learning By Error-Driven Decomposition

by Dieter Fox, Volker Heinze, Knut Möller, Sebastian Thrun, Gerd Veenker - In Proc. of Neuro-Nîmes, 1991
Cited by 1 (0 self)
In this paper we describe a new self-organizing decomposition technique for learning high-dimensional mappings. Problem decomposition is performed in an error-driven manner, such that the resulting subtasks (patches) are equally well approximated. Our method combines an unsupervised learning scheme (Feature Maps [Koh84]) with a nonlinear approximator (Backpropagation [RHW86]). The resulting learning system is more stable and effective in changing environments than plain backpropagation and much more powerful than extended feature maps as proposed by [RS88, RMS89]. Extensions of our method give rise to active exploration strategies for autonomous agents facing unknown environments. The appropriateness of our general purpose method will be demonstrated with an example from mathematical function approximation.

Cooperation through Reinforcement Learning

by Philip Sterne
Can cooperation be learnt through reinforcement learning? This is the central question we pose in this paper. Answering it first requires an examination of what constitutes reinforcement learning. We also examine some of the issues associated with the design of a reinforcement learning system; these include the choice of an update rule and whether or not to implement an eligibility trace. In this paper we set ourselves four tasks, each of which shows us certain aspects of reinforcement learning. The tasks increase in complexity: the first two allow us to explore reinforcement learning on its own, while the last two allow us to examine reinforcement learning in a multi-agent setting. We begin with a system that learns to play blackjack; it allows us to examine how robust reinforcement learning algorithms are. The second system learns to run through a maze; here we learn how to correctly implement an eligibility trace, and explore different updating rules. The two multi-agent systems involve a traffic simulation and a cellular simulation. The traffic simulation shows the weaknesses of reinforcement learning that appear when it is applied to a multi-agent setting. In our cellular simulation, we show that it is possible to implement a reinforcement learning algorithm in a continuous state space.