
Continual Learning in Reinforcement Environments (1994)

by M Ring

Results 1 - 10 of 53

Reinforcement learning: a survey

by Leslie Pack Kaelbling, Michael L. Littman, Andrew W. Moore - Journal of Artificial Intelligence Research, 1996
Cited by 1134 (21 self)
This paper surveys the field of reinforcement learning from a computer-science perspective. It is written to be accessible to researchers familiar with machine learning. Both the historical basis of the field and a broad selection of current work are summarized. Reinforcement learning is the problem faced by an agent that learns behavior through trial-and-error interactions with a dynamic environment. The work described here has a resemblance to work in psychology, but differs considerably in the details and in the use of the word "reinforcement." The paper discusses central issues of reinforcement learning, including trading off exploration and exploitation, establishing the foundations of the field via Markov decision theory, learning from delayed reinforcement, constructing empirical models to accelerate learning, making use of generalization and hierarchy, and coping with hidden state. It concludes with a survey of some implemented systems and an assessment of the practical utility of current methods for reinforcement learning.

Algorithms for Sequential Decision Making

by Michael Lederman Littman, 1996
Cited by 158 (7 self)
Sequential decision making is a fundamental task faced by any intelligent agent in an extended interaction with its environment; it is the act of answering the question "What should I do now?" In this thesis, I show how to answer this question when "now" is one of a finite set of states, "do" is one of a finite set of actions, "should" is maximize a long-run measure of reward, and "I" is an automated planning or learning system (agent). In particular,

Incremental Evolution of Complex General Behavior

by Faustino Gomez, Risto Miikkulainen - Adaptive Behavior, 1997
Cited by 121 (25 self)
Several researchers have demonstrated how complex action sequences can be learned through neuro-evolution (i.e. evolving neural networks with genetic algorithms). However, complex general behavior such as evading predators or avoiding obstacles, which is not tied to specific environments, turns out to be very difficult to evolve. Often the system discovers mechanical strategies (such as moving back and forth) that help the agent cope, but are not very effective, do not appear believable and would not generalize to new environments. The problem is that a general strategy is too difficult for the evolution system to discover directly. This paper proposes an approach where such complex general behavior is learned incrementally, by starting with simpler behavior and gradually making the task more challenging and general. The task transitions are implemented through successive stages of delta-coding (i.e. evolving modifications), which allows even converged populations to adapt to the new t...

Map Learning with Uninterpreted Sensors and Effectors

by David Pierce, Benjamin Kuipers - Artificial Intelligence, 1997
Cited by 103 (16 self)
This paper presents a set of methods by which a learning agent can learn a sequence of increasingly abstract and powerful interfaces to control a robot whose sensorimotor apparatus and environment are initially unknown. The result of the learning is a rich hierarchical model of the robot's world (its sensorimotor apparatus and environment). The learning methods rely on generic properties of the robot's world such as almost-everywhere smooth effects of motor control signals on sensory features. At the lowest level of the hierarchy, the learning agent analyzes the effects of its motor control signals in order to define a new set of control signals, one for each of the robot's degrees of freedom. It uses a generate-and-test approach to define sensory features that capture important aspects of the environment. It uses linear regression to learn models that characterize context-dependent effects of the control signals on the learned features. It uses these models to define high-level control laws for finding and following paths defined using constraints on the learned features. The agent abstracts these control laws, which interact with the continuous environment, to a finite set of actions that implement discrete state transitions. At this point, the agent has abstracted the robot's continuous world to a finite-state world and can use existing methods to learn its structure. The learning agent's methods are evaluated on several simulated robots with different sensorimotor systems and environments.

Learning to Perceive the World as Articulated: An Approach for Hierarchical Learning in Sensory-Motor Systems

by Jun Tani, Stefano Nolfi - Neural Networks, 1999
Cited by 82 (24 self)
This paper describes how agents can learn an internal model of the world structurally by focusing on the problem of behavior-based articulation. We develop an on-line learning scheme -- the so-called mixture of recurrent neural net (RNN) experts -- in which a set of RNN modules becomes self-organized as experts on multiple levels in order to account for the different categories of sensory-motor flow which the robot experiences. Autonomous switching of activated modules in the lower level actually represents the articulation of the sensory-motor flow. In the meanwhile, a set of RNNs in the higher level competes to learn the sequences of module switching in the lower level, by which articulation at a further more abstract level can be achieved. The proposed scheme was examined through simulation experiments involving the navigation learning problem. Our dynamical systems analysis clarified the mechanism of the articulation; the possible correspondence between the articulation...

Evolutionary Algorithms for Reinforcement Learning

by David E. Moriarty, Alan C. Schultz, John J. Grefenstette - Journal of Artificial Intelligence Research, 1999
Cited by 76 (1 self)
There are two distinct approaches to solving reinforcement learning problems, namely, searching in value function space and searching in policy space. Temporal difference methods and evolutionary algorithms are well-known examples of these approaches. Kaelbling, Littman and Moore recently provided an informative survey of temporal difference methods. This article focuses on the application of evolutionary algorithms to the reinforcement learning problem, emphasizing alternative policy representations, credit assignment methods, and problem-specific genetic operators. Strengths and weaknesses of the evolutionary approach to reinforcement learning are presented, along with a survey of representative applications. 1. Introduction Kaelbling, Littman, and Moore (1996) and more recently Sutton and Barto (1998) provide informative surveys of the field of reinforcement learning (RL). They characterize two classes of methods for reinforcement learning: methods that search the space of value fu...

Robust Non-linear Control through Neuroevolution

by Faustino John Gomez, 2003
Cited by 75 (18 self)
Abstract not found

Solving Non-Markovian Control Tasks with Neuroevolution

by Faustino J. Gomez, Risto Miikkulainen - In Proceedings of the 16th International Joint Conference on Artificial Intelligence, 1999
Cited by 70 (22 self)
The success of evolutionary methods on standard control learning tasks has created a need for new benchmarks. The classic pole balancing problem is no longer difficult enough to serve as a viable yardstick for measuring the learning efficiency of these systems. The double pole case, where two poles connected to the cart must be balanced simultaneously, is much more difficult, especially when velocity information is not available. In this article, we demonstrate a neuroevolution system, Enforced Sub-populations (ESP), that is used to evolve a controller for the standard double pole task and a much harder, non-Markovian version. In both cases, our results show that ESP is faster than other neuroevolution methods. In addition, we introduce an incremental method that evolves on a sequence of tasks, and utilizes a local search technique (DeltaCoding) to sustain diversity. This method enables the system to solve even more difficult versions of the task where direct evolution cannot. 1 Introdu...

Symbiotic Evolution of Neural Networks in Sequential Decision Tasks

by David Eric Moriarty, 1997
Cited by 58 (5 self)
Abstract not found

Shifting Inductive Bias with Success-Story Algorithm, Adaptive Levin Search, and Incremental Self-Improvement

by Jürgen Schmidhuber, Jieyu Zhao, Marco Wiering - Machine Learning, 1997
Cited by 58 (27 self)
We study task sequences that allow for speeding up the learner's average reward intake through appropriate shifts of inductive bias (changes of the learner's policy). To evaluate long-term effects of bias shifts setting the stage for later bias shifts we use the "success-story algorithm" (SSA). SSA is occasionally called at times that may depend on the policy itself. It uses backtracking to undo those bias shifts that have not been empirically observed to trigger long-term reward accelerations (measured up until the current SSA call). Bias shifts that survive SSA represent a lifelong success history. Until the next SSA call, they are considered useful and build the basis for additional bias shifts. SSA allows for plugging in a wide variety of learning algorithms. We plug in (1) a novel, adaptive extension of Levin search and (2) a method for embedding the learner's policy modification strategy within the policy itself (incremental self-improvement). Our inductive transfer case studies...