Results 1 - 10
of
229
Reinforcement learning: a survey
- Journal of Artificial Intelligence Research
, 1996
"... This paper surveys the field of reinforcement learning from a computer-science perspective. It is written to be accessible to researchers familiar with machine learning. Both the historical basis of the field and a broad selection of current work are summarized. Reinforcement learning is the problem ..."
Abstract
-
Cited by 1134 (21 self)
- Add to MetaCart
This paper surveys the field of reinforcement learning from a computer-science perspective. It is written to be accessible to researchers familiar with machine learning. Both the historical basis of the field and a broad selection of current work are summarized. Reinforcement learning is the problem faced by an agent that learns behavior through trial-and-error interactions with a dynamic environment. The work described here has a resemblance to work in psychology, but differs considerably in the details and in the use of the word "reinforcement." The paper discusses central issues of reinforcement learning, including trading off exploration and exploitation, establishing the foundations of the field via Markov decision theory, learning from delayed reinforcement, constructing empirical models to accelerate learning, making use of generalization and hierarchy, and coping with hidden state. It concludes with a survey of some implemented systems and an assessment of the practical utility of current methods for reinforcement learning.
Wrappers for feature subset selection
- ARTIFICIAL INTELLIGENCE
, 1997
"... In the feature subset selection problem, a learning algorithm is faced with the problem of selecting a relevant subset of features upon which to focus its attention, while ignoring the rest. To achieve the best possible performance with a particular learning algorithm on a particular training set, a ..."
Abstract
-
Cited by 776 (3 self)
- Add to MetaCart
In the feature subset selection problem, a learning algorithm is faced with the problem of selecting a relevant subset of features upon which to focus its attention, while ignoring the rest. To achieve the best possible performance with a particular learning algorithm on a particular training set, a feature subset selection method should consider how the algorithm and the training set interact. We explore the relation between optimal feature subset selection and relevance. Our wrapper method searches for an optimal feature subset tailored to a particular algorithm and a domain. We study the strengths and weaknesses of the wrapper approach and show a series of improved designs. We compare the wrapper approach to induction without feature subset selection and to Relief, a filter approach to feature subset selection. Significant improvement in accuracy is achieved for some datasets for the two families of induction algorithms used: decision trees and
Dyna, an Integrated Architecture for Learning, Planning, and Reacting
- WORKING NOTES OF THE 1991 AAAI SPRING SYMPOSIUM
, 1991
"... Dyna is an AI architecture that integrates learning, planning, and reactive execution. Learning methods are used in Dyna both for compiling planning results and for updating a model of the effects of the agent's actions on the world. Planning is incremental and can use the probabilistic and ofttimes ..."
Abstract
-
Cited by 427 (13 self)
- Add to MetaCart
Dyna is an AI architecture that integrates learning, planning, and reactive execution. Learning methods are used in Dyna both for compiling planning results and for updating a model of the effects of the agent's actions on the world. Planning is incremental and can use the probabilistic and ofttimes incorrect world models generated by learning processes. Execution is fully reactive in the sense that no planning intervenes between perception and action. Dyna relies on machine learning methods for learning from examples -- these are among the basic building blocks making up the architecture -- yet is not tied to any particular method. This paper briefly introduces Dyna and discusses its strengths and weaknesses with respect to other architectures.
Prioritized sweeping: Reinforcement learning with less data and less time
- Machine Learning
, 1993
"... We present a new algorithm, Prioritized Sweeping, for e cient prediction and control of stochas-tic Markov systems. Incremental learning methods such asTemporal Di erencing and Q-learning have fast real time performance. Classical methods are slower, but more accurate, because they make full use of ..."
Abstract
-
Cited by 275 (5 self)
- Add to MetaCart
We present a new algorithm, Prioritized Sweeping, for e cient prediction and control of stochas-tic Markov systems. Incremental learning methods such asTemporal Di erencing and Q-learning have fast real time performance. Classical methods are slower, but more accurate, because they make full use of the observations. Prioritized Sweeping aims for the best of both worlds. It uses all previous experiences both to prioritize important dynamic programming sweeps and to guide the exploration of state-space. We compare Prioritized Sweeping with other reinforcement learning schemes for a number of di erent stochastic optimal control prob-lems. It successfully solves large state-space real time problems with which other methods have di culty. 1 1
Population-Based Incremental Learning: A Method for Integrating Genetic Search Based Function Optimization and Competitive Learning
, 1994
"... Genetic algorithms (GAs) are biologically motivated adaptive systems which have been used, with varying degrees of success, for function optimization. In this study, an abstraction of the basic genetic algorithm, the Equilibrium Genetic Algorithm (EGA), and the GA in turn, are reconsidered within th ..."
Abstract
-
Cited by 257 (11 self)
- Add to MetaCart
Genetic algorithms (GAs) are biologically motivated adaptive systems which have been used, with varying degrees of success, for function optimization. In this study, an abstraction of the basic genetic algorithm, the Equilibrium Genetic Algorithm (EGA), and the GA in turn, are reconsidered within the framework of competitive learning. This new perspective reveals a number of different possibilities for performance improvements. This paper explores population-based incremental learning (PBIL), a method of combining the mechanisms of a generational genetic algorithm with simple competitive learning. The combination of these two methods reveals a tool which is far simpler than a GA, and which out-performs a GA on large set of optimization problems in terms of both speed and accuracy. This paper presents an empirical analysis of where the proposed technique will outperform genetic algorithms, and describes a class of problems in which a genetic algorithm may be able to perform better. Extensions to this algorithm are discussed and analyzed. PBIL and extensions are compared with a standard GA on twelve problems, including standard numerical optimization functions, traditional GA test suite problems, and NP-Complete problems.
Cooperative mobile robotics: Antecedents and directions
, 1995
"... There has been increased research interest in systems composed of multiple autonomous mobile robots exhibiting collective behavior. Groups of mobile robots are constructed, with an aim to studying such issues as group architecture, resource conflict, origin of cooperation, learning, and geometric pr ..."
Abstract
-
Cited by 255 (3 self)
- Add to MetaCart
There has been increased research interest in systems composed of multiple autonomous mobile robots exhibiting collective behavior. Groups of mobile robots are constructed, with an aim to studying such issues as group architecture, resource conflict, origin of cooperation, learning, and geometric problems. As yet, few applications of collective robotics have been reported, and supporting theory is still in its formative stages. In this paper, we give a critical survey of existing works and discuss open problems in this field, emphasizing the various theoretical issues that arise in the study of cooperative robotics. We describe the intellectual heritages that have guided early research, as well as possible additions to the set of existing motivations. 1
Self-improving reactive agents based on reinforcement learning, planning and teaching
- Machine Learning
, 1992
"... Abstract. To date, reinforcement learning has mostly been studied solving simple learning tasks. Reinforcement learning methods that have been studied so far typically converge slowly. The purpose of this work is thus two-fold: 1) to investigate the utility of reinforcement learning in solving much ..."
Abstract
-
Cited by 255 (2 self)
- Add to MetaCart
Abstract. To date, reinforcement learning has mostly been studied solving simple learning tasks. Reinforcement learning methods that have been studied so far typically converge slowly. The purpose of this work is thus two-fold: 1) to investigate the utility of reinforcement learning in solving much more complicated learning tasks than previously studied, and 2) to investigate methods that will speed up reinforcement learning. This paper compares eight reinforcement learning frameworks: adaptive heuristic critic (AHC) learning due to Sutton, Q-learning due to Watkins, and three extensions to both basic methods for speeding up learning. The three extensions are experience replay, learning action models for planning, and teaching. The frameworks were investigated using connectionism as an approach to generalization. To evaluate the performance of different frame-works, a dynamic environment was used as a testbed. The enviromaaent is moderately complex and nondetermin-istic. This paper describes these frameworks and algorithms in detail and presents empirical evaluation of the frameworks.
The parti-game algorithm for variable resolution reinforcement learning in multidimensional state-spaces
- Machine Learning
, 1995
"... Abstract. Parti-game is a new algorithm for learning feasible trajectories to goal regions in high dimensional continuous state-spaces. In high dimensions it is essential that learning does not plan uniformly over a state-space. Parti-game maintains a decision-tree partitioning of state-space and ap ..."
Abstract
-
Cited by 203 (8 self)
- Add to MetaCart
Abstract. Parti-game is a new algorithm for learning feasible trajectories to goal regions in high dimensional continuous state-spaces. In high dimensions it is essential that learning does not plan uniformly over a state-space. Parti-game maintains a decision-tree partitioning of state-space and applies techniques from game-theory and computational geometry to e ciently and adaptively concentrate high resolution only on critical areas. The currentversion of the algorithm is designed to nd feasible paths or trajectories to goal regions in high dimensional spaces. Future versions will be designed to nd a solution that optimizes a real-valued criterion. Many simulated problems have been tested, ranging from two-dimensional to nine-dimensional state-spaces, including mazes, path planning, non-linear dynamics, and planar snake robots in restricted spaces. In all cases, a good solution is found in less than ten trials and a few minutes.
Learning to Coordinate Behaviors
, 1990
"... We describe an algorithm which allows a behavior-based robot to learn on the basis of positive and negative feedback when to activate its behaviors. In accordance with the philosophy of behavior-based robots, the algorithm is completely distributed: each of the behaviors independently tries to find ..."
Abstract
-
Cited by 190 (3 self)
- Add to MetaCart
We describe an algorithm which allows a behavior-based robot to learn on the basis of positive and negative feedback when to activate its behaviors. In accordance with the philosophy of behavior-based robots, the algorithm is completely distributed: each of the behaviors independently tries to find out (i) whether it is relevant (ie. whether it is at all correlated to positive feedback) and (ii) what the conditions are under which it becomes reliable (i.e. the conditions under which it maximizes the probability of receiving positive feedback and minimizes the probability of receiving negative feedback). The algorithm has been tested successfully on an autonomous 6-legged robot which had to learn how to coordinate its legs so as to walk forward. Situation of the Problem Since 1985, the MIT Mobile Robot group has advocated a radically different architecture for autonomous intelligent agents (Brooks, 1986). Instead of decomposing the architecture into functional modules, such as percept...

