Learning and sequential decision making (1990)

by A. G. Barto, R. S. Sutton

Documents citing this work (results 11-20 of 150)

Locally Weighted Learning for Control

by Christopher G. Atkeson, Andrew W. Moore, Stefan Schaal, 1996
Cited by 137 (17 self)
Lazy learning methods provide useful representations and training algorithms for learning about complex phenomena during autonomous adaptive control of complex systems. This paper surveys ways in which locally weighted learning, a type of lazy learning, has been applied by us to control tasks. We explain various forms that control tasks can take, and how this affects the choice of learning paradigm. The discussion section explores the interesting impact that explicitly remembering all previous experiences has on the problem of learning to control.
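
As a rough illustration of locally weighted learning as a lazy method, the sketch below stores every training pair and defers all fitting to query time, where it solves a kernel-weighted least-squares line fit centered on the query. The Gaussian kernel, the `bandwidth` parameter, the one-dimensional input, and the function name `lwr_predict` are assumptions made for illustration; this is not the authors' implementation.

```python
import math


def lwr_predict(xs, ys, query, bandwidth=0.5):
    """Locally weighted linear regression in one input dimension (a sketch).

    All stored (x, y) pairs are kept in memory (lazy learning). At query time
    each pair is weighted by a Gaussian kernel of its distance to the query,
    and a weighted least-squares line is fit and evaluated at the query.
    """
    weights = [math.exp(-((x - query) ** 2) / (2 * bandwidth ** 2)) for x in xs]
    total = sum(weights)
    # Kernel-weighted means of the inputs and outputs.
    mean_x = sum(w * x for w, x in zip(weights, xs)) / total
    mean_y = sum(w * y for w, y in zip(weights, ys)) / total
    # Weighted variance and covariance give the slope of the local line.
    sxx = sum(w * (x - mean_x) ** 2 for w, x in zip(weights, xs))
    sxy = sum(w * (x - mean_x) * (y - mean_y) for w, x, y in zip(weights, xs, ys))
    slope = sxy / sxx if sxx > 0 else 0.0
    return mean_y + slope * (query - mean_x)
```

Because nothing is fit until a query arrives, adding a new experience is just an append to `xs` and `ys`, which is the "remember every previous experience" property the abstract discusses; the price is a full pass over memory for every prediction.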

Learning Sequential Decision Rules Using Simulation Models and Competition

by John Grefenstette, Connie Loggia Ramsey, Alan C. Schultz, 1990
Cited by 135 (36 self)
The problem of learning decision rules for sequential tasks is addressed, focusing on the problem of learning tactical decision rules from a simple flight simulator. The learning method relies on the notion of competition and employs genetic algorithms to search the space of decision policies. Several experiments are presented that address issues arising from differences between the simulation model on which learning occurs and the target environment on which the decision rules are ultimately tested. Key words: sequential decision rules, competition-based learning, genetic algorithms. Machine Learning 5(4), 355-381. 1. Introduction In response to the knowledge acquisition bottleneck associated with the design of expert systems, research in machine learning attempts to automate the knowledge acquisition process and to broaden the base of accessible sources of knowledge. The choice of an appropriate learning technique depends on ...

Efficient Exploration In Reinforcement Learning

by Sebastian B. Thrun, 1992
Cited by 115 (4 self)
Exploration plays a fundamental role in any active learning system. This study evaluates the role of exploration in active learning and describes several local techniques for exploration in finite, discrete domains, embedded in a reinforcement learning framework (delayed reinforcement). This paper distinguishes between two families of exploration schemes: undirected and directed exploration. While the former family is closely related to random-walk exploration, directed exploration techniques memorize exploration-specific knowledge which is used for guiding the exploration search. In many finite deterministic domains, any learning technique based on undirected exploration is inefficient in terms of learning time, i.e., learning time is expected to scale exponentially with the size of the state space (Whitehead, 1991b). We prove that for all these domains, reinforcement learning using a directed technique can always be performed in polynomial time, demonstrating the important role of e...
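
The contrast between undirected and directed exploration can be sketched on a hypothetical "reset chain": states 0 to n-1, where action 1 moves one state to the right and action 0 sends the agent back to state 0. A uniform random walk needs on the order of 2^n steps to reach the last state. The directed explorer below memorizes which state-action pairs it has already tried (a learned deterministic model) and always heads for the nearest untried pair using breadth-first search. This is one simple directed scheme chosen for illustration, not necessarily any of the specific techniques evaluated in the paper, and all function names are hypothetical.

```python
import random
from collections import deque

ACTIONS = (0, 1)  # 0: reset to state 0, 1: move one state right


def reset_chain_step(state, action):
    """Hypothetical deterministic 'reset chain' used to compare explorers."""
    return state + 1 if action == 1 else 0


def random_walk_explore(n, rng):
    """Undirected exploration: uniformly random actions until the goal n-1."""
    state, steps = 0, 0
    while state != n - 1:
        state = reset_chain_step(state, rng.choice(ACTIONS))
        steps += 1
    return steps


def first_action_toward_untried(start, model):
    """BFS over the learned model to the closest state with an untried action."""
    first = {start: None}  # state -> first action on the shortest path from start
    frontier = deque([start])
    while frontier:
        s = frontier.popleft()
        if any((s, a) not in model for a in ACTIONS):
            return first[s]
        for a in ACTIONS:
            nxt = model.get((s, a))
            if nxt is not None and nxt not in first:
                first[nxt] = a if s == start else first[s]
                frontier.append(nxt)
    return None


def directed_explore(n):
    """Directed exploration: remember tried pairs, head for the nearest untried one."""
    model = {}  # (state, action) -> observed next state
    state, steps = 0, 0
    while state != n - 1:
        untried = [a for a in ACTIONS if (state, a) not in model]
        if untried:
            action = untried[0]
        else:
            action = first_action_toward_untried(state, model)
            if action is None:  # chain-specific fallback: keep moving right
                action = 1
        nxt = reset_chain_step(state, action)
        model[(state, action)] = nxt
        state = nxt
        steps += 1
    return steps
```

On this chain the directed explorer reaches the goal in exactly (n-1)(n+2)/2 steps (77 for n = 12): each newly reached state costs one reset, a walk back along already-known transitions, and one step forward. The random walk's expected cost for the same chain is 2^n - 2 steps.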

Neuro-Fuzzy Modeling and Control

by Jyh-Shing Roger Jang, Chuen-Tsai Sun - Proceedings of the IEEE, 1995
Cited by 110 (1 self)
Fundamental and advanced developments in neuro-fuzzy synergisms for modeling and control are reviewed. The essential part of neuro-fuzzy synergisms comes from a common framework called adaptive networks, which unifies both neural networks and fuzzy models. The fuzzy model under the framework of adaptive networks is called ANFIS (Adaptive-Network-based Fuzzy Inference System), which possesses certain advantages over neural networks. We introduce the design methods for ANFIS in both modeling and control applications. Current problems and future directions for neuro-fuzzy approaches are also addressed.

Model Minimization in Markov Decision Processes

by Thomas Dean, Robert Givan - In Proceedings of the Fourteenth National Conference on Artificial Intelligence, 1997
Cited by 97 (7 self)
We use the notion of stochastic bisimulation homogeneity to analyze planning problems represented as Markov decision processes (MDPs). Informally, a partition of the state space for an MDP is said to be homogeneous if for each action, states in the same block have the same probability of being carried to each other block. We provide an algorithm for finding the coarsest homogeneous refinement of any partition of the state space of an MDP. The resulting partition can be used to construct a reduced MDP which is minimal in a well-defined sense and can be used to solve the original MDP. Our algorithm is an adaptation of known automata minimization algorithms, and is designed to operate naturally on factored or implicit representations in which the full state space is never explicitly enumerated. We show that simple variations on this algorithm are equivalent or closely similar to several different recently published algorithms for finding optimal solutions to (partially ...
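
For an explicitly enumerated MDP, the refinement idea can be sketched as repeated splitting: two states stay in the same block only if, for every action, they send the same total probability into each current block. Iterating until no block splits yields the coarsest stable refinement of the starting partition. This explicit-state sketch is an assumption for illustration: unlike the paper's algorithm it does not work on factored representations, and the dictionary encoding of transitions, the function names, and the rounding used to compare floating-point probabilities are all hypothetical choices.

```python
from collections import defaultdict


def block_signature(s, actions, transitions, block_of, digits=9):
    """Current block of s plus, per action, the probability mass sent into each block."""
    sig = []
    for a in actions:
        mass = defaultdict(float)
        for t, p in transitions[(s, a)].items():
            mass[block_of[t]] += p
        # Rounding (an assumption) makes floating-point sums comparable.
        sig.append(tuple(sorted((b, round(m, digits)) for b, m in mass.items() if m > 0)))
    return block_of[s], tuple(sig)


def coarsest_homogeneous_refinement(states, actions, transitions, partition):
    """Split blocks until every block is homogeneous for every action.

    transitions[(s, a)] is a dict mapping next state -> probability;
    partition is a list of sets covering all states.
    """
    block_of = {s: i for i, block in enumerate(partition) for s in block}
    n_blocks = len(partition)
    while True:
        groups = defaultdict(set)
        for s in states:
            # The signature includes the current block, so groups refine it.
            groups[block_signature(s, actions, transitions, block_of)].add(s)
        refined = list(groups.values())
        if len(refined) == n_blocks:  # no block was split: partition is stable
            return refined
        block_of = {s: i for i, block in enumerate(refined) for s in block}
        n_blocks = len(refined)
```

For example, with one action where state 0 moves to states 2 and 3 with probability 0.5 each and state 1 moves to state 2 with probability 1, the starting partition {0, 1}, {2, 3} is already homogeneous: both states send all of their mass into the block {2, 3}.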

Hierarchical Learning in Stochastic Domains: Preliminary Results

by Leslie Pack Kaelbling - In Proceedings of the Tenth International Conference on Machine Learning, 1993
Cited by 94 (7 self)
This paper presents the HDG learning algorithm, which uses a hierarchical decomposition of the state space to make learning to achieve goals more efficient with a small penalty in path quality. Special care must be taken when performing hierarchical planning and learning in stochastic domains, because macro-operators cannot be executed ballistically. The HDG algorithm, which is a descendant of Watkins' Q-learning algorithm, is described here and preliminary empirical results are presented. 1 INTRODUCTION Reinforcement learning is a general tool for deriving strategies that optimize a fixed reinforcement function in a stochastic environment. A crucial problem in reinforcement learning is temporal credit assignment: how to choose actions based on good results that happen after (perhaps long after) the action is taken. This problem is solved well in the general case by temporal difference methods, such as Watkins' Q-learning [Barto et al., 1989, Watkins, 1989] and Sutton's TD ...
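
Temporal credit assignment in Watkins' Q-learning, which the abstract names as the ancestor of HDG, comes from a one-step backup that moves Q(s, a) toward r + gamma * max over a' of Q(s', a'). The sketch below is plain tabular Q-learning on a hypothetical five-state corridor with a single reward for entering the rightmost state. It is not the HDG algorithm, and the step size, discount, episode cap, and uniformly random behavior policy are illustrative assumptions.

```python
import random


def q_learning_corridor(n_states=5, episodes=500, alpha=0.5, gamma=0.9, seed=0):
    """Tabular one-step Q-learning on a hypothetical corridor.

    States 0..n_states-1; action 1 moves right, action 0 moves left (clamped
    at 0). Entering the last state ends the episode with reward 1; every other
    transition has reward 0.
    """
    rng = random.Random(seed)
    goal = n_states - 1
    q = {(s, a): 0.0 for s in range(n_states) for a in (0, 1)}
    for _ in range(episodes):
        s = 0
        for _ in range(100):  # cap on episode length
            a = rng.choice((0, 1))  # uniformly random behavior policy
            s_next = min(s + 1, goal) if a == 1 else max(s - 1, 0)
            if s_next == goal:
                target = 1.0  # terminal transition: reward only, no bootstrap
            else:
                target = gamma * max(q[(s_next, 0)], q[(s_next, 1)])
            # Move Q(s, a) a fraction alpha toward the one-step target.
            q[(s, a)] += alpha * (target - q[(s, a)])
            s = s_next
            if s == goal:
                break
    return q


def greedy_policy(q, n_states=5):
    """Greedy action in every non-terminal state."""
    return [max((0, 1), key=lambda a: q[(s, a)]) for s in range(n_states - 1)]
```

Because Q-learning is off-policy, the greedy policy read from the table moves right in every state even though the agent acted at random while learning; the goal reward has been propagated back along the corridor, shrinking by a factor of gamma per step.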

Memoryless Policies: Theoretical Limitations and Practical Results

by Michael L. Littman, 1994
Cited by 88 (3 self)
One form of adaptive behavior is "goal-seeking" in which an agent acts so as to minimize the time it takes to reach a goal state. This paper presents some theoretical and empirical findings on algorithms that devise goal-seeking behaviors for "memoryless" agents who base their behavioral decisions solely on current sensations. The basic results are that (1) the general problem of finding good deterministic memoryless policies is intractable; however, (2) simple branch-and-bound heuristics can be used to find optimal memoryless policies extremely quickly for some established example environments. 1 Introduction This paper looks at a class of behaviors, or policies, that can be called "memoryless" since action decisions are made solely on the basis of the agent's current sensation. In nature, it would seem that memoryless behavior makes little sense. What organism would possibly ignore recent events in deciding how to act? Research on artificial agents, however, is more apt to focus o...

Creating Advice-Taking Reinforcement Learners

by Richard Maclin, Jude W. Shavlik, Pack Kaelbling - Machine Learning, 1996
Cited by 84 (10 self)
Learning from reinforcements is a promising approach for creating intelligent agents. However, reinforcement learning usually requires a large number of training episodes. We present and evaluate a design that addresses this shortcoming by allowing a connectionist Q-learner to accept advice given, at any time and in a natural manner, by an external observer. In our approach, the advice-giver watches the learner and occasionally makes suggestions, expressed as instructions in a simple imperative programming language. Based on techniques from knowledge-based neural networks, we insert these programs directly into the agent's utility function. Subsequent reinforcement learning further integrates and refines the advice. We present empirical evidence that investigates several aspects of our approach and show that, given good advice, a learner can achieve statistically significant gains in expected reward. A second experiment shows that advice improves the expected reward regardless of the...

Evolutionary Algorithms for Reinforcement Learning

by David E. Moriarty, Alan C. Schultz, John J. Grefenstette - Journal of Artificial Intelligence Research, 1999
Cited by 76 (1 self)
There are two distinct approaches to solving reinforcement learning problems, namely, searching in value function space and searching in policy space. Temporal difference methods and evolutionary algorithms are well-known examples of these approaches. Kaelbling, Littman and Moore recently provided an informative survey of temporal difference methods. This article focuses on the application of evolutionary algorithms to the reinforcement learning problem, emphasizing alternative policy representations, credit assignment methods, and problem-specific genetic operators. Strengths and weaknesses of the evolutionary approach to reinforcement learning are presented, along with a survey of representative applications. 1. Introduction Kaelbling, Littman, and Moore (1996) and more recently Sutton and Barto (1998) provide informative surveys of the field of reinforcement learning (RL). They characterize two classes of methods for reinforcement learning: methods that search the space of value fu...
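
Searching in policy space can be illustrated with a small genetic algorithm that evolves policies directly, with no value function at all. Each individual is a bit string giving one action per state of a hypothetical corridor (1 moves right, 0 moves left), and fitness is the furthest state the policy reaches. Binary tournament selection, one-point crossover, bit-flip mutation, and elitism are common generic choices assumed here; they are not the specific representations or problem-specific operators surveyed in the article, and all names are hypothetical.

```python
import random


def rollout_progress(policy, horizon=50):
    """Fitness: furthest state reached by a deterministic policy on a corridor.

    Hypothetical task: policy[s] == 1 moves right, 0 moves left (clamped at 0);
    the goal is state len(policy), so a perfect policy scores len(policy).
    """
    goal = len(policy)
    s, best = 0, 0
    for _ in range(horizon):
        if s == goal:
            break
        s = s + 1 if policy[s] == 1 else max(s - 1, 0)
        best = max(best, s)
    return best


def evolve_policy(n_states=8, pop_size=20, generations=200, seed=0):
    """Evolve a policy bit string with a simple elitist genetic algorithm."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_states)] for _ in range(pop_size)]

    def tournament(ranked):
        # Binary tournament: the fitter of two random individuals wins.
        a, b = rng.sample(ranked, 2)
        return a if rollout_progress(a) >= rollout_progress(b) else b

    for _ in range(generations):
        ranked = sorted(pop, key=rollout_progress, reverse=True)
        if rollout_progress(ranked[0]) == n_states:
            return ranked[0]
        children = [ranked[0][:]]  # elitism: the best policy survives unchanged
        while len(children) < pop_size:
            p1, p2 = tournament(ranked), tournament(ranked)
            cut = rng.randrange(1, n_states)
            child = p1[:cut] + p2[cut:]  # one-point crossover
            # Bit-flip mutation with rate 1 / n_states per gene.
            child = [1 - g if rng.random() < 1.0 / n_states else g for g in child]
            children.append(child)
        pop = children
    return max(pop, key=rollout_progress)
```

Credit assignment here happens only at the level of whole policies: a rollout scores the entire bit string, and selection, rather than any per-state value estimate, decides which action choices propagate to the next generation.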

Lamarckian Learning in Multi-agent Environments

by John J. Grefenstette - Proceedings of the Fourth International Conference on Genetic Algorithms, 1991
Cited by 71 (13 self)
Genetic algorithms gain much of their power from mechanisms derived from the field of population genetics. However, it is possible, and in some cases desirable, to augment the standard mechanisms with additional features not available in biological systems. In this paper, we examine the use of Lamarckian learning operators in the SAMUEL architecture. The use of the operators is illustrated on three tasks in multi-agent environments. 1 INTRODUCTION The goal of this work is to explore the application of machine learning techniques to reactive control problems arising in competitive, multi-agent domains. In such domains, traditional AI planning approaches are usually infeasible, because of the complexity of the multi-agent interactions and the inherent uncertainty about the future actions of other agents. On the other hand, genetic algorithms [11] appear to be a promising approach to developing high performance control strategies. SAMUEL is our platform for exploring the use of genetic...