Results 1 - 10 of 90
Reinforcement Learning Methods for Continuous-Time Markov Decision Problems
Advances in Neural Information Processing Systems, 1994
Abstract

Cited by 134 (0 self)
Semi-Markov Decision Problems are continuous-time generalizations of discrete-time Markov Decision Problems. A number of reinforcement learning algorithms have been developed recently for the solution of Markov Decision Problems, based on the ideas of asynchronous dynamic programming and stochastic approximation. Among these are TD(λ), Q-learning, and Real-time Dynamic Programming. After reviewing semi-Markov Decision Problems and Bellman's optimality equation in that context, we propose algorithms similar to those named above, adapted to the solution of semi-Markov Decision Problems. We demonstrate these algorithms by applying them to the problem of determining the optimal control for a simple queueing system. We conclude with a discussion of circumstances under which these algorithms may be usefully applied.

1 Introduction

A number of reinforcement learning algorithms based on the ideas of asynchronous dynamic programming and stochastic approximation have been developed recently for...
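The kind of semi-Markov backup this abstract describes can be sketched as a single Q-learning update in which reward accrues at a rate over the random sojourn time and the successor value is discounted by exp(-beta*tau). This is a minimal illustrative sketch, not the paper's algorithm; the function and variable names, the queue states, and all constants are assumptions.

```python
import math
from collections import defaultdict

def smdp_q_update(Q, state, action, reward_rate, tau, next_state, actions,
                  alpha=0.1, beta=0.5):
    """One semi-Markov Q-learning backup (illustrative names, not the paper's).

    Reward accrues at reward_rate over the sojourn time tau, discounted
    continuously at rate beta; the successor's value is discounted by
    exp(-beta * tau).
    """
    # Discounted reward accumulated over the sojourn interval [0, tau).
    accrued = reward_rate * (1.0 - math.exp(-beta * tau)) / beta
    target = accrued + math.exp(-beta * tau) * max(Q[(next_state, a)] for a in actions)
    Q[(state, action)] += alpha * (target - Q[(state, action)])

# Hypothetical queueing example: one backup after serving for tau = 2 time units.
Q = defaultdict(float)
smdp_q_update(Q, "busy", "serve", reward_rate=1.0, tau=2.0,
              next_state="idle", actions=["serve", "wait"])
```

When tau shrinks and the sojourn time becomes a fixed step, the accrued term reduces to the familiar one-step reward and the update collapses to ordinary discrete-time Q-learning.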
Continual Learning In Reinforcement Environments
1994
Abstract

Cited by 88 (14 self)
Continual learning is the constant development of complex behaviors with no final end in mind. It is the process of learning ever more complicated skills by building on those skills already developed. In order for learning at one stage of development to serve as the foundation for later learning, a continual-learning agent should learn hierarchically. CHILD, an agent capable of Continual, Hierarchical, Incremental Learning and Development, is proposed, described, tested, and evaluated in this dissertation. CHILD accumulates useful behaviors in reinforcement environments by using the Temporal Transition Hierarchies learning algorithm, also derived in the dissertation. This constructive algorithm generates a hierarchical, higher-order neural network that can be used for predicting context-dependent temporal sequences and can learn sequential-task benchmarks more than two orders of magnitude faster than competing neural-network systems. Consequently, CHILD can quickly solve complicated non...
Reinforcement Learning Applied to Linear Quadratic Regulation
In Advances in Neural Information Processing Systems 5, 1993
Abstract

Cited by 62 (3 self)
Recent research on reinforcement learning has focused on algorithms based on the principles of Dynamic Programming (DP). One of the most promising areas of application for these algorithms is the control of dynamical systems, and some impressive results have been achieved. However, there are significant gaps between practice and theory. In particular, there are no convergence proofs for problems with continuous state and action spaces, or for systems involving nonlinear function approximators (such as multilayer perceptrons). This paper presents research applying DP-based reinforcement learning theory to Linear Quadratic Regulation (LQR), an important class of control problems involving continuous state and action spaces and requiring a simple type of nonlinear function approximator. We describe an algorithm based on Q-learning that is proven to converge to the optimal controller for a large class of LQR problems. We also describe a slightly different algorithm that is...
Reinforcement Learning And Its Application To Control
1992
Abstract

Cited by 56 (2 self)
Learning control involves modifying a controller's behavior to improve its performance as measured by some predefined index of performance (IP). If control actions that improve performance as measured by the IP are known, supervised learning methods, or methods for learning from examples, can be used to train the controller. But when such control actions are not known a priori, appropriate control behavior has to be inferred from observations of the IP. One can distinguish between two classes of methods for training controllers under such circumstances. Indirect methods involve constructing a model of the problem's IP and using the model to obtain training information for the controller. On the other hand, direct, or model-free,...
Learning Evaluation Functions For Global Optimization
1998
Abstract

Cited by 37 (6 self)
In complex sequential decision problems such as scheduling factory production, planning medical treatments, and playing backgammon, optimal decision policies are in general unknown, and it is often difficult, even for human domain experts, to manually encode good decision policies in software. The reinforcement-learning methodology of "value function approximation" (VFA) offers an alternative: systems can learn effective decision policies autonomously, simply by simulating the task and keeping statistics on which decisions lead to good ultimate performance and which do not. This thesis advances the state of the art in VFA in two ways. First, it
Adaptive Linear Quadratic Control Using Policy Iteration
1994
Abstract

Cited by 32 (2 self)
In this paper we present stability and convergence results for Dynamic Programming-based reinforcement learning applied to Linear Quadratic Regulation (LQR). The specific algorithm we analyze is based on Q-learning and is proven to converge to the optimal controller provided that the underlying system is controllable and a particular signal vector is persistently excited. The performance of the algorithm is illustrated by applying it to a model of a flexible beam. This work was supported by the Air Force Office of Scientific Research, Bolling AFB, under Grant AFOSR F496209310269 and by the National Science Foundation under Grant ECS9214866 to Prof. Andrew Barto.

1 Introduction

In many practical applications a stabilizing feedback control for the system may be known. In this paper we discuss the problem of how to improve this controller and, under certain circumstances, make it converge to the optimal controller. The approach we take can be classified as direct optimal adaptive control a...
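For a scalar plant, the policy-iteration scheme this abstract analyzes can be illustrated with exact policy evaluation in closed form: each stabilizing gain is evaluated, and the quadratic Q-function it induces is minimized to produce an improved gain. This is a sketch under assumed names and a one-dimensional system, not the paper's algorithm, which estimates the Q-function from data and therefore needs the persistent-excitation condition mentioned above.

```python
def policy_iteration_lqr(a, b, q, r, k0, n_iter=20):
    """Policy iteration for the scalar LQR problem x' = a*x + b*u with
    stage cost q*x**2 + r*u**2 (illustrative sketch; names are assumptions).

    k0 must be stabilizing: |a - b*k0| < 1.
    """
    k = k0
    for _ in range(n_iter):
        closed = a - b * k
        assert abs(closed) < 1.0, "policy must remain stabilizing"
        # Evaluation: the value of the policy u = -k*x is p*x**2, where
        # p = q + r*k**2 + closed**2 * p, solved here in closed form.
        p = (q + r * k * k) / (1.0 - closed * closed)
        # Improvement: Q(x, u) = q*x**2 + r*u**2 + p*(a*x + b*u)**2 is
        # quadratic in u; minimizing over u gives the new gain.
        k = p * a * b / (r + p * b * b)
    return k

# Unstable plant (a = 1.1) with a stabilizing initial gain; the iteration
# drives the gain toward the optimal (Riccati) solution.
k_opt = policy_iteration_lqr(a=1.1, b=1.0, q=1.0, r=1.0, k0=0.5)
```

The data-driven version replaces the closed-form evaluation step with a least-squares fit of the quadratic Q-function from observed transitions, which is where controllability and persistent excitation enter.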
Incremental Dynamic Programming for On-Line Adaptive Optimal Control
1994
Abstract

Cited by 24 (2 self)
Reinforcement learning algorithms based on the principles of Dynamic Programming (DP) have enjoyed a great deal of recent attention both empirically and theoretically. These algorithms have been referred to generically as Incremental Dynamic Programming (IDP) algorithms. IDP algorithms are intended for use in situations where the information or computational resources needed by traditional dynamic programming algorithms are not available. IDP algorithms attempt to find a global solution to a DP problem by incrementally improving local constraint satisfaction properties as experience is gained through interaction with the environment. This class of algorithms is not new, going back at least as far as Samuel's adaptive checkers-playing programs,...
Information genealogy: Uncovering the flow of ideas in non-hyperlinked document databases
In Knowledge Discovery and Data Mining (KDD) Conference, 2007
Abstract

Cited by 20 (1 self)
We now have incrementally grown databases of text documents reaching back over a decade, in areas ranging from personal email to news articles and conference proceedings. While accessing individual documents is easy, methods for overviewing and understanding these collections as a whole are lacking in number and in scope. In this paper, we address one such global analysis task, namely the problem of automatically uncovering how ideas spread through the collection over time. We refer to this problem as Information Genealogy. In contrast to bibliometric methods that are limited to collections with explicit citation structure, we investigate content-based methods requiring only the text and timestamps of the documents. In particular, we propose a language-modeling approach and a likelihood ratio test to detect influence between documents in a statistically well-founded way. Furthermore, we show how this method can be used to infer citation graphs and to identify the most influential documents in the collection. Experiments on the NIPS conference proceedings and the Physics ArXiv show that our method is more effective than methods based on
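The likelihood-ratio idea can be sketched with unigram models: score how much better a mixture of an earlier document's language model and a background model explains a later document than the background model alone. The function name, the add-one smoothing, and the mixing weight below are illustrative assumptions, not the paper's exact estimator.

```python
import math
from collections import Counter

def influence_score(earlier_doc, later_doc, background, lam=0.5):
    """Log-likelihood-ratio sketch of content-based influence detection.

    Positive scores mean a mixture of the earlier document's unigram model
    and the background model explains the later document better than the
    background model alone. Smoothing and names are illustrative.
    """
    bg = Counter(background)
    src = Counter(earlier_doc)
    n_bg, n_src = sum(bg.values()), sum(src.values())
    score = 0.0
    for w in later_doc:
        p_bg = (bg[w] + 1) / (n_bg + len(bg) + 1)                 # add-one smoothing
        p_src = (src[w] + 1) / (n_src + len(src) + 1)
        p_mix = lam * p_src + (1 - lam) * p_bg
        score += math.log(p_mix) - math.log(p_bg)
    return score

# Toy example: a topically related earlier document should score higher
# as a potential influence than an unrelated one.
background = ["the", "of", "and"] * 5
later = ["markov", "decision", "process"]
related = influence_score(["markov", "decision"], later, background)
unrelated = influence_score(["syntax", "parsing"], later, background)
```

Running such pairwise tests over all time-ordered document pairs, and keeping the high-scoring edges, yields the kind of inferred citation graph the abstract describes.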
Integration of semantic and syntactic constraints for structural noun phrase disambiguation
In Proceedings of the Eleventh International Joint Conference on Artificial Intelligence, 1989
Abstract

Cited by 9 (2 self)
A fundamental problem in Natural Language Processing is the integration of syntactic and semantic constraints. In this paper we describe a new approach for the integration of syntactic and semantic constraints which takes advantage of a learned memory model. Our model combines localist representations for the integration of constraints and distributed representations for learning semantic constraints. We apply this model to the problem of structural disambiguation of noun phrases and show that a learned connectionist model can scale up the underlying memory of a Natural Language Processing system.
Explaining Temporal Differences to Create Useful Concepts for Evaluating States
In Proceedings of the Eighth National Conference on AI, Menlo Park, 1990
Abstract

Cited by 10 (2 self)
We describe a technique for improving problem-solving performance by creating concepts that allow problem states to be evaluated through an efficient recognition process. A temporal-difference (TD) method is used to bootstrap a collection of useful concepts by backing up evaluations from recognized states to their predecessors. This procedure is combined with explanation-based generalization (EBG) and goal regression to use knowledge of the problem domain to help generalize the new concept definitions. This maintains the efficiency of using the concepts and accelerates the learning process in comparison to knowledge-free approaches. Also, because the learned definitions may describe negative conditions, it becomes possible to use EBG to explain why some instance is not an example of a concept. The learning technique has been elaborated for minimax game playing and tested on a Tic-Tac-Toe system, T2. Given only concepts defining the end-game states and constrained to ...
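The backup step that propagates evaluations from recognized states to their predecessors can be sketched generically; this is plain TD(0) with assumed names and constants, without the EBG generalization step that the abstract layers on top of it.

```python
def td_backup(values, state, next_state, reward=0.0, alpha=0.2, gamma=0.9):
    """One temporal-difference backup from a successor state to a predecessor
    (a generic TD(0) sketch; names and constants are illustrative)."""
    target = reward + gamma * values.get(next_state, 0.0)
    v = values.get(state, 0.0)
    values[state] = v + alpha * (target - v)

# A recognized winning position backs its evaluation up to the position
# one move earlier, so the predecessor itself becomes recognizable later.
values = {"won": 1.0}
td_backup(values, "one_move_from_win", "won")
```

In the abstract's scheme, EBG then generalizes the newly valued predecessor into a concept definition, so that states matching it can be recognized directly rather than re-evaluated.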