Results 1 to 10 of 36
Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning
 Artificial Intelligence
, 1999
Abstract

Cited by 427 (29 self)
Learning, planning, and representing knowledge at multiple levels of temporal abstraction are key, longstanding challenges for AI. In this paper we consider how these challenges can be addressed within the mathematical framework of reinforcement learning and Markov decision processes (MDPs). We extend the usual notion of action in this framework to include options: closed-loop policies for taking action over a period of time. Examples of options include picking up an object, going to lunch, and traveling to a distant city, as well as primitive actions such as muscle twitches and joint torques. Overall, we show that options enable temporally abstract knowledge and action to be included in the reinforcement learning framework in a natural and general way. In particular, we show that options may be used interchangeably with primitive actions in planning methods such as dynamic programming and in learning methods such as Q-learning.
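The option construct described in this abstract can be made concrete with a small sketch. This is an illustrative rendering, not code from the paper: an option pairs an initiation set, a closed-loop policy, and a stochastic termination condition, and a primitive action is the one-step special case. All names here (`Option`, `run_option`, the `step` environment interface) are assumptions of this sketch.

```python
import random
from dataclasses import dataclass
from typing import Callable, Set

@dataclass
class Option:
    initiation_set: Set[int]             # states where the option may start
    policy: Callable[[int], int]         # closed-loop: maps state -> action
    termination: Callable[[int], float]  # beta(s): prob. of stopping in s

def run_option(option, state, step, rng):
    """Execute an option until its termination condition fires.

    `step(state, action) -> next_state` is an assumed environment
    interface. Returns the state in which the option terminated.
    """
    assert state in option.initiation_set
    while True:
        state = step(state, option.policy(state))
        if rng.random() < option.termination(state):
            return state

def primitive(action, states):
    """A primitive action is the option that always stops after one step,
    which is what lets options and actions be used interchangeably."""
    return Option(set(states), lambda s: action, lambda s: 1.0)
```

As a usage sketch, on a toy chain environment where `step(s, a) = s + a`, an option that walks right and terminates at state 3 behaves like a temporally extended action, while `primitive(1, ...)` takes a single step.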
Efficient algorithms for minimizing cross validation error
 In Proceedings of the Eleventh International Conference on Machine Learning
, 1994
Abstract

Cited by 128 (6 self)
Model selection is important in many areas of supervised learning. Given a dataset and a set of models for predicting with that dataset, we must choose the model which is expected to best predict future data. In some situations, such as online learning for control of robots or factories, data is cheap and human expertise costly. Cross validation can then be a highly effective method for automatic model selection. Large-scale cross validation search can, however, be computationally expensive. This paper introduces new algorithms to reduce the computational burden of such searches. We show how experimental design methods can achieve this, using a technique similar to a Bayesian version of Kaelbling's Interval Estimation. Several improvements are then given, including (1) the use of blocking to quickly spot near-identical models, and (2) schemata search: a new method for quickly finding families of relevant features. Experiments are presented for robot data and noisy synthetic datasets. The new algorithms speed up computation without sacrificing reliability, and in some cases are more reliable than conventional techniques.
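The cross-validation search that these algorithms accelerate can be sketched in a few lines. The model families below (predict-the-mean and 1-nearest-neighbour) and all function names are illustrative stand-ins, not the paper's models; the point is only the shape of the search: score every candidate by leave-one-out error and keep the winner.

```python
def loo_error(predict, xs, ys):
    """Mean squared leave-one-out error of a model.

    `predict(train_x, train_y, x)` is an assumed model interface: fit
    on the held-in points and predict at the held-out point x.
    """
    errs = []
    for i in range(len(xs)):
        tx, ty = xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:]
        errs.append((predict(tx, ty, xs[i]) - ys[i]) ** 2)
    return sum(errs) / len(errs)

def mean_model(tx, ty, x):
    # baseline: always predict the training mean
    return sum(ty) / len(ty)

def nn_model(tx, ty, x):
    # 1-nearest-neighbour prediction
    return min(zip(tx, ty), key=lambda p: abs(p[0] - x))[1]

def select(models, xs, ys):
    """Pick the (name, model) pair with lowest leave-one-out error."""
    return min(models, key=lambda nm: loo_error(nm[1], xs, ys))[0]
```

Every candidate is scored on every point here; the paper's contribution is avoiding exactly this exhaustive cost.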
Hoeffding Races: Accelerating Model Selection Search for Classification and Function Approximation
 In Advances in neural information processing systems 6
, 1994
Abstract

Cited by 101 (9 self)
Selecting a good model of a set of input points by cross validation is a computationally intensive process, especially if the number of possible models or the number of training points is high. Techniques such as gradient descent are helpful in searching through the space of models, but problems such as local minima, and more importantly, lack of a distance metric between various models reduce the applicability of these search methods. Hoeffding Races is a technique for finding a good model for the data by quickly discarding bad models, and concentrating the computational effort at differentiating between the better ones. This paper focuses on the special case of leave-one-out cross validation applied to memory-based learning algorithms, but we also argue that it is applicable to any class of model selection problems. 1 Introduction Model selection addresses "high level" decisions about how best to tune learning algorithm architectures for particular tasks. Such decisions include which...
Temporal Abstraction in Reinforcement Learning
, 2000
Abstract

Cited by 59 (2 self)
Decision making usually involves choosing among different courses of action over a broad range of time scales. For instance, a person planning a trip to a distant location makes high-level decisions regarding what means of transportation to use, but also chooses low-level actions, such as the movements for getting into a car. The problem of picking an appropriate time scale for reasoning and learning has been explored in artificial intelligence, control theory and robotics. In this dissertation we develop a framework that allows novel solutions to this problem, in the context of Markov Decision Processes (MDPs) and reinforcement learning. In this dissertation, we present a general framework for prediction, control and learning at multipl...
Between MDPs and semi-MDPs: Learning, planning, and representing knowledge at multiple temporal scales
 Journal of Artificial Intelligence Research
, 1998
Abstract

Cited by 55 (7 self)
Learning, planning, and representing knowledge at multiple levels of temporal abstraction are key challenges for AI. In this paper we develop an approach to these problems based on the mathematical framework of reinforcement learning and Markov decision processes (MDPs). We extend the usual notion of action to include options: whole courses of behavior that may be temporally extended, stochastic, and contingent on events. Examples of options include picking up an object, going to lunch, and traveling to a distant city, as well as primitive actions such as muscle twitches and joint torques. Options may be given a priori, learned by experience, or both. They may be used interchangeably with actions in a variety of planning and learning methods. The theory of semi-Markov decision processes (SMDPs) can be applied to model the consequences of options and as a basis for planning and learning methods using them. In this paper we develop these connections, building on prior work by Bradtke and Duff (1995), Parr (in prep.) and others. Our main novel results concern the interface between the MDP and SMDP levels of analysis. We show how a set of options can be altered by changing only their termination conditions
The Racing Algorithm: Model Selection for Lazy Learners
 Artificial Intelligence Review
, 1997
Abstract

Cited by 50 (3 self)
Given a set of models and some training data, we would like to find the model that best describes the data. Finding the model with the lowest generalization error is a computationally expensive process, especially if the number of testing points is high or if the number of models is large. Optimization techniques such as hill climbing or genetic algorithms are helpful but can end up with a model that is arbitrarily worse than the best one or cannot be used because there is no distance metric on the space of discrete models. In this paper we develop a technique called "racing" that tests the set of models in parallel, quickly discards those models that are clearly inferior and concentrates the computational effort on differentiating among the better models. Racing is especially suitable for selecting among lazy learners since training requires negligible expense, and incremental testing using leave-one-out cross validation is efficient. We use racing to select among various lazy learnin...
Competition-Based Learning
, 1992
Abstract

Cited by 44 (5 self)
This paper summarizes recent research on competition-based learning procedures performed by the Navy Center for Applied Research in Artificial Intelligence at the Naval Research Laboratory. We have focused on a particularly interesting class of competition-based techniques called genetic algorithms. Genetic algorithms are adaptive search algorithms based on principles derived from the mechanisms of biological evolution. Recent results on the analysis of the implicit parallelism of alternative selection algorithms are summarized, along with an analysis of alternative crossover operators. Applications of these results in practical learning systems for sequential decision problems and for concept classification are also presented. INTRODUCTION One approach to the design of more flexible computer systems is to extract heuristics from existing adaptive systems. We have focused on a class of learning systems that use competition-based procedures, called genetic algorithms (GAs). GAs are ba...
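The selection and crossover operators the abstract analyzes can be illustrated with a minimal genetic algorithm. This sketch uses binary tournament selection and one-point crossover on bit strings maximizing the number of ones (the standard OneMax toy problem); every parameter value and name here is an illustrative assumption, not taken from the paper.

```python
import random

def evolve(fitness, length=20, pop_size=30, generations=40,
           p_mut=0.02, rng=None):
    """Minimal GA: tournament selection, one-point crossover, bit mutation."""
    rng = rng or random.Random(0)
    pop = [[rng.randint(0, 1) for _ in range(length)]
           for _ in range(pop_size)]
    for _ in range(generations):
        def select():
            # binary tournament: keep the fitter of two random individuals
            a, b = rng.choice(pop), rng.choice(pop)
            return a if fitness(a) >= fitness(b) else b
        nxt = []
        while len(nxt) < pop_size:
            p1, p2 = select(), select()
            cut = rng.randrange(1, length)        # one-point crossover
            child = p1[:cut] + p2[cut:]
            # independent bit-flip mutation with probability p_mut
            child = [bit ^ (rng.random() < p_mut) for bit in child]
            nxt.append(child)
        pop = nxt
    return max(pop, key=fitness)

# OneMax: fitness is simply the number of ones in the string
best = evolve(sum)
```

Under this selection pressure the population converges toward the all-ones string within a few dozen generations on this toy problem.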
Memory-based Stochastic Optimization
 Neural Information Processing Systems 8
, 1995
Abstract

Cited by 41 (7 self)
In this paper we introduce new algorithms for optimizing noisy plants in which each experiment is very expensive. The algorithms build a global nonlinear model of the expected output at the same time as using Bayesian linear regression analysis of locally weighted polynomial models. The local model answers queries about confidence, noise, gradient and Hessians, and uses them to make automated decisions similar to those made by a practitioner of Response Surface Methodology. The global and local models are combined naturally as a locally weighted regression. We examine the question of whether the global model can really help optimization, and we extend it to the case of time-varying functions. We compare the new algorithms with a highly tuned higher-order stochastic optimization algorithm on randomly-generated functions and a simulated manufacturing task. We note significant improvements in total regret, time to converge, and final solution quality. 1 INTRODUCTION In a stochastic optim...
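The locally weighted regression underlying the local models can be sketched in one dimension. This is a simplified stand-in for the paper's locally weighted polynomial models: a weighted least-squares line is fit around each query point, with a Gaussian kernel whose width `h` is an illustrative choice, not a value from the paper.

```python
import math

def lwr_predict(xs, ys, xq, h=1.0):
    """Locally weighted linear regression prediction at query point xq.

    Fits y = a + b*x by weighted least squares, with weights from a
    Gaussian kernel of width h centred at xq (closed form, 1-D).
    """
    w = [math.exp(-((x - xq) ** 2) / (2.0 * h * h)) for x in xs]
    sw = sum(w)
    sx = sum(wi * x for wi, x in zip(w, xs))
    sy = sum(wi * y for wi, y in zip(w, ys))
    sxx = sum(wi * x * x for wi, x in zip(w, xs))
    sxy = sum(wi * x * y for wi, x, y in zip(w, xs, ys))
    denom = sw * sxx - sx * sx
    if abs(denom) < 1e-12:          # degenerate design: fall back to mean
        return sy / sw
    b = (sw * sxy - sx * sy) / denom
    a = (sy - b * sx) / sw
    return a + b * xq
```

Because each prediction refits around the query, the model stays faithful to local structure, which is what lets it answer local queries about gradients and confidence.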
Adaptive problem-solving for large-scale scheduling problems: A case study
, 1996
Abstract

Cited by 26 (3 self)
Although most scheduling problems are NP-hard, domain-specific techniques perform well in practice but are quite expensive to construct. In adaptive problem-solving, domain-specific knowledge is acquired automatically for a general problem solver with a flexible control architecture. In this approach, a learning system explores a space of possible heuristic methods for one well-suited to the eccentricities of the given domain and problem distribution. In this article, we discuss an application of the approach to scheduling satellite communications. Using problem distributions based on actual mission requirements, our approach identifies strategies that not only decrease the amount of CPU time required to produce schedules, but also increase the percentage of problems that are solvable within computational resource limitations.
Automatically Choosing the Number of Fitness Cases: The Rational Allocation of Trials
 Genetic Programming 1997: Proceedings of the Second Annual Conference
Abstract

Cited by 23 (1 self)
For many problems to which genetic programming has been applied, choosing the number of fitness cases with which to evaluate the individuals is a crucial decision. If too few fitness cases are used, overfitting may occur, and the measured fitness of an individual may not be representative of its true fitness. On the other hand, if too many fitness cases are used, a great deal of computer time can be wasted. This paper presents a method for the Rational Allocation of Trials (RAT) that dynamically allocates a boundedly optimal number of fitness cases for each individual. RAT allocates individuals to tournaments prior to their evaluation, and then, borrowing from previous work in model selection, allocates trials (fitness cases) only to those individuals for whom the cost of evaluating another fitness case is outweighed by the expected utility that the new information will provide. For most evolutionary computation approaches, including genetic programming, and for most problems, the RAT ...