Results 1-10 of 13
Learning to predict by the methods of temporal differences
Machine Learning, 1988
Cited by 1231 (45 self)
Abstract
This article introduces a class of incremental learning procedures specialized for prediction – that is, for using past experience with an incompletely known system to predict its future behavior. Whereas conventional prediction-learning methods assign credit by means of the difference between predicted and actual outcomes, the new methods assign credit by means of the difference between temporally successive predictions. Although such temporal-difference methods have been used in Samuel's checker player, Holland's bucket brigade, and the author's Adaptive Heuristic Critic, they have remained poorly understood. Here we prove their convergence and optimality for special cases and relate them to supervised-learning methods. For most real-world prediction problems, temporal-difference methods require less memory and less peak computation than conventional methods and they produce more accurate predictions. We argue that most problems to which supervised learning is currently applied are really prediction problems of the sort to which temporal-difference methods can be applied to advantage.
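The credit-assignment rule described in this abstract can be sketched as tabular TD(0) on a toy random-walk prediction task. The task, function name, and all parameter values below are illustrative, not taken from the article:

```python
import random

# Tabular TD(0) prediction on a simple random walk: states 0..6, with
# 0 and 6 terminal and a reward of 1 only for terminating on the right.
def td0_random_walk(episodes=1000, alpha=0.1, gamma=1.0, seed=0):
    rng = random.Random(seed)
    V = {s: 0.0 for s in range(7)}  # V[0] and V[6] stay 0 (terminal)
    for _ in range(episodes):
        s = 3  # start in the middle
        while s not in (0, 6):
            s_next = s + rng.choice((-1, 1))
            r = 1.0 if s_next == 6 else 0.0
            # Credit is assigned via the difference between temporally
            # successive predictions, not the final outcome:
            V[s] += alpha * (r + gamma * V[s_next] - V[s])
            s = s_next
    return V
```

After enough episodes, V[s] approaches the probability of terminating on the right (the true values here are s/6), illustrating the "incompletely known system" setting: the update touches only experienced transitions.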
Practical Issues in Temporal Difference Learning
Machine Learning, 1992
Cited by 368 (2 self)
Abstract
This paper examines whether temporal difference methods for training connectionist networks, such as Sutton's TD(λ) algorithm, can be successfully applied to complex real-world problems. A number of important practical issues are identified and discussed from a general theoretical perspective. These practical issues are then examined in the context of a case study in which TD(λ) is applied to learning the game of backgammon from the outcome of self-play. This is apparently the first application of this algorithm to a complex nontrivial task. It is found that, with zero knowledge built in, the network is able to learn from scratch to play the entire game at a fairly strong intermediate level of performance which is clearly better than conventional commercial programs and which in fact surpasses comparable networks trained on a massive human expert data set. This indicates that TD learning may work better in practice than one would expect based on current theory, and it suggests that further analysis of TD methods, as well as applications in other complex domains, may be worth investigating.
Machine discovery of effective admissible heuristics
Machine Learning, 1993
Cited by 29 (0 self)
Abstract
Admissible heuristics are an important class of heuristics worth discovering: they guarantee shortest-path solutions in search algorithms such as A* and they guarantee less expensively produced, but boundedly longer, solutions in search algorithms such as dynamic weighting. Unfortunately, effective (accurate and cheap to compute) admissible heuristics can take years for people to discover. Several researchers have suggested that certain transformations of a problem can be used to generate admissible heuristics. This article defines a more general class of transformations, called abstractions, that are guaranteed to generate only admissible heuristics. It also describes and evaluates an implemented program (Absolver II) that uses a means-ends analysis search control strategy to discover abstracted problems that result in effective admissible heuristics. Absolver II discovered several well-known and a few novel admissible heuristics, including the first known effective one for Rubik's Cube, thus concretely demonstrating that effective admissible heuristics can be tractably discovered by a machine.
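A minimal illustration of the abstraction idea (our own toy example, not the program from the article): dropping the walls from a grid pathfinding problem yields an abstract problem whose exact solution cost, Manhattan distance, can never overestimate the true cost, so A* guided by it returns shortest paths.

```python
import heapq

# A* on a 0/1 grid (0 = free, 1 = wall), unit step costs.
def astar(grid, start, goal):
    rows, cols = len(grid), len(grid[0])
    def h(p):  # exact cost in the wall-free abstraction: admissible
        return abs(p[0] - goal[0]) + abs(p[1] - goal[1])
    frontier = [(h(start), 0, start)]  # (f, g, position)
    best = {start: 0}
    while frontier:
        f, g, (r, c) = heapq.heappop(frontier)
        if (r, c) == goal:
            return g  # admissible h: first goal pop is optimal
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0:
                ng = g + 1
                if ng < best.get((nr, nc), float("inf")):
                    best[(nr, nc)] = ng
                    heapq.heappush(frontier, (ng + h((nr, nc)), ng, (nr, nc)))
    return None  # goal unreachable
```

The same recipe, solving an abstracted problem exactly to score the original one, underlies heuristics such as the pattern-database style abstractions the article's class of transformations covers.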
Incremental Dynamic Programming for On-Line Adaptive Optimal Control, 1994
Cited by 20 (2 self)
Abstract
Reinforcement learning algorithms based on the principles of Dynamic Programming (DP) have enjoyed a great deal of recent attention both empirically and theoretically. These algorithms have been referred to generically as Incremental Dynamic Programming (IDP) algorithms. IDP algorithms are intended for use in situations where the information or computational resources needed by traditional dynamic programming algorithms are not available. IDP algorithms attempt to find a global solution to a DP problem by incrementally improving local constraint satisfaction properties as experience is gained through interaction with the environment. This class of algorithms is not new, going back at least as far as Samuel's adaptive checkers-playing programs, ...
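A standard example of the IDP idea (Q-learning, used here as an illustration rather than the paper's own formulation): the Bellman backup that classical DP would sweep over all states is applied only along experienced transitions. The chain task and all parameters below are illustrative:

```python
import random

# Q-learning on a chain of n states; the goal is the rightmost state.
def q_learning_chain(n=5, episodes=300, alpha=0.2, gamma=0.9, eps=0.1, seed=0):
    rng = random.Random(seed)
    Q = {(s, a): 0.0 for s in range(n) for a in (1, -1)}
    for _ in range(episodes):
        s = 0
        while s != n - 1:
            if rng.random() < eps:          # occasional exploration
                a = rng.choice((1, -1))
            else:                           # greedy w.r.t. current Q
                a = max((1, -1), key=lambda act: Q[(s, act)])
            s2 = min(max(s + a, 0), n - 1)
            r = 1.0 if s2 == n - 1 else 0.0
            # incrementally improve the local Bellman constraint at (s, a)
            target = r + gamma * max(Q[(s2, 1)], Q[(s2, -1)])
            Q[(s, a)] += alpha * (target - Q[(s, a)])
            s = s2
    return Q
```

No transition model is ever consulted; the "global solution" (the optimal rightward policy) emerges from these purely local, experience-driven backups.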
Automated Learning of Load-Balancing Strategies For A Distributed Computer System, 1992
Cited by 17 (4 self)
Abstract
(or derived) decision metrics are exemplified by MinLoad, which denotes the least among all the Load values.

SENDER-SIDE RULES (s):
    Possible-destinations = { site : Load(site) - Reference(s) < d(s) }
    Destination = Random(Possible-destinations)
    IF Load(s) - Reference(s) > q1(s) THEN Send

RECEIVER-SIDE RULES (r):
    IF Load(r) < q2(r) THEN Receive

Figure 3. The load-balancing policy considered in this thesis

The sender-side rules are applied by the load-balancing software at the site of arrival (s) of a task. Reference can be either 0 or MinLoad; the other parameters, d, q1, and q2, take non-negative floating-point values. A remote destination (r) is chosen randomly from Destinations, a set of sites whose load index falls within a small neighborhood of Reference. If Destinations is the empty set, or if the rule for sending fails, then the task is executed locally at s, its site of arrival; ot...
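The sender- and receiver-side rules above can be sketched in runnable form. The function names and parameter values are our own illustration, and we fix Reference to MinLoad for concreteness:

```python
import random

# Sender-side rule: decide where a task arriving at site s should run.
def choose_destination(loads, s, d, q1, rng=random):
    ref = min(loads.values())            # Reference = MinLoad
    if loads[s] - ref <= q1:             # sending rule fails: stay local
        return s
    candidates = [site for site in loads
                  if site != s and loads[site] - ref < d]
    if not candidates:                   # Destinations is empty: stay local
        return s
    return rng.choice(candidates)        # Destination = Random(...)

# Receiver-side rule: a site accepts remote tasks only while lightly loaded.
def accepts(loads, r, q2):
    return loads[r] < q2
```

A heavily loaded site thus ships work to a randomly chosen lightly loaded peer, while the per-site thresholds d, q1, and q2 are exactly the knobs the thesis proposes to tune by automated learning.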
Learning inadmissible heuristics during search
In Proceedings of ICAPS-11, 2011
Cited by 12 (8 self)
Abstract
Suboptimal search algorithms offer shorter solving times by sacrificing guaranteed solution optimality. While optimal search algorithms like A* and IDA* require admissible heuristics, suboptimal search algorithms need not constrain their guidance in this way. Previous work has explored using offline training to transform admissible heuristics into more effective inadmissible ones. In this paper we demonstrate that this transformation can be performed online, during search. In addition to not requiring training instances and extensive precomputation, an online approach allows the learned heuristic to be tailored to a specific problem instance. We evaluate our techniques in four different benchmark domains using both greedy best-first search and bounded suboptimal search. We find that heuristics learned online result in both faster search and better solutions while relying only on information readily available in any best-first search.
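One simple instantiation of the online idea (our own simplification, not necessarily the paper's exact formulation): during a greedy best-first search, track the mean one-step error of the base heuristic h across expansions, and inflate h by that learned error when ranking nodes. The resulting guidance is inadmissible but tailored to the instance being solved:

```python
import heapq

# Greedy best-first search on a 0/1 grid with an online-corrected h.
def greedy_online_h(grid, start, goal):
    rows, cols = len(grid), len(grid[0])
    def h(p):  # base heuristic: Manhattan distance (admissible)
        return abs(p[0] - goal[0]) + abs(p[1] - goal[1])
    err_sum, expansions = 0.0, 0
    frontier = [(float(h(start)), start, 0)]
    seen = {start}
    while frontier:
        _, pos, g = heapq.heappop(frontier)
        if pos == goal:
            return g
        r, c = pos
        succs = [(r + dr, c + dc)
                 for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1))
                 if 0 <= r + dr < rows and 0 <= c + dc < cols
                 and grid[r + dr][c + dc] == 0 and (r + dr, c + dc) not in seen]
        if succs:
            # one-step error: step cost minus the drop in h an exact
            # heuristic would show at this expansion
            err_sum += 1 + min(h(p) for p in succs) - h(pos)
            expansions += 1
        e = max(err_sum / expansions, 0.0) if expansions else 0.0
        for p in succs:
            seen.add(p)
            # inflate h by the error learned so far (inadmissible)
            heapq.heappush(frontier, ((1.0 + e) * h(p), p, g + 1))
    return None
```

Everything used here (step costs, child h-values) is, as the abstract says, information readily available in any best-first search; no training instances or precomputation are needed.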
A Pattern-Weight Formulation of Search Knowledge
Computational Intelligence, 1994
"... this paper begins to address. ..."
Pruning Algorithms for Multi-model Adversary Search
Artificial Intelligence, 1998
Cited by 5 (0 self)
Abstract
The multi-model search framework generalizes minimax to allow exploitation of recursive opponent models. In this work we consider adding pruning to the multi-model search. We prove a sufficient condition that enables pruning and describe two pruning algorithms, αβ* and αβ*_1p. We prove correctness and optimality of the algorithms and provide an experimental study of their pruning power. We show that for opponent models that are not radically different from the player's strategy, the pruning power of these algorithms is significant.

1 Introduction

The minimax algorithm [21] and its αβ version [12] have served as the fundamental decision procedures for zero-sum games since the early days of computer science. The basic assumption behind minimax is that the player has no knowledge about the opponent's decision procedure. In the absence of such knowledge, minimax assumes that the opponent selects an alternative which is the worst from the player's point of view. However, it is q...
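The pruning algorithms in this paper generalize standard alpha-beta pruning to recursive opponent models. For reference, a baseline sketch of standard alpha-beta on a game tree encoded as nested lists (this is textbook αβ, not the paper's multi-model variants):

```python
# Leaves are numeric payoffs; internal nodes are lists of children.
def alphabeta(node, maximizing=True, alpha=float("-inf"), beta=float("inf")):
    if not isinstance(node, list):       # leaf: static payoff
        return node
    if maximizing:
        value = float("-inf")
        for child in node:
            value = max(value, alphabeta(child, False, alpha, beta))
            alpha = max(alpha, value)
            if alpha >= beta:            # cutoff: opponent avoids this line
                break
        return value
    value = float("inf")
    for child in node:
        value = min(value, alphabeta(child, True, alpha, beta))
        beta = min(beta, value)
        if alpha >= beta:                # cutoff: player avoids this line
            break
    return value
```

The cutoffs above are justified precisely by minimax's worst-case opponent assumption; once that assumption is replaced by an explicit (possibly recursive) opponent model, a new sufficient condition is needed before any branch may be discarded, which is the question this paper addresses.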
Explorations of the Practical Issues of Learning Prediction-Control Tasks Using Temporal Difference Learning Methods
Master’s thesis, MIT, 1992
Cited by 1 (0 self)
Abstract
There has been recent interest in using a class of incremental learning algorithms called temporal difference learning methods to attack problems of prediction. These algorithms have been brought to bear on various prediction problems in the past, but have remained poorly understood. It is the purpose of this thesis to further explore this class of algorithms, particularly the TD(λ) algorithm. A number of practical issues are raised and discussed from a general theoretical perspective and then explored in the context of several case studies. The thesis presents a framework for viewing these algorithms independent of the particular task at hand and uses this framework to explore not only tasks of prediction, but also prediction tasks that require control, whether complete or partial. This includes applying the TD(λ) algorithm to two tasks: 1) learning to play tic-tac-toe from the outcome of self-play and the outcome of play against a perfectly-playing opponent and 2) learning two sim...