Practical Issues in Temporal Difference Learning
- Machine Learning, 1992
- Cited by 334 (2 self)
This paper examines whether temporal difference methods for training connectionist networks, such as Sutton's TD(lambda) algorithm, can be successfully applied to complex real-world problems. A number of important practical issues are identified and discussed from a general theoretical perspective. These practical issues are then examined in the context of a case study in which TD(lambda) is applied to learning the game of backgammon from the outcome of self-play. This is apparently the first application of this algorithm to a complex nontrivial task. It is found that, with zero knowledge built in, the network is able to learn from scratch to play the entire game at a fairly strong intermediate level of performance, which is clearly better than conventional commercial programs and which in fact surpasses comparable networks trained on a massive human expert data set. This indicates that TD learning may work better in practice than one would expect based on current theory, and it suggests that further analysis of TD methods, as well as applications in other complex domains, may be worth investigating.
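As a rough illustration of the TD(lambda) update named in this abstract, the sketch below learns outcome predictions for a five-state random walk. The task, the tabular value table (used here in place of a connectionist network), the function name `td_lambda_random_walk`, and all parameter values are assumptions chosen for brevity; none of them come from the paper.

```python
import random


def td_lambda_random_walk(episodes=5000, n_states=5, alpha=0.01, lam=0.7, seed=0):
    """Predict the chance that a random walk ends at the right-hand terminal.

    Hypothetical toy task: the only training signal is the final outcome
    (0 for the left exit, 1 for the right exit), with no discounting.
    """
    rng = random.Random(seed)
    values = [0.5] * n_states  # one prediction per non-terminal state

    for _ in range(episodes):
        traces = [0.0] * n_states  # eligibility traces, reset every episode
        state = n_states // 2      # each walk starts in the middle

        while True:
            next_state = state + rng.choice((-1, 1))
            if next_state < 0:
                target, done = 0.0, True   # walked off the left end
            elif next_state >= n_states:
                target, done = 1.0, True   # walked off the right end
            else:
                target, done = values[next_state], False

            # Temporal difference between successive predictions.
            delta = target - values[state]

            # Decay all traces by lambda, then mark the state just visited.
            traces = [lam * e for e in traces]
            traces[state] += 1.0

            # Every recently visited state shares in the correction.
            for s in range(n_states):
                values[s] += alpha * delta * traces[s]

            if done:
                break
            state = next_state

    return values
```

For this walk the exact predictions are 1/6, 2/6, ..., 5/6, so the returned list should land near those values. Setting `lam=0` gives one-step TD, while values close to 1 behave more like learning directly from the final outcome.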
An empirical comparison of pattern recognition, neural nets, and machine learning classification methods
- In Proceedings of the Eleventh International Joint Conference on Artificial Intelligence, 1989
- Cited by 122 (2 self)
Classification methods from statistical pattern recognition, neural nets, and machine learning were applied to four real-world data sets. Each of these data sets has been previously analyzed and reported in the statistical, medical, or machine learning literature. The data sets are characterized by statistical uncertainty; there is no completely accurate solution to these problems. Training and testing or resampling techniques are used to estimate the true error rates of the classification methods. Detailed attention is given to the analysis of performance of the neural nets using back propagation. For these problems, which have relatively few hypotheses and features, the machine learning procedures for rule induction or tree induction clearly performed best.
Two Kinds of Training Information for Evaluation Function Learning
- In Proceedings of the Ninth Annual Conference on Artificial Intelligence, 1991
- Cited by 51 (3 self)
This paper identifies two fundamentally different kinds of training information for learning search control in terms of an evaluation function. Each kind of training information suggests its own set of methods for learning an evaluation function. The paper shows that one can integrate the methods and learn simultaneously from both kinds of information.
Automatic Feature Generation for Problem Solving Systems
- Proceedings of the 9th International Conference on Machine Learning, 1992
- Cited by 32 (0 self)
Existing methods for constructive induction usually isolate feature generation from problem solving, and do not exploit information about the purpose for which features are created. This paper describes a theory of feature generation that creates features using both a domain theory and feedback from a concept learner. An evaluation function can then be learned using these features that is able to direct a problem-solver. The theory has been implemented in a system called Zenith, which has been applied to two domains. Zenith is able to generate useful features for each domain, given only a domain theory and the ability to solve problems in the domain.
1 Introduction
In his pioneering work in artificial intelligence, Arthur Samuel (1959) developed a program that was able to play the board game checkers. Samuel's program used a set of features to characterize board positions, and by adjusting the coefficients of these features ...
Modular Neural Networks for Learning Context-Dependent Game Strategies
- Master’s thesis, Computer Speech and Language Processing, 1992
- Cited by 31 (3 self)
The method of temporal differences (TD) is a learning technique which specialises in predicting the likely outcome of a sequence over time. Examples of such sequences include speech frame vectors, whose outcome is a phoneme or word decision, and positions in a board game, whose outcome is a win/loss decision. Recent results by Tesauro in the domain of backgammon indicate that a neural network, trained by TD methods to evaluate positions generated by self-play, can reach an advanced level of backgammon skill. For my summer thesis project, I first implemented the TD/neural network learning algorithms and confirmed Tesauro's results, using the domains of tic-tac-toe and backgammon. Then, motivated by Waibel's success with modular neural networks for phoneme recognition, I experimented with using two modular architectures (DDD and Meta-Pi) in place of the monolithic networks. I found that using the modular networks significantly enhanced the ability of the backgammon evaluator to change it...
Experiments with Multi-ProbCut and a New High-Quality Evaluation Function for Othello
- Games in AI Research, 1997
- Cited by 23 (4 self)
This paper presents ideas concerning game-tree evaluation that recently improved the author's strong Othello program LOGISTELLO considerably.
Search and Planning under Incomplete Information - A Study using Bridge Card Play
- 1996
- Cited by 23 (1 self)
This thesis investigates problem-solving in domains featuring incomplete information and multiple agents with opposing goals. In particular, we describe Finesse, a system that forms plans for the problem of declarer play in the game of Bridge. We begin by examining the problem of search. We formalise a best defence model of incomplete information games in which equilibrium point strategies can be identified, and identify two specific problems that can affect algorithms in such domains. In Bridge, we show that the best defence model corresponds to the typical model analysed in expert texts, and examine search algorithms which overcome the problems we have identified. Next, we look at how planning algorithms can be made to cope with the difficulties of such domains. This calls for the development of new techniques for representing uncertainty and actions with disjunctive effects, for coping with an opposition, and for reasoning about compound actions. We tackle these problems with a...
Automated Learning of Load-Balancing Strategies For A Distributed Computer System
- 1992
- Cited by 17 (4 self)
(or derived) decision metrics are exemplified by MinLoad, which denotes the least among all the Load values.
SENDER-SIDE RULES (s)
Possible-destinations = { site: Load(site) - Reference(s) < d(s) }
Destination = Random(Possible-destinations)
IF Load(s) - Reference(s) > q1(s) THEN Send
RECEIVER-SIDE RULES (r)
IF Load(r) < q2(r) THEN Receive
Figure 3. The load-balancing policy considered in this thesis
The sender-side rules are applied by the load-balancing software at the site of arrival (s) of a task. Reference can be either 0 or MinLoad; the other parameters (d, q1, and q2) take non-negative floating-point values. A remote destination (r) is chosen randomly from Destinations, a set of sites whose load index falls within a small neighborhood of Reference. If Destinations is the empty set, or if the rule for sending fails, then the task is executed locally at s, its site of arrival; ot...
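The sender-side and receiver-side rules of Figure 3 can be sketched roughly as follows. This is only an illustration under stated assumptions: the function names, the dictionary of loads, and the use of scalar `d`, `q1`, and `q2` values (rather than per-site functions) are hypothetical. Excluding the arrival site from the candidate set is also an assumption, based on the text calling the destination "remote". What happens after a receiver declines is not covered, because the excerpt is cut off at that point.

```python
import random


def sender_decision(s, loads, use_min_load, d, q1, rng=random):
    """Return the site that should run a task arriving at site s.

    loads maps site name -> load index. Reference is MinLoad when
    use_min_load is true and 0 otherwise, as stated in the excerpt.
    """
    reference = min(loads.values()) if use_min_load else 0.0

    # Possible-destinations = { site: Load(site) - Reference(s) < d(s) }
    # Assumption: the arrival site itself is not a remote destination.
    possible = [site for site, load in loads.items()
                if site != s and load - reference < d]

    # IF Load(s) - Reference(s) > q1(s) THEN Send
    may_send = loads[s] - reference > q1

    if not possible or not may_send:
        return s  # execute locally at the site of arrival

    # Destination = Random(Possible-destinations)
    return rng.choice(sorted(possible))


def receiver_accepts(r, loads, q2):
    """IF Load(r) < q2(r) THEN Receive."""
    return loads[r] < q2
```

With `loads = {"a": 5.0, "b": 1.0, "c": 1.2}`, `use_min_load=True`, `d=0.5`, and `q1=2.0`, a task arriving at `"a"` is sent to `"b"` or `"c"`; raising `q1` to 10.0 keeps the task at `"a"`.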
A Strategic Metagame Player for General Chess-Like Games
- Computational Intelligence, 1994
- Cited by 15 (0 self)
This paper reviews the concept of Metagame and discusses the implementation of metagamer, a program which plays Metagame in the class of symmetric chess-like games, which includes chess, Chinese-chess, checkers, draughts, and shogi. The program takes as input just the rules of any game in this class, including games unknown to its programmer, and plays the game against opponents without further human intervention. Using an evaluation function for the entire class of games, the program applies more general knowledge to each new game to produce a game-specific analysis. This allows metagamer to compete reasonably well against existing game-specific opponents in both chess and checkers, in some cases "re-discovering" strategies which are well-known to human experts. The next major test will be to play metagamer against a variety of opponents on a set of generated games unknown to its designer in advance of the competition.
1 The Problem
Virtually all past research in computer game-playi...
Statistical feature combination for the evaluation of game positions
- Journal of Artificial Intelligence Research, 1995
- Cited by 12 (5 self)
This article describes an application of three well-known statistical methods in the field of game-tree search: using a large number of classified Othello positions, feature weights for evaluation functions with a game-phase-independent meaning are estimated by means of logistic regression, Fisher's linear discriminant, and the quadratic discriminant function for normally distributed features. Thereafter, the playing strengths are compared by means of tournaments between the resulting versions of a world-class Othello program. In this application, logistic regression, which is used here for the first time in the context of game playing, leads to better results than the other approaches.
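To make the idea of estimating feature weights by logistic regression concrete, here is a minimal sketch that fits weights to labelled positions with batch gradient ascent. The two features, the toy data, the function names, and the learning-rate settings are all hypothetical; the article's actual Othello features and training set are not given in this abstract.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(features, outcomes, lr=0.1, epochs=500):
    """Fit weights w and bias b so that sigmoid(b + w.x) estimates P(win).

    features: list of equal-length feature tuples, one per position.
    outcomes: list of labels, 1 for a won position and 0 for a lost one.
    """
    n = len(features[0])
    m = len(features)
    w = [0.0] * n
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n
        grad_b = 0.0
        for x, y in zip(features, outcomes):
            p = sigmoid(b + sum(wi * xi for wi, xi in zip(w, x)))
            err = y - p  # gradient of the log-likelihood w.r.t. the logit
            for j in range(n):
                grad_w[j] += err * x[j]
            grad_b += err
        # Step along the average log-likelihood gradient.
        w = [wi + lr * g / m for wi, g in zip(w, grad_w)]
        b += lr * grad_b / m
    return w, b


def win_probability(x, w, b):
    """Evaluate a position from its feature vector."""
    return sigmoid(b + sum(wi * xi for wi, xi in zip(w, x)))


# Hypothetical features per position: (mobility difference, corner difference).
positions = [(3, 1), (2, 0), (1, 1), (-1, 0), (-2, -1), (-3, 0), (0, 1), (0, -1)]
results = [1, 1, 1, 0, 0, 0, 1, 0]
weights, bias = fit_logistic(positions, results)
```

The fitted `weights` then act as the coefficients of a linear evaluation function, with `win_probability` mapping the weighted feature sum to an estimated chance of winning.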

