Learning Evaluation Functions (1996) [2 citations — 0 self]

by Justin A. Boyan, Scott E. Fahlman, Tom Mitchell
CMU CS Thesis Proposal

Abstract:

Evaluation functions are an essential component of practical search algorithms for optimization, planning and control. Examples of such algorithms include hillclimbing, simulated annealing, best-first search, A*, and alpha-beta. In all of these, the evaluation functions are typically built manually by domain experts, and may require considerable tweaking to work well. I will investigate the thesis that statistical machine learning can be used to automatically generate high-quality evaluation functions for practical combinatorial problems. The data for such learning is gathered by running trajectories through the search space. The learned evaluation function may be applied either to guide further exploration of the same space, or to improve performance in new problem spaces which share similar features. Two general families of learning algorithms apply here: reinforcement learning and meta-optimization. The reinforcement learning approach, dating back to Samuel's checkers player [ 1959 ...
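As a rough illustration of the idea in the abstract (gather data by running search trajectories, then fit an evaluation function that predicts the outcome of search from a given state), here is a minimal sketch. It is not the method of the proposal. The toy objective, the single state feature, and all function names are assumptions made for this example only.

```python
import random

def objective(x):
    # Hypothetical toy cost on integers 0..99: a downward trend plus local ripples.
    return (99 - x) * 0.2 + 3.0 * (x % 7)

def hillclimb(x, lo=0, hi=99):
    """Greedy descent: return the states visited, ending at a local minimum."""
    path = [x]
    while True:
        neighbors = [n for n in (x - 1, x + 1) if lo <= n <= hi]
        best = min(neighbors, key=objective)
        if objective(best) >= objective(x):
            return path  # no neighbor improves on the current state
        x = best
        path.append(x)

def collect_training_data(num_runs, rng):
    """Label every state on each trajectory with the final cost that run reached."""
    data = []
    for _ in range(num_runs):
        path = hillclimb(rng.randrange(100))
        outcome = objective(path[-1])
        data.extend((state, outcome) for state in path)
    return data

def fit_line(data):
    """Ordinary least squares for outcome ~ a * state + b (one feature)."""
    n = len(data)
    mean_x = sum(x for x, _ in data) / n
    mean_y = sum(y for _, y in data) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in data)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in data)
    a = sxy / sxx if sxx else 0.0
    return a, mean_y - a * mean_x

rng = random.Random(0)
data = collect_training_data(200, rng)
a, b = fit_line(data)

def learned_eval(x):
    # Predicted final cost of hill climbing when started from state x.
    return a * x + b
```

In this sketch the learned function smooths over the local ripples, so it could be used to choose promising restart points for further search in the same space. A realistic application would replace the raw state with problem-specific features and the straight-line fit with a richer function approximator.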

Citations

6121 Introduction to Algorithms – Cormen, Leiserson, et al. - 2001
1487 Dynamic programming – Bellman - 1957
1127 Numerical Recipes In C: The Art of Scientific Computing – Press, Teukolsky, et al. - 1992
968 Learning from delayed rewards – Watkins - 1989
931 Learning to predict by the methods of temporal differences – Sutton - 1988
562 The cascade-correlation learning architecture – Fahlman, Lebiere - 1991
487 Some studies in machine learning using the game of checkers II: Recent progress – Samuel - 1967
374 Integrated architectures for learning, planning, and reacting based on approximating dynamic programming – Sutton - 1990
359 Dynamic Programming and Markov Processes – Howard - 1960
298 Practical issues in temporal difference learning – Tesauro - 1992
280 Simulated annealing: theory and applications – Laarhoven, Aarts - 1988
247 Learning in embedded systems – Kaelbling - 1993
236 An effective heuristic algorithm for the traveling salesman problem – Lin, Kernighan - 1973
223 Improving elevator performance using reinforcement learning – Crites, Barto - 1996
186 Generalization in reinforcement learning: Safely approximating the value function – Boyan, Moore - 1995
180 Learning and sequential decision making – Barto, Sutton, et al. - 1990
179 Reinforcement Learning for Robots Using Neural Networks – Lin - 1993
155 TD-Gammon, a self-teaching backgammon program, achieves master-level play – Tesauro - 1994
133 An analysis of temporal-difference learning with function approximation – Tsitsiklis, Van Roy - 1997
132 Stable function approximation in dynamic programming – Gordon - 1995
129 Bandit problems: Sequential Allocation of Experiments – Berry, Fristedt - 1985
108 Real-time learning and control using asynchronous dynamic programming – Barto, Bradtke, et al. - 1991
103 Efficient training of artificial neural networks for autonomous navigation. Neural Comp 3(1):88–97 – Pomerleau - 1991
88 A reinforcement learning approach to job-shop scheduling – Zhang, Dietterich - 1995
87 Scheduling and Rescheduling with Iterative Repair – Zweben, Daun, et al. - 1994
64 Coevolution of a backgammon player – Pollack, Blair, et al. - 1996
61 The convergence of TD(λ) for general λ – Dayan - 1992
58 A parallel network that learns to play backgammon – Tesauro, Sejnowski - 1989
52 Issues in using function approximation for reinforcement learning – Thrun, Schwartz - 1993
52 Two kinds of training information for evaluation function learning – Utgoff - 1991
50 Simulated Annealing for VLSI Design – Wong, Leong, et al. - 1988
43 A pattern classification approach to evaluation function learning – Lee, Mahajan - 1988
36 Discovering complex othello strategies through evolutionary neural networks – Moriarty, Miikkulainen - 1995
34 A generalized reinforcement-learning model: Convergence and applications – Littman, Szepesvári - 1996
31 Modular neural networks for learning context-dependent game strategies – Boyan - 1992
30 Dynamic Programming: models and applications – Denardo - 1982
23 Memory-based stochastic optimization – Moore, Schneider - 1996
21 Learning evaluation functions for large acyclic domains – Boyan, Moore - 1996
19 Advantage updating applied to a differential game – Baird, Harmon, et al. - 1994
17 A counterexample for temporal differences learning – Bertsekas - 1995
16 Machine discovery of effective admissible heuristics – Prieditis - 1993
14 Reinforcement Learning for Job-Shop Scheduling – Zhang - 1996
11 Synthesis of High-Performance Analog Cells in ASTRX/OBLX – Ochotta - 1994
11 A simulated annealing-based approach to three-dimensional component packing – Szykman, Cagan - 1995
9 Implementation details of the TD(λ) procedure for the case of vector predictions and backpropagation – Sutton - 1989
7 An Introduction to Artificial Intelligence: Can Computers Think? – Bellman - 1978
7 An efficient lower bound algorithm for channel routing. Integration: The VLSI Journal – Chao, Harper - 1996
7 Automatic Device Placement for Analog Cells in KOAN – Cohn - 1992
4 A computational solution of the inverse problem in radiation-therapy treatment planning – Censor, Altschuler, et al.
3 Genetic algorithms optimizing evaluation functions – Tunstall-Pedoe - 1991