Evaluation functions are an essential component of practical search algorithms for optimization, planning, and control. Examples of such algorithms include hill climbing, simulated annealing, best-first search, A*, and alpha-beta. In all of these, the evaluation functions are typically built manually by domain experts and may require considerable tweaking to work well. I will investigate the thesis that statistical machine learning can be used to automatically generate high-quality evaluation functions for practical combinatorial problems. The data for such learning are gathered by running trajectories through the search space. The learned evaluation function may be applied either to guide further exploration of the same space or to improve performance in new problem spaces that share similar features. Two general families of learning algorithms apply here: reinforcement learning and meta-optimization. The reinforcement learning approach, dating back to Samuel's checkers player [ 1959 ...
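The idea of gathering training data from search trajectories and then using the learned evaluation function to guide further search can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the thesis: the toy objective over the integers 0..99, the greedy hill climber, the choice to label every visited state with the objective value reached at the end of its trajectory, and the very simple statistical learner (a per-bucket mean).

```python
from collections import defaultdict


def objective(x):
    # Hypothetical toy cost: a bowl centered at 70 plus ripples that
    # create a local minimum at every multiple of 7.
    return (x - 70) ** 2 / 50.0 + 5.0 * (x % 7)


def hill_climb(start, cost, lo=0, hi=99):
    """Greedy descent over integer states; returns the visited states."""
    path = [start]
    x = start
    while True:
        neighbors = [n for n in (x - 1, x + 1) if lo <= n <= hi]
        best = min(neighbors, key=cost)
        if cost(best) >= cost(x):
            return path  # no improving neighbor: local minimum reached
        x = best
        path.append(x)


def collect_training_data(starts):
    """Label each visited state with the objective at its trajectory's end."""
    data = []
    for s in starts:
        path = hill_climb(s, objective)
        outcome = objective(path[-1])
        data.extend((x, outcome) for x in path)
    return data


def fit_bucket_means(data, width=10):
    """Assumed learner: mean observed outcome per bucket of `width` states."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for x, outcome in data:
        totals[x // width] += outcome
        counts[x // width] += 1
    means = {b: totals[b] / counts[b] for b in counts}

    def learned_eval(x):
        # Buckets never visited during training are treated as unpromising.
        return means.get(x // width, float("inf"))

    return learned_eval


# Gather data from a handful of evenly spaced starting states.
data = collect_training_data(range(3, 100, 8))
learned_eval = fit_bucket_means(data)

# Use the learned function to pick a promising start, then search again.
best_start = min(range(100), key=learned_eval)
final_path = hill_climb(best_start, objective)
print(best_start, objective(final_path[-1]))
```

In this sketch the learned function is only used to choose a restart point in the same space. A realistic learner would regress on problem features rather than raw state indices, which is also what would allow it to carry over to new problem spaces with similar features.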
Cormen, Leiserson, et al. Introduction to Algorithms. 2001.
Bellman. Dynamic Programming. 1957.
Flannery. Numerical Recipes in C: The Art of Scientific Computing. 1992.
Watkins. Learning from delayed rewards. 1989.
Sutton. Learning to predict by the methods of temporal differences. 1988.
Fahlman, Lebiere. The cascade-correlation learning architecture. 1991.
Samuel. Some studies in machine learning using the game of checkers II: Recent progress. 1967.
Sutton. Integrated architectures for learning, planning, and reacting based on approximating dynamic programming. 1990.
Howard. Dynamic Programming and Markov Processes. 1960.
Tesauro. Practical issues in temporal difference learning. 1992.
van Laarhoven, Aarts. Simulated Annealing: Theory and Applications. 1988.
Kaelbling. Learning in embedded systems. 1993.
Lin, Kernighan. An effective heuristic algorithm for the traveling salesman problem. 1973.
Crites, Barto. Improving elevator performance using reinforcement learning. 1996.
Boyan, Moore. Generalization in reinforcement learning: Safely approximating the value function. 1995.
Barto, Sutton, et al. Learning and sequential decision making. 1990.
Lin. Reinforcement Learning for Robots Using Neural Networks. 1993.
Tesauro. TD-Gammon, a self-teaching backgammon program, achieves master-level play. 1994.
Tsitsiklis, Van Roy. An analysis of temporal-difference learning with function approximation. 1997.
Gordon. Stable function approximation in dynamic programming. 1995.
Berry, Fristedt. Bandit Problems: Sequential Allocation of Experiments. 1985.
Barto, Bradtke, et al. Real-time learning and control using asynchronous dynamic programming. 1991.
Pomerleau. Efficient training of artificial neural networks for autonomous navigation. Neural Computation 3(1):88–97. 1991.
Zhang, Dietterich. A reinforcement learning approach to job-shop scheduling. 1995.
Zweben, Daun, et al. Scheduling and Rescheduling with Iterative Repair. 1994.
Pollack, Blair, et al. Coevolution of a backgammon player. 1996.
Dayan. The convergence of TD(λ) for general λ. 1992.
Tesauro, Sejnowski. A parallel network that learns to play backgammon. 1989.
Thrun, Schwartz. Issues in using function approximation for reinforcement learning. 1993.
Utgoff. Two kinds of training information for evaluation function learning. 1991.
Wong, Leong, et al. Simulated Annealing for VLSI Design. 1988.
Lee, Mahajan. A pattern classification approach to evaluation function learning. 1988.
Moriarty, Miikkulainen. Discovering complex Othello strategies through evolutionary neural networks. 1995.
Littman, Szepesvári. A generalized reinforcement-learning model: Convergence and applications. 1996.
Boyan. Modular neural networks for learning context-dependent game strategies. 1992.
Denardo. Dynamic Programming: Models and Applications. 1982.
Moore, Schneider. Memory-based stochastic optimization. 1996.
Boyan, Moore. Learning evaluation functions for large acyclic domains. 1996.
Baird, Harmon, et al. Advantage updating applied to a differential game. 1994.
Bertsekas. A counterexample for temporal differences learning. 1995.
Prieditis. Machine discovery of effective admissible heuristics. 1993.
Zhang. Reinforcement Learning for Job-Shop Scheduling. 1996.
Ochotta. Synthesis of High-Performance Analog Cells in ASTRX/OBLX. 1994.
Szykman, Cagan. A simulated annealing-based approach to three-dimensional component packing. 1995.
Sutton. Implementation details of the TD(λ) procedure for the case of vector predictions and backpropagation. 1989.
Bellman. An Introduction to Artificial Intelligence: Can Computers Think? 1978.
Chao, Harper. An efficient lower bound algorithm for channel routing. Integration: The VLSI Journal. 1996.
Cohn. Automatic Device Placement for Analog Cells in KOAN. 1992.
Censor, Altschuler, et al. A computational solution of the inverse problem in radiation-therapy treatment planning.
Tunstall-Pedoe. Genetic algorithms optimizing evaluation functions. 1991.