## Efficient selectivity and backup operators in Monte-Carlo tree search (2006)

Venue: Proceedings Computers and Games 2006

Citations: 112 (2 self)

### BibTeX

@INPROCEEDINGS{Coulom06efficientselectivity,

author = {Rémi Coulom},

title = {Efficient selectivity and backup operators in Monte-Carlo tree search},

booktitle = {Proceedings Computers and Games 2006},

year = {2006},

publisher = {Springer-Verlag}

}

### Abstract

Monte-Carlo evaluation consists in estimating a position by averaging the outcome of several random continuations, and can serve as an evaluation function at the leaves of a min-max tree. This paper presents a new framework to combine tree search with Monte-Carlo evaluation, that does not separate between a min-max phase and a Monte-Carlo phase. Instead of backing-up the min-max value close to the root, and the average value at some depth, a more general backup operator is defined that progressively changes from averaging to min-max as the number of simulations grows. This approach provides a fine-grained control of the tree growth, at the level of individual simulations, and allows efficient selectivity methods. This algorithm was implemented in a 9 × 9 Go-playing program, Crazy Stone, that won the 10th KGS computer-Go tournament.
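The two ideas in the abstract can be illustrated with a small sketch. The first function averages the outcomes of random continuations, which is Monte-Carlo evaluation as described above. The second is only one plausible shape for a backup operator that moves from averaging toward the maximum as simulations accumulate: the weight `n / (n + k)`, the constant `k`, and the names `playout`, `monte_carlo_value`, and `blended_backup` are assumptions made for illustration, and this is not the operator used in Crazy Stone, which this record does not give.

```python
import random


def monte_carlo_value(playout, n_simulations, rng):
    """Estimate a position by averaging the outcomes of random continuations.

    `playout` is a hypothetical callable: it plays one random continuation
    from the position and returns its outcome (e.g. 1.0 for a win, 0.0 for a loss).
    """
    if n_simulations <= 0:
        raise ValueError("n_simulations must be positive")
    return sum(playout(rng) for _ in range(n_simulations)) / n_simulations


def blended_backup(child_means, child_counts, k=50.0):
    """Illustrative backup that slides from the average toward the max.

    Assumption for this sketch: the weight on the max is n / (n + k), where n
    is the total number of simulations below the node. Values are taken from
    the point of view of the player to move at this node.
    """
    visited = [(m, c) for m, c in zip(child_means, child_counts) if c > 0]
    n = sum(c for _, c in visited)
    if n == 0:
        raise ValueError("at least one child must have been simulated")
    mean_value = sum(m * c for m, c in visited) / n  # averaging backup
    max_value = max(m for m, _ in visited)           # min-max style backup
    w = n / (n + k)                                  # grows toward 1 with n
    return (1.0 - w) * mean_value + w * max_value


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy playout: wins with probability 0.7 (an arbitrary choice).
    estimate = monte_carlo_value(lambda r: 1.0 if r.random() < 0.7 else 0.0, 10_000, rng)
    print("Monte-Carlo estimate:", estimate)
    print("few simulations:", blended_backup([0.2, 0.8], [1, 1]))
    print("many simulations:", blended_backup([0.2, 0.8], [1000, 1000]))
```

With few simulations the backed-up value stays near the average of the children, and with many it approaches the value of the best child, which is the qualitative behaviour the abstract describes.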

### Citations

3763 | Reinforcement Learning: An Introduction
- Sutton, Barto
- 1998
Citation Context ... go to zero. Ideas for this kind of algorithm can be found in two fields of research: n-armed bandit problems, and discrete stochastic optimization. n-armed bandit techniques (Sutton and Barto’s book [25] provides an introduction) are the basis for the Monte-Carlo tree search algorithm of Chang, Fu and Marcus [12]. Optimal schemes for the allocation of simulations in discrete stochastic optimization [1...

1225 | Learning to predict by the method of temporal differences
- Sutton
- 1988
Citation Context ...so refute other moves of a node. Since formal methods seem difficult to apply, the backup operator of Crazy Stone was determined empirically, by an algorithm similar to the temporal difference method [24]. In the beginning, the backup method for internal nodes was the external-node method. 1,500 positions were sampled at random from self-play games. For each of these 1,500 positions, the...

261 | An analysis of alpha-beta pruning
- Knuth, Moore
- 1975
Citation Context ...1 Introduction When writing a program to play a two-person zero-sum game with perfect information, the traditional approach consists in combining alpha-beta search with a heuristic position evaluator [20]. The heuristic evaluator is based on domain-specific knowledge, and provides values at the leaves of the search tree. This technique has been very successful for games such as chess, draughts, checker...

170 | A sparse sampling algorithm for near-optimal planning in large Markov decision processes
- Kearns, Mansour, et al.
- 2002
Citation Context ...tion inaccuracies. Other algorithms with better asymptotic properties (given enough time and memory, they will find an optimal action) have been proposed in the formalism of Markov decision processes [12, 19, 22]. This paper presents a new algorithm for combining Monte-Carlo evaluation with tree search. Its basic structure is described in Section 2. Its selectivity and backup operators are presented in the fo...

78 | Computer go: an AI oriented survey
- Bouzy, Cazenave
Citation Context ...for many games, it has failed for the game of Go. Experienced human Go players still easily outplay the best programs. So, the game of Go remains an open challenge for artificial-intelligence research [8]. Among the main difficulties in writing a Go-playing program is the creation of an accurate static position evaluator [15, 8]. When played on a 9 × 9 grid, the complexity of the game of Go, in terms ...

74 | GIB: Steps toward an expert-level bridge-playing program
- Ginsberg
- 1999
Citation Context ...ns is Monte-Carlo evaluation. Monte-Carlo evaluation consists in averaging the outcome of several continuations. It is an usual technique in games with randomness or partial observability [5, 23, 26, 14, 17], but can also be applied to deterministic games, by choosing actions at random until a terminal state is reached [1, 9, 10]. The accuracy of Monte-Carlo evaluation can be improved with tree search. J...

65 | Searching for solutions in games and artificial intelligence (doctoral dissertation)
- Allis
- 1994
Citation Context ...ccurate static position evaluator [15, 8]. When played on a 9 × 9 grid, the complexity of the game of Go, in terms of the number of legal positions, is inferior to the complexity of the game of chess [2, 27], and the number of legal moves per position is similar. Nevertheless, chess-programming techniques fail to produce a player stronger than experienced humans. One reason is that tree search cannot be ...

56 | Simulation Budget Allocation for Further Enhancing the Efficiency of Ordinal Optimization
- Chen, W, et al.
Citation Context ...5] provides an introduction) are the basis for the Monte-Carlo tree search algorithm of Chang, Fu and Marcus [12]. Optimal schemes for the allocation of simulations in discrete stochastic optimization [13, 16, 3], could also be applied to Monte-Carlo tree search. Although they provide interesting sources of inspiration, the theoretical frameworks of n-armed bandit problems and discrete stochastic optimization...

43 | Searching with Probabilities
- Palay
- 1985
Citation Context ...he maximum evaluation overestimates the best move, and generates a lot of instability in the search. Other candidates for a backup method would be algorithms that operate on probability distributions [21, 4]. The weakness of these methods is that they have to assume some degree of independence between probability distributions. This assumption of independence is wrong in the case of Monte-Carlo evaluatio...

41 | Programming backgammon using self-teaching neural nets
- Tesauro
- 2002
Citation Context ...ns is Monte-Carlo evaluation. Monte-Carlo evaluation consists in averaging the outcome of several continuations. It is an usual technique in games with randomness or partial observability [5, 23, 26, 14, 17], but can also be applied to deterministic games, by choosing actions at random until a terminal state is reached [1, 9, 10]. The accuracy of Monte-Carlo evaluation can be improved with tree search. J...

39 | Expected-outcome: a general model of static evaluation
- Abramson
- 1990
Citation Context ... usual technique in games with randomness or partial observability [5, 23, 26, 14, 17], but can also be applied to deterministic games, by choosing actions at random until a terminal state is reached [1, 9, 10]. The accuracy of Monte-Carlo evaluation can be improved with tree search. Juillé [18] proposed a selective Monte-Carlo algorithm for single-agent deterministic problems, and applied it successfully t...

35 | A Bayesian approach to relevance in game playing
- Baum, Smith
- 1997
Citation Context ...he maximum evaluation overestimates the best move, and generates a lot of instability in the search. Other candidates for a backup method would be algorithms that operate on probability distributions [21, 4]. The weakness of these methods is that they have to assume some degree of independence between probability distributions. This assumption of independence is wrong in the case of Monte-Carlo evaluatio...

23 | An adaptive sampling algorithm for solving Markov decision processes. Operations Research 53:126–139
- Chang, Fu, et al.
- 2005
Citation Context ...tion inaccuracies. Other algorithms with better asymptotic properties (given enough time and memory, they will find an optimal action) have been proposed in the formalism of Markov decision processes [12, 19, 22]. This paper presents a new algorithm for combining Monte-Carlo evaluation with tree search. Its basic structure is described in Section 2. Its selectivity and backup operators are presented in the fo...

21 | Monte Carlo Go developments
- Bouzy, Helmstetter
- 2004
Citation Context ... usual technique in games with randomness or partial observability [5, 23, 26, 14, 17], but can also be applied to deterministic games, by choosing actions at random until a terminal state is reached [1, 9, 10]. The accuracy of Monte-Carlo evaluation can be improved with tree search. Juillé [18] proposed a selective Monte-Carlo algorithm for single-agent deterministic problems, and applied it successfully t...

19 | Methods for Statistical Inference: Extending the Evolutionary Computation Paradigm
- Juillé
- 1999
Citation Context ...can also be applied to deterministic games, by choosing actions at random until a terminal state is reached [1, 9, 10]. The accuracy of Monte-Carlo evaluation can be improved with tree search. Juillé [18] proposed a selective Monte-Carlo algorithm for single-agent deterministic problems, and applied it successfully to grammar induction, sorting-network optimization and a solitaire game. Bouzy [6] also...

17 | Evaluation in Go by a Neural Network Using Soft Segmentation
- Enzenberger
- 2003
Citation Context ... the game of Go remains an open challenge for artificial-intelligence research [8]. Among the main difficulties in writing a Go-playing program is the creation of an accurate static position evaluator [15, 8]. When played on a 9 × 9 grid, the complexity of the game of Go, in terms of the number of legal positions, is inferior to the complexity of the game of chess [2, 27], and the number of legal moves pe...

14 | Using selective-sampling simulations in poker
- Billings, Papp, et al.
- 1999
Citation Context ...ns is Monte-Carlo evaluation. Monte-Carlo evaluation consists in averaging the outcome of several continuations. It is an usual technique in games with randomness or partial observability [5, 23, 26, 14, 17], but can also be applied to deterministic games, by choosing actions at random until a terminal state is reached [1, 9, 10]. The accuracy of Monte-Carlo evaluation can be improved with tree search. J...

10 | Associating shallow and selective global tree search with Monte Carlo for 9x9 Go
- Bouzy
- 2004
Citation Context ...llé [18] proposed a selective Monte-Carlo algorithm for single-agent deterministic problems, and applied it successfully to grammar induction, sorting-network optimization and a solitaire game. Bouzy [6] also applied a similar method to 9 × 9 Go. The algorithms of Juillé and Bouzy grow a tree by iterative deepening, and prune it by keeping only the best-looking moves after each iteration. A problem wi...

9 | Efficient Control for Selective Simulations
- Sheppard

7 | Combining tactical search and Monte-Carlo in the game of Go
- Cazenave, Helmstetter
- 2005
Citation Context ...er boards. For 19x19, an approach based on a global tree search does not seem reasonable. Generalizing the tree search with high-level tactical objectives such as Cazenave and Helmstetter’s algorithm [11] might be an interesting solution. Acknowledgements I thank Bruno Bouzy and Guillaume Chaslot, for introducing me to Monte-Carlo Go. A lot of the inspiration for the research presented in this paper c...

5 | On-line search for solving large Markov decision processes
- Péret, Garcia
- 2004
Citation Context ...tion inaccuracies. Other algorithms with better asymptotic properties (given enough time and memory, they will find an optimal action) have been proposed in the formalism of Markov decision processes [12, 19, 22]. This paper presents a new algorithm for combining Monte-Carlo evaluation with tree search. Its basic structure is described in Section 2. Its selectivity and backup operators are presented in the fo...

4 | Move Pruning Techniques for Monte-Carlo Go
- Bouzy
- 2005
Citation Context ...compare the expected values of many random variables, this theorem allows to compute a probability that the expected value of one variable is larger than the expected value of another variable. Bouzy [9, 7] used this principle to propose progressive pruning. Progressive pruning cuts off moves whose probability of being best according to the distribution of the central-limit theorem falls below some thre...

2 | Monte-Carlo planning in RTS games
- Chung, Buro, et al.
- 2005

2 | Optimal allocation of simulation experiments in discrete stochastic optimization and approximative algorithms
- Futschik, Pflug
- 1997
Citation Context ...5] provides an introduction) are the basis for the Monte-Carlo tree search algorithm of Chang, Fu and Marcus [12]. Optimal schemes for the allocation of simulations in discrete stochastic optimization [13, 16, 3], could also be applied to Monte-Carlo tree search. Although they provide interesting sources of inspiration, the theoretical frameworks of n-armed bandit problems and discrete stochastic optimization...

2 | Combinatorics of Go
- Tromp, Farnebäck
- 2006
Citation Context ...ccurate static position evaluator [15, 8]. When played on a 9 × 9 grid, the complexity of the game of Go, in terms of the number of legal positions, is inferior to the complexity of the game of chess [2, 27], and the number of legal moves per position is similar. Nevertheless, chess-programming techniques fail to produce a player stronger than experienced humans. One reason is that tree search cannot be ...

1 | A simulated annealing algorithm with constant temperature for discrete stochastic optimization
- Alrefaei, Andradóttir
- 1999
Citation Context ...5] provides an introduction) are the basis for the Monte-Carlo tree search algorithm of Chang, Fu and Marcus [12]. Optimal schemes for the allocation of simulations in discrete stochastic optimization [13, 16, 3], could also be applied to Monte-Carlo tree search. Although they provide interesting sources of inspiration, the theoretical frameworks of n-armed bandit problems and discrete stochastic optimization...

1 | Computer Go tournaments on KGS. http://www.weddslist.com/kgs/, 2005
- Wedd
Citation Context ...ssible to find better methods. 5 Game Results As indicated in the abstract, Crazy Stone won the 10th KGS computer-Go tournament, ahead of 8 participants, including GNU Go, Neuro Go, Viking 5, and Aya [28]. This is a spectacular result, but this was only a 6-round tournament, and luck was probably one of the main factor in this victory. In order to test the strength of Crazy Stone more accurately, 100-...