Results 1 - 4 of 4
Bandit based Monte-Carlo Planning
- In: ECML-06. Number 4212 in LNCS, 2006
Cited by 111 (4 self)
Abstract. For large state-space Markovian Decision Problems, Monte-Carlo planning is one of the few viable approaches to finding near-optimal solutions. In this paper we introduce a new algorithm, UCT, that applies bandit ideas to guide Monte-Carlo planning. In finite-horizon or discounted MDPs the algorithm is shown to be consistent, and finite-sample bounds are derived on the estimation error due to sampling. Experimental results show that in several domains, UCT is significantly more efficient than its alternatives.
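To illustrate the "bandit ideas" mentioned in the abstract, the sketch below shows UCB1-style action selection at a single decision node, the kind of rule UCT applies at the nodes of its search tree. The function name `ucb1_select`, the exploration constant `c`, and the example statistics are assumptions made for illustration; they are not taken from the paper.

```python
import math

def ucb1_select(counts, value_sums, c=math.sqrt(2)):
    """Pick the action index with the highest mean-plus-bonus score.

    counts[a]     -- how many times action a has been tried (hypothetical stats)
    value_sums[a] -- sum of returns observed after taking action a
    c             -- exploration constant (an assumed default, tune per domain)
    """
    # Try every action once before the formula is applied.
    for action, n in enumerate(counts):
        if n == 0:
            return action

    total = sum(counts)

    def score(action):
        mean = value_sums[action] / counts[action]
        # Bonus shrinks as an action is sampled more often.
        bonus = c * math.sqrt(math.log(total) / counts[action])
        return mean + bonus

    return max(range(len(counts)), key=score)


if __name__ == "__main__":
    # Action 1 has a higher mean and the same visit count, so it is chosen.
    print(ucb1_select([10, 10], [5.0, 8.0]))
```

The bonus term is what lets a rarely tried action be revisited even when its current mean is lower, which is the exploration and exploitation trade-off the bandit view provides.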
Fast Reachability Analysis for Uncertain SSPs
- 2005
Cited by 2 (1 self)
Stochastic Shortest Path problems (SSPs) can be solved efficiently by the Real-Time Dynamic Programming algorithm (RTDP). Yet RTDP requires that a goal state is always reachable, which can be checked easily for a certain SSP, and with a more complex algorithm for an uncertain SSP, i.e. one where only a possible interval is known for each transition probability. This paper gives a simplified description of these two processes and demonstrates how the time-consuming uncertain analysis can be dramatically sped up. The main improvement still needed is to turn to a symbolic analysis in order to avoid a complete state-space enumeration.
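For the simpler case of a certain SSP, one plausible reading of the reachability check is a backward graph search from the goal states over transitions with positive probability. The sketch below follows that reading; it is not the paper's algorithm, and the function name `states_reaching_goal` and the nested-dictionary model format are assumptions made for illustration. It enumerates the whole state space, which is the cost the abstract says a symbolic analysis would avoid.

```python
from collections import deque

def states_reaching_goal(transitions, goals):
    """Return the set of states from which some goal state can be reached.

    transitions -- assumed format: {state: {action: {next_state: probability}}}
    goals       -- iterable of goal states
    """
    # Build a predecessor map, keeping only positive-probability transitions.
    predecessors = {}
    for state, actions in transitions.items():
        for outcomes in actions.values():
            for next_state, prob in outcomes.items():
                if prob > 0:
                    predecessors.setdefault(next_state, set()).add(state)

    # Breadth-first search backwards from the goals.
    reached = set(goals)
    queue = deque(reached)
    while queue:
        target = queue.popleft()
        for state in predecessors.get(target, ()):
            if state not in reached:
                reached.add(state)
                queue.append(state)
    return reached


if __name__ == "__main__":
    model = {
        "a": {"go": {"b": 0.5, "a": 0.5}},
        "b": {"go": {"g": 1.0}},
        "trap": {"stay": {"trap": 1.0}},
        "g": {},
    }
    ok = states_reaching_goal(model, ["g"])
    # States missing from `ok` can never reach the goal.
    print(sorted(set(model) - ok))
```

Any state absent from the returned set is one from which no goal can be reached, so the reachability requirement fails there.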
Received (Day Month Year) Revised (Day Month Year)
Stochastic Shortest Path problems (SSPs) can be solved efficiently by the Real-Time Dynamic Programming algorithm (RTDP). Yet RTDP requires that a goal state is always reachable. This article presents an algorithm that checks goal reachability, especially in the complex case of an uncertain SSP where only a possible interval is known for each transition probability. This gives an analysis method for determining whether SSP algorithms such as RTDP are applicable, even if the exact model is not known. As this algorithm is time-consuming, we also present a simple process that often speeds it up dramatically. Yet the main improvement still needed is to turn to a symbolic analysis in order to avoid a complete state-space enumeration.
Fast Reachability Analysis for Uncertain SSPs
- Olivier Buffet
Stochastic Shortest Path problems (SSPs) can be solved efficiently by the Real-Time Dynamic Programming algorithm (RTDP). Yet RTDP requires that a goal state is always reachable, which can be checked easily for a certain SSP, and with a more complex algorithm for an uncertain SSP, i.e. one where only a possible interval is known for each transition probability. This paper gives a simplified description of these two processes and demonstrates how the time-consuming uncertain analysis can be dramatically sped up. The main improvement still needed is to turn to a symbolic analysis in order to avoid a complete state-space enumeration.

