Results 1–10 of 11
Bandit Based Monte-Carlo Planning
In: ECML’06. Number 4212 in LNCS, 2006
Cited by 217 (6 self)
Abstract
Abstract. For large state-space Markovian Decision Problems, Monte-Carlo planning is one of the few viable approaches to find near-optimal solutions. In this paper we introduce a new algorithm, UCT, that applies bandit ideas to guide Monte-Carlo planning. In finite-horizon or discounted MDPs the algorithm is shown to be consistent and finite sample bounds are derived on the estimation error due to sampling. Experimental results show that in several domains, UCT is significantly more efficient than its alternatives.
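The bandit idea the abstract refers to can be illustrated with UCB1-style action selection, the rule UCT applies at each node of its search tree. A minimal Python sketch, assuming a hypothetical child representation with visit count n and total reward w (the dictionary layout is illustrative, not UCT's actual data structure):

```python
import math

def ucb1_select(node_children, exploration_c=1.4):
    """Pick the child maximizing a UCB1 score: mean reward plus an
    exploration bonus that shrinks as a child is visited more often."""
    total_visits = sum(c["n"] for c in node_children)

    def score(c):
        if c["n"] == 0:
            return float("inf")  # unvisited actions are tried first
        return c["w"] / c["n"] + exploration_c * math.sqrt(
            math.log(total_visits) / c["n"])

    return max(node_children, key=score)
```

The exploration constant trades off between replaying the best-looking action and sampling under-visited ones; UCT's finite-sample guarantees hinge on choosing it appropriately.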
Bayesian sparse sampling for online reward optimization
In: ICML ’05: Proceedings of the 22nd International Conference on Machine Learning, 2005
Cited by 36 (5 self)
Abstract
We present an efficient “sparse sampling” technique for approximating Bayes-optimal decision making in reinforcement learning, addressing the well-known exploration-versus-exploitation tradeoff. Our approach combines sparse sampling with Bayesian exploration to achieve improved decision making while controlling computational cost. The idea is to grow a sparse lookahead tree, intelligently, by exploiting information in a Bayesian posterior, rather than enumerate action branches (standard sparse sampling) or compensate myopically (value of perfect information). The outcome is a flexible, practical technique for improving action selection in simple reinforcement learning scenarios.
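The idea of exploiting a Bayesian posterior to decide where to look next can be shown in miniature with Thompson-style sampling over Bernoulli arms: draw one plausible reward rate from each posterior and act where the draw is highest. This one-step Python sketch stands in for the paper's full lookahead tree; the arm names and the Beta(1, 1) prior are illustrative assumptions, not the paper's setup:

```python
import random

def thompson_pick(arms):
    """Pick an arm by sampling from Beta posteriors over Bernoulli
    reward rates. `arms` maps an arm name to (successes, failures)
    observed so far; a uniform Beta(1, 1) prior is assumed."""
    draws = {name: random.betavariate(s + 1, f + 1)
             for name, (s, f) in arms.items()}
    return max(draws, key=draws.get)
```

An arm with a confident, high posterior wins most draws, while uncertain arms still get sampled occasionally, which is the same posterior-driven balance the Bayesian sparse-sampling tree uses to decide which branches deserve expansion.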
Développement autonome des comportements de base d’un agent (Autonomous development of an agent’s basic behaviors)
In: Conférence d’apprentissage 2004 (CAp’04), 2004
Cited by 3 (1 self)
Abstract
ABSTRACT. The problem addressed in this article is that of automatically designing autonomous agents that must solve complex tasks involving several, possibly competing, objectives. We propose a modular approach based on the principles of action selection, where the actions recommended by several basic behaviors are combined into a global decision. In this framework, our main contribution is a method enabling an agent to automatically define and build the basic behaviors it needs through incremental reinforcement learning methods. In this way, we obtain a highly autonomous architecture requiring very little hand-coding. The approach is tested and discussed on a representative problem taken from the “tileworld”. KEYWORDS: Markov decision problems, reinforcement learning, multiple motivations.
Stochastic Shortest Path Problems, 2004
Abstract
Stochastic Shortest Path problems (SSPs), a subclass of Markov Decision Problems (MDPs), can be efficiently dealt with using Real-Time Dynamic Programming (RTDP). Yet, MDP models are often uncertain (obtained through statistics or guessing). The usual approach is robust planning: searching for the best policy under the worst model. This paper shows how RTDP can be made robust in the common case where transition probabilities are known to lie in a given interval.
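Robust planning of the kind described, the best policy under the worst model, needs a worst-case expectation over transition probabilities constrained to intervals. One standard way to compute it, sketched here in Python as a hypothetical helper rather than the paper's exact routine, greedily pushes the free probability mass toward the most expensive successors:

```python
def worst_case_expectation(values, lows, highs):
    """Worst-model expected cost-to-go for one action: pick transition
    probabilities within [low, high] bounds (summing to 1) that
    maximize the expectation. Assumes sum(lows) <= 1 <= sum(highs)."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    probs = list(lows)           # every successor gets its lower bound
    remaining = 1.0 - sum(lows)  # mass still free to allocate
    for i in order:              # worst (highest-value) successors first
        add = min(highs[i] - lows[i], remaining)
        probs[i] += add
        remaining -= add
    return sum(p * v for p, v in zip(probs, values))
```

A robust Bellman backup would apply this maximization inside the usual minimization over actions, which is the structure an interval-robust RTDP exploits.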
Received (Day Month Year) Revised (Day Month Year)
Abstract
Stochastic Shortest Path problems (SSPs) can be efficiently dealt with by the Real-Time Dynamic Programming algorithm (RTDP). Yet, RTDP requires that a goal state is always reachable. This article presents an algorithm checking for goal reachability, especially in the complex case of an uncertain SSP where only a possible interval is known for each transition probability. This gives an analysis method for determining if SSP algorithms such as RTDP are applicable, even if the exact model is not known. As this is a time-consuming algorithm, we also present a simple process that often speeds it up dramatically. Yet, the main improvement still needed is to turn to a symbolic analysis in order to avoid a complete state-space enumeration.
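The reachability question can be phrased on a "possible transition" graph: under an uncertain SSP, an edge s → s' counts as possible iff the upper bound of its probability interval is positive, and one checks that the goal is backward-reachable from every state. A plain enumeration-based Python sketch of that check (the brute-force baseline a symbolic analysis would replace; all names are illustrative):

```python
from collections import deque

def goal_always_reachable(states, goal, possible_succ):
    """Return True iff `goal` is reachable from every state.
    `possible_succ` maps a state to the successors whose transition
    probability has a positive upper bound."""
    # invert edges once so we can walk backwards from the goal
    preds = {s: set() for s in states}
    for s, succs in possible_succ.items():
        for t in succs:
            preds[t].add(s)
    reach = {goal}
    frontier = deque([goal])
    while frontier:
        t = frontier.popleft()
        for s in preds[t]:
            if s not in reach:
                reach.add(s)
                frontier.append(s)
    return reach == set(states)
```

This visits every state and edge once, which is exactly the complete state-space enumeration the abstract identifies as the scalability bottleneck.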
Stochastic Shortest Path Problems, 2004
Abstract
Stochastic Shortest Path problems (SSPs) can be efficiently dealt with by the Real-Time Dynamic Programming algorithm (RTDP). Yet, RTDP requires that a goal state is always reachable. This paper presents an algorithm checking for goal reachability, especially in the complex case of an uncertain SSP where only a possible interval is known for each transition probability. This gives an analysis method for determining if SSP algorithms such as RTDP are applicable, even if the exact model is not known. We aim at a symbolic analysis in order to avoid a complete state-space enumeration.
Conservation decision-making in large state spaces
Abstract
Abstract: For metapopulation management problems with small state spaces, it is typically possible to model the problem as a Markov decision process (MDP) and find an optimal control policy using stochastic dynamic programming (SDP). SDP is an iterative procedure that seeks to optimise a value function at each time step by trying each of the actions defined in the MDP. Although SDP gives the optimal solution to conservation management questions in a stochastic world, its applicability has always been limited by the so-called curse of dimensionality. The curse of dimensionality is the problem that adding new state variables inevitably results in much larger (often exponential) increases in the size of the state space, which can make solving superficially small problems impossible. A large state space makes optimal SDP solutions expensive to compute, because optimal SDP techniques require the value function to be updated over the entire state space at every time step. The high computational requirements of large SDP problems mean that only simple population management problems can be analysed. In this paper we present an application of the online sparse sampling algorithm proposed by Kearns, Mansour & Ng (2002), which can be used to approximate the optimal solution of an MDP for a given starting state. The algorithm is particularly attractive for problems with large state spaces as it has a running time that is independent of the size of the state space of the problem.
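The sparse sampling algorithm of Kearns, Mansour & Ng works from a generative model rather than explicit transition matrices, which is why its cost depends on the sampling width and depth but not on the number of states. A minimal Python sketch, assuming a hypothetical simulate(state, action) -> (next_state, reward) interface for the generative model:

```python
def sparse_sample_value(state, depth, width, actions, simulate, gamma=0.95):
    """Estimate the value of `state` by the sparse-sampling recursion:
    for each action, draw `width` next states from the generative model
    and recurse to `depth`. Tree size is (width * len(actions)) ** depth,
    independent of the size of the state space."""
    if depth == 0:
        return 0.0
    best = float("-inf")
    for a in actions:
        total = 0.0
        for _ in range(width):
            nxt, reward = simulate(state, a)
            total += reward + gamma * sparse_sample_value(
                nxt, depth - 1, width, actions, simulate, gamma)
        best = max(best, total / width)  # Monte-Carlo estimate of Q(state, a)
    return best
```

In the conservation setting described above, `simulate` would be a stochastic metapopulation model; planning from the current population state then costs the same regardless of how many state variables the full model has.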
Applying DTGolog to Large-scale Domains
Abstract
© Huy Pham 2006. I hereby declare that I am the sole author of this thesis. I authorize Ryerson University to lend this thesis to other institutions or individuals for the purpose of scholarly research.