Bandit based Monte-Carlo Planning
In: ECML-06. Number 4212 in LNCS, 2006
Cited by 402 (5 self)
Abstract. For large state-space Markovian Decision Problems, Monte-Carlo planning is one of the few viable approaches to find near-optimal solutions. In this paper we introduce a new algorithm, UCT, that applies bandit ideas to guide Monte-Carlo planning. In finite-horizon or discounted MDPs the algorithm is shown to be consistent and finite sample bounds are derived on the estimation error due to sampling. Experimental results show that in several domains, UCT is significantly more efficient than its alternatives.
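The "bandit ideas" mentioned in this abstract can be illustrated with the UCB1 rule, which scores each action by its empirical mean reward plus an exploration bonus that shrinks as the action is sampled more often. The following is a minimal sketch of that selection step at a single tree node, not the paper's implementation; the function name `ucb1_select`, its arguments, and the default exploration constant `c` are assumptions made for illustration.

```python
import math


def ucb1_select(means, counts, c=math.sqrt(2)):
    """Pick an action index by the UCB1 rule (illustrative sketch).

    means[a]  -- empirical mean reward of action a (hypothetical input)
    counts[a] -- number of times action a has been sampled so far
    c         -- exploration constant (assumed default; tuned in practice)
    """
    # Sample every untried action once before comparing bounds.
    for a, n in enumerate(counts):
        if n == 0:
            return a

    total = sum(counts)
    # Upper confidence bound: mean reward plus exploration bonus.
    scores = [
        means[a] + c * math.sqrt(math.log(total) / counts[a])
        for a in range(len(counts))
    ]
    return max(range(len(scores)), key=scores.__getitem__)
```

For example, `ucb1_select([0.6, 0.5], [100, 1])` returns the rarely tried second action, because its exploration bonus outweighs its slightly lower mean.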
Bayesian sparse sampling for online reward optimization
In ICML ’05: Proceedings of the 22nd International Conference on Machine Learning, 2005
Cited by 51 (5 self)
We present an efficient “sparse sampling” technique for approximating Bayes optimal decision making in reinforcement learning, addressing the well-known exploration versus exploitation tradeoff. Our approach combines sparse sampling with Bayesian exploration to achieve improved decision making while controlling computational cost. The idea is to grow a sparse lookahead tree, intelligently, by exploiting information in a Bayesian posterior, rather than enumerate action branches (standard sparse sampling) or compensate myopically (value of perfect information). The outcome is a flexible, practical technique for improving action selection in simple reinforcement learning scenarios.
Simple regret optimization in online planning for Markov decision processes
CoRR, 2012
Cited by 3 (0 self)
We consider online planning in Markov decision processes (MDPs). In online planning, the agent focuses on its current state only, deliberates about the set of possible policies from that state onwards and, when interrupted, uses the outcome of that exploratory deliberation to choose what action to perform next. Formally, the performance of algorithms for online planning is assessed in terms of simple regret, the agent’s expected performance loss when the chosen action, rather than an optimal one, is followed. To date, state-of-the-art algorithms for online planning in general MDPs are either best effort, or guarantee only polynomial-rate reduction of simple regret over time. Here we introduce a new Monte-Carlo tree search algorithm, BRUE, that guarantees exponential-rate and smooth reduction of simple regret. At a high level, BRUE is based on a simple yet non-standard state-space sampling scheme, MCTS2e, in which different parts of each sample are dedicated to different exploratory objectives. We further extend BRUE with a variant of “learning by forgetting.” The resulting parametrized algorithm, BRUE(α), exhibits even more attractive formal guarantees than BRUE. Our empirical evaluation shows that both BRUE and its generalization, BRUE(α), are also very effective in practice and compare favorably to the state-of-the-art.
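The verbal definition of simple regret in this abstract can be written compactly as follows. The notation is an assumption made here for illustration and is not taken from the paper: $s$ is the current state, $Q^*(s,a)$ is the optimal value of taking action $a$ in $s$ and acting optimally afterwards, and $a_n$ is the action recommended after $n$ samples.

```latex
\[
  \Delta_n(s) \;=\; \mathbb{E}\!\left[\, Q^*(s, a^*) - Q^*(s, a_n) \,\right],
  \qquad a^* \in \arg\max_{a} Q^*(s, a)
\]
```

Under this reading, the expectation is over the randomness of the sampling procedure, and a rate guarantee describes how quickly $\Delta_n(s)$ shrinks as $n$ grows.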
Développement autonome des comportements de base d’un agent
In: Conférence d’apprentissage 2004 (CAp’04), 2004
Cited by 3 (1 self)
ABSTRACT. The problem addressed in this article is that of automatically designing autonomous agents that must solve complex tasks involving several, possibly competing, objectives. We propose a modular approach based on the principles of action selection, where the actions recommended by several basic behaviors are combined into a global decision. In this framework, our main contribution is a method that enables an agent to automatically define and build the basic behaviors it needs through incremental reinforcement learning methods. This way, we obtain a highly autonomous architecture requiring very little hand-coding. This approach is tested and discussed on a representative problem taken from the “tileworld”. KEYWORDS: Markov decision problems, reinforcement learning, multiple motivations.
NEW REPRESENTATIONS AND APPROXIMATIONS FOR SEQUENTIAL DECISION MAKING UNDER UNCERTAINTY
2007
Cited by 2 (1 self)
This dissertation research addresses the challenge of scaling up algorithms for sequential decision making under uncertainty. In my dissertation, I developed new approximation strategies for planning and learning in the presence of uncertainty while maintaining useful theoretical properties that allow larger problems to be tackled than is practical with exact methods. In particular, my research tackles three outstanding issues in sequential decision making in uncertain environments: performing stable generalization during off-policy updates, balancing exploration with exploitation, and handling partial observability of the environment. The first key contribution of my thesis is the development of novel dual representations and algorithms for planning and learning in stochastic environments. This dual view I have developed offers a coherent and comprehensive approach to optimal sequential decision making problems, provides an alternative to standard value function based techniques, and opens new avenues for solving sequential decision making problems. In particular, I have shown that dual dynamic program
Applying DTGolog to Large-scale Domains
Cited by 1 (0 self)
© Huy Pham 2006. I hereby declare that I am the sole author of this thesis. I authorize Ryerson University to lend this thesis to other institutions or individuals for the purpose of scholarly research.
Monte-Carlo Planning for Pathfinding in Real-Time Strategy Games
Cited by 1 (0 self)
In this work, we explore two Monte-Carlo planning approaches: Upper Confidence Tree (UCT) and Rapidly-exploring Random Tree (RRT). These Monte-Carlo planning approaches are applied in a real-time strategy game for solving the pathfinding problem. The planners are evaluated using a grid-based representation of our game world. The results show that the UCT planner solves the path planning problem with significantly less search effort than the RRT planner. The game playing performance of each planner is evaluated using the mean, maximum and minimum scores in the test games. With respect to the mean scores, the RRT planner shows better performance than the UCT planner. The RRT planner achieves more maximum scores than the UCT planner in the test games.
Decision making under uncertainty: a quasimetric approach
We propose a new approach for solving a class of discrete decision making problems under uncertainty with positive cost. This issue concerns multiple and diverse fields such as engineering, economics, artificial intelligence, cognitive science and many others. Basically, an agent has to choose a single action or a series of actions from a set of options, without knowing for sure their consequences. Schematically, two main approaches have been followed: either the agent learns which option is the correct one to choose in a given situation by trial and error, or the agent already has some knowledge of the possible consequences of its decisions, this knowledge being generally expressed as a conditional probability distribution. In the latter case, several optimal or suboptimal methods have been proposed to exploit this uncertain knowledge in various contexts. In this work, we propose a different approach, based on the geometric intuition of distance. More precisely, we define a goal-independent quasimetric structure on the state space, taking into account both cost function and transition probability. We then compare precision and computation time with classical approaches.
2.1 Stochastic Shortest Path Problems
2004
Stochastic Shortest Path problems (SSPs), a subclass of Markov Decision Problems (MDPs), can be efficiently dealt with using Real-Time Dynamic Programming (RTDP). Yet, MDP models are often uncertain (obtained through statistics or guessing). The usual approach is robust planning: searching for the best policy under the worst model. This paper shows how RTDP can be made robust in the common case where transition probabilities are known to lie in a given interval.
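To make "the worst model" concrete when each transition probability is only known to lie in an interval, one common construction pushes as much probability mass as the intervals allow onto the costliest successors. The sketch below shows that greedy step for a single state-action pair; it is an illustration under stated assumptions, not necessarily the procedure used in the paper, and the names `worst_case_distribution`, `lo`, `hi`, and `cost` are hypothetical.

```python
def worst_case_distribution(lo, hi, cost):
    """Return the distribution within [lo, hi] that maximizes expected cost.

    lo[i], hi[i] -- assumed lower/upper bounds on the probability of successor i
    cost[i]      -- assumed cost-to-go estimate of successor i
    """
    if sum(lo) > 1.0 + 1e-12 or sum(hi) < 1.0 - 1e-12:
        raise ValueError("intervals admit no valid probability distribution")

    # Start from the lower bounds, then hand out the remaining mass.
    p = list(lo)
    slack = 1.0 - sum(lo)

    # Visit successors from most to least costly, filling each up to its bound.
    for i in sorted(range(len(cost)), key=lambda j: cost[j], reverse=True):
        extra = min(hi[i] - lo[i], slack)
        p[i] += extra
        slack -= extra
    return p
```

A robust backup would then use the expected cost under this distribution in place of the expectation under a single fixed model.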