Results 1 - 10 of 167
Optimal and approximate Q-value functions for decentralized POMDPs
 J. Artificial Intelligence Research
Cited by 62 (26 self)
Decision-theoretic planning is a popular approach to sequential decision making problems, because it treats uncertainty in sensing and acting in a principled way. In single-agent frameworks like MDPs and POMDPs, planning can be carried out by resorting to Q-value functions: an optimal Q-value function Q* is computed in a recursive manner by dynamic programming, and then an optimal policy is extracted from Q*. In this paper we study whether similar Q-value functions can be defined for decentralized POMDP models (Dec-POMDPs), and how policies can be extracted from such value functions. We define two forms of the optimal Q-value function for Dec-POMDPs: one that gives a normative description as the Q-value function of an optimal pure joint policy and another one that is sequentially rational and thus gives a recipe for computation. This computation, however, is infeasible for all but the smallest problems. Therefore, we analyze various approximate Q-value functions that allow for efficient computation. We describe how they relate, and we prove that they all provide an upper bound to the optimal Q-value function Q*. Finally, unifying some previous approaches for solving Dec-POMDPs, we describe a family of algorithms for extracting policies from such Q-value functions, and perform an experimental evaluation on existing test problems, including a new firefighting benchmark problem.
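The single-agent recursion the abstract refers to is the standard dynamic-programming fixed point; for a discounted MDP it can be sketched as follows (the notation is the usual one, not quoted from the paper):

```latex
Q^{*}(s,a) \;=\; R(s,a) \;+\; \gamma \sum_{s'} P(s' \mid s,a)\, \max_{a'} Q^{*}(s',a'),
\qquad
\pi^{*}(s) \;\in\; \arg\max_{a} Q^{*}(s,a).
```

Extracting the policy from Q* is what becomes nontrivial in the decentralized setting, where agents cannot condition their actions on a shared state.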
Learning Tetris Using the Noisy Cross-Entropy Method
 Neural Computation
Cited by 55 (2 self)
The cross-entropy method is an efficient and general optimization algorithm. However, its applicability in reinforcement learning seems to be limited because, although it is fast, it often converges to suboptimal policies. A standard technique for preventing early convergence is to introduce noise. We apply the noisy cross-entropy method to the game of Tetris to demonstrate its efficiency. The resulting policy outperforms previous RL algorithms by almost two orders of magnitude, and reaches over 300,000 points on average. Key words: Tetris, cross-entropy method, reinforcement learning
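As a concrete illustration of the noise idea, here is a minimal sketch of the noisy cross-entropy method on a toy one-dimensional maximization problem; the objective, population sizes, and linearly decaying noise schedule are illustrative assumptions, not the paper's Tetris setup:

```python
import random

def f(x):
    # toy objective to maximize; the optimum is at x = 3
    return -(x - 3.0) ** 2

def noisy_cross_entropy(n_iter=50, pop=100, elite_frac=0.1, noise=1.0, seed=0):
    rng = random.Random(seed)
    mu, sigma = 0.0, 10.0
    n_elite = int(pop * elite_frac)
    for t in range(n_iter):
        samples = [rng.gauss(mu, sigma) for _ in range(pop)]
        elite = sorted(samples, key=f, reverse=True)[:n_elite]
        mu = sum(elite) / n_elite
        var = sum((x - mu) ** 2 for x in elite) / n_elite
        # the trick from the abstract: inject extra (here, decaying) noise
        # into the variance so the sampling distribution does not collapse
        # onto a suboptimal point too early
        sigma = (var + noise * max(0.0, 1.0 - t / n_iter)) ** 0.5
    return mu
```

With the noise term decayed to zero by the final iteration, the distribution is free to concentrate only after the search has had time to explore.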
Adaptive importance sampling in general mixture classes
, 2007
Cited by 37 (11 self)
In this paper, we propose an adaptive algorithm that iteratively updates both the weights and component parameters of a mixture importance sampling density so as to optimise the performance of importance sampling, as measured by an entropy criterion. The method, called M-PMC, is shown to be applicable to a wide class of importance sampling densities, which includes in particular mixtures of multivariate Student t distributions. The performance of the proposed scheme is studied on both artificial and real examples, highlighting in particular the benefit of a novel Rao-Blackwellisation device which can be easily incorporated in the updating scheme.
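A minimal sketch of the weight-adaptation step in this spirit: only the mixture weights are updated below, via a Rao-Blackwellised rule that averages component responsibilities under the importance weights. The one-dimensional target, the fixed Gaussian components, and all constants are illustrative assumptions (the paper also adapts component parameters and covers Student t mixtures):

```python
import math
import random

def norm_pdf(x, mu, s):
    return math.exp(-0.5 * ((x - mu) / s) ** 2) / (s * math.sqrt(2.0 * math.pi))

def target(x):
    # toy target density: N(2, 1), matching the second component exactly
    return norm_pdf(x, 2.0, 1.0)

def adapt_mixture_weights(n_iter=20, n=500, seed=1):
    rng = random.Random(seed)
    comps = [(-2.0, 1.0), (2.0, 1.0)]      # fixed (mu, sigma) components
    alpha = [0.5, 0.5]                     # mixture weights being adapted
    for _ in range(n_iter):
        q = lambda x: sum(a * norm_pdf(x, m, s)
                          for a, (m, s) in zip(alpha, comps))
        xs = []
        for _ in range(n):
            m, s = comps[0] if rng.random() < alpha[0] else comps[1]
            xs.append(rng.gauss(m, s))
        w = [target(x) / q(x) for x in xs]  # importance weights
        # Rao-Blackwellised update: weighted average of the component
        # responsibilities, normalized by the total importance weight
        new = [0.0, 0.0]
        for x, wi in zip(xs, w):
            qx = q(x)
            for d, (m, s) in enumerate(comps):
                new[d] += wi * alpha[d] * norm_pdf(x, m, s) / qx
        tot = sum(w)
        alpha = [nd / tot for nd in new]
    return alpha
```

The weight of the component matching the target grows toward 1, while the weights always remain a probability vector.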
Minimum variance importance sampling via population Monte Carlo
 ESAIM: Probability and Statistics
, 2007
Cited by 31 (11 self)
Variance reduction has always been a central issue in Monte Carlo experiments. Population Monte Carlo can be used to this effect, in that a mixture of importance functions, called a D-kernel, can be iteratively optimised to achieve the minimum asymptotic variance for a function of interest among all possible mixtures. The implementation of this iterative scheme is illustrated for the computation of the price of a European option in the Cox-Ingersoll-Ross model. A central limit theorem as well as moderate deviations are established for the D-kernel population Monte Carlo methodology.
The cross-entropy method for continuous multiextremal optimization
 Methodology and Computing in Applied Probability
Cited by 29 (6 self)
In recent years, the cross-entropy method has been successfully applied to a wide range of discrete optimization tasks. In this paper we consider the cross-entropy method in the context of continuous optimization. We demonstrate the effectiveness of the cross-entropy method for solving difficult continuous multiextremal optimization problems, including those with nonlinear constraints.
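A compact sketch of cross-entropy minimization of a continuous multiextremal function (the classic Rastrigin function here); the test function, smoothing constants, and population sizes are illustrative assumptions, not the paper's benchmarks:

```python
import math
import random

def rastrigin(x):
    # classic multiextremal test function; global minimum 0 at the origin
    return sum(xi * xi - 10.0 * math.cos(2.0 * math.pi * xi) + 10.0 for xi in x)

def ce_minimize(dim=2, pop=250, n_elite=25, iters=60, seed=42):
    rng = random.Random(seed)
    mu = [3.0] * dim
    sigma = [3.0] * dim
    for _ in range(iters):
        samples = [[rng.gauss(m, s) for m, s in zip(mu, sigma)]
                   for _ in range(pop)]
        samples.sort(key=rastrigin)
        elite = samples[:n_elite]
        for d in range(dim):
            col = [x[d] for x in elite]
            m = sum(col) / n_elite
            v = sum((c - m) ** 2 for c in col) / n_elite
            # smoothed parameter updates slow the shrinkage of sigma
            mu[d] = 0.7 * m + 0.3 * mu[d]
            sigma[d] = max(0.7 * math.sqrt(v) + 0.3 * sigma[d], 1e-8)
    return mu, rastrigin(mu)
```

On this small 2-D instance the scheme settles in a low-lying basin near the origin; on harder instances pure CE can still stall, which is why smoothing (and, in other work, noise injection) is commonly added.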
A Model Reference Adaptive Search Method for Global Optimization
 2007 Oper. Res
, 2008
Cited by 26 (10 self)
doi 10.1287/opre.1060.0367 © 2007 INFORMS. Model reference adaptive search (MRAS) for solving global optimization problems works with a parameterized probabilistic model on the solution space and generates at each iteration a group of candidate solutions. These candidate solutions are then used to update the parameters associated with the probabilistic model in such a way that the future search will be biased toward the region containing high-quality solutions. The parameter updating procedure in MRAS is guided by a sequence of implicit probabilistic models we call reference models. We provide a particular algorithm instantiation of the MRAS method, where the sequence of reference models can be viewed as the generalized probability distribution models for estimation of distribution algorithms (EDAs) with a proportional selection scheme. In addition, we show that the model reference framework can also be used to describe the recently proposed cross-entropy (CE) method for optimization and to study its properties. Hence, this paper can also be seen as a study on the effectiveness of combining CE and EDAs. We prove global convergence of the proposed algorithm in both continuous and combinatorial domains, and we carry out numerical studies to illustrate the performance of the algorithm.
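The reference-model idea can be made concrete on a tiny discrete domain: with a positive performance measure H and proportional selection, the sequence of reference models follows g_{k+1}(x) ∝ H(x) g_k(x). The domain and H below are illustrative assumptions, and real MRAS samples from a parameterized model matched to g_k rather than enumerating the domain:

```python
xs = list(range(10))

def H(x):
    # positive performance measure to maximize; best point is x = 9
    return 1.0 + x * x

def reference_model_iteration(n_iter=25):
    g = [1.0 / len(xs)] * len(xs)          # uniform initial reference model
    for _ in range(n_iter):
        unnorm = [H(x) * gx for x, gx in zip(xs, g)]
        z = sum(unnorm)
        g = [u / z for u in unnorm]        # proportional-selection update
    return g
```

Since g_k ∝ H(x)^k, the reference model's mass concentrates on the maximizer of H, which is what biases the search toward high-quality regions.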
Efficient Monte Carlo simulation via the generalized splitting method
 Statistics and Computing
, 2011
Cited by 25 (10 self)
We describe a new Monte Carlo algorithm for the consistent and unbiased estimation of multidimensional integrals and the efficient sampling from multidimensional densities. The algorithm is inspired by the classical splitting method and can be applied to general static simulation models. We provide examples from rare-event probability estimation, counting, and sampling, demonstrating that the proposed method can outperform existing Markov chain sampling methods in terms of convergence speed and accuracy.
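A toy multilevel splitting estimator in this spirit, for the rare event {U1 + U2 > γ} with independent Uniform(0,1) variables (true probability (2 − γ)²/2); the levels, sample size, and the Gibbs move used to rejuvenate the clones are illustrative assumptions, not the paper's generalized splitting algorithm:

```python
import random

def splitting_estimate(gamma=1.9, levels=(1.2, 1.5, 1.7, 1.9), n=2000, seed=7):
    rng = random.Random(seed)
    pop = [(rng.random(), rng.random()) for _ in range(n)]
    p_hat = 1.0
    for level in levels:
        hits = [xy for xy in pop if xy[0] + xy[1] > level]
        p_hat *= len(hits) / len(pop)   # conditional level-crossing fraction
        if not hits:
            return 0.0
        # repopulate by cloning the hits, then apply one Gibbs sweep that
        # leaves the conditional law given {u1 + u2 > level} invariant
        pop = []
        while len(pop) < n:
            x, y = hits[rng.randrange(len(hits))]
            x = rng.uniform(max(0.0, level - y), 1.0)
            y = rng.uniform(max(0.0, level - x), 1.0)
            pop.append((x, y))
    return p_hat

# for gamma = 1.9 the true probability is (2 - 1.9) ** 2 / 2 = 0.005
```

The product of the per-level fractions estimates the rare-event probability; each intermediate event is common enough to be resolved with a modest population.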
Information-Geometric Optimization Algorithms: A Unifying Picture via Invariance Principles
, 2011
Cited by 25 (6 self)
We present a canonical way to turn any smooth parametric family of probability distributions on an arbitrary search space X into a continuous-time black-box optimization method on X, the information-geometric optimization (IGO) method. Invariance as a major design principle keeps the number of arbitrary choices to a minimum. The resulting method conducts a natural gradient ascent using an adaptive, time-dependent transformation of the objective function, and makes no particular assumptions on the objective function to be optimized. The IGO method produces explicit IGO algorithms through time discretization. The cross-entropy method is recovered in a particular case with a large time step, and can be extended into a smoothed, parametrization-independent maximum likelihood update. When applied to specific families of distributions on discrete or continuous spaces, the IGO framework allows one to naturally recover versions
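Schematically, the IGO update performs a natural-gradient step on the model parameters; the following is a hedged reconstruction in common notation (the ŵ_i are quantile-based weights of the sampled points x_i and I(θ) is the Fisher information matrix), not a formula quoted from the paper:

```latex
\theta^{t+\delta t}
  = \theta^{t}
  + \delta t \sum_{i=1}^{N} \widehat{w}_{i}\,
    \tilde{\nabla}_{\theta} \ln P_{\theta}(x_{i}) \Big|_{\theta=\theta^{t}},
\qquad
\tilde{\nabla}_{\theta} = I(\theta)^{-1} \nabla_{\theta}.
```

Preconditioning by the inverse Fisher matrix is what makes the step independent of how the family is parameterized.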
Adaptive methods for sequential importance sampling with application to state space models
 Statistics and Computing
, 2008
Cited by 24 (6 self)
In this paper we discuss new adaptive proposal strategies for sequential Monte Carlo algorithms (also known as particle filters), relying on criteria that evaluate the quality of the proposed particles. The choice of the proposal distribution is a major concern and can dramatically influence the quality of the estimates. Thus, we show how the long-used coefficient of variation of the weights (suggested by Kong et al. (1994)) can be used for estimating the chi-square distance between the target and instrumental distributions of the auxiliary particle filter. As a by-product of this analysis we obtain a type of auxiliary adjustment multiplier weight for which this chi-square distance is minimal. Moreover, we establish an empirical estimate, of linear complexity, of the Kullback-Leibler divergence between the involved distributions. Guided by these results, we discuss the adaptive design of the particle filter proposal distribution and illustrate the methods on a numerical example.
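The coefficient-of-variation diagnostic the abstract builds on is easy to state in code; a minimal sketch (the weight vectors are illustrative), where cv2 is the squared coefficient of variation of the importance weights and ess is the effective sample size of Kong et al. (1994) it induces:

```python
def weight_diagnostics(w):
    # squared coefficient of variation and effective sample size of a
    # vector of (unnormalized) importance weights
    n = len(w)
    mean = sum(w) / n
    cv2 = sum((wi - mean) ** 2 for wi in w) / (n * mean * mean)
    ess = n / (1.0 + cv2)
    return cv2, ess

# uniform weights give cv2 == 0 and ess == n (no degeneracy);
# one dominant weight collapses ess toward 1
```

A small ess signals weight degeneracy, which is exactly when adapting the proposal distribution pays off.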
Learning to Play Using Low-Complexity Rule-Based Policies: Illustrations through Ms. Pac-Man
Cited by 23 (1 self)
In this article we propose a method that can deal with certain combinatorial reinforcement learning tasks. We demonstrate the approach in the popular Ms. Pac-Man game. We define a set of high-level observation and action modules, from which rule-based policies are constructed automatically. In these policies, actions are temporally extended, and may work concurrently. The policy of the agent is encoded by a compact decision list. The components of the list are selected from a large pool of rules, which can be either hand-crafted or generated automatically. A suitable selection of rules is learnt by the cross-entropy method, a recent global optimization algorithm that fits our framework smoothly. Cross-entropy-optimized policies perform better than our hand-crafted policy, and reach the score of average human players. We argue that learning is successful mainly because (i) policies may apply concurrent actions and thus the policy space is sufficiently rich, and (ii) the search is biased towards low-complexity policies and therefore solutions with a compact description can be found quickly if they exist.
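The decision-list representation is easy to sketch: each rule pairs a condition over high-level observations with a (possibly temporally extended) action, and the first firing rule decides. The observation keys, rules, and actions below are illustrative assumptions, not the modules used in the paper:

```python
def make_decision_list(rules, default_action):
    # rules: ordered list of (condition, action) pairs; first match wins
    def policy(obs):
        for condition, action in rules:
            if condition(obs):
                return action
        return default_action
    return policy

rules = [
    (lambda o: o["ghost_close"] and not o["powered"], "flee"),
    (lambda o: o["powered"] and o["ghost_close"], "chase_ghost"),
    (lambda o: o["dot_near"], "eat_dot"),
]
policy = make_decision_list(rules, "explore")
```

A cross-entropy search over which rules from a large pool to include, and in what order, would then optimize such lists directly, as the abstract describes.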