Results 1–10 of 67
Controlled MCMC for Optimal Sampling
, 2001
Abstract

Cited by 37 (6 self)
In this paper we develop an original and general framework for automatically optimizing the statistical properties of Markov chain Monte Carlo (MCMC) samples, which are typically used to evaluate complex integrals. The Metropolis-Hastings algorithm is the basic building block of classical MCMC methods and requires the choice of a proposal distribution, which usually belongs to a parametric family. The correlation properties together with the exploratory ability of the Markov chain depend heavily on the choice of the proposal distribution. By monitoring the simulated path, our approach allows us to learn "on the fly" the optimal parameters of the proposal distribution for several statistical criteria. Keywords: Monte Carlo, adaptive MCMC, calibration, stochastic approximation, gradient method, optimal scaling, random walk, Langevin, Gibbs, controlled Markov chain, learning algorithm, reversible jump MCMC. 1. Motivation 1.1. Introduction Markov chain Monte Carlo (MCMC) is a general strategy for generating samples x_i (i = 0, 1, ...) from complex high-dimensional distributions π, say defined on the space X ⊆ R^{n_x}, from which integrals of the type I(f) = ∫_X f(x) π(x) dx can be calculated using the estimator Î_N(f) = (1/(N+1)) Σ_{i=0}^{N} f(x_i), provided that the Markov chain produced is ergodic. The main building block of this class of algorithms is the Metropolis-Hastings (MH) algorithm. It requires the definition of a proposal distribution q whose role is to generate possible transitions for the Markov chain, say from x to y, which are then accepted or rejected according to the probability α(x, y) = min{1, π(y) q(y, x) / (π(x) q(x, y))}. The simplicity and universality of this algorithm are both its strength and weakness. The choice of ...
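The acceptance probability above can be illustrated with a minimal random-walk Metropolis sketch. With a symmetric Gaussian proposal the q-ratio cancels, so only the target density ratio remains; the target distribution, step scale, and seed below are illustrative choices, not taken from the paper:

```python
import math
import random

def metropolis_hastings(log_pi, x0, n_samples, scale=1.0):
    """Random-walk Metropolis sampler targeting the density pi (via its log).

    The Gaussian proposal is symmetric, q(x, y) = q(y, x), so the acceptance
    probability alpha(x, y) = min{1, pi(y)/pi(x)}."""
    x = x0
    samples = []
    for _ in range(n_samples):
        y = x + random.gauss(0.0, scale)           # propose a transition x -> y
        log_alpha = log_pi(y) - log_pi(x)          # log acceptance ratio
        if log_alpha >= 0 or math.log(random.random()) < log_alpha:
            x = y                                  # accept; otherwise stay at x
        samples.append(x)
    return samples

# Illustrative target: standard normal, log pi(x) = -x^2/2 up to a constant.
random.seed(0)
chain = metropolis_hastings(lambda x: -0.5 * x * x, x0=0.0, n_samples=5000)
# Estimate I(f) for f(x) = x^2, i.e. E[x^2] = 1 under the target.
estimate = sum(x * x for x in chain) / len(chain)
```

The ergodic-average estimator at the end is exactly Î_N(f) from the abstract; the adaptive schemes the paper develops would tune `scale` on the fly rather than fix it.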
Noisy optimization with evolution strategies
 SIAM Journal on Optimization
Abstract

Cited by 36 (6 self)
Evolution strategies are general, nature-inspired heuristics for search and optimization. Supported both by empirical evidence and by recent theoretical findings, there is a common belief that evolution strategies are robust and reliable, and frequently they are the method of choice if neither derivatives of the objective function are at hand nor differentiability and numerical accuracy can be assumed. However, despite their widespread use, there is little exchange between members of the "classical" optimization community and people working in the field of evolutionary computation. It is our belief that both sides would benefit from such an exchange. In this paper, we present a brief outline of evolution strategies and discuss some of their properties in the presence of noise. We then empirically demonstrate that for a simple but nonetheless nontrivial noisy objective function, an evolution strategy outperforms other optimization algorithms designed to be able to cope with noise. The environment in which the algorithms are tested is deliberately chosen to afford a transparency of the results that reveals the strengths and shortcomings of the strategies, making it possible to draw conclusions with regard to the design of better optimization algorithms for noisy environments.
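As a minimal illustration of the algorithm class this abstract surveys, here is a (1+1)-ES sketch with the classic 1/5 success rule for step-size adaptation; the adaptation factors and test function are illustrative assumptions, not taken from the paper:

```python
import random

def one_plus_one_es(loss, x0, sigma=1.0, n_iter=500):
    """Minimal (1+1)-ES: mutate with isotropic Gaussian noise, keep the
    offspring if it is no worse, and adapt sigma with the 1/5 success rule
    (widen on success, narrow on failure; 1.5 * 0.9**4 is approximately 1,
    so sigma is stationary at a 1/5 success rate)."""
    x, fx = list(x0), loss(x0)
    for _ in range(n_iter):
        y = [xi + random.gauss(0.0, sigma) for xi in x]
        fy = loss(y)
        if fy <= fx:
            x, fx = y, fy
            sigma *= 1.5   # success: widen the search
        else:
            sigma *= 0.9   # failure: narrow it
    return x, fx

# Illustrative run on the sphere function, starting away from the optimum.
random.seed(0)
best, best_val = one_plus_one_es(lambda t: sum(x * x for x in t), [3.0, 3.0])
```

Note that the comparison `fy <= fx` uses only function values, which is why such strategies need neither derivatives nor smoothness, as the abstract stresses.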
Global random optimization by simultaneous perturbation stochastic approximation
 in Proc. Amer. Control Conf
, 2001
Abstract

Cited by 28 (0 self)
Abstract—We examine the theoretical and numerical global convergence properties of a certain "gradient-free" stochastic approximation algorithm called the "simultaneous perturbation stochastic approximation (SPSA)" that has performed well in complex optimization problems. We establish two theorems on the global convergence of SPSA, the first involving the well-known method of injected noise. The second theorem establishes conditions under which "basic" SPSA without injected noise can achieve convergence in probability to a global optimum, a result with important practical benefits. Index Terms—Global convergence, simulated annealing, simultaneous perturbation stochastic approximation (SPSA), stochastic approximation (SA), stochastic optimization.
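The "basic" SPSA iteration the abstract refers to can be sketched as follows: all coordinates are perturbed simultaneously with i.i.d. Bernoulli ±1 directions, and the full gradient is estimated from just two loss measurements. The gain sequences and quadratic test function are illustrative assumptions, not from the paper:

```python
import random

def spsa_step(loss, theta, a, c):
    """One basic SPSA iteration: two loss measurements at theta +/- c*delta
    give a simultaneous-perturbation gradient estimate, followed by a
    gradient-descent step with gain a."""
    delta = [random.choice((-1.0, 1.0)) for _ in theta]
    plus  = [t + c * d for t, d in zip(theta, delta)]
    minus = [t - c * d for t, d in zip(theta, delta)]
    diff = (loss(plus) - loss(minus)) / (2.0 * c)
    # i-th gradient component is estimated as diff / delta_i
    return [t - a * diff / d for t, d in zip(theta, delta)]

# Illustrative run on a noiseless quadratic with standard power-law gain decay.
random.seed(0)
theta = [2.0, -3.0]
for k in range(200):
    ak = 0.1 / (k + 1) ** 0.602
    ck = 0.1 / (k + 1) ** 0.101
    theta = spsa_step(lambda t: sum(x * x for x in t), theta, ak, ck)
```

The key efficiency property is visible in `spsa_step`: two loss evaluations per iteration regardless of the dimension of `theta`, in contrast to finite differences, which need two per coordinate.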
Adaptive Multivariate Three-Timescale Stochastic Approximation Algorithms for Simulation-Based Optimization
, 2003
Bayesian evidence test for precise hypotheses
 JOURNAL OF STATISTICAL PLANNING AND INFERENCE
, 2003
Maximum likelihood parameter estimation in General State-Space Models using Particle Methods
 Proc of the American Stat. Assoc
, 2005
Abstract

Cited by 7 (2 self)
; general state-space model; filter derivative; Hahn-Jordan decomposition. A large number of time series can be described by nonlinear, non-Gaussian state-space models. While state estimation for these models is now routinely performed using particle filters, maximum likelihood estimation of the model parameters is much more challenging. In this paper, we present new numerical methods to approximate the derivative of the optimal filter. We use this to perform batch and recursive maximum likelihood parameter estimation and tracking by maximizing the likelihood through a gradient ascent method. We generalize the method to include the second derivative of the optimal filter. This provides estimates of the Hessian of the likelihood and can be used to accelerate the gradient ascent method.
Adaptive Newton-based multivariate smoothed functional algorithms for simulation optimization
 ACM Transactions on Modeling and Computer Simulation, Accepted for Publication
, 2007
Abstract

Cited by 5 (4 self)
In this paper, we present three smoothed functional (SF) algorithms for simulation optimization. While one of these estimates only the gradient by using a finite difference approximation with two parallel simulations, the other two are adaptive Newton-based stochastic approximation algorithms that estimate both the gradient and Hessian. One of the Newton-based algorithms uses only one simulation and has a one-sided estimate in both the gradient and Hessian, while the other uses two-sided estimates in both quantities and requires two simulations. For obtaining gradient and Hessian estimates, we perturb each parameter component randomly using independent and identically distributed (i.i.d.) Gaussian random variates. The earlier SF algorithms in the literature only estimate the gradient of the objective function. Using similar techniques, we derive two unbiased SF-based estimators for the Hessian and develop suitable three-timescale stochastic approximation procedures for simulation optimization. We present a detailed convergence analysis of our algorithms and show numerical experiments with parameters of dimension 50 on a setting involving a network of M/G/1 queues with feedback. We compare the performance of our algorithms with related algorithms in the literature. While our two-simulation Newton-based algorithm shows the best results overall, our one-simulation algorithm shows better performance as compared to other one-simulation algorithms.
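The Gaussian-perturbation idea behind SF estimators can be sketched in its two-sided gradient form: every parameter component is perturbed with i.i.d. N(0, 1) variates and the two resulting loss values are combined into a gradient estimate. This is only the basic gradient estimator; the paper's Hessian estimators and three-timescale procedures are omitted, and the batch-averaged form below is an illustrative simplification of what would be a stochastic approximation recursion:

```python
import random

def sf_gradient(loss, theta, beta, n_perturb=500):
    """Two-sided smoothed functional gradient estimate: average
    eta * (L(theta + beta*eta) - L(theta - beta*eta)) / (2*beta)
    over i.i.d. standard Gaussian perturbation vectors eta."""
    d = len(theta)
    grad = [0.0] * d
    for _ in range(n_perturb):
        eta = [random.gauss(0.0, 1.0) for _ in range(d)]
        plus  = loss([t + beta * e for t, e in zip(theta, eta)])
        minus = loss([t - beta * e for t, e in zip(theta, eta)])
        scale = (plus - minus) / (2.0 * beta)
        for i in range(d):
            grad[i] += eta[i] * scale / n_perturb
    return grad

# Illustrative check on a quadratic: the true gradient at [1, 2] is [2, 4].
random.seed(0)
g = sf_gradient(lambda t: sum(x * x for x in t), [1.0, 2.0], beta=0.1)
```

The smoothing parameter `beta` trades bias (large `beta` smooths the objective) against variance, which is why the convergence analysis in such papers lets it vanish on a separate timescale.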
Feedback and weighting mechanisms for improving Jacobian (Hessian) estimates in the adaptive simultaneous perturbation algorithm
 Proc. Amer. Control Conf., Piscataway, NJ
, 2006
Abstract

Cited by 4 (1 self)
Abstract—It is known that a stochastic approximation (SA) analogue of the deterministic Newton-Raphson algorithm provides an asymptotically optimal or near-optimal form of stochastic search. However, directly determining the required Jacobian matrix (or Hessian matrix for optimization) has often been difficult or impossible in practice. This paper presents a general adaptive SA algorithm that is based on a simple method for estimating the Jacobian matrix while concurrently estimating the primary parameters of interest. Relative to prior methods for adaptively estimating the Jacobian matrix, the paper introduces two enhancements that generally improve the quality of the estimates for underlying Jacobian (Hessian) matrices, thereby improving the quality of the estimates for the primary parameters of interest. The first enhancement rests on a feedback process that uses previous Jacobian estimates to reduce the error in the current estimate. The second enhancement is based on an optimal weighting of per-iteration Jacobian estimates. From the use of simultaneous perturbations, the algorithm requires only a small number of loss function or gradient measurements per iteration—independent of the problem dimension—to adaptively estimate the Jacobian matrix and parameters of primary interest. Index Terms—Adaptive estimation, Jacobian matrix, root-finding, simultaneous perturbation stochastic approximation (SPSA), stochastic optimization.
On the Choice of Random Directions for Stochastic Approximation Algorithms
 IEEE Transactions on Automatic Control
Abstract

Cited by 4 (0 self)
We investigate variants of the Kushner-Clark Random Direction Stochastic Approximation (RDSA) algorithm for optimizing noisy loss functions in high-dimensional spaces. These variants employ different strategies for choosing random directions. The most popular approach is random selection from a Bernoulli distribution, which for historical reasons also goes by the name Simultaneous Perturbation Stochastic Approximation (SPSA). But viable alternatives include an axis-aligned distribution, a normal distribution, and a uniform distribution on a spherical shell. Although there are special cases where the Bernoulli distribution is optimal, there are other cases where it performs worse than other alternatives. We find that for generic loss functions that are not aligned to the coordinate axes, the average asymptotic performance depends only on the radial fourth moment of the distribution of directions, and is identical for the Bernoulli, the axis-aligned, and the spherical shell distributions. Of these variants, the spherical shell is optimal in the sense of minimum variance over random orientations of the loss function with respect to the coordinate axes. We also show that for unaligned loss functions, the performance of the Kiefer-Wolfowitz-Blum Finite Difference Stochastic Approximation (FDSA) is asymptotically equivalent to the RDSA algorithms, and we observe numerically that the pre-asymptotic performance of FDSA is often superior. We also introduce a "quasi-random" selection process which exhibits the same asymptotic performance, but empirically is observed to converge to the asymptote more rapidly.
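Two of the direction distributions compared here can be sketched as samplers. Scaling the spherical-shell directions to squared norm d, matching the Bernoulli ±1 vectors, is an assumption made for comparability; it is not a normalization taken from the paper:

```python
import math
import random

def bernoulli_direction(d):
    """SPSA-style direction: each component is +1 or -1 with equal
    probability, so the squared norm is exactly d."""
    return [random.choice((-1.0, 1.0)) for _ in range(d)]

def spherical_direction(d):
    """Uniform direction on a spherical shell: normalize a standard
    Gaussian vector, then scale to radius sqrt(d) so its squared norm
    matches the Bernoulli directions' (= d)."""
    g = [random.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(x * x for x in g))
    return [math.sqrt(d) * x / norm for x in g]

random.seed(0)
b = bernoulli_direction(5)
s = spherical_direction(5)
```

With the radii matched this way, both samplers have the same radial second moment; the asymptotic comparison in the abstract then turns on the radial fourth moment, which is deterministic for both of these choices.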
Particle Filter-based Policy Gradient in POMDPs
 Advances in Neural Information Processing Systems
Abstract

Cited by 3 (1 self)
Our setting is a Partially Observable Markov Decision Process with continuous state, observation and action spaces. Decisions are based on a Particle Filter for estimating the belief state given past observations. We consider a policy gradient approach for parameterized policy optimization. For that purpose, we investigate sensitivity analysis of the performance measure with respect to the parameters of the policy, focusing on Finite Difference (FD) techniques. We show that the naive FD is subject to variance explosion because of the non-smoothness of the resampling procedure. We propose a more sophisticated FD method which overcomes this problem and establish its consistency.