Results 1-10 of 76
Parameter-exploring Policy Gradients
, 2009
Cited by 24 (4 self)
We present a model-free reinforcement learning method for partially observable Markov decision problems. Our method estimates a likelihood gradient by sampling directly in parameter space, which leads to lower-variance gradient estimates than obtained by regular policy gradient methods. We show that for several complex control tasks, including robust standing with a humanoid robot, this method outperforms well-known algorithms from the fields of standard policy gradients, finite difference methods and population-based heuristics. We also show that the improvement is largest when the parameter samples are drawn symmetrically. Lastly we analyse the importance of the individual components of our method by incrementally incorporating them into the other algorithms, and measuring the gain in performance after each step.
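The symmetric parameter-space sampling mentioned in this abstract can be illustrated with a minimal sketch. It assumes a Gaussian search distribution over parameters with a fixed exploration width and uses a toy quadratic reward in place of an episode return. The names `pgpe_symmetric` and `toy_reward`, the learning rate, and the fixed `sigma` are illustrative assumptions, not the authors' implementation; the published method also adapts the exploration width, which is omitted here.

```python
import random

def pgpe_symmetric(reward, mu, sigma, lr=0.05, steps=500, seed=0):
    """Mean-only update with symmetric (antithetic) parameter samples.

    Hypothetical sketch: sigma is held fixed for simplicity.
    """
    rng = random.Random(seed)
    mu = list(mu)
    for _ in range(steps):
        # Draw one perturbation and evaluate both mirrored parameter vectors.
        eps = [rng.gauss(0.0, sigma) for _ in mu]
        r_plus = reward([m + e for m, e in zip(mu, eps)])
        r_minus = reward([m - e for m, e in zip(mu, eps)])
        # Likelihood-ratio gradient of the expected reward w.r.t. mu,
        # averaged over the symmetric pair: (eps / sigma^2) * (r+ - r-) / 2.
        half_diff = 0.5 * (r_plus - r_minus)
        mu = [m + lr * (e / sigma ** 2) * half_diff for m, e in zip(mu, eps)]
    return mu

def toy_reward(theta, target=(1.0, -2.0)):
    # Assumed stand-in for an episode return: highest at `target`.
    return -sum((t - g) ** 2 for t, g in zip(theta, target))

mu_final = pgpe_symmetric(toy_reward, [0.0, 0.0], sigma=0.1)
print(mu_final)
```

Because the two samples share one perturbation, any reward component that is even in the perturbation cancels in the difference, which is one intuition for the variance reduction the abstract reports.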
Policy Gradients with Parameter-Based Exploration for Control
, 2008
Cited by 16 (10 self)
We present a model-free reinforcement learning method for partially observable Markov decision problems. Our method estimates a likelihood gradient by sampling directly in parameter space, which leads to lower-variance gradient estimates than those obtained by policy gradient methods such as REINFORCE. For several complex control tasks, including robust standing with a humanoid robot, we show that our method outperforms well-known algorithms from the fields of policy gradients, finite difference methods and population-based heuristics. We also provide a detailed analysis of the differences between our method and the other algorithms.
Bayesian Model Updating Using Hybrid Monte Carlo Simulation with Application to Structural Dynamics Models with Many Uncertain Parameters
Journal of Engineering Mechanics
Cited by 13 (1 self)
Abstract: In recent years, Bayesian model updating techniques based on measured data have been applied to system identification of structures and to structural health monitoring. A fully probabilistic Bayesian model updating approach provides a robust and rigorous framework for these applications due to its ability to characterize modeling uncertainties associated with the underlying structural system and to its exclusive foundation on the probability axioms. The plausibility of each structural model within a set of possible models, given the measured data, is quantified by the joint posterior probability density function of the model parameters. This Bayesian approach requires the evaluation of multidimensional integrals, and this usually cannot be done analytically. Recently, some Markov chain Monte Carlo simulation methods have been developed to solve the Bayesian model updating problem. However, in general, the efficiency of these proposed approaches is adversely affected by the dimension of the model parameter space. In this paper, the Hybrid Monte Carlo method (also known as the Hamiltonian Markov chain method) is investigated, and we show how it can be used to solve higher-dimensional Bayesian model updating problems. Practical issues for the feasibility of the Hybrid Monte Carlo method to such problems are addressed, and improvements are proposed to make it more effective and efficient for solving such model updating problems. New formulae for Markov chain convergence assessment are derived. The effectiveness of the proposed approach for Bayesian model updating of structural dynamic models with many uncertain parameters is illustrated with a simulated data example involving a ten-story building that has 31 model parameters to be updated.
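For readers unfamiliar with the Hybrid (Hamiltonian) Monte Carlo method named in this abstract, the following is a minimal textbook-style sketch on a one-dimensional Gaussian target. It is not the paper's structural-dynamics formulation or its proposed improvements; the function name `hmc_sample`, the step size, the trajectory length, and the toy potential are all assumptions chosen for illustration.

```python
import math
import random

def hmc_sample(u, grad_u, q0, n_samples=2000, step=0.2, n_leapfrog=10, seed=0):
    """Basic HMC: u is the negative log density (up to a constant)."""
    rng = random.Random(seed)
    q = q0
    samples = []
    for _ in range(n_samples):
        p = rng.gauss(0.0, 1.0)          # resample momentum
        q_new, p_new = q, p
        # Leapfrog integration of Hamiltonian dynamics.
        p_new -= 0.5 * step * grad_u(q_new)
        for i in range(n_leapfrog):
            q_new += step * p_new
            if i < n_leapfrog - 1:
                p_new -= step * grad_u(q_new)
        p_new -= 0.5 * step * grad_u(q_new)
        # Metropolis accept/reject on the change in total energy.
        h_old = u(q) + 0.5 * p * p
        h_new = u(q_new) + 0.5 * p_new * p_new
        if rng.random() < math.exp(min(0.0, h_old - h_new)):
            q = q_new
        samples.append(q)
    return samples

# Assumed toy target: Gaussian with mean 3 and unit variance.
def potential(q):
    return 0.5 * (q - 3.0) ** 2

def potential_grad(q):
    return q - 3.0

samples = hmc_sample(potential, potential_grad, q0=0.0)
kept = samples[200:]                      # discard a short burn-in
mean = sum(kept) / len(kept)
var = sum((s - mean) ** 2 for s in kept) / len(kept)
print(mean, var)
```

The gradient-guided trajectories let each proposal move far from the current state while keeping a high acceptance rate, which is the property that makes the method attractive in higher-dimensional parameter spaces.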
Adaptive mechanism design: a metalearning approach
In Proceedings of the Eighth International Conference on Electronic Commerce (ICEC ’06)
, 2006
Cited by 10 (1 self)
Auction mechanism design has traditionally been a largely analytic process, relying on assumptions such as fully rational bidders. In practice, however, bidders often exhibit unknown and variable behavior, making them difficult to model and complicating the design process. To address this challenge, we explore the use of an adaptive auction mechanism: one that learns to adjust its parameters in response to past empirical bidder behavior so as to maximize an objective function such as auctioneer revenue. In this paper, we give an overview of our general approach and then present an instantiation in a specific auction scenario. In addition, we show how predictions of possible bidder behavior can be incorporated into the adaptive mechanism through a metalearning process. The approach is fully implemented and tested. Results indicate that the adaptive mechanism is able to outperform any single fixed mechanism, and that the addition of metalearning improves performance substantially.
Offline Calibration of Dynamic Traffic Assignment Models
, 2006
Cited by 8 (1 self)
Advances in Intelligent Transportation Systems (ITS) have resulted in the deployment of surveillance systems that automatically collect and store extensive network-wide traffic data. Dynamic Traffic Assignment (DTA) models have also been developed for a variety of dynamic traffic management applications. Such models are designed to estimate and predict the evolution of congestion through detailed models and algorithms that capture travel demand, network supply and their complex interactions. The availability of rich time-varying traffic data spanning multiple days thus provides the opportunity to calibrate a DTA model’s many inputs and parameters, so that its outputs reflect field conditions. The current
Multi-subject registration for unbiased statistical atlas construction
 in MICCAI, 2004
, 2004
Cited by 7 (1 self)
Abstract. This paper introduces a new similarity measure designed to bring a population of segmented subjects into alignment in a common coordinate system. Our metric aligns each subject with a hidden probabilistic model of the common spatial distribution of anatomical tissues, estimated using STAPLE. Our approach does not require the selection of a subject of the population as a “target subject”, nor the identification of “stable” landmarks across subjects. Rather, the approach determines automatically from the data what the most consistent alignment of the joint data is, subject to the particular transformation family used to align the subjects. The computational cost of joint simultaneous registration of the population of subjects is small due to the use of an efficient gradient estimate used to solve the optimization transform aligning each subject. The efficacy of the approach in constructing an unbiased statistical atlas was demonstrated by carrying out joint alignment of 20 segmentations of MRI of healthy preterm infants, using an affine transformation model and a FEM volumetric tetrahedral mesh transformation model.
A Neural Network Assisted Cascade Control System for Air Handling Unit
Cited by 5 (0 self)
Abstract: In the centralized heating, ventilating and air-conditioning (HVAC) system, air handling units (AHUs) are traditionally controlled by single-loop proportional-integral-derivative (PID) controllers. The control structure is simple, but the performance is usually not satisfactory. In this paper, we propose a cascade control strategy for temperature control of AHU. Instead of a fixed PID controller in the classical cascade control scheme, a neural network (NN) controller is used in the outer control loop. This approach not only overcomes the tedious tuning procedure for the inner and outer loop PID parameters of a classical cascade control system, but also makes the whole control system adaptive and robust. The multilayer NN is trained online by a special training algorithm based on simultaneous perturbation stochastic approximation (SPSA). With the SPSA-based training algorithm, the weight convergence of the NN and the stability of the control system are guaranteed. The novel cascade control system has been implemented on an experimental HVAC system. Testing results demonstrate the effectiveness of the proposed algorithm over the classical cascade control system. Index Terms: Air handling units, cascade control, neural networks (NNs), simultaneous perturbation stochastic approximation (SPSA).
An Efficient Stochastic Approach to Groupwise Nonrigid Image Registration
Cited by 5 (1 self)
The groupwise approach to nonrigid image registration, solving the dense correspondence problem, has recently been shown to be a useful tool in many applications, including medical imaging, automatic construction of statistical models of appearance and analysis of facial dynamics. Such an approach overcomes limitations of traditional pairwise methods but at a cost of having to search for the solution (optimal registration) in a space of much higher dimensionality which grows rapidly with the number of examples (images) being registered. Techniques to overcome this dimensionality problem have not been addressed sufficiently in the groupwise registration literature. In this paper, we propose a novel, fast and reliable, fully unsupervised stochastic algorithm to search for optimal groupwise dense correspondence in large sets of unmarked images. The efficiency of our approach stems from novel dimensionality reduction techniques specific to the problem of groupwise image registration and from comparative insensitivity of the adopted optimisation scheme (Simultaneous Perturbation Stochastic Approximation (SPSA)) to the high dimensionality of the search space. Additionally, our algorithm is formulated in a way readily suited to implementation on graphics processing units (GPU). In evaluation of our method we show a high robustness and success rate, fast convergence on various types of test data, including facial images featuring large degrees of both inter- and intra-person variation, and show considerable improvement in terms of accuracy of solution and speed compared to traditional methods.
Optimisation Of Particle Filters Using Simultaneous Perturbation Stochastic Approximation
, 2003
Cited by 5 (1 self)
This paper addresses the optimisation of particle filtering methods aka Sequential Monte Carlo (SMC) methods using stochastic approximation. First, the SMC algorithm is parameterised smoothly by a parameter. Second, optimisation of an average cost function is performed using Simultaneous Perturbation Stochastic Approximation (SPSA). Simulations demonstrate the efficiency of our algorithm.
On the Choice of Random Directions for Stochastic Approximation Algorithms
IEEE Transactions on Automatic Control
Cited by 5 (0 self)
We investigate variants of the Kushner-Clark Random Direction Stochastic Approximation (RDSA) algorithm for optimizing noisy loss functions in high-dimensional spaces. These variants employ different strategies for choosing random directions. The most popular approach is random selection from a Bernoulli distribution, which for historical reasons also goes by the name Simultaneous Perturbation Stochastic Approximation (SPSA). But viable alternatives include an axis-aligned distribution, a normal distribution, and a uniform distribution on a spherical shell. Although there are special cases where the Bernoulli distribution is optimal, there are other cases where it performs worse than other alternatives. We find that for generic loss functions that are not aligned to the coordinate axes, the average asymptotic performance depends only on the radial fourth moment of the distribution of directions, and is identical for the Bernoulli, the axis-aligned, and the spherical shell distributions. Of these variants, the spherical shell is optimal in the sense of minimum variance over random orientations of the loss function with respect to the coordinate axes. We also show that for unaligned loss functions, the performance of the Kiefer-Wolfowitz-Blum Finite Difference Stochastic Approximation (FDSA) is asymptotically equivalent to the RDSA algorithms, and we observe numerically that the pre-asymptotic performance of FDSA is often superior. We also introduce a "quasi-random" selection process which exhibits the same asymptotic performance, but empirically is observed to converge to the asymptote more rapidly.
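The role of the direction distribution can be seen in a small sketch of a random-direction update with two of the choices named in the abstract: Bernoulli (the SPSA case) and uniform on a spherical shell. This is an illustrative reading, not the paper's code. The function names, the noise-free quadratic loss, the gain constants `a` and `c`, and the decay exponents 0.602 and 0.101 (values commonly used in the SPSA literature) are assumptions. Both samplers are scaled so that the squared norm of each direction equals the dimension.

```python
import math
import random

def bernoulli_direction(n, rng):
    # SPSA-style direction: each component is +1 or -1 with equal probability.
    return [rng.choice((-1.0, 1.0)) for _ in range(n)]

def spherical_direction(n, rng):
    # Uniform on the sphere of radius sqrt(n): normalise a Gaussian draw.
    g = [rng.gauss(0.0, 1.0) for _ in range(n)]
    norm = math.sqrt(sum(x * x for x in g))
    return [math.sqrt(n) * x / norm for x in g]

def rdsa_minimize(loss, theta, direction, steps=2000, a=0.1, c=0.1, seed=0):
    rng = random.Random(seed)
    theta = list(theta)
    for k in range(steps):
        a_k = a / (k + 1) ** 0.602       # decaying step size (assumed schedule)
        c_k = c / (k + 1) ** 0.101       # decaying perturbation size
        delta = direction(len(theta), rng)
        plus = loss([t + c_k * d for t, d in zip(theta, delta)])
        minus = loss([t - c_k * d for t, d in zip(theta, delta)])
        # Two loss evaluations give a directional slope along delta.
        slope = (plus - minus) / (2.0 * c_k)
        theta = [t - a_k * slope * d for t, d in zip(theta, delta)]
    return theta

def quadratic_loss(theta, target=(1.0, -1.0, 2.0)):
    # Assumed toy loss with its minimum at `target`.
    return sum((t - g) ** 2 for t, g in zip(theta, target))

theta_bernoulli = rdsa_minimize(quadratic_loss, [0.0, 0.0, 0.0], bernoulli_direction)
theta_sphere = rdsa_minimize(quadratic_loss, [0.0, 0.0, 0.0], spherical_direction)
print(theta_bernoulli, theta_sphere)
```

Each iteration uses only two loss evaluations regardless of dimension, in contrast to a coordinate-wise finite difference scheme, which needs two evaluations per parameter.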