Policy Gradients with Parameter-Based Exploration for Control
Cited by 8 (4 self)
We present a model-free reinforcement learning method for partially observable Markov decision problems. Our method estimates a likelihood gradient by sampling directly in parameter space, which leads to lower-variance gradient estimates than those obtained by policy gradient methods such as REINFORCE. For several complex control tasks, including robust standing with a humanoid robot, we show that our method outperforms well-known algorithms from the fields of policy gradients, finite difference methods and population-based heuristics. We also provide a detailed analysis of the differences between our method and the other algorithms.
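To make the idea of sampling directly in parameter space concrete, here is a minimal sketch. It is a simplified illustration, not the authors' algorithm: the abstract does not give the update rules, so the fixed exploration width `sigma`, the mean-return baseline, the learning rate, and the function name `parameter_space_gradient_ascent` are all assumptions made for this example. Each candidate parameter vector is drawn from a Gaussian around the current mean, evaluated once, and the mean is moved along the likelihood gradient weighted by the return.

```python
import random


def parameter_space_gradient_ascent(episode_return, mu, sigma=0.1, lr=0.02,
                                    iterations=300, samples=20, seed=0):
    """Climb the expected return by perturbing parameters, not actions.

    episode_return: hypothetical callable mapping a parameter list to a scalar.
    mu: initial mean of the Gaussian search distribution over parameters.
    sigma is held fixed here (an assumption made to keep the sketch short).
    """
    rng = random.Random(seed)
    mu = list(mu)
    for _ in range(iterations):
        # Draw whole parameter vectors; each one is evaluated for one episode.
        thetas = [[rng.gauss(m, sigma) for m in mu] for _ in range(samples)]
        returns = [episode_return(theta) for theta in thetas]
        baseline = sum(returns) / samples  # simple variance-reduction baseline
        grad = [0.0] * len(mu)
        for theta, r in zip(thetas, returns):
            for i in range(len(mu)):
                # d/d mu_i of log N(theta_i | mu_i, sigma^2)
                #   = (theta_i - mu_i) / sigma^2
                grad[i] += (r - baseline) * (theta[i] - mu[i]) / sigma ** 2
        mu = [m + lr * g / samples for m, g in zip(mu, grad)]
    return mu


if __name__ == "__main__":
    goal = [1.0, -0.5]  # toy stand-in for a control task

    def toy_return(theta):
        return -sum((t - g) ** 2 for t, g in zip(theta, goal))

    print(parameter_space_gradient_ascent(toy_return, [0.0, 0.0]))
```

Because the noise enters once per episode rather than at every time step, the return of each sample is attributable to a single parameter draw; this is the intuition behind the lower-variance claim in the abstract, although the sketch itself does not measure variance.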
Adaptive mechanism design: a metalearning approach
- In Proceedings of the Eighth International Conference on Electronic Commerce (ICEC ’06), 2006
Cited by 8 (1 self)
Auction mechanism design has traditionally been a largely analytic process, relying on assumptions such as fully rational bidders. In practice, however, bidders often exhibit unknown and variable behavior, making them difficult to model and complicating the design process. To address this challenge, we explore the use of an adaptive auction mechanism: one that learns to adjust its parameters in response to past empirical bidder behavior so as to maximize an objective function such as auctioneer revenue. In this paper, we give an overview of our general approach and then present an instantiation in a specific auction scenario. In addition, we show how predictions of possible bidder behavior can be incorporated into the adaptive mechanism through a metalearning process. The approach is fully implemented and tested. Results indicate that the adaptive mechanism is able to outperform any single fixed mechanism, and that the addition of metalearning improves performance substantially.
Multi-subject registration for unbiased statistical atlas construction
- In MICCAI, 2004
Cited by 7 (1 self)
This paper introduces a new similarity measure designed to bring a population of segmented subjects into alignment in a common coordinate system. Our metric aligns each subject with a hidden probabilistic model of the common spatial distribution of anatomical tissues, estimated using STAPLE. Our approach does not require the selection of a subject of the population as a “target subject”, nor the identification of “stable” landmarks across subjects. Rather, the approach determines automatically from the data what the most consistent alignment of the joint data is, subject to the particular transformation family used to align the subjects. The computational cost of joint simultaneous registration of the population of subjects is small due to the use of an efficient gradient estimate used to solve the optimization transform aligning each subject. The efficacy of the approach in constructing an unbiased statistical atlas was demonstrated by carrying out joint alignment of 20 segmentations of MRI of healthy preterm infants, using an affine transformation model and a FEM volumetric tetrahedral mesh transformation model.
Off-line Calibration of Dynamic Traffic Assignment Models
- 2006
Cited by 4 (1 self)
Advances in Intelligent Transportation Systems (ITS) have resulted in the deployment of surveillance systems that automatically collect and store extensive network-wide traffic data. Dynamic Traffic Assignment (DTA) models have also been developed for a variety of dynamic traffic management applications. Such models are designed to estimate and predict the evolution of congestion through detailed models and algorithms that capture travel demand, network supply and their complex interactions. The availability of rich time-varying traffic data spanning multiple days thus provides the opportunity to calibrate a DTA model’s many inputs and parameters, so that its outputs reflect field conditions. The current
Optimisation Of Particle Filters Using Simultaneous Perturbation Stochastic Approximation
- 2003
Cited by 3 (1 self)
This paper addresses the optimisation of particle filtering methods, also known as Sequential Monte Carlo (SMC) methods, using stochastic approximation. First, the SMC algorithm is parameterised smoothly by a parameter. Second, optimisation of an average cost function is performed using Simultaneous Perturbation Stochastic Approximation (SPSA). Simulations demonstrate the efficiency of our algorithm.
On the Choice of Random Directions for Stochastic Approximation Algorithms
- IEEE Transactions on Automatic Control
Cited by 2 (0 self)
We investigate variants of the Kushner-Clark Random Direction Stochastic Approximation (RDSA) algorithm for optimizing noisy loss functions in high-dimensional spaces. These variants employ different strategies for choosing random directions. The most popular approach is random selection from a Bernoulli distribution, which for historical reasons also goes by the name Simultaneous Perturbation Stochastic Approximation (SPSA). But viable alternatives include an axis-aligned distribution, a normal distribution, and a uniform distribution on a spherical shell. Although there are special cases where the Bernoulli distribution is optimal, there are other cases where it performs worse than other alternatives. We find that for generic loss functions that are not aligned to the coordinate axes, the average asymptotic performance depends only on the radial fourth moment of the distribution of directions, and is identical for the Bernoulli, the axis-aligned, and the spherical shell distributions. Of these variants, the spherical shell is optimal in the sense of minimum variance over random orientations of the loss function with respect to the coordinate axes. We also show that for unaligned loss functions, the performance of the Kiefer-Wolfowitz-Blum Finite Difference Stochastic Approximation (FDSA) is asymptotically equivalent to the RDSA algorithms, and we observe numerically that the pre-asymptotic performance of FDSA is often superior. We also introduce a "quasirandom" selection process which exhibits the same asymptotic performance, but empirically is observed to converge to the asymptote more rapidly.
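The random-direction gradient estimate described in this abstract can be sketched as follows. This is an illustration under stated assumptions, not the paper's implementation: the function names, the gain sequences `a / k**0.602` and `c / k**0.101`, and the choice to scale the spherical-shell directions to radius sqrt(n) (so that both samplers satisfy E[d dᵀ] = I) are choices made for this example. Only the Bernoulli (SPSA) and spherical-shell samplers are shown.

```python
import math
import random


def bernoulli_direction(n, rng):
    """Each component is +1 or -1 with equal probability (the SPSA choice)."""
    return [rng.choice((-1.0, 1.0)) for _ in range(n)]


def shell_direction(n, rng):
    """Uniformly distributed direction on the sphere of radius sqrt(n)."""
    g = [rng.gauss(0.0, 1.0) for _ in range(n)]
    scale = math.sqrt(n) / math.sqrt(sum(x * x for x in g))
    return [x * scale for x in g]


def rdsa_gradient(loss, theta, c, direction, rng):
    """Two-sided estimate: two loss evaluations, whatever the dimension."""
    d = direction(len(theta), rng)
    plus = [t + c * di for t, di in zip(theta, d)]
    minus = [t - c * di for t, di in zip(theta, d)]
    slope = (loss(plus) - loss(minus)) / (2.0 * c)
    return [slope * di for di in d]


def minimize(loss, theta, direction, steps=2000, a=0.1, c=0.1, seed=0):
    """Stochastic approximation loop with decaying gains (illustrative values)."""
    rng = random.Random(seed)
    theta = list(theta)
    for k in range(1, steps + 1):
        a_k = a / k ** 0.602  # step-size gain
        c_k = c / k ** 0.101  # perturbation size
        g = rdsa_gradient(loss, theta, c_k, direction, rng)
        theta = [t - a_k * gi for t, gi in zip(theta, g)]
    return theta


if __name__ == "__main__":
    target = [1.0, -2.0, 0.5]

    def quadratic(x):
        return sum((xi - ti) ** 2 for xi, ti in zip(x, target))

    for sampler in (bernoulli_direction, shell_direction):
        print(sampler.__name__, minimize(quadratic, [0.0, 0.0, 0.0], sampler))
```

Swapping the `direction` argument is the only change needed to compare samplers, which mirrors how the abstract frames SPSA as one member of the RDSA family. The toy quadratic here is noise-free; a noisy loss would be passed in the same way.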
Sensor placement optimization under uncertainty for structural health monitoring systems of hot aerospace structures
Cited by 1 (0 self)
Acknowledgements. It is with great sincerity that I thank my mentor and advisor – Dr. Sankaran Mahadevan – for his constant support throughout my career at Vanderbilt University. I thank him for his encouragement, constructive criticism, and belief in my abilities. I thank him especially for his patience during the countless hours we spent working together. I also wish to thank the members of my committee, Dr. P.K. Basu, Dr. Gautam Biswas, and Dr. Mark Ellingham, for their advice and great attitude toward working with me.
Multi-Modal Non-Rigid Registration Using a Stochastic Gradient Approximation
- In ISBI, 2004
Cited by 1 (1 self)
We present a new fast implementation of a non-rigid registration algorithm, based on a finite element elastic deformation model using the mutual information metric with a linear elastic regularization constraint. The algorithm was parallelized for symmetric multi-processor architectures. A Simultaneous Perturbation Stochastic Approximation (SPSA) optimization scheme was used to maximize the objective function.
Simultaneous Selection of Features and Metric for Optimal Nearest Neighbor Classification
Cited by 1 (0 self)
Given a set of observations in R^n along with provided class labels, C, one is often interested in building a classifier that is a mapping from R^n → C. One way to do this is using a simple nearest neighbor classifier. Inherent in the use of this classifier is a metric or pseudo-metric that measures the distance between the observations. One typically uses the L2 metric. We examine the classification benefits of the use
Dedicated to Professor Z. Govindarajulu on the occasion of his 70th birthday. Correspondence: Edward J. Wegman, Center for Computational Statistics, MS
A Neural Network Assisted Cascade Control System for Air Handling Unit
Cited by 1 (0 self)
In the centralized heating, ventilating and air-conditioning (HVAC) system, air handling units (AHUs) are traditionally controlled by single-loop proportional-integral-derivative (PID) controllers. The control structure is simple, but the performance is usually not satisfactory. In this paper, we propose a cascade control strategy for temperature control of AHU. Instead of a fixed PID controller in the classical cascade control scheme, a neural network (NN) controller is used in the outer control loop. This approach not only overcomes the tedious tuning procedure for the inner and outer loop PID parameters of a classical cascade control system, but also makes the whole control system adaptive and robust. The multilayer NN is trained online by a special training algorithm based on simultaneous perturbation stochastic approximation (SPSA). With the SPSA-based training algorithm, the weight convergence of the NN and the stability of the control system are guaranteed. The novel cascade control system has been implemented on an experimental HVAC system. Testing results demonstrate the effectiveness of the proposed algorithm over the classical cascade control system. Index Terms: Air handling units, cascade control, neural networks (NNs), simultaneous perturbation stochastic approximation (SPSA).

