Results 11–20 of 316
An Overview of the Simultaneous Perturbation Method for Efficient Optimization
Abstract

Cited by 58 (1 self)
This article is an introduction to the simultaneous perturbation stochastic approximation (SPSA) algorithm for stochastic optimization of multivariate systems. Optimization algorithms play a critical role in the design, analysis, and control of most engineering systems and are in widespread use in the work of APL and other organizations: The future, in fact, will be full of [optimization] algorithms. They are becoming part of almost everything. They are moving up the complexity chain to make entire companies more efficient. They also are moving down the chain as computers spread. (USA Today, 31 Dec 1997) Before presenting the SPSA algorithm, we provide some general background on the stochastic optimization context of interest here
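The two-measurement gradient estimate at the heart of SPSA can be sketched as follows; the gain values (a, c, alpha, gamma) and the noisy quadratic test loss are illustrative choices, not settings prescribed by the article:

```python
import numpy as np

def spsa_minimize(loss, theta, iters=1000, a=0.1, c=0.1, alpha=0.602, gamma=0.101, seed=0):
    """Minimize a noisy scalar loss via simultaneous perturbation.

    The gain sequences (a, c, alpha, gamma) are illustrative defaults.
    """
    rng = np.random.default_rng(seed)
    for k in range(iters):
        ak = a / (k + 1) ** alpha          # decaying step-size gain
        ck = c / (k + 1) ** gamma          # decaying perturbation gain
        delta = rng.choice([-1.0, 1.0], size=theta.shape)  # Rademacher perturbation
        # Two loss evaluations estimate the whole gradient, regardless of dimension.
        g = (loss(theta + ck * delta) - loss(theta - ck * delta)) / (2.0 * ck * delta)
        theta = theta - ak * g
    return theta

# Noisy quadratic with minimum at (1, -2).
np.random.seed(0)
target = np.array([1.0, -2.0])
noisy_loss = lambda th: float(np.sum((th - target) ** 2)) + 0.01 * np.random.randn()
theta_hat = spsa_minimize(noisy_loss, np.zeros(2))
```

Only two loss evaluations per iteration are needed, which is the efficiency argument usually made for SPSA against coordinate-wise finite differences, whose cost grows with the dimension.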
Sequential Karhunen-Loève Basis Extraction and its Application to Images
 IEEE Transactions on Image Processing
Abstract

Cited by 58 (0 self)
The Karhunen-Loève (KL) Transform is an optimal method for approximating a set of vectors and has been used in image processing and computer vision for several tasks. Its computational demands and its batch calculation nature have limited its application. Here we present a new, sequential algorithm for calculating the KL basis, which is faster in typical applications and is especially advantageous for image sequences: the KL basis calculation is done with much lower delay and allows for dynamic updating of object databases for recognition. Systematic tests of the implemented algorithm show that these advantages are indeed obtained with the same accuracy available from batch KL algorithms. 1 Introduction The Karhunen-Loève (KL) transform [1] is a preferred method for approximating a set of vectors by a low-dimensional subspace. The method provides the optimal subspace, spanned by the KL basis, which minimizes the MSE between the given set of vectors and their projections on the subspace...
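The sequential flavor of such a basis update can be illustrated with an incremental rank-k SVD step (a generic sketch of the idea with a hypothetical function name, not the authors' exact algorithm): fold one new vector into the existing low-rank factorization and re-truncate.

```python
import numpy as np

def update_kl_basis(U, s, x, k):
    """One sequential update of a rank-k KL/PCA basis with a new vector x.

    U: (d, r) orthonormal basis, s: (r,) singular values.
    A sketch of the sequential idea; assumes x adds a new direction (residual > 0).
    """
    coeffs = U.T @ x
    resid = x - U @ coeffs                 # component of x outside the current basis
    rnorm = np.linalg.norm(resid)
    q = resid / rnorm if rnorm > 1e-12 else np.zeros_like(x)
    # Solve a small (r+1) x (r+1) problem instead of re-decomposing all data.
    K = np.block([[np.diag(s), coeffs[:, None]],
                  [np.zeros((1, len(s))), np.array([[rnorm]])]])
    Uk, sk, _ = np.linalg.svd(K)
    U_new = np.hstack([U, q[:, None]]) @ Uk  # rotate the enlarged basis
    return U_new[:, :k], sk[:k]

x0 = np.array([1.0, 0.0, 0.0, 0.0, 0.0])
x1 = np.array([0.0, 2.0, 0.0, 0.0, 0.0])
U, s = x0[:, None], np.array([np.linalg.norm(x0)])  # exact basis after the first vector
U, s = update_kl_basis(U, s, x1, k=2)
```

Each new vector costs a small (r+1)-sized decomposition plus an O(dr) basis rotation, rather than a batch pass over all data seen so far.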
Online EM Algorithm for the Normalized Gaussian Network
, 1999
Abstract

Cited by 55 (6 self)
A Normalized Gaussian Network (NGnet) (Moody and Darken 1989) is a network of local linear regression units. The model softly partitions the input space by normalized Gaussian functions and each local unit linearly approximates the output within the partition. In this article, we propose a new online EM algorithm for the NGnet, which is derived from the batch EM algorithm (Xu, Jordan and Hinton 1995) by introducing a discount factor. We show that the online EM algorithm is equivalent to the batch EM algorithm if a specific scheduling of the discount factor is employed. In addition, we show that the online EM algorithm can be considered as a stochastic approximation method to find the maximum likelihood estimator. A new regularization method is proposed in order to deal with a singular input distribution. In order to manage dynamic environments, where the input-output distribution of data changes over time, unit manipulation mechanisms such as unit production, unit deletion...
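The discount-factor idea can be illustrated on a plain one-dimensional Gaussian mixture (not the NGnet itself): keep exponentially discounted running averages of the E-step sufficient statistics and re-solve the M-step from them after every sample. The step size eta and all initial values are illustrative.

```python
import numpy as np

def online_em_step(stats, x, means, varis, weights, eta):
    """One online EM update for a 1-D Gaussian mixture via discounted statistics.

    `eta` plays the role of one minus the discount factor in the abstract.
    """
    # E-step: responsibilities of each unit for the sample x.
    dens = weights * np.exp(-(x - means) ** 2 / (2 * varis)) / np.sqrt(2 * np.pi * varis)
    r = dens / dens.sum()
    # Discounted running averages of the sufficient statistics <1>, <x>, <x^2>.
    stats["r"] = (1 - eta) * stats["r"] + eta * r
    stats["rx"] = (1 - eta) * stats["rx"] + eta * r * x
    stats["rxx"] = (1 - eta) * stats["rxx"] + eta * r * x * x
    # M-step from the current averages.
    weights = stats["r"] / stats["r"].sum()
    means = stats["rx"] / stats["r"]
    varis = np.maximum(stats["rxx"] / stats["r"] - means ** 2, 1e-6)
    return stats, means, varis, weights

means, varis, weights = np.array([-1.0, 1.0]), np.ones(2), np.full(2, 0.5)
stats = {"r": weights.copy(), "rx": weights * means, "rxx": weights * (varis + means ** 2)}
for t in range(2000):
    x = -3.0 if t % 2 == 0 else 3.0      # two well-separated clusters
    stats, means, varis, weights = online_em_step(stats, x, means, varis, weights, eta=0.01)
```

Setting eta to 1/t recovers the usual running average; a constant eta discounts old statistics, which is what lets the estimator track a drifting distribution.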
On the ergodicity properties of some adaptive MCMC algorithms
 Annals of Applied Probability
Abstract

Cited by 54 (7 self)
In this paper we study the ergodicity properties of some adaptive Markov chain Monte Carlo (MCMC) algorithms that have been recently proposed in the literature. We prove that under a set of verifiable conditions, ergodic averages calculated from the output of a so-called adaptive MCMC sampler converge to the required value and can even, under more stringent assumptions, satisfy a central limit theorem. We prove that the conditions required are satisfied for the Independent Metropolis-Hastings algorithm and the Random Walk Metropolis algorithm with symmetric increments. Finally we propose an application of these results to the case where the proposal distribution of the Metropolis-Hastings update is a mixture of distributions from a curved exponential family.
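As a concrete (generic) instance of an adaptive sampler covered by this kind of theory, here is a Random Walk Metropolis chain whose proposal scale is adapted with diminishing step sizes; the 0.44 acceptance target and the gain exponent are conventional choices, not the paper's.

```python
import numpy as np

def adaptive_rwm(logpi, x0, n, seed=0):
    """Random Walk Metropolis with an adaptively tuned proposal scale.

    A generic Robbins-Monro-style scale adaptation; illustrative, not the
    paper's specific samplers.
    """
    rng = np.random.default_rng(seed)
    x, log_scale = x0, 0.0
    chain = np.empty(n)
    for i in range(n):
        prop = x + np.exp(log_scale) * rng.standard_normal()
        accepted = np.log(rng.random()) < logpi(prop) - logpi(x)
        if accepted:
            x = prop
        # Diminishing adaptation: step sizes -> 0, so the ergodicity
        # conditions of this kind of theory can apply.
        log_scale += ((1.0 if accepted else 0.0) - 0.44) / (i + 1) ** 0.6
        chain[i] = x
    return chain

# Target: standard normal, log-density up to a constant.
samples = adaptive_rwm(lambda t: -0.5 * t * t, 0.0, 20000)
```

The key point the abstract stresses is that adaptation like this breaks the Markov property, so convergence of ergodic averages is exactly what has to be re-proved.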
Varieties of Learning Automata: An Overview
 IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS—PART B: CYBERNETICS
, 2002
Abstract

Cited by 51 (0 self)
Abstract—Automata models of learning systems introduced in the 1960s were popularized as learning automata (LA) in a survey paper in 1974 [1]. Since then, there have been many fundamental advances in the theory as well as applications of these learning models. In the past few years, the structure of LA has been modified in several directions to suit different applications. Concepts such as parameterized learning automata (PLA), generalized learning automata (GLA), and continuous action-set learning automata (CALA) have been proposed, analyzed, and applied to solve many significant learning problems. Furthermore, groups of LA forming teams and feedforward networks have been shown to converge to desired solutions under appropriate learning algorithms. Modules of LA have been used for parallel operation with consequent increase in speed of convergence. All of these concepts and results are relatively new and are scattered in technical literature. An attempt has been made in this paper to bring together the main ideas involved in a unified framework and provide pointers to relevant references. Index Terms—Continuous action-set learning automata (CALA), generalized learning automata (GLA), modules of learning automata, parameterized learning automata (PLA), teams and networks of learning automata.
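A minimal example of a finite-action learning automaton is the classical linear reward-inaction (L_RI) scheme: on a reward, move probability mass toward the chosen action; on a penalty, leave the probabilities unchanged. The environment's reward probabilities below are invented for illustration.

```python
import numpy as np

def lri_update(p, action, reward, lr=0.05):
    """Linear reward-inaction update for a finite-action learning automaton."""
    if reward:
        p = p * (1 - lr)      # shrink all action probabilities...
        p[action] += lr       # ...and give the freed mass to the rewarded action
    return p

rng = np.random.default_rng(0)
p = np.full(3, 1.0 / 3.0)                 # three actions, uniform start
success = np.array([0.2, 0.8, 0.4])       # unknown reward probabilities (environment)
for _ in range(3000):
    a = rng.choice(3, p=p)
    p = lri_update(p, a, rng.random() < success[a])
```

With a small learning rate, the action-probability vector concentrates on one action; the PLA/GLA/CALA variants in the survey generalize exactly this state (a probability vector) to parameterized and continuous-action settings.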
Transmission Scheduling for Efficient Wireless Utilization
 in Proceedings of IEEE INFOCOM ’01
, 2001
Abstract

Cited by 50 (3 self)
We present an "opportunistic" transmission scheduling policy that exploits time-varying channel conditions and maximizes the system performance stochastically under a certain resource allocation fairness constraint. We establish the optimality of the scheduling scheme and also describe a practical scheduling procedure to implement our scheme. Through simulation results, we show that the scheme also works well for nonstationary scenarios and results in performance improvements of 20–150% compared with a scheduling scheme that does not take into account channel conditions. Furthermore, we note that in wireless networks, an important role of resource allocation is to balance the system performance and fairness among "good" and "bad" users. We propose three heuristic time-fraction assignment schemes, which approach the problem from different viewpoints. Keywords—Scheduling, fairness, wireless, high-data-rate.
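The opportunistic idea, serving whichever user's instantaneous rate looks best relative to its own running average, can be sketched as follows (a proportional-fair-style rule with invented channel statistics, not the paper's exact policy):

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_slots, beta = 4, 10000, 0.01
avg = np.ones(n_users)            # running average throughput per user
served = np.zeros(n_users)        # cumulative throughput actually delivered
for _ in range(n_slots):
    # Time-varying channels: per-slot rates with different mean qualities.
    rates = rng.exponential(scale=[1.0, 1.5, 2.0, 2.5])
    u = np.argmax(rates / avg)    # exploit the momentarily-best relative channel
    served[u] += rates[u]
    got = np.zeros(n_users)
    got[u] = rates[u]
    avg = (1 - beta) * avg + beta * got   # discounted throughput average
```

Normalizing by the running average is what injects fairness: a user whose channel is persistently weaker still wins slots once its average drops, which is the performance-versus-fairness balance the abstract discusses.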
Stability of Stochastic Approximation Under Verifiable Conditions
 SIAM J. Control and Optimization
, 2005
Abstract

Cited by 49 (6 self)
In this paper we address the problem of the stability and convergence of the stochastic approximation procedure θn+1 = θn + γn+1[h(θn) + ξn+1]. The stability of such sequences {θn} is known to rely heavily on the behaviour of the mean field h at the boundary of the parameter set and the magnitude of the stepsizes used. The conditions typically required to ensure convergence, and in particular the boundedness or stability of {θn}, are either too difficult to check in practice or not satisfied at all. This is the case even for very simple models. The most popular technique to circumvent the stability problem consists of constraining {θn} to a compact subset K of the parameter space. This is obviously not a satisfactory solution, as the choice of K is a delicate one. In the present contribution we first prove a "deterministic" stability result which relies on simple conditions on the sequences {ξn} and {γn}. We then propose and analyze an algorithm based on projections on adaptive truncation sets which ensures that the aforementioned conditions required for stability are satisfied. We focus in particular on the case where {ξn} is a so-called Markov state-dependent noise. We establish both the stability and convergence w.p. 1 of the algorithm under a set of simple and verifiable assumptions. We illustrate our results with an example related to adaptive Markov chain Monte Carlo algorithms. Key words. Stochastic approximation, state-dependent noise, randomly varying truncation, Adaptive Markov Chain
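The randomly-varying-truncation device can be sketched in one dimension: whenever the iterate leaves the current truncation set, restart it inside and enlarge the set. The mean field, noise level, and radii below are illustrative, not the paper's conditions.

```python
import numpy as np

def sa_expanding_truncations(h_noisy, theta0, n, radii):
    """Stochastic approximation with projections on expanding truncation sets.

    A generic one-dimensional sketch of the randomly-varying-truncation idea.
    """
    theta, k = theta0, 0                       # k indexes the active radius
    for i in range(1, n + 1):
        cand = theta + (1.0 / i) * h_noisy(theta)   # gamma_i = 1/i step
        if abs(cand) > radii[k]:
            theta = theta0                     # restart inside the set...
            k = min(k + 1, len(radii) - 1)     # ...and enlarge the truncation set
        else:
            theta = cand
    return theta

rng = np.random.default_rng(0)
h = lambda th: (1.0 - th) + 0.2 * rng.standard_normal()  # mean field with root at 1
theta_hat = sa_expanding_truncations(h, 0.0, 5000, [2.0, 4.0, 8.0])
```

Because only finitely many restarts occur on stable trajectories, the iterate eventually behaves like the unconstrained recursion, which is how boundedness is obtained without fixing a compact set K in advance.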
A class of mean field interaction models for computer and communication systems
 PERFORM. EVAL
, 2008
Abstract

Cited by 47 (0 self)
We consider models of N interacting objects, where the interaction is via a common resource and the distribution of states of all objects. We introduce the key scaling concept of intensity; informally, the expected number of transitions per object per time slot is of the order of the intensity. We consider the case of vanishing intensity, i.e. the expected number of object transitions per time slot is o(N). We show that, under mild assumptions and for large N, the occupancy measure converges, in mean square (and thus in probability) over any finite horizon, to a deterministic dynamical system. The mild assumption is essentially that the coefficient of variation of the number of object transitions per time slot remains bounded with N. No independence assumption is needed anywhere. The convergence results allow us to derive properties valid in the stationary regime. We discuss when one can assure that a stationary point of the ODE is the large N limit of the stationary probability distribution of the state of one object for the system with N objects. We use this to develop a critique of the fixed point method sometimes used in conjunction with the decoupling assumption.
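A toy instance of vanishing intensity: N two-state objects, each making a transition with probability O(1/N) per slot, so the fraction of objects in state 1 tracks the ODE dm/dt = a(1 - m) - b m, with fixed point a/(a + b). All rates and sizes here are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
N, a, b, slots = 1000, 2.0, 1.0, 20000
state = np.zeros(N, dtype=int)
for _ in range(slots):
    u = rng.random(N)
    # Vanishing intensity: per-object transition probabilities scale as 1/N,
    # so the expected number of transitions per slot is O(1), i.e. o(N).
    flip_up = (state == 0) & (u < a / N)
    flip_dn = (state == 1) & (u < b / N)
    state[flip_up] = 1
    state[flip_dn] = 0
occupancy = state.mean()   # compare with the ODE fixed point a/(a+b) = 2/3
```

Note the example uses independent objects only for brevity; as the abstract stresses, the convergence result itself needs no independence assumption.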
Escaping Nash Inflation
 Review of Economic Studies
, 2002
Abstract

Cited by 41 (16 self)
Mean dynamics describe the convergence to self-confirming equilibria of self-referential systems under discounted least squares learning. Escape dynamics recurrently propel away from a self-confirming equilibrium. In a model with a unique self-confirming equilibrium, the escape dynamics make the government discover too strong a version of the natural rate hypothesis. The escape route dynamics cause recurrent outcomes close to the Ramsey (commitment) inflation rate in a model with an adaptive government. Key Words: Self-confirming equilibrium, mean dynamics, escape route, large deviation, natural rate of unemployment, adaptation, experimentation trap. 'If an unlikely event occurs, it is very likely to occur in the most likely way.' Michael Harrison 1. INTRODUCTION Building on work by Sims (1988) and Chung (1990), Sargent (1999) showed how a government adaptively fitting an approximating Phillips curve model recurrently sets inflation near the optimal time-inconsistent outcome, althoug...
Incremental Natural Actor-Critic Algorithms
Abstract

Cited by 41 (3 self)
We present four new reinforcement learning algorithms based on actor-critic and natural-gradient ideas, and provide their convergence proofs. Actor-critic reinforcement learning methods are online approximations to policy iteration in which the value-function parameters are estimated using temporal difference learning and the policy parameters are updated by stochastic gradient descent. Methods based on policy gradients in this way are of special interest because of their compatibility with function approximation methods, which are needed to handle large or infinite state spaces. The use of temporal difference learning in this way is of interest because in many applications it dramatically reduces the variance of the gradient estimates. The use of the natural gradient is of interest because it can produce better conditioned parameterizations and has been shown to further reduce variance in some cases. Our results extend prior two-timescale convergence results for actor-critic methods by Konda and Tsitsiklis by using temporal difference learning in the actor and by incorporating natural gradients, and they extend prior empirical studies of natural actor-critic methods by Peters, Vijayakumar and Schaal by providing the first convergence proofs and the first fully incremental algorithms.
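The actor-critic pattern the abstract describes, a TD error driving both a critic update and a stochastic-gradient actor update, can be shown on a one-state, two-action problem (a vanilla-gradient sketch with invented rewards and step sizes, without the natural-gradient refinement):

```python
import numpy as np

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

# One-state MDP with two actions; action 1 is better on average.
rng = np.random.default_rng(0)
theta = np.zeros(2)                 # actor: action preferences
v = 0.0                             # critic: value estimate of the single state
rewards = np.array([0.0, 1.0])
for _ in range(5000):
    pi = softmax(theta)
    a = rng.choice(2, p=pi)
    r = rewards[a] + 0.1 * rng.standard_normal()
    delta = r - v                   # TD error (one state, no bootstrap term)
    v += 0.05 * delta               # critic: temporal-difference update
    grad_log = -pi
    grad_log[a] += 1.0              # gradient of log pi(a | theta)
    theta += 0.05 * delta * grad_log  # actor: policy-gradient step
```

The two learning rates would, in the two-timescale analyses the abstract cites, be chosen so the critic moves on a faster timescale than the actor; equal rates are used here only to keep the sketch short.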