Results 1–10 of 209
Natural Gradient Works Efficiently in Learning
Neural Computation, 1998
Abstract

Cited by 289 (16 self)
When a parameter space has a certain underlying structure, the ordinary gradient of a function does not represent its steepest direction but the natural gradient does. Information geometry is used for calculating the natural gradients in the parameter space of perceptrons, the space of matrices (for blind source separation) and the space of linear dynamical systems (for blind source deconvolution). The dynamical behavior of natural gradient online learning is analyzed and is proved to be Fisher efficient, implying that it has asymptotically the same performance as the optimal batch estimation of parameters. This suggests that the plateau phenomenon which appears in the backpropagation learning algorithm of multilayer perceptrons might disappear or might not be so serious when the natural gradient is used. An adaptive method of updating the learning rate is proposed and analyzed.
1 Introduction The stochastic gradient method (Widrow, 1963; Amari, 1967; Tsypkin, 1973; Rumelhart et al...
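To make the abstract's contrast concrete, here is a minimal sketch of the natural-gradient update θ ← θ − η F⁻¹∇L against the ordinary gradient step. The quadratic loss, the Fisher matrix F, and the step size are invented toy assumptions for illustration, not taken from the paper.

```python
import numpy as np

def loss_grad(theta):
    # Gradient of an illustrative quadratic loss 0.5 * theta^T A theta
    A = np.array([[5.0, 0.0], [0.0, 0.5]])  # ill-conditioned curvature
    return A @ theta

def natural_gradient_step(theta, fisher, eta=0.1):
    # Steepest descent in the Riemannian metric defined by the Fisher matrix
    return theta - eta * np.linalg.solve(fisher, loss_grad(theta))

def ordinary_gradient_step(theta, eta=0.1):
    return theta - eta * loss_grad(theta)

theta_nat = theta_ord = np.array([1.0, 1.0])
F = np.array([[5.0, 0.0], [0.0, 0.5]])  # toy metric chosen equal to the curvature
for _ in range(20):
    theta_nat = natural_gradient_step(theta_nat, F)
    theta_ord = ordinary_gradient_step(theta_ord)

# With F equal to the curvature, the natural step rescales every direction
# equally, so all coordinates contract at the same rate; the ordinary step
# crawls along the low-curvature axis.
print(np.linalg.norm(theta_nat), np.linalg.norm(theta_ord))
```

In this contrived case the natural step is exactly Newton's method; in general the Fisher matrix must be estimated, which is where the information-geometric calculations in the paper come in.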
Asynchronous Stochastic Approximation and Q-Learning
Machine Learning, 1994
Abstract

Cited by 149 (3 self)
Abstract. We provide some general results on the convergence of a class of stochastic approximation algorithms and their parallel and asynchronous variants. We then use these results to study the Q-learning algorithm, a reinforcement learning method for solving Markov decision problems, and establish its convergence under conditions more general than previously available. Keywords: Reinforcement learning, Q-learning, dynamic programming, stochastic approximation
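The algorithm whose convergence the paper analyzes is the standard tabular Q-learning update Q(s,a) ← Q(s,a) + α(r + γ max_b Q(s',b) − Q(s,a)); a minimal sketch follows. The two-state MDP, step size, and episode count are invented toy choices.

```python
import random

GAMMA = 0.9
ALPHA = 0.1

def step(s, a):
    # Toy deterministic MDP: action 1 in state 0 pays reward 1 and stays;
    # everything else pays 0 and toggles the state.
    if s == 0 and a == 1:
        return 1.0, 0
    return 0.0, 1 - s

Q = {(s, a): 0.0 for s in (0, 1) for a in (0, 1)}
rng = random.Random(0)
s = 0
for _ in range(5000):
    a = rng.choice([0, 1])  # purely exploratory behavior policy
    r, s2 = step(s, a)
    target = r + GAMMA * max(Q[(s2, 0)], Q[(s2, 1)])
    Q[(s, a)] += ALPHA * (target - Q[(s, a)])  # asynchronous single-entry update
    s = s2

# Q(0,1) should approach r/(1-gamma) = 10 for the rewarding self-loop
print(Q[(0, 1)])
```

Only one table entry is updated per sample, which is exactly the asynchronous setting the convergence results cover.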
Opportunistic Fair Scheduling over Multiple Wireless Channels
2003
Abstract

Cited by 80 (3 self)
Emerging spread-spectrum high-speed data networks utilize multiple channels via orthogonal codes or frequency-hopping patterns such that multiple users can transmit concurrently. In this paper, we develop a framework for opportunistic scheduling over multiple wireless channels. With a realistic channel model, any subset of users can be selected for data transmission at any time, albeit with different throughputs and system resource requirements. We first transform selection of the best users and rates from a complex general optimization problem into a decoupled and tractable formulation: a multiuser scheduling problem that maximizes total system throughput and a control-update problem that ensures long-term deterministic or probabilistic fairness constraints. We then design and evaluate practical schedulers that approximate these objectives.
Optimization via simulation: a review
Annals of Operations Research, 1994
Abstract

Cited by 67 (20 self)
We review techniques for optimizing stochastic discrete-event systems via simulation. We discuss both the discrete parameter case and the continuous parameter case, but concentrate on the latter, which has dominated most of the recent research in the area. For the discrete parameter case, we focus on the techniques for optimization from a finite set: multiple-comparison procedures and ranking-and-selection procedures. For the continuous parameter case, we focus on gradient-based methods, including perturbation analysis, the likelihood ratio method, and frequency domain experimentation. For illustrative purposes, we compare and contrast the implementation of the techniques for some simple discrete-event systems such as the (s, S) inventory system and the GI/G/1 queue. Finally, we speculate on future directions for the field, particularly in the context of the rapid advances being made in parallel computing.
Independent Component Analysis by General Nonlinear Hebbian-like Learning Rules
Signal Processing, 1998
Abstract

Cited by 56 (11 self)
A number of neural learning rules have been recently proposed... In this paper, we show that in fact, ICA can be performed by very simple Hebbian or anti-Hebbian learning rules, which may have only weak relations to such information-theoretical quantities. Rather surprisingly, practically any nonlinear function can be used in the learning rule, provided only that the sign of the Hebbian/anti-Hebbian term is chosen correctly. In addition to the Hebbian-like mechanism, the weight vector is here constrained to have unit norm, and the data is preprocessed by prewhitening, or sphering. These results imply that one can choose the nonlinearity so as to optimize desired statistical or numerical criteria.
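A minimal sketch of the ingredients the abstract lists: a one-unit anti-Hebbian rule w ← w − η·x·g(wᵀx) with g(u) = u³, a unit-norm constraint, and prewhitened data. The mixing matrix, source distribution, and constants are invented for illustration; the minus sign is the choice appropriate for the sub-Gaussian (uniform) sources used here.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20000
S = rng.uniform(-np.sqrt(3), np.sqrt(3), size=(2, n))  # unit-variance independent sources
A = np.array([[2.0, 1.0], [1.0, 1.0]])                  # toy mixing matrix
X = A @ S                                               # observed mixtures

# Prewhiten (sphere) the mixtures so cov(Z) = I
C = np.cov(X)
d, E = np.linalg.eigh(C)
Z = E @ np.diag(d ** -0.5) @ E.T @ X

w = rng.standard_normal(2)
w /= np.linalg.norm(w)
eta = 0.01
for t in range(n):
    x = Z[:, t]
    u = w @ x
    w -= eta * x * u ** 3        # anti-Hebbian step with g(u) = u^3
    w /= np.linalg.norm(w)       # unit-norm constraint

# The projection should recover one source up to sign and scale
y = w @ Z
print(abs(np.corrcoef(y, S[0])[0, 1]), abs(np.corrcoef(y, S[1])[0, 1]))
```

Flipping the sign of the update (Hebbian rather than anti-Hebbian) is what the sign condition in the abstract refers to: the correct choice depends on whether the sources are sub- or super-Gaussian.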
TD(λ) Converges with Probability 1
1994
Abstract

Cited by 55 (2 self)
The methods of temporal differences (Samuel, 1959; Sutton, 1984, 1988) allow an agent to learn accurate predictions of stationary stochastic future outcomes. The learning is effectively stochastic approximation based on samples extracted from the process generating the agent's future. Sutton (1988) proved that for a special case of temporal differences, the expected values of the predictions converge to their correct values, as larger samples are taken, and Dayan (1992) extended his proof to the general case. This article proves the stronger result that the predictions of a slightly modified form of temporal difference learning converge with probability one, and shows how to quantify the rate of convergence.
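For reference, a sketch of the tabular TD(λ) prediction algorithm whose convergence the paper addresses, using accumulating eligibility traces on a small random-walk chain. The chain, λ, and step size are illustrative toy choices, not from the paper.

```python
import random

N = 5            # non-terminal states 0..4; terminate off either end
GAMMA = 1.0      # undiscounted episodic task
LAM = 0.8
ALPHA = 0.05

V = [0.0] * N
rng = random.Random(1)
for episode in range(5000):
    z = [0.0] * N               # eligibility traces
    s = N // 2
    while True:
        s2 = s + rng.choice([-1, 1])
        if s2 < 0:
            r, v2, done = 0.0, 0.0, True    # exit left, reward 0
        elif s2 >= N:
            r, v2, done = 1.0, 0.0, True    # exit right, reward 1
        else:
            r, v2, done = 0.0, V[s2], False
        delta = r + GAMMA * v2 - V[s]       # TD error
        z[s] += 1.0                         # accumulating trace
        for i in range(N):
            V[i] += ALPHA * delta * z[i]
            z[i] *= GAMMA * LAM             # trace decay
        if done:
            break
        s = s2

# True values for this walk are (i+1)/(N+1) = 1/6, 2/6, ..., 5/6
print([round(v, 2) for v in V])
```

With a constant step size the estimates hover near the true values; the probability-one convergence result in the paper concerns appropriately decreasing step sizes.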
The Asymptotic Efficiency Of Simulation Estimators
Operations Research, 1992
Abstract

Cited by 43 (14 self)
A decision-theoretic framework is proposed for evaluating the efficiency of simulation estimators. The framework includes the cost of obtaining the estimate as well as the cost of acting based on the estimate. The cost of obtaining the estimate and the estimate itself are represented as realizations of jointly distributed stochastic processes. In this context, the efficiency of a simulation estimator based on a given computational budget is defined as the reciprocal of the risk (the overall expected cost). This framework is appealing philosophically, but it is often difficult to apply in practice (e.g., to compare the efficiency of two different estimators) because only rarely can the efficiency associated with a given computational budget be calculated. However, a useful practical framework emerges in a large sample context when we consider the limiting behavior as the computational budget increases. A limit theorem established for this model supports and extends a fairly well known e...
Escaping Nash Inflation
Review of Economic Studies, 2002
Abstract

Cited by 41 (16 self)
Mean dynamics describe the convergence to self-confirming equilibria of self-referential systems under discounted least squares learning. Escape dynamics recurrently propel away from a self-confirming equilibrium. In a model with a unique self-confirming equilibrium, the escape dynamics make the government discover too strong a version of the natural rate hypothesis. The escape route dynamics cause recurrent outcomes close to the Ramsey (commitment) inflation rate in a model with an adaptive government. Key Words: Self-confirming equilibrium, mean dynamics, escape route, large deviation, natural rate of unemployment, adaptation, experimentation trap. 'If an unlikely event occurs, it is very likely to occur in the most likely way.' Michael Harrison 1. INTRODUCTION Building on work by Sims (1988) and Chung (1990), Sargent (1999) showed how a government adaptively fitting an approximating Phillips curve model recurrently sets inflation near the optimal time-inconsistent outcome, although...
Incremental Natural Actor-Critic Algorithms
Abstract

Cited by 41 (3 self)
We present four new reinforcement learning algorithms based on actor-critic and natural-gradient ideas, and provide their convergence proofs. Actor-critic reinforcement learning methods are online approximations to policy iteration in which the value-function parameters are estimated using temporal difference learning and the policy parameters are updated by stochastic gradient descent. Methods based on policy gradients in this way are of special interest because of their compatibility with function approximation methods, which are needed to handle large or infinite state spaces. The use of temporal difference learning in this way is of interest because in many applications it dramatically reduces the variance of the gradient estimates. The use of the natural gradient is of interest because it can produce better conditioned parameterizations and has been shown to further reduce variance in some cases. Our results extend prior two-timescale convergence results for actor-critic methods by Konda and Tsitsiklis by using temporal difference learning in the actor and by incorporating natural gradients, and they extend prior empirical studies of natural actor-critic methods by Peters, Vijayakumar and Schaal by providing the first convergence proofs and the first fully incremental algorithms.
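A minimal sketch of the plain (non-natural) actor-critic scheme the abstract builds on: a TD(0) critic estimates state values, and the actor ascends the policy gradient using the TD error, with the critic on a faster step size than the actor. The two-state MDP, softmax parameterization, and step sizes are invented for illustration; the paper's algorithms additionally precondition the actor update with an estimated Fisher matrix (the natural-gradient part), which this sketch omits.

```python
import math
import random

GAMMA = 0.9
A_CRITIC, A_ACTOR = 0.1, 0.01   # two timescales: critic faster than actor

def step(s, a):
    # Toy MDP: action 1 in state 0 is a rewarding self-loop
    if s == 0 and a == 1:
        return 1.0, 0
    return 0.0, 1 - s

theta = {(s, a): 0.0 for s in (0, 1) for a in (0, 1)}   # actor parameters
V = [0.0, 0.0]                                          # critic values

def policy(s):
    e = [math.exp(theta[(s, a)]) for a in (0, 1)]
    tot = sum(e)
    return [p / tot for p in e]

rng = random.Random(0)
s = 0
for _ in range(20000):
    probs = policy(s)
    a = 0 if rng.random() < probs[0] else 1
    r, s2 = step(s, a)
    delta = r + GAMMA * V[s2] - V[s]      # TD error
    V[s] += A_CRITIC * delta              # critic: TD(0) update
    for b in (0, 1):                      # actor: alpha * delta * grad log pi
        grad = (1.0 if b == a else 0.0) - probs[b]
        theta[(s, b)] += A_ACTOR * delta * grad
    s = s2

print(policy(0))   # probability of the rewarding action should dominate
```

Replacing the raw gradient `delta * grad` with a Fisher-preconditioned direction is what distinguishes the natural variants analyzed in the paper.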
Energy Functions for Self-Organizing Maps
1999
Abstract

Cited by 41 (1 self)
This paper is about the last issue. After people started to realize that there is no energy function for the Kohonen learning rule (in the continuous case), many attempts have been made to change the algorithm such that an energy can be defined, without drastically changing its properties. Here we will review a simple suggestion, which has been proposed and generalized in several different contexts. The advantage over some other attempts is its simplicity: we only need to redefine the determination of the winning ("best matching") unit. The energy function and corresponding learning algorithm are introduced in Section 2. We give two proofs that there is indeed a proper energy function. The first one, in Section 3, is based on explicit computation of derivatives. The second one, in Section 4, follows from a limiting case of a more general (free) energy function derived in a probabilistic setting. The energy formalism allows for a direct interpretation of disordered configurations in terms of local minima, two examples of which are treated in Section 5.
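The modification the abstract describes can be sketched as follows: the winner is the unit minimizing the neighborhood-smoothed distortion Σ_s h(r,s)·‖x − w_s‖², rather than the raw nearest unit, while the weight update stays Kohonen-style. The 1-D map, Gaussian neighborhood, and uniform data are illustrative assumptions, not the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(0)
n_units = 10
W = rng.uniform(0, 1, size=(n_units, 1))   # 1-D map of 1-D weight vectors
pos = np.arange(n_units)
SIGMA, ETA = 1.5, 0.05

def h(r):
    # Gaussian neighborhood function on the map lattice
    return np.exp(-((pos - r) ** 2) / (2 * SIGMA ** 2))

def winner(x):
    d = np.sum((W - x) ** 2, axis=1)                 # per-unit distortion
    smoothed = np.array([h(r) @ d for r in pos])     # neighborhood-smoothed
    return int(np.argmin(smoothed))                  # modified winner rule

for _ in range(3000):
    x = rng.uniform(0, 1, size=1)
    r = winner(x)
    W += ETA * h(r)[:, None] * (x - W)               # Kohonen-style update

# Weights should spread out to cover the input range [0, 1]
print(np.round(W.ravel(), 2))
```

With the smoothed winner rule, each update is a descent step on the energy discussed in the paper, which the raw nearest-unit rule does not admit in the continuous case.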