Asynchronous stochastic approximation and Q-learning
 Machine Learning
, 1994
Cited by 160 (3 self)
We provide some general results on the convergence of a class of stochastic approximation algorithms and their parallel and asynchronous variants. We then use these results to study the Q-learning algorithm, a reinforcement learning method for solving Markov decision problems, and establish its convergence under conditions more general than previously available.
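For orientation, here is a minimal sketch of tabular Q-learning, the algorithm the abstract analyzes. The two-state MDP, the 1/n step-size schedule, the discount factor, and all names are illustrative assumptions, not taken from the paper. Each step updates a single (state, action) entry, which is the asynchronous flavor the abstract mentions.

```python
import random

def q_learning(transitions, n_states, n_actions, gamma=0.5, steps=20000, seed=0):
    """Tabular Q-learning; each step updates one (state, action) entry."""
    rng = random.Random(seed)
    q = [[0.0] * n_actions for _ in range(n_states)]
    visits = [[0] * n_actions for _ in range(n_states)]
    for _ in range(steps):
        # Sample a state-action pair uniformly so every entry is updated infinitely often.
        s = rng.randrange(n_states)
        a = rng.randrange(n_actions)
        next_s, reward = transitions[(s, a)]
        visits[s][a] += 1
        # Per-entry step size 1/n: the sum diverges, the sum of squares converges.
        alpha = 1.0 / visits[s][a]
        target = reward + gamma * max(q[next_s])
        q[s][a] += alpha * (target - q[s][a])
    return q

# Hypothetical deterministic MDP: (state, action) -> (next state, reward).
TRANSITIONS = {
    (0, 0): (0, 0.0),
    (0, 1): (1, 1.0),
    (1, 0): (0, 0.0),
    (1, 1): (1, 2.0),
}

q_table = q_learning(TRANSITIONS, n_states=2, n_actions=2)
```

With these assumed numbers the optimal value of taking action 1 in state 1 is 2 / (1 - 0.5) = 4, and the learned table should approach it.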
Opportunistic Fair Scheduling over Multiple Wireless Channels
, 2003
Cited by 82 (4 self)
Emerging spread spectrum high-speed data networks utilize multiple channels via orthogonal codes or frequency-hopping patterns such that multiple users can transmit concurrently. In this paper, we develop a framework for opportunistic scheduling over multiple wireless channels. With a realistic channel model, any subset of users can be selected for data transmission at any time, albeit with different throughputs and system resource requirements. We first transform selection of the best users and rates from a complex general optimization problem into a decoupled and tractable formulation: a multiuser scheduling problem that maximizes total system throughput and a control-update problem that ensures long-term deterministic or probabilistic fairness constraints. We then design and evaluate practical schedulers that approximate these objectives.
Optimization via simulation: a review
 Annals of Operations Research
, 1994
Cited by 69 (21 self)
We review techniques for optimizing stochastic discrete-event systems via simulation. We discuss both the discrete parameter case and the continuous parameter case, but concentrate on the latter, which has dominated most of the recent research in the area. For the discrete parameter case, we focus on the techniques for optimization from a finite set: multiple-comparison procedures and ranking-and-selection procedures. For the continuous parameter case, we focus on gradient-based methods, including perturbation analysis, the likelihood ratio method, and frequency domain experimentation. For illustrative purposes, we compare and contrast the implementation of the techniques for some simple discrete-event systems such as the (s, S) inventory system and the GI/G/1 queue. Finally, we speculate on future directions for the field, particularly in the context of the rapid advances being made in parallel computing.
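To make one of the gradient-based methods named above concrete, the following sketch applies the likelihood ratio (score function) idea to a toy problem: estimating the derivative of E[X] with respect to the mean theta of an exponential random variable. The toy problem, function name, and sample size are assumptions chosen for illustration; the paper's own examples are the (s, S) inventory system and the GI/G/1 queue.

```python
import random

def lr_gradient_exponential_mean(theta, n_samples=200000, seed=0):
    """Likelihood ratio estimate of d/d(theta) E[X], X exponential with mean theta."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        # random.expovariate takes the rate, which is 1 / mean.
        x = rng.expovariate(1.0 / theta)
        # Score: d/d(theta) of log((1/theta) * exp(-x/theta)).
        score = x / theta ** 2 - 1.0 / theta
        # Performance measure h(x) = x, weighted by the score.
        total += x * score
    return total / n_samples

estimate = lr_gradient_exponential_mean(theta=2.0)
```

Since E[X] = theta here, the exact derivative is 1, which gives a simple check on the estimator.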
TD(λ) Converges with Probability 1
, 1994
Cited by 59 (2 self)
The methods of temporal differences (Samuel, 1959; Sutton, 1984, 1988) allow an agent to learn accurate predictions of stationary stochastic future outcomes. The learning is effectively stochastic approximation based on samples extracted from the process generating the agent's future. Sutton (1988) proved that for a special case of temporal differences, the expected values of the predictions converge to their correct values, as larger samples are taken, and Dayan (1992) extended his proof to the general case. This article proves the stronger result that the predictions of a slightly modified form of temporal difference learning converge with probability one, and shows how to quantify the rate of convergence.
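As a rough illustration of temporal difference prediction, here is a sketch of standard tabular TD(λ) with accumulating eligibility traces on a five-state random walk. This is the textbook form, not the slightly modified form whose convergence the article proves; the random-walk task, constant step size, and all names are assumptions.

```python
import random

def td_lambda_random_walk(n_states=5, lam=0.8, alpha=0.005, episodes=10000, seed=0):
    """Estimate the probability of exiting on the right from each state."""
    rng = random.Random(seed)
    values = [0.5] * n_states
    for _ in range(episodes):
        traces = [0.0] * n_states
        s = n_states // 2  # every episode starts in the middle state
        while True:
            next_s = s + rng.choice((-1, 1))
            if next_s < 0:  # left exit: reward 0
                reward, next_value, done = 0.0, 0.0, True
            elif next_s >= n_states:  # right exit: reward 1
                reward, next_value, done = 1.0, 0.0, True
            else:
                reward, next_value, done = 0.0, values[next_s], False
            delta = reward + next_value - values[s]  # undiscounted TD error
            traces[s] += 1.0  # accumulating trace for the visited state
            for i in range(n_states):
                values[i] += alpha * delta * traces[i]
                traces[i] *= lam  # decay all traces by lambda
            if done:
                break
            s = next_s
    return values

predictions = td_lambda_random_walk()
```

For this walk the correct predictions are (i + 1) / 6 for state i, so the estimates can be compared against known values.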
Independent Component Analysis by General Nonlinear Hebbian-like Learning Rules
 Signal Processing
, 1998
Cited by 57 (11 self)
A number of neural learning rules have been recently proposed... In this paper, we show that in fact, ICA can be performed by very simple Hebbian or anti-Hebbian learning rules, which may have only weak relations to such information-theoretical quantities. Rather surprisingly, practically any nonlinear function can be used in the learning rule, provided only that the sign of the Hebbian/anti-Hebbian term is chosen correctly. In addition to the Hebbian-like mechanism, the weight vector is here constrained to have unit norm, and the data is preprocessed by prewhitening, or sphering. These results imply that one can choose the nonlinearity so as to optimize desired statistical or numerical criteria.
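The ingredients listed in the abstract (a nonlinear Hebbian or anti-Hebbian term, a unit-norm weight vector, whitened data) can be sketched for a single unit as below. The cubic nonlinearity, the learning rate, the uniform sources, and the function names are assumptions for illustration and are not claimed to be the paper's exact rule. The sources have unit variance and the mixing is a rotation, so the mixed data is already white.

```python
import math
import random

def one_unit_ica(samples, sign, mu=0.002, g=lambda u: u ** 3):
    """One-unit rule: w <- normalize(w + sign * mu * x * g(w . x))."""
    w = [1.0, 0.0]
    for x in samples:
        u = w[0] * x[0] + w[1] * x[1]
        step = sign * mu * g(u)  # sign = +1 Hebbian, -1 anti-Hebbian
        w = [w[0] + step * x[0], w[1] + step * x[1]]
        norm = math.hypot(w[0], w[1])
        w = [w[0] / norm, w[1] / norm]  # keep the weight vector at unit norm
    return w

def make_mixture(n, angle=0.5, seed=0):
    """Rotate two independent unit-variance uniform sources by the given angle."""
    rng = random.Random(seed)
    c, s = math.cos(angle), math.sin(angle)
    half_width = math.sqrt(3.0)  # uniform on [-sqrt(3), sqrt(3)] has variance 1
    data = []
    for _ in range(n):
        s1 = rng.uniform(-half_width, half_width)
        s2 = rng.uniform(-half_width, half_width)
        data.append((c * s1 - s * s2, s * s1 + c * s2))
    return data, (c, s)

data, (c, s) = make_mixture(50000)
# Uniform sources are sub-Gaussian, so with g(u) = u^3 the anti-Hebbian sign is used.
w = one_unit_ica(data, sign=-1)
# Express w in source coordinates; one entry should be close to +1 or -1.
q = (c * w[0] + s * w[1], -s * w[0] + c * w[1])
```

In this sketch the sign is the only part that depends on the source distribution: for super-Gaussian sources the same code would be run with `sign=+1`.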
Memory-based Stochastic Optimization
 Neural Information Processing Systems 8
, 1995
Cited by 44 (7 self)
In this paper we introduce new algorithms for optimizing noisy plants in which each experiment is very expensive. The algorithms build a global nonlinear model of the expected output at the same time as using Bayesian linear regression analysis of locally weighted polynomial models. The local model answers queries about confidence, noise, gradient and Hessians, and uses them to make automated decisions similar to those made by a practitioner of Response Surface Methodology. The global and local models are combined naturally as a locally weighted regression. We examine the question of whether the global model can really help optimization, and we extend it to the case of time-varying functions. We compare the new algorithms with a highly tuned higher-order stochastic optimization algorithm on randomly generated functions and a simulated manufacturing task. We note significant improvements in total regret, time to converge, and final solution quality.
The Asymptotic Efficiency Of Simulation Estimators
 Operations Research
, 1992
Cited by 43 (14 self)
A decision-theoretic framework is proposed for evaluating the efficiency of simulation estimators. The framework includes the cost of obtaining the estimate as well as the cost of acting based on the estimate. The cost of obtaining the estimate and the estimate itself are represented as realizations of jointly distributed stochastic processes. In this context, the efficiency of a simulation estimator based on a given computational budget is defined as the reciprocal of the risk (the overall expected cost). This framework is appealing philosophically, but it is often difficult to apply in practice (e.g., to compare the efficiency of two different estimators) because only rarely can the efficiency associated with a given computational budget be calculated. However, a useful practical framework emerges in a large sample context when we consider the limiting behavior as the computational budget increases. A limit theorem established for this model supports and extends a fairly well known e...
Learning in spiking neural networks by reinforcement of stochastic synaptic transmission
 Neuron
, 2003
Cited by 42 (7 self)
... surprising and potentially detrimental to brain function. But another possibility is that synaptic unreliability is used by the brain for the purposes of learning (Minsky, 1954; Hinton, 1989), in analogy to the way in which unreliable genetic replication is used for evolution. Here I propose a specific implementation of this idea. According to the proposal, synapses are “hedonistic,” responding to a global reward signal by increasing their probabilities of release or failure, depending on which action immediately preceded reward. Remarkably, if each synapse in a network behaves hedonistically, selfishly seeking reward, then the network as a whole behaves hedonistically, learning to increase its average reward by generating appropriate collective actions. This statement can be formulated and justified mathematically
Energy Functions for Self-Organizing Maps
, 1999
Cited by 42 (1 self)
This paper is about the last issue. After people started to realize that there is no energy function for the Kohonen learning rule (in the continuous case), many attempts have been made to change the algorithm such that an energy can be defined, without drastically changing its properties. Here we will review a simple suggestion, which has been proposed and generalized in several different contexts. The advantage over some other attempts is its simplicity: we only need to redefine the determination of the winning ("best matching") unit. The energy function and corresponding learning algorithm are introduced in Section 2. We give two proofs that there is indeed a proper energy function. The first one, in Section 3, is based on explicit computation of derivatives. The second one, in Section 4, follows from a limiting case of a more general (free) energy function derived in a probabilistic setting. The energy formalism allows for a direct interpretation of disordered configurations in terms of local minima, two examples of which are treated in Section 5.
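The abstract does not spell out the redefined winner. One formulation that appears in the self-organizing map literature picks the unit with the smallest neighborhood-weighted error instead of the smallest plain distance; whether this matches the paper's exact definition is an assumption here. The sketch below contrasts the two choices on a 1-D chain of units with scalar weights. The Gaussian neighborhood, the weight values, and the function names are all illustrative.

```python
import math

def neighborhood(r, s, sigma=1.0):
    """Gaussian neighborhood strength between chain positions r and s."""
    return math.exp(-((r - s) ** 2) / (2.0 * sigma ** 2))

def classic_winner(x, weights):
    """Kohonen-style winner: the unit whose weight is closest to the input."""
    return min(range(len(weights)), key=lambda r: (x - weights[r]) ** 2)

def local_error(x, weights, r, sigma=1.0):
    """Squared error of every unit, weighted by its neighborhood to unit r."""
    return sum(
        neighborhood(r, s, sigma) * (x - weights[s]) ** 2
        for s in range(len(weights))
    )

def energy_winner(x, weights, sigma=1.0):
    """Assumed redefinition: the unit with the smallest neighborhood-weighted error."""
    return min(range(len(weights)), key=lambda r: local_error(x, weights, r, sigma))

# A disordered chain: unit 1 sits far from its neighbors on the chain.
WEIGHTS = [0.0, 5.0, 0.5, 0.6]
plain = classic_winner(0.1, WEIGHTS)
weighted = energy_winner(0.1, WEIGHTS)
```

On this disordered configuration the two rules disagree: unit 0 is closest to the input, but its chain neighbor is far away, so the weighted rule prefers unit 3.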