Results 1 - 10 of 2,121
The Asymptotic Convergence-Rate of Q-learning
"... szepes@math.u-szeged.hu In this paper we show that for discounted MDPs with discount factor $\gamma > 1/2$ the asymptotic rate of convergence of Q-learning is $O(1/t^{R(1-\gamma)})$ if $R(1-\gamma) < 1/2$ and $O(\sqrt{\log\log t / t})$ otherwise provided that the state-action pairs are sampled from a fixed pr ..."
Abstract
probability distribution. Here $R = p_{\min}/p_{\max}$ is the ratio of the minimum and maximum state-action occupation frequencies. The results extend to convergent on-line learning provided that $p_{\min} > 0$, where $p_{\min}$ and $p_{\max}$ now become the minimum and maximum state-action occupation frequencies corresponding
The Asymptotic Convergence-Rate of Q-learning
, 1998
"... In this paper we show that for discounted MDPs with discount factor $\gamma > 1/2$ the asymptotic rate of convergence of Q-learning is $O(1/t^{R(1-\gamma)})$ if $R(1-\gamma) < 1/2$ and $O(\sqrt{\log\log t / t})$ otherwise provided that the state-action pairs are sampled from a fixed probability distribution. He ..."
Abstract

Cited by 22 (3 self)
. Here $R = p_{\min}/p_{\max}$ is the ratio of the minimum and maximum state-action occupation frequencies. The results extend to convergent on-line learning provided that $p_{\min} > 0$, where $p_{\min}$ and $p_{\max}$ now become the minimum and maximum state-action occupation frequencies corresponding to the stationary
MARKOV DECISION PROBLEMS AND STATE-ACTION FREQUENCIES*
"... Abstract. Consider a controlled Markov chain with countable state and action spaces. Basic quantities that determine the values of average cost functionals are identified. Under some regularity conditions, these turn out to be a collection of numbers, one for each state-action pair, describing for e ..."
Abstract
Abstract. Consider a controlled Markov chain with countable state and action spaces. Basic quantities that determine the values of average cost functionals are identified. Under some regularity conditions, these turn out to be a collection of numbers, one for each state-action pair, describing
Basic Block Distribution Analysis to Find Periodic Behavior and Simulation Points in Applications
, 2001
"... Modern architecture research relies heavily on detailed pipeline simulation. Simulating the full execution of an industry standard benchmark can take weeks to months to complete. To overcome this problem researchers choose a very small portion of a program's execution to evaluate their results, ..."
Abstract

Cited by 315 (31 self)
, cache miss rate, value misprediction, address misprediction, and reorder buffer occupancy). Since basic block frequencies can be collected using very fast profiling tools, our approach provides a practical technique for finding the periodicity and simulation points in applications.
On the Empirical State-Action Frequencies in Markov Decision Processes Under General Policies
 Math. of Operations Research
, 2004
"... We consider the empirical state-action frequencies and the empirical reward in weakly communicating finite-state Markov decision processes under general policies. We define a certain polytope and establish that every element of this polytope is the limit of the empirical frequency vector, under some ..."
Abstract

Cited by 11 (1 self)
We consider the empirical state-action frequencies and the empirical reward in weakly communicating finite-state Markov decision processes under general policies. We define a certain polytope and establish that every element of this polytope is the limit of the empirical frequency vector, under
© 2005 INFORMS On the Empirical State-Action Frequencies in Markov Decision Processes Under General Policies
, 2003
"... doi 10.1287/moor.1050.0148 ..."
DOI 10.1287/moor.xxxx.xxxx ©20xx INFORMS On the Empirical State-Action Frequencies in Markov Decision Processes Under General Policies
"... web.mit.edu/~jnt/www/home.html We consider the empirical state-action frequencies and the empirical reward in weakly communicating finite-state Markov decision processes under general policies. We define a certain polytope and establish that every element of this polytope is the limit of the empir ..."
Abstract
web.mit.edu/~jnt/www/home.html We consider the empirical state-action frequencies and the empirical reward in weakly communicating finite-state Markov decision processes under general policies. We define a certain polytope and establish that every element of this polytope is the limit
Behavioral theories and the neurophysiology of reward,
 Annu. Rev. Psychol.
, 2006
"... ■ Abstract The functions of rewards are based primarily on their effects on behavior and are less directly governed by the physics and chemistry of input events as in sensory systems. Therefore, the investigation of neural mechanisms underlying reward functions requires behavioral theories that can ..."
Abstract

Cited by 187 (0 self)
reward mechanisms. The framework is based on the description of observable behavior and superficially resembles the behaviorist approach, although mental states of representation and prediction are essential. Dropping the issues of subjective feelings of pleasure will allow us to do objective
NEURAL EXCITABILITY, SPIKING AND BURSTING
, 2000
"... Bifurcation mechanisms involved in the generation of action potentials (spikes) by neurons are reviewed here. We show how the type of bifurcation determines the neurocomputational properties of the cells. For example, when the rest state is near a saddle-node bifurcation, the cell can fire all-or-n ..."
Abstract

Cited by 145 (4 self)
Bifurcation mechanisms involved in the generation of action potentials (spikes) by neurons are reviewed here. We show how the type of bifurcation determines the neurocomputational properties of the cells. For example, when the rest state is near a saddle-node bifurcation, the cell can fire all
Regime shifts, resilience, and biodiversity in ecosystem management
 Annu. Rev. Ecol. Evol. Syst
, 2004
"... Key Words alternate states, regime shifts, response diversity, complex adaptive systems, ecosystem services ■ Abstract We review the evidence of regime shifts in terrestrial and aquatic environments in relation to resilience of complex adaptive ecosystems and the functional roles of biological dive ..."
Abstract

Cited by 146 (6 self)
Key Words alternate states, regime shifts, response diversity, complex adaptive systems, ecosystem services ■ Abstract We review the evidence of regime shifts in terrestrial and aquatic environments in relation to resilience of complex adaptive ecosystems and the functional roles of biological