The Asymptotic Convergence-Rate of Q-learning
"... szepes@math.u-szeged.hu In this paper we show that for discounted MDPs with discount factor $\gamma > 1/2$ the asymptotic rate of convergence of Q-learning is $O(1/t^{R(1-\gamma)})$ if $R(1-\gamma) < 1/2$ and $O(\sqrt{\log\log t / t})$ otherwise, provided that the state-action pairs are sampled from a fixed pr ..."
probability distribution. Here $R = p_{\min}/p_{\max}$ is the ratio of the minimum and maximum state-action occupation frequencies. The results extend to convergent on-line learning provided that $p_{\min} > 0$, where $p_{\min}$ and $p_{\max}$ now become the minimum and maximum state-action occupation frequencies corresponding
MARKOV DECISION PROBLEMS AND STATE-ACTION FREQUENCIES
Consider a controlled Markov chain with countable state and action spaces. Basic quantities that determine the values of average cost functionals are identified. Under some regularity conditions, these turn out to be a collection of numbers, one for each state-action pair, describing
Basic Block Distribution Analysis to Find Periodic Behavior and Simulation Points in Applications
, 2001
"... Modern architecture research relies heavily on detailed pipeline simulation. Simulating the full execution of an industry standard benchmark can take weeks to months to complete. To overcome this problem researchers choose a very small portion of a program's execution to evaluate their results, ..."

Cited by 315 (31 self)
..., cache miss rate, value misprediction, address misprediction, and reorder buffer occupancy). Since basic block frequencies can be collected using very fast profiling tools, our approach provides a practical technique for finding the periodicity and simulation points in applications.
On the Empirical State-Action Frequencies in Markov Decision Processes Under General Policies
 Math. of Operations Research
, 2004
We consider the empirical state-action frequencies and the empirical reward in weakly communicating finite-state Markov decision processes under general policies. We define a certain polytope and establish that every element of this polytope is the limit of the empirical frequency vector, under some

Cited by 11 (1 self)
Behavioral theories and the neurophysiology of reward
 Annu. Rev. Psychol.
, 2006
"... The functions of rewards are based primarily on their effects on behavior and are less directly governed by the physics and chemistry of input events as in sensory systems. Therefore, the investigation of neural mechanisms underlying reward functions requires behavioral theories that can ..."

Cited by 187 (0 self)
reward mechanisms. The framework is based on the description of observable behavior and superficially resembles the behaviorist approach, although mental states of representation and prediction are essential. Dropping the issues of subjective feelings of pleasure will allow us to do objective
NEURAL EXCITABILITY, SPIKING AND BURSTING
, 2000
Bifurcation mechanisms involved in the generation of action potentials (spikes) by neurons are reviewed here. We show how the type of bifurcation determines the neurocomputational properties of the cells. For example, when the rest state is near a saddle-node bifurcation, the cell can fire all

Cited by 145 (4 self)
Regime shifts, resilience, and biodiversity in ecosystem management
 Annu. Rev. Ecol. Evol. Syst
, 2004
Key Words: alternate states, regime shifts, response diversity, complex adaptive systems, ecosystem services. Abstract: We review the evidence of regime shifts in terrestrial and aquatic environments in relation to resilience of complex adaptive ecosystems and the functional roles of biological

Cited by 146 (6 self)
