Results 1 - 10 of 3,023

On average versus discounted reward temporal-difference learning

by John N. Tsitsiklis, Benjamin Van Roy, Satinder Singh - Machine Learning, 2002
"... Abstract. We provide an analytical comparison between discounted and average reward temporal-difference (TD) learning with linearly parameterized approximations. We first consider the asymptotic behavior of the two algorithms. We show that as the discount factor approaches 1, the value function prod ..."
Abstract - Cited by 13 (2 self) - Add to MetaCart
Abstract. We provide an analytical comparison between discounted and average reward temporal-difference (TD) learning with linearly parameterized approximations. We first consider the asymptotic behavior of the two algorithms. We show that as the discount factor approaches 1, the value function
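
The contrast the paper analyzes is easy to state as two update rules. Below is a minimal tabular sketch of both on a hypothetical two-state chain; the chain, rewards, and step sizes are illustrative assumptions, not from the paper:

```python
import random

# A toy 2-state Markov chain with per-state rewards, used to contrast
# discounted TD(0) with average-reward TD. (All numbers are assumptions.)
P = {0: [(0, 0.9), (1, 0.1)], 1: [(0, 0.5), (1, 0.5)]}
R = {0: 1.0, 1: 0.0}

def step(s):
    r, acc = random.random(), 0.0
    for s2, p in P[s]:
        acc += p
        if r <= acc:
            return s2
    return P[s][-1][0]

def discounted_td(gamma=0.99, alpha=0.05, n=50_000):
    V, s = [0.0, 0.0], 0
    for _ in range(n):
        s2 = step(s)
        # Discounted TD(0): delta = r + gamma * V(s') - V(s)
        delta = R[s] + gamma * V[s2] - V[s]
        V[s] += alpha * delta
        s = s2
    return V

def average_reward_td(alpha=0.05, beta=0.01, n=50_000):
    V, s = [0.0, 0.0], 0   # differential values, relative to the average reward
    rho = 0.0              # running estimate of the average reward per step
    for _ in range(n):
        s2 = step(s)
        # Average-reward TD: delta = r - rho + V(s') - V(s)
        delta = R[s] - rho + V[s2] - V[s]
        V[s] += alpha * delta
        rho += beta * delta   # track the average reward with a slower step size
        s = s2
    return V, rho

print(discounted_td())
print(average_reward_td())
```

Note the structural difference: the discounted rule bootstraps through gamma * V(s'), while the average-reward rule subtracts a learned estimate rho of the reward per step and learns differential values.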

A Neurocomputational Model for Cocaine Addiction

by Amir Dezfouli, Payam Piray, Mohammad Mahdi Keramati, Hamed Ekhtiari, Caro Lucas, Azarakhsh Mokri
"... Based on the dopamine hypotheses of cocaine addiction and the assumption of decrement of brain reward system sensitivity after long-term drug exposure, we propose a computational model for cocaine addiction. Utilizing average reward temporal difference reinforcement learning, we incorporate the elev ..."

Policy gradient methods for reinforcement learning with function approximation.

by Richard S. Sutton, David McAllester, Satinder Singh, Yishay Mansour - In NIPS, 1999
"... Abstract Function approximation is essential to reinforcement learning, but the standard approach of approximating a value function and determining a policy from it has so far proven theoretically intractable. In this paper we explore an alternative approach in which the policy is explicitly repres ..."
Cited by 439 (20 self)
output is action selection probabilities, and whose weights are the policy parameters. Let θ denote the vector of policy parameters and ρ the performance of the corresponding policy (e.g., the average reward per step). Then, in the policy gradient approach, the policy parameters are updated approximately
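
The snippet above states the core of the policy-gradient approach: adjust the parameters θ roughly along the gradient of the performance ρ. Below is a minimal REINFORCE-style sketch for a softmax policy on a hypothetical two-armed bandit; the bandit, baseline, and step sizes are assumptions for illustration, not the paper's exact algorithm:

```python
import math, random

# Softmax policy over 2 actions; theta are the policy parameters.
def softmax(theta):
    m = max(theta)
    e = [math.exp(t - m) for t in theta]
    z = sum(e)
    return [x / z for x in e]

def reward(a):
    # Hypothetical bandit: arm 1 pays more on average.
    return random.gauss(1.0 if a == 1 else 0.2, 1.0)

theta = [0.0, 0.0]
alpha, baseline = 0.05, 0.0
for _ in range(5000):
    probs = softmax(theta)
    a = 0 if random.random() < probs[0] else 1
    r = reward(a)
    baseline += 0.01 * (r - baseline)   # variance-reduction baseline
    # Score function for a softmax policy: grad log pi(i) = 1[i == a] - pi(i)
    for i in range(2):
        grad = (1.0 if i == a else 0.0) - probs[i]
        theta[i] += alpha * (r - baseline) * grad

print(softmax(theta))  # should put most probability on arm 1
```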

Average cost temporal-difference learning

by John N. Tsitsiklis, Benjamin Van Roy, 1999
"... We propose a variant of temporal-difference learning that approximates average and differential costs of an irreducible aperiodic Markov chain. Approximations are comprised of linear combinations of fixed basis functions whose weights are incrementally updated during a single endless trajectory of t ..."
Cited by 27 (4 self)
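
A hedged sketch of what the abstract describes: differential costs approximated as phi(s) . w, with the weights and an average-cost estimate updated incrementally along a single trajectory, in the spirit of the paper's TD(lambda) update. The two-state chain, features, and step sizes are illustrative assumptions:

```python
import random

# Average-cost TD(lambda) with a linear approximator on a toy chain.
P = {0: [(0, 0.8), (1, 0.2)], 1: [(0, 0.3), (1, 0.7)]}
C = {0: 2.0, 1: 0.5}                      # per-state costs
PHI = {0: [1.0, 0.0], 1: [0.0, 1.0]}      # fixed basis functions

def step(s):
    r, acc = random.random(), 0.0
    for s2, p in P[s]:
        acc += p
        if r <= acc:
            return s2
    return P[s][-1][0]

w = [0.0, 0.0]      # weights of the differential-cost approximation
z = [0.0, 0.0]      # eligibility trace
mu = 0.0            # running estimate of the average cost
alpha, beta, lam = 0.05, 0.01, 0.7
s = 0
for _ in range(100_000):
    s2 = step(s)
    v = sum(p * x for p, x in zip(PHI[s], w))
    v2 = sum(p * x for p, x in zip(PHI[s2], w))
    delta = C[s] - mu + v2 - v            # differential TD error
    z = [lam * zi + pi for zi, pi in zip(z, PHI[s])]
    w = [wi + alpha * delta * zi for wi, zi in zip(w, z)]
    mu += beta * (C[s] - mu)              # update the average-cost estimate
    s = s2

print(mu, w)
```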

Temporal-difference networks

by Richard S. Sutton, Brian Tanner - In Advances in Neural Information Processing Systems 17, 2005
"... We introduce a generalization of temporal-difference (TD) learning to networks of interrelated predictions. Rather than relating a single prediction to itself at a later time, as in conventional TD methods, a TD network relates each prediction in a set of predictions to other predictions in the set ..."
Cited by 44 (8 self)
world knowledge in entirely predictive, grounded terms. Temporal-difference (TD) learning is widely used in reinforcement learning methods to learn moment-to-moment predictions of total future reward (value functions). In this setting, TD learning is often simpler and more data-efficient than other
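
A TD network ties predictions to other predictions rather than only to observed outcomes. The toy sketch below is an assumption-laden illustration, not the paper's architecture: node 0 predicts the next observation bit, and node 1 takes node 0's answer at the next step as its target, i.e. a prediction about a prediction:

```python
import math, random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

random.seed(0)
W = [[0.0, 0.0], [0.0, 0.0]]   # one weight row per prediction node
alpha, obs = 0.1, 0
y = [0.5, 0.5]
for _ in range(20_000):
    # Hypothetical world: the observation bit flips with probability 0.1.
    obs_next = obs if random.random() > 0.1 else 1 - obs
    x = [1.0, float(obs)]                       # bias + current observation
    y = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in W]
    x_next = [1.0, float(obs_next)]
    y_next = [sigmoid(sum(w * xi for w, xi in zip(row, x_next))) for row in W]
    # Question network: node 0's target is the next observation;
    # node 1's target is node 0's answer at the next step.
    targets = [float(obs_next), y_next[0]]
    for i in range(2):
        err = targets[i] - y[i]
        for j in range(2):
            W[i][j] += alpha * err * x[j]       # semi-gradient update
    obs = obs_next

print(y)  # node 1 approximates a two-step prediction of the observation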

Conditional skewness in asset pricing tests

by Campbell R. Harvey, Akhtar Siddique - Journal of Finance, 2000
"... If asset returns have systematic skewness, expected returns should include rewards for accepting this risk. We formalize this intuition with an asset pricing model that incorporates conditional skewness. Our results show that conditional skewness helps explain the cross-sectional variation of expect ..."
Cited by 342 (6 self)
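
The paper's premise can be illustrated with a standardized coskewness measure: how strongly an asset's return comoves with the squared market return. The sketch below uses demeaned returns rather than the paper's regression residuals, and the return series are made up, so treat it as an illustration of the idea only:

```python
import statistics

def coskewness(asset, market):
    """Standardized coskewness: E[e_a * e_m^2] / (std(e_a) * var(e_m))."""
    ma, mm = statistics.fmean(asset), statistics.fmean(market)
    ea = [r - ma for r in asset]
    em = [r - mm for r in market]
    n = len(asset)
    num = sum(a * m * m for a, m in zip(ea, em)) / n
    den = (sum(a * a for a in ea) / n) ** 0.5 * (sum(m * m for m in em) / n)
    return num / den

# Hypothetical monthly returns for one asset and the market.
market = [0.02, -0.05, 0.01, 0.07, -0.08, 0.03, -0.01, 0.04]
asset  = [0.01, -0.09, 0.02, 0.03, -0.12, 0.02,  0.00, 0.02]
print(coskewness(asset, market))
```

A negative value indicates the asset tends to pay off poorly when the market is volatile, which, on the paper's argument, should command a higher expected return.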

Temporal difference models and reward-related learning in the human brain

by John P. O’Doherty, Peter Dayan, Karl Friston, Hugo Critchley - Neuron, 2003
"... the difference between V(t_UCS) and V(t_UCS − 1) generates a positive prediction error that, in the simplest form of TD learning, is used to increment the value at time t_UCS − 1 (in proportion to ..."
Cited by 200 (11 self)
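
The snippet describes the basic TD prediction-error computation. A worked numeric illustration (values and learning rate are hypothetical):

```python
# TD prediction error at the time of an unexpected reward (t_UCS).
gamma, alpha = 1.0, 0.1
V = {"t_ucs_minus_1": 0.0, "t_ucs": 0.0}
r_ucs = 1.0   # unexpected reward delivered at t_UCS

# delta = r + gamma * V(t_UCS) - V(t_UCS - 1): a positive prediction error
delta = r_ucs + gamma * V["t_ucs"] - V["t_ucs_minus_1"]
V["t_ucs_minus_1"] += alpha * delta   # value increments in proportion to delta
print(delta, V["t_ucs_minus_1"])      # 1.0 0.1
```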

R-MAX - A General Polynomial Time Algorithm for Near-Optimal Reinforcement Learning

by Ronen I. Brafman, Moshe Tennenholtz, 2001
"... R-max is a very simple model-based reinforcement learning algorithm which can attain near-optimal average reward in polynomial time. In R-max, the agent always maintains a complete, but possibly inaccurate model of its environment and acts based on the optimal policy derived from this model. The mod ..."
Cited by 297 (10 self)
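
The optimism mechanism described above is simple to sketch: state-action pairs visited fewer than K times are modeled as paying R-max, so planning on the model drives exploration. The toy MDP, thresholds, and the use of discounted value iteration for planning are simplifying assumptions (the paper's analysis is in terms of T-step average reward):

```python
from collections import defaultdict

R_MAX, K, GAMMA = 1.0, 10, 0.95
ACTIONS = [0, 1]
counts = defaultdict(int)     # visits per (s, a)
samples = defaultdict(list)   # observed (reward, next_state) per (s, a)

def true_step(s, a):          # hypothetical deterministic 2-state environment
    if s == 0:
        return (0.0, 1) if a == 1 else (0.1, 0)
    return (1.0, 0) if a == 0 else (0.0, 1)

def model(s, a):
    if counts[(s, a)] < K:    # "unknown": optimistic self-loop paying R_MAX
        return R_MAX, s
    rs = samples[(s, a)]
    r = sum(x[0] for x in rs) / len(rs)        # empirical mean reward
    nexts = [x[1] for x in rs]
    return r, max(set(nexts), key=nexts.count)  # most frequent next state

def plan(states=(0, 1), iters=100):
    Q = {(s, a): 0.0 for s in states for a in ACTIONS}
    for _ in range(iters):    # value iteration on the (optimistic) model
        for s in states:
            for a in ACTIONS:
                r, s2 = model(s, a)
                Q[(s, a)] = r + GAMMA * max(Q[(s2, b)] for b in ACTIONS)
    return Q

s = 0
for _ in range(500):
    Q = plan()
    a = max(ACTIONS, key=lambda b: Q[(s, b)])   # act greedily on the model
    r, s2 = true_step(s, a)
    if counts[(s, a)] < K:
        counts[(s, a)] += 1
        samples[(s, a)].append((r, s2))
    s = s2

print({k: round(v, 2) for k, v in plan().items()})
```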

The reward circuit: linking primate anatomy and human imaging

by Suzanne N. Haber, Brian Knutson - Neuropsychopharmacology, 2010
"... Although cells in many brain regions respond to reward, the cortical-basal ganglia circuit is at the heart of the reward system. The key structures in this network are the anterior cingulate cortex, the orbital prefrontal cortex, the ventral striatum, the ventral pallidum, and the midbrain dopamine ..."
Cited by 220 (3 self)
between these areas forms a complex neural network that mediates different aspects of reward processing. Advances in neuroimaging techniques allow better spatial and temporal resolution. These studies now demonstrate that human functional and structural imaging results map increasingly close to primate