Results 1 - 10 of 3,586

The Nonstochastic Multiarmed Bandit Problem

by Peter Auer, Nicolò Cesa-Bianchi, Yoav Freund, Robert E. Schapire - SIAM Journal on Computing, 2002
"... In the multiarmed bandit problem, a gambler must decide which arm of K non-identical slot machines to play in a sequence of trials so as to maximize his reward. This classical problem has received much attention because of the simple model it provides of the trade-off between exploration (trying out ..."
Cited by 491 (34 self)

Finite-time analysis of the multiarmed bandit problem

by Peter Auer, Nicolò Cesa-Bianchi, Paul Fischer - Machine Learning, 2002
"... Reinforcement learning policies face the exploration versus exploitation dilemma, i.e. the search for a balance between exploring the environment to find profitable actions while taking the empirically best action as often as possible. A popular measure of a policy’s success in addressing ..."
Cited by 817 (15 self)
... and for all reward distributions with bounded support. Keywords: bandit problems, adaptive allocation rules, finite horizon regret.

Decision-Theoretic Planning: Structural Assumptions and Computational Leverage

by Craig Boutilier, Thomas Dean, Steve Hanks - Journal of Artificial Intelligence Research, 1999
"... Planning under uncertainty is a central problem in the study of automated sequential decision making, and has been addressed by researchers in many different fields, including AI planning, decision analysis, operations research, control theory and economics. While the assumptions and perspectives ..."
Cited by 515 (4 self)
or plans. Planning problems commonly possess structure in the reward and value functions used to de...

Solving the Distal Reward Problem through Linkage of STDP and Dopamine Signaling

by Eugene M. Izhikevich - Cerebral Cortex, 2007
"... In Pavlovian and instrumental conditioning, reward typically comes seconds after reward-triggering actions, creating an explanatory conundrum known as “distal reward problem”: How does the brain know what firing patterns of what neurons are responsible for the reward if 1) the patterns are no long ..."
Cited by 74 (0 self)

LogP: Towards a Realistic Model of Parallel Computation

by David Culler, Richard Karp, David Patterson, Abhijit Sahay, Klaus Erik Schauser, Eunice Santos, Ramesh Subramonian, Thorsten von Eicken, 1993
"... A vast body of theoretical research has focused either on overly simplistic models of parallel computation, notably the PRAM, or overly specific models that have few representatives in the real world. Both kinds of models encourage exploitation of formal loopholes, rather than rewarding developme ..."
Cited by 560 (15 self)

Motivation through the Design of Work: Test of a Theory

by J. Richard Hackman, Greg R. Oldham - Organizational Behavior and Human Performance, 1976
"... A model is proposed that specifies the conditions under which individuals will become internally motivated to perform effectively on their jobs. The model focuses on the interaction among three classes of variables: (a) the psychological states of employees that must be present for internally motiv ..."
Cited by 622 (2 self)
... redesign are not fully adequate to meet the problems encountered in their application. Especially troublesome is the paucity

Solving the distal reward problem through linkage of STDP and dopamine signaling

by unknown authors - BMC Neuroscience (BioMed Central, Open Access), oral presentation

Solving the Distal Reward Problem through Linkage of STDP and Dopamine Signaling

by unknown authors, 2007
"... In Pavlovian and instrumental conditioning, reward typically comes seconds after reward-triggering actions, creating an explanatory conundrum known as “distal reward problem”: How does the brain know what firing patterns of what neurons are responsible for the reward if 1) the patterns are no long ..."

Beyond Plan Length: Heuristic Search Planning for Maximum Reward Problems

by Jason Farquhar, Chris Harris
"... Automatic extraction of heuristic estimates has been extremely fruitful in classical planning domains. We present a simple extension to the heuristic extraction process from the well-known HSP and FF systems which allow us to apply them to reward maximisation problems. These extensions inv ..."

Doing It Now or Later

by Ted O'Donoghue, Matthew Rabin, 1996
"... Though economists assume that intertemporal preferences are time-consistent, evidence suggests that a person's relative preference for well-being at an earlier moment over a later moment increases as the earlier moment gets closer. We explore the behavioral and welfare implications of such tim ..."
Cited by 326 (9 self)
... salient rewards, where the rewards of an action are immediate but any costs are delayed? Second, are people sophisticated (they foresee future self-control problems) or are they naive (they do not anticipate these self-control problems)? Naive people procrastinate activities with salient costs