Results 1 - 10 of 286,717
Contextual Bandit Learning with Predictable Rewards
"... Contextual bandit learning is a reinforcement learning problem where the learner repeatedly receives a set of features (context), takes an action and receives a reward based on the action and context. We consider this problem under a realizability assumption: there exists a function in a (known) fun ..."
Finite-time analysis of the multiarmed bandit problem
Machine Learning, 2002
"... Abstract. Reinforcement learning policies face the exploration versus exploitation dilemma, i.e. the search for a balance between exploring the environment to find profitable actions while taking the empirically best action as often as possible. A popular measure of a policy’s success in addressing ..."
Cited by 804 (15 self)
"... and for all reward distributions with bounded support. Keywords: bandit problems, adaptive allocation rules, finite horizon regret ..."
Predictive reward signal of dopamine neurons
Journal of Neurophysiology, 1998
"... Schultz, Wolfram. Predictive reward signal of dopamine neurons. J. Neurophysiol. 80: 1–27, 1998. The effects of lesions, receptor blocking, electrical self-stimulation, and drugs ... is called rewards, which elicit and reinforce approach behavior. The functions of rewards were developed further during ..."
Cited by 717 (12 self)
Bandit based Monte-Carlo Planning
In: ECML-06. Number 4212 in LNCS, 2006
"... Abstract. For large state-space Markovian Decision Problems Monte-Carlo planning is one of the few viable approaches to find near-optimal solutions. In this paper we introduce a new algorithm, UCT, that applies bandit ideas to guide Monte-Carlo planning. In finite-horizon or discounted MDPs the algo ..."
Cited by 433 (7 self)
Predicting How People Play Games: Reinforcement Learning . . .
American Economic Review, 1998
A Meta-Analytic Review of Experiments Examining the Effects of Extrinsic Rewards on Intrinsic Motivation
"... A meta-analysis of 128 studies examined the effects of extrinsic rewards on intrinsic motivation. As predicted, engagement-contingent, completion-contingent, and performance-contingent rewards significantly undermined free-choice intrinsic motivation (d = -0.40, -0.36, and -0.28, respectively), as did ..."
Cited by 602 (16 self)
Reinforcement Learning I: Introduction
1998
"... In which we try to give a basic intuitive sense of what reinforcement learning is and how it differs and relates to other fields, e.g., supervised learning and neural networks, genetic algorithms and artificial life, control theory. Intuitively, RL is trial and error (variation and selection, search ..."
Cited by 5500 (120 self)
Boosting a Weak Learning Algorithm By Majority
1995
"... We present an algorithm for improving the accuracy of algorithms for learning binary concepts. The improvement is achieved by combining a large number of hypotheses, each of which is generated by training the given learning algorithm on a different set of examples. Our algorithm is based on ideas pr ..."
Cited by 516 (15 self)
Text Chunking using Transformation-Based Learning
1995
"... Eric Brill introduced transformation-based learning and showed that it can do part-of-speech tagging with fairly high accuracy. The same method can be applied at a higher level of textual interpretation for locating chunks in the tagged text, including non-recursive "baseNP" chunks. For ..."
Cited by 509 (0 self)