Results 1 - 10 of 3,200,704
Experience-efficient learning in associative bandit problems
 Proceedings of the Twenty-third International Conference on Machine Learning (ICML-06)
, 2006
"... We formalize the associative bandit problem framework introduced by Kaelbling as a learning-theory problem. The learning environment is modeled as a k-armed bandit where arm payoffs are conditioned on an observable input selected on each trial. We show that, if the payoff functions are constrained t ..."
Cited by 13 (1 self)
Continuous Time Associative Bandit Problems
"... In this paper we consider an extension of the multiarmed bandit problem. In this generalized setting, the decision maker receives some side information, performs an action chosen from a finite set and then receives a reward. Unlike in the standard bandit settings, performing an action takes a random ..."
Cited by 3 (0 self)
The Nonstochastic Multiarmed Bandit Problem
 SIAM Journal on Computing
, 2002
"... In the multiarmed bandit problem, a gambler must decide which arm of K nonidentical slot machines to play in a sequence of trials so as to maximize his reward. This classical problem has received much attention because of the simple model it provides of the tradeoff between exploration (trying out ..."
Cited by 492 (34 self)
Bandit based Monte-Carlo Planning
 In: ECML-06. Number 4212 in LNCS
, 2006
"... For large state-space Markovian Decision Problems Monte-Carlo planning is one of the few viable approaches to find near-optimal solutions. In this paper we introduce a new algorithm, UCT, that applies bandit ideas to guide Monte-Carlo planning. In finite-horizon or discounted MDPs the algo ..."
Cited by 433 (7 self)
Finite-time analysis of the multiarmed bandit problem
 Machine Learning
, 2002
"... Reinforcement learning policies face the exploration versus exploitation dilemma, i.e. the search for a balance between exploring the environment to find profitable actions while taking the empirically best action as often as possible. A popular measure of a policy’s success in addressing this dilemma is the regret, that is the loss due to the fact that the globally optimal policy is not followed all the times. One of the simplest examples of the exploration/exploitation dilemma is the multiarmed bandit problem. Lai and Robbins were the first ones to show that the regret for this problem has ..."
Cited by 804 (15 self)
Optimal approximations by piecewise smooth functions and associated variational problems
 Commun. Pure Applied Mathematics
, 1989
"... Citation: Mumford, David Bryant, and Jayant Shah. 1989. Optimal approximations by piecewise smooth functions and associated variational problems. ..."
Cited by 1290 (14 self)
A Note on the Confinement Problem
, 1973
"... This note explores the problem of confining a program during its execution so that it cannot transmit information to any other program except its caller. A set of examples attempts to stake out the boundaries of the problem. Necessary conditions for a solution are stated and informally justified. ..."
Cited by 532 (0 self)
The Extended Linear Complementarity Problem
, 1993
"... We consider an extension of the horizontal linear complementarity problem, which we call the extended linear complementarity problem (XLCP). With the aid of a natural bilinear program, we establish various properties of this extended complementarity problem; these include the convexity of the biline ..."
Cited by 776 (28 self)
The Hungarian method for the assignment problem
 Naval Res. Logist. Quart
, 1955
"... Assuming that numerical scores are available for the performance of each of n persons on each of n jobs, the "assignment problem" is the quest for an assignment of persons to jobs so that the sum of the n scores so obtained is as large as possible. It is shown that ideas latent in the work ..."
Cited by 1238 (0 self)
The Symbol Grounding Problem
, 1990
"... There has been much discussion recently about the scope and limits of purely symbolic models of the mind and about the proper role of connectionism in cognitive modeling. This paper describes the "symbol grounding problem": How can the semantic interpretation of a formal symbol system be m ..."
Cited by 1072 (18 self)