Results 1–10 of 13
Nonparametric bandits with covariates
In COLT, 2010
Cited by 11 (1 self)
"... We consider a bandit problem which involves sequential sampling from two populations (arms). Each arm produces a noisy reward realization which depends on an observable random covariate. The goal is to maximize cumulative expected reward. We derive general lower bounds on the performance of any admissible policy, and develop an algorithm whose performance achieves the order of said lower bound up to logarithmic terms. This is done by decomposing the global problem into suitably “localized” bandit problems. Proofs blend ideas from nonparametric statistics and traditional methods used in the bandit ..."
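The "localized" decomposition this abstract describes can be illustrated with a minimal sketch: partition the covariate space into bins and run an independent bandit instance inside each bin. The bin count, the two linear reward functions, and the use of plain UCB1 per bin are illustrative assumptions for this sketch, not the authors' actual algorithm (which tunes the partition to the smoothness of the reward functions).

```python
import math
import random

class LocalizedBandit:
    """Sketch of a covariate bandit: split the covariate space [0, 1)
    into bins and run an independent UCB1 instance in each bin.
    (Illustrative only; the bin width would be tuned in practice.)"""

    def __init__(self, n_bins=10, n_arms=2):
        self.n_bins = n_bins
        self.counts = [[0] * n_arms for _ in range(n_bins)]
        self.sums = [[0.0] * n_arms for _ in range(n_bins)]

    def _bin(self, x):
        # Map covariate x in [0, 1] to a bin index.
        return min(int(x * self.n_bins), self.n_bins - 1)

    def choose(self, x):
        b = self._bin(x)
        # Play each arm once in this bin before using UCB scores.
        for arm, c in enumerate(self.counts[b]):
            if c == 0:
                return arm
        total = sum(self.counts[b])
        ucb = [
            self.sums[b][a] / self.counts[b][a]
            + math.sqrt(2 * math.log(total) / self.counts[b][a])
            for a in range(len(self.counts[b]))
        ]
        return ucb.index(max(ucb))

    def update(self, x, arm, reward):
        b = self._bin(x)
        self.counts[b][arm] += 1
        self.sums[b][arm] += reward
```

A simulation with crossing reward functions (say, arm 0 paying 0.3 + 0.4x and arm 1 paying 0.7 - 0.4x) shows each local instance learning the arm that is best on its own piece of the covariate space, which is the point of the decomposition.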
Machine learning and nonparametric bandit theory
IEEE Trans. Automat. Contr., 1995
Cited by 19 (2 self)
"... In its most basic form, bandit theory is concerned with the design problem of sequentially choosing members from a given collection of random variables so that asymptotically the relative number of times the best arm is chosen converges a.s. to one, assuming only independence ... the regret ..."
Randomized Allocation with Nonparametric Estimation for a Multi-Armed Bandit Problem with Covariates
Annals of Statistics, 2001
Cited by 12 (0 self)
"... We study a multi-armed bandit problem in a setting with covariates available. We take a nonparametric approach to estimate the functional relationship between the response (reward) and the covariates. The estimated relationships and an appropriate randomization are used to select a good arm to play ..."
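The scheme this abstract describes, nonparametric estimation of each arm's reward function plus a randomized allocation rule, can be sketched as follows. The k-nearest-neighbor estimator and the ε-greedy randomization are illustrative stand-ins chosen for this sketch; the paper's actual estimator and randomization scheme may differ.

```python
import random

def knn_estimate(history, x, k=5):
    """Nonparametric (k-nearest-neighbor) estimate of an arm's mean
    reward at covariate x, from that arm's observed (covariate, reward)
    pairs. Returns 0.0 when no data is available yet."""
    if not history:
        return 0.0
    nearest = sorted(history, key=lambda cr: abs(cr[0] - x))[:k]
    return sum(r for _, r in nearest) / len(nearest)

def choose_arm(histories, x, epsilon=0.1):
    """Randomized allocation: with probability epsilon explore an arm
    uniformly at random; otherwise play the arm whose estimated reward
    function is largest at the current covariate x."""
    if random.random() < epsilon:
        return random.randrange(len(histories))
    estimates = [knn_estimate(h, x) for h in histories]
    return estimates.index(max(estimates))
```

The forced randomization keeps every arm sampled across the covariate space, which is what lets the nonparametric estimates remain consistent while the policy exploits them.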
Nonparametric Learning Rules from Bandit Experiments: The Eyes Have It!
Cited by 1 (1 self)
"... How do people learn? We assess, in a distribution-free manner, subjects' learning and choice rules in dynamic two-armed bandit (probabilistic reversal learning) experiments. To aid in identification and estimation, we use auxiliary measures of subjects' beliefs, in the form of their eye-movements d ..."
Nonparametric Learning Rules from Bandit Experiments: The Eyes Have It!
2011
"... How do people learn? We assess, in a distribution-free manner, subjects' learning and choice rules in a dynamic two-armed bandit learning experiment. To aid in identification and estimation, we use auxiliary measures of subjects' beliefs, in the form of their eye-movements during the experiment. Our ..."
The Multi-Armed Bandit Problem with Covariates
2013
Cited by 7 (1 self)
"... We consider a multi-armed bandit problem in a setting where each arm produces a noisy reward realization which depends on an observable random covariate. As opposed to the traditional static multi-armed bandit problem, this setting allows for dynamically changing rewards that better describe applic ..."
Bayesian Nonparametric Modeling of Individual Differences: A Case Study Using Decision-Making on Bandit Problems
"... We develop and compare two nonparametric Bayesian approaches for modeling individual differences in cognitive processes. These approaches both allow major discrete differences between groups of people to be modeled, without making strong prior assumptions about how many groups are required. Instead ... heuristic model of human decision-making on bandit problems, applied to previously reported behavioral data from 451 participants. We conclude that the ability to model both discrete and continuous aspects of individual differences in cognition is important, and that nonparametric approaches are well suited ..."
Optimal Strategy under Unknown Stochastic Environment – Nonparametric Lob-Pass Problem
Cited by 1 (0 self)
"... The bandit problem consists of two factors, one being exploration, or the collection of information on the environment, and the other being exploitation, or taking benefit by choosing the optimal action in the uncertain environment. It is necessary to choose only the optimal actions for the exploit ..."
Stochastic Game under Unknown Environment – A Strategy for the Nonparametric Lob-Pass Problem
"... We treat an online learning model named the lob-pass problem, which is an extension of the bandit problem. The nonparametric case is considered, and a class of strategies which can obtain O(t^ε) cumulative regret for arbitrary ε > 0 is constructed. It is also shown that no strategy can achieve O( ..."