Q-Learning for Bandit Problems
 IN PROCEEDINGS OF THE 12TH INTERNATIONAL CONFERENCE ON MACHINE LEARNING
, 1995
"... Multi-armed bandits may be viewed as decompositionally-structured Markov decision processes (MDP's) with potentially very-large state sets. A particularly elegant methodology for computing optimal policies was developed over twenty years ago by Gittins [Gittins & Jones, 1974]. Gittins' a ..."
Cited by 15 (1 self)
explores the problem of learning the Gittins indices online without the aid of a process model; it suggests utilizing proc...
The Nonstochastic Multiarmed Bandit Problem
 SIAM JOURNAL ON COMPUTING
, 2002
"... In the multiarmed bandit problem, a gambler must decide which arm of K nonidentical slot machines to play in a sequence of trials so as to maximize his reward. This classical problem has received much attention because of the simple model it provides of the tradeoff between exploration (trying out ..."
Cited by 492 (34 self)
Finite-time analysis of the multiarmed bandit problem
 Machine Learning
, 2002
"... Reinforcement learning policies face the exploration versus exploitation dilemma, i.e. the search for a balance between exploring the environment to find profitable actions while taking the empirically best action as often as possible. A popular measure of a policy's success in addressing ..."
Cited by 804 (15 self)
this dilemma is the regret, that is the loss due to the fact that the globally optimal policy is not followed all the times. One of the simplest examples of the exploration/exploitation dilemma is the multiarmed bandit problem. Lai and Robbins were the first ones to show that the regret for this problem has
On-Line Q-Learning Using Connectionist Systems
, 1994
"... Reinforcement learning algorithms are a powerful machine learning technique. However, much of the work on these algorithms has been developed with regard to discrete finite-state Markovian problems, which is too restrictive for many real-world environments. Therefore, it is desirable to extend these ..."
Cited by 383 (1 self)
of different algorithms based around Q-Learning (Watkins 1989) combined with the Temporal Difference algorithm (Sutton 1988), including a new algorithm (Modified Connectionist Q-Learning), and Q(λ) (Peng and Williams 1994). In addition, we present algorithms for applying these updates on-line during trials
Bayesian Q-learning
 In AAAI/IAAI
, 1998
"... A central problem in learning in complex environments is balancing exploration of untested actions against exploitation of actions that are known to be good. The benefit of exploration can be estimated using the classical notion of Value of Information, the expected improvement in future decision ..."
Cited by 144 (1 self)
Watkins' Q-learning by maintaining and propagating probability distributions over the Q-values. These distributions are used to compute a myopic approximation to the value of information for each action and hence to select the action that best balances exploration and exploitation. We establish
Bandit based Monte-Carlo Planning
 In: ECML06. Number 4212 in LNCS
, 2006
"... For large state-space Markovian Decision Problems Monte-Carlo planning is one of the few viable approaches to find near-optimal solutions. In this paper we introduce a new algorithm, UCT, that applies bandit ideas to guide Monte-Carlo planning. In finite-horizon or discounted MDPs the algo ..."
Cited by 433 (7 self)
Nonlinear component analysis as a kernel eigenvalue problem

, 1996
Cited by 1554 (85 self)
We describe a new method for performing a nonlinear form of Principal Component Analysis. By the use of integral operator kernel functions, we can efficiently compute principal components in high-dimensional feature spaces, related to input space by some nonlinear map; for instance the space of all possible 5-pixel products in 16x16 images. We give the derivation of the method, along with a discussion of other techniques which can be made nonlinear with the kernel approach; and present first experimental results on nonlinear feature extraction for pattern recognition.
Risk and protective factors for alcohol and other drug problems in adolescence and early adulthood: Implications for substance abuse prevention
 Psychological Bulletin
, 1992
"... The authors suggest that the most promising route to effective strategies for the prevention of adolescent alcohol and other drug problems is through a risk-focused approach. This approach requires the identification of risk factors for drug abuse, identification of methods by which risk factors hav ..."
Cited by 693 (18 self)
Object exchange across heterogeneous information sources
 INTERNATIONAL CONFERENCE ON DATA ENGINEERING
, 1995
"... We address the problem of providing integrated access to diverse and dynamic information sources. We explain how this problem differs from the traditional database integration problem and we focus on one aspect of the information integration problem, namely information exchange. We define an object ..."
Cited by 513 (57 self)
