Results 1  10
of
244
Using Confidence Bounds for ExploitationExploration Tradeoffs
 Journal of Machine Learning Research
, 2002
"... We show how a standard tool from statistics  namely confidence bounds  can be used to elegantly deal with situations which exhibit an exploitationexploration tradeo#. Our technique for designing and analyzing algorithms for such situations is general and can be applied when an algorithm h ..."
Abstract

Cited by 177 (4 self)
 Add to MetaCart
We show how a standard tool from statistics  namely confidence bounds  can be used to elegantly deal with situations which exhibit an exploitationexploration tradeo#. Our technique for designing and analyzing algorithms for such situations is general and can be applied when an algorithm has to make exploitationversusexploration decisions based on uncertain information provided by a random process.
Adaptive game playing using multiplicative weights
 GAMES AND ECONOMIC BEHAVIOR
, 1999
"... We present a simple algorithm for playing a repeated game. We show that a player using this algorithm suffers average loss that is guaranteed to come close to the minimum loss achievable by any fixed strategy. Our bounds are nonasymptotic and hold for any opponent. The algorithm, which uses the mult ..."
Abstract

Cited by 165 (17 self)
 Add to MetaCart
We present a simple algorithm for playing a repeated game. We show that a player using this algorithm suffers average loss that is guaranteed to come close to the minimum loss achievable by any fixed strategy. Our bounds are nonasymptotic and hold for any opponent. The algorithm, which uses the multiplicativeweight methods of Littlestone and Warmuth, is analyzed using the Kullback–Liebler divergence. This analysis yields a new, simple proof of the min–max theorem, as well as a provable method of approximately solving a game. A variant of our gameplaying algorithm is proved to be optimal in a very strong sense.
Regret in the Online Decision Problem
, 1999
"... At each point in time a decision maker must choose a decision. The payoff in a period from the decision chosen depends on the decision as well as the state of the world that obtains at that time. The difficulty is that the decision must be made in advance of any knowledge, even probabilistic, about ..."
Abstract

Cited by 129 (2 self)
 Add to MetaCart
At each point in time a decision maker must choose a decision. The payoff in a period from the decision chosen depends on the decision as well as the state of the world that obtains at that time. The difficulty is that the decision must be made in advance of any knowledge, even probabilistic, about which state of the world will obtain. A range of problems from a variety of disciplines can be framed in this way. In this
Nearly tight bounds for the continuumarmed bandit problem
 Advances in Neural Information Processing Systems 17
, 2005
"... In the multiarmed bandit problem, an online algorithm must choose from a set of strategies in a sequence of n trials so as to minimize the total cost of the chosen strategies. While nearly tight upper and lower bounds are known in the case when the strategy set is finite, much less is known when th ..."
Abstract

Cited by 121 (7 self)
 Add to MetaCart
(Show Context)
In the multiarmed bandit problem, an online algorithm must choose from a set of strategies in a sequence of n trials so as to minimize the total cost of the chosen strategies. While nearly tight upper and lower bounds are known in the case when the strategy set is finite, much less is known when there is an infinite strategy set. Here we consider the case when the set of strategies is a subset of R d, and the cost functions are continuous. In the d = 1 case, we improve on the bestknown upper and lower bounds, closing the gap to a sublogarithmic factor. We also consider the case where d> 1 and the cost functions are convex, adapting a recent online convex optimization algorithm of Zinkevich to the sparser feedback model of the multiarmed bandit problem. 1
Shopbots and Pricebots
, 1999
"... Shopbots are agents that automatically search the Internet to obtain information about prices and other attributes of goods and services. They herald a future in which autonomous agents profoundly influence electronic markets. In this study, a simple economic model is proposed and analyzed, which is ..."
Abstract

Cited by 108 (13 self)
 Add to MetaCart
Shopbots are agents that automatically search the Internet to obtain information about prices and other attributes of goods and services. They herald a future in which autonomous agents profoundly influence electronic markets. In this study, a simple economic model is proposed and analyzed, which is intended to quantify some of the likely impacts of a proliferation of shopbots and other economicallymotivated software agents. In addition, this paper reports on simulations of pricebots  adaptive, pricesetting agents which firms may well implement to combat, or even take advantage of, the growing community of shopbots. This study forms part of a larger research program that aims to provide insights into the impact of agent technology on the nascent information economy.
A survey of Monte Carlo tree search methods
 IEEE TRANSACTIONS ON COMPUTATIONAL INTELLIGENCE AND AI
, 2012
"... Monte Carlo Tree Search (MCTS) is a recently proposed search method that combines the precision of tree search with the generality of random sampling. It has received considerable interest due to its spectacular success in the difficult problem of computer Go, but has also proved beneficial in a ra ..."
Abstract

Cited by 101 (17 self)
 Add to MetaCart
(Show Context)
Monte Carlo Tree Search (MCTS) is a recently proposed search method that combines the precision of tree search with the generality of random sampling. It has received considerable interest due to its spectacular success in the difficult problem of computer Go, but has also proved beneficial in a range of other domains. This paper is a survey of the literature to date, intended to provide a snapshot of the state of the art after the first five years of MCTS research. We outline the core algorithm’s derivation, impart some structure on the many variations and enhancements that have been proposed, and summarise the results from the key game and nongame domains to which MCTS methods have been applied. A number of open research questions indicate that the field is ripe for future work.
AWESOME: A general multiagent learning algorithm that converges in selfplay and learns a best response against stationary opponents
, 2003
"... A satisfactory multiagent learning algorithm should, at a minimum, learn to play optimally against stationary opponents and converge to a Nash equilibrium in selfplay. The algorithm that has come closest, WoLFIGA, has been proven to have these two properties in 2player 2action repeated games— as ..."
Abstract

Cited by 98 (5 self)
 Add to MetaCart
A satisfactory multiagent learning algorithm should, at a minimum, learn to play optimally against stationary opponents and converge to a Nash equilibrium in selfplay. The algorithm that has come closest, WoLFIGA, has been proven to have these two properties in 2player 2action repeated games— assuming that the opponent’s (mixed) strategy is observable. In this paper we present AWESOME, the first algorithm that is guaranteed to have these two properties in all repeated (finite) games. It requires only that the other players ’ actual actions (not their strategies) can be observed at each step. It also learns to play optimally against opponents that eventually become stationary. The basic idea behind AWESOME (Adapt When Everybody is Stationary, Otherwise Move to Equilibrium) is to try to adapt to the others’ strategies when they appear stationary, but otherwise to retreat to a precomputed equilibrium strategy. The techniques used to prove the properties of AWESOME are fundamentally different from those used for previous algorithms, and may help in analyzing other multiagent learning algorithms also.
Competitive online statistics
 International Statistical Review
, 1999
"... A radically new approach to statistical modelling, which combines mathematical techniques of Bayesian statistics with the philosophy of the theory of competitive online algorithms, has arisen over the last decade in computer science (to a large degree, under the influence of Dawid’s prequential sta ..."
Abstract

Cited by 96 (15 self)
 Add to MetaCart
(Show Context)
A radically new approach to statistical modelling, which combines mathematical techniques of Bayesian statistics with the philosophy of the theory of competitive online algorithms, has arisen over the last decade in computer science (to a large degree, under the influence of Dawid’s prequential statistics). In this approach, which we call “competitive online statistics”, it is not assumed that data are generated by some stochastic mechanism; the bounds derived for the performance of competitive online statistical procedures are guaranteed to hold (and not just hold with high probability or on the average). This paper reviews some results in this area; the new material in it includes the proofs for the performance of the Aggregating Algorithm in the problem of linear regression with square loss. Keywords: Bayes’s rule, competitive online algorithms, linear regression, prequential statistics, worstcase analysis.
Sequential prediction of individual sequences under general loss functions
 IEEE Trans. on Information Theory
, 1998
"... Abstract—We consider adaptive sequential prediction of arbitrary binary sequences when the performance is evaluated using a general loss function. The goal is to predict on each individual sequence nearly as well as the best prediction strategy in a given comparison class of (possibly adaptive) pre ..."
Abstract

Cited by 93 (8 self)
 Add to MetaCart
(Show Context)
Abstract—We consider adaptive sequential prediction of arbitrary binary sequences when the performance is evaluated using a general loss function. The goal is to predict on each individual sequence nearly as well as the best prediction strategy in a given comparison class of (possibly adaptive) prediction strategies, called experts. By using a general loss function, we generalize previous work on universal prediction, forecasting, and data compression. However, here we restrict ourselves to the case when the comparison class is finite. For a given sequence, we define the regret as the total loss on the entire sequence suffered by the adaptive sequential predictor, minus the total loss suffered by the predictor in the comparison class that performs best on that particular sequence. We show that for a large class of loss functions, the minimax regret is either (log N)