The nonstochastic multiarmed bandit problem
- SIAM Journal on Computing
, 2002
Cited by 204 (16 self)
In the multi-armed bandit problem, a gambler must decide which arm of K non-identical slot machines to play in a sequence of trials so as to maximize his reward. This classical problem has received much attention because of the simple model it provides of the trade-off between exploration (trying out each arm to find the best one) and exploitation (playing the arm believed to give the best payoff). Past solutions for the bandit problem have almost always relied on assumptions about the statistics of the slot machines. In this work, we make no statistical assumptions whatsoever about the nature of the process generating the payoffs of the slot machines. We give a solution to the bandit problem in which an adversary, rather than a well-behaved stochastic process, has complete control over the payoffs. In a sequence of T plays, we prove that the per-round payoff of our algorithm approaches that of the best arm at the rate $O(T^{-1/2})$. We show by a matching lower bound that this is best possible. We also prove that our algorithm approaches the per-round payoff of any set of strategies at a similar rate: if the best strategy is chosen from a pool of N strategies then our algorithm approaches the per-round payoff of the strategy at the rate $O((\log N)^{1/2} T^{-1/2})$. Finally, we apply our results to the problem of playing an unknown repeated matrix game. We show that our algorithm approaches the minimax payoff of the unknown game at the rate $O(T^{-1/2})$.
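The abstract above does not spell out the algorithm, so the following is only a minimal sketch of an exponential-weights bandit strategy in the spirit of what is commonly called Exp3. The function name `exp3`, the mixing parameter `gamma`, and the `reward_fn` callback are illustrative assumptions, and rewards are assumed to lie in [0, 1].

```python
import math
import random


def exp3(num_arms, num_rounds, reward_fn, gamma=0.1, rng=None):
    """Play num_rounds rounds and return (total reward, last distribution used).

    reward_fn(t, arm) is a hypothetical callback returning a reward in [0, 1];
    only the reward of the arm actually played is ever observed.
    """
    rng = rng or random.Random(0)
    weights = [1.0] * num_arms
    probs = [1.0 / num_arms] * num_arms
    total_reward = 0.0
    for t in range(num_rounds):
        total_w = sum(weights)
        # Mix the weight-proportional distribution with uniform exploration.
        probs = [(1 - gamma) * w / total_w + gamma / num_arms for w in weights]
        arm = rng.choices(range(num_arms), weights=probs)[0]
        reward = reward_fn(t, arm)
        total_reward += reward
        # Importance-weighted estimate: unbiased for the played arm's reward.
        estimate = reward / probs[arm]
        weights[arm] *= math.exp(gamma * estimate / num_arms)
        # Rescale so the largest weight is 1; this leaves probs unchanged
        # and guards against floating-point overflow on long runs.
        max_w = max(weights)
        weights = [w / max_w for w in weights]
    return total_reward, probs
```

Dividing the observed reward by the probability of the played arm keeps the estimate unbiased even though the other arms' payoffs are never seen, which is what lets the strategy work without statistical assumptions on the payoffs.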
Gambling in a rigged casino: The adversarial multi-armed bandit problem
, 1995
Cited by 144 (6 self)
In the multi-armed bandit problem, a gambler must decide which arm of K non-identical slot machines to play in a sequence of trials so as to maximize his reward. This classical problem has received much attention because of the simple model it provides of the trade-off between exploration (trying out each arm to find the best one) and exploitation (playing the arm believed to give the best payoff). Past solutions for the bandit problem have almost always relied on assumptions about the statistics of the slot machines. In this work, we make no statistical assumptions whatsoever about the nature of the process generating the payoffs of the slot machines. We give a solution to the bandit problem in which an adversary, rather than a well-behaved stochastic process, has complete control over the payoffs. In a sequence of T plays, we prove that the expected per-round payoff of our algorithm approaches that of the best arm at the rate $O(T^{-1/2})$, and we give an improved rate of conver...
Game Theory, On-line Prediction and Boosting
- In Proceedings of the Ninth Annual Conference on Computational Learning Theory
, 1996
Cited by 117 (13 self)
We study the close connections between game theory, on-line prediction and boosting. After a brief review of game theory, we describe an algorithm for learning to play repeated games based on the on-line prediction methods of Littlestone and Warmuth. The analysis of this algorithm yields a simple proof of von Neumann's famous minmax theorem, as well as a provable method of approximately solving a game. We then show that the on-line prediction model is obtained by applying this game-playing algorithm to an appropriate choice of game and that boosting is obtained by applying the same algorithm to the "dual" of this game.
1 INTRODUCTION
The purpose of this paper is to bring out the close connections between game theory, on-line prediction and boosting. Briefly, game theory is the study of games and other interactions of various sorts. On-line prediction is a learning model in which an agent predicts the classification of a sequence of items and attempts to minimize the total number of pre...
Adaptive Game Playing Using Multiplicative Weights
Cited by 106 (14 self)
this paper, we present a simple algorithm for solving this problem, and give a simple analysis of the algorithm. The bounds we obtain are not asymptotic and hold for any finite number of rounds. The algorithm and its analysis are based directly on the "on-line prediction" methods of Littlestone and Warmuth [24]. The analysis of this algorithm yields a new (as far as we know) and simple proof of von Neumann's minmax theorem, as well as a provable method of approximately solving a game. We also give more refined variants of the algorithm for this purpose, and we show that one of these is optimal in a very strong sense. The paper is organized as follows. In Section 2 we define the mathematical setup and notation. In Section 3 we introduce the basic multiplicative weights algorithm whose average performance is guaranteed to be almost as good as that of the best fixed mixed strategy. In Section 4 we outline the relationship between our work and some of the extensive existing work on the use of multiplicative weights algorithms for on-line prediction. In Section 5 we show how the algorithm can be used to give a simple proof of von Neumann's minmax theorem. In Section 6 we give a version of the algorithm whose distributions are guaranteed to converge to an optimal mixed strategy. We note the possible application of this algorithm to solving linear programming problems and reference other work that has used multiplicative weights to this end. Finally, in Section 7 we show that the convergence rate of the second version of the algorithm is asymptotically optimal.
2 Playing repeated games
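A multiplicative weights update for a repeated matrix game can be sketched as follows. This is a hedged illustration rather than the paper's exact procedure: the function name `multiplicative_weights_play`, the best-responding column opponent, and the particular tuning of `beta` are assumptions made for the example, and losses are assumed to lie in [0, 1].

```python
import math


def multiplicative_weights_play(loss_matrix, num_rounds):
    """Row player runs multiplicative weights against a best-responding column player.

    loss_matrix[i][j] is the row player's loss in [0, 1] when row i meets column j.
    Returns (average expected loss per round, last mixed strategy played).
    """
    n = len(loss_matrix)
    num_cols = len(loss_matrix[0])
    # Assumed tuning: beta shrinks towards 1 as the horizon grows.
    beta = 1.0 / (1.0 + math.sqrt(2.0 * math.log(n) / num_rounds))
    p = [1.0 / n] * n
    total_loss = 0.0
    for _ in range(num_rounds):
        # Expected loss of the current mixed strategy against each column.
        col_losses = [
            sum(p[i] * loss_matrix[i][j] for i in range(n)) for j in range(num_cols)
        ]
        # Illustrative adversary: the column that hurts the row player most.
        j = max(range(num_cols), key=lambda c: col_losses[c])
        total_loss += col_losses[j]
        # Multiplicative update: rows with higher loss are shrunk more.
        unnormalized = [p[i] * beta ** loss_matrix[i][j] for i in range(n)]
        z = sum(unnormalized)
        p_next = [u / z for u in unnormalized]
        last_played, p = p, p_next
    return total_loss / num_rounds, last_played
```

On a rock-paper-scissors loss matrix with entries 0, 0.5, and 1 (game value 0.5), the average loss of this sketch should settle just above 0.5, which matches the abstract's claim that the average performance is almost as good as that of the best fixed mixed strategy.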
Regret in the On-line Decision Problem
, 1999
Cited by 98 (2 self)
At each point in time a decision maker must choose a decision. The payoff in a period from the decision chosen depends on the decision as well as the state of the world that obtains at that time. The difficulty is that the decision must be made in advance of any knowledge, even probabilistic, about which state of the world will obtain. A range of problems from a variety of disciplines can be framed in this way. In this
A Game of Prediction with Expert Advice
- Journal of Computer and System Sciences
, 1997
Cited by 86 (6 self)
We consider the following problem. At each point of discrete time the learner must make a prediction; he is given the predictions made by a pool of experts. Each prediction and the outcome, which is disclosed after the learner has made his prediction, determine the incurred loss. It is known that, under weak regularity, the learner can ensure that his cumulative loss never exceeds $cL + a \ln n$, where c and a are some constants, n is the size of the pool, and L is the cumulative loss incurred by the best expert in the pool. We find the set of those pairs (c, a) for which this is true.
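To make a bound of the form cL + a ln n concrete, here is a minimal sketch of an exponentially weighted average forecaster under square loss. It is not the algorithm analysed in the paper; it is a standard textbook forecaster chosen for illustration. Assuming predictions and outcomes lie in [0, 1] and the learning rate `eta` is 1/2, its cumulative loss stays within L + 2 ln n, that is, the pair (c, a) = (1, 2). The function name and data layout are assumptions.

```python
import math


def exp_weighted_forecaster(expert_predictions, outcomes, eta=0.5):
    """Combine expert predictions by exponential weights under square loss.

    expert_predictions[t][i] is expert i's prediction in [0, 1] at round t;
    outcomes[t] is the outcome in [0, 1] revealed after the learner predicts.
    Returns (learner's cumulative loss, list of each expert's cumulative loss).
    """
    n = len(expert_predictions[0])
    log_weights = [0.0] * n  # log-domain weights avoid underflow on long runs
    learner_loss = 0.0
    expert_losses = [0.0] * n
    for preds, y in zip(expert_predictions, outcomes):
        shift = max(log_weights)
        w = [math.exp(lw - shift) for lw in log_weights]
        total = sum(w)
        # The learner predicts the weighted average of the experts.
        forecast = sum(wi * pi for wi, pi in zip(w, preds)) / total
        learner_loss += (forecast - y) ** 2
        # Only now is the outcome used, to penalise each expert by its loss.
        for i, pi in enumerate(preds):
            loss = (pi - y) ** 2
            expert_losses[i] += loss
            log_weights[i] -= eta * loss
    return learner_loss, expert_losses
```

The constant 2 comes from the fact that square loss on [0, 1] is exp-concave for learning rates up to 1/2, which gives a regret of at most (ln n) / eta.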
Shopbots and Pricebots
, 1999
Cited by 84 (11 self)
Shopbots are agents that automatically search the Internet to obtain information about prices and other attributes of goods and services. They herald a future in which autonomous agents profoundly influence electronic markets. In this study, a simple economic model is proposed and analyzed, which is intended to quantify some of the likely impacts of a proliferation of shopbots and other economically-motivated software agents. In addition, this paper reports on simulations of pricebots - adaptive, price-setting agents which firms may well implement to combat, or even take advantage of, the growing community of shopbots. This study forms part of a larger research program that aims to provide insights into the impact of agent technology on the nascent information economy.
On-line algorithms in machine learning
- In Fiat and Woeginger, eds., Online Algorithms: The State of the Art
, 1998
Cited by 46 (2 self)
The areas of On-Line Algorithms and Machine Learning are both concerned with problems of making decisions about the present based only on knowledge of the past. Although these areas differ in terms of their emphasis and the problems typically studied, there is a collection of results in Computational Learning Theory that fit nicely into the "on-line algorithms" framework. This survey article discusses some of the results, models, and open problems from Computational Learning Theory that seem particularly interesting from the point of view of on-line algorithms. The emphasis in this article is on describing some of the simpler, more intuitive results, whose proofs can be given in their entirety. Pointers to the literature are given for more sophisticated versions of these algorithms.
Asymptotic calibration
- Biometrika
, 1998
Cited by 45 (4 self)
Can we forecast the probability of an arbitrary sequence of events happening so that the stated probability of an event happening is close to its empirical probability? We can view this prediction problem as a game played against Nature, where at the beginning of the game Nature picks a data sequence and the forecaster picks a forecasting algorithm. If the forecaster is not allowed to randomize, then Nature wins; there will always be data for which the forecaster does poorly. This paper shows that, if the forecaster can randomize, the forecaster wins in the sense that the forecasted probabilities and the empirical probabilities can be made arbitrarily close to each other.
Universal Portfolios With and Without Transaction Costs
- Machine Learning
, 1997
Cited by 44 (3 self)
A constant rebalanced portfolio is an investment strategy which keeps the same distribution of wealth among a set of stocks from period to period. Recently there has been work on on-line investment strategies that are competitive with the best constant rebalanced portfolio determined in hindsight (Cover, 1991; Helmbold et al., 1996; Cover and Ordentlich, 1996a; Cover and Ordentlich, 1996b; Ordentlich and Cover, 1996; Cover, 1996). For the universal algorithm of Cover (Cover, 1991), we provide a simple analysis which naturally extends to the case of a fixed percentage transaction cost (commission), answering a question raised in (Cover, 1991; Helmbold et al., 1996; Cover and Ordentlich, 1996a; Cover and Ordentlich, 1996b; Ordentlich and Cover, 1996; Cover, 1996). In addition, we present a simple randomized implementation that is significantly faster in practice. We conclude by explaining how these algorithms can be applied to other problems, such as combining the predictions of statis...
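The definition of a constant rebalanced portfolio can be illustrated with a short sketch. The function names, the two-stock restriction in `universal_wealth_two_stocks`, and the grid approximation are assumptions for illustration; this is neither the paper's analysis nor its randomized implementation, and transaction costs are ignored.

```python
def crp_wealth(portfolio, price_relatives):
    """Wealth multiple achieved by rebalancing to `portfolio` every period.

    portfolio[i] is the fraction of wealth held in stock i (fractions sum to 1);
    price_relatives[t][i] is stock i's closing/opening price ratio in period t.
    """
    wealth = 1.0
    for x in price_relatives:
        # Rebalancing means each period's growth is the same weighted sum.
        wealth *= sum(b * xi for b, xi in zip(portfolio, x))
    return wealth


def universal_wealth_two_stocks(price_relatives, grid_points=1001):
    """Grid estimate of the universal portfolio's wealth for two stocks.

    Under a uniform prior, the universal portfolio's wealth equals the average
    wealth over all constant rebalanced portfolios; a grid over the fraction
    held in the second stock approximates that average.
    """
    total = 0.0
    for k in range(grid_points):
        b = k / (grid_points - 1)
        total += crp_wealth([1.0 - b, b], price_relatives)
    return total / grid_points
```

For example, if one stock stays flat while the other alternately doubles and halves, `crp_wealth([0.5, 0.5], [[1.0, 2.0], [1.0, 0.5]])` gives 1.5 × 0.75 = 1.125, whereas holding either stock alone ends at exactly 1.0.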

