Results 1  10
of
149
SCHAPIRE: Adaptive game playing using multiplicative weights
 Games and Economic Behavior
, 1999
"... We present a simple algorithm for playing a repeated game. We show that a player using this algorithm suffers average loss that is guaranteed to come close to the minimum loss achievable by any fixed strategy. Our bounds are nonasymptotic and hold for any opponent. The algorithm, which uses the mult ..."
Abstract

Cited by 134 (14 self)
 Add to MetaCart
We present a simple algorithm for playing a repeated game. We show that a player using this algorithm suffers average loss that is guaranteed to come close to the minimum loss achievable by any fixed strategy. Our bounds are nonasymptotic and hold for any opponent. The algorithm, which uses the multiplicativeweight methods of Littlestone and Warmuth, is analyzed using the Kullback–Liebler divergence. This analysis yields a new, simple proof of the min–max theorem, as well as a provable method of approximately solving a game. A variant of our gameplaying algorithm is proved to be optimal in a very strong sense. Journal of Economic Literature
Regret in the Online Decision Problem
, 1999
"... At each point in time a decision maker must choose a decision. The payoff in a period from the decision chosen depends on the decision as well as the state of the world that obtains at that time. The difficulty is that the decision must be made in advance of any knowledge, even probabilistic, about ..."
Abstract

Cited by 115 (2 self)
 Add to MetaCart
At each point in time a decision maker must choose a decision. The payoff in a period from the decision chosen depends on the decision as well as the state of the world that obtains at that time. The difficulty is that the decision must be made in advance of any knowledge, even probabilistic, about which state of the world will obtain. A range of problems from a variety of disciplines can be framed in this way. In this
Using Confidence Bounds for ExploitationExploration Tradeoffs
 Journal of Machine Learning Research
, 2002
"... We show how a standard tool from statistics  namely confidence bounds  can be used to elegantly deal with situations which exhibit an exploitationexploration tradeo#. Our technique for designing and analyzing algorithms for such situations is general and can be applied when an algorithm h ..."
Abstract

Cited by 109 (2 self)
 Add to MetaCart
We show how a standard tool from statistics  namely confidence bounds  can be used to elegantly deal with situations which exhibit an exploitationexploration tradeo#. Our technique for designing and analyzing algorithms for such situations is general and can be applied when an algorithm has to make exploitationversusexploration decisions based on uncertain information provided by a random process.
Shopbots and Pricebots
, 1999
"... Shopbots are agents that automatically search the Internet to obtain information about prices and other attributes of goods and services. They herald a future in which autonomous agents profoundly influence electronic markets. In this study, a simple economic model is proposed and analyzed, which is ..."
Abstract

Cited by 89 (12 self)
 Add to MetaCart
Shopbots are agents that automatically search the Internet to obtain information about prices and other attributes of goods and services. They herald a future in which autonomous agents profoundly influence electronic markets. In this study, a simple economic model is proposed and analyzed, which is intended to quantify some of the likely impacts of a proliferation of shopbots and other economicallymotivated software agents. In addition, this paper reports on simulations of pricebots  adaptive, pricesetting agents which firms may well implement to combat, or even take advantage of, the growing community of shopbots. This study forms part of a larger research program that aims to provide insights into the impact of agent technology on the nascent information economy.
AWESOME: A general multiagent learning algorithm that converges in selfplay and learns a best response against stationary opponents
, 2003
"... A satisfactory multiagent learning algorithm should, at a minimum, learn to play optimally against stationary opponents and converge to a Nash equilibrium in selfplay. The algorithm that has come closest, WoLFIGA, has been proven to have these two properties in 2player 2action repeated games— as ..."
Abstract

Cited by 81 (5 self)
 Add to MetaCart
A satisfactory multiagent learning algorithm should, at a minimum, learn to play optimally against stationary opponents and converge to a Nash equilibrium in selfplay. The algorithm that has come closest, WoLFIGA, has been proven to have these two properties in 2player 2action repeated games— assuming that the opponent’s (mixed) strategy is observable. In this paper we present AWESOME, the first algorithm that is guaranteed to have these two properties in all repeated (finite) games. It requires only that the other players ’ actual actions (not their strategies) can be observed at each step. It also learns to play optimally against opponents that eventually become stationary. The basic idea behind AWESOME (Adapt When Everybody is Stationary, Otherwise Move to Equilibrium) is to try to adapt to the others’ strategies when they appear stationary, but otherwise to retreat to a precomputed equilibrium strategy. The techniques used to prove the properties of AWESOME are fundamentally different from those used for previous algorithms, and may help in analyzing other multiagent learning algorithms also.
Nearly tight bounds for the continuumarmed bandit problem
 Advances in Neural Information Processing Systems 17
, 2005
"... In the multiarmed bandit problem, an online algorithm must choose from a set of strategies in a sequence of n trials so as to minimize the total cost of the chosen strategies. While nearly tight upper and lower bounds are known in the case when the strategy set is finite, much less is known when th ..."
Abstract

Cited by 69 (4 self)
 Add to MetaCart
In the multiarmed bandit problem, an online algorithm must choose from a set of strategies in a sequence of n trials so as to minimize the total cost of the chosen strategies. While nearly tight upper and lower bounds are known in the case when the strategy set is finite, much less is known when there is an infinite strategy set. Here we consider the case when the set of strategies is a subset of R d, and the cost functions are continuous. In the d = 1 case, we improve on the bestknown upper and lower bounds, closing the gap to a sublogarithmic factor. We also consider the case where d> 1 and the cost functions are convex, adapting a recent online convex optimization algorithm of Zinkevich to the sparser feedback model of the multiarmed bandit problem. 1
Convergence and noregret in multiagent learning
 In Advances in Neural Information Processing Systems 17
, 2005
"... Learning in a multiagent system is a challenging problem due to two key factors. First, if other agents are simultaneously learning then the environment is no longer stationary, thus undermining convergence guarantees. Second, learning is often susceptible to deception, where the other agents may be ..."
Abstract

Cited by 66 (0 self)
 Add to MetaCart
Learning in a multiagent system is a challenging problem due to two key factors. First, if other agents are simultaneously learning then the environment is no longer stationary, thus undermining convergence guarantees. Second, learning is often susceptible to deception, where the other agents may be able to exploit a learner’s particular dynamics. In the worst case, this could result in poorer performance than if the agent was not learning at all. These challenges are identifiable in the two most common evaluation criteria for multiagent learning algorithms: convergence and regret. Algorithms focusing on convergence or regret in isolation are numerous. In this paper, we seek to address both criteria in a single algorithm by introducing GIGAWoLF, a learning algorithm for normalform games. We prove the algorithm guarantees at most zero average regret, while demonstrating the algorithm converges in many situations of selfplay. We prove convergence in a limited setting and give empirical results in a wider variety of situations. These results also suggest a third new learning criterion combining convergence and regret, which we call negative nonconvergence regret (NNR). 1
Competitive online statistics
 International Statistical Review
, 1999
"... A radically new approach to statistical modelling, which combines mathematical techniques of Bayesian statistics with the philosophy of the theory of competitive online algorithms, has arisen over the last decade in computer science (to a large degree, under the influence of Dawid’s prequential sta ..."
Abstract

Cited by 63 (10 self)
 Add to MetaCart
A radically new approach to statistical modelling, which combines mathematical techniques of Bayesian statistics with the philosophy of the theory of competitive online algorithms, has arisen over the last decade in computer science (to a large degree, under the influence of Dawid’s prequential statistics). In this approach, which we call “competitive online statistics”, it is not assumed that data are generated by some stochastic mechanism; the bounds derived for the performance of competitive online statistical procedures are guaranteed to hold (and not just hold with high probability or on the average). This paper reviews some results in this area; the new material in it includes the proofs for the performance of the Aggregating Algorithm in the problem of linear regression with square loss. Keywords: Bayes’s rule, competitive online algorithms, linear regression, prequential statistics, worstcase analysis.
Online Learning in Online Auctions
, 2003
"... ding truthfully and setting b i = v i . As shown in that paper, this condition # Department of Computer Science, Carnegie Mellon University, Pittsburgh, PA, Email: avrim@cs.cmu.edu + Strategic Planning and Optimization Team, Amazon.com, Seattle, WA, Email: vijayk@amazon.com # Department of Compute ..."
Abstract

Cited by 58 (5 self)
 Add to MetaCart
ding truthfully and setting b i = v i . As shown in that paper, this condition # Department of Computer Science, Carnegie Mellon University, Pittsburgh, PA, Email: avrim@cs.cmu.edu + Strategic Planning and Optimization Team, Amazon.com, Seattle, WA, Email: vijayk@amazon.com # Department of Computer Science, University of Texas at Austin, Austin, TX. This work was done while the author was at IBM India Research Lab, New Delhi, India. Email: atri@cs.utexas.edu Computer Science Division, University of California at Berkeley, Berkeley, CA, Email: felix@cs.berkeley.edu is equivalent to the condition that each s i depends only on the first i 1 bids, and not on the ith bid. Hence, the auction mechanism is essentially trying to guess the ith valuation, based on the first i 1 valuations. As in previous papers [3, 5, 6], we will use competitive analysis to analyze the performance of any given auction. Hence, we are interested in the worstcase ratio (over all sequences of valuations)