Results 1-10 of 90
The nonstochastic multiarmed bandit problem
SIAM Journal on Computing, 2002
Cited by 316 (27 self)
In the multiarmed bandit problem, a gambler must decide which arm of $K$ nonidentical slot machines to play in a sequence of trials so as to maximize his reward. This classical problem has received much attention because of the simple model it provides of the tradeoff between exploration (trying out each arm to find the best one) and exploitation (playing the arm believed to give the best payoff). Past solutions for the bandit problem have almost always relied on assumptions about the statistics of the slot machines. In this work, we make no statistical assumptions whatsoever about the nature of the process generating the payoffs of the slot machines. We give a solution to the bandit problem in which an adversary, rather than a well-behaved stochastic process, has complete control over the payoffs. In a sequence of $T$ plays, we prove that the per-round payoff of our algorithm approaches that of the best arm at the rate $O(T^{-1/2})$. We show by a matching lower bound that this is best possible. We also prove that our algorithm approaches the per-round payoff of any set of strategies at a similar rate: if the best strategy is chosen from a pool of $N$ strategies, then our algorithm approaches the per-round payoff of the strategy at the rate $O((\log N)^{1/2} T^{-1/2})$. Finally, we apply our results to the problem of playing an unknown repeated matrix game. We show that our algorithm approaches the minimax payoff of the unknown game at the rate $O(T^{-1/2})$.
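The abstract above does not spell out the algorithm, so the following is only a rough sketch of an exponential-weights bandit strategy in the commonly cited Exp3 form, not the paper's exact procedure. The function name `exp3`, the callback `reward_fn`, the exploration parameter `gamma`, and the fixed random seed are all assumptions made for illustration. Rewards are assumed to lie in [0, 1].

```python
import math
import random


def exp3(reward_fn, num_arms, num_rounds, gamma=0.1, rng=None):
    """Sketch of an Exp3-style bandit strategy (illustrative, not the paper's code).

    reward_fn(t, arm) is a hypothetical callback returning a reward in [0, 1];
    only the reward of the arm actually played is ever observed.
    """
    rng = rng or random.Random(0)  # fixed seed is an assumption, for repeatability
    weights = [1.0] * num_arms
    total_reward = 0.0
    for t in range(num_rounds):
        total_weight = sum(weights)
        # Mix the weight-proportional distribution with uniform exploration.
        probs = [(1.0 - gamma) * w / total_weight + gamma / num_arms
                 for w in weights]
        arm = rng.choices(range(num_arms), weights=probs)[0]
        reward = reward_fn(t, arm)
        total_reward += reward
        # Importance-weighted estimate keeps the reward estimate unbiased.
        estimate = reward / probs[arm]
        weights[arm] *= math.exp(gamma * estimate / num_arms)
        # Rescale so the largest weight is 1; this avoids overflow and
        # leaves the sampling distribution unchanged.
        largest = max(weights)
        weights = [w / largest for w in weights]
    return total_reward
```

For example, `exp3(lambda t, arm: 1.0 if arm == 2 else 0.0, num_arms=3, num_rounds=2000)` should concentrate most of its plays on arm 2 after an initial exploration phase.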
Online Convex Programming and Generalized Infinitesimal Gradient Ascent
2003
Cited by 183 (4 self)
Convex programming involves a convex set $F \subseteq \mathbb{R}^n$ and a convex function $c : F \to \mathbb{R}$. The goal of convex programming is to find a point in $F$ which minimizes $c$. In this paper, we introduce online convex programming. In online convex programming, the convex set is known in advance, but in each step of some repeated optimization problem, one must select a point in $F$ before seeing the cost function for that step. This can be used to model factory production, farm production, and many other industrial optimization problems where one is unaware of the value of the items produced until they have already been constructed. We introduce an algorithm for this domain, apply it to repeated games, and show that it is really a generalization of infinitesimal gradient ascent, and the results here imply that generalized infinitesimal gradient ascent (GIGA) is universally consistent.
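The protocol described in this abstract (commit to a point, then see the cost) can be illustrated with a projected gradient step. This is a minimal sketch under stated assumptions, not the paper's algorithm as published: the feasible set is taken to be a box, the step size is assumed to be 1/sqrt(t), and the names `project_onto_box` and `online_gradient_descent` are hypothetical.

```python
import math


def project_onto_box(point, lower, upper):
    """Euclidean projection onto the box [lower, upper]^n (assumed feasible set)."""
    return [min(max(x, lower), upper) for x in point]


def online_gradient_descent(gradient_fns, start, lower=-1.0, upper=1.0):
    """Sketch of projected online gradient descent.

    gradient_fns is a sequence of hypothetical callbacks; the t-th one
    returns the gradient of the t-th cost function at a given point and is
    consulted only after the point for that round has been committed.
    """
    point = project_onto_box(start, lower, upper)
    played = []
    for t, gradient_fn in enumerate(gradient_fns, start=1):
        played.append(point)           # commit before seeing the cost
        step = 1.0 / math.sqrt(t)      # assumed decaying step size
        gradient = gradient_fn(point)  # cost information revealed afterwards
        moved = [x - step * g for x, g in zip(point, gradient)]
        point = project_onto_box(moved, lower, upper)
    return played
```

With a fixed cost such as `(x - 0.5) ** 2` in every round, the played points settle near 0.5; the interesting case is when the cost changes from round to round.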
Efficient Algorithms for Online Decision Problems
J. Comput. Syst. Sci., 2003
Cited by 136 (3 self)
In an online decision problem, one makes a sequence of decisions without knowledge of the future. Tools from learning such as Weighted Majority and its many variants [13, 18, 4] demonstrate that online algorithms can perform nearly as well as the best single decision chosen in hindsight, even when there are exponentially many possible decisions. However, the naive application of these algorithms is inefficient for such large problems. For some problems with nice structure, specialized efficient solutions have been developed [10, 16, 17, 6, 3].
Shopbots and Pricebots
1999
Cited by 89 (12 self)
Shopbots are agents that automatically search the Internet to obtain information about prices and other attributes of goods and services. They herald a future in which autonomous agents profoundly influence electronic markets. In this study, a simple economic model is proposed and analyzed, which is intended to quantify some of the likely impacts of a proliferation of shopbots and other economically motivated software agents. In addition, this paper reports on simulations of pricebots (adaptive, price-setting agents which firms may well implement to combat, or even take advantage of, the growing community of shopbots). This study forms part of a larger research program that aims to provide insights into the impact of agent technology on the nascent information economy.
Correlated Q-learning
In Proceedings of the Twentieth International Conference on Machine Learning, 2003
Cited by 56 (2 self)
There have been several attempts to design multiagent Q-learning algorithms capable of learning equilibrium policies in general-sum Markov games, just as Q-learning learns optimal policies in Markov decision processes. We introduce correlated Q-learning, one such algorithm based on the correlated equilibrium solution concept. Motivated by a fixed-point proof of the existence of stationary correlated equilibrium policies in Markov games, we present a generic multiagent Q-learning algorithm of which many popular algorithms are immediate special cases. We also prove that certain variants of correlated (and Nash) Q-learning are guaranteed to converge to stationary correlated (and Nash) equilibrium policies in two special classes of Markov games, namely zero-sum and common-interest. Finally, we show empirically that correlated Q-learning outperforms Nash Q-learning, further justifying the former beyond noting that it is less computationally expensive than the latter.
Computing Equilibria in Multi-Player Games
In Proceedings of the Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), 2004
Cited by 53 (3 self)
We initiate the systematic study of algorithmic issues involved in finding equilibria (Nash and correlated) in games with a large number of players; such games, in order to be computationally meaningful, must be presented in some succinct, game-specific way. We develop a general framework for obtaining polynomial-time algorithms for optimizing over correlated equilibria in such settings, and show how it can be applied successfully to symmetric games (for which we actually find an exact polytopal characterization), graphical games, and congestion games, among others. We also present complexity results implying that such algorithms are not possible in certain other such games. Finally, we present a polynomial-time algorithm, based on quantifier elimination, for finding a Nash equilibrium in symmetric games when the number of strategies is relatively small.
The multiplicative weights update method: a meta algorithm and applications
2005
Cited by 53 (10 self)
Algorithms in varied fields use the idea of maintaining a distribution over a certain set and use the multiplicative update rule to iteratively change these weights. Their analyses are usually very similar and rely on an exponential potential function. We present a simple meta algorithm that unifies these disparate algorithms and derives them as simple instantiations of the meta algorithm.
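The idea of keeping a distribution over a set and changing the weights multiplicatively can be sketched as follows. This is an illustrative version only: the abstract does not state the rule, so the update `w * (1 - eta * cost)`, the cost range [0, 1], the default `eta=0.5`, and the name `multiplicative_weights` are all assumptions.

```python
def multiplicative_weights(cost_rounds, num_options, eta=0.5):
    """Sketch of a multiplicative weights update (illustrative assumptions).

    cost_rounds is a sequence of cost vectors, one per round, with each
    cost assumed to lie in [0, 1]. Returns the final distribution and the
    total expected cost paid when sampling from the distribution each round.
    """
    weights = [1.0] * num_options
    expected_cost = 0.0
    for costs in cost_rounds:
        total = sum(weights)
        probs = [w / total for w in weights]
        # Cost incurred in expectation by sampling an option from probs.
        expected_cost += sum(p * c for p, c in zip(probs, costs))
        # Shrink each weight in proportion to the cost it just incurred.
        weights = [w * (1.0 - eta * c) for w, c in zip(weights, costs)]
    total = sum(weights)
    return [w / total for w in weights], expected_cost
```

Calling `multiplicative_weights([[1.0, 0.0, 1.0]] * 20, num_options=3)` shifts nearly all probability onto the middle option, which never incurs a cost.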
Correlated-Q learning
In NIPS Workshop on Multiagent Learning, 2002
Cited by 44 (1 self)
Bowling named two desiderata for multiagent learning algorithms: rationality and convergence. This paper introduces correlated-Q learning, a natural generalization of Nash-Q and FF-Q that satisfies these criteria. Nash-Q satisfies rationality, but in general it does not converge. FF-Q satisfies convergence, but in general it is not rational. Correlated-Q satisfies rationality by construction. This paper demonstrates the empirical convergence of correlated-Q on a standard testbed of general-sum Markov games.