Results 1-10 of 248
Online Convex Programming and Generalized Infinitesimal Gradient Ascent
2003
Cited by 195 (4 self)
Convex programming involves a convex set F ⊆ R^n and a convex function c : F → R. The goal of convex programming is to find a point in F which minimizes c. In this paper, we introduce online convex programming. In online convex programming, the convex set is known in advance, but in each step of some repeated optimization problem, one must select a point in F before seeing the cost function for that step. This can be used to model factory production, farm production, and many other industrial optimization problems where one is unaware of the value of the items produced until they have already been constructed. We introduce an algorithm for this domain, apply it to repeated games, and show that it is really a generalization of infinitesimal gradient ascent, and the results here imply that generalized infinitesimal gradient ascent (GIGA) is universally consistent.
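The online setting described in this abstract can be illustrated with a minimal sketch of projected online gradient descent. The abstract does not spell out the algorithm, so everything here is an assumption made for illustration: the feasible set is taken to be a box, the step size is 1/sqrt(t), and the names `project_box`, `online_gradient_descent`, and `make_grad` are hypothetical.

```python
import math

def project_box(x, lo, hi):
    # Euclidean projection onto the box [lo, hi]^n (assumed feasible set F).
    return [min(max(xi, lo), hi) for xi in x]

def online_gradient_descent(grads, x0, lo=0.0, hi=1.0):
    """Play one point per round, then take a projected gradient step.

    grads[t](x) returns the gradient of the round's cost function at x;
    the cost is only revealed after the point for that round is committed.
    """
    x = list(x0)
    played = []
    for t, grad in enumerate(grads, start=1):
        played.append(list(x))        # commit to x before seeing the cost
        g = grad(x)                   # cost function is revealed
        eta = 1.0 / math.sqrt(t)      # assumed decaying step size
        x = project_box([xi - eta * gi for xi, gi in zip(x, g)], lo, hi)
    return played

def make_grad(target):
    # Gradient of the hypothetical cost c(x) = ||x - target||^2.
    return lambda x: [2.0 * (xi - ti) for xi, ti in zip(x, target)]

# Costs alternate between two targets; the learner never sees them in advance.
targets = [[0.2], [0.8]] * 50
played = online_gradient_descent([make_grad(t) for t in targets], [0.0])
```

Every played point stays inside the box because of the projection step, whatever sequence of cost functions is supplied.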
Regret in the Online Decision Problem
1999
Cited by 116 (2 self)
At each point in time a decision maker must choose a decision. The payoff in a period from the decision chosen depends on the decision as well as the state of the world that obtains at that time. The difficulty is that the decision must be made in advance of any knowledge, even probabilistic, about which state of the world will obtain. A range of problems from a variety of disciplines can be framed in this way. In this
Shopbots and Pricebots
1999
Cited by 96 (12 self)
Shopbots are agents that automatically search the Internet to obtain information about prices and other attributes of goods and services. They herald a future in which autonomous agents profoundly influence electronic markets. In this study, a simple economic model is proposed and analyzed, which is intended to quantify some of the likely impacts of a proliferation of shopbots and other economically motivated software agents. In addition, this paper reports on simulations of pricebots (adaptive, price-setting agents which firms may well implement to combat, or even take advantage of, the growing community of shopbots). This study forms part of a larger research program that aims to provide insights into the impact of agent technology on the nascent information economy.
AWESOME: A general multiagent learning algorithm that converges in self-play and learns a best response against stationary opponents
2003
Cited by 91 (5 self)
A satisfactory multiagent learning algorithm should, at a minimum, learn to play optimally against stationary opponents and converge to a Nash equilibrium in self-play. The algorithm that has come closest, WoLF-IGA, has been proven to have these two properties in 2-player 2-action repeated games, assuming that the opponent's (mixed) strategy is observable. In this paper we present AWESOME, the first algorithm that is guaranteed to have these two properties in all repeated (finite) games. It requires only that the other players' actual actions (not their strategies) can be observed at each step. It also learns to play optimally against opponents that eventually become stationary. The basic idea behind AWESOME (Adapt When Everybody is Stationary, Otherwise Move to Equilibrium) is to try to adapt to the others' strategies when they appear stationary, but otherwise to retreat to a precomputed equilibrium strategy. The techniques used to prove the properties of AWESOME are fundamentally different from those used for previous algorithms, and may help in analyzing other multiagent learning algorithms also.
Calibrated Learning and Correlated Equilibrium
Games and Economic Behavior, 1996
Cited by 88 (5 self)
Suppose two players meet each other in a repeated game where: 1. each uses a learning rule with the property that it is a calibrated forecast of the other's plays, and 2. each plays a best response to this forecast distribution.
A general class of adaptive strategies
Journal of Economic Theory
Cited by 81 (4 self)
We exhibit and characterize an entire class of simple adaptive strategies, in the repeated play of a game, having the Hannan-consistency property: In the long run, the player is guaranteed an average payoff as large as the best-reply payoff to the empirical distribution of play of the other players; i.e., there is no “regret.” Smooth fictitious play (Fudenberg and Levine [1995]) and regret-matching (Hart and Mas-Colell [2000]) are particular cases. The motivation and application of the current paper come from the study of procedures whose empirical distribution of play is, in the long run, (almost) a correlated equilibrium. For the analysis we first develop a generalization of Blackwell’s [1956a] approachability strategy for games with vector payoffs.
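As a rough illustration of one of the particular cases named above, the following is a minimal sketch of an unconditional regret-matching rule: each action is played with probability proportional to its positive cumulative regret. It assumes full-information feedback (the payoff every action would have earned is observed each round), and the function names `regret_matching_strategy`, `update_regret`, and `run` are hypothetical. It is not the general class of strategies characterized in the paper.

```python
import random

def regret_matching_strategy(cum_regret):
    # Mix over actions in proportion to positive cumulative regret;
    # fall back to the uniform distribution when no regret is positive.
    positive = [max(r, 0.0) for r in cum_regret]
    total = sum(positive)
    n = len(cum_regret)
    if total <= 0.0:
        return [1.0 / n] * n
    return [p / total for p in positive]

def update_regret(cum_regret, payoffs, played):
    # payoffs[a] is what action a would have earned this round (assumed observable).
    return [r + payoffs[a] - payoffs[played] for a, r in enumerate(cum_regret)]

def run(payoff_rounds, n_actions, seed=0):
    # Repeatedly sample an action from the current mix, then update regrets.
    rng = random.Random(seed)
    cum_regret = [0.0] * n_actions
    for payoffs in payoff_rounds:
        strategy = regret_matching_strategy(cum_regret)
        played = rng.choices(range(n_actions), weights=strategy)[0]
        cum_regret = update_regret(cum_regret, payoffs, played)
    return cum_regret
```

For instance, cumulative regrets of 3, -1, and 1 give a mix that puts weight only on the first and third actions, in a 3:1 ratio.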
Convergence and no-regret in multiagent learning
In Advances in Neural Information Processing Systems 17, 2005
Cited by 71 (0 self)
Learning in a multiagent system is a challenging problem due to two key factors. First, if other agents are simultaneously learning then the environment is no longer stationary, thus undermining convergence guarantees. Second, learning is often susceptible to deception, where the other agents may be able to exploit a learner’s particular dynamics. In the worst case, this could result in poorer performance than if the agent was not learning at all. These challenges are identifiable in the two most common evaluation criteria for multiagent learning algorithms: convergence and regret. Algorithms focusing on convergence or regret in isolation are numerous. In this paper, we seek to address both criteria in a single algorithm by introducing GIGA-WoLF, a learning algorithm for normal-form games. We prove the algorithm guarantees at most zero average regret, while demonstrating the algorithm converges in many situations of self-play. We prove convergence in a limited setting and give empirical results in a wider variety of situations. These results also suggest a third new learning criterion combining convergence and regret, which we call negative non-convergence regret (NNR).
Uncoupled dynamics do not lead to Nash equilibrium
Amer. Econ. Rev., 2003
Cited by 63 (4 self)
It is notoriously difficult to formulate sensible adaptive dynamics that guarantee convergence to Nash equilibrium. In fact, short of variants of exhaustive search (deterministic or stochastic), there are no general results; of course, there are many important, interesting and well-studied particular cases. See the books