Improved second-order bounds for prediction with expert advice
- In COLT, 2005
Cited by 31 (6 self)
This work studies external regret in sequential prediction games with both positive and negative payoffs. External regret measures the difference between the payoff obtained by the forecasting strategy and the payoff of the best action. In this setting, we derive new and sharper regret bounds for the well-known exponentially weighted average forecaster and for a new forecaster with a different multiplicative update rule. Our analysis has two main advantages: first, no preliminary knowledge about the payoff sequence is needed, not even its range; second, our bounds are expressed in terms of sums of squared payoffs, replacing larger first-order quantities appearing in previous bounds. In addition, our most refined bounds have the natural and desirable property of being stable under rescalings and general translations of the payoff sequence.
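To make the exponentially weighted average forecaster and the notion of external regret concrete, here is a minimal Python sketch. It assumes a fixed learning rate `eta` supplied by the caller, which is a simplification: the abstract's point is precisely that the rate can be tuned without prior knowledge of the payoffs, and that tuning is not reproduced here. The function name and argument layout are illustrative, not taken from the paper.

```python
import math


def exp_weights_regret(payoffs, eta):
    """External regret of an exponentially weighted average forecaster.

    payoffs: list of rounds, each a list of per-action payoffs
             (positive or negative).
    eta:     fixed learning rate (an assumption of this sketch).
    """
    n_actions = len(payoffs[0])
    cumulative = [0.0] * n_actions  # cumulative payoff of each action
    forecaster_payoff = 0.0         # expected cumulative payoff of the forecaster

    for round_payoffs in payoffs:
        # Weights are proportional to exp(eta * cumulative payoff).
        # Subtracting the maximum keeps the exponentials numerically stable.
        top = max(cumulative)
        weights = [math.exp(eta * (c - top)) for c in cumulative]
        total = sum(weights)
        probs = [w / total for w in weights]

        # Expected payoff of playing the mixed action `probs` this round.
        forecaster_payoff += sum(p * x for p, x in zip(probs, round_payoffs))
        cumulative = [c + x for c, x in zip(cumulative, round_payoffs)]

    # Regret: payoff of the best single action minus the forecaster's payoff.
    return max(cumulative) - forecaster_payoff


# Two actions; the first always pays +1, the second always pays -1.
rounds = [[1.0, -1.0]] * 50
regret = exp_weights_regret(rounds, eta=0.5)
```

With `eta = 0` the forecaster stays uniform and never learns, so in this toy example its regret grows linearly with the number of rounds, while any positive rate concentrates the weights on the first action.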
Regret minimization under partial monitoring
- Mathematics of Operations Research, 2004
Cited by 24 (5 self)
We consider repeated games in which the player, instead of observing the action chosen by the opponent in each game round, receives a feedback generated by the combined choice of the two players. We study Hannan consistent players for these games; that is, randomized playing strategies whose per-round regret vanishes with probability one as the number n of game rounds goes to infinity. We prove a general lower bound of Ω(n^{-1/3}) on the convergence rate of the regret, and exhibit a specific strategy that attains this rate on any game for which a Hannan consistent player exists.
Global Nash convergence of Foster and Young’s regret testing
- Games and Economic Behavior, 2007
Cited by 12 (0 self)
We construct an uncoupled randomized strategy of repeated play such that, if every player plays according to it, mixed action profiles converge almost surely to a Nash equilibrium of the stage game. The strategy requires very little in terms of information about the game, as players' actions are based only on their own past payoffs. Moreover, in a variant of the procedure, players need not know that there are other players in the game and that payoffs are determined through other players' actions. The procedure works for finite generic games and is based on appropriate modifications of a simple stochastic learning rule introduced by Foster and Young [12].
Keywords: Regret testing; Regret-based learning; Random search; Stochastic dynamics; Uncoupled dynamics; Global convergence to
The communication complexity of uncoupled Nash equilibrium procedures
- Games and Economic Behavior, 2006
Cited by 11 (1 self)
We study the question of how long it takes players to reach a Nash equilibrium in uncoupled setups, where each player initially knows only his own payoff function. We derive lower bounds on the communication complexity of reaching a Nash equilibrium, i.e., on the number of bits that need to be transmitted, and thus also on the required number of steps. Specifically, we show lower bounds that are exponential in the number of players in each one of the following cases: (1) reaching a pure Nash equilibrium; (2) reaching a pure Nash equilibrium in a Bayesian setting; and (3) reaching a mixed Nash equilibrium. We then show that, in contrast, the communication complexity of reaching a correlated equilibrium is polynomial in the number of players.
A Parameter-free Hedging Algorithm
Cited by 4 (2 self)
We study the problem of decision-theoretic online learning (DTOL). Motivated by practical applications, we focus on DTOL when the number of actions is very large. Previous algorithms for learning in this framework have a tunable learning rate parameter, and a barrier to using online learning in practical applications is that it is not understood how to set this parameter optimally, particularly when the number of actions is large. In this paper, we offer a clean solution by proposing a novel and completely parameter-free algorithm for DTOL. We introduce a new notion of regret, which is more natural for applications with a large number of actions. We show that our algorithm achieves good performance with respect to this new notion of regret; in addition, it also achieves performance close to that of the best bounds achieved by previous algorithms with optimally tuned parameters, according to previous notions of regret.
Recognition Tasks are Imitation Games
Cited by 1 (1 self)
There is a need for more formal specification of recognition tasks. Currently, it is common to use labeled training samples to illustrate the task to be performed. The mathematical theory of games may provide more formal and complete definitions for recognition tasks.
More Efficient Internal-Regret-Minimizing Algorithms
Standard no-internal-regret (NIR) algorithms compute a fixed point of a matrix, and hence typically require O(n^3) run time per round of learning, where n is the dimensionality of the matrix. The main contribution of this paper is a novel NIR algorithm, which is a simple and straightforward variant of a standard NIR algorithm. However, rather than compute a fixed point every round, our algorithm relies on power iteration to estimate a fixed point, and hence runs in O(n^2) time per round. Nonetheless, it is not enough to look only at the per-round run time of an online learning algorithm. One must also consider the algorithm's convergence rate. It turns out that the convergence rate of the aforementioned algorithm is slower than desired. This observation motivates our second contribution, which is an analysis of a multithreaded NIR algorithm that trades off between its run time per round of learning and its convergence rate.
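The cost difference mentioned in the abstract can be illustrated with a generic power-iteration sketch in Python. It assumes the matrix is row-stochastic and that the fixed point sought is a probability vector q with q = qP; the function names and the choice of starting vector are illustrative and are not taken from the paper, whose algorithm is not reproduced here. Each step is a single vector-matrix product, which costs O(n^2), whereas solving for the fixed point exactly typically costs O(n^3).

```python
def power_iteration_step(P, q):
    """One step q -> qP for a row-stochastic matrix P; costs O(n^2)."""
    n = len(P)
    return [sum(q[i] * P[i][j] for i in range(n)) for j in range(n)]


def estimate_fixed_point(P, iterations=100):
    """Approximate a fixed point q = qP by repeated multiplication.

    Starts from the uniform distribution (an assumption of this sketch).
    """
    n = len(P)
    q = [1.0 / n] * n
    for _ in range(iterations):
        q = power_iteration_step(P, q)
    return q


# Small row-stochastic example; its exact fixed point is (5/6, 1/6).
P = [[0.9, 0.1],
     [0.5, 0.5]]
q = estimate_fixed_point(P)
```

Running only a few iterations per round keeps each round cheap but leaves the estimate approximate, which is one way to picture the trade-off between per-round run time and convergence rate that the abstract describes.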

