Results 1–10 of 302,626
Near-Optimal No-Regret Algorithms for Zero-Sum Games
"... We propose a new no-regret learning algorithm. When used against an adversary, our algorithm achieves average regret that scales as O(1/√T) with the number T of rounds. This regret bound is optimal but not rare, as there are a multitude of learning algorithms with this regret guarantee. However ..."
Abstract

Cited by 2 (0 self)
... However, when our algorithm is used by both players of a zero-sum game, their average regret scales as O(ln T / T), guaranteeing a near-linear rate of convergence to the value of the game. This represents an almost-quadratic improvement on the rate of convergence to the value of a game known
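The self-play phenomenon this abstract describes can be illustrated with the standard multiplicative-weights (Hedge) dynamic, the classic O(1/√T) baseline the snippet refers to: when both players run it, the time-averaged strategies approach the value of the game. A minimal sketch on matching pennies; the function name, step size, and horizon below are illustrative choices, not the paper's algorithm:

```python
import numpy as np

def hedge_selfplay(A, T=10000, eta=0.05):
    """Both players of the zero-sum game with payoff matrix A (row player
    maximizes, column player minimizes) run the multiplicative-weights
    (Hedge) update against each other; the time-averaged mixed strategies
    approach a minimax equilibrium, so their expected payoff approaches
    the value of the game."""
    n, m = A.shape
    # start away from the equilibrium so the averaging effect is visible
    w_row, w_col = np.linspace(1.0, 2.0, n), np.linspace(1.0, 2.0, m)
    x_avg, y_avg = np.zeros(n), np.zeros(m)
    for _ in range(T):
        x, y = w_row / w_row.sum(), w_col / w_col.sum()
        x_avg += x / T
        y_avg += y / T
        w_row *= np.exp(eta * (A @ y))      # row player: higher payoff -> more weight
        w_col *= np.exp(-eta * (A.T @ x))   # column player: lower payoff -> more weight
        w_row /= w_row.sum()                # renormalise to avoid overflow
        w_col /= w_col.sum()
    return x_avg, y_avg

# Matching pennies: value 0, both players mix 50/50 at equilibrium.
A = np.array([[1.0, -1.0], [-1.0, 1.0]])
x_bar, y_bar = hedge_selfplay(A)
print(x_bar, y_bar)  # both close to the 50/50 equilibrium mix
```

Averaging the iterates is the important step: the per-round strategies cycle around the equilibrium, while the averages converge at a rate governed by the players' regret bounds.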
... to No-Regret Online Learning
"... Sequential prediction problems such as imitation learning, where future observations depend on previous predictions (actions), violate the common i.i.d. assumptions made in statistical learning. This leads to poor performance in theory and often in practice. Some recent approaches (Daumé III et al., ..."
Abstract

Cited by 1 (1 self)
policy, that can be seen as a no-regret algorithm in an online learning setting.
R-MAX – A General Polynomial Time Algorithm for Near-Optimal Reinforcement Learning
, 2001
"... R-max is a very simple model-based reinforcement learning algorithm which can attain near-optimal average reward in polynomial time. In R-max, the agent always maintains a complete, but possibly inaccurate model of its environment and acts based on the optimal policy derived from this model. The mod ..."
Abstract

Cited by 294 (10 self)
's E algorithm, covering zero-sum stochastic games. (2) It has a built-in mechanism for resolving the exploration vs...
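The "complete but possibly inaccurate model" idea in this abstract can be sketched compactly: unvisited state-action pairs are treated optimistically as if they yield the maximal reward forever, which drives exploration. A toy sketch under stated simplifications: an optimistic self-loop stands in for the paper's fictitious absorbing R-max state, the known-ness threshold is 1 (deterministic environment), and the two-state chain environment is invented for illustration:

```python
import numpy as np

def rmax(env_step, n_states, n_actions, r_max=1.0, m=1,
         gamma=0.95, steps=50, s0=0):
    """Minimal R-max-style sketch for a deterministic MDP.
    Unknown (state, action) pairs are modelled optimistically: reward
    r_max and a self-loop, standing in for a fictitious absorbing
    max-reward state. The agent plans greedily in this optimistic model."""
    N = np.zeros((n_states, n_actions), dtype=int)   # visit counts
    R = np.zeros((n_states, n_actions))              # empirical rewards
    P = np.zeros((n_states, n_actions, n_states))    # empirical transitions
    s = s0
    for _ in range(steps):
        # build the optimistic model: known pairs use the empirical model
        Rhat = np.where(N >= m, R, r_max)
        Phat = P.copy()
        for i in range(n_states):
            for a in range(n_actions):
                if N[i, a] < m:
                    Phat[i, a] = np.eye(n_states)[i]   # optimistic self-loop
        # value iteration on the optimistic model
        V = np.zeros(n_states)
        for _ in range(300):
            Q = Rhat + gamma * (Phat @ V)
            V = Q.max(axis=1)
        a = int(np.argmax(Q[s]))                # act greedily, optimistically
        s2, r = env_step(s, a)
        if N[s, a] < m:                         # record the observation
            N[s, a] += 1
            R[s, a] = r
            P[s, a, s2] = 1.0
        s = s2
    return Q

# Toy deterministic chain: action 1 moves 0 -> 1, action 0 stays put;
# staying in state 1 pays reward 1, everything else pays 0.
def env_step(s, a):
    if s == 0:
        return (1, 0.0) if a == 1 else (0, 0.0)
    return (1, 1.0) if a == 0 else (0, 0.0)

Q = rmax(env_step, n_states=2, n_actions=2)
policy = Q.argmax(axis=1)   # greedy policy: go to state 1, then stay
```

The optimism is what forces the agent to try (0, 1) and reach the rewarding state even though it sees no reward along the way.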
Private Equilibrium Release, Large Games, and No-Regret Learning
, 2012
"... We give mechanisms in which each of n players in a game is given their component of an (approximate) equilibrium in a way that guarantees differential privacy — that is, the revelation of the equilibrium components does not reveal too much information about the utilities of other players. More preci ..."
Abstract
of classical convergence results for no-regret learning, and the noisy mechanisms developed for differential privacy. Our results imply the ability to truthfully implement good social-welfare solutions in many games, such as games with small Price of Anarchy, even if the mechanism does not have the ability
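The "noisy mechanisms" referred to here build on the basic primitive of differential privacy, the Laplace mechanism: perturb a numeric answer with noise scaled to the query's sensitivity. A minimal illustrative sketch (the function name and the example query are mine, not the paper's construction):

```python
import numpy as np

def laplace_mechanism(true_answer, sensitivity, epsilon, rng):
    """Release a numeric query answer with Laplace noise of scale
    sensitivity / epsilon, the standard epsilon-differentially-private
    mechanism for a single numeric query."""
    return true_answer + rng.laplace(scale=sensitivity / epsilon)

rng = np.random.default_rng(0)
# Counting query (sensitivity 1): one player's data changes the count by <= 1.
true_count = 42.0
noisy = laplace_mechanism(true_count, sensitivity=1.0, epsilon=0.5, rng=rng)
print(noisy)  # true count plus Laplace noise of scale 2
```

Smaller epsilon means stronger privacy but larger noise, which is exactly the accuracy/privacy trade-off such mechanisms have to manage.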
Convergence and No-Regret in Multiagent Learning
"... 1 Introduction Learning to select actions to achieve goals in a multiagent setting requires overcoming a number of key challenges. One of these challenges is the loss of the stationarity assumption when multiple agents are learning simultaneously. Another challenge is guaranteeing that the learner can ..."
Abstract
1 Introduction Learning to select actions to achieve goals in a multiagent setting requires overcoming a number of key challenges. One of these challenges is the loss of the stationarity assumption when multiple agents are learning simultaneously. Another challenge is guaranteeing that the learner cannot be deceptively exploited by another agent. Both of these challenges distinguish the multiagent learning problem from traditional single-agent learning, and have been gaining recent attention as multiagent applications continue to proliferate. In single-agent learning tasks, it is reasonable to assume that the same action from the same state will result in the same distribution over outcomes, both rewards and next states. In other words, the environment is stationary. In a multiagent task with other learning agents, the outcomes of an agent's action will vary with the changing policies of the other agents. Since most of the convergence results in reinforcement learning depend upon the environment being stationary, convergence is often difficult to obtain in multiagent settings.
Repeated zero-sum games with budget
"... When a zero-sum game is played once, a risk-neutral player will want to maximize his expected outcome in that single play. However, if that single play instead only determines how much one player must pay to the other, and the same game must be played again, until either player runs out of money, op ..."
Abstract
When a zero-sum game is played once, a risk-neutral player will want to maximize his expected outcome in that single play. However, if that single play instead only determines how much one player must pay to the other, and the same game must be played again, until either player runs out of money
Convergence and no-regret in multiagent learning
 In Advances in Neural Information Processing Systems 17
, 2005
"... Learning in a multiagent system is a challenging problem due to two key factors. First, if other agents are simultaneously learning then the environment is no longer stationary, thus undermining convergence guarantees. Second, learning is often susceptible to deception, where the other agents may be ..."
Abstract

Cited by 85 (0 self)
focusing on convergence or regret in isolation are numerous. In this paper, we seek to address both criteria in a single algorithm by introducing GIGA-WoLF, a learning algorithm for normal-form games. We prove the algorithm guarantees at most zero average regret, while demonstrating the algorithm converges