Results 1 -
4 of
4
Convergence and no-regret in multiagent learning
- In Advances in Neural Information Processing Systems 17
, 2005
"... Learning in a multiagent system is a challenging problem due to two key factors. First, if other agents are simultaneously learning then the environment is no longer stationary, thus undermining convergence guarantees. Second, learning is often susceptible to deception, where the other agents may be ..."
Abstract
-
Cited by 49 (0 self)
- Add to MetaCart
Learning in a multiagent system is a challenging problem due to two key factors. First, if other agents are simultaneously learning then the environment is no longer stationary, thus undermining convergence guarantees. Second, learning is often susceptible to deception, where the other agents may be able to exploit a learner’s particular dynamics. In the worst case, this could result in poorer performance than if the agent was not learning at all. These challenges are identifiable in the two most common evaluation criteria for multiagent learning algorithms: convergence and regret. Algorithms focusing on convergence or regret in isolation are numerous. In this paper, we seek to address both criteria in a single algorithm by introducing GIGA-WoLF, a learning algorithm for normalform games. We prove the algorithm guarantees at most zero average regret, while demonstrating the algorithm converges in many situations of self-play. We prove convergence in a limited setting and give empirical results in a wider variety of situations. These results also suggest a third new learning criterion combining convergence and regret, which we call negative non-convergence regret (NNR). 1
Learning against opponents with bounded memory
- In IJCAI
, 2005
"... Recently, a number of authors have proposed criteria for evaluating learning algorithms in multiagent systems. While well-justified, each of these has generally given little attention to one of the main challenges of a multi-agent setting: the capability of the other agents to adapt and learn as wel ..."
Abstract
-
Cited by 32 (2 self)
- Add to MetaCart
Recently, a number of authors have proposed criteria for evaluating learning algorithms in multiagent systems. While well-justified, each of these has generally given little attention to one of the main challenges of a multi-agent setting: the capability of the other agents to adapt and learn as well. We propose extending existing criteria to apply to a class of adaptive opponents with bounded memory which we describe. We then show an algorithm that provably achieves an ɛ-best response against this richer class of opponents while simultaneously guaranteeing a minimum payoff against any opponent and performing well in self-play. This new algorithm also demonstrates strong performance in empirical tests against a variety of opponents in a wide range of environments. 1
A general criterion and an algorithmic framework for learning in multi-agent systems
- Machine Learning
, 2007
"... in multi-agent systems ..."
CMU-ML-08-112 On Fixed Convex Combinations of No-Regret Learners
, 2008
"... No-regret algorithms are powerful tools for learning in online convex problems that have received increased attention in recent years. Considering affine and external regret, we investigate what happens when a set of no-regret learners (voters) merge their respective strategies in each learning iter ..."
Abstract
- Add to MetaCart
No-regret algorithms are powerful tools for learning in online convex problems that have received increased attention in recent years. Considering affine and external regret, we investigate what happens when a set of no-regret learners (voters) merge their respective strategies in each learning iteration to a single, common one in form of a convex combination. We show that an agent who executes this merged decision in each iteration of the online learning process and each time feeds back a reward function to the voters that is a correspondingly weighted version of its own reward, incurs sublinear regret itself. As a by-product, we obtain a simple method that allows us

