Information theory: the bridge connecting bounded rational game theory and statistical physics
 Statistical Physics
, 2004
A long-running difficulty with conventional game theory has been how to modify it to accommodate the bounded rationality of all real-world players. A recurring issue in statistical physics is how best to approximate joint probability distributions with decoupled (and therefore far more tractable) distributions. This paper shows that the same information-theoretic mathematical structure, known as Product Distribution (PD) theory, addresses both issues. In doing so, PD theory not only provides a principled formulation of bounded rationality and a set of new types of mean field theory in statistical physics; it also shows that those topics are fundamentally one and the same.
Cooperative control and potential games
 IEEE Trans. Syst., Man, Cybern. B
, 2009
We present a view of cooperative control using the language of learning in games. We review the game-theoretic concepts of potential and weakly acyclic games, and demonstrate how several cooperative control problems, such as consensus and dynamic sensor coverage, can be formulated in these settings. Motivated by this connection, we build upon game-theoretic concepts to better accommodate a broader class of cooperative control problems. In particular, we extend existing learning algorithms to accommodate restricted action sets caused by the limitations of agent capabilities and group-based decision making. Furthermore, we introduce a new class of games called sometimes weakly acyclic games for time-varying objective functions and action sets, and provide distributed algorithms for convergence to an equilibrium. Index Terms: Cooperative control, game theory, learning in games, multiagent systems.
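To make the notion of a potential game mentioned in this abstract concrete, here is a minimal sketch, not the paper's algorithm. It assumes a hypothetical two-player identical-interest game (every player receives the same payoff, so that payoff is itself a potential function) and runs plain best-response dynamics; the names best_response_dynamics, SHARED, and shared_payoff are illustrative only.

```python
from typing import Callable, List, Sequence, Tuple

Joint = Tuple[int, ...]


def best_response_dynamics(
    utilities: Sequence[Callable[[Joint], float]],
    n_actions: Sequence[int],
    start: Joint,
    max_rounds: int = 100,
) -> Joint:
    """Let players take turns switching to a strictly better action.

    In a finite potential game every strict improvement raises the
    potential, so the loop must stop at a pure Nash equilibrium.
    """
    joint: List[int] = list(start)
    for _ in range(max_rounds):
        changed = False
        for i in range(len(joint)):
            best_a = joint[i]
            best_u = utilities[i](tuple(joint))
            for a in range(n_actions[i]):
                trial = joint.copy()
                trial[i] = a
                u = utilities[i](tuple(trial))
                if u > best_u:  # strict improvement only
                    best_a, best_u = a, u
            if best_a != joint[i]:
                joint[i] = best_a
                changed = True
        if not changed:
            break
    return tuple(joint)


# Hypothetical shared payoff table (assumption, not from the paper).
SHARED = [[2.0, 0.0], [0.0, 1.0]]


def shared_payoff(joint: Joint) -> float:
    return SHARED[joint[0]][joint[1]]


equilibrium = best_response_dynamics(
    [shared_payoff, shared_payoff], [2, 2], start=(0, 1)
)
```

Starting from (0, 1) the dynamics settle on (1, 1), which is a pure Nash equilibrium but not the payoff-maximizing profile (0, 0); convergence to an equilibrium does not by itself guarantee efficiency.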
Regret based dynamics: Convergence in weakly acyclic games
 In Proceedings of the 2007 International Conference on Autonomous Agents and Multiagent Systems (AAMAS)
, 2007
Regret-based algorithms have been proposed to control a wide variety of multiagent systems. The appeal of regret-based algorithms is that (1) these algorithms are easily implementable in large-scale multiagent systems and (2) there are existing results proving that the behavior will asymptotically converge to a set of points of "no-regret" in any game. We illustrate, through a simple example, that no-regret points need not reflect desirable operating conditions for a multiagent system. Multiagent systems often exhibit an additional structure (i.e., being "weakly acyclic") that has not been exploited in the context of regret-based algorithms. In this paper, we introduce a modification of regret-based algorithms by (1) exponentially discounting the memory and (2) bringing in a notion of inertia in players' decision process. We show how these modifications can lead to an entire class of regret-based algorithms that provide almost sure convergence to a pure Nash equilibrium in any weakly acyclic game.
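The two modifications named in this abstract, exponentially discounted memory and inertia, can be sketched for a single player as follows. This is a hedged illustration rather than the paper's exact update rule: the discount factor rho, the regret-matching choice over positive regrets, and both function names are assumptions made for the example.

```python
import random
from typing import List, Sequence


def update_discounted_regret(
    regret: Sequence[float],
    action_payoffs: Sequence[float],
    played: int,
    rho: float,
) -> List[float]:
    """Exponentially discount old regret and mix in the newest sample.

    action_payoffs[a] is the payoff the player would have received by
    playing a against the other players' current actions (assumed known).
    """
    realized = action_payoffs[played]
    return [
        (1.0 - rho) * r + rho * (u - realized)
        for r, u in zip(regret, action_payoffs)
    ]


def choose_action(
    regret: Sequence[float],
    previous: int,
    inertia: float,
    rng: random.Random,
) -> int:
    """With probability `inertia` repeat the previous action; otherwise
    sample an action in proportion to its positive regret."""
    if rng.random() < inertia:
        return previous
    positive = [max(r, 0.0) for r in regret]
    if sum(positive) <= 0.0:
        return previous  # no action looks better than the current one
    return rng.choices(range(len(regret)), weights=positive)[0]


rng = random.Random(0)
regret = update_discounted_regret([0.0, 0.0], [1.0, 0.0], played=1, rho=0.5)
next_action = choose_action(regret, previous=1, inertia=0.0, rng=rng)
```

With inertia set to 0.0 and positive regret only for action 0, the player switches to action 0; with inertia set to 1.0 it would always repeat its previous action.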
All Learning is Local: Multiagent learning in global reward games
In large multiagent games, partial observability, coordination, and credit assignment persistently plague attempts to design good learning algorithms.
Decentralized camera network control using game theory
 in Proc. ACM/IEEE Int. Conf. Distributed Smart Cameras
, 2008
Theoretical considerations of potential-based reward shaping for multiagent systems
 In Proceedings of the 10th International Conference on Autonomous Agents and Multiagent Systems (AAMAS)
, 2011
Potential-based reward shaping has previously been proven both to be equivalent to Q-table initialisation and to guarantee policy invariance in single-agent reinforcement learning. The method has since been used in multiagent reinforcement learning without consideration of whether the theoretical equivalence and guarantees hold. This paper extends the existing proofs to similar results in multiagent systems, providing the theoretical background to explain the success of previous empirical studies. Specifically, it is proven that the equivalence to Q-table initialisation remains and the Nash Equilibria of the underlying stochastic game are not modified. Furthermore, we demonstrate empirically that potential-based reward shaping affects exploration and, consequently, can alter the joint policy converged upon.
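As a rough illustration of the shaping term this abstract refers to, the sketch below uses the standard potential-based form F(s, s') = gamma * Phi(s') - Phi(s). The function names, the toy potential, and the trajectory are hypothetical and are not taken from the paper. The sketch shows why the shaping reward cannot change which policies are optimal: along any trajectory the discounted shaping terms telescope to a quantity that depends only on the first and last states.

```python
from typing import Callable, Sequence


def shaping_reward(
    phi: Callable[[int], float], s: int, s_next: int, gamma: float
) -> float:
    """Potential-based shaping term F(s, s') = gamma * phi(s') - phi(s)."""
    return gamma * phi(s_next) - phi(s)


def discounted_shaping_return(
    phi: Callable[[int], float], states: Sequence[int], gamma: float
) -> float:
    """Sum of gamma**t * F(s_t, s_{t+1}) over consecutive state pairs."""
    return sum(
        (gamma ** t) * shaping_reward(phi, states[t], states[t + 1], gamma)
        for t in range(len(states) - 1)
    )


def toy_potential(s: int) -> float:
    # Hypothetical potential: states closer to a goal get a higher value.
    return 2.0 * s


trajectory = [0, 1, 2, 3]
gamma = 0.5
total = discounted_shaping_return(toy_potential, trajectory, gamma)

# Telescoped closed form: gamma**T * phi(s_T) - phi(s_0), with T steps taken.
steps = len(trajectory) - 1
closed_form = (gamma ** steps) * toy_potential(trajectory[-1]) - toy_potential(trajectory[0])
```

Because the intermediate potentials cancel, the shaped and unshaped returns differ only by a term fixed by the start state and a discounted term at the end state, which is the intuition behind the invariance results the abstract extends to the multiagent case.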
Achieving Pareto Optimality Through Distributed Learning
, 2012
We propose a simple payoff-based learning rule that is completely decentralized, and that leads to an efficient configuration of actions in any n-person finite strategic-form game with generic payoffs. The algorithm follows the theme of exploration versus exploitation and is hence stochastic in nature. We prove that if all agents adhere to this algorithm, then the agents will select the action profile that maximizes the sum of the agents' payoffs a high percentage of the time. The algorithm requires no communication. Agents respond solely to changes in their own realized payoffs, which are affected by the actions of other agents in the system in ways that they do not necessarily understand. The method can be applied to the optimization of complex systems with many distributed components, such as the routing of information in networks and the design and control of wind farms. The proof of the proposed learning algorithm relies on the theory of large deviations for perturbed Markov chains.
Distributed Welfare Games
We consider a variation of the resource allocation problem. In the traditional problem, there is a global planner who would like to assign a set of players to a set of resources so as to maximize welfare. We consider the situation where the global planner does not have the authority to assign players to resources; rather, players are self-interested. The question that emerges is: how can the global planner entice the players to settle on a desirable allocation with respect to the global welfare? To study this question, we focus on a class of games that we refer to as distributed welfare games. Within this context, we investigate how the global planner should distribute the welfare to the players. We measure the efficacy of a distribution rule in two ways: (i) Does a pure Nash equilibrium exist? (ii) How does the welfare associated with a pure Nash equilibrium compare to the global welfare associated with the optimal allocation? In this paper we explore the applicability of cost sharing methodologies for distributing welfare in such resource allocation problems. We demonstrate that obtaining desirable distribution rules, such as distribution rules that are budget balanced and guarantee the existence of a pure Nash equilibrium, often comes at a significant informational and computational cost. In light of this, we derive a systematic procedure for designing desirable distribution rules with a minimal informational and computational cost for a special class of distributed welfare games. Furthermore, we derive a bound on the price of anarchy for distributed welfare games in a variety of settings. Lastly, we highlight the implications of these results using the problem of sensor coverage.
Multiagent resource allocation with k-additive utility functions
 In Workshop on Computer Science and Decision Theory
, 2004
We briefly review previous work on the welfare engineering framework where autonomous software agents negotiate on the allocation of a number of discrete resources, and point out connections to combinatorial optimisation problems, including combinatorial auctions, that shed light on the computational complexity of the framework. We give particular consideration to scenarios where the preferences of agents are modelled in terms of k-additive utility functions, i.e. scenarios where synergies between different resources are restricted to bundles of at most k items. Key words: negotiation, representation of utility functions, social welfare, combinatorial optimisation, bidding languages for combinatorial auctions
An Empirical Study of Potential-Based Reward Shaping and Advice in Complex, Multi-Agent Systems
 Advances in Complex Systems
, 2011
This paper investigates the impact of reward shaping in multiagent reinforcement learning as a way to incorporate domain knowledge about good strategies. In theory, potential-based reward shaping does not alter the Nash Equilibria of a stochastic game, only the exploration of the shaped agent. We demonstrate empirically the performance of reward shaping in two problem domains within the context of RoboCup KeepAway by designing three reward shaping schemes, encouraging specific behaviour such as keeping a minimum distance from other players on the same team and taking on specific roles. The results illustrate that reward shaping with multiple, simultaneous learning agents can reduce the time needed to learn a suitable policy and can alter the final group performance.