Results 1 - 10 of 104
Revisiting Log-Linear Learning: Asynchrony, Completeness and Payoff-Based Implementation
2008
"... Log-linear learning is a learning algorithm with equilibrium selection properties. Log-linear learning provides guarantees on the percentage of time that the joint action profile will be at a potential maximizer in potential games. The traditional analysis of log-linear learning has centered around ..."
Abstract - Cited by 42 (11 self)
Log-linear learning is a learning algorithm with equilibrium selection properties. Log-linear learning provides guarantees on the percentage of time that the joint action profile will be at a potential maximizer in potential games. The traditional analysis of log-linear learning has centered around explicitly computing the stationary distribution. This analysis relied on a highly structured setting: i) players’ utility functions constitute a potential game, ii) players update their strategies one at a time, which we refer to as asynchrony, iii) at any stage, a player can select any action in the action set, which we refer to as completeness, and iv) each player is endowed with the ability to assess the utility he would have received for any alternative action provided that the actions of all other players remain fixed. Since the appeal of log-linear learning is not solely the explicit form of the stationary distribution, we seek to address to what degree one can relax the structural assumptions while maintaining that only potential function maximizers are the stochastically stable action profiles. In this paper, we introduce slight variants of log-linear learning to include both synchronous updates and incomplete action sets. In both settings, we prove that only potential function maximizers are stochastically stable. Furthermore, we introduce a payoff-based version of log-linear learning, in which players are only aware of the utility they received and the action that they played. Note that log-linear learning in its original form is not a payoff-based learning algorithm. In payoff-based log-linear learning, we also prove that only potential maximizers are stochastically stable. The key enabler for these results is to change the focus of the analysis away from deriving the explicit form of the stationary distribution of the learning process towards characterizing the stochastically stable states. The resulting analysis uses the theory of resistance trees for regular perturbed Markov decision processes, thereby allowing a relaxation of the aforementioned structural assumptions.
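To make the baseline rule concrete, here is a minimal sketch of asynchronous log-linear learning under assumptions (i)-(iv) above: one randomly chosen player revises its action using a Boltzmann (logit) choice over its full action set, with all other players' actions held fixed. The names `utilities`, `action_sets`, and `temperature` are illustrative placeholders, not the paper's notation.

```python
import numpy as np

def log_linear_step(actions, utilities, action_sets, temperature, rng):
    """One step of baseline (asynchronous, complete) log-linear learning.

    actions:     current joint action, one entry per player
    utilities:   utilities[i](joint_action) -> payoff of player i
    action_sets: action_sets[i] is player i's finite action set
    temperature: exploration parameter tau > 0 (tau -> 0 approaches best response)
    """
    i = int(rng.integers(len(actions)))          # asynchrony: one player updates at a time
    trial = list(actions)
    payoffs = []
    for a in action_sets[i]:                     # completeness: every action is available
        trial[i] = a
        payoffs.append(utilities[i](tuple(trial)))
    logits = np.array(payoffs) / temperature     # logit choice: P(a) proportional to exp(u_i(a, a_-i) / tau)
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    choice = int(rng.choice(len(action_sets[i]), p=probs))
    new_actions = list(actions)
    new_actions[i] = action_sets[i][choice]
    return tuple(new_actions)
```

The paper's variants relax each ingredient in turn: synchronous updates, restricted (incomplete) trial action sets, and a payoff-based version in which players observe only their own received utility and played action.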
Payoff-based dynamics for multi-player weakly acyclic games
- SIAM J. CONTROL OPT, 2009
"... We consider repeated multiplayer games in which players repeatedly and simultaneously choose strategies from a finite set of available strategies according to some strategy adjustment process. We focus on the specific class of weakly acyclic games, which is particularly relevant for multiagent coo ..."
Abstract - Cited by 33 (12 self)
We consider repeated multiplayer games in which players repeatedly and simultaneously choose strategies from a finite set of available strategies according to some strategy adjustment process. We focus on the specific class of weakly acyclic games, which is particularly relevant for multiagent cooperative control problems. A strategy adjustment process determines how players select their strategies at any stage as a function of the information gathered over previous stages. Of particular interest are “payoff-based” processes in which, at any stage, players know only their own actions and (noise corrupted) payoffs from previous stages. In particular, players do not know the actions taken by other players and do not know the structural form of payoff functions. We introduce three different payoff-based processes for increasingly general scenarios and prove that, after a sufficiently large number of stages, player actions constitute a Nash equilibrium at any stage with arbitrarily high probability. We also show how to modify player utility functions through tolls and incentives in so-called congestion games, a special class of weakly acyclic games, to guarantee that a centralized objective can be realized as a Nash equilibrium. We illustrate the methods with a simulation of distributed routing over a network.
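As a rough sketch in the spirit of the payoff-based processes described (not the paper's exact dynamics), each player can maintain a baseline action and payoff, occasionally experiment, and keep whichever action has produced the higher observed payoff:

```python
import random

class PayoffBasedPlayer:
    """Simplified payoff-based learner: it observes only its own action and
    (possibly noisy) payoff, never the other players' actions or the payoff
    functions.  Illustrative sketch only; the paper defines three different
    processes with stronger guarantees."""

    def __init__(self, action_set, epsilon=0.05):
        self.action_set = list(action_set)
        self.epsilon = epsilon                    # experimentation probability
        self.baseline_action = random.choice(self.action_set)
        self.baseline_payoff = float("-inf")
        self.last_action = self.baseline_action

    def act(self):
        if random.random() < self.epsilon:        # occasionally try a random action
            self.last_action = random.choice(self.action_set)
        else:                                     # otherwise replay the baseline
            self.last_action = self.baseline_action
        return self.last_action

    def observe(self, payoff):
        # Keep whichever action has produced the best payoff observed so far.
        if payoff > self.baseline_payoff:
            self.baseline_payoff = payoff
            self.baseline_action = self.last_action
```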
Payoff Based Dynamics for Multi-Player Weakly Acyclic Games
- SIAM JOURNAL ON CONTROL AND OPTIMIZATION, SPECIAL ISSUE ON CONTROL AND OPTIMIZATION IN COOPERATIVE NETWORKS, 2007
"... We consider repeated multi-player games in which players repeatedly and simultaneously choose strategies from a finite set of available strategies according to some strategy adjustment process. We focus on the specific class of weakly acyclic games, which is particularly relevant for multi-agent coo ..."
Abstract - Cited by 28 (15 self)
We consider repeated multi-player games in which players repeatedly and simultaneously choose strategies from a finite set of available strategies according to some strategy adjustment process. We focus on the specific class of weakly acyclic games, which is particularly relevant for multi-agent cooperative control problems. A strategy adjustment process determines how players select their strategies at any stage as a function of the information gathered over previous stages. Of particular interest are “payoff based” processes, in which, at any stage, players only know their own actions and (noise corrupted) payoffs from previous stages. In particular, players do not know the actions taken by other players and do not know the structural form of payoff functions. We introduce three different payoff based processes for increasingly general scenarios and prove that after a sufficiently large number of stages, player actions constitute a Nash equilibrium at any stage with arbitrarily high probability. We also show how to modify player utility functions through tolls and incentives in so-called congestion games, a special class of weakly acyclic games, to guarantee that a centralized objective can be realized as a Nash equilibrium. We illustrate the methods with a simulation of distributed routing over a network.
Delay-sensitive resource management in multi-hop cognitive radio networks
- in: Proceedings of the IEEE DySPAN, 2008
"... Abstract–Dynamic resource management by the various cognitive nodes fundamentally changes the passive way that wireless nodes are currently adapting their transmission strategies to match available wireless resources, by enabling them to consciously influence the wireless system dynamics based on th ..."
Abstract - Cited by 22 (0 self)
Dynamic resource management by the various cognitive nodes fundamentally changes the passive way that wireless nodes currently adapt their transmission strategies to match available wireless resources, by enabling them to consciously influence the wireless system dynamics based on gathered information about other network nodes. In this paper, we discuss the main challenges of performing such dynamic resource management, emphasizing the distributed information in the dynamic multi-agent system. Specifically, the decisions on how to adapt resource management at sources and relays need to be made in an informationally decentralized manner, as the tolerable delay does not allow propagating information back and forth throughout the multi-hop infrastructure to a centralized decision maker. The term “cognitive” refers in our paper both to the capability of the network nodes to achieve large spectral efficiencies, through exploitation and mitigation of channel and interference variability by dynamically using different frequency bands, and to their ability to learn the “environment” (channel conditions and source characteristics) and the actions of competing nodes through the designed information exchange. We propose dynamic resource management algorithms, performed at each network node and integrated with multi-agent learning, that explicitly consider the timeliness and the cost of such information exchange. The results show that, when network resources are limited, our dynamic resource management approach improves the PSNR of multiple video streams by more than 3 dB compared to state-of-the-art dynamic frequency channel/route selection approaches without learning capability.
Theoretical considerations of potential-based reward shaping for multi-agent systems
- In Proceedings of the 10th International Conference on Autonomous Agents and Multiagent Systems (AAMAS), 2011
"... Potential-based reward shaping has previously been proven to both be equivalent to Q-table initialisation and guarantee policy invariance in single-agent reinforcement learning. The method has since been used in multi-agent reinforcement learning without consideration of whether the theoretical equi ..."
Abstract - Cited by 21 (12 self)
Potential-based reward shaping has previously been proven to both be equivalent to Q-table initialisation and guarantee policy invariance in single-agent reinforcement learning. The method has since been used in multi-agent reinforcement learning without consideration of whether the theoretical equivalence and guarantees hold. This paper extends the existing proofs to similar results in multi-agent systems, providing the theoretical background to explain the success of previous empirical studies. Specifically, it is proven that the equivalence to Q-table initialisation remains and the Nash Equilibria of the underlying stochastic game are not modified. Furthermore, we demonstrate empirically that potential-based reward shaping affects exploration and, consequently, can alter the joint policy converged upon.
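For reference, potential-based reward shaping adds F(s, s') = gamma * Phi(s') - Phi(s) to the environment reward. A minimal tabular sketch is shown below; the variable names and constants are illustrative, not taken from the paper.

```python
def shaped_q_update(Q, potential, s, a, r, s_next, alpha=0.1, gamma=0.95):
    """One tabular Q-learning update with potential-based reward shaping.

    Q:         dict mapping state -> dict of action -> value
    potential: user-supplied function Phi(state)
    The shaping term F(s, s') = gamma * Phi(s_next) - Phi(s) is added to the
    environment reward r.  Shaping of this form is what the paper analyses:
    it is equivalent to initialising the Q-table with Phi and leaves the Nash
    Equilibria of the underlying stochastic game unchanged.
    """
    F = gamma * potential(s_next) - potential(s)
    td_target = (r + F) + gamma * max(Q[s_next].values())
    Q[s][a] += alpha * (td_target - Q[s][a])
```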
Social Reward Shaping in the Prisoner’s Dilemma (Short Paper)
"... Reward shaping is a well-known technique applied to help reinforcement-learning agents converge more quickly to nearoptimal behavior. In this paper, we introduce social reward shaping, which is reward shaping applied in the multiagentlearning framework. We present preliminary experiments in the iter ..."
Abstract - Cited by 20 (3 self)
Reward shaping is a well-known technique applied to help reinforcement-learning agents converge more quickly to near-optimal behavior. In this paper, we introduce social reward shaping, which is reward shaping applied in the multiagent-learning framework. We present preliminary experiments in the iterated Prisoner’s Dilemma setting showing that agents using social reward shaping appropriately can behave more effectively than other classical learning and non-learning strategies. In particular, we show that these agents can both lead (encourage adaptive opponents to stably cooperate) and follow (adopt a best-response strategy when paired with a fixed opponent), whereas better-known approaches achieve only one of these objectives.
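As an illustration of the setting (a hypothetical construction, not the paper's specific agents), a shaping bonus in the iterated Prisoner's Dilemma can be defined over the opponent's last move, so the shaped learner is drawn toward sustained mutual cooperation while still receiving the game's true payoffs:

```python
C, D = 0, 1                                              # cooperate / defect
PAYOFF = {(C, C): 3, (C, D): 0, (D, C): 5, (D, D): 1}    # row player's payoff

def social_potential(opp_last_move, weight=2.0):
    """Hypothetical potential Phi valuing states where the opponent cooperated."""
    return weight if opp_last_move == C else 0.0

def shaped_reward(my_move, opp_move, prev_opp_move, gamma=0.95):
    """Environment payoff plus a potential-based 'social' shaping term."""
    shaping = gamma * social_potential(opp_move) - social_potential(prev_opp_move)
    return PAYOFF[(my_move, opp_move)] + shaping
```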
Frequency Adjusted Multi-agent Q-learning
"... Multi-agent learning is a crucial method to control or find solutions for systems, in which more than one entity needs to be adaptive. In today’s interconnected world, such systems are ubiquitous in many domains, including auctions in economics, swarm robotics in computer science, and politics in so ..."
Abstract - Cited by 19 (11 self)
Multi-agent learning is a crucial method to control or find solutions for systems in which more than one entity needs to be adaptive. In today’s interconnected world, such systems are ubiquitous in many domains, including auctions in economics, swarm robotics in computer science, and politics in the social sciences. Multi-agent learning is inherently more complex than single-agent learning and has a relatively thin theoretical framework supporting it. Recently, multi-agent learning dynamics have been linked to evolutionary game theory, allowing learning to be interpreted as an evolution of competing policies in the mind of the learning agents. The dynamical system from evolutionary game theory that has been linked to Q-learning predicts the expected behavior of the learning agents. Closer analysis, however, allows for two interesting observations: the predicted behavior is not always the same as the actual behavior, and in case of deviation, the predicted behavior is more desirable. This discrepancy is elucidated in this article, and based on these new insights Frequency Adjusted Q-learning (FAQ-learning) is proposed. This variation of Q-learning perfectly adheres to the predictions of the evolutionary model for an arbitrarily large part of the policy space. In addition to the theoretical discussion, experiments in the three classes of two-agent, two-action games illustrate the superiority of FAQ-learning.
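The core idea of FAQ-learning is that the update for the chosen action is scaled inversely with the probability of playing it, so rarely played actions are not under-updated relative to the replicator-dynamics prediction. The following is a sketch for a stateless repeated game; alpha, beta, gamma, and the softmax temperature are illustrative values, not the paper's exact parameterisation.

```python
import numpy as np

def boltzmann_policy(Q, temperature=0.5):
    """Softmax policy x over actions; used both to act and to scale the FAQ update."""
    z = Q / temperature
    e = np.exp(z - z.max())
    return e / e.sum()

def faq_update(Q, x, action, reward, alpha=0.01, beta=0.1, gamma=0.9):
    """Frequency Adjusted Q-learning update for a stateless repeated game.

    The effective learning rate for the chosen action is min(beta / x[action], 1) * alpha,
    i.e. inversely proportional to how often that action is currently played, so the
    actual learning dynamics track the evolutionary (replicator) prediction.
    """
    rate = min(beta / x[action], 1.0) * alpha
    Q[action] += rate * (reward + gamma * np.max(Q) - Q[action])
    return Q
```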
An Empirical Study of Potential-Based Reward Shaping and Advice in Complex, Multi-Agent Systems
- ADVANCES IN COMPLEX SYSTEMS, 2011
"... This paper investigates the impact of reward shaping in multi-agent reinforcement learning as a way to incorporate domain knowledge about good strategies. In theory, potential-based reward shaping does not alter the Nash Equilibria of a stochastic game, only the exploration of the shaped agent. We d ..."
Abstract - Cited by 18 (9 self)
This paper investigates the impact of reward shaping in multi-agent reinforcement learning as a way to incorporate domain knowledge about good strategies. In theory, potential-based reward shaping does not alter the Nash Equilibria of a stochastic game, only the exploration of the shaped agent. We demonstrate empirically the performance of reward shaping in two problem domains within the context of RoboCup KeepAway by designing three reward shaping schemes, encouraging specific behaviour such as keeping a minimum distance from other players on the same team and taking on specific roles. The results illustrate that reward shaping with multiple, simultaneous learning agents can reduce the time needed to learn a suitable policy and can alter the final group performance.
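As an example of what such a scheme might look like (a hypothetical stand-in, not one of the paper's three schemes), a "keep your distance" shaping can be built from a potential that grows with the distance to the nearest teammate, capped so it does not dominate the task reward:

```python
import math

def separation_potential(my_pos, teammate_positions, cap=5.0, weight=1.0):
    """Illustrative potential Phi(s) for a 'keep a minimum distance' shaping scheme.

    The shaping reward delivered to the agent is gamma * Phi(s') - Phi(s), so moves
    that open up space relative to the nearest teammate are rewarded, up to `cap`.
    """
    nearest = min(math.dist(my_pos, p) for p in teammate_positions)
    return weight * min(nearest, cap)
```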
Multi-agent learning for engineers
- Artificial Intelligence, 2007
"... As suggested by the title of Shoham, Powers, and Grenager’s position paper [34], the ultimate lens through which the multi-agent learning framework should be assessed is “what is the question?”. In this paper, we address this question by presenting challenges motivated by engineering applications an ..."
Abstract - Cited by 18 (5 self)
As suggested by the title of Shoham, Powers, and Grenager’s position paper [34], the ultimate lens through which the multi-agent learning framework should be assessed is “what is the question?”. In this paper, we address this question by presenting challenges motivated by engineering applications and discussing the potential appeal of multi-agent learning to meet these challenges. Moreover, we highlight various differences in the underlying assumptions and issues of concern that generally distinguish engineering applications from models that are typically considered in the economic game theory literature.