Multiagent Reinforcement Learning: Theoretical Framework and an Algorithm, 1998
Cited by 237 (4 self)
In this paper, we adopt general-sum stochastic games as a framework for multiagent reinforcement learning. Our work extends previous work by Littman on zero-sum stochastic games to a broader framework. We design a multiagent Q-learning method under this framework, and prove that it converges to a Nash equilibrium under specified conditions. This algorithm is useful for finding the optimal strategy when there exists a unique Nash equilibrium in the game. When there exist multiple Nash equilibria in the game, this algorithm should be combined with other learning techniques to find optimal strategies.
Multiagent Learning Using a Variable Learning Rate - Artificial Intelligence, 2002
Cited by 150 (8 self)
Learning to act in a multiagent environment is a difficult problem since the normal definition of an optimal policy no longer applies. The optimal policy at any moment depends on the policies of the other agents and so creates a situation of learning a moving target. Previous learning algorithms have one of two shortcomings depending on their approach. They either converge to a policy that may not be optimal against the specific opponents' policies, or they may not converge at all. In this article we examine this learning problem in the framework of stochastic games. We look at a number of previous learning algorithms showing how they fail at one of the above criteria. We then contribute a new reinforcement learning technique using a variable learning rate to overcome these shortcomings. Specifically, we introduce the WoLF principle, "Win or Learn Fast", for varying the learning rate. We examine this technique theoretically, proving convergence in self-play on a restricted class of iterated matrix games. We also present empirical results on a variety of more general stochastic games, in situations of self-play and otherwise, demonstrating the wide applicability of this method.
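The WoLF principle above can be sketched as a choice between two step sizes. The abstract states only the principle; the "winning" test used here (the current policy does better against the current value estimates than the agent's long-run average policy does) and the names `wolf_step_size`, `delta_win`, and `delta_lose` are assumptions made for illustration, not the article's exact procedure.

```python
def wolf_step_size(policy, avg_policy, q_values, delta_win=0.01, delta_lose=0.04):
    """Pick a policy step size following "Win or Learn Fast".

    policy, avg_policy: action probabilities for one state (current and
        long-run average); q_values: estimated value of each action.
    The default step sizes are illustrative; the only requirement assumed
    here is delta_lose > delta_win.
    """
    # Expected value of each policy under the current value estimates.
    current_value = sum(p * q for p, q in zip(policy, q_values))
    average_value = sum(p * q for p, q in zip(avg_policy, q_values))

    # Assumed test: the agent is "winning" if its current policy beats its average.
    winning = current_value > average_value

    # Learn cautiously when winning, quickly when losing.
    return delta_win if winning else delta_lose
```

A caller would use the returned step size to move the policy toward the currently greedy action, so that a losing agent adapts faster than a winning one.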
Computing Equilibria for Two-Person Games, 1998
Cited by 47 (4 self)
This paper is a survey and exposition of linear methods for finding Nash equilibria. Above all, these apply to games with two players. In an equilibrium of a two-person game, the mixed strategy probabilities of one player equalize the expected payoffs for the pure strategies used by the other player. This defines an optimization problem with linear constraints. We do not consider nonlinear methods like simplicial subdivision for approximating fixed points, or systems of inequalities for higher-degree polynomials as they arise for noncooperative games with more than two players. These are surveyed in McKelvey and McLennan (1996).
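The indifference condition in this abstract can be made concrete for the smallest case. The sketch below solves it in closed form for a 2x2 bimatrix game with integer payoffs; the function name `mixed_equilibrium_2x2` is hypothetical, and this is not one of the linear methods the survey covers, only an illustration of the equalization property.

```python
from fractions import Fraction


def mixed_equilibrium_2x2(A, B):
    """Fully mixed equilibrium of a 2x2 bimatrix game, if one exists.

    A[i][j] is the row player's payoff and B[i][j] the column player's
    payoff (integers assumed). Returns (p, q), where p is the probability
    of row 0 and q the probability of column 0, or None otherwise.
    """
    # Row's p must equalize the column player's payoffs for both columns:
    #   p*B[0][0] + (1-p)*B[1][0] = p*B[0][1] + (1-p)*B[1][1]
    denom_p = B[0][0] - B[1][0] - B[0][1] + B[1][1]
    # Column's q must equalize the row player's payoffs for both rows:
    #   q*A[0][0] + (1-q)*A[0][1] = q*A[1][0] + (1-q)*A[1][1]
    denom_q = A[0][0] - A[0][1] - A[1][0] + A[1][1]
    if denom_p == 0 or denom_q == 0:
        return None

    p = Fraction(B[1][1] - B[1][0], denom_p)
    q = Fraction(A[1][1] - A[0][1], denom_q)

    # Both pure strategies must be played with positive probability.
    if not (0 < p < 1 and 0 < q < 1):
        return None
    return p, q
```

For matching pennies, `mixed_equilibrium_2x2([[1, -1], [-1, 1]], [[-1, 1], [1, -1]])` gives probability 1/2 for each player.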
Fast Concurrent Reinforcement Learners - In Proceedings of the Seventeenth International Joint Conference on Artificial Intelligence, 2001
Cited by 36 (6 self)
When several agents learn concurrently, the payoff received by an agent is dependent on the behavior of the other agents. As the other agents learn, the reward of one agent becomes non-stationary. This makes learning in multiagent systems more difficult than single-agent learning. A few methods, however, are known to guarantee convergence to equilibrium in the limit in such systems. In this paper we experimentally study one such technique, minimax-Q, in a competitive domain and prove its equivalence with another well-known method for competitive domains. We study the rate of convergence of minimax-Q and investigate possible ways to increase it. We also present a variant of the algorithm, minimax-SARSA, and prove its convergence to minimax-Q values under appropriate conditions. Finally we show that this new algorithm performs better than simple minimax-Q in a general-sum domain as well.
Learning Mutual Trust - In Working, 2000
Cited by 13 (0 self)
Multiagent learning literature has looked at iterated two-player games to develop mechanisms that allow agents to learn to converge on Nash Equilibrium strategy profiles. Such equilibrium configuration implies that there is no motivation for one player to change its strategy if the other does not. Often, in general sum games, a higher payoff can be obtained by both players if one chooses not to respond optimally to the other player. By developing mutual trust, agents can avoid iterated best responses that will lead to a lesser payoff Nash Equilibrium. In this paper we consider 1-level agents (modelers) who select actions based on expected utility considering probability distributions over the actions of the opponent(s). We show that in certain situations, such stochastically-greedy agents can perform better (by developing mutually trusting behavior) than those that explicitly attempt to converge to Nash Equilibrium.
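The claim that both players can do better than at a Nash equilibrium can be checked on a small example. The sketch below enumerates pure-strategy equilibria of a bimatrix game and lists outcomes that Pareto-dominate a given cell; the function names and the Prisoner's Dilemma payoffs in the usage note are illustrative assumptions, not taken from the paper.

```python
def pure_nash_equilibria(A, B):
    """Cells (i, j) where neither player gains by deviating alone.

    A[i][j] is the row player's payoff, B[i][j] the column player's.
    """
    rows, cols = len(A), len(A[0])
    equilibria = []
    for i in range(rows):
        for j in range(cols):
            # Row i must be a best response to column j, and vice versa.
            row_best = all(A[i][j] >= A[k][j] for k in range(rows))
            col_best = all(B[i][j] >= B[i][l] for l in range(cols))
            if row_best and col_best:
                equilibria.append((i, j))
    return equilibria


def pareto_dominating_outcomes(A, B, cell):
    """Cells at least as good for both players as `cell`, strictly better for one."""
    i, j = cell
    return [
        (r, c)
        for r in range(len(A))
        for c in range(len(A[0]))
        if A[r][c] >= A[i][j]
        and B[r][c] >= B[i][j]
        and (A[r][c] > A[i][j] or B[r][c] > B[i][j])
    ]
```

With the assumed payoffs `A = [[3, 0], [5, 1]]` and `B = [[3, 5], [0, 1]]` (action 0 cooperates, action 1 defects), the only pure equilibrium is mutual defection `(1, 1)`, and it is Pareto-dominated by mutual cooperation `(0, 0)`, which neither player reaches by best-responding.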
An Experimental Analysis of Lemke-Howson Algorithm, 811
Cited by 1 (0 self)
We present an experimental investigation of the performance of the Lemke-Howson algorithm, which is the most widely used algorithm for the computation of a Nash equilibrium for bimatrix games. The Lemke-Howson algorithm is based upon a simple pivoting strategy, which corresponds to following a path whose endpoint is a Nash equilibrium. We analyze both the basic Lemke-Howson algorithm and a heuristic modification of it, which we designed to cope with the effects of a ‘bad’ initial choice of the pivot. Our experimental findings show that, on uniformly random games, the heuristic achieves a linear running time, while the basic Lemke-Howson algorithm runs in time roughly proportional to a polynomial of degree seven. To conduct the experiments, we have developed our own implementation of the Lemke-Howson algorithm, which turns out to be significantly faster than state-of-the-art software. This allowed us to run the algorithm on a much larger set of data, and on instances of much larger size, compared with previous work.
I.2.8 [Artificial Intelligence]: Learning—reinforcement
Multiagent learning literature has investigated iterated two-player games to develop mechanisms that allow agents to learn to converge on Nash Equilibrium strategy profiles. Such equilibrium configuration implies that there is no motivation for one player to change its strategy if the other does not. Often, in general sum games, a higher payoff can be obtained by both players if one chooses not to respond optimally to the other player. By developing mutual trust, agents can avoid iterated best responses that will lead to a lesser payoff Nash Equilibrium. In this paper we work with agents who select actions based on expected utility calculations that incorporate the observed frequencies of the actions of the opponent(s). We augment these stochastically greedy agents with an action revelation strategy that involves strategic revealing of one’s action to avoid worst-case, pessimistic moves. We argue that in certain situations, such apparently risky revealing can indeed produce better payoff than a non-revealing approach. In particular, it is possible to obtain Pareto-optimal solutions that dominate Nash Equilibrium. We present results over a large number of randomly generated payoff matrices of varying sizes and compare the payoffs of strategically revealing learners to payoffs at Nash equilibrium.
Reinforcement Learning in Multiagent Systems using Game Theory Concepts, 2001
Reinforcement learning has attracted increasing interest in the machine learning and artificial intelligence communities during the past fifteen years. It promises a way of programming agents merely by using reward and punishment, without the need to encode methods to achieve a specific task. Traditional reinforcement learning algorithms were concerned with the problems that face a single agent acting in an environment. However, no agent lives in a vacuum - it must interact with other agents in the environment to achieve its goal. Multiagent systems research is the subfield of artificial intelligence that aims to provide both principles for construction of complex systems involving multiple agents, and mechanisms for coordinating the behavior of independent agents. In this paper, I will review three approaches to using game theory concepts, in particular the notion of Nash equilibria, to aid in the research area of reinforcement learning in multiagent systems.

