Results 1–10 of 25
Multiagent Reinforcement Learning: Theoretical Framework and an Algorithm
, 1998
Abstract

Cited by 315 (4 self)
In this paper, we adopt general-sum stochastic games as a framework for multiagent reinforcement learning. Our work extends previous work by Littman on zero-sum stochastic games to a broader framework. We design a multiagent Q-learning method under this framework, and prove that it converges to a Nash equilibrium under specified conditions. This algorithm is useful for finding the optimal strategy when there exists a unique Nash equilibrium in the game. When there exist multiple Nash equilibria in the game, this algorithm should be combined with other learning techniques to find optimal strategies.
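As a rough illustration of the update rule this abstract describes, the sketch below performs one tabular multiagent Q-learning step, bootstrapping from a stage-game value at the next state. This is not the authors' code: the names (`stage_value`, `nash_q_update`) are invented, and the stage-game value here is a pure-strategy maximin for brevity, whereas the paper's general-sum algorithm computes a Nash equilibrium of the stage game.

```python
# Hedged sketch of a multiagent Q-learning update (illustrative names).
# Q[s][a][o] stores the value of own action a against opponent action o.

def stage_value(q_state):
    """Value of the stage game at a state, simplified here to a
    pure-strategy maximin; the full algorithm would solve the stage
    game for a Nash equilibrium value instead."""
    return max(min(row) for row in q_state)

def nash_q_update(Q, s, a, o, r, s_next, alpha=0.1, gamma=0.9):
    """One tabular step: move Q(s, a, o) toward r + gamma * V(s')."""
    v_next = stage_value(Q[s_next])
    Q[s][a][o] += alpha * (r + gamma * v_next - Q[s][a][o])

# Tiny example: two states, two actions per player, all values zero.
Q = {s: [[0.0, 0.0], [0.0, 0.0]] for s in (0, 1)}
nash_q_update(Q, s=0, a=0, o=1, r=1.0, s_next=1)
print(Q[0][0][1])  # 0.1, i.e. alpha * (r + gamma * 0 - 0)
```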
Multiagent Learning Using a Variable Learning Rate
 Artificial Intelligence
, 2002
Abstract

Cited by 219 (9 self)
Learning to act in a multiagent environment is a difficult problem since the normal definition of an optimal policy no longer applies. The optimal policy at any moment depends on the policies of the other agents, and so creates the problem of learning a moving target. Previous learning algorithms have one of two shortcomings depending on their approach: they either converge to a policy that may not be optimal against the specific opponents' policies, or they may not converge at all. In this article we examine this learning problem in the framework of stochastic games. We look at a number of previous learning algorithms, showing how they fail at one of the above criteria. We then contribute a new reinforcement learning technique using a variable learning rate to overcome these shortcomings. Specifically, we introduce the WoLF principle, "Win or Learn Fast", for varying the learning rate. We examine this technique theoretically, proving convergence in self-play on a restricted class of iterated matrix games. We also present empirical results on a variety of more general stochastic games, in situations of self-play and otherwise, demonstrating the wide applicability of this method.
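The core of the variable-learning-rate idea can be sketched as follows. This is a hedged illustration, not the authors' implementation: the function names are invented, and only the central WoLF comparison is shown (learn slowly when the current policy earns at least as much as the long-run average policy, fast when it earns less).

```python
# Hedged sketch of the WoLF ("Win or Learn Fast") rate selection rule.

def expected_payoff(policy, payoffs, opp_policy):
    """Expected payoff of a mixed policy against an opponent mixed policy."""
    return sum(policy[a] * opp_policy[o] * payoffs[a][o]
               for a in range(len(policy)) for o in range(len(opp_policy)))

def wolf_rate(policy, avg_policy, payoffs, opp_policy,
              delta_win=0.01, delta_lose=0.04):
    """Small step size while "winning" (doing at least as well as the
    average policy would), large step size while losing."""
    current = expected_payoff(policy, payoffs, opp_policy)
    average = expected_payoff(avg_policy, payoffs, opp_policy)
    return delta_win if current >= average else delta_lose

# Matching pennies for the row player: payoffs[a][o].
pennies = [[1, -1], [-1, 1]]
rate = wolf_rate([0.9, 0.1], [0.5, 0.5], pennies, [0.9, 0.1])
print(rate)  # 0.01: the skewed policy is "winning" against this opponent
```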
Fast Concurrent Reinforcement Learners
 In Proceedings of the Seventeenth International Joint Conference on Artificial Intelligence
, 2001
Abstract

Cited by 41 (6 self)
When several agents learn concurrently, the payoff received by an agent depends on the behavior of the other agents. As the other agents learn, the reward of any one agent becomes non-stationary. This makes learning in multiagent systems more difficult than single-agent learning. A few methods, however, are known to guarantee convergence to equilibrium in the limit in such systems. In this paper we experimentally study one such technique, minimax-Q, in a competitive domain and prove its equivalence with another well-known method for competitive domains. We study the rate of convergence of minimax-Q and investigate possible ways of increasing it. We also present a variant of the algorithm, minimax-SARSA, and prove its convergence to minimax-Q values under appropriate conditions. Finally, we show that this new algorithm performs better than simple minimax-Q in a general-sum domain as well.
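The two updates contrasted in this abstract can be sketched side by side. This is not the paper's code: minimax-Q bootstraps from the stage-game maximin value at the next state, while the SARSA variant bootstraps from the value of the action pair actually taken next. For brevity the maximin is over pure strategies; the full algorithm solves a linear program for the mixed maximin.

```python
# Hedged sketch contrasting minimax-Q and a minimax-SARSA-style update.

def maximin(q_state):
    """Pure-strategy maximin: max over own actions of the min over
    opponent actions (the full method uses an LP over mixed strategies)."""
    return max(min(row) for row in q_state)

def minimax_q_update(Q, s, a, o, r, s_next, alpha=0.2, gamma=0.9):
    """Bootstrap from the maximin value of the next state's stage game."""
    Q[s][a][o] += alpha * (r + gamma * maximin(Q[s_next]) - Q[s][a][o])

def minimax_sarsa_update(Q, s, a, o, r, s_next, a_next, o_next,
                         alpha=0.2, gamma=0.9):
    """Bootstrap from the next action pair actually taken."""
    Q[s][a][o] += alpha * (r + gamma * Q[s_next][a_next][o_next] - Q[s][a][o])

Q = {0: [[0.0, 0.0], [0.0, 0.0]], 1: [[2.0, 1.0], [0.0, 3.0]]}
minimax_q_update(Q, s=0, a=0, o=0, r=1.0, s_next=1)
minimax_sarsa_update(Q, s=0, a=0, o=1, r=1.0, s_next=1, a_next=1, o_next=1)
print(round(Q[0][0][0], 3), round(Q[0][0][1], 3))  # 0.38 0.74
```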
Learning Mutual Trust
 In Working
, 2000
Abstract

Cited by 15 (0 self)
The multiagent learning literature has looked at iterated two-player games to develop mechanisms that allow agents to learn to converge on Nash Equilibrium strategy profiles. Such an equilibrium configuration implies that there is no motivation for one player to change its strategy if the other does not. Often, in general-sum games, a higher payoff can be obtained by both players if one chooses not to respond optimally to the other player. By developing mutual trust, agents can avoid iterated best responses that would lead to a lower-payoff Nash Equilibrium. In this paper we consider 1-level agents (modelers) who select actions based on expected utility, considering probability distributions over the actions of the opponent(s). We show that in certain situations, such stochastically greedy agents can perform better (by developing mutually trusting behavior) than those that explicitly attempt to converge to Nash Equilibrium.
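A 1-level "modeler" of the kind this abstract describes can be sketched as follows, under the assumption (not stated in the abstract) that the opponent model is simply the empirical frequency of past opponent actions. The function name is invented.

```python
# Hedged sketch of a 1-level modeler: best-respond to the empirical
# distribution of the opponent's past actions.
from collections import Counter

def modeler_action(payoffs, opponent_history):
    """Return the own action maximizing expected payoff against the
    observed opponent action frequencies. payoffs[a][o] is the payoff
    for own action a against opponent action o."""
    counts = Counter(opponent_history)
    total = len(opponent_history)
    freqs = [counts[o] / total for o in range(len(payoffs[0]))]
    utilities = [sum(f * u for f, u in zip(freqs, row)) for row in payoffs]
    return max(range(len(payoffs)), key=lambda a: utilities[a])

# Row player in a 2x2 game; the opponent has mostly played action 0.
payoffs = [[4, 0], [2, 3]]
history = [0, 0, 0, 1]
print(modeler_action(payoffs, history))  # 0: expected 3.0 beats 2.25
```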
An Experimental Analysis of Lemke-Howson Algorithm
, 2008
Abstract

Cited by 2 (0 self)
We present an experimental investigation of the performance of the Lemke-Howson algorithm, which is the most widely used algorithm for the computation of a Nash equilibrium of bimatrix games. The Lemke-Howson algorithm is based upon a simple pivoting strategy, which corresponds to following a path whose endpoint is a Nash equilibrium. We analyze both the basic Lemke-Howson algorithm and a heuristic modification of it, which we designed to cope with the effects of a 'bad' initial choice of the pivot. Our experimental findings show that, on uniformly random games, the heuristic achieves a linear running time, while the basic Lemke-Howson algorithm runs in time roughly proportional to a polynomial of degree seven. To conduct the experiments, we developed our own implementation of the Lemke-Howson algorithm, which turns out to be significantly faster than state-of-the-art software. This allowed us to run the algorithm on a much larger set of data, and on instances of much larger size, compared with previous work.
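Lemke-Howson itself requires complementary pivoting on a labeled tableau, which is too long to sketch here. As a much simpler point of comparison, the sketch below checks every pure strategy profile of a bimatrix game for mutual best response; it finds only pure-strategy Nash equilibria, whereas Lemke-Howson also finds mixed ones. The function name is invented.

```python
# Hedged sketch: enumerate pure-strategy Nash equilibria of a bimatrix
# game (A for the row player, B for the column player).

def pure_nash(A, B):
    """Return all (row, col) profiles where each action is a best
    response to the other."""
    eqs = []
    for i in range(len(A)):
        for j in range(len(A[0])):
            row_best = all(A[i][j] >= A[k][j] for k in range(len(A)))
            col_best = all(B[i][j] >= B[i][l] for l in range(len(A[0])))
            if row_best and col_best:
                eqs.append((i, j))
    return eqs

# Prisoner's dilemma (action 1 = defect): mutual defection is the only
# pure-strategy equilibrium.
A = [[3, 0], [5, 1]]
B = [[3, 5], [0, 1]]
print(pure_nash(A, B))  # [(1, 1)]
```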
A Glimpse at Paul G. Spirakis
Abstract
Paul Spirakis is an eminent, talented, and influential researcher who has contributed significantly to computer science. This article is a modest attempt at a biography.
Towards a Pareto-optimal solution in . . .
, 2003
Abstract
The multiagent learning literature has investigated iterated two-player games to develop mechanisms that allow agents to learn to converge on Nash Equilibrium strategy profiles. Such an equilibrium configuration implies that there is no motivation for one player to change its strategy if the other does not. Often, in general-sum games, a higher payoff can be obtained by both players if one chooses not to respond optimally to the other player. By developing mutual trust, agents can avoid iterated best responses that would lead to a lower-payoff Nash Equilibrium. In this paper we work with agents who select actions based on expected utility calculations that incorporate the observed frequencies of the actions of the opponent(s). We augment these stochastically greedy agents with an interesting action-revelation strategy that involves strategically revealing one's action to avoid worst-case, pessimistic moves. We argue that in certain situations such apparently risky revealing can indeed produce a better payoff than a non-revealing approach. In particular, it is possible to obtain Pareto-optimal solutions that dominate Nash Equilibrium. We present results over a large number of randomly generated payoff matrices of varying sizes and compare the payoffs of strategically revealing learners to payoffs at Nash equilibrium.
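The comparison this abstract makes, an outcome Pareto-dominating a Nash equilibrium outcome, can be sketched with the prisoner's dilemma, the standard example of the phenomenon. This is illustrative only, not the authors' experimental setup; the function name is invented.

```python
# Hedged sketch: check whether an outcome Pareto-dominates a Nash
# equilibrium outcome, i.e. gives both players at least as much and is
# not identical to it.

def pareto_dominates(pair, other):
    """True if pair is at least as good for both players and differs."""
    return (pair[0] >= other[0] and pair[1] >= other[1] and pair != other)

# Prisoner's dilemma payoffs as (row, column) pairs. The pure Nash
# outcome is mutual defection ('D', 'D'), but mutual cooperation
# Pareto-dominates it.
outcomes = {('C', 'C'): (3, 3), ('C', 'D'): (0, 5),
            ('D', 'C'): (5, 0), ('D', 'D'): (1, 1)}
nash = outcomes[('D', 'D')]
better = [k for k, v in outcomes.items() if pareto_dominates(v, nash)]
print(better)  # [('C', 'C')]
```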