Learning by Trial and Error
, 2008
Cited by 17 (3 self)
A person learns by trial and error if he occasionally tries out new strategies, rejecting choices that are erroneous in the sense that they do not lead to higher payoffs. In a game, however, strategies can become erroneous due to a change of behavior by someone else. We introduce a learning rule in which behavior is conditional on whether a player experiences an error of the first or second type. This rule, called interactive trial and error learning, implements Nash equilibrium behavior in any game with generic payoffs and at least one pure Nash equilibrium. JEL Classification: C72, D83
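The first kind of error described in this abstract (a tried strategy that does not lead to a higher payoff) can be illustrated with a minimal single-agent sketch. This is not the paper's interactive rule: it ignores errors caused by other players' behavior, and the function name, exploration rate, and payoff table are all assumptions made for illustration.

```python
import random

def trial_and_error(payoff, n_strategies, rounds=1000, explore=0.2, seed=0):
    """Single-agent sketch: occasionally try a new strategy, keep it only if it pays more."""
    rng = random.Random(seed)
    current = 0
    for _ in range(rounds):
        if rng.random() < explore:
            # Experiment with a strategy other than the current one.
            trial = rng.choice([s for s in range(n_strategies) if s != current])
            # Reject the trial as erroneous unless it leads to a higher payoff.
            if payoff(trial) > payoff(current):
                current = trial
    return current

# Hypothetical fixed payoffs for three strategies.
payoffs = [1.0, 3.0, 2.0]
best = trial_and_error(lambda s: payoffs[s], n_strategies=3)
```

With fixed payoffs the agent can only move to strictly better strategies, so it settles on the highest-paying one. In a game, the payoff of a strategy would also shift with the other players' choices, which is the second kind of error the abstract refers to.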
Aspiration-Based Reinforcement Learning In Repeated Interaction Games: An Overview
, 2001
Cited by 14 (1 self)
In models of aspiration-based... This paper provides an informal overview of a range of such theories applied to repeated interaction games. We describe different models of aspiration formation: where (1) aspirations are fixed but required to be consistent with long-run average payoffs; (2) aspirations evolve based on past personal experience or of previous generations of players; and (3) aspirations are based on the experience of peers. Convergence to non-Nash outcomes may result in either of these formulations. Indeed, cooperative behaviour can emerge and survive in the long run, even though it may be a strictly dominated strategy in the stage game, and despite the myopic adaptation of stage game strategies. Differences between reinforcement learning and evolutionary game theory are also discussed.
The efficiency of adapting aspiration levels
 Proceedings of the Royal Society of London Series B: Biological Sciences 266
, 1999
Cited by 13 (0 self)
review. Views or opinions expressed herein do not necessarily represent those of the Institute, its National Member Organizations, or other organizations supporting the work. IIASA STUDIES IN ADAPTIVE DYNAMICS NO. 33 The Adaptive Dynamics Network at IIASA fosters the development of new mathematical and conceptual techniques for understanding the evolution of complex adaptive systems. Focusing on these long-term implications of adaptive processes in systems of limited growth, the Adaptive Dynamics Network brings together scientists and institutions from around the world with IIASA acting as the central node. Scientific progress within the network
Neglect Tolerant Teaming: Issues and Dilemmas
 In proceedings of the 2003 AAAI Spring Symposium on Human Interaction with Autonomous Systems in Complex Environments
Cited by 11 (1 self)
In this paper, a brief overview of neglect-tolerant human-robot interaction is presented. Recent results of a neglect-tolerance study are then summarized. The problem is then posed of how neglect tolerance affects how a human interacts with multiple robots, and a scenario is constructed that illustrates how multiple robot management can produce a problem with the form of a prisoner’s dilemma. An abstraction of this prisoner’s dilemma problem is then presented, and two robot learning algorithms are outlined that may address key points in this abstracted dilemma.
Distributed dynamic reinforcement of efficient outcomes in multiagent coordination
, 2007
Cited by 9 (4 self)
We consider the problem of achieving distributed convergence to coordination in a multiagent environment. Each agent is modeled as a learning automaton which repeatedly interacts with an unknown environment, receives a reward, and updates the probabilities of its next action based on its own previous actions and received rewards. In this class of problems, more than one stable equilibrium (i.e., coordination structure) exists. We analyze the dynamic behavior of the distributed system in terms of convergence to an efficient equilibrium, suitably defined. In particular, we analyze the effect of dynamic processing on convergence properties, where agents include the derivative of their own reward into the decision process (i.e., derivative action). We show that derivative action can be used as an equilibrium selection scheme by appropriately adjusting derivative feedback gains.
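The abstract above does not give the automaton's update rule, so the sketch below substitutes the standard linear reward-inaction scheme to show how an agent can update action probabilities from its own actions and rewards alone. The derivative action discussed in the paper is not modeled, and the function names, step size, and reward table are assumptions.

```python
import random

def reward_inaction_update(probs, action, reward, step=0.1):
    """Linear reward-inaction: shift probability toward `action` in proportion to a reward in [0, 1]."""
    return [
        p + step * reward * (1.0 - p) if a == action else p - step * reward * p
        for a, p in enumerate(probs)
    ]

def run_automaton(reward_fn, n_actions, rounds=2000, step=0.1, seed=0):
    """Repeatedly sample an action, observe its reward, and update the action probabilities."""
    rng = random.Random(seed)
    probs = [1.0 / n_actions] * n_actions
    for _ in range(rounds):
        action = rng.choices(range(n_actions), weights=probs)[0]
        probs = reward_inaction_update(probs, action, reward_fn(action), step)
    return probs

# Hypothetical stationary environment with two actions.
rewards = [0.2, 0.9]
final_probs = run_automaton(lambda a: rewards[a], n_actions=2)
```

The update keeps the probabilities summing to one because the mass added to the chosen action equals the mass removed from the others.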
Learning Efficient Nash Equilibria in Distributed Systems
, 2010
Cited by 8 (1 self)
An individual’s learning rule is completely uncoupled if it does not depend on the actions or payoffs of anyone else. We propose a variant of log linear learning that is completely uncoupled and that selects an efficient pure Nash equilibrium in all generic n-person games that possess at least one pure Nash equilibrium. In games that do not have such an equilibrium, there is a simple formula that expresses the long-run probability of the various disequilibrium states in terms of two factors: i) the sum of payoffs over all agents, and ii) the maximum payoff gain that results from a unilateral deviation by some agent. This welfare/stability tradeoff criterion provides a novel framework for analyzing the selection of disequilibrium as well as equilibrium states in n-person games. JEL: C72, C73
1. Learning equilibrium in complex interactive systems
Game theory has traditionally focussed on situations that involve a small number of players. In these environments it makes sense to assume that players know the structure of the game and can predict the strategic behavior of their opponents. But there are many situations involving huge numbers of players where these assumptions are not particularly persuasive.
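The two factors named in the abstract above (total payoff, and the largest gain from a unilateral deviation) can be computed directly for any action profile. The sketch below does only that; it does not reproduce the paper's formula for long-run probabilities, which is not given here. The function names and the prisoner's dilemma payoffs are assumptions.

```python
def welfare(payoffs, profile):
    """Sum of payoffs over all agents at the given action profile."""
    return sum(payoffs[profile])

def max_deviation_gain(payoffs, profile, n_actions):
    """Largest payoff gain any single agent obtains by unilaterally deviating (0 if none gains)."""
    best = 0.0
    for i, n in enumerate(n_actions):
        for alt in range(n):
            # Replace only agent i's action, keeping everyone else fixed.
            deviated = profile[:i] + (alt,) + profile[i + 1:]
            best = max(best, payoffs[deviated][i] - payoffs[profile][i])
    return best

# Hypothetical 2x2 prisoner's dilemma; action 0 = cooperate, 1 = defect.
pd = {
    (0, 0): (3, 3), (0, 1): (0, 5),
    (1, 0): (5, 0), (1, 1): (1, 1),
}

cooperate_welfare = welfare(pd, (0, 0))
cooperate_gain = max_deviation_gain(pd, (0, 0), (2, 2))
defect_gain = max_deviation_gain(pd, (1, 1), (2, 2))
```

A profile is a pure Nash equilibrium exactly when the maximum deviation gain is zero. In this example mutual defection is stable but has low welfare, while mutual cooperation has high welfare but a positive deviation gain.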
Regret testing: A simple payoff-based procedure for learning Nash equilibrium
 Games Econ. Behav
, 2004
Cited by 6 (1 self)
A learning rule is uncoupled if a player does not condition his strategy on the opponent’s payoffs. It is radically uncoupled if the player does not condition his strategy on the opponent’s actions or payoffs. We demonstrate a simple class of radically uncoupled learning rules, patterned after aspiration learning models, whose period-by-period behavior comes arbitrarily close to Nash equilibrium behavior in any finite two-person game.
1 Payoff-based learning rules
In this paper we propose a class of simple, adaptive learning rules that depend only on players’ realized payoffs, such that when two players employ a rule from this class their period-by-period strategic behavior approximates Nash equilibrium behavior. Like reinforcement and aspiration models, this type of rule depends only on summary statistics that are derived from the players’ received payoffs; indeed the players do not even need to know they are involved in a game for them to learn equilibrium eventually. To position our contribution with respect to the recent literature, we need to consider three separate issues: i) the amount of information needed to implement a learning rule; ii) the type of equilibrium to which the learning process tends (Nash, correlated, etc.); iii) the sense in which the process can be said to “approximate” the type of equilibrium behavior in question. (For a further discussion of these issues see Young, 2004.) Consider, for example, the recently discovered regret matching rules of Hart and Mas-Colell (2000, 2001). The essential idea is that players randomize among actions in proportion to their regrets from not having played those actions in the past. Like the regret-testing rules we introduce here,
Measuring Beliefs and Rewards: A Neuroeconomic Approach. Mimeo
, 2009
Cited by 3 (0 self)
The neurotransmitter dopamine is central to the emerging discipline of neuroeconomics; it is hypothesized to encode the difference between expected and realized rewards and thereby to mediate belief formation and choice. We develop the first formal test of this theory of dopaminergic function, based on a recent axiomatization by Caplin and Dean [2008A]. These tests are satisfied by neural activity in the nucleus accumbens, an area rich in dopamine receptors. Intriguingly, we find evidence for separate positive and negative reward prediction error signals, a novel empirical result suggesting that behavioral asymmetries in response to losses and gains may be encoded by activity in the nucleus accumbens. Our findings provide researchers with new methods for studying beliefs, learning, and choice.
Aspiration learning in coordination games
 in IEEE Conference on Decision and Control
, 2010
Cited by 3 (2 self)
We consider the problem of distributed convergence to efficient outcomes in coordination games through payoff-based learning dynamics, namely aspiration learning. The proposed learning scheme assumes that players reinforce well performed actions, by successively playing these actions, otherwise they randomize among alternative actions. Our first contribution is the characterization of the asymptotic behavior of the induced Markov chain of the iterated process by an equivalent finite-state Markov chain, which simplifies previously introduced analysis on aspiration learning. We then characterize explicitly the behavior of the proposed aspiration learning in a generalized version of so-called coordination games, an example of which is network formation games. In particular, we show that in coordination games the expected percentage of time that the efficient action profile is played can become arbitrarily large.
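The behavioral rule in this abstract (repeat an action that performed well, otherwise randomize among the alternatives) can be sketched for a single player as follows. The abstract does not specify how aspirations are formed, so the running-average aspiration update, the smoothing constant, and the function name are assumptions. Any perturbations the paper's scheme may rely on are not modeled.

```python
import random

def aspiration_step(action, payoff, aspiration, n_actions, rng, smoothing=0.1):
    """One step for one player: keep a satisfying action, otherwise pick a different one at random."""
    # Assumed aspiration update: exponential running average of realized payoffs.
    new_aspiration = aspiration + smoothing * (payoff - aspiration)
    if payoff >= aspiration:
        # The action met the aspiration level, so it is reinforced by being repeated.
        next_action = action
    else:
        # The action fell short, so the player randomizes among the alternatives.
        alternatives = [a for a in range(n_actions) if a != action]
        next_action = rng.choice(alternatives)
    return next_action, new_aspiration

rng = random.Random(0)
stay_action, stay_aspiration = aspiration_step(0, payoff=2.0, aspiration=1.0, n_actions=2, rng=rng)
switch_action, switch_aspiration = aspiration_step(0, payoff=0.0, aspiration=1.0, n_actions=2, rng=rng)
```

In the first call the payoff exceeds the aspiration, so the player keeps action 0 and raises its aspiration. In the second call the payoff falls short, so the player switches to the only alternative and lowers its aspiration.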