A new consequence of Simpson’s paradox: Stable co-operation in one-shot Prisoner’s Dilemma from populations of individualistic learning agents
BibTeX
@MISC{Chater_anew,
author = {Nick Chater and Ivo Vlaev and Maurice Grinberg},
title = {A new consequence of Simpson’s paradox: Stable co-operation in one-shot Prisoner’s Dilemma from populations of individualistic learning agents},
year = {}
}
Abstract
Normative theories of individual choice in economics typically assume that interacting agents should each act individualistically, i.e., they should maximize their own utility function. Specifically, game theory proposes that interaction should be governed by Nash equilibria. Computationally limited agents (whether artificial, animal or human) may not, however, have the capacity to carry out the sophisticated reasoning needed to converge directly on Nash equilibria. Nonetheless, it is often assumed that Nash equilibria will be obtained in any case if agents embody simple learning algorithms such as reinforcement learning. If so, learners should converge on Nash equilibria after sufficient practice in playing a game, and hence, for example, individualistic agents should end up playing D (defect) in one-shot Prisoner’s Dilemmas (PD). In an experiment and in a multi-agent simulation, we show, however, that this is not always the case: under certain circumstances, reinforcement learners can converge on co-operative behaviour in PD. That is, even though each agent would receive a higher pay-off from switching to D, agents obtain more reinforcement, on average, from playing C (co-operate), and hence C is more strongly reinforced. This effect arises from a well-known statistical paradox, Simpson’s paradox. We speculate that this effect may be relevant to some aspects of real-world human co-operative behaviour.
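The reversal described in the abstract can be illustrated with a small numerical sketch. The abstract does not give the experimental design, so everything below is an assumption made for illustration: the payoff values, the two contexts ("friendly" and "hostile"), and the probabilities are hypothetical and not taken from the paper. The sketch only shows the general Simpson's-paradox pattern: D pays more than C within every context, yet C has the higher average payoff once contexts are pooled, because C is played mostly where partners also tend to play C.

```python
# Hypothetical PD payoffs with T > R > P > S (illustrative, not from the paper).
T, R, P, S = 5.0, 3.0, 1.0, 0.0

# Two hypothetical, equally likely contexts. "partner_c" is the chance that the
# partner plays C; "self_c" is the chance that the learner itself plays C.
CONTEXTS = {
    "friendly": {"weight": 0.5, "partner_c": 0.9, "self_c": 0.9},
    "hostile": {"weight": 0.5, "partner_c": 0.1, "self_c": 0.1},
}


def expected_payoff(action, partner_c):
    """Expected one-shot payoff of an action given the partner's C probability."""
    if action == "C":
        return partner_c * R + (1 - partner_c) * S
    return partner_c * T + (1 - partner_c) * P


def within_context():
    """Expected payoff of C and D separately inside each context."""
    return {
        name: {a: expected_payoff(a, ctx["partner_c"]) for a in ("C", "D")}
        for name, ctx in CONTEXTS.items()
    }


def pooled():
    """Average payoff per action when the context is ignored (pooled data)."""
    totals = {"C": 0.0, "D": 0.0}
    mass = {"C": 0.0, "D": 0.0}
    for ctx in CONTEXTS.values():
        for action, p_action in (("C", ctx["self_c"]), ("D", 1 - ctx["self_c"])):
            m = ctx["weight"] * p_action  # how often this action occurs here
            totals[action] += m * expected_payoff(action, ctx["partner_c"])
            mass[action] += m
    return {a: totals[a] / mass[a] for a in totals}


if __name__ == "__main__":
    for name, payoffs in within_context().items():
        print(name, payoffs)   # D beats C inside each context
    print("pooled", pooled())  # C beats D once contexts are pooled
```

A learner that tracks only the pooled average reinforcement per action, without conditioning on context, would under these assumed numbers reinforce C more strongly than D, even though switching to D would raise its payoff in either context.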







