Results 1 – 10 of 13
Learning to Cooperate via Policy Search
, 2000
Abstract

Cited by 140 (4 self)
Cooperative games are those in which both agents share the same payoff structure. Value-based reinforcement-learning algorithms, such as variants of Q-learning, have been applied to learning cooperative games, but they only apply when the game state is completely observable to both agents. Policy search methods are a reasonable alternative to value-based methods for partially observable environments. In this paper, we provide a gradient-based distributed policy-search method for cooperative games and compare the notion of local optimum to that of Nash equilibrium. We demonstrate the effectiveness of this method experimentally in a small, partially observable simulated soccer domain. 1 INTRODUCTION The interaction of decision makers who share an environment is traditionally studied in game theory and economics. The game-theoretic formalism is very general, and analyzes the problem in terms of solution concepts such as Nash equilibrium [12], but usually works under the assu...
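The distributed gradient-based policy search described in this abstract can be sketched in a few lines. The following is a hypothetical minimal REINFORCE-style implementation on a toy cooperative matrix game, not the paper's soccer-domain code; the game, the step count, and the learning rate are illustrative assumptions.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def train(payoff, steps=5000, lr=0.1, seed=0):
    """Each agent runs REINFORCE on the shared payoff using only its
    own action's score function -- neither agent sees the other's policy."""
    rng = np.random.default_rng(seed)
    theta = [np.zeros(2), np.zeros(2)]        # one logit vector per agent
    for _ in range(steps):
        probs = [softmax(t) for t in theta]
        acts = [int(rng.choice(2, p=p)) for p in probs]
        r = payoff[acts[0], acts[1]]          # shared (cooperative) reward
        for i in range(2):
            grad = -probs[i]
            grad[acts[i]] += 1.0              # gradient of log pi_i(a_i)
            theta[i] += lr * r * grad         # local gradient ascent
    return [softmax(t) for t in theta]

# Coordination game: reward 1 only when the two actions match.
p0, p1 = train(np.array([[1.0, 0.0], [0.0, 1.0]]))
```

Because each agent updates only from its own action and the shared reward, the pair climbs to a local optimum, here one of the two coordinated outcomes.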
Reinforcement Learning by Policy Search
, 2000
Abstract

Cited by 29 (2 self)
One objective of artificial intelligence is to model the behavior of an intelligent agent interacting with its environment. The environment's transformations could be modeled as a Markov chain, whose state is partially observable to the agent and affected by its actions; such processes are known as partially observable Markov decision processes (POMDPs). While the environment's dynamics are assumed to obey certain rules, the agent does not know them and must learn. In this dissertation we focus on the agent's adaptation as captured by the reinforcement learning framework. Reinforcement learning means learning a policy (a mapping of observations into actions) based on feedback from the environment. The learning can be viewed as browsing a set of policies while evaluating them by trial through interaction with the environment. The set of policies being searched is constrained by the architecture of the agent's controller. POMDPs require a controller to have a memory. We investigate various architectures for controllers with memory, including controllers with external memory, finite-state controllers, and distributed controllers for multi-agent systems. For these various controllers we work out the details of the algorithms which learn by ascending the gradient of expected cumulative reinforcement. Building on statistical learning theory and experiment-design theory, a policy evaluation algorithm is developed for the case of experience reuse. We address the question of sufficient experience for uniform convergence of policy evaluation and obtain sample complexity bounds for various estimators. Finally, we demonstrate the performance of the proposed algorithms on several domains, the most complex of which is simulated adaptive packet routing in a telecommunication network.
Imitation – Theory and experimental evidence
, 2004
Abstract

Cited by 20 (6 self)
We introduce a generalized theoretical approach to study imitation and subject it to rigorous experimental testing. In our theoretical analysis we find that the different predictions of previous imitation models are due to different informational assumptions, not to different behavioral rules. It is more important whom one imitates than how. In a laboratory experiment we test the different theories by systematically varying information conditions. We find significant effects of seemingly innocent changes in information. Moreover, the generalized imitation model predicts the differences between treatments well. The data provide support for imitation on the individual level, both in terms of choice and in terms of perception. But imitation is not unconditional. Rather, individuals’ propensity to imitate more successful actions is increasing in payoff differences.
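The paper's finding that the propensity to imitate rises with the payoff difference can be illustrated with a simple switching rule. The logistic form and the `beta` sensitivity parameter below are assumptions chosen for illustration, not the paper's generalized model.

```python
import math

def imitate_prob(own_payoff, observed_payoff, beta=1.0):
    """Probability of copying an observed action: increasing in the
    payoff gap, and 0.5 when payoffs are equal (assumed logistic rule)."""
    gap = observed_payoff - own_payoff
    return 1.0 / (1.0 + math.exp(-beta * gap))
```

Any rule with this monotone shape reproduces the qualitative pattern in the data: more successful observed actions are imitated more often, and equally successful ones are copied at chance.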
Through Trial & Error to Collusion
, 1998
Abstract

Cited by 7 (2 self)
In this note we study a very simple trial & error learning process in the context of a Cournot oligopoly. Without any knowledge of the payoff functions players increase, respectively decrease, their quantity by one unit as long as this leads to higher profits. We show that this process converges to a collusive outcome.
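The quantity-adjustment process described in the abstract can be simulated directly. The sketch below implements a win-continue/lose-reverse reading of the rule in a linear Cournot duopoly; the demand intercept, starting quantities, and step count are illustrative assumptions, not the paper's specification.

```python
import random

def win_continue_lose_reverse(a=100.0, steps=5000, seed=1):
    """Each firm moves its quantity one unit per period in a current
    direction, keeps the direction when its own profit rose, and
    reverses it otherwise.  Inverse demand p = a - (q1 + q2), zero cost;
    firms never observe the payoff functions, only their own profit."""
    rng = random.Random(seed)
    q = [30.0, 30.0]
    d = [rng.choice([-1.0, 1.0]), rng.choice([-1.0, 1.0])]
    last = None
    for _ in range(steps):
        price = max(a - q[0] - q[1], 0.0)
        prof = [price * q[0], price * q[1]]
        if last is not None:
            for i in range(2):
                if prof[i] <= last[i]:
                    d[i] = -d[i]              # profit fell: reverse direction
        last = prof
        for i in range(2):
            q[i] = max(q[i] + d[i], 0.0)
    return q[0], q[1]

q1, q2 = win_continue_lose_reverse()
# With these illustrative parameters the joint output ends up near the
# collusive total a/2 = 50, below the Cournot-Nash total of about 66.7.
```

The intuition carries over from the note: because both profits move together when total output falls, undirected unit steps drift the industry toward the joint-profit-maximizing output without any coordination device.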
Noisy Directional Learning and the Logit Equilibrium
 SCANDINAVIAN JOURNAL OF ECONOMICS
, 2004
Abstract

Cited by 6 (0 self)
We specify a dynamic model in which agents adjust their decisions toward higher payoffs, subject to normal error. This process generates a probability distribution of players’ decisions that evolves over time according to the Fokker–Planck equation. The dynamic process is stable for all potential games, a class of payoff structures that includes several widely studied games. In equilibrium, the distributions that determine expected payoffs correspond to the distributions that arise from the logit function applied to those expected payoffs. This "logit equilibrium" forms a stochastic generalization of the Nash equilibrium and provides a possible explanation of anomalous laboratory data.
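The fixed-point characterization, choice distributions equal to the logit of the expected payoffs they induce, can be computed numerically. The damped iteration and the precision parameter `lam` below are assumptions of this sketch, not the paper's Fokker–Planck derivation.

```python
import numpy as np

def logit_eq(A, B, lam=2.0, iters=500):
    """Damped fixed-point iteration for a two-player logit equilibrium:
    each player's mixture is the logit (softmax) response to the expected
    payoffs induced by the opponent's current mixture."""
    def logit(u):
        e = np.exp(lam * (u - u.max()))
        return e / e.sum()
    p = np.full(A.shape[0], 1.0 / A.shape[0])   # row player's mixture
    q = np.full(A.shape[1], 1.0 / A.shape[1])   # column player's mixture
    for _ in range(iters):
        p = 0.5 * p + 0.5 * logit(A @ q)        # row logit response
        q = 0.5 * q + 0.5 * logit(B.T @ p)      # column logit response
    return p, q

# Matching pennies: the logit equilibrium is uniform play, coinciding
# here with the mixed-strategy Nash equilibrium.
A = np.array([[1.0, -1.0], [-1.0, 1.0]])
p, q = logit_eq(A, -A)
```

As `lam` grows, logit responses approach best responses and the logit equilibrium approaches a Nash equilibrium, which is the stochastic-generalization claim in the abstract.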
Bounded Rationality: Static versus Dynamic Approach
, 2002
Abstract

Cited by 2 (2 self)
Two kinds of theories of boundedly rational behavior are possible. Static theories focus on stationary behavior and do not include any explicit mechanism for temporal change. Dynamic theories, on the other hand, explicitly model the fine-grained adjustments made by the subjects in response to their recent experiences. The main contribution of this paper is to argue that the restrictions usually imposed on the distribution of choices in the static approach are generically not supported by an explicit adjustment mechanism.
Through Trial & Error to Collusion – The Discrete Case
, 2000
Abstract

Cited by 1 (0 self)
In this note we study a very simple trial & error learning process in the context of a Cournot oligopoly. Without any knowledge of the payoff functions players increase, respectively decrease, their quantity by one unit as long as this leads to higher profits. We show that despite the absence of any coordination or punishing device this process converges to a collusive outcome.
Near-Potential Games: Geometry and Dynamics
, 2011
Abstract

Cited by 1 (1 self)
Potential games are a special class of games for which many adaptive user dynamics converge to a Nash equilibrium. In this paper, we study properties of near-potential games, i.e., games that are close in terms of payoffs to potential games, and show that such games admit similar limiting dynamics. We first present a distance notion in the space of games and study the geometry of potential games and sets of games that are equivalent, with respect to various equivalence relations, to potential games. We discuss how given an arbitrary game, one can find a nearby game in these sets. We then study dynamics in near-potential games by focusing on continuous-time fictitious play dynamics. We characterize the limiting behavior of this dynamics in terms of the level sets of the potential function of a close potential game and approximate equilibria of the game. Exploiting structural properties of approximate equilibrium sets, we strengthen our result and show that for games that are sufficiently close to a potential game, the sequence of mixed strategies generated by this dynamics converges to a small neighborhood of equilibria whose size is a function of the distance from the set of potential games. We also consider continuous-time ɛ-fictitious play dynamics, a variant of fictitious play dynamics where players update their strategies only when the utility improvement is larger than some fixed level ɛ. When the game is sufficiently close to a potential game and ɛ is small, we establish convergence of this dynamics to a small neighborhood of equilibria.
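As a point of reference for the dynamics discussed above, here is a discrete-time fictitious play sketch on an exact potential game (a pure coordination game). The paper analyzes continuous-time fictitious play and near-potential perturbations, which this toy sketch does not reproduce; the game matrix and step count are assumptions.

```python
import numpy as np

def fictitious_play(A, B, steps=500):
    """Discrete-time fictitious play: each player best-responds to the
    empirical frequency of the opponent's past actions."""
    counts = [np.ones(A.shape[0]), np.ones(A.shape[1])]  # uniform prior
    for _ in range(steps):
        p = counts[0] / counts[0].sum()          # row empirical mixture
        q = counts[1] / counts[1].sum()          # column empirical mixture
        counts[0][int(np.argmax(A @ q))] += 1    # row best response
        counts[1][int(np.argmax(B.T @ p))] += 1  # column best response
    return counts[0] / counts[0].sum(), counts[1] / counts[1].sum()

# A coordination game is an exact potential game, so the empirical
# frequencies converge to a Nash equilibrium.
A = np.array([[2.0, 0.0], [0.0, 1.0]])
p, q = fictitious_play(A, A)
```

In an exact potential game the potential increases along best-response paths, which is why the empirical frequencies settle on an equilibrium; the paper's contribution is showing how much of this survives when the game is only close to potential.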
Incentives for Boundedly Rational Agents
Abstract
stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise, without the prior written permission of the publisher, bepress, which has been given certain exclusive rights by the author. Topics in Theoretical Economics is produced by The Berkeley Electronic Press (bepress).