Results 1  10
of
196
Toward a Theory of Discounted Repeated Games with Imperfect Monitoring,”Econometrica
, 1990
"... Your use of the JSTOR archive indicates your acceptance of JSTOR's Terms and Conditions of Use, available at ..."
Abstract

Cited by 243 (2 self)
 Add to MetaCart
Your use of the JSTOR archive indicates your acceptance of JSTOR's Terms and Conditions of Use, available at
Rational Learning Leads to Nash Equilibrium
 Econometrica
, 1993
"... Your use of the JSTOR archive indicates your acceptance of JSTOR's Terms and Conditions of Use, available at ..."
Abstract

Cited by 215 (13 self)
 Add to MetaCart
Your use of the JSTOR archive indicates your acceptance of JSTOR's Terms and Conditions of Use, available at
Multiagent Learning Using a Variable Learning Rate
 Artificial Intelligence
, 2002
"... Learning to act in a multiagent environment is a difficult problem since the normal definition of an optimal policy no longer applies. The optimal policy at any moment depends on the policies of the other agents and so creates a situation of learning a moving target. Previous learning algorithms hav ..."
Abstract

Cited by 180 (8 self)
 Add to MetaCart
Learning to act in a multiagent environment is a difficult problem since the normal definition of an optimal policy no longer applies. The optimal policy at any moment depends on the policies of the other agents and so creates a situation of learning a moving target. Previous learning algorithms have one of two shortcomings depending on their approach. They either converge to a policy that may not be optimal against the specific opponents' policies, or they may not converge at all. In this article we examine this learning problem in the framework of stochastic games. We look at a number of previous learning algorithms showing how they fail at one of the above criteria. We then contribute a new reinforcement learning technique using a variable learning rate to overcome these shortcomings. Specifically, we introduce the WoLF principle, "Win or Learn Fast", for varying the learning rate. We examine this technique theoretically, proving convergence in selfplay on a restricted class of iterated matrix games. We also present empirical results on a variety of more general stochastic games, in situations of selfplay and otherwise, demonstrating the wide applicability of this method.
Algorithms for Sequential Decision Making
, 1996
"... Sequential decision making is a fundamental task faced by any intelligent agent in an extended interaction with its environment; it is the act of answering the question "What should I do now?" In this thesis, I show how to answer this question when "now" is one of a finite set of states, "do" is one ..."
Abstract

Cited by 175 (8 self)
 Add to MetaCart
Sequential decision making is a fundamental task faced by any intelligent agent in an extended interaction with its environment; it is the act of answering the question "What should I do now?" In this thesis, I show how to answer this question when "now" is one of a finite set of states, "do" is one of a finite set of actions, "should" is maximize a longrun measure of reward, and "I" is an automated planning or learning system (agent). In particular,
The Complexity of Stochastic Games
 Information and Computation
, 1992
"... We consider the complexity of stochastic games  simple games of chance played by two players. We show that the problem of deciding which player has the greatest chance of winning the game is in the class NP " coNP. 1 Introduction We consider the complexity of a natural combinatorial problem, tha ..."
Abstract

Cited by 155 (2 self)
 Add to MetaCart
We consider the complexity of stochastic games  simple games of chance played by two players. We show that the problem of deciding which player has the greatest chance of winning the game is in the class NP " coNP. 1 Introduction We consider the complexity of a natural combinatorial problem, that of deciding the outcome of a special kind of stochastic game. A simple stochastic game (SSG) is a directed graph with three types of vertices, called max, min and average vertices. There is a special start vertex and two special sink vertices, called the 0sink and the 1sink. For simplicity, we assume that all vertices have exactly two (not necessarily distinct) neighbors, except for the sink vertices, which have no neighbors. The graph models a game between two players, 0 and 1. In the game, a token is initially placed on the start vertex, and at each step of the game the token is moved from a vertex to one of its neighbors, according to the following rules: At a min vertex, player 0 cho...
Sequential optimality and coordination in multiagent systems
 In International Joint Conference on Artificial Intelligence
, 1999
"... Coordination of agent activities is a key problem in multiagent systems. Set in a larger decision theoretic context, the existence of coordination problems leads to difficulty in evaluating the utility of a situation. This in turn makes defining optimal policies for sequential decision processes pro ..."
Abstract

Cited by 144 (3 self)
 Add to MetaCart
Coordination of agent activities is a key problem in multiagent systems. Set in a larger decision theoretic context, the existence of coordination problems leads to difficulty in evaluating the utility of a situation. This in turn makes defining optimal policies for sequential decision processes problematic. We propose a method for solving sequential multiagent decision problems by allowing agents to reason explicitly about specific coordination mechanisms. We define an extension of value iteration in which the system’s state space is augmented with the state of the coordination mechanism adopted, allowing agents to reason about the short and long term prospects for coordination, the long term consequences of (mis)coordination, and make decisions to engage or avoid coordination problems based on expected value. We also illustrate the benefits of mechanism generalization. 1
Complexity Results about Nash Equilibria
, 2002
"... Noncooperative game theory provides a normative framework for analyzing strategic interactions. ..."
Abstract

Cited by 130 (10 self)
 Add to MetaCart
Noncooperative game theory provides a normative framework for analyzing strategic interactions.
Alternating refinement relations
 In Proceedings of the Ninth International Conference on Concurrency Theory (CONCUR’98), volume 1466 of LNCS
, 1998
"... Abstract. Alternating transition systems are a general model for composite systems which allow the study of collaborative as well as adversarial relationships between individual system components. Unlike in labeled transition systems, where each transition corresponds to a possible step of the syste ..."
Abstract

Cited by 123 (16 self)
 Add to MetaCart
Abstract. Alternating transition systems are a general model for composite systems which allow the study of collaborative as well as adversarial relationships between individual system components. Unlike in labeled transition systems, where each transition corresponds to a possible step of the system (which may involve some or all components), in alternating transition systems, each transition corresponds to a possible move in a game between the components. In this paper, we study refinement relations between alternating transition systems, such as “Does the implementation refine the set £ of specification components without constraining the components not in £? ” In particular, we generalize the definitions of the simulation and trace containment preorders from labeled transition systems to alternating transition systems. The generalizations are called alternating simulation and alternating trace containment. Unlike existing refinement relations, they allow the refinement of individual components within the context of a composite system description. We show that, like ordinary simulation, alternating simulation can be checked in polynomial time using a fixpoint computation algorithm. While ordinary trace containment is PSPACEcomplete, we establish alternating trace containment to be EXPTIMEcomplete. Finally, we present logical characterizations for the two preorders in terms of ATL, a temporal logic capable of referring to games between system components. 1
Dynamic Programming for Partially Observable Stochastic Games
 IN PROCEEDINGS OF THE NINETEENTH NATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE
, 2004
"... We develop an exact dynamic programming algorithm for partially observable stochastic games (POSGs). The algorithm is a synthesis of dynamic programming for partially observable Markov decision processes (POMDPs) and iterated elimination of dominated strategies in normal form games. ..."
Abstract

Cited by 119 (23 self)
 Add to MetaCart
We develop an exact dynamic programming algorithm for partially observable stochastic games (POSGs). The algorithm is a synthesis of dynamic programming for partially observable Markov decision processes (POMDPs) and iterated elimination of dominated strategies in normal form games.
A Survey of Computational Complexity Results in Systems and Control
, 2000
"... The purpose of this paper is twofold: (a) to provide a tutorial introduction to some key concepts from the theory of computational complexity, highlighting their relevance to systems and control theory, and (b) to survey the relatively recent research activity lying at the interface between these fi ..."
Abstract

Cited by 116 (21 self)
 Add to MetaCart
The purpose of this paper is twofold: (a) to provide a tutorial introduction to some key concepts from the theory of computational complexity, highlighting their relevance to systems and control theory, and (b) to survey the relatively recent research activity lying at the interface between these fields. We begin with a brief introduction to models of computation, the concepts of undecidability, polynomial time algorithms, NPcompleteness, and the implications of intractability results. We then survey a number of problems that arise in systems and control theory, some of them classical, some of them related to current research. We discuss them from the point of view of computational complexity and also point out many open problems. In particular, we consider problems related to stability or stabilizability of linear systems with parametric uncertainty, robust control, timevarying linear systems, nonlinear and hybrid systems, and stochastic optimal control.