Results 1-10 of 34
Markov games as a framework for multiagent reinforcement learning
In Proceedings of the Eleventh International Conference on Machine Learning, 1994
Cited by 533 (11 self)
In the Markov decision process (MDP) formalization of reinforcement learning, a single adaptive agent interacts with an environment defined by a probabilistic transition function. In this solipsistic view, secondary agents can only be part of the environment and are therefore fixed in their behavior. The framework of Markov games allows us to widen this view to include multiple adaptive agents with interacting or competing goals. This paper considers a step in this direction in which exactly two agents with diametrically opposed goals share an environment. It describes a Q-learning-like algorithm for finding optimal policies and demonstrates its application to a simple two-player game in which the optimal policy is probabilistic.
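To illustrate why an optimal policy in a two-player zero-sum setting can be probabilistic, here is a minimal sketch that solves a single 2x2 zero-sum matrix game in closed form. It is not the algorithm from the paper; the function name `solve_2x2_zero_sum` and the matching-pennies payoffs are assumptions chosen for illustration.

```python
def solve_2x2_zero_sum(payoff):
    """Solve a 2x2 zero-sum matrix game for the row player (the maximizer).

    payoff[i][j] is the row player's reward when the row player picks i
    and the column player picks j. Returns (p, value), where p is the
    probability of playing row 0 under an optimal strategy.
    Hypothetical helper, written for illustration only.
    """
    (a, b), (c, d) = payoff

    # First look for a saddle point, where a pure strategy is optimal.
    row_mins = [min(a, b), min(c, d)]
    col_maxs = [max(a, c), max(b, d)]
    maximin = max(row_mins)
    minimax = min(col_maxs)
    if maximin == minimax:
        p = 1.0 if row_mins[0] >= row_mins[1] else 0.0
        return p, float(maximin)

    # No saddle point: mix so that the opponent is indifferent
    # between the two columns.
    denom = a - b - c + d
    p = (d - c) / denom
    value = (a * d - b * c) / denom
    return p, value


# Matching pennies: any deterministic choice can be exploited,
# so the optimal strategy randomizes.
matching_pennies = [[1, -1], [-1, 1]]
p, value = solve_2x2_zero_sum(matching_pennies)
print(p, value)
```

For matching pennies the sketch returns a mixing probability of 0.5 and a game value of 0, which is the kind of probabilistic optimum the abstract refers to. Larger games generally need a linear program per state instead of this closed form.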
Algorithms for Sequential Decision Making
1996
Cited by 182 (8 self)
Sequential decision making is a fundamental task faced by any intelligent agent in an extended interaction with its environment; it is the act of answering the question "What should I do now?" In this thesis, I show how to answer this question when "now" is one of a finite set of states, "do" is one of a finite set of actions, "should" is maximize a long-run measure of reward, and "I" is an automated planning or learning system (agent). In particular,
A framework for sequential planning in multiagent settings
Journal of Artificial Intelligence Research, 2005
Cited by 95 (26 self)
This paper extends the framework of partially observable Markov decision processes (POMDPs) to multiagent settings by incorporating the notion of agent models into the state space. Agents maintain beliefs over physical states of the environment and over models of other agents, and they use Bayesian update to maintain their beliefs over time. The solutions map belief states to actions. Models of other agents may include their belief states and are related to agent types considered in games of incomplete information. We express the agents’ autonomy by postulating that their models are not directly manipulable or observable by other agents. We show that important properties of POMDPs, such as convergence of value iteration, the rate of convergence, and piecewise linearity and convexity of the value functions carry over to our framework. Our approach complements a more traditional approach to interactive settings which uses Nash equilibria as a solution paradigm. We seek to avoid some of the drawbacks of equilibria which may be non-unique and are not able to capture off-equilibrium behaviors. We do so at the cost of having to represent, process and continually revise models of other agents. Since the agent’s beliefs may be arbitrarily nested, the optimal solutions to decision making problems are only asymptotically computable. However, approximate belief updates and approximately optimal plans are computable. We illustrate our framework using a simple application domain, and we show examples of belief updates and value functions.
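As background for the Bayesian update mentioned above, here is a minimal sketch of the standard single-agent POMDP belief update, b'(s') ∝ O(o | s', a) Σ_s T(s' | s, a) b(s). It covers physical states only and does not model other agents, so it is a simplification of the paper's framework. The function name, the table layout, and the two-state example numbers are assumptions made for illustration.

```python
def belief_update(belief, transition, observation, action, obs):
    """Standard POMDP belief update over a finite state set.

    belief[s]                   : current probability of state s
    transition[a][s][s_next]    : P(s_next | s, a)
    observation[a][s_next][o]   : P(o | s_next, a)
    Hypothetical table layout, chosen for illustration only.
    """
    n = len(belief)
    unnormalized = []
    for s_next in range(n):
        # Prediction step: push the belief through the transition model.
        predicted = sum(transition[action][s][s_next] * belief[s] for s in range(n))
        # Correction step: weight by the likelihood of the observation.
        unnormalized.append(observation[action][s_next][obs] * predicted)

    total = sum(unnormalized)
    if total == 0.0:
        raise ValueError("observation has zero probability under the current belief")
    return [u / total for u in unnormalized]


# Two states, one "listen" action (index 0) that leaves the state unchanged
# and reports the true state with probability 0.85 (assumed numbers).
transition = [[[1.0, 0.0], [0.0, 1.0]]]
observation = [[[0.85, 0.15], [0.15, 0.85]]]
print(belief_update([0.5, 0.5], transition, observation, action=0, obs=0))
```

In the multiagent setting described in the abstract, the state being tracked would also include models of the other agents, which is what makes the nested beliefs expensive to maintain.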
A unified analysis of value-function-based reinforcement-learning algorithms
Neural Computation, 1997
Cited by 34 (7 self)
Reinforcement learning is the problem of generating optimal behavior in a sequential decision-making environment given the opportunity of interacting with it. Many algorithms for solving reinforcement-learning problems work by computing improved estimates of the optimal value function. We extend prior analyses of reinforcement-learning algorithms and present a powerful new theorem that can provide a unified analysis of value-function-based reinforcement-learning algorithms. The usefulness of the theorem lies in how it allows the convergence of a complex asynchronous reinforcement-learning algorithm to be proven by verifying that a simpler synchronous algorithm converges. We illustrate the application of the theorem by analyzing the convergence of Q-learning, model-based reinforcement learning, Q-learning with multi-state updates, Q-learning for Markov games, and risk-sensitive reinforcement learning.
Combinatorial Games under Auction Play
1997
Cited by 14 (1 self)
A Richman game is a combinatorial game in which, rather than alternating moves, the two players bid for the privilege of making the next move. The theory of such games is a hybrid between the classical theory of games [von Neumann, Morgenstern, Aumann, ...] and the combinatorial theory of games [Berlekamp, Conway, Guy, ...]. We expand upon our previous work by considering games with infinitely many positions, and several variants including the Poorman variant in which the high bidder pays the bank (rather than the other player). The algorithmic complexity of our procedure for computing optimal moves is found to be polynomial in several important cases.
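As a rough illustration of the kind of computation involved, the sketch below iterates the standard Richman-cost recurrence on a small finite game graph: each non-terminal position's value is the average of its largest and smallest successor values, with one player's target fixed at 0 and the other's at 1. It is not the procedure analyzed in the paper, and it covers neither infinite position sets nor the Poorman variant. The function name, the graph encoding, and the five-position path example are assumptions.

```python
def richman_values(successors, blue_target, red_target, sweeps=10_000, tol=1e-12):
    """Iterate R(v) = (max_w R(w) + min_w R(w)) / 2 over successors w of v.

    successors maps every position (including both targets) to the list of
    positions reachable in one move. Hypothetical encoding, for illustration.
    """
    values = {v: 1.0 for v in successors}
    values[blue_target] = 0.0
    values[red_target] = 1.0

    for _ in range(sweeps):
        largest_change = 0.0
        updated = dict(values)
        for v, nexts in successors.items():
            if v in (blue_target, red_target):
                continue  # terminal values stay fixed
            succ_values = [values[w] for w in nexts]
            new_value = (max(succ_values) + min(succ_values)) / 2.0
            largest_change = max(largest_change, abs(new_value - values[v]))
            updated[v] = new_value
        values = updated
        if largest_change < tol:
            break
    return values


# A path of five positions; the targets are the two ends.
path = {0: [], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: []}
print(richman_values(path, blue_target=0, red_target=4))
```

On the path example the values settle near 0.25, 0.5, and 0.75 for the interior positions, so each step toward a target shifts the value linearly.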
Game-theoretic agent programming in Golog
Proceedings ECAI-2004, 2004
Cited by 8 (6 self)
We present the agent programming language GTGolog, which integrates explicit agent programming in Golog with game-theoretic multiagent planning in Markov games. It is a generalization of DTGolog to a multiagent setting, where we have two competing single agents or two competing teams of agents. The language allows for specifying a control program for a single agent or a team of agents in a high-level logical language. The control program is then completed by an interpreter in an optimal way against another single agent or another team of agents, by viewing it as a generalization of a Markov game, and computing a Nash strategy. We illustrate the usefulness of this approach with a robotic soccer example. We also report on a first prototype implementation of a simple GTGolog interpreter.
Adaptive multiagent programming in GTGolog
In Proceedings KI-2006, Vol. 4314 of LNCS/LNAI, 2007
Cited by 4 (2 self)
We present a novel approach to adaptive multiagent programming, which is based on an integration of the agent programming language GTGolog with adaptive dynamic programming techniques. GTGolog combines explicit agent programming in Golog with multiagent planning in stochastic games. A drawback of this framework, however, is that the transition probabilities and reward values of the domain must be known in advance and cannot change afterwards. But such data is often not available in advance and may also change over time. The adaptive generalization of GTGolog in this paper is directed towards letting the agents themselves explore and adapt these data, which is more useful for realistic applications. We use high-level programs for generating both abstract states and optimal policies, which benefits from the deep integration between action theory and high-level programs in the Golog framework.
Game-theoretic agent programming in Golog under partial observability
In Proc. KI-2006, LNCS/LNAI, 2007
Cited by 3 (1 self)
We present the agent programming language POGTGolog, which integrates explicit agent programming in Golog with game-theoretic multiagent planning in partially observable stochastic games. It deals with the case of one team of cooperative agents under partial observability, where the agents may have different initial belief states and not necessarily the same rewards. POGTGolog allows for specifying a partial control program in a high-level logical language, which is then completed by an interpreter in an optimal way. To this end, we define a formal semantics of POGTGolog programs in terms of Nash equilibria, and we specify a POGTGolog interpreter that computes one of these Nash equilibria. We illustrate the usefulness of POGTGolog with a rugby scenario.
Algorithms for informed cows
In: AAAI 1997 Workshop on On-Line Search, 1997
Cited by 3 (0 self)
We extend the classic online search problem known as the cow-path problem to the case in which goal locations are selected according to one of a set of possible known probability distributions. We present a polynomial-time linear-programming algorithm for this problem.
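For context, the classic (uninformed) cow-path problem is usually handled with a doubling strategy, sketched below. This is the well-known baseline, not the linear-programming algorithm from the paper, and it ignores any probability distribution over goal locations. The function name and the choice to explore the positive side first are assumptions.

```python
def doubling_search_cost(goal, first_step=1.0):
    """Distance walked by the doubling strategy before reaching `goal`.

    The searcher starts at 0 on a line, walks `first_step` to the right,
    returns to 0, walks twice as far to the left, returns, and so on.
    `goal` is a nonzero signed position. Hypothetical helper for illustration.
    """
    if goal == 0:
        raise ValueError("goal must be a nonzero position")
    traveled = 0.0
    reach = first_step
    direction = 1  # +1 explores the positive side, -1 the negative side
    while True:
        on_this_side = goal * direction > 0
        if on_this_side and abs(goal) <= reach:
            return traveled + abs(goal)
        traveled += 2 * reach  # walk out and come back to the origin
        reach *= 2
        direction = -direction


for goal in (1.0, -1.0, 3.0, -5.0):
    cost = doubling_search_cost(goal)
    print(goal, cost, cost / abs(goal))
```

When the goal is at least `first_step` away, the distance walked stays below nine times the direct distance, which is the usual benchmark that informed variants try to improve on.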
Relational Markov games
In Proceedings JELIA-2004, Vol. 3229 of LNCS/LNAI, 2004
Cited by 2 (2 self)
Towards a compact and elaboration-tolerant first-order representation of Markov games, we introduce relational Markov games, which combine standard Markov games with first-order action descriptions in a stochastic variant of the situation calculus. We focus on the zero-sum two-agent case, where we have two agents with diametrically opposed goals. We also present a symbolic value iteration algorithm for computing Nash policy pairs in this framework.