Results 1 - 6 of 6
Markov games as a framework for multiagent reinforcement learning
 IN PROCEEDINGS OF THE ELEVENTH INTERNATIONAL CONFERENCE ON MACHINE LEARNING
, 1994
Cited by 500 (10 self)
In the Markov decision process (MDP) formalization of reinforcement learning, a single adaptive agent interacts with an environment defined by a probabilistic transition function. In this solipsistic view, secondary agents can only be part of the environment and are therefore fixed in their behavior. The framework of Markov games allows us to widen this view to include multiple adaptive agents with interacting or competing goals. This paper considers a step in this direction in which exactly two agents with diametrically opposed goals share an environment. It describes a Q-learning-like algorithm for finding optimal policies and demonstrates its application to a simple two-player game in which the optimal policy is probabilistic.
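To illustrate why an optimal policy in a two-player zero-sum game can be probabilistic, here is a minimal sketch that solves a single 2x2 matrix game in closed form. It is not the paper's learning algorithm; the function name `solve_2x2_zero_sum` and the example payoff matrices are assumptions made for illustration only.

```python
from fractions import Fraction


def solve_2x2_zero_sum(payoff):
    """Solve a 2x2 zero-sum matrix game for the row (maximizing) player.

    payoff[i][j] is the row player's reward when row i meets column j.
    Returns (p, value): the probability of playing row 0 and the game value.
    """
    (a, b), (c, d) = [[Fraction(x) for x in row] for row in payoff]

    # If the best guaranteed row payoff equals the best guaranteed column
    # cap, the game has a saddle point and a deterministic policy suffices.
    maximin = max(min(a, b), min(c, d))
    minimax = min(max(a, c), max(b, d))
    if maximin == minimax:
        p = Fraction(1) if min(a, b) >= min(c, d) else Fraction(0)
        return p, maximin

    # Otherwise the row player mixes so that both columns yield the same
    # expected payoff (standard closed form for 2x2 games).
    denom = a - b - c + d
    p = (d - c) / denom
    value = (a * d - b * c) / denom
    return p, value


# Matching pennies: no deterministic policy is safe, so the optimum mixes.
print(solve_2x2_zero_sum([[1, -1], [-1, 1]]))
```

In matching pennies any deterministic choice can be exploited by the opponent, so the row player's optimal policy is to play each row with probability 1/2, for a game value of 0.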
Constrained Discounted Dynamic Programming
 MATH. OF OPERATIONS RESEARCH
, 1996
Cited by 18 (8 self)
This paper deals with constrained optimization of Markov Decision Processes with a countable state space, compact action sets, continuous transition probabilities, and upper semicontinuous reward functions. The objective is to maximize the expected total discounted reward for one reward function, under several inequality constraints on similar criteria with other reward functions. Suppose a
A Teaching Strategy for Memory-Based Control
, 1997
Cited by 14 (0 self)
Combining different machine learning algorithms in the same system can produce benefits above and beyond what either method could achieve alone. This paper demonstrates that genetic algorithms can be used in conjunction with lazy learning to solve examples of a difficult class of delayed reinforcement learning problems better than either method alone. This class, the class of differential games, includes numerous important control problems that arise in robotics, planning, game playing, and other areas, and solutions for differential games suggest solution strategies for the general class of planning and control problems. We conducted a series of experiments applying three learning approaches, namely lazy Q-learning, k-nearest neighbor (kNN), and a genetic algorithm, to a particular differential game called a pursuit game. Our experiments demonstrate that kNN had great difficulty solving the problem, while a lazy version of Q-learning performed moderately well and the genetic algorithm pe...
Relational Markov games
 In Proceedings JELIA-2004, Vol. 3229 of LNCS/LNAI
, 2004
Cited by 2 (2 self)
Towards a compact and elaboration-tolerant first-order representation of Markov games, we introduce relational Markov games, which combine standard Markov games with first-order action descriptions in a stochastic variant of the situation calculus. We focus on the zero-sum two-agent case, where we have two agents with diametrically opposed goals. We also present a symbolic value iteration algorithm for computing Nash policy pairs in this framework.
MODELING SHORTEST PATH GAMES WITH PETRI NETS: A LYAPUNOV BASED THEORY
Cited by 2 (1 self)
In this paper we introduce a new modeling paradigm for representing shortest path games with Petri nets. Whereas previous works have restricted attention to tracking the net using Bellman’s equation as a utility function, this work uses a Lyapunov-like function. In this sense, we replace the traditional cost function with a trajectory-tracking function which is also an optimal cost-to-target function. This makes a significant difference in the conceptualization of the problem domain, allowing the replacement of the Nash equilibrium point by the Lyapunov equilibrium point in game theory. We show that the Lyapunov equilibrium point coincides with the Nash equilibrium point. As a consequence, all properties of equilibrium and stability are preserved in game theory. This is the most important contribution of this work. The strength of this approach lies in the simplicity of its formal proof of the existence of an equilibrium point.
A Mixed Value and Policy Iteration Method for Stochastic Control with Universally Measurable Policies
 Lab. for Info. and Decision Systems Report LIDS-P-2905
, 2013
Cited by 1 (1 self)
We consider the stochastic control model with Borel spaces and universally measurable policies. For this model the standard policy iteration is known to have difficult measurability issues and cannot be carried out in general. We present a mixed value and policy iteration method that circumvents this difficulty. The method allows the use of stationary policies in computing the optimal cost function, in a manner that resembles policy iteration. It can also be used to address similar difficulties of policy iteration in the context of upper and lower semicontinuous models. We analyze the convergence of the method in infinite horizon total cost problems, for the discounted case where the one-stage costs are bounded, and for the undiscounted case where the one-stage costs are nonpositive or nonnegative. For the undiscounted total cost problems with nonnegative one-stage costs, we also give a new convergence theorem for value iteration, which shows that value iteration converges whenever it is initialized with a function that is above the optimal cost function and yet bounded by a multiple of the optimal cost function. This condition resembles Whittle’s bridging condition and is partly motivated by it. The theorem is also partly motivated by a result of Maitra
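The initialization condition for value iteration described in this abstract can be illustrated on a toy problem. The sketch below uses a small deterministic, finite-state, undiscounted problem with nonnegative one-stage costs, which is far simpler than the Borel-space setting of the report; the names `value_iteration`, `COSTS`, and `J_STAR`, and all numbers, are assumptions chosen for illustration.

```python
def value_iteration(costs, J0, sweeps):
    """Run undiscounted value iteration for a fixed number of sweeps.

    costs[s] lists (one-stage cost, next state) pairs, one per action.
    J0 is the initial cost estimate for each state.
    """
    J = list(J0)
    for _ in range(sweeps):
        # Bellman update: take the cheapest action under the current estimate.
        J = [min(g + J[nxt] for g, nxt in costs[s]) for s in range(len(J))]
    return J


# Hypothetical 4-state problem; state 0 is a cost-free absorbing target.
COSTS = [
    [(0.0, 0)],            # state 0: stay at the target for free
    [(1.0, 0)],            # state 1: step to the target
    [(1.0, 1), (2.5, 0)],  # state 2: step down, or jump to the target
    [(1.0, 2), (2.5, 0)],  # state 3: step down, or jump to the target
]
J_STAR = [0.0, 1.0, 2.0, 2.5]  # optimal costs of this toy problem

# Start above J* but bounded by a multiple of it (here 3 * J*).
good_start = [3.0 * j for j in J_STAR]
print(value_iteration(COSTS, good_start, 5))

# Start above J* but NOT bounded by a multiple of it: J0(0) > 0 = J*(0).
bad_start = [5.0, 6.0, 7.0, 7.5]
print(value_iteration(COSTS, bad_start, 5))
```

In this toy problem the first start reaches `J_STAR` within a few sweeps, while the second start is itself a fixed point of the update and never improves, because the free self-loop at state 0 preserves the initial overestimate there.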