Results 1 - 3 of 3
Markov games as a framework for multi-agent reinforcement learning
In Proceedings of the Eleventh International Conference on Machine Learning, 1994
Cited by 417 (10 self)
Abstract
In the Markov decision process (MDP) formalization of reinforcement learning, a single adaptive agent interacts with an environment defined by a probabilistic transition function. In this solipsistic view, secondary agents can only be part of the environment and are therefore fixed in their behavior. The framework of Markov games allows us to widen this view to include multiple adaptive agents with interacting or competing goals. This paper considers a step in this direction in which exactly two agents with diametrically opposed goals share an environment. It describes a Q-learning-like algorithm for finding optimal policies and demonstrates its application to a simple two-player game in which the optimal policy is probabilistic.
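The abstract says only that the algorithm is "Q-learning-like". A common way to sketch such an update, often called minimax-Q, replaces Q-learning's max over the agent's own actions with the value of the zero-sum matrix game Q(s', ·, ·) between the agent and its opponent. Solving that matrix game is what makes the optimal policy probabilistic. The sketch below is illustrative and not the paper's implementation. It assumes exactly two actions per player so that the matrix game has a closed-form solution (general action sets would need a linear program), and all names (`solve_2x2_zero_sum`, `minimax_q_update`) are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical encoding: M[a][o] is the payoff to the maximizing agent when it
# plays action a and the opponent plays action o (two actions each, by assumption).
Matrix = List[List[float]]


def solve_2x2_zero_sum(M: Matrix) -> Tuple[float, List[float]]:
    """Return (game value, optimal mixed strategy) for the row (maximizing) player."""
    (a, b), (c, d) = M
    maximin = max(min(a, b), min(c, d))
    minimax = min(max(a, c), max(b, d))
    if maximin == minimax:
        # Pure saddle point: play the row that guarantees the maximin value.
        row = 0 if min(a, b) >= min(c, d) else 1
        return maximin, [1.0 - row, float(row)]
    # No saddle point: choose p so the opponent is indifferent between its columns.
    denom = a - b - c + d
    p = (d - c) / denom
    value = (a * d - b * c) / denom
    return value, [p, 1.0 - p]


def minimax_q_update(Q: Dict[str, Matrix], s: str, a: int, o: int, r: float,
                     s_next: str, alpha: float, gamma: float) -> None:
    """One minimax-Q style update of Q[s][a][o] after observing (s, a, o, r, s_next)."""
    # Where Q-learning would use max_a Q(s', a), use the minimax value of the game at s'.
    v_next, _ = solve_2x2_zero_sum(Q[s_next])
    target = r + gamma * v_next
    Q[s][a][o] = (1 - alpha) * Q[s][a][o] + alpha * target


# Matching pennies (a standard example, not necessarily the game used in the paper):
# no deterministic policy is optimal, so the solver returns a 50/50 mixed strategy.
v, pi = solve_2x2_zero_sum([[1.0, -1.0], [-1.0, 1.0]])
print(v, pi)  # 0.0 [0.5, 0.5]
```

The closed-form 2x2 solver keeps the example dependency-free; swapping in an LP solver would generalize it to arbitrary finite action sets without changing `minimax_q_update`.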
Algorithms for Sequential Decision Making
1996
Cited by 158 (7 self)
Abstract
Sequential decision making is a fundamental task faced by any intelligent agent in an extended interaction with its environment; it is the act of answering the question "What should I do now?" In this thesis, I show how to answer this question when "now" is one of a finite set of states, "do" is one of a finite set of actions, "should" means maximizing a long-run measure of reward, and "I" is an automated planning or learning system (agent). In particular, ...
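For the setting the abstract describes (finite states, finite actions, a long-run reward criterion), a standard planning method is value iteration. The sketch below is an illustration of that textbook technique, not a method taken from the thesis. It assumes the long-run measure is expected discounted reward with discount factor `gamma`, and it uses a hypothetical encoding in which `P[s][a]` lists `(probability, next_state, reward)` outcomes.

```python
from typing import Dict, List, Tuple

# Hypothetical MDP encoding: P[s][a] is a list of (probability, next_state, reward).
MDP = Dict[int, Dict[int, List[Tuple[float, int, float]]]]


def q_value(P: MDP, V: Dict[int, float], s: int, a: int, gamma: float) -> float:
    """Expected discounted return of taking action a in state s, then following V."""
    return sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])


def value_iteration(P: MDP, gamma: float = 0.9, tol: float = 1e-8):
    """Return (state values, greedy policy) for a finite discounted MDP."""
    V = {s: 0.0 for s in P}
    while True:
        delta = 0.0
        for s in P:
            # Bellman optimality backup, applied in place (Gauss-Seidel style).
            best = max(q_value(P, V, s, a, gamma) for a in P[s])
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            break
    # Answer "what should I do now?" greedily with respect to the converged values.
    policy = {s: max(P[s], key=lambda a, s=s: q_value(P, V, s, a, gamma)) for s in P}
    return V, policy


# Two-state example: in state 0, action 1 moves to state 1 (reward 1);
# state 1 loops forever with reward 2.
P: MDP = {
    0: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 1.0)]},
    1: {0: [(1.0, 1, 2.0)]},
}
V, policy = value_iteration(P, gamma=0.9)
# V[1] approaches 2 / (1 - 0.9) = 20, V[0] approaches 1 + 0.9 * 20 = 19,
# and the greedy policy chooses action 1 in state 0.
```

Discounting is only one choice of long-run measure; an average-reward criterion would need a different backup.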
Report for Publication of the Activity of the Working Group Neural and Computational Learning (NeuroCOLT 8556)
1997
Abstract
... joint meetings and individual visits. We begin this section of the report by giving an overview of the achievements of the NeuroCOLT project. Our intention is that by reading this section alone the reader will be able to gain a clear idea of the main contributions NeuroCOLT partners have made both to the scientific community and to the scientific advancement of the topics covered by the project. Highlighting the key contributions has inevitably meant that many individual pieces of research are not mentioned in this section. To give as complete a picture as possible, the final section of this part gives a detailed account of the work of the individual sites. A summary of the activities, including tables of statistics on the visits made and the Technical Reports produced, is given in the middle section.
1 An Overview of NeuroCOLT's Achievements
Adaptive Learning techniques hold out enormous pr...

