Results 1  10
of
212
Reinforcement learning: a survey
 Journal of Artificial Intelligence Research
, 1996
"... This paper surveys the field of reinforcement learning from a computerscience perspective. It is written to be accessible to researchers familiar with machine learning. Both the historical basis of the field and a broad selection of current work are summarized. Reinforcement learning is the problem ..."
Abstract

Cited by 1693 (27 self)
 Add to MetaCart
(Show Context)
This paper surveys the field of reinforcement learning from a computerscience perspective. It is written to be accessible to researchers familiar with machine learning. Both the historical basis of the field and a broad selection of current work are summarized. Reinforcement learning is the problem faced by an agent that learns behavior through trialanderror interactions with a dynamic environment. The work described here has a resemblance to work in psychology, but differs considerably in the details and in the use of the word "reinforcement." The paper discusses central issues of reinforcement learning, including trading off exploration and exploitation, establishing the foundations of the field via Markov decision theory, learning from delayed reinforcement, constructing empirical models to accelerate learning, making use of generalization and hierarchy, and coping with hidden state. It concludes with a survey of some implemented systems and an assessment of the practical utility of current methods for reinforcement learning.
Planning and acting in partially observable stochastic domains
 ARTIFICIAL INTELLIGENCE
, 1998
"... In this paper, we bring techniques from operations research to bear on the problem of choosing optimal actions in partially observable stochastic domains. We begin by introducing the theory of Markov decision processes (mdps) and partially observable mdps (pomdps). We then outline a novel algorithm ..."
Abstract

Cited by 1089 (42 self)
 Add to MetaCart
(Show Context)
In this paper, we bring techniques from operations research to bear on the problem of choosing optimal actions in partially observable stochastic domains. We begin by introducing the theory of Markov decision processes (mdps) and partially observable mdps (pomdps). We then outline a novel algorithm for solving pomdps offline and show how, in some cases, a finitememory controller can be extracted from the solution to a pomdp. We conclude with a discussion of how our approach relates to previous work, the complexity of finding exact solutions to pomdps, and of some possibilities for finding approximate solutions.
The complexity of computing a Nash equilibrium
, 2006
"... We resolve the question of the complexity of Nash equilibrium by showing that the problem of computing a Nash equilibrium in a game with 4 or more players is complete for the complexity class PPAD. Our proof uses ideas from the recentlyestablished equivalence between polynomialtime solvability of n ..."
Abstract

Cited by 324 (23 self)
 Add to MetaCart
(Show Context)
We resolve the question of the complexity of Nash equilibrium by showing that the problem of computing a Nash equilibrium in a game with 4 or more players is complete for the complexity class PPAD. Our proof uses ideas from the recentlyestablished equivalence between polynomialtime solvability of normalform games and graphical games, and shows that these kinds of games can implement arbitrary members of a PPADcomplete class of Brouwer functions. 1
Algorithms for Sequential Decision Making
, 1996
"... Sequential decision making is a fundamental task faced by any intelligent agent in an extended interaction with its environment; it is the act of answering the question "What should I do now?" In this thesis, I show how to answer this question when "now" is one of a finite set of ..."
Abstract

Cited by 213 (9 self)
 Add to MetaCart
(Show Context)
Sequential decision making is a fundamental task faced by any intelligent agent in an extended interaction with its environment; it is the act of answering the question "What should I do now?" In this thesis, I show how to answer this question when "now" is one of a finite set of states, "do" is one of a finite set of actions, "should" is maximize a longrun measure of reward, and "I" is an automated planning or learning system (agent). In particular,
A Survey of Computational Complexity Results in Systems and Control
, 2000
"... The purpose of this paper is twofold: (a) to provide a tutorial introduction to some key concepts from the theory of computational complexity, highlighting their relevance to systems and control theory, and (b) to survey the relatively recent research activity lying at the interface between these fi ..."
Abstract

Cited by 188 (21 self)
 Add to MetaCart
The purpose of this paper is twofold: (a) to provide a tutorial introduction to some key concepts from the theory of computational complexity, highlighting their relevance to systems and control theory, and (b) to survey the relatively recent research activity lying at the interface between these fields. We begin with a brief introduction to models of computation, the concepts of undecidability, polynomial time algorithms, NPcompleteness, and the implications of intractability results. We then survey a number of problems that arise in systems and control theory, some of them classical, some of them related to current research. We discuss them from the point of view of computational complexity and also point out many open problems. In particular, we consider problems related to stability or stabilizability of linear systems with parametric uncertainty, robust control, timevarying linear systems, nonlinear and hybrid systems, and stochastic optimal control.
Valuefunction approximations for partially observable Markov decision processes
 Journal of Artificial Intelligence Research
, 2000
"... Partially observable Markov decision processes (POMDPs) provide an elegant mathematical framework for modeling complex decision and planning problems in stochastic domains in which states of the system are observable only indirectly, via a set of imperfect or noisy observations. The modeling advanta ..."
Abstract

Cited by 168 (1 self)
 Add to MetaCart
(Show Context)
Partially observable Markov decision processes (POMDPs) provide an elegant mathematical framework for modeling complex decision and planning problems in stochastic domains in which states of the system are observable only indirectly, via a set of imperfect or noisy observations. The modeling advantage of POMDPs, however, comes at a price — exact methods for solving them are computationally very expensive and thus applicable in practice only to very simple problems. We focus on efficient approximation (heuristic) methods that attempt to alleviate the computational problem and trade off accuracy for speed. We have two objectives here. First, we survey various approximation methods, analyze their properties and relations and provide some new insights into their differences. Second, we present a number of new approximation methods and novel refinements of existing techniques. The theoretical results are supported by experiments on a problem from the agent navigation domain. 1.
The Complexity of Mean Payoff Games on Graphs
 THEORETICAL COMPUTER SCIENCE
, 1996
"... We study the complexity of finding the values and optimal strategies of mean payoff games on graphs, a family of perfect information games introduced by Ehrenfeucht and Mycielski and considered by Gurvich, Karzanov and Khachiyan. We describe a pseudopolynomial time algorithm for the solution of suc ..."
Abstract

Cited by 149 (4 self)
 Add to MetaCart
(Show Context)
We study the complexity of finding the values and optimal strategies of mean payoff games on graphs, a family of perfect information games introduced by Ehrenfeucht and Mycielski and considered by Gurvich, Karzanov and Khachiyan. We describe a pseudopolynomial time algorithm for the solution of such games, the decision problem for which is in NP " coNP. Finally, we describe a polynomial reduction from mean payoff games to the simple stochastic games studied by Condon. These games are also known to be in NP " coNP, but no polynomial or pseudopolynomial time algorithm is known for them.
Local Model Checking Games
 In: Proceedings of CONCUR
, 1995
"... Model checking is a very successful technique for verifying temporal properties of nite state concurrent systems. It is standard to view this method as essentially algorithmic, and consequently a very fruitful relationship between temporal logics and automata has been developed. In the case of bran ..."
Abstract

Cited by 98 (8 self)
 Add to MetaCart
(Show Context)
Model checking is a very successful technique for verifying temporal properties of nite state concurrent systems. It is standard to view this method as essentially algorithmic, and consequently a very fruitful relationship between temporal logics and automata has been developed. In the case of branching time logics the
The Computational Complexity of Probabilistic Planning
 Journal of Artificial Intelligence Research
, 1998
"... We examine the computational complexity of testing and finding small plans in probabilistic planning domains with both flat and propositional representations. The complexity of plan evaluation and existence varies with the plan type sought; we examine totally ordered plans, acyclic plans, and loopin ..."
Abstract

Cited by 92 (6 self)
 Add to MetaCart
(Show Context)
We examine the computational complexity of testing and finding small plans in probabilistic planning domains with both flat and propositional representations. The complexity of plan evaluation and existence varies with the plan type sought; we examine totally ordered plans, acyclic plans, and looping plans, and partially ordered plans under three natural definitions of plan value. We show that problems of interest are complete for a variety of complexity classes: PL, P, NP, coNP, PP, NP PP, coNP PP , and PSPACE. In the process of proving that certain planning problems are complete for NP PP , we introduce a new basic NP PP complete problem, EMajsat, which generalizes the standard Boolean satisfiability problem to computations involving probabilistic quantities; our results suggest that the development of good heuristics for EMajsat could be important for the creation of efficient algorithms for a wide variety of problems.
Modal and Temporal Logics for Processes
, 1996
"... this paper have been presented at the 4th European Summer School in Logic, Language and Information, University of Essex, 1992; at the Tempus Summer School for Algebraic and Categorical Methods in Computer Science, Masaryk University, Brno, 1993; and the Summer School in Logic Methods in Concurrency ..."
Abstract

Cited by 91 (2 self)
 Add to MetaCart
(Show Context)
this paper have been presented at the 4th European Summer School in Logic, Language and Information, University of Essex, 1992; at the Tempus Summer School for Algebraic and Categorical Methods in Computer Science, Masaryk University, Brno, 1993; and the Summer School in Logic Methods in Concurrency, Aarhus University, 1993. I would like to thank the organisers and the participants of these summer schools, and of the Banff higher order workshop. I would also like to thank Julian Bradfield for use of his Tex tree constructor for building derivation trees and Carron Kirkwood, Faron Moller, Perdita Stevens and David Walker for comments on earlier drafts.