Results 1-10 of 509
A domain-independent framework for modeling emotion
Journal of Cognitive Systems Research, 2004

Cited by 251 (31 self)
The question is not whether intelligent machines can have any emotions, but whether machines can be intelligent without any emotions. – Marvin Minsky, (Minsky, 1986) p. 163 In every art form it is the emotional content that makes the difference between mere technical skill and true art.
The Communicative Multiagent Team Decision Problem: Analyzing Teamwork Theories and Models
Journal of Artificial Intelligence Research, 2002

Cited by 229 (23 self)
Despite the significant progress in multiagent teamwork, existing research does not address the optimality of its prescriptions nor the complexity of the teamwork problem. Without a characterization of the optimality-complexity tradeoffs, it is impossible to determine whether the assumptions and approximations made by a particular theory gain enough efficiency to justify the losses in overall performance. To provide a tool for use by multiagent researchers in evaluating this tradeoff, we present a unified framework, the COMmunicative Multiagent Team Decision Problem (COM-MTDP). The COM-MTDP model combines and extends existing multiagent theories, such as decentralized partially observable Markov decision processes and economic team theory. In addition to their generality of representation, COM-MTDPs also support the analysis of both the optimality of team performance and the computational complexity of the agents' decision problem. In analyzing complexity, we present a breakdown of the computational complexity of constructing optimal teams under various classes of problem domains, along the dimensions of observability and communication cost. In analyzing optimality, we exploit the COM-MTDP's ability to encode existing teamwork theories and models to encode two instantiations of joint intentions theory taken from the literature. Furthermore, the COM-MTDP model provides a basis for the development of novel team coordination algorithms. We derive a domain-independent criterion for optimal communication and provide a comparative analysis of the two joint intentions instantiations with respect to this optimal policy. We have implemented a reusable, domain-independent software package based on COM-MTDPs to analyze teamwork coordination strategies, and we demons...
Value-function approximations for partially observable Markov decision processes
Journal of Artificial Intelligence Research, 2000

Cited by 168 (1 self)
Partially observable Markov decision processes (POMDPs) provide an elegant mathematical framework for modeling complex decision and planning problems in stochastic domains in which states of the system are observable only indirectly, via a set of imperfect or noisy observations. The modeling advantage of POMDPs, however, comes at a price: exact methods for solving them are computationally very expensive and thus applicable in practice only to very simple problems. We focus on efficient approximation (heuristic) methods that attempt to alleviate the computational problem and trade off accuracy for speed. We have two objectives here. First, we survey various approximation methods, analyze their properties and relations, and provide some new insights into their differences. Second, we present a number of new approximation methods and novel refinements of existing techniques. The theoretical results are supported by experiments on a problem from the agent navigation domain.
Reinforcement learning for RoboCup-soccer keepaway
Adaptive Behavior, 2005

Cited by 133 (35 self)
RoboCup simulated soccer presents many challenges to reinforcement learning methods, including a large state space, hidden and uncertain state, multiple independent agents learning simultaneously, and long and variable delays in the effects of actions. We describe our application of episodic SMDP Sarsa(λ) with linear tile-coding function approximation and variable λ to learning higher-level decisions in a keepaway subtask of RoboCup soccer. In keepaway, one team, “the keepers,” tries to keep control of the ball for as long as possible despite the efforts of “the takers.” The keepers learn individually when to hold the ball and when to pass to a teammate. Our agents learned policies that significantly outperform a range of benchmark policies. We demonstrate the generality of our approach by applying it to a number of task variations including different field sizes and different numbers of players on each team.
A framework for sequential planning in multiagent settings
Journal of Artificial Intelligence Research, 2005

Cited by 129 (32 self)
This paper extends the framework of partially observable Markov decision processes (POMDPs) to multiagent settings by incorporating the notion of agent models into the state space. Agents maintain beliefs over physical states of the environment and over models of other agents, and they use Bayesian update to maintain their beliefs over time. The solutions map belief states to actions. Models of other agents may include their belief states and are related to agent types considered in games of incomplete information. We express the agents' autonomy by postulating that their models are not directly manipulable or observable by other agents. We show that important properties of POMDPs, such as convergence of value iteration, the rate of convergence, and piecewise linearity and convexity of the value functions, carry over to our framework. Our approach complements a more traditional approach to interactive settings which uses Nash equilibria as a solution paradigm. We seek to avoid some of the drawbacks of equilibria, which may be non-unique and are not able to capture off-equilibrium behaviors. We do so at the cost of having to represent, process and continually revise models of other agents. Since the agent's beliefs may be arbitrarily nested, the optimal solutions to decision making problems are only asymptotically computable. However, approximate belief updates and approximately optimal plans are computable. We illustrate our framework using a simple application domain, and we show examples of belief updates and value functions.
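The Bayesian belief update mentioned in the abstract above can be illustrated for the plain single-agent POMDP case. The following is a minimal Python sketch under stated assumptions, not the paper's interactive update over nested models of other agents: the function name `belief_update`, the array layouts `T[a, s, s2]` and `O[a, s2, o]`, and the two-state example model are all hypothetical choices made for illustration.

```python
import numpy as np

def belief_update(b, a, o, T, O):
    """Standard single-agent POMDP belief update (illustrative sketch).

    b: belief over states, shape (S,)
    T: transition model, T[a, s, s2] = P(s2 | s, a), shape (A, S, S)
    O: observation model, O[a, s2, o] = P(o | s2, a), shape (A, S, Z)
    """
    # Prediction step: P(s2 | b, a) = sum_s b(s) * T[a, s, s2]
    predicted = b @ T[a]
    # Correction step: weight each successor state by the observation likelihood
    unnormalized = O[a][:, o] * predicted
    total = unnormalized.sum()
    if total == 0.0:
        raise ValueError("observation has zero probability under this belief and action")
    return unnormalized / total

# Hypothetical model: two states, one action that leaves the state unchanged,
# and a sensor that reports the true state with probability 0.85.
T = np.array([[[1.0, 0.0],
               [0.0, 1.0]]])
O = np.array([[[0.85, 0.15],
               [0.15, 0.85]]])
b0 = np.array([0.5, 0.5])
b1 = belief_update(b0, a=0, o=0, T=T, O=O)
```

Starting from a uniform belief, observing `o = 0` shifts the belief toward state 0 in proportion to the sensor accuracy. In the interactive setting the abstract describes, the belief would additionally range over models of the other agents, which this sketch leaves out.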
Scaling Reinforcement Learning toward RoboCup Soccer
2001

Cited by 122 (23 self)
RoboCup simulated soccer presents many challenges to reinforcement learning methods, including a large state space, hidden and uncertain state, multiple agents, and long and variable delays in the effects of actions. We describe our application of episodic SMDP Sarsa(λ) with linear tile-coding function approximation and variable λ to learning higher-level decisions in a keepaway subtask of RoboCup soccer. In keepaway, one team, “the keepers,” tries to keep control of the ball for as long as possible despite the efforts of “the takers.” The keepers learn individually when to hold the ball and when to pass to a teammate, while the takers learn when to charge the ball-holder and when to cover possible passing lanes. Our agents learned policies that significantly outperformed a range of benchmark policies. We demonstrate the generality of our approach by applying it to a number of task variations including different field sizes and different numbers of players on each team.
Anytime point-based approximations for large POMDPs
Journal of Artificial Intelligence Research, 2006

Cited by 102 (7 self)
The Partially Observable Markov Decision Process has long been recognized as a rich framework for real-world planning and control problems, especially in robotics. However, exact solutions in this framework are typically computationally intractable for all but the smallest problems. A well-known technique for speeding up POMDP solving involves performing value backups at specific belief points, rather than over the entire belief simplex. The efficiency of this approach, however, depends greatly on the selection of points. This paper presents a set of novel techniques for selecting informative belief points which work well in practice. The point selection procedure is combined with point-based value backups to form an effective anytime POMDP algorithm called Point-Based Value Iteration (PBVI). The first aim of this paper is to introduce this algorithm and present a theoretical analysis justifying the choice of belief selection technique. The second aim of this paper is to provide a thorough empirical comparison between PBVI and other state-of-the-art POMDP methods, in particular the Perseus algorithm, in an effort to highlight their similarities and differences. Evaluation is performed using both standard POMDP domains and realistic robotic tasks.
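A value backup at a single belief point, as referred to in the abstract above, can be sketched as follows. This is a hedged illustration of the commonly described point-based backup over a set of alpha vectors; it does not reproduce the paper's belief-point selection techniques. The function name `point_based_backup`, the array layouts, and the toy model are assumptions introduced for the example.

```python
import numpy as np

def point_based_backup(b, Gamma, T, O, R, discount):
    """One value backup at belief point b (illustrative sketch).

    b: belief, shape (S,)
    Gamma: current alpha vectors, shape (N, S)
    T: T[a, s, s2] = P(s2 | s, a), shape (A, S, S)
    O: O[a, s2, o] = P(o | s2, a), shape (A, S, Z)
    R: R[s, a] = immediate reward, shape (S, A)
    Returns the new alpha vector for b and the action that produced it.
    """
    n_actions, _, n_obs = O.shape
    best_alpha, best_action, best_value = None, None, -np.inf
    for a in range(n_actions):
        alpha_a = np.array(R[:, a], dtype=float)
        for o in range(n_obs):
            # g[i, s] = sum_{s2} T[a, s, s2] * O[a, s2, o] * Gamma[i, s2]
            g = (Gamma * O[a][:, o]) @ T[a].T
            # Keep only the projected vector that is best at this belief point
            alpha_a += discount * g[np.argmax(g @ b)]
        value = float(alpha_a @ b)
        if value > best_value:
            best_alpha, best_action, best_value = alpha_a, a, value
    return best_alpha, best_action

# Hypothetical toy model: two states, two actions, two uninformative observations.
T = np.stack([np.eye(2), np.eye(2)])
O = np.full((2, 2, 2), 0.5)
R = np.array([[1.0, 0.0],
              [0.0, 2.0]])
b = np.array([0.5, 0.5])
Gamma = np.zeros((1, 2))  # a zero initial value function
alpha, action = point_based_backup(b, Gamma, T, O, R, discount=0.9)
```

Because the backup keeps one vector per belief point instead of enumerating all combinations over the belief simplex, its cost grows with the number of points and alpha vectors rather than exponentially in the number of observations.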
Discovering hierarchy in reinforcement learning with HEXQ
In Nineteenth International Conference on Machine Learning, 2002

Cited by 95 (5 self)
An open problem in reinforcement learning is discovering hierarchical structure. HEXQ, an algorithm which automatically attempts to decompose and solve a model-free factored MDP hierarchically, is described. By searching for aliased Markov subspace regions based on the state variables, the algorithm uses temporal and state abstraction to construct a hierarchy of interlinked smaller MDPs.
Proto-value functions: A Laplacian framework for learning representation and control in Markov decision processes
Journal of Machine Learning Research, 2006

Cited by 92 (11 self)
This paper introduces a novel spectral framework for solving Markov decision processes (MDPs) by jointly learning representations and optimal policies. The major components of the framework described in this paper include: (i) a general scheme for constructing representations or basis functions by diagonalizing symmetric diffusion operators; (ii) a specific instantiation of this approach where global basis functions called proto-value functions (PVFs) are formed using the eigenvectors of the graph Laplacian on an undirected graph formed from state transitions induced by the MDP; (iii) a three-phased procedure called representation policy iteration (RPI), comprising a sample collection phase, a representation learning phase that constructs basis functions from samples, and a final parameter estimation phase that determines an (approximately) optimal policy within the (linear) subspace spanned by the (current) basis functions; (iv) a specific instantiation of the RPI framework using least-squares policy iteration (LSPI) as the parameter estimation method; (v) several strategies for scaling the proposed approach to large discrete and continuous state spaces, including the Nyström extension for out-of-sample interpolation of eigenfunctions, and the use of Kronecker sum factorization to construct compact eigenfunctions in product spaces such as factored MDPs; (vi) finally, a series of illustrative discrete and continuous control tasks, which both illustrate the concepts and provide a benchmark for evaluating the proposed approach. Many challenges remain to be addressed in scaling the proposed framework to large MDPs, and several elaborations of the proposed framework are briefly summarized at the end.
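Component (ii) of the abstract above, forming basis functions from the eigenvectors of a graph Laplacian, can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the paper's procedure: it uses the combinatorial Laplacian L = D - A (the abstract does not say which Laplacian variant is meant), and the helper names `proto_value_functions` and `chain_adjacency` as well as the 10-state chain graph are hypothetical.

```python
import numpy as np

def proto_value_functions(adjacency, k):
    """Return the k smoothest Laplacian eigenvectors as basis functions.

    adjacency: symmetric (S, S) matrix of an undirected state graph.
    Assumes the combinatorial Laplacian L = D - A.
    """
    A = np.asarray(adjacency, dtype=float)
    degree = A.sum(axis=1)
    laplacian = np.diag(degree) - A
    # eigh handles symmetric matrices and returns eigenvalues in ascending order
    eigenvalues, eigenvectors = np.linalg.eigh(laplacian)
    return eigenvalues[:k], eigenvectors[:, :k]

def chain_adjacency(n):
    """Adjacency matrix of an n-state chain (hypothetical example graph)."""
    A = np.zeros((n, n))
    idx = np.arange(n - 1)
    A[idx, idx + 1] = 1.0
    A[idx + 1, idx] = 1.0
    return A

values, basis = proto_value_functions(chain_adjacency(10), k=3)
```

Each column of `basis` is one basis function over the 10 states. The eigenvector with the smallest eigenvalue is constant on a connected graph, and later ones vary progressively faster across the chain, which is why the low-order eigenvectors are natural candidates for approximating smooth value functions.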
A Survey of Research in Distributed, Continual Planning
2000

Cited by 90 (2 self)
Complex, real-world domains require a rethinking of traditional approaches to AI planning. Planning and executing the resulting plans in a dynamic environment requires a continual approach in which planning and execution are interleaved, there may be uncertainty in the current and projected world state, and replanning may be required when the situation changes or planned actions fail. Furthermore, complex planning and execution problems may require multiple computational agents and human planners to collaborate on a solution. In this article, we describe a new paradigm for planning in complex, dynamic environments, which we term distributed, continual planning (DCP). We argue that developing DCP systems will be necessary in order for planning applications to be successful in these environments. We give a historical overview of research leading up to the current state of the art in DCP, and describe research in distributed and continual planning. The increasing emphasis on r...