Results 11  20
of
1,678
Multiagent Learning Using a Variable Learning Rate
 Artificial Intelligence
, 2002
"... Learning to act in a multiagent environment is a difficult problem since the normal definition of an optimal policy no longer applies. The optimal policy at any moment depends on the policies of the other agents and so creates a situation of learning a moving target. Previous learning algorithms hav ..."
Abstract

Cited by 225 (9 self)
 Add to MetaCart
Learning to act in a multiagent environment is a difficult problem since the normal definition of an optimal policy no longer applies. The optimal policy at any moment depends on the policies of the other agents and so creates a situation of learning a moving target. Previous learning algorithms have one of two shortcomings depending on their approach. They either converge to a policy that may not be optimal against the specific opponents' policies, or they may not converge at all. In this article we examine this learning problem in the framework of stochastic games. We look at a number of previous learning algorithms showing how they fail at one of the above criteria. We then contribute a new reinforcement learning technique using a variable learning rate to overcome these shortcomings. Specifically, we introduce the WoLF principle, "Win or Learn Fast", for varying the learning rate. We examine this technique theoretically, proving convergence in selfplay on a restricted class of iterated matrix games. We also present empirical results on a variety of more general stochastic games, in situations of selfplay and otherwise, demonstrating the wide applicability of this method.
Treebased batch mode reinforcement learning
 Journal of Machine Learning Research
, 2005
"... Reinforcement learning aims to determine an optimal control policy from interaction with a system or from observations gathered from a system. In batch mode, it can be achieved by approximating the socalled Qfunction based on a set of fourtuples (xt,ut,rt,xt+1) where xt denotes the system state a ..."
Abstract

Cited by 224 (40 self)
 Add to MetaCart
Reinforcement learning aims to determine an optimal control policy from interaction with a system or from observations gathered from a system. In batch mode, it can be achieved by approximating the socalled Qfunction based on a set of fourtuples (xt,ut,rt,xt+1) where xt denotes the system state at time t, ut the control action taken, rt the instantaneous reward obtained and xt+1 the successor state of the system, and by determining the control policy from this Qfunction. The Qfunction approximation may be obtained from the limit of a sequence of (batch mode) supervised learning problems. Within this framework we describe the use of several classical treebased supervised learning methods (CART, Kdtree, tree bagging) and two newly proposed ensemble algorithms, namely extremely and totally randomized trees. We study their performances on several examples and find that the ensemble methods based on regression trees perform well in extracting relevant information about the optimal control policy from sets of fourtuples. In particular, the totally randomized trees give good results while ensuring the convergence of the sequence, whereas by relaxing the convergence constraint even better accuracy results are provided by the extremely randomized trees.
Learning About a New Technology: Pineapple
 Yale University
, 2000
"... This paper investigates the role of social learning in the diffusion of a new agricultural technology in Ghana. We use unique data on farmers ’ communication patterns to define each individual’s information neighborhood, the set of others from whom he might learn. Our empirical strategy is to test w ..."
Abstract

Cited by 224 (8 self)
 Add to MetaCart
This paper investigates the role of social learning in the diffusion of a new agricultural technology in Ghana. We use unique data on farmers ’ communication patterns to define each individual’s information neighborhood, the set of others from whom he might learn. Our empirical strategy is to test whether farmers adjust their inputs to align with those of their information neighbors who were surprisingly successful in previous periods. We present evidence that farmers adopt surprisingly successful information neighbors ’ practices, conditional on many potentially confounding factors including common growing conditions, credit arrangements, clan membership, and religion. The relationship of these input adjustments to experience further supports their interpretation as resulting from social learning. In adThe authors have benefittedfromtheadviceofRichardAkresh,Federico Bandi, Alan Bester, Dirk Bergemann,
Algorithms for Sequential Decision Making
, 1996
"... Sequential decision making is a fundamental task faced by any intelligent agent in an extended interaction with its environment; it is the act of answering the question "What should I do now?" In this thesis, I show how to answer this question when "now" is one of a finite set of ..."
Abstract

Cited by 213 (9 self)
 Add to MetaCart
(Show Context)
Sequential decision making is a fundamental task faced by any intelligent agent in an extended interaction with its environment; it is the act of answering the question "What should I do now?" In this thesis, I show how to answer this question when "now" is one of a finite set of states, "do" is one of a finite set of actions, "should" is maximize a longrun measure of reward, and "I" is an automated planning or learning system (agent). In particular,
Automating the Construction of Internet Portals with Machine Learning
 Information Retrieval
, 2000
"... Domainspecific internet portals are growing in popularity because they gather content from the Web and organize it for easy access, retrieval and search. For example, www.campsearch.com allows complex queries by age, location, cost and specialty over summer camps. This functionality is not possible ..."
Abstract

Cited by 211 (4 self)
 Add to MetaCart
Domainspecific internet portals are growing in popularity because they gather content from the Web and organize it for easy access, retrieval and search. For example, www.campsearch.com allows complex queries by age, location, cost and specialty over summer camps. This functionality is not possible with general, Webwide search engines. Unfortunately these portals are difficult and timeconsuming to maintain. This paper advocates the use of machine learning techniques to greatly automate the creation and maintenance of domainspecific Internet portals. We describe new research in reinforcement learning, information extraction and text classification that enables efficient spidering, the identification of informative text segments, and the population of topic hierarchies. Using these techniques, we have built a demonstration system: a portal for computer science research papers. It already contains over 50,000 papers and is publicly available at www.cora.justresearch.com. These techniques are ...
A Stochastic Model of HumanMachine Interaction for Learning Dialog Strategies
 IEEE Trans. on Speech and Audio Processing
, 2000
"... ..."
(Show Context)
Probabilistic Algorithms in Robotics
 AI Magazine vol
"... This article describes a methodology for programming robots known as probabilistic robotics. The probabilistic paradigm pays tribute to the inherent uncertainty in robot perception, relying on explicit representations of uncertainty when determining what to do. This article surveys some of the progr ..."
Abstract

Cited by 199 (6 self)
 Add to MetaCart
(Show Context)
This article describes a methodology for programming robots known as probabilistic robotics. The probabilistic paradigm pays tribute to the inherent uncertainty in robot perception, relying on explicit representations of uncertainty when determining what to do. This article surveys some of the progress in the field, using indepth examples to illustrate some of the nuts and bolts of the basic approach. Our central conjecture is that the probabilistic approach to robotics scales better to complex realworld applications than approaches that ignore a robot’s uncertainty. 1
Probabilistic Algorithms and the Interactive Museum TourGuide Robot Minerva
, 2000
"... This paper describes Minerva, an interactive tourguide robot that was successfully deployed in a Smithsonian museum. Minerva's software is pervasively probabilistic, relying on explicit representations of uncertainty in perception and control. This article describes ..."
Abstract

Cited by 192 (39 self)
 Add to MetaCart
(Show Context)
This paper describes Minerva, an interactive tourguide robot that was successfully deployed in a Smithsonian museum. Minerva's software is pervasively probabilistic, relying on explicit representations of uncertainty in perception and control. This article describes
Stochastic Dynamic Programming with Factored Representations
, 1997
"... Markov decision processes(MDPs) have proven to be popular models for decisiontheoretic planning, but standard dynamic programming algorithms for solving MDPs rely on explicit, statebased specifications and computations. To alleviate the combinatorial problems associated with such methods, we propo ..."
Abstract

Cited by 190 (10 self)
 Add to MetaCart
(Show Context)
Markov decision processes(MDPs) have proven to be popular models for decisiontheoretic planning, but standard dynamic programming algorithms for solving MDPs rely on explicit, statebased specifications and computations. To alleviate the combinatorial problems associated with such methods, we propose new representational and computational techniques for MDPs that exploit certain types of problem structure. We use dynamic Bayesian networks (with decision trees representing the local families of conditional probability distributions) to represent stochastic actions in an MDP, together with a decisiontree representation of rewards. Based on this representation, we develop versions of standard dynamic programming algorithms that directly manipulate decisiontree representations of policies and value functions. This generally obviates the need for statebystate computation, aggregating states at the leaves of these trees and requiring computations only for each aggregate state. The key to these algorithms is a decisiontheoretic generalization of classic regression analysis, in which we determine the features relevant to predicting expected value. We demonstrate the method empirically on several planning problems,
Cooperative MultiAgent Learning: The State of the Art
 Autonomous Agents and MultiAgent Systems
, 2005
"... Cooperative multiagent systems are ones in which several agents attempt, through their interaction, to jointly solve tasks or to maximize utility. Due to the interactions among the agents, multiagent problem complexity can rise rapidly with the number of agents or their behavioral sophistication. ..."
Abstract

Cited by 182 (8 self)
 Add to MetaCart
(Show Context)
Cooperative multiagent systems are ones in which several agents attempt, through their interaction, to jointly solve tasks or to maximize utility. Due to the interactions among the agents, multiagent problem complexity can rise rapidly with the number of agents or their behavioral sophistication. The challenge this presents to the task of programming solutions to multiagent systems problems has spawned increasing interest in machine learning techniques to automate the search and optimization process. We provide a broad survey of the cooperative multiagent learning literature. Previous surveys of this area have largely focused on issues common to specific subareas (for example, reinforcement learning or robotics). In this survey we attempt to draw from multiagent learning work in a spectrum of areas, including reinforcement learning, evolutionary computation, game theory, complex systems, agent modeling, and robotics. We find that this broad view leads to a division of the work into two categories, each with its own special issues: applying a single learner to discover joint solutions to multiagent problems (team learning), or using multiple simultaneous learners, often one per agent (concurrent learning). Additionally, we discuss direct and indirect communication in connection with learning, plus open issues in task decomposition, scalability, and adaptive dynamics. We conclude with a presentation of multiagent learning problem domains, and a list of multiagent learning resources. 1