Results 11-20 of 94
Kernel-based least squares policy iteration for reinforcement learning
 IEEE Transactions on Neural Networks
, 2007
Abstract

Cited by 23 (0 self)
Abstract—In this paper, we present a kernel-based least squares policy iteration (KLSPI) algorithm for reinforcement learning (RL) in large or continuous state spaces, which can be used to realize adaptive feedback control of uncertain dynamic systems. By using KLSPI, near-optimal control policies can be obtained without much a priori knowledge of the dynamic models of control plants. In KLSPI, Mercer kernels are used in the policy evaluation of a policy iteration process, where a new kernel-based least squares temporal-difference algorithm called KLSTD-Q is proposed for efficient policy evaluation. To preserve sparsity and improve the generalization ability of KLSTD-Q solutions, a kernel sparsification procedure based on approximate linear dependency (ALD) is performed. Compared to previous work on approximate RL methods, KLSPI makes two advances that address the main difficulties of existing approaches. One is the better convergence and (near) optimality guarantee obtained by using the KLSTD-Q algorithm for high-precision policy evaluation. The other is automatic feature selection via ALD-based kernel sparsification. The KLSPI algorithm therefore provides a general RL method with generalization performance and a convergence guarantee for large-scale Markov decision problems (MDPs). Experimental results on a typical RL task, a stochastic chain problem, demonstrate that KLSPI consistently achieves better learning efficiency and policy quality than the earlier least squares policy iteration (LSPI) algorithm. Furthermore, the KLSPI method was also evaluated on two nonlinear feedback control problems: a ship heading control problem and the swing-up control of a double-link underactuated pendulum called the acrobot. Simulation results illustrate that the proposed method can optimize controller performance using little a priori information about uncertain dynamic systems.
It is also demonstrated that KLSPI can be applied to online learning control by incorporating an initial controller to ensure online performance. Index Terms—Approximate dynamic programming, kernel methods, least squares, Markov decision problems (MDPs), reinforcement learning (RL).
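The ALD-based sparsification step described in this abstract admits a compact sketch: a sample is added to the kernel dictionary only if its feature-space image cannot be approximated, within a tolerance ν, by a linear combination of the dictionary's current members. The Gaussian kernel, the tolerance value, and the function names below are illustrative choices of mine, not the paper's exact implementation.

```python
import numpy as np

def gaussian_kernel(x, y, sigma=1.0):
    return np.exp(-np.sum((x - y) ** 2) / (2 * sigma ** 2))

def ald_sparsify(samples, nu=0.1, sigma=1.0):
    """Keep a sample only if its feature-space image is not approximately
    linearly dependent on the images of the current dictionary members."""
    dictionary = [samples[0]]
    for x in samples[1:]:
        K = np.array([[gaussian_kernel(a, b, sigma) for b in dictionary]
                      for a in dictionary])
        k_x = np.array([gaussian_kernel(a, x, sigma) for a in dictionary])
        # ALD test: delta is the squared residual of projecting phi(x)
        # onto the span of the dictionary's feature vectors.
        c = np.linalg.solve(K, k_x)
        delta = gaussian_kernel(x, x, sigma) - k_x @ c
        if delta > nu:
            dictionary.append(x)
    return dictionary
```

With a tight cluster plus one distant point, only one representative of the cluster survives, which is the source of the sparsity the abstract refers to.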
Design and analysis of optimization algorithms using computational statistics
 Applied Numerical Analysis & Computational Mathematics (ANACM)
, 2004
Abstract

Cited by 19 (11 self)
We propose a highly flexible sequential methodology for the experimental analysis of optimization algorithms. The proposed technique employs computational statistics methods to investigate the interactions among optimization problems, algorithms, and environments. The workings of the proposed technique are illustrated by parameterizing and comparing a population-based and a direct search algorithm on a well-known benchmark problem, as well as on a simplified model of a real-world problem. Experimental results are reported and conclusions are derived. © 2004 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim
Hierarchical Multiagent Reinforcement Learning
, 2004
Abstract

Cited by 19 (5 self)
In this paper, we investigate the use of hierarchical reinforcement learning (HRL) to speed up the acquisition of cooperative multiagent tasks. We introduce a hierarchical multiagent reinforcement learning (RL) framework and propose a hierarchical multiagent RL algorithm called Cooperative HRL. In our approach, agents are cooperative and homogeneous (they use the same task decomposition). Learning is decentralized, with each agent learning three interrelated skills: how to perform subtasks, the order in which to perform them, and how to coordinate with other agents. We define cooperative subtasks to be those subtasks in which coordination among agents significantly improves the performance of the overall task. The levels of the hierarchy that include cooperative subtasks are called cooperation levels. Since coordination at high levels allows for increased cooperation skills, as agents do not get confused by low-level details, we usually define cooperative subtasks at the high levels of the hierarchy. This hierarchical approach allows agents to learn coordination faster by sharing information at the level of cooperative subtasks, rather than attempting to learn coordination at the level of primitive actions.
Machine Learning in Games: A Survey
 MACHINES THAT LEARN TO PLAY GAMES, CHAPTER 2
, 2000
Abstract

Cited by 18 (3 self)
This paper provides a survey of previously published work on machine learning in game playing. The material is organized around a variety of problems that typically arise in game playing and that can be solved with machine learning methods. This approach, we believe, allows both researchers in game playing to find appropriate learning techniques for their problems and machine learning researchers to identify rewarding topics for further research in game-playing domains. The paper covers learning techniques that range from neural networks to decision tree learning, in games that range from poker to chess.
The decentralised coordination of self-adaptive components for autonomic distributed systems
Abstract

Cited by 17 (2 self)
I, the undersigned, declare that this work has not previously been submitted to this or any other University, and that unless otherwise stated, it is entirely my own work.
Lyapunov Design for Safe Reinforcement Learning
 Journal of Machine Learning Research
Abstract

Cited by 16 (2 self)
Lyapunov design methods are used widely in control engineering to design controllers that achieve qualitative objectives, such as stabilizing a system or maintaining a system's state in a desired operating range. We propose a method for constructing safe, reliable reinforcement learning agents based on Lyapunov design principles. In our approach, an agent learns to control a system by switching among a number of given, base-level controllers. These controllers are designed using Lyapunov domain knowledge so that any switching policy is safe and enjoys basic performance guarantees. Our approach thus ensures qualitatively satisfactory agent behavior for virtually any reinforcement learning algorithm and at all times, including while the agent is learning and taking exploratory actions. We demonstrate the process of designing safe agents for four different control problems. In simulation experiments, we find that our theoretically motivated designs also enjoy a number of practical benefits, including reasonable performance initially and throughout learning, and accelerated learning. Keywords: Reinforcement Learning, Lyapunov Functions, Safety, Stability
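The switching idea in this abstract can be sketched in a few lines: the learner's action set is a small family of base-level controllers, each of which decreases a common Lyapunov function V(x) = x², so every switching sequence remains stable while ordinary Q-learning searches for the best switch policy. The plant, gains, and reward below are toy choices of mine, not the paper's benchmark problems.

```python
import numpy as np

# Each base-level controller u = -g*x is stabilizing for the integrator
# plant x' = x + u, so any switching policy keeps V(x) = x^2 decreasing.
def make_controller(gain):
    return lambda x: -gain * x

controllers = [make_controller(g) for g in (0.2, 0.5, 0.9)]

def step(x, a):
    x_next = x + controllers[a](x)   # simple 1-D integrator plant
    reward = -abs(x_next)            # reach the origin quickly
    return x_next, reward

def q_learn(episodes=200, alpha=0.1, gamma=0.95, eps=0.1, seed=0):
    rng = np.random.default_rng(seed)
    q = np.zeros(len(controllers))   # state-independent Q, for brevity
    for _ in range(episodes):
        x = rng.uniform(-1, 1)
        for _ in range(20):
            a = rng.integers(len(controllers)) if rng.random() < eps \
                else int(np.argmax(q))
            x_next, r = step(x, a)
            q[a] += alpha * (r + gamma * q.max() - q[a])
            # Safety invariant: V never increases under any base controller.
            assert x_next ** 2 <= x ** 2 + 1e-12
            x = x_next
    return q
```

The assertion inside the loop is the point of the construction: safety holds throughout learning, including during exploratory actions, independent of what the Q-values currently say.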
Individual Q-learning in normal form games
 In submission
, 2004
Abstract

Cited by 12 (1 self)
Abstract. The single-agent multi-armed bandit problem can be solved by an agent that learns the values of each action using reinforcement learning. However, the multi-agent version of the problem, the iterated normal form game, presents a more complex challenge, since the rewards available to each agent depend on the strategies of the others. We consider the behavior of value-based learning agents in this situation, and show that such agents cannot generally play at a Nash equilibrium, although if smooth best responses are used, a Nash distribution can be reached. We introduce a particular value-based learning algorithm, which we call individual Q-learning, and use stochastic approximation to study its asymptotic behavior, showing that strategies converge to the Nash distribution almost surely in 2-player zero-sum games and 2-player partnership games. Player-dependent learning rates are then considered, and it is shown that this extension converges in some games for which many algorithms, including the basic algorithm initially considered, fail to converge.
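A minimal sketch of value-based learning with smooth best responses, on matching pennies (a 2-player zero-sum game whose Nash distribution under Boltzmann smoothing is close to uniform). Each agent updates only the value of the action it actually played, as in the individual-learning setting above; the temperature, step size, and function names are illustrative choices of mine, not the paper's exact algorithm.

```python
import numpy as np

def boltzmann(q, tau=0.5):
    """Smooth best response: Boltzmann (softmax) distribution over actions."""
    p = np.exp(q / tau)
    return p / p.sum()

def play(rounds=20000, alpha=0.05, seed=1):
    rng = np.random.default_rng(seed)
    payoff = np.array([[1, -1], [-1, 1]])   # row player's payoff (zero-sum)
    q1, q2 = np.zeros(2), np.zeros(2)
    for _ in range(rounds):
        a1 = rng.choice(2, p=boltzmann(q1))
        a2 = rng.choice(2, p=boltzmann(q2))
        r1 = payoff[a1, a2]
        # Each agent updates only the value of its own chosen action,
        # using only its own received reward.
        q1[a1] += alpha * (r1 - q1[a1])
        q2[a2] += alpha * (-r1 - q2[a2])
    return boltzmann(q1), boltzmann(q2)
```

With a hard best response the empirical play would cycle; the smoothing keeps both mixed strategies hovering near the uniform Nash distribution.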
Risk-sensitive reinforcement learning applied to control under constraints
 Journal of Artificial Intelligence Research
, 2005
Abstract

Cited by 11 (0 self)
In this paper, we consider Markov Decision Processes (MDPs) with error states. Error states are states that are undesirable or dangerous to enter. We define the risk with respect to a policy as the probability of entering such a state when the policy is pursued. We consider the problem of finding good policies whose risk is smaller than some user-specified threshold, and formalize it as a constrained MDP with two criteria. The first criterion corresponds to the value function originally given. We show that the risk can be formulated as a second criterion function based on a cumulative return, whose definition is independent of the original value function. We present a model-free, heuristic reinforcement learning algorithm that aims at finding good deterministic policies. It is based on weighting the original value function and the risk. The weight parameter is adapted in order to find a feasible solution for the constrained problem that performs well with respect to the value function. The algorithm was successfully applied to the control of a feed tank with stochastic inflows that lies upstream of a distillation column. This control task was originally formulated as an optimal control problem with chance constraints, and it was solved under certain assumptions on the model to obtain an optimal solution. The power of our learning algorithm is that it can be used even when some of these restrictive assumptions are relaxed.
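The weight-adaptation idea can be illustrated on a two-action toy problem (the task, constants, and names below are my own, not the paper's feed-tank benchmark): estimate value and risk for each action by Monte Carlo, then raise the weight ξ on the risk term until the greedy policy under the combined criterion value − ξ·risk meets the risk threshold ω.

```python
import numpy as np

def risk_sensitive_bandit(omega=0.1, rounds=4000, alpha=0.02, seed=0):
    rng = np.random.default_rng(seed)
    # Toy task: action 0 is safe (reward 1, error prob 0.02),
    # action 1 is risky (reward 2, error prob 0.3). An error yields reward 0.
    rewards, error_probs = np.array([1.0, 2.0]), np.array([0.02, 0.3])
    value, risk = np.zeros(2), np.zeros(2)
    for _ in range(rounds):                 # Monte Carlo estimation phase
        a = rng.integers(2)                 # explore uniformly
        err = rng.random() < error_probs[a]
        value[a] += alpha * ((0.0 if err else rewards[a]) - value[a])
        risk[a] += alpha * (float(err) - risk[a])
    # Adapt the weight xi until the greedy policy under value - xi * risk
    # satisfies the risk constraint (assumes some action is feasible).
    xi = 0.0
    greedy = int(np.argmax(value - xi * risk))
    while risk[greedy] > omega:
        xi += 0.1
        greedy = int(np.argmax(value - xi * risk))
    return greedy, xi
```

With ω = 0.1 the unconstrained optimum (the risky action) is infeasible, so ξ grows until the combined criterion prefers the safe action, mirroring the role of the adapted weight parameter in the abstract.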
Learning to communicate and act using hierarchical reinforcement learning
 In AAMAS2004 — Proceedings of the Third International Joint Conference on Autonomous Agents and Multi Agent Systems
, 2004
Abstract

Cited by 9 (1 self)
In this paper, we address the issue of rational communication behavior among autonomous agents. The goal is for agents to learn a policy that optimizes the communication needed for proper coordination, given the communication cost. We extend our previously reported cooperative hierarchical reinforcement learning (HRL) algorithm to include communication decisions and propose a new multiagent HRL algorithm, called COM-Cooperative HRL. In this algorithm, we define cooperative subtasks to be those subtasks in which coordination among agents significantly improves the performance of the overall task. The levels of the hierarchy that include cooperative subtasks are called cooperation levels. Coordination skills among agents are learned faster by sharing information at the cooperation levels, rather than at the level of primitive actions. We add a communication level to the hierarchical decomposition of the problem below each cooperation level. Before making a decision at a cooperative subtask, agents decide whether it is worthwhile to perform a communication action. A communication action has a certain cost and provides each agent at a certain cooperation level with the actions selected by the other agents at the same level. We demonstrate the efficacy of the COM-Cooperative HRL algorithm, as well as the relation between the communication cost and the learned communication policy, using a multiagent taxi domain.
Learning to Communicate and Act in Cooperative Multiagent Systems Using Hierarchical Reinforcement Learning
Abstract

Cited by 8 (0 self)
In this paper, we address the issue of rational communication behavior among autonomous agents. We extend our previously reported cooperative hierarchical reinforcement learning (HRL) algorithm to include communication decisions and propose a new multiagent HRL algorithm, called COM-Cooperative HRL. In this algorithm, at specific levels of the hierarchy, called cooperation levels, a group of subtasks in which coordination among agents has a significant effect on the performance of the overall task are defined as cooperative subtasks. Coordination skills among agents are learned faster by sharing information at cooperation levels, rather than at the level of primitive actions. We add a communication level to the hierarchical decomposition of the problem, below each cooperation level. A communication action has a certain cost and is used by each agent to obtain the actions selected by the cooperative subtasks of the other agents. Before making a decision at a cooperative subtask, agents decide whether it is worthwhile to perform a communication action in order to acquire the actions chosen by the cooperative subtasks of the other agents. Using this algorithm, agents learn a policy that balances the amount of communication needed for proper coordination against the communication cost. We demonstrate the efficacy of the COM-Cooperative HRL algorithm, as well as the relation between communication cost and the learned communication policy, using a multiagent taxi domain.