Batch reinforcement learning in a complex domain
- In The Sixth International Joint Conference on Autonomous Agents and Multiagent Systems, 2007
Cited by 10 (5 self)
Temporal difference reinforcement learning algorithms are perfectly suited to autonomous agents because they learn directly from an agent’s experience based on sequential actions in the environment. However, their most common algorithmic variants are relatively inefficient in their use of experience data, which in many agent-based settings can be scarce. In particular, they make just one learning “update” for each atomic experience. Batch reinforcement learning algorithms, on the other hand, aim to achieve greater data efficiency by saving experience data and using it in aggregate to make updates to the learned policy. Their success has been demonstrated in the past on simple domains like grid worlds and low-dimensional control applications like pole balancing. In this paper, we compare and contrast batch reinforcement learning algorithms with on-line algorithms based on their empirical performance in a complex, continuous, noisy, multiagent domain, namely RoboCup soccer Keepaway. We find that the two batch methods we consider, Experience Replay and Fitted Q Iteration, both yield significant gains in sample complexity, while achieving high asymptotic performance.
Feature selection and policy optimization for distributed instruction placement using reinforcement learning
- In The 17th International Conference on Parallel Architectures and Compilation Techniques, 2008
Cited by 7 (5 self)
Communication overheads are one of the fundamental challenges in a multiprocessor system. As the number of processors on a chip increases, communication overheads and the distribution of computation and data become increasingly important performance factors. Explicit Dataflow Graph Execution (EDGE) processors, in which instructions communicate with one another directly on a distributed substrate, give the compiler control over communication overheads at a fine granularity. Prior work shows that compilers can effectively reduce fine-grained communication overheads in EDGE architectures using a spatial instruction placement algorithm with a heuristic-based cost function. While this algorithm is effective, the cost function must be painstakingly tuned. Heuristics tuned to perform well across a variety of applications leave users with little ability to tune
Transfer learning for policy search methods
- In ICML workshop on Structural Knowledge Transfer for Machine Learning, 2006
Cited by 6 (0 self)
An ambitious goal of transfer learning is to learn a task faster after training on a different, but related, task. In this paper we extend a previously successful temporal difference (Sutton & Barto, 1998) approach to transfer in reinforcement learning (Sutton & Barto, 1998) tasks to work with policy search. In particular, we show how to construct a mapping to translate a population of policies trained via genetic algorithms (GAs) (Goldberg, 1989) from a source task to a target task. Empirical results in robot soccer Keepaway, a standard RL benchmark domain (Stone et al., 2006), demonstrate that transfer via inter-task mapping can markedly reduce the time required to learn a second, more complex, task.
An empirical analysis of value function-based and policy search reinforcement learning
- AAMAS ’09: Proceedings of the 8th international, 2009
Cited by 4 (4 self)
In several agent-oriented scenarios in the real world, an autonomous agent that is situated in an unknown environment must learn through a process of trial and error to take actions that result in long-term benefit. Reinforcement Learning (or sequential decision making) is a paradigm well-suited to this requirement. Value function-based methods and policy search methods are contrasting approaches to solve reinforcement learning tasks. While both classes of methods benefit from independent theoretical analyses, these often fail to extend to the practical situations in which the methods are deployed. We conduct an empirical study to examine the strengths and weaknesses of these approaches by introducing a suite of test domains that can be varied for problem size, stochasticity, function approximation, and partial observability. Our results indicate clear patterns in the domain characteristics for which each class of methods excels. We investigate whether their strengths can be combined, and develop an approach to achieve that purpose. The effectiveness of this approach is also demonstrated on the challenging benchmark task of robot soccer Keepaway. We highlight several lines of inquiry that emanate from this study.
Critical factors in the empirical performance of temporal difference and evolutionary methods for reinforcement learning
- AUTON AGENT MULTI-AGENT SYST, 2009
Evolving neural networks for fractured domains
- In Proceedings of the Genetic and Evolutionary Computation Conference, 2008
Cited by 3 (0 self)
Evolution of neural networks, or neuroevolution, has been successful on many low-level control problems such as pole balancing, vehicle control, and collision warning. However, high-level strategy problems that require the integration of multiple sub-behaviors have remained difficult for neuroevolution to solve. This paper proposes the hypothesis that such problems are difficult because they are fractured: the correct action varies discontinuously as the agent moves from state to state. This hypothesis is evaluated on several examples of fractured high-level reinforcement learning domains. Standard neuroevolution methods such as NEAT indeed have difficulty solving them. However, a modification of NEAT that uses radial basis function (RBF) nodes to make precise local mutations to network output is able to do much better. These results provide a better understanding of the different types of reinforcement learning problems and the limitations of current neuroevolution methods. Thus, they lay the groundwork for creating the next generation of neuroevolution algorithms that can learn strategic high-level behavior in fractured domains.
Common genetic encoding for both direct and indirect encodings of networks
- In Proceedings of the Genetic and Evolutionary Computation Conference (GECCO 2007), 2007
Cited by 3 (3 self)
In this paper we present a Common Genetic Encoding (CGE) for networks that can be applied to both direct and indirect encoding methods. As a direct encoding method, CGE allows the implicit evaluation of an encoded phenotype without the need to decode the phenotype from the genotype. On the other hand, one can easily decode the structure of a phenotype network, since its topology is implicitly encoded in the genotype’s gene-order. Furthermore, we illustrate how CGE can be used for the indirect encoding of networks. CGE has useful properties that make it suitable for evolving neural networks. A formal definition of the encoding is given, and some of the important properties of the encoding are proven, such as its closure under mutation operators, its completeness in representing any phenotype network, and the existence of an algorithm that can evaluate any given phenotype without running into an infinite loop.
Evolving neural networks for strategic decision-making problems
- Neural Networks, Special Issue on Goal-Directed Neural Systems, 2009
Cited by 3 (0 self)
Evolution of neural networks, or neuroevolution, has been a successful approach to many low-level control problems such as pole balancing, vehicle control, and collision warning. However, certain types of problems – such as those involving strategic decision-making – have remained difficult for neuroevolution to solve. This paper evaluates the hypothesis that such problems are difficult because they are fractured: the correct action varies discontinuously as the agent moves from state to state. A method for measuring fracture using the concept of function variation is proposed, and based on this concept, two methods for dealing with fracture are examined: neurons with local receptive fields, and refinement based on a cascaded network architecture. Experiments in several benchmark domains are performed to evaluate how different levels of fracture affect the performance of neuroevolution methods, demonstrating that these two modifications improve performance significantly. These results form a promising starting point for expanding neuroevolution to strategic tasks.
Temporal Difference and Policy Search Methods for Reinforcement Learning: An Empirical Comparison
Cited by 3 (0 self)
Reinforcement learning (RL) methods have become popular in recent years because of their ability to solve complex tasks with minimal feedback. Both genetic algorithms (GAs) and temporal difference (TD) methods have proven effective at solving difficult RL problems, but few rigorous comparisons have been conducted. Thus, no general guidelines describing the methods’ relative strengths and weaknesses are available. This paper summarizes a detailed empirical comparison between a GA and a TD method in Keepaway, a standard RL benchmark domain based on robot soccer. The results from this study help isolate the factors critical to the performance of each learning method and yield insights into their general strengths and weaknesses.

