Intrinsically motivated learning of hierarchical collections of skills
2004
Cited by 115 (17 self)
Abstract
Humans and other animals often engage in activities for their own sakes rather than as steps toward solving practical problems. Psychologists call these intrinsically motivated behaviors. What we learn during intrinsically motivated behavior is essential for our development as competent autonomous entities able to efficiently solve a wide range of practical problems as they arise. In this paper we present initial results from a computational study of intrinsically motivated learning aimed at allowing artificial agents to construct and extend hierarchies of reusable skills that are needed for competent autonomy. At the core of the model are recent theoretical and algorithmic advances in computational reinforcement learning, specifically, new concepts related to skills and new learning algorithms for learning with skill hierarchies.
Proto-value functions: A Laplacian framework for learning representation and control in Markov decision processes
Journal of Machine Learning Research, 2006
Cited by 67 (9 self)
Abstract
This paper introduces a novel spectral framework for solving Markov decision processes (MDPs) by jointly learning representations and optimal policies. The major components of the framework described in this paper include: (i) a general scheme for constructing representations or basis functions by diagonalizing symmetric diffusion operators; (ii) a specific instantiation of this approach where global basis functions called proto-value functions (PVFs) are formed using the eigenvectors of the graph Laplacian on an undirected graph formed from state transitions induced by the MDP; (iii) a three-phased procedure called representation policy iteration (RPI), comprising a sample collection phase, a representation learning phase that constructs basis functions from samples, and a final parameter estimation phase that determines an (approximately) optimal policy within the (linear) subspace spanned by the (current) basis functions; (iv) a specific instantiation of the RPI framework using least-squares policy iteration (LSPI) as the parameter estimation method; (v) several strategies for scaling the proposed approach to large discrete and continuous state spaces, including the Nyström extension for out-of-sample interpolation of eigenfunctions, and the use of Kronecker sum factorization to construct compact eigenfunctions in product spaces such as factored MDPs; and (vi) a series of illustrative discrete and continuous control tasks, which both illustrate the concepts and provide a benchmark for evaluating the proposed approach. Many challenges remain to be addressed in scaling the proposed framework to large MDPs, and several elaborations of the proposed framework are briefly summarized at the end.
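To make component (ii) concrete, here is a minimal sketch in Python, assuming NumPy, a hypothetical 5×5 grid world in which each state is linked to its four neighbours, and the combinatorial Laplacian L = D - A (one simple choice made here for illustration). It only computes candidate basis functions; the sample collection, RPI, and LSPI phases are not shown, and the function names are hypothetical.

```python
import numpy as np


def grid_adjacency(rows: int, cols: int) -> np.ndarray:
    """Adjacency matrix of an undirected grid world; each state links to its 4 neighbours."""
    n = rows * cols
    A = np.zeros((n, n))
    for r in range(rows):
        for c in range(cols):
            s = r * cols + c
            if c + 1 < cols:  # edge to the right-hand neighbour
                A[s, s + 1] = A[s + 1, s] = 1.0
            if r + 1 < rows:  # edge to the neighbour below
                A[s, s + cols] = A[s + cols, s] = 1.0
    return A


def proto_value_functions(A: np.ndarray, k: int):
    """Return the k smallest eigenvalues of L = D - A and their eigenvectors (one per column)."""
    L = np.diag(A.sum(axis=1)) - A
    eigvals, eigvecs = np.linalg.eigh(L)  # symmetric L: eigenvalues come back in ascending order
    return eigvals[:k], eigvecs[:, :k]    # the smoothest eigenvectors serve as basis functions


vals, phi = proto_value_functions(grid_adjacency(5, 5), k=10)
print(phi.shape)  # (25, 10)
```

Each column of `phi` is one basis function over the 25 states; a value function could then be approximated as a linear combination `phi @ w`, with the weights `w` fitted by a parameter estimation method such as LSPI.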
Learning from Observation Using Primitives
2004
Cited by 59 (5 self)
Abstract
This paper describes the use of task primitives in robot learning from observation. A framework has been developed in which the agent uses observed data to initially learn a task and then improves its performance through repeated task performance (learning from practice). Data collected while a human performs a task is parsed into small parts of the task called primitives. Modules are created for each primitive that encode the movements required during the performance of the primitive, and when and where the primitives are performed. The feasibility of this method is currently being tested with agents that learn to play both a virtual and an actual air hockey game.
Rule-Based Evolutionary Online Learning Systems: Learning Bounds, Classification, and Prediction
2004
Cited by 43 (9 self)
Abstract
Rule-based evolutionary online learning systems, often referred to as Michigan-style learning classifier systems (LCSs), were proposed nearly thirty years ago (Holland, 1976; Holland, 1977), originally under the name cognitive systems. LCSs combine the strength of reinforcement learning with the generalization capabilities of genetic algorithms, promising a flexible, online-generalizing, solely reinforcement-dependent learning system. However, despite several initial successful applications of LCSs and their interesting relations with animal learning and cognition, understanding of these systems remained somewhat obscure. Questions concerning learning complexity or convergence remained unanswered. Performance in different problem types, problem structures, concept spaces, and hypothesis spaces was nearly unpredictable. This thesis has three major objectives: (1) to establish a facet-wise theory approach for LCSs that promotes system analysis, understanding, and design; (2) to analyze, evaluate, and enhance the XCS classifier system (Wilson, 1995) by means of the facet-wise approach, establishing a fundamental XCS learning theory; (3) to identify both the major advantages of an LCS-based learning approach and the most promising potential application areas. Achieving these three objectives leads to a rigorous understanding
Building portable options: Skill transfer in reinforcement learning
Proceedings of the 20th International Joint Conference on Artificial Intelligence, 2007
Cited by 41 (9 self)
Abstract
The options framework provides methods for reinforcement learning agents to build new high-level skills. However, since options are usually learned in the same state space as the problem the agent is solving, they cannot be used in other tasks that are similar but have different state spaces. We introduce the notion of learning options in agent-space, the space generated by a feature set that is present and retains the same semantics across successive problem instances, rather than in problem-space. Agent-space options can be reused in later tasks that share the same agent-space but have different problem-spaces. We present experimental results demonstrating the use of agent-space options in building transferable skills, and show that they perform best when used in conjunction with problem-space options.
Dynamic Abstraction in Reinforcement Learning via Clustering
In Proceedings of the Twenty-First International Conference on Machine Learning, 2004
Cited by 40 (0 self)
Abstract
We consider a graph-theoretic approach for automatic construction of options in a dynamic environment. A map of the environment is generated online by the learning agent, representing the topological structure of the state transitions. A clustering algorithm is then used to partition the state space into different regions. Policies for reaching the different parts of the space are separately learned and added to the model in the form of options (macro-actions). The options are used for accelerating the Q-learning algorithm. We extend the basic algorithm and consider building a map that includes a preliminary indication of the location of "interesting" regions of the state space, where the value gradient is significant and additional exploration might be beneficial. Experiments indicate significant speedups, especially in the initial learning phase.
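The abstract does not name its clustering algorithm, so the sketch below swaps in a simple alternative, not necessarily the paper's: a spectral bipartition by the sign of the Fiedler vector (the eigenvector of the second-smallest Laplacian eigenvalue). It assumes NumPy and a hypothetical two-room environment of 8 states joined by a single doorway transition; it builds the undirected map from observed (s, s') pairs and reports the states on the region boundary as hypothetical option targets. Learning the option policies and the Q-learning speedup are not shown.

```python
import numpy as np


def transition_graph(transitions, n_states):
    """Undirected adjacency matrix (the agent's map) from observed (s, s_next) pairs."""
    A = np.zeros((n_states, n_states))
    for s, s_next in transitions:
        if s != s_next:  # ignore self-loops
            A[s, s_next] = A[s_next, s] = 1.0
    return A


def spectral_bipartition(A):
    """Label each state 0 or 1 by the sign of the Fiedler vector of L = D - A."""
    L = np.diag(A.sum(axis=1)) - A
    _, eigvecs = np.linalg.eigh(L)  # eigenvalues in ascending order
    fiedler = eigvecs[:, 1]         # eigenvector of the second-smallest eigenvalue
    return (fiedler >= 0).astype(int)


def boundary_states(A, labels):
    """States with a neighbour in the other region: hypothetical option subgoals."""
    n = len(labels)
    return sorted({s for s in range(n) for t in range(n)
                   if A[s, t] > 0 and labels[s] != labels[t]})


# Hypothetical two-room map: states 0-3 and 4-7 are fully connected rooms,
# joined by a single doorway transition between states 3 and 4.
room_a = [(i, j) for i in range(4) for j in range(i + 1, 4)]
room_b = [(i, j) for i in range(4, 8) for j in range(i + 1, 8)]
A = transition_graph(room_a + room_b + [(3, 4)], n_states=8)
labels = spectral_bipartition(A)
print(labels, boundary_states(A, labels))
```

The eigenvector's overall sign is arbitrary, so which room receives label 0 may vary, but the split itself (and the doorway states 3 and 4 as boundary) does not.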
Internal Models and Anticipations in Adaptive Learning Systems
In Proceedings of the Workshop on Adaptive Behavior in Anticipatory Learning Systems
Cited by 34 (7 self)
Abstract
The explicit investigation of anticipations in relation to adaptive behavior is a recent approach. This chapter first provides psychological background that motivates and inspires the study of anticipations in the adaptive behavior field. Next, a basic framework for the study of anticipations in adaptive behavior is suggested. Different anticipatory mechanisms are identified and characterized. First, fundamental distinctions are drawn between implicit anticipatory behavior, payoff anticipatory behavior, sensory anticipatory behavior, and state anticipatory behavior. A case study provides further insight into these distinctions.
Hierarchical Apprenticeship Learning, with Application to Quadruped Locomotion
Cited by 34 (3 self)
Abstract
We consider apprenticeship learning (learning from expert demonstrations) in the setting of large, complex domains. Past work in apprenticeship learning requires that the expert demonstrate complete trajectories through the domain. However, in many problems even an expert has difficulty controlling the system, which makes this approach infeasible. For example, consider the task of teaching a quadruped robot to navigate over extreme terrain; demonstrating an optimal policy (i.e., an optimal set of foot locations over the entire terrain) is a highly nontrivial task, even for an expert. In this paper we propose a method for hierarchical apprenticeship learning, which allows the algorithm to accept isolated advice at different hierarchical levels of the control task. Experts can often give this type of advice even when they are unable to demonstrate complete trajectories. This allows us to extend the apprenticeship learning paradigm to much larger, more challenging domains. In particular, in this paper we apply the hierarchical apprenticeship learning algorithm to the task of quadruped locomotion over extreme terrain, and achieve, to the best of our knowledge, results superior to any previously published work.
Hierarchical Reinforcement Learning Based on Subgoal Discovery and Subpolicy Specialization
Proceedings of the 8th Conference on Intelligent Autonomous Systems (IAS-8), 2004
Cited by 32 (4 self)
Abstract
We introduce a new method for hierarchical reinforcement learning. High-level policies automatically discover subgoals; low-level policies learn to specialize on different subgoals. Subgoals are represented as desired abstract observations which cluster raw input data. High-level value functions cover the state space at a coarse level; low-level value functions cover only parts of the state space at a fine-grained level. Experiments show that this method outperforms several flat reinforcement learning methods in a deterministic task and in a stochastic task.
A tutorial on Bayesian optimization of expensive cost functions, with application to active user modeling and hierarchical reinforcement learning
2009
Cited by 29 (2 self)
Abstract
We present a tutorial on Bayesian optimization, a method of finding the maximum of expensive cost functions. Bayesian optimization employs the Bayesian technique of setting a prior over the objective function and combining it with evidence to get a posterior function. This permits a utility-based selection of the next observation to make on the objective function, which must take into account both exploration (sampling from areas of high uncertainty) and exploitation (sampling from areas likely to offer improvement over the current best observation). We also present two detailed extensions of Bayesian optimization, with experiments (active user modelling with preferences, and hierarchical reinforcement learning), and a discussion of the pros and cons of Bayesian optimization based on our experiences.
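A minimal sketch of the prior, evidence, posterior, and utility loop described above, assuming NumPy, a 1-D objective on [0, 1] searched over a 201-point grid, a zero-mean Gaussian process prior with a squared-exponential kernel (length scale 0.2), and expected improvement as the utility. These are illustrative choices, not the tutorial's own code, and all function names are hypothetical.

```python
import math

import numpy as np


def rbf_kernel(a: np.ndarray, b: np.ndarray, length_scale: float = 0.2) -> np.ndarray:
    """Squared-exponential covariance between two 1-D arrays of points."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)


def gp_posterior(x_obs, y_obs, x_query, noise=1e-6):
    """Posterior mean and standard deviation of a zero-mean GP at x_query."""
    K = rbf_kernel(x_obs, x_obs) + noise * np.eye(len(x_obs))  # prior covariance of the evidence
    K_s = rbf_kernel(x_obs, x_query)
    mean = K_s.T @ np.linalg.solve(K, y_obs)
    v = np.linalg.solve(K, K_s)
    var = 1.0 - np.sum(K_s * v, axis=0)  # prior variance k(x, x) = 1 for this kernel
    return mean, np.sqrt(np.maximum(var, 1e-12))


def expected_improvement(mean, std, best):
    """Utility for maximisation: rewards a high mean (exploit) or a high std (explore)."""
    z = (mean - best) / std
    cdf = 0.5 * (1.0 + np.vectorize(math.erf)(z / math.sqrt(2.0)))
    pdf = np.exp(-0.5 * z**2) / math.sqrt(2.0 * math.pi)
    return np.maximum((mean - best) * cdf + std * pdf, 0.0)  # clip tiny rounding negatives


def bayes_opt(f, n_iter=15, seed=0):
    """Maximise f on [0, 1], choosing each new evaluation by maximum expected improvement."""
    rng = np.random.default_rng(seed)
    grid = np.linspace(0.0, 1.0, 201)
    x_obs = rng.uniform(0.0, 1.0, size=2)  # two random initial evaluations
    y_obs = np.array([f(x) for x in x_obs])
    for _ in range(n_iter):
        mean, std = gp_posterior(x_obs, y_obs, grid)
        x_next = grid[np.argmax(expected_improvement(mean, std, y_obs.max()))]
        x_obs = np.append(x_obs, x_next)
        y_obs = np.append(y_obs, f(x_next))
    best = int(np.argmax(y_obs))
    return float(x_obs[best]), float(y_obs[best])


x_best, y_best = bayes_opt(lambda x: -(x - 0.3) ** 2)
print(f"best x = {x_best:.3f}, f(x) = {y_best:.5f}")
```

Expected improvement is large where the posterior mean is high (exploitation) or the posterior standard deviation is high (exploration), which is the trade-off the abstract describes; a real application would also fit the kernel hyperparameters rather than fixing them.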