Results 1-10 of 20
Autonomously learning an action hierarchy using a learned qualitative state representation
In Proceedings of the 21st International Joint Conference on Artificial Intelligence, 2009
Cited by 15 (5 self)
There has been intense interest in hierarchical reinforcement learning as a way to make Markov decision process planning more tractable, but there has been relatively little work on autonomously learning the hierarchy, especially in continuous domains. In this paper we present a method for learning a hierarchy of actions in a continuous environment. Our approach is to learn a qualitative representation of the continuous environment and then to define actions to reach qualitative states. Our method learns one or more options to perform each action. Each option is learned by first learning a dynamic Bayesian network (DBN). We approach this problem from a developmental robotics perspective. The agent receives no extrinsic reward and has no external direction for what to learn. We evaluate our work using a simulation with realistic physics that consists of a robot playing with blocks at a table.
Active Learning of Dynamic Bayesian Networks in Markov Decision Processes
Cited by 8 (2 self)
Several recent techniques for solving Markov decision processes use dynamic Bayesian networks to compactly represent tasks. The dynamic Bayesian network representation may not be given, in which case it is necessary to learn it if one wants to apply these techniques. We develop an algorithm for learning dynamic Bayesian network representations of Markov decision processes using data collected through exploration in the environment. To accelerate data collection we develop a novel scheme for active learning of the networks. We assume that it is not possible to sample the process in arbitrary states, only along trajectories, which prevents us from applying existing active learning techniques. Our active learning scheme selects actions that maximize the total entropy of the distributions used to evaluate potential refinements of the networks.
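The action-selection criterion described in this abstract can be illustrated with a minimal sketch. This is not the paper's implementation; the function names and the representation of a "refinement distribution" as a list of probabilities are assumptions made for illustration only.

```python
import math

def entropy(dist):
    """Shannon entropy (bits) of a discrete distribution given as probabilities."""
    return -sum(p * math.log2(p) for p in dist if p > 0)

def choose_action(candidate_dists_by_action):
    """Pick the action whose candidate-refinement evaluation distributions
    have the largest total entropy, i.e. the action expected to be most
    informative about the network structure (illustrative sketch)."""
    def total_entropy(action):
        return sum(entropy(d) for d in candidate_dists_by_action[action])
    return max(candidate_dists_by_action, key=total_entropy)

# Toy example: action "b"'s refinement distributions are more uncertain,
# so an entropy-maximizing explorer would try "b" next.
dists = {
    "a": [[0.9, 0.1], [0.8, 0.2]],
    "b": [[0.5, 0.5], [0.6, 0.4]],
}
print(choose_action(dists))  # b
```

In the paper's setting this choice is constrained to actions reachable along the current trajectory, which the sketch does not model.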
Discovering Options from Example Trajectories
Cited by 5 (3 self)
We present a novel technique for automated problem decomposition to address the problem of scalability in reinforcement learning. Our technique makes use of a set of near-optimal trajectories to discover options and incorporates them into the learning process, dramatically reducing the time it takes to solve the underlying problem. We run a series of experiments in two different domains and show that our method offers up to a 30-fold speedup over the baseline.
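The abstract does not detail the discovery procedure itself, but the flavor of trajectory-based option discovery can be sketched. One common family of approaches (not necessarily this paper's) proposes states that recur across many near-optimal trajectories as option subgoals; all names below are hypothetical.

```python
from collections import Counter

def candidate_subgoals(trajectories, top_k=2):
    """Illustrative sketch only: rank states by how many near-optimal
    trajectories pass through them, and propose the most frequent
    ones as subgoal states for new options."""
    counts = Counter()
    for traj in trajectories:
        counts.update(set(traj))  # count each state once per trajectory
    return [s for s, _ in counts.most_common(top_k)]

# "door" lies on every trajectory, so it surfaces as a bottleneck subgoal.
trajs = [["s0", "door", "goal"],
         ["s1", "door", "goal"],
         ["s2", "door", "s3"]]
print(candidate_subgoals(trajs))  # ['door', 'goal']
```

An option targeting such a subgoal would then be added to the agent's action set and trained alongside the base task.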
Hierarchical Solution of Large Markov Decision Processes
Cited by 3 (2 self)
This paper presents an algorithm for finding approximately optimal policies in very large Markov decision processes by constructing a hierarchical model and then solving it. This strategy sacrifices optimality for the ability to address a large class of very large problems. Our algorithm works efficiently on enumerated-state and factored MDPs by constructing a hierarchical structure that is no larger than both the reduced model of the MDP and the regression tree for the goal in that MDP, and then using that structure to solve for a policy.
Autonomous Qualitative Learning of Distinctions and Actions in a Developing Agent
Cited by 3 (2 self)
How can an agent bootstrap up from a pixel-level representation to autonomously learn high-level states and actions using only domain-general knowledge? This thesis attacks a piece of this problem: it assumes that an agent has a set of continuous variables describing the environment and a set of continuous motor primitives, and poses a solution to the problem of how an agent can learn a set of useful states and effective higher-level actions through autonomous experience with the environment. Methods exist for learning models of the environment, and methods exist for planning; however, for autonomous learning these methods have been used almost exclusively in discrete environments. This thesis proposes attacking the problem of learning high-level states and actions in continuous environments by using a qualitative representation to bridge the gap between continuous and discrete variable representations. In this approach, the agent begins with a broad discretization and initially can only tell whether the value of each variable is increasing, decreasing, or remaining steady. The agent then simultaneously learns a qualitative representation (discretization) and a set of predictive models of the environment, converts these models into plans to form actions, and uses those learned actions to explore the environment. The method is evaluated using a simulated robot with realistic physics. The robot sits at a table that contains one or two blocks, as well as other distractor objects that are out of reach. The agent autonomously explores the environment without being given a task. After learning, the agent is given various tasks to determine whether it has learned the necessary states and actions to complete them. The results show that the agent was able to use this method to autonomously learn to perform the tasks.
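The initial broad discretization described in this abstract, in which the agent only distinguishes the direction of change of each variable, can be sketched in a few lines. The function name and the noise threshold `eps` are illustrative assumptions, not details from the thesis.

```python
def qualitative_value(delta, eps=1e-3):
    """Map a continuous change in a variable to a qualitative label.
    The agent's initial discretization only distinguishes the
    direction of change (illustrative sketch; `eps` absorbs noise)."""
    if delta > eps:
        return "increasing"
    if delta < -eps:
        return "decreasing"
    return "steady"

# Label the changes along a short trace of one continuous variable.
trajectory = [0.10, 0.10, 0.14, 0.13]
deltas = [b - a for a, b in zip(trajectory, trajectory[1:])]
print([qualitative_value(d) for d in deltas])
# ['steady', 'increasing', 'decreasing']
```

In the thesis this coarse representation is then refined jointly with the agent's predictive models, which the sketch does not attempt to show.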
Automatic Induction of MAXQ Hierarchies
Cited by 3 (0 self)
Scaling up reinforcement learning to large domains requires leveraging the structure in the domain. Hierarchical reinforcement learning has been one of the ways in which the domain structure is exploited to constrain the value-function space of the learner and speed up learning [10, 3, 1]. In the MAXQ framework, for example, a task hierarchy is defined, and a set of relevant features to represent the completion function for each task-subtask pair is given [3], resulting in decomposed subtask-specific value functions that are easier to learn than the global value function. The MAXQ decomposition facilitates learning separate value functions for subtasks. The task hierarchy is represented as a directed acyclic graph. The leaf nodes are the primitive subtasks. Each composite subtask defines a semi-Markov decision process (SMDP) with a set of actions (which may include primitive actions or other subtasks), a set of state variables, a termination predicate which defines a set of exit states for the subtask, and a pseudo-reward function defined over the exits. Several researchers have focused on the problem of automatically inducing temporally extended actions and task-subtask hierarchies [4, 7, 8, 9, 2, 11, 6, 5]. Discovering task-subtask …
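The hierarchy structure this abstract describes (a DAG whose leaves are primitive actions and whose composite nodes each define an SMDP with child actions, relevant state variables, and a termination predicate) can be sketched as a data structure. This is a minimal illustration, not the paper's code; the class and field names, and the Taxi-style fragment, are assumptions.

```python
from dataclasses import dataclass, field

@dataclass
class Subtask:
    """One node of a MAXQ-style task hierarchy (illustrative sketch).
    A composite subtask carries child actions (primitive or composite),
    the state variables relevant to it, and a termination predicate;
    leaves are primitive actions."""
    name: str
    children: list = field(default_factory=list)
    state_vars: tuple = ()
    is_primitive: bool = False

    def terminated(self, state):
        # Stub: primitives end after one step; a real composite subtask
        # would test its exit-state predicate here.
        return self.is_primitive

# A fragment of a Taxi-like hierarchy, built bottom-up as a DAG.
north = Subtask("north", is_primitive=True)
south = Subtask("south", is_primitive=True)
navigate = Subtask("navigate", children=[north, south],
                   state_vars=("taxi_x", "taxi_y"))
root = Subtask("root", children=[navigate])
print([c.name for c in root.children[0].children])  # ['north', 'south']
```

The pseudo-reward over exits is omitted from the sketch; it would attach to the termination predicate of each composite node.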
Incremental Structure Learning in Factored MDPs with Continuous States and Actions
Cited by 2 (0 self)
Learning factored transition models of structured environments has been shown to provide significant leverage when computing optimal policies for tasks within those environments. Previous work has focused on learning the structure of factored Markov decision processes (MDPs) with finite sets of states and actions. In this work we present an algorithm for online incremental learning of transition models of factored MDPs that have continuous, multidimensional state and action spaces. We use incremental density-estimation techniques and information-theoretic principles to learn a factored model of the transition dynamics of a factored MDP online from a single, continuing trajectory of experience.
What do You Want to do Today? Relevant-Information Bookkeeping in Goal-Oriented Behaviour
In Artificial Life XII: The 12th International Conference on the Synthesis and Simulation of Living Systems, 2010
Cited by 2 (1 self)
We extend existing models and methods for the informational treatment of the perception-action loop to the case of goal-oriented behaviour and introduce the notion of relevant goal information as the amount of information an agent necessarily has to maintain about its goal. Starting from the hypothesis that organisms use information economically, we study the structure of this information and how goal-information parsimony can guide behaviour. It is shown how these methods lead to a general definition and quantification of subgoals, and how the biologically motivated hypothesis of information parsimony gives rise to the emergence of properties such as least-commitment and goal-concealing.
Fast Approximate Hierarchical Solution of MDPs
Cited by 2 (1 self)
In this thesis, we present an efficient algorithm for creating and solving hierarchical models of large Markov decision processes (MDPs). As the size of the MDP increases, finding an exact solution becomes intractable, so we expect only to find an approximate solution. We also assume that the hierarchies we create are not necessarily applicable to more than one problem, so we must be able to construct and solve the hierarchical model in less time than it would have taken to simply solve the original, flat model. Our approach works in two stages. We first create the hierarchical MDP by forming clusters of states that can transition easily among themselves. We then solve the hierarchical MDP, using a quick bottom-up pass based on a deterministic approximation of the expected cost of moving from one state to another to derive a policy from the top down, which avoids solving low-level MDPs for multiple objectives.
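The first stage described in this abstract, clustering states that can transition easily among themselves, can be sketched with a simple graph heuristic. This is a loose illustration, not the thesis's algorithm: the transition encoding, the probability threshold, and the use of connected components are all assumptions made for the example.

```python
from collections import defaultdict

def cluster_states(transitions, threshold=0.2):
    """Greedy sketch of hierarchical-MDP cluster formation: states i, j
    land in one cluster if some action moves between them with
    probability above `threshold` (an illustrative simplification)."""
    # Build an undirected "easy transition" graph.
    graph = defaultdict(set)
    for (s, s_next), p in transitions.items():
        if p >= threshold:
            graph[s].add(s_next)
            graph[s_next].add(s)
    # Connected components of that graph become the abstract states.
    seen, clusters = set(), []
    for s in graph:
        if s in seen:
            continue
        stack, comp = [s], set()
        while stack:
            u = stack.pop()
            if u in comp:
                continue
            comp.add(u)
            stack.extend(graph[u] - comp)
        seen |= comp
        clusters.append(sorted(comp))
    return clusters

# A-B and C-D transition easily; the B->C link is too weak to merge them.
T = {("A", "B"): 0.9, ("B", "A"): 0.8, ("B", "C"): 0.05,
     ("C", "D"): 0.7, ("D", "C"): 0.6}
print(cluster_states(T))  # [['A', 'B'], ['C', 'D']]
```

The second stage, the deterministic bottom-up cost pass used to extract a top-down policy, operates over these clusters and is not shown here.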