Results 1-10 of 83
Approximate Policy Iteration with a Policy Language Bias
Journal of Artificial Intelligence Research, 2003
Cited by 135 (17 self)
We explore approximate policy iteration (API), replacing the usual cost-function learning step with a learning step in policy space. We give policy-language biases that enable solution of very large relational Markov decision processes (MDPs) that no previous technique can solve.
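To make the idea of replacing the cost-function learning step with a learning step in policy space concrete, here is a minimal Python sketch on a hypothetical six-state chain MDP. The MDP and the names `api`, `rollout_q`, and `learn_policy` are illustrative assumptions, not the authors' code. Each iteration estimates Q-values of the current policy by Monte Carlo rollouts, labels each state with its best-looking action, and fits a new policy to those labels; the per-state majority-vote table merely stands in for the relational policy language the paper uses.

```python
import random
from collections import Counter, defaultdict

# Hypothetical toy MDP: states 0..N-1 on a line; reaching N-1 ends the episode.
N = 6
ACTIONS = ("left", "right")
GAMMA = 0.9


def step(state, action, rng, slip):
    """Move one cell; with probability `slip` the move is reversed."""
    delta = 1 if action == "right" else -1
    if rng.random() < slip:
        delta = -delta
    nxt = min(max(state + delta, 0), N - 1)
    reward = 1.0 if nxt == N - 1 else 0.0
    return nxt, reward


def rollout_q(state, action, policy, rng, slip, horizon=20, width=30):
    """Monte Carlo estimate of Q^policy(state, action) from `width` rollouts."""
    total = 0.0
    for _ in range(width):
        s, r = step(state, action, rng, slip)
        ret, discount = r, GAMMA
        for _ in range(horizon):
            if s == N - 1:  # terminal goal state
                break
            s, r = step(s, policy(s), rng, slip)
            ret += discount * r
            discount *= GAMMA
        total += ret
    return total / width


def learn_policy(examples, default):
    """Policy-space learning step. Stand-in learner: per-state majority vote."""
    votes = defaultdict(Counter)
    for s, a in examples:
        votes[s][a] += 1
    table = {s: c.most_common(1)[0][0] for s, c in votes.items()}
    return lambda s: table.get(s, default)


def api(iterations=N, slip=0.1, seed=0):
    """Alternate rollout-based improvement with fitting a policy (no value function)."""
    rng = random.Random(seed)
    policy = lambda s: "left"  # deliberately poor initial policy
    for _ in range(iterations):
        examples = []
        for s in range(N - 1):  # every non-terminal state
            q = {a: rollout_q(s, a, policy, rng, slip) for a in ACTIONS}
            examples.append((s, max(q, key=q.get)))  # ties go to "left"
        policy = learn_policy(examples, default="left")
    return policy


if __name__ == "__main__":
    pi = api()
    print([pi(s) for s in range(N - 1)])
```

With `slip=0.0` the rollouts are deterministic and the improvement propagates back one state per iteration, so `api(iterations=N, slip=0.0)` ends with `"right"` in every non-terminal state.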
Generalizing plans to new environments in relational MDPs
In International Joint Conference on Artificial Intelligence (IJCAI-03), 2003
Cited by 110 (2 self)
A longstanding goal in planning research is the ability to generalize plans developed for some set of environments to a new but similar environment, with minimal or no replanning. Such generalization can both reduce planning time and allow us to tackle larger domains than the ones tractable for direct planning. In this paper, we present an approach to the generalization problem based on a new framework of relational Markov Decision Processes (RMDPs). An RMDP can model a set of similar environments by representing objects as instances of different classes. In order to generalize plans to multiple environments, we define an approximate value function specified in terms of classes of objects and, in a multiagent setting, by classes of agents. This class-based approximate value function is optimized relative to a sampled subset of environments, and computed using an efficient linear programming method. We prove that a polynomial number of sampled environments suffices to achieve performance close to the performance achievable when optimizing over the entire space. Our experimental results show that our method generalizes plans successfully to new, significantly larger, environments, with minimal loss of performance relative to environment-specific planning. We demonstrate our approach on a real strategic computer war game.
Learning to Take Actions
1998
Cited by 57 (8 self)
We formalize a model for supervised learning of action strategies in dynamic stochastic domains and show that PAC-learning results on Occam algorithms hold in this model as well. We then identify a class of rule-based action strategies for which polynomial-time learning is possible. The representation of strategies is a generalization of decision lists; strategies include rules with existentially quantified conditions, simple recursive predicates, and small internal state, but are syntactically restricted. We also study the learnability of hierarchically composed strategies where a subroutine already acquired can be used as a basic action in a higher-level strategy. We prove some positive results in this setting, but also show that in some cases the hierarchical learning problem is computationally hard. 1 Introduction. We formalize a model for supervised learning of action strategies in dynamic stochastic domains, and study the learnability of strategies represented by rule-based syste...
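A decision-list strategy can be read as an ordered list of condition/action rules in which the first rule whose condition is satisfiable fires. The sketch below assumes a simple encoding: states are sets of ground facts, and conditions are conjunctions of atoms whose `?`-prefixed variables are existentially quantified. The blocks-world predicates, the rule list `RULES`, and the helpers `match` and `decide` are hypothetical, and the recursive predicates and internal state mentioned in the abstract are omitted.

```python
from typing import Optional


def is_var(term):
    """Variables are strings starting with '?', e.g. '?x'."""
    return isinstance(term, str) and term.startswith("?")


def match(patterns, facts, binding):
    """Yield every variable binding under which all patterns are facts."""
    if not patterns:
        yield binding
        return
    first, rest = patterns[0], patterns[1:]
    for fact in sorted(facts):
        if len(fact) != len(first) or fact[0] != first[0]:
            continue  # different predicate or arity
        b = dict(binding)
        ok = True
        for p, v in zip(first[1:], fact[1:]):
            if is_var(p):
                if b.setdefault(p, v) != v:  # conflicts with earlier binding
                    ok = False
                    break
            elif p != v:
                ok = False
                break
        if ok:
            yield from match(rest, facts, b)


def decide(decision_list, facts) -> Optional[tuple]:
    """Return the instantiated action of the first rule that can fire."""
    for condition, action in decision_list:
        for b in match(condition, facts, {}):
            return tuple(b.get(t, t) for t in action)
    return None


# Hypothetical blocks-world decision list, most specific rule first.
RULES = [
    ([("holding", "?x"), ("clear", "?y"), ("goal_on", "?x", "?y")], ("stack", "?x", "?y")),
    ([("holding", "?x")], ("putdown", "?x")),
    ([("on", "?x", "?y"), ("clear", "?x")], ("unstack", "?x", "?y")),
]

if __name__ == "__main__":
    state = {("holding", "a"), ("clear", "b"), ("goal_on", "a", "b")}
    print(decide(RULES, state))  # ('stack', 'a', 'b')
```

Ordering matters: the first rule only applies when a goal-relevant placement exists, so the more general `putdown` rule acts as a fallback.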
Inductive policy selection for first-order MDPs
In UAI, 2002
Cited by 48 (15 self)
We select policies for large Markov Decision Processes (MDPs) with compact first-order representations. We find policies that generalize well as the number of objects in the domain grows, potentially without bound. Existing dynamic-programming approaches based on flat, propositional, or first-order representations either are impractical here or do not naturally scale as the number of objects grows without bound. We implement and evaluate an alternative approach that induces first-order policies using training data constructed by solving small problem instances using PGraphplan (Blum & Langford, 1999). Our policies are represented as ensembles of decision lists, using a taxonomic concept language. This approach extends the work of Martin and Geffner (2000) to stochastic domains, ensemble learning, and a wider variety of problems. Empirically, we find “good” policies for several stochastic first-order MDPs that are beyond the scope of previous approaches. We also discuss the application of this work to the relational reinforcement-learning problem.
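One simple way to turn an ensemble of decision lists into a single policy is a plurality vote over the actions the member lists recommend. This sketch assumes propositional conditions (sets of facts that must all hold) rather than the taxonomic concept language described above, and an unweighted vote; `first_match`, `ensemble_action`, and the fact and action names are hypothetical.

```python
from collections import Counter


def first_match(decision_list, state):
    """Action of the first rule whose required facts all hold in `state`."""
    for condition, action in decision_list:
        if condition <= state:  # subset test: every required fact is present
            return action
    return None


def ensemble_action(decision_lists, state):
    """Plurality vote over members that fire; ties go to the earliest vote."""
    votes = Counter(
        a for a in (first_match(dl, state) for dl in decision_lists) if a is not None
    )
    if not votes:
        return None
    return votes.most_common(1)[0][0]


# Hypothetical propositional encoding of a one-step blocks-world situation.
DL1 = [(frozenset({"holding_a", "clear_b"}), "stack_a_b")]
DL2 = [(frozenset({"holding_a"}), "putdown_a")]
DL3 = [(frozenset({"clear_b", "holding_a"}), "stack_a_b"), (frozenset(), "noop")]
STATE = {"holding_a", "clear_b"}

if __name__ == "__main__":
    print(ensemble_action([DL1, DL2, DL3], STATE))  # stack_a_b
```

`Counter.most_common` lists equal counts in first-encountered order, which gives a deterministic tie-break by member order.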
Exploiting First-Order Regression in Inductive Policy Selection
Proceedings of the Twentieth Conference on Uncertainty in Artificial Intelligence (UAI’04), 2004
Cited by 46 (2 self)
We consider the problem of computing optimal generalised policies for relational Markov decision processes. We describe an approach combining some of the benefits of purely inductive techniques with those of symbolic dynamic programming methods. The latter reason about the optimal value function using first-order decision-theoretic regression and formula rewriting, while the former, when provided with a suitable hypotheses language, are capable of generalising value functions or policies for small instances. Our idea is to use reasoning and in particular classical first-order regression to automatically generate a hypotheses language dedicated to the domain at hand, which is then used as input by an inductive solver. This approach avoids the more complex reasoning of symbolic dynamic programming while focusing the inductive solver’s attention on concepts that are specifically relevant to the optimal value function for the domain considered.
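Classical regression computes the condition that must hold before an action so that a goal holds after it. The sketch below shows only the ground (propositional) STRIPS case, assuming actions with precondition, add, and delete sets; the paper works with first-order regression, which additionally handles variables and quantifiers. The `Action` type, `regress`, and the `stack(a,b)` operator are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    name: str
    pre: frozenset     # facts required before the action
    add: frozenset     # facts made true
    delete: frozenset  # facts made false


def regress(goal, action):
    """Regressed goal (goal - add) | pre, or None if the action deletes a
    goal fact or adds none of them (i.e. is irrelevant to the goal)."""
    if goal & action.delete:
        return None
    if not goal & action.add:
        return None
    return (goal - action.add) | action.pre


STACK_A_B = Action(
    name="stack(a,b)",
    pre=frozenset({"holding(a)", "clear(b)"}),
    add=frozenset({"on(a,b)", "clear(a)", "handempty"}),
    delete=frozenset({"holding(a)", "clear(b)"}),
)

if __name__ == "__main__":
    print(sorted(regress(frozenset({"on(a,b)", "ontable(c)"}), STACK_A_B)))
    # ['clear(b)', 'holding(a)', 'ontable(c)']
```

Facts the action does not touch, such as `ontable(c)`, pass through unchanged; a fact the action deletes makes the goal unreachable through that action.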
Learning Declarative Control Rules for Constraint-Based Planning
In ICML
Cited by 42 (2 self)
Despite the long history of research in using machine learning to speed up state-space planning, the techniques that have been developed are not yet in widespread use in practical planning systems. One limiting factor is that traditional domain-independent planning systems scale so poorly that extensive learned control knowledge is required to raise their performance to an acceptable level. Therefore, work in this area has focused on learning large numbers of control rules that are specific to the details of the underlying planning algorithms, which can be extremely costly. In recent years, a new generation of planning systems with much improved speed and scalability has become available. These systems formulate planning as solving a large constraint satisfaction problem. This formulation opens up the possibility that domain-specific control knowledge can be added to the planner in a purely declarative manner via a set of additional constraints. In this paper we present the first positive results on automatically acquiring such high-level, declarative constraints using machine learning techniques. In particular, we show that a new heuristic method for generating training examples together with a rule induction algorithm can learn useful control rules in a variety of domains. Only a small number of rules are needed to reduce solution times by two orders of magnitude or more on larger problems; training times are short; and the learned rules can be exported to other planning systems.
Learning domain-specific control knowledge from random walks
In Proceedings of the fourteenth international ..., 2004
Cited by 38 (4 self)
We describe and evaluate a system for learning domain-specific control knowledge. In particular, given a planning domain, the goal is to output a control policy that performs well on “long random walk” problem distributions. The system is based on viewing planning domains as very large Markov decision processes and then applying a recent variant of approximate policy iteration that is bootstrapped with a new technique based on random walks. We evaluate the system on the AIPS-2000 planning domains (among others) and show that often the learned policies perform well on problems drawn from the long-random-walk distribution. In addition, we show that these policies often perform well on the original problem distributions from the domains involved. Our evaluation also uncovers limitations of our current system that point to future challenges.
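A “long random walk” problem can be pictured as follows: start from an initial state, apply a number of uniformly chosen applicable actions, and use the state reached as the goal. This is a minimal sketch under that assumption, with a hypothetical one-dimensional domain (`applicable`, `successor`, and `long_random_walk_problem` are illustrative names). A planning-domain version would typically keep only the goal predicates of the final state rather than the whole state, and the system's exact construction may differ.

```python
import random


def long_random_walk_problem(initial_state, applicable, successor, length, rng):
    """Return (initial_state, goal), where goal is the state reached after
    up to `length` uniformly random applicable actions."""
    state = initial_state
    for _ in range(length):
        actions = applicable(state)
        if not actions:  # dead end: stop the walk early
            break
        state = successor(state, rng.choice(actions))
    return initial_state, state


# Hypothetical toy domain: a token on cells 0..9 that moves one cell at a time.
def applicable(cell):
    return [d for d in (-1, 1) if 0 <= cell + d <= 9]


def successor(cell, move):
    return cell + move


if __name__ == "__main__":
    rng = random.Random(0)
    for length in (2, 8, 32):
        print(length, long_random_walk_problem(5, applicable, successor, length, rng))
```

Longer walks tend to yield goals farther from the start, which is one way to grade problem difficulty when bootstrapping a learner.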
K.: Relational reinforcement learning: An overview
In: Proceedings of the ICML’04 Workshop on Relational Reinforcement Learning, 2004
Cited by 37 (3 self)
Relational Reinforcement Learning (RRL) is both a young and an old field. In this paper, we trace the history of the field to related disciplines, outline some current work and promising new directions, and survey the research issues and opportunities that lie ahead.
Learning Recursive Control Programs from Problem Solving
Journal of Machine Learning Research, 2006
Cited by 36 (12 self)
In this paper, we propose a new representation for physical control, teleoreactive logic programs, along with an interpreter that uses them to achieve goals. In addition, we present a new learning method that acquires recursive forms of these structures from traces of successful problem solving. We report ...
Learning Generalized Policies in Planning Using Concept Languages
2000
Cited by 34 (1 self)
In this paper we are concerned with the problem of learning how to solve planning problems in one domain given a number of solved instances. This problem is formulated as the problem of inferring a function that operates over all instances in the domain and maps states and goals into actions. We call such functions generalized policies and the question that we address is how to learn suitable representations of generalized policies from data. This question has been addressed recently by Roni Khardon [16]. Khardon represents generalized policies using an ordered list of existentially quantified rules that are inferred from a training set using a version of Rivest's learning algorithm [22]. Here, we follow Khardon's approach but represent generalized policies in a different way using a concept language. We show through a number of experiments in the blocksworld that the concept language yields a better policy using a smaller set of examples and no background knowle...