## Learning to Take Actions (1998)

Citations: 53 (8 self)

### BibTeX

@MISC{Khardon98learningto,

author = {Roni Khardon},

title = {Learning to Take Actions},

year = {1998}

}

### Abstract

We formalize a model for supervised learning of action strategies in dynamic stochastic domains and show that PAC-learning results on Occam algorithms hold in this model as well. We then identify a class of rule-based action strategies for which polynomial time learning is possible. The representation of strategies is a generalization of decision lists; strategies include rules with existentially quantified conditions, simple recursive predicates, and small internal state, but are syntactically restricted. We also study the learnability of hierarchically composed strategies where a subroutine already acquired can be used as a basic action in a higher level strategy. We prove some positive results in this setting, but also show that in some cases the hierarchical learning problem is computationally hard.

### Citations

11505 | Computers and Intractability: A Guide to the Theory of NP-Completeness. Freeman
- Garey, Johnson
- 1979
Citation Context ... 6.4 The problem H-PRS is NP-complete. Proof: First observe that the problem is in NP since we can guess a 1-PPRS and check whether it is consistent with E. We reduce the satisfiability problem 3SAT (Garey and Johnson, 1979) to H-PRS. We are given a 3-CNF expression, with m clauses, $f = c_1 c_2 \ldots c_m$, on n variables $x_1, \ldots, x_n$, and translate it into a set of m + 2n example runs, where we use 2n + 3 variables. ... |

2641 | Mining association rules between sets of items in large databases
- Agrawal, Imielinski, et al.
- 1993
Citation Context ...ntrols complexity by considering only rules that cover a non-negligible part of the examples. This pruning method can be performed using ideas developed recently for data mining of association rules (Agrawal, Imielinski, and Swami, 1993). Other techniques make use of structural information to improve the running time. A significant improvement can be gained in domains where object types are important. In such cases the type informat... |

1754 | A theory of the learnable
- Valiant
- 1984
Citation Context ...defined relative to its learning interface. This allows for relaxed definitions describing plausible scenarios that admit efficient solutions. Technically, our framework is based on the PAC approach (Valiant, 1984) of learning from examples but applied to problems where the agent has to act in the world and achieve goals, similar to what is done in the study of planning (Allen, Hendler, and Tate, 1990). The fo... |

1616 | Unified theories of cognition
- Newell
- 1990
Citation Context ...that can be learned. Finally, formal definitions are given. 4.1 Production Rule Systems Production rule systems (Anderson, 1983; Klahr, Langley, and Neches, 1986; Laird, Rosenbloom, and Newell, 1986; Newell, 1990) are composed of a collection of condition-action rules $C \to A$, where C is usually a conjunction (over some relevant predicates), and A is used to denote an action. Actions in PRS denote either a real... |

1540 | Intelligence without representation
- Brooks
- 1991
Citation Context ...tuitive notion of competence for artificial agents. Our approach is also reminiscent of Schoppers' (1987) universal plans, as well as some other works on "reactive agents" (Georgeff and Lansky, 1987; Brooks, 1991; Maes, 1991; Nilsson, 1994). A universal plan describes an algorithm for a particular domain so that the action in each situation is in some sense pre-compiled and can be taken instantly. Our action ... |

1405 | Reinforcement learning: A survey
- Kaelbling, Littman, et al.
- 1996
Citation Context ...structural domains akin to work in planning (Allen, Hendler, and Tate, 1990), and the world model is that of a randomized partially observable state machine similar to work in reinforcement learning (Kaelbling, Littman, and Moore, 1996). Rule based systems have been studied from various viewpoints and with several learning paradigms (Newell and Simon, 1972; Anderson, 1983; Laird, Rosenbloom, and Newell, 1986; Klahr, Langley, and Ne... |

1328 | Learning to predict by the method of temporal differences
- Sutton
- 1988
Citation Context ...ed to find an optimal strategy for acting in the world. Our formalization makes the task easier in both these aspects. Several interesting theoretical and empirical results have been obtained for RL (Sutton, 1988; Sutton, 1990; Watkins and Dayan, 1992; Kaelbling, 1993; Fiechter, 1994; Tesauro, 1995). In particular, the success of Tesauro's (1992; 1995) backgammon playing program is remarkable. However, the un... |

1204 | Human problem solving - Newell, Simon - 1972 |

1117 | The architecture of cognition
- Anderson
- 1983
Citation Context ...ilar to work in reinforcement learning (Kaelbling, Littman, and Moore, 1996). Rule based systems have been studied from various viewpoints and with several learning paradigms (Newell and Simon, 1972; Anderson, 1983; Laird, Rosenbloom, and Newell, 1986; Klahr, Langley, and Neches, 1986; Grefenstette, Ramsey, and Schultz, 1990; Baum, 1996) and are closely related to neural systems (see e.g. (Valiant, 1996)). Our ... |

879 | Learning Logical Definitions from Relations
- Quinlan
- 1990
Citation Context ...(Muggleton, 1994; Muggleton and De Raedt, 1994). However, the models differ in details that are crucial. One source of difference is the structure of examples. An example in the standard form of ILP (Quinlan, 1990; Dzeroski, Muggleton, and Russell, 1992; Muggleton, 1994; Mooney and Califf, 1995) includes a single ground instance of a relation and the rest of the information on this example is provided through ... |

687 | Learning quickly when irrelevant attributes abound: A new linear threshold algorithm - Littlestone - 1988 |

616 | An Introduction to Computational Learning Theory
- Kearns, Vazirani
- 1994
Citation Context ...ient solutions. We next discuss our model in light of related work emphasizing some differences and thus perhaps clarifying when it might be useful. From a learning theory perspective (Valiant, 1984; Kearns and Vazirani, 1994) our work extends the scope of problems studied into the domain of goal directed agents, acting in structural nonstationary domains. An important aspect of the current paper is the adoption of a "PAC... |

540 | Learning regular sets from queries and counterexamples - Angluin - 1987 |

510 | The complexity of theorem proving procedures
- Cook
- 1971
Citation Context ...artial order planning declarative information is given, and search in plan space is performed to find a plan (Weld, 1994). However, the problems involved in these approaches are computationally hard (Cook, 1971; Bylander, 1994). Recently, the approach has been generalized to handle stochastic domains, but as this is a generalization of the planning problem similar computational difficulties arise. Since the... |

498 | Integrated architectures for learning, planning and reacting based on approximating dynamic programming
- Sutton
- 1990
Citation Context ...optimal strategy for acting in the world. Our formalization makes the task easier in both these aspects. Several interesting theoretical and empirical results have been obtained for RL (Sutton, 1988; Sutton, 1990; Watkins and Dayan, 1992; Kaelbling, 1993; Fiechter, 1994; Tesauro, 1995). In particular, the success of Tesauro's (1992; 1995) backgammon playing program is remarkable. However, the unsupervised lea... |

487 | Inductive logic programming: theory and methods - Muggleton, De Raedt - 1994 |

479 | Reactive reasoning and planning
- Georgeff, Lansky
- 1987
Citation Context ...s intended to capture an intuitive notion of competence for artificial agents. Our approach is also reminiscent of Schoppers' (1987) universal plans, as well as some other works on "reactive agents" (Georgeff and Lansky, 1987; Brooks, 1991; Maes, 1991; Nilsson, 1994). A universal plan describes an algorithm for a particular domain so that the action in each situation is in some sense pre-compiled and can be taken instantl... |

414 | Temporal difference learning and TD-Gammon
- Tesauro
- 1995
Citation Context ...sk easier in both these aspects. Several interesting theoretical and empirical results have been obtained for RL (Sutton, 1988; Sutton, 1990; Watkins and Dayan, 1992; Kaelbling, 1993; Fiechter, 1994; Tesauro, 1995). In particular, the success of Tesauro's (1992; 1995) backgammon playing program is remarkable. However, the unsupervised learning problem being solved in RL is very general, and the formulation doe... |

392 | Learning Decision Lists - Rivest - 1987 |

349 | Programs with Common Sense
- McCarthy
- 1968
Citation Context ...ive, where knowledge about the world is encoded in declarative form. In order to achieve goals, one proves that they are true in some world state, and as a side effect derives a plan for these goals (McCarthy, 1958). 1 For ease of reference, we use the word teacher in a loose sense to refer to the example provider. The model is not intended to capture notions of teaching as such. Similarly, in partial order pla... |

349 | Universal plans for reactive robots in unpredictable environments - Schoppers - 1987 |

316 | Learning in Embedded Systems
- Kaelbling
- 1993
Citation Context ...d. Our formalization makes the task easier in both these aspects. Several interesting theoretical and empirical results have been obtained for RL (Sutton, 1988; Sutton, 1990; Watkins and Dayan, 1992; Kaelbling, 1993; Fiechter, 1994; Tesauro, 1995). In particular, the success of Tesauro's (1992; 1995) backgammon playing program is remarkable. However, the unsupervised learning problem being solved in RL is very g... |

314 | Quantitative results concerning the utility of explanation-based learning
- Minton
- 1990
Citation Context ...arch. Our work is also closely related to work in Explanation Based Learning (EBL) and speedup learning (Rosenbloom and Laird, 1986; DeJong and Mooney, 1986; Mitchell, Keller, and Kedar-Cabelli, 1986; Minton, 1990; Veloso et al., 1995). Generally speaking this line of work tries to compile declarative knowledge into a more procedural form via some form of learning. In EBL, solved problems are "explained", name... |

312 | Cryptographic limitations on learning Boolean formulae and finite automata
- Kearns, Valiant
- 1989
Citation Context ...al, then we will make at most twice the number of steps that are needed. 7 For example, it is sufficient that the language can compute n-wise iterated multiplication of numbers modulo some integer N (Kearns and Valiant, 1994; Kearns and Vazirani, 1994), to render it not learnable. VanLehn (1987) argues for similar restrictions on cognitive grounds in his "show-work" mode of learning. Repeat: Compute the support predicate... |

307 | The computational complexity of propositional STRIPS planning
- Bylander
- 1994
Citation Context ... planning declarative information is given, and search in plan space is performed to find a plan (Weld, 1994). However, the problems involved in these approaches are computationally hard (Cook, 1971; Bylander, 1994). Recently, the approach has been generalized to handle stochastic domains, but as this is a generalization of the planning problem similar computational difficulties arise. Since the planning proble... |

294 | Chunking in Soar: The Anatomy of a General Learning Mechanism
- Laird, Rosenbloom, et al.
- 1986
Citation Context ...reinforcement learning (Kaelbling, Littman, and Moore, 1996). Rule based systems have been studied from various viewpoints and with several learning paradigms (Newell and Simon, 1972; Anderson, 1983; Laird, Rosenbloom, and Newell, 1986; Klahr, Langley, and Neches, 1986; Grefenstette, Ramsey, and Schultz, 1990; Baum, 1996) and are closely related to neural systems (see e.g. (Valiant, 1996)). Our formalization combines these in a way... |

288 | An introduction to least commitment planning
- Weld
- 1994
Citation Context ...rovider. The model is not intended to capture notions of teaching as such. Similarly, in partial order planning declarative information is given, and search in plan space is performed to find a plan (Weld, 1994). However, the problems involved in these approaches are computationally hard (Cook, 1971; Bylander, 1994). Recently, the approach has been generalized to handle stochastic domains, but as this is a ... |

247 | Situated agents can have goals
- Maes
- 1990
Citation Context ... of competence for artificial agents. Our approach is also reminiscent of Schoppers' (1987) universal plans, as well as some other works on "reactive agents" (Georgeff and Lansky, 1987; Brooks, 1991; Maes, 1991; Nilsson, 1994). A universal plan describes an algorithm for a particular domain so that the action in each situation is in some sense pre-compiled and can be taken instantly. Our action strategies a... |

243 | Learning policies for partially observable environments: Scaling up
- Littman, Cassandra, et al.
- 1995
Citation Context ...r changes in the world; that is, in some sense the source of problems is stationary. Similar assumptions have been made in several works in reinforcement learning (e.g. (Sutton, 1990; Fiechter, 1994; Littman, Cassandra, and Kaelbling, 1995)). The quality of a strategy is the probability that a random run is successful. Formally, Definition 2.7 (quality of a strategy) The quality Q(s; D) of a strategy s, with respect to a world W, and ... |

220 | Classifier systems and genetic algorithms
- Booker, Goldberg, et al.
- 1990
Citation Context ...le based systems are particularly interesting on this account since several algorithms for using and learning such systems have already been studied. In particular, algorithms for classifier systems (Booker, Goldberg, and Holland, 1989) applied to problem of acting in a dynamic world (Grefenstette, Ramsey, and Schultz, 1990; Baum, 1996) are rule based and can therefore be used in combination with our algorithm in this manner. Recen... |

209 | Teleo-reactive programs for agent control
- Nilsson
- 1994
Citation Context ...ine is applied again for one time step. In this case hierarchical strategies are learnable under the same restrictions as above. This result is applicable to the hierarchical teleo-reactive programs (Nilsson, 1994). In particular the main part of the strategy is equivalent to such a program and therefore such programs are learnable if the number of quantified variables is bounded. In the second control structu... |

199 | Efficient distribution-free learning of probabilistic concepts
- Kearns, Schapire
- 1994
Citation Context ...machines may appear hard to learn at the outset. An internal state machine can make the impression that the output of the teacher is random. (cf. probabilistic concepts, and hidden variable problems (Kearns and Schapire, 1994).) However, since the number of states is small, we can get a learning result for this class. Lemma 5.10 The number of k-IS-IP-SD-PPRS action strategies is bounded by $M = m_2 \cdot m_3!$, where $m_0$ ... |

192 | Computational limitations on learning from examples
- Pitt, Valiant
- 1988
Citation Context ...However, as the following theorem shows, without this information the problem is hard. The following theorem shows that H-PRS is NP-complete; the hardness of learning follows from standard arguments (Pitt and Valiant, 1988; Haussler, 1989). Theorem 6.4 The problem H-PRS is NP-complete. Proof: First observe that the problem is in NP since we can guess a 1-PPRS and check whether it is consistent with E. We reduce the sat... |

157 | Learning by analogical reasoning in general problem solving
- Veloso
- 1992
Citation Context ...as in Corollary 5.6 was implemented. This algorithm is applied to small planning domains that have been studied before including a four-operator version of the blocks world, and the logistics domain (Veloso, 1992). The experiments demonstrate that our results are indeed applicable, that rule based strategies are useful for such domains, and that the algorithm is even robust to some extent to "noise" in the ex... |

148 | Macro-operators: A weak method for learning
- Korf
- 1985
Citation Context ...tochastic domains. These include standard results from learning theory, but more to the point the results in (Tadepalli, 1991; Tadepalli and Natarajan, 1996) showing the learnability of macro tables (Korf, 1985) and intersection closed strategies use Occam algorithms and therefore hold in the extended model. 4 Representation of Strategies We use a rule based representation of strategies motivated by work on... |

147 | Learning sequential decision rules using simulation models and competition
- Grefenstette, Ramsey, et al.
Citation Context ...d systems have been studied from various viewpoints and with several learning paradigms (Newell and Simon, 1972; Anderson, 1983; Laird, Rosenbloom, and Newell, 1986; Klahr, Langley, and Neches, 1986; Grefenstette, Ramsey, and Schultz, 1990; Baum, 1996) and are closely related to neural systems (see e.g. (Valiant, 1996)). Our formalization combines these in a way that allows for efficient solutions. We next discuss our model in light of... |

143 | Situated action: A symbolic interpretation
- Vera, Simon
- 1993
Citation Context ...", and its action A 5 The merits of explicit symbolic reasoning on the one hand, and reactive operation on the other, have been debated. (For example, see (Brooks, 1991; Hayes, Ford, and Agnew, 1994; Vera and Simon, 1993; Maes, 1991; Ginsberg, 1989; Chapman, 1989; Schoppers, 1989).) However, as recently argued (Vera and Simon, 1993; Hayes, Ford, and Agnew, 1994), neither approach can succeed on its own; ultimately a ... |

117 | Learning Disjunctions of Conjunctions
- Valiant
- 1985
Citation Context ...inlan, 1990), and the representation is similar to the first order decision lists studied by Mooney and Califf (1995). Moreover, our arguments are similar to the ones in (De Raedt and Dzeroski, 1994; Valiant, 1985) and can yield positive results on learning first order decision lists in the ILP context. On the other hand, several sophisticated methods for learning have been applied in ILP that may be useful in... |

114 | Learning to fly - Sammut, Hurst, et al. - 1992 |

86 | Circuits of the Mind
- Valiant
- 1994
Citation Context ...imilar to EBL have been studied in this framework (Anderson, 1983; Newell, 1990). A similar view on knowledge representation and dynamics of reasoning evolves from Valiant's study of neural circuits (Valiant, 1994). There, a procedural description of one item in terms of others is enforced by the structure of the system. Furthermore, computational considerations suggest the use of working memory, called an ima... |

79 | Universal planning: An (almost) universal bad idea
- Ginsberg
- 1989
Citation Context ...f explicit symbolic reasoning on the one hand, and reactive operation on the other, have been debated. (For example, see (Brooks, 1991; Hayes, Ford, and Agnew, 1994; Vera and Simon, 1993; Maes, 1991; Ginsberg, 1989; Chapman, 1989; Schoppers, 1989).) However, as recently argued (Vera and Simon, 1993; Hayes, Ford, and Agnew, 1994), neither approach can succeed on its own; ultimately a system must have some reacti... |

78 | Learning conjunctive concepts in structural domains
- Haussler
- 1989
Citation Context ...les, and the size of the internal state machine; thus when these parameters are fixed to small constants we may expect efficient learnability. We note that similar restrictions have been used before (Haussler, 1989). One reason for this is that if the number of variables is not restricted then even the binding problem, namely testing whether the condition of a rule holds on some input, is computationally hard. ... |

78 | The SOAR papers: research on integrated intelligence
- Rosenbloom, Laird, et al.
- 1993
Citation Context ...) includes a third component of declarative memory and the operation of the system is intuitively similar, though it differs a lot in details. In outline we will follow the Soar system (Newell, 1990; Rosenbloom, Laird, and Newell, 1993), though not in full detail. is executed. That is, either the actuator is operated, or the predicate mentioned as A is added to the working memory. The above cycle is repeated forever or until the go... |

73 | PAC-learnability of determinate logic programs
- Džeroski, Muggleton, et al.
- 1992
Citation Context ...4; Muggleton and De Raedt, 1994). However, the models differ in details that are crucial. One source of difference is the structure of examples. An example in the standard form of ILP (Quinlan, 1990; Dzeroski, Muggleton, and Russell, 1992; Muggleton, 1994; Mooney and Califf, 1995) includes a single ground instance of a relation and the rest of the information on this example is provided through the background knowledge. In contrast an... |

73 | Learning action strategies for planning domains
- Khardon
- 1999
Citation Context ...dentify a subset of rule based strategies that can be learned, and some limits to this learnability when strategies are hierarchically composed. An empirical evaluation of these ideas is reported in (Khardon, 1997) where the learning algorithm is shown to be useful for learning action strategies in small planning domains that have been studied before. We discuss these experiments briefly in the concluding sect... |

70 | Induction of first-order decision lists: Results on learning the past tense of English verbs
- Mooney, Califf
- 1995
Citation Context ...fer in details that are crucial. One source of difference is the structure of examples. An example in the standard form of ILP (Quinlan, 1990; Dzeroski, Muggleton, and Russell, 1992; Muggleton, 1994; Mooney and Califf, 1995) includes a single ground instance of a relation and the rest of the information on this example is provided through the background knowledge. In contrast an example in our model describes a complete... |

66 | Learning one subprocedure per lesson - VanLehn - 1987 |

66 | First order jk-clausal theories are PAC-learnable - De Raedt, Džeroski - 1994 |

60 | Q-Learning
- Watkins, Dayan
- 1992
Citation Context ...gy for acting in the world. Our formalization makes the task easier in both these aspects. Several interesting theoretical and empirical results have been obtained for RL (Sutton, 1988; Sutton, 1990; Watkins and Dayan, 1992; Kaelbling, 1993; Fiechter, 1994; Tesauro, 1995). In particular, the success of Tesauro's (1992; 1995) backgammon playing program is remarkable. However, the unsupervised learning problem being solve... |

57 | Learning to reason - Khardon, Roth - 1997 |