## Finding Structure in Reinforcement Learning (1995)

### Download Links

- ktm.ius.cs.cmu.edu
- www-2.cs.cmu.edu
- www.ri.cmu.edu
- DBLP

### Other Repositories/Bibliography

Venue: Advances in Neural Information Processing Systems 7

Citations: 104 (4 self)

### BibTeX

```bibtex
@INPROCEEDINGS{Thrun95findingstructure,
  author    = {Sebastian Thrun and Anton Schwartz},
  title     = {Finding Structure in Reinforcement Learning},
  booktitle = {Advances in Neural Information Processing Systems 7},
  year      = {1995},
  pages     = {385--392},
  publisher = {MIT Press}
}
```

### Abstract

Reinforcement learning addresses the problem of learning to select actions in order to maximize one's performance in unknown environments. To scale reinforcement learning to complex real-world tasks, such as those typically studied in AI, one must ultimately be able to discover the structure in the world, in order to abstract away the myriad of details and to operate in more tractable problem spaces. This paper presents the SKILLS algorithm. SKILLS discovers skills, which are partially defined action policies that arise in the context of multiple, related tasks. Skills collapse whole action sequences into single operators. They are learned by minimizing the compactness of action policies, using a description length argument on their representation. Empirical results in simple grid navigation tasks illustrate the successful discovery of structure in reinforcement learning.

1 Introduction. Reinforcement learning comprises a family of incremental planning algorithms that construct reactive con...
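The description-length idea in the abstract can be illustrated with a toy calculation: representing several related task policies is cheaper when a shared partial policy (a skill) covers a common region of the state space. Below is a minimal sketch; the per-entry costs, the assumption that every task overlaps the skill region, and all names and numbers are hypothetical, not taken from the paper.

```python
# Toy description-length comparison (all costs are hypothetical):
# each stored policy entry costs 1 unit; a skill is a partial policy
# stored once and shared, plus a 1-unit reference per task that uses it.

def description_length(num_tasks, states_per_task, skill_states, tasks_using_skill):
    """Return (cost without skills, cost with one shared skill).

    Without a skill, each task stores an action for every state.
    With a skill, the shared states are stored once, and each
    participating task pays only a small reference cost
    (assumes every task's policy overlaps the skill region).
    """
    no_skill = num_tasks * states_per_task
    with_skill = (
        skill_states                                     # skill stored once
        + tasks_using_skill                              # one reference per task
        + num_tasks * (states_per_task - skill_states)   # remaining per-task states
    )
    return no_skill, with_skill

no_skill, with_skill = description_length(
    num_tasks=4, states_per_task=100, skill_states=30, tasks_using_skill=4)
print(no_skill, with_skill)  # 400 314 -- the shared skill compresses the policies
```

The gap between the two costs grows with the number of tasks sharing the skill, which is the intuition behind learning skills across multiple related tasks.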

### Citations

1408 | Learning from Delayed Rewards - Watkins - 1989

Citation context: ... Therefore, in order to learn an optimal π, one has to solve a temporal credit assignment problem [11]. To date, the single most widely used algorithm for learning from delayed pay-off is Q-Learning [14]. Q-Learning solves the problem of learning π by learning a value function, denoted by Q : S × A → ℝ. Q maps states s ∈ S and actions a ∈ A to scalar values. After learning, Q(s, a) ranks ...
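The tabular Q-Learning rule referenced in this context can be sketched as follows; the grid-world states, action set, and hyperparameter values are illustrative assumptions, not taken from the paper.

```python
import random
from collections import defaultdict

# Tabular Q-Learning sketch (hyperparameters are illustrative).
ALPHA, GAMMA, EPSILON = 0.1, 0.95, 0.1
ACTIONS = ["up", "down", "left", "right"]
Q = defaultdict(float)  # maps (state, action) -> scalar value, default 0.0

def update(state, action, reward, next_state):
    """One Q-Learning backup: move Q(s, a) toward r + gamma * max_a' Q(s', a')."""
    best_next = max(Q[(next_state, a)] for a in ACTIONS)
    Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])

def choose_action(state):
    """Epsilon-greedy exploration over the learned values."""
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])
```

After enough exploration, Q(s, a) ranks actions by expected discounted pay-off, which is exactly the role the context above describes.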

563 | Learning to act using real-time dynamic programming - Barto, Bradtke, et al. - 1995

Citation context: ...gh experimentation, to act so as to maximize one's pay-off in an unknown environment. Throughout this paper we will assume that the environment of the learner is a partially controllable Markov chain [1]. At any instant in time the learner can observe the state of the environment, denoted by s ∈ S, and apply an action, a ∈ A. Actions change the state of the environment, and also produce a scalar pay-...

384 | Practical issues in temporal difference learning - Tesauro - 1992

Citation context: ... of Q. Recently, various researchers have applied reinforcement learning in combination with generalizing function approximators, such as nearest neighbor methods or artificial neural networks (e.g., [2, 4, 12, 13]). In order to apply the SKILLS algorithm together with generalizing function approximators, the notions of skill domains and description length have to be modified. For example, the membership functi...

273 | Reward, motivation, and reinforcement learning - Dayan, Balleine - 2002

Citation context: ...which are either known in advance [15] or determined at random [6], or based on a pyramid of different levels of perceptual resolution, which produces a whole spectrum of problem solving capabilities [3]. For all these approaches, drastically improved problem solving capabilities have been reported, which are far beyond that of plain, unstructured reinforcement learning. This paper exclusively focuse...

268 | Generalization in reinforcement learning: Safely approximating the value function - Boyan, Moore - 1994

Citation context: ... of Q. Recently, various researchers have applied reinforcement learning in combination with generalizing function approximators, such as nearest neighbor methods or artificial neural networks (e.g., [2, 4, 12, 13]). In order to apply the SKILLS algorithm together with generalizing function approximators, the notions of skill domains and description length have to be modified. For example, the membership functi...

257 | Temporal Credit Assignment in Reinforcement Learning - Sutton - 1984

Citation context: ...oner in time, and r_t refers to the expected pay-off at time t. In general, pay-off might be delayed. Therefore, in order to learn an optimal π, one has to solve a temporal credit assignment problem [11]. To date, the single most widely used algorithm for learning from delayed pay-off is Q-Learning [14]. Q-Learning solves the problem of learning π by learning a value function, denoted by Q : S × A...

214 | On the convergence of stochastic iterative dynamic programming algorithms - Jaakkola, Jordan, et al. - 1994

Citation context: ...has been shown¹ to converge to a value function Q_opt(s, a) which measures the future discounted pay-off one can expect to receive upon applying action a in state s, and acting optimally thereafter [5, 14]. The greedy policy π(s) = argmax_a Q_opt(s, a) maximizes R. (¹ under certain conditions concerning the exploration scheme, the environment and the learning rate.) 3 Skills. Suppose the learner faces a ...
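The greedy policy π(s) = argmax_a Q_opt(s, a) mentioned in this context can be read directly off a learned Q-table. A minimal sketch follows; the two-state table and its values are made up for illustration.

```python
def greedy_policy(Q, states, actions):
    """For each state, pick the action with the highest Q-value (the greedy policy)."""
    return {s: max(actions, key=lambda a: Q[(s, a)]) for s in states}

# Hypothetical Q-table over two states and two actions.
Q = {("s0", "left"): 0.2, ("s0", "right"): 0.8,
     ("s1", "left"): 0.5, ("s1", "right"): 0.1}

policy = greedy_policy(Q, ["s0", "s1"], ["left", "right"])
print(policy)  # {'s0': 'right', 's1': 'left'}
```

If Q has converged to Q_opt, acting on this policy maximizes the discounted return R, which is the convergence result the cited work establishes.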

166 | Transfer of learning by composing solutions of elemental sequential tasks. Machine Learning 8:323–339 - Singh - 1992

Citation context: ... learning environments, skills can be used to transfer knowledge from previously learned tasks to new tasks. In particular, if the learner faces tasks with increasing complexity, as proposed by Singh [10], learning skills could conceivably reduce the learning time in complex tasks, and hence scale reinforcement learning techniques to more complex tasks. Using function approximators. In this paper, per...

101 | Hierarchical learning in stochastic domains: Preliminary results - Kaelbling - 1993

Citation context: ...mple, abstractions have been built upon previously learned, simpler tasks [9, 10], previously learned low-level behaviors [7], subgoals, which are either known in advance [15] or determined at random [6], or based on a pyramid of different levels of perceptual resolution, which produces a whole spectrum of problem solving capabilities [3]. For all these approaches, drastically improved problem solvin...

72 | Issues in Using Function Approximation for Reinforcement Learning - Thrun, Schwartz - 1993

Citation context: ... of Q. Recently, various researchers have applied reinforcement learning in combination with generalizing function approximators, such as nearest neighbor methods or artificial neural networks (e.g., [2, 4, 12, 13]). In order to apply the SKILLS algorithm together with generalizing function approximators, the notions of skill domains and description length have to be modified. For example, the membership functi...

66 | Reinforcement learning with a hierarchy of abstract models - Singh - 1992

Citation context: ...l skills is up to an order of magnitude larger than the time it takes to find close-to-optimal policies. Figure 2: Skill found in a more complex grid navigation task. Similar findings are reported in [9]. This is because discovering skills is much harder than learning control. Initially, nothing is known about the structure of the state space, and unless reasonably accurate Q-tables are available, SKI...

44 | Acquiring robot skills via reinforcement learning - Gullapalli, Franklin, et al. - 1994

33 | Learning multiple goal behavior via task decomposition and dynamic policy merging - Whitehead, Karlsson, et al. - 1993

Citation context: ...orated into learning. For example, abstractions have been built upon previously learned, simpler tasks [9, 10], previously learned low-level behaviors [7], subgoals, which are either known in advance [15] or determined at random [6], or based on a pyramid of different levels of perceptual resolution, which produces a whole spectrum of problem solving capabilities [3]. For all these approaches, drastic...

12 | Two methods for hierarchy learning in reinforcement environments - Ring - 1993

1 | Self-supervised Learning by Reinforcement and Artificial Neural Networks - Lin - 1992

Citation context: ...igin of the abstraction, and the way it is incorporated into learning. For example, abstractions have been built upon previously learned, simpler tasks [9, 10], previously learned low-level behaviors [7], subgoals, which are either known in advance [15] or determined at random [6], or based on a pyramid of different levels of perceptual resolution, which produces a whole spectrum of problem solving c...
