## DEA: An Architecture for Goal Planning and Classification (2000)

Venue: | Neural Computation |

Citations: | 4 - 0 self |

### Citations

3938 | Dynamic Programming - Bellman - 1957 |
Citation Context ...t. We define $t_0$ as the time of the first activation modification: $t_0 = \min\{t \ge 0;\ c(t) \neq c(0)\}$. If we make a Markovian assumption on $c_t$, we obtain the following standard equilibrium equation (Bellman 1957, Bertsekas & Tsitsiklis 1989):

$$\begin{cases} d_1 = 1 \\ \forall i \neq 1,\quad d_i = \max_{\alpha} \sum_{j} P_{s(i)=\alpha}\left(t_0 \to \infty,\ a_j(t_0) = 1 \mid a_i(0) = 1\right) d_j \end{cases} \tag{3}$$

To estimate those probabilities we make the assumption that for an...

1714 | Reinforcement learning: A survey - Kaelbling, Littman, et al. - 1996 |

1314 | Receptive fields, binocular interaction and functional architecture in the cat’s visual cortex. - Hubel, Wiesel - 1962 |

1043 | Soar: An architecture for general intelligence. - Laird, Newell, et al. - 1987 |

563 | Integrated architectures for learning, planning, and reacting based on approximating dynamic programming - Sutton - 1990 |
Citation Context ...ory (Howard 1960, Puterman 1994). They focus on the necessity to efficiently estimate the parameters of the model, given a decomposition of it into a set of situations (Barto, Sutton & Watkins 1989, Sutton 1990, Moore & Atkeson 1993, Kaelbling 1993). Some algorithms using MDP strategies repeatedly divide the situations to obtain a description with an appropriate resolution (Chapman & Kaelbling 1991, McCallu...

378 | Prioritized sweeping: Reinforcement learning with less data and less time - Moore, Atkeson - 1993 |

351 | Learning in embedded systems. - Kaelbling - 1993 |
Citation Context ...They focus on the necessity to efficiently estimate the parameters of the model, given a decomposition of it into a set of situations (Barto, Sutton & Watkins 1989, Sutton 1990, Moore & Atkeson 1993, Kaelbling 1993). Some algorithms using MDP strategies repeatedly divide the situations to obtain a description with an appropriate resolution (Chapman & Kaelbling 1991, McCallum 1996b, Moore & Atkeson 1995). We des...

322 | Reinforcement learning with selective perception and hidden state. - McCallum - 1996 |
Citation Context ...the net can have an arbitrarily high failure rate even if it determines almost all the necessary actions along the trajectory to the goal. We compare DEA to the U-Tree algorithm developed by A. McCallum (McCallum 1996b). The DEA algorithm applies splits one after another, whereas U-Tree simultaneously applies all splits considered useful. Thus we also tested a modified version of U-Tree we call U-Tree* which appl...

255 | The parti-game algorithm for variable resolution reinforcement learning in multidimensional state-spaces - Moore, Atkeson - 1995 |

249 | Classifier systems and genetic algorithms. - Booker, Goldberg, et al. - 1989 |

205 | Learning and Sequential Decision Making - Barto, Sutton, et al. - 1989 |

153 | Input generalization in delayed reinforcement learning: An algorithm and performance comparisons. IJCAI-91 - Chapman, Kaelbling - 1991 |
Citation Context ...tkins 1989, Sutton 1990, Moore & Atkeson 1993, Kaelbling 1993). Some algorithms using MDP strategies repeatedly divide the situations to obtain a description with an appropriate resolution (Chapman & Kaelbling 1991, McCallum 1996b, Moore & Atkeson 1995). We describe in this article the Differential Efficiency Algorithm, used to construct a planning network. This work is inspired by the biological model of the c...

120 | Classification and Regression Trees, Wadsworth Statistics/Probability, Chapman and Hall/CRC - Breiman, Friedman, et al. - 1984 |

116 | Control of selective perception using Bayes nets and decision theory - Rimey, Brown - 1994 |

88 | Learning to Use Selective Attention and Short-Term Memory in Sequential Tasks. - McCallum - 1996 |
Citation Context ...on 1990, Moore & Atkeson 1993, Kaelbling 1993). Some algorithms using MDP strategies repeatedly divide the situations to obtain a description with an appropriate resolution (Chapman & Kaelbling 1991, McCallum 1996b, Moore & Atkeson 1995). We describe in this article the Differential Efficiency Algorithm, used to construct a planning network. This work is inspired by the biological model of the cortex proposed ...

64 | An Adaptive Neural Network: The Cerebral Cortex - Burnod - 1988 |
Citation Context ...Atkeson 1995). We describe in this article the Differential Efficiency Algorithm, used to construct a planning network. This work is inspired by the biological model of the cortex proposed by Burnod (Burnod 1989). Formally, we show that, despite the connectionist motivation, this model can be reduced to a standard MDP model, applied to a set of situations built iteratively. We show also that on particular p...

63 | The Effect of Representation and Knowledge on Goal-Directed Exploration with Reinforcement-Learning Algorithms. - Koenig, Simmons - 1996 |
Citation Context ...red by such an exploration, even in a deterministic universe, can grow exponentially with the number of potential situations (a typical example of such a universe is the "Reset state space" (Koenig & Simmons 1996)). This problem can be partially solved by the restriction of this random exploration to a biased one (Kaelbling, Littman & Moore 1996). The usage of such "reflex" actions during the learning period ...

46 | A hierarchical neuronal network for planning behavior - Dehaene, Changeux - 1997 |

7 | Neural network models of cortical functions based on the computational properties of the cerebral cortex - Guigon, Grandguillaume, et al. - 1994 |

1 | SLUG: A connectionist architecture for inferring the structure of finite-state environments - Mozer - 1991 |

1 | Six Etudes de Psychologie, Editions Denoel - Piaget - 1964 |
Citation Context ...n to a biased one (Kaelbling, Littman & Moore 1996). The usage of such "reflex" actions during the learning period is justified to some extent by the existence of such behavior in mammals and humans (Piaget 1964). When this random step is over, one can estimate the strengths of the interunit connections by considering the record of the successive states of the universe. Using those estimations, the learning ...

1 | Neural networks and related methods for classification - Ripley - 1994 |
Citation Context ...d can be decomposed into two main families of algorithms. The parametric models are designed for specific problems, while the nonparametric ones, like nearest neighbors, neural nets or decision trees (Ripley 1994), can be used to solve problems which have no a priori model. The latter seem to be closer to the natural capabilities of animal brains. The planning problem has been studied from various points of vi...