## Applications of the self-organising map to reinforcement learning (2002)

Venue: Neural Networks

Citations: 16 (0 self)

### BibTeX

@ARTICLE{Smith02applicationsof,

author = {Andrew James Smith},

title = {Applications of the self-organising map to reinforcement learning},

journal = {Neural Networks},

year = {2002},

volume = {15},

pages = {1107--1124}

}

### Abstract

Running Title: Applying the SOM to reinforcement learning

### Citations

5107 | Neural Networks for Pattern Recognition - Bishop - 1995 |

2874 | Learning internal representations by error propagation, Parallel Distributed Processing: explorations in the microstructure of cognition 1 - Rumelhart, Hinton, et al. - 1987 |

2773 | Dynamic Programming - Bellman - 1957 |

Citation Context: ...example, [Footnote 1] This technique can be used to perform linear or non-linear principal component analysis. See Bishop (1995)(pg 314) for a discussion. [Footnote 2] Later developing into the field of Dynamic Programming (Bellman, 1957). [Main text] Q-learning (see section 2) is an incremental, iterative, interactive algorithm with a simple update rule, implementation and representation. These com...

1373 | Self-Organization and Associative Memory - Kohonen - 1988 |

Citation Context: ...consider some of the model’s limitations including issues pertaining to scalability. [Section 3: The Self-Organising Map] The model proposed in this account is based on the ubiquitous Self-Organising Map (SOM) (Kohonen, 1987, 1995), and so this algorithm is now briefly reviewed. For a more detailed reminder of the model see Rojas (1996) and for a book of varied applications and analyses see Oja and Kaski (1999). The inte...

1365 | Learning from Delayed Rewards - Watkins - 1989 |

Citation Context: ...Bellman, 1957) which have at their heart a process of estimating the expected return following either each state or each state-action pair (Sutton and Barto, 1998). A popular technique is Q-learning (Watkins, 1989) which estimates the return: $E(r_t + \gamma r_{t+1} + \gamma^2 r_{t+2} + \cdots + \gamma^h r_{t+h}) \mid s, a, \pi$ (1) for every state-action pair, $(s,a)$, of the MDP. Here, $r_t$ is the reward received at time $t$ where $t$ is the time that act...

1356 | Reinforcement Learning: A Survey - Kaelbling, Littman, et al. - 1996 |

892 | Reinforcement Learning - Sutton, Barto - 1998 |

Citation Context: ... offers a suite of techniques based on dynamic programming (Bellman, 1957) which have at their heart a process of estimating the expected return following either each state or each state-action pair (Sutton and Barto, 1998). A popular technique is Q-learning (Watkins, 1989) which estimates the return: $E(r_t + \gamma r_{t+1} + \gamma^2 r_{t+2} + \cdots + \gamma^h r_{t+h}) \mid s, a, \pi$ (1) for every state-action pair, $(s,a)$, of the MDP. Here, $r_t$ is the rew...

377 | Practical issues in temporal difference learning - Tesauro - 1992 |

296 | On-line Q-learning using connectionist systems - Rummery, Niranjan - 1994 |

Citation Context: ...This paper presents a new model which adapts Q-learning (footnote 3) so that not only is an optimal ... [Footnote 3] We note that there are other approaches to estimating equation (1) for each state-action pair such as SARSA (Rummery and Niranjan, 1994) for example, or Monte Carlo techniques (Sutton and Barto, 1998)[chapter 5], and in fact for the purposes of this paper any of these techniques could be substituted for Q-learning without loss of gen...

288 | Improving elevator performance using reinforcement learning - Crites, Barto - 1996 |

281 | A New Approach to Manipulator Control: The Cerebellar Model Articulation Controller (CMAC), Trans. ASME - Albus - 1975 |

Citation Context: ...although the model may be extendible to the more general case of real-valued, delayed reward. Other noteworthy approaches to RL generalisation include the work of Prescott (1994) which uses the CMAC (Albus, 1975) approach to decomposing the state-space and then linear output units to generate a Normally distributed action (similar to Williams (1988), Gullapalli (1990) and Rummery (1995)), and also the coarse...

252 | Temporal Credit Assignment in Reinforcement Learning - Sutton - 1984 |

Citation Context: ...man (1990) (and its extension proposed in Ziemke (1996)), the Q-AHC algorithm of Rummery (1995)[chapter 5] based on the Adaptive Heuristic Critic (AHC) and actor-critic models of (Barto et al., 1983; Sutton, 1984), as well as the approach of Prescott (1994)[chapter 6] based on the work of Albus (1975) and Williams (1988). A key motivation for studying continuous action spaces as opposed to fixing discrete act...

239 | TD-Gammon, a self-teaching backgammon program, achieves master level play - Tesauro - 1994 |

195 | Neural Networks: a systematic introduction - Rojas - 1996 |

193 | Reinforcement Learning for Robots Using Neural Networks - Lin - 1993 |

182 | Neuronlike elements that can solve difficult learning control problems - Barto, Sutton, et al. - 1983 |

Citation Context: ...m of Ackley and Littman (1990) (and its extension proposed in Ziemke (1996)), the Q-AHC algorithm of Rummery (1995)[chapter 5] based on the Adaptive Heuristic Critic (AHC) and actor-critic models of (Barto et al., 1983; Sutton, 1984), as well as the approach of Prescott (1994)[chapter 6] based on the work of Albus (1975) and Williams (1988). A key motivation for studying continuous action spaces as opposed to fixin...

174 | An Analogue Approach to the Travelling Salesman Problem Using an Elastic - Durbin, Willshaw - 1987 |

Citation Context: ...the MLP, may be favoured since the response of these models to the curse of dimensionality is known to be more robust than local representation models such as radial basis functions, the elastic net (Durbin and Willshaw, 1987), and the SOM for example. This intuition is partly borne out in the comparison of the proposed model with MLP based generalisation (based on the SRV units of Gullapalli (1990), and the Q-AHC model o...

143 | Self-Organising Maps - Kohonen - 1995 |

132 | Animal intelligence - Thorndike - 1911 |

100 | Experiments with Reinforcement Learning in Problems with Continuous State and Action Spaces. Adaptive Behavior 6(2):163 - Santamaria, Sutton, et al. - 1997 |

65 | A stochastic reinforcement learning algorithm for learning real-valued functions - Gullapalli - 1990 |

48 | Problem solving with reinforcement learning - Rummery - 1995 |

41 | Toward a theory of reinforcement-learning connectionist systems - Williams - 1988 |

38 | Generalization and scaling in reinforcement learning - Ackley, Littman - 1990 |

30 | Automatic programming of behaviour-based robots using reinforcement learning - Mahadevan, Connell - 1992 |

Citation Context: ...ome form of generalisation is required. There have been many different approaches to generalisation of large or continuous state spaces in RL problems including hand decomposition of the state space (Mahadevan and Connell, 1991), tile-coding techniques such as CMACS (Albus (1975), see also Prescott (1994)), coarse coding (see Santamaria et al. (1997) for a review), the use of the SOM (Sehad and Touzet (1994); Touzet (1997);...

24 | Neural reinforcement learning for behaviour synthesis, Robot - Touzet - 1997 |

22 | A study of the application of Kohonen-type neural networks to the travelling salesman problem - Favata, Walker - 1991 |

7 | Explorations in reinforcement and model-based learning - Prescott - 1994 |

6 | Dynamic Generalisation of Continuous Action Spaces in Reinforcement Learning: A Neurally Inspired Approach - Smith - 2001 |

Citation Context: ...inuous reward functions and non-contiguous state-action mappings (in terms of SOM topology). [Footnote 5] First proposed in Ritter et al. (1990), although the account is taken from Wedel and Polani (1996). See (Smith, 2001b,a) for a full description, and Wedel and Polani (1996) for an extended application. [Footnote 6] The reward function here is C(1) discontinuous because a small transition in input space can lead to a different...

3 | Critic-based learning of actions with self-organising feature maps - Wedel, Polani - 1996 |

2 | Extending Kohonen’s Self-Organising Mapping Algorithm to Learn Ballistic Movements - Ritter, Schulten - 1987 |

Citation Context: ...oaches is therefore questionable. The SOM has also been used for continuous action space RL. Of significant note is the work of Wedel and Polani (1996) which presents an extension to the Motoric Map (Ritter and Schulten, 1987). Their idea is to map the input space with a SOM, and then attach a single action to each unit which must learn an appropriate action for that state. Their innovation, which they call covariance lea...

2 | Self-organising map for reinforcement learning: Obstacle avoidance with Khepera - Sehad, Touzet - 1994 |

1 | Applying the SOM to reinforcement learning 34 - Peng, Williams - 1996 |

Citation Context: ...ld. But it has already been established that this problem is easily solved using general RL techniques in which there are just two discrete actions corresponding to a right and left push of the cart (Peng and Williams, 1996), and so the usefulness of adaptable real-valued actions in this context is again questionable. If latency is introduced into the system so that the agent can only sample the state of the environment...

1 | Neuronale Netze - Ritter, Martinetz, et al. - 1990 |

1 | Dynamic actions in reinforcement learning. http://www.dai.ed.ac.uk/homes/andys/PAPERS/papers.html - Smith - 2001 |

Citation Context: ...inuous reward functions and non-contiguous state-action mappings (in terms of SOM topology). [Footnote 5] First proposed in Ritter et al. (1990), although the account is taken from Wedel and Polani (1996). See (Smith, 2001b,a) for a full description, and Wedel and Polani (1996) for an extended application. [Footnote 6] The reward function here is C(1) discontinuous because a small transition in input space can lead to a different...

1 | Applying the SOM to reinforcement learning 35 - Tani, Fukumura - 1994 |
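
The citation contexts for Watkins (1989) and Sutton and Barto (1998) quote equation (1), the finite-horizon discounted return that Q-learning estimates for each state-action pair. The following is a minimal Python sketch of that quantity and of one tabular update step, offered as an illustration only and not as code from the paper. The function names `discounted_return` and `q_update`, the dictionary-based Q-table, and the default `alpha` and `gamma` values are assumptions made for this example. The update shown is the standard one-step tabular Q-learning rule, not the SOM-based model the paper proposes.

```python
def discounted_return(rewards, gamma):
    """Sum r_t + gamma*r_{t+1} + ... + gamma^h * r_{t+h} for one observed
    reward sequence (a single sample of the quantity inside E(.) in eq. (1))."""
    total = 0.0
    # Work backwards so each step needs one multiply and one add.
    for r in reversed(rewards):
        total = r + gamma * total
    return total


def q_update(q, state, action, reward, next_state, actions,
             alpha=0.1, gamma=0.9):
    """Apply one standard tabular Q-learning step in place.

    `q` is a hypothetical dict keyed by (state, action); missing entries
    are treated as 0.0. Returns the updated estimate.
    """
    best_next = max(q.get((next_state, a), 0.0) for a in actions)
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
    return q[(state, action)]


if __name__ == "__main__":
    print(discounted_return([1.0, 1.0, 1.0], gamma=0.5))
    table = {}
    print(q_update(table, "s0", "left", 1.0, "s1", ["left", "right"]))
```

The dictionary keeps the sketch independent of any particular state or action encoding; a continuous state or action space, which is the setting the cited paper addresses, would need a function approximator in place of the table.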