## Value function approximation on nonlinear manifolds for robot motor control (2007)

Venue: | Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), 2007 |

Citations: | 6 (1 self) |

### BibTeX

```bibtex
@INPROCEEDINGS{Sugiyama07valuefunction,
  author    = {Masashi Sugiyama and Hirotaka Hachiya and Christopher Towell and Sethu Vijayakumar},
  title     = {Value function approximation on nonlinear manifolds for robot motor control},
  booktitle = {Proceedings of the IEEE International Conference on Robotics and Automation (ICRA)},
  year      = {2007}
}
```


### Abstract

The least squares approach works efficiently in value function approximation, given appropriate basis functions. Because of its smoothness, the Gaussian kernel is a popular and useful choice as a basis function. However, it does not allow for the discontinuity which typically arises in real-world reinforcement learning tasks. In this paper, we propose a new basis function based on geodesic Gaussian kernels, which exploits the non-linear manifold structure induced by the Markov decision processes. The usefulness of the proposed method is successfully demonstrated in a simulated robot arm control and Khepera robot navigation.

### Citations

9946 | Statistical Learning Theory
- Vapnik
- 1998
Citation Context: ...of a natural ordering of basis functions. In the machine learning community, Gaussian kernels seem to be more popular than Fourier functions or wavelets because of their locality and smoothness [3], [9], [10]. Furthermore, Gaussian kernels have ‘centers’, which alleviates the difficulty of basis subset choice, e.g., uniform allocation [2] or sample-dependent allocation [11]. In this paper, we theref...

4165 | Reinforcement Learning - An Introduction
- Sutton, Barto
- 1998
Citation Context: ...TRODUCTION Value function approximation is an essential ingredient of reinforcement learning (RL), especially in the context of solving Markov Decision Processes (MDPs) using policy iteration methods [1]. In problems with large discrete state spaces or continuous state spaces, it becomes necessary to use function approximation methods to represent the value functions. A least squares approach using a ...

3732 | Self-organizing maps
- Kohonen
- 1997
Citation Context: ...ents in a fixed environment with several obstacles (see Fig.9(a)). Then we constructed a graph from the gathered samples by discretizing the continuous state space using the Self-Organizing Map (SOM) [16]. The number of nodes (states) in the graph is set to 696 (equivalent to a SOM map size of 24 × 29); this value is computed by the standard rule-of-thumb formula 5√n [17], where n is the number ...
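The 5√n rule of thumb quoted in this excerpt is easy to state in code (a minimal sketch; the function name is hypothetical):

```python
import math

def som_map_units(n_samples: int) -> int:
    """Rule-of-thumb number of SOM units: 5 * sqrt(n),
    where n is the number of training samples."""
    return round(5 * math.sqrt(n_samples))

# e.g. 10,000 samples suggest a map of about 500 units
units = som_map_units(10000)  # -> 500
```

The resulting unit count is then factored into a roughly square map grid (24 × 29 = 696 in the excerpt above).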

2238 | Learning with Kernels
- Schölkopf, Smola
- 2002
Citation Context: ...natural ordering of basis functions. In the machine learning community, Gaussian kernels seem to be more popular than Fourier functions or wavelets because of their locality and smoothness [3], [9], [10]. Furthermore, Gaussian kernels have ‘centers’, which alleviates the difficulty of basis subset choice, e.g., uniform allocation [2] or sample-dependent allocation [11]. In this paper, we therefore de...

1942 | Ten Lectures on Wavelets
- Daubechies
- 1992
Citation Context: ...sing a linear combination of predetermined under-complete basis functions has been shown to be promising in this task [2]. Fourier functions (trigonometric polynomials), Gaussian kernels [3], and wavelets [4] are popular basis function choices for general function approximation problems. Both Fourier bases (global functions) and Gaussian kernels (localized functions) have certain smoothness properties tha...

1612 | A note on two problems in connexion with graphs
- Dijkstra
Citation Context: ...pproximation. Our definition of Gaussian kernels on graphs employs the shortest paths between states rather than the Euclidean distance, which can be computed efficiently using the Dijkstra algorithm [12], [13]. Moreover, an effective use of Gaussian kernels opens up the possibility to exploit the recent advances in using Gaussian processes for temporal difference learning [11]. When basis functions d...

1047 | Spectral Graph Theory
- Chung
- 1997
Citation Context: ...Eq.(7), where SP(s, s′) denotes the shortest path from state s to state s′. The shortest path on a graph can be interpreted as a discrete approximation to the geodesic distance on a nonlinear manifold [6]. For this reason, we call Eq.(7) a geodesic Gaussian kernel (GGK). Shortest paths on graphs can be efficiently computed using the Dijkstra algorithm [12]. With its naive implementation, computational...
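The kernel described in this excerpt, a Gaussian of the shortest-path distance SP(s, s′), can be sketched as follows. This is a minimal illustration under that reading, with hypothetical function names; the standard-library binary heap stands in for the Fibonacci heap of [13]:

```python
import heapq
import math

def dijkstra(adj, source):
    """Single-source shortest paths on a weighted graph.
    adj: {node: [(neighbor, edge_cost), ...]}"""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, math.inf):
            continue  # stale queue entry, already settled via a shorter path
        for v, w in adj[u]:
            nd = d + w
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

def geodesic_gaussian_kernel(adj, center, sigma):
    """K(s, c) = exp(-SP(s, c)^2 / (2 sigma^2)) for every reachable state s,
    with SP computed by Dijkstra rather than Euclidean distance."""
    sp = dijkstra(adj, center)
    return {s: math.exp(-d * d / (2.0 * sigma ** 2)) for s, d in sp.items()}
```

On a three-node chain 0–1–2 with unit edge costs, the kernel centered at node 0 decays with the path length: 1, exp(−1/2), exp(−2).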

591 | Fibonacci heaps and their uses in improved network optimization algorithms
- Fredman, Tarjan
Citation Context: ...mation. Our definition of Gaussian kernels on graphs employs the shortest paths between states rather than the Euclidean distance, which can be computed efficiently using the Dijkstra algorithm [12], [13]. Moreover, an effective use of Gaussian kernels opens up the possibility to exploit the recent advances in using Gaussian processes for temporal difference learning [11]. When basis functions defined...

332 | Regularization theory and neural networks architectures
- Girosi, Jones, et al.
- 1995

325 | Least-squares policy iteration
- Lagoudakis, Parr
- 2003
Citation Context: ...tion approximation methods to represent the value functions. A least squares approach using a linear combination of predetermined under-complete basis functions has been shown to be promising in this task [2]. Fourier functions (trigonometric polynomials), Gaussian kernels [3], and wavelets [4] are popular basis function choices for general function approximation problems. Both Fourier bases (global funct...
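The least-squares scheme described in this excerpt, fitting a value function as a linear combination of fixed basis functions, can be sketched via the normal equations. This is a minimal pure-Python illustration with hypothetical names, not the full LSPI algorithm of [2], which also wraps a policy-iteration loop around the fit:

```python
def least_squares_fit(phi, targets):
    """Fit weights w minimizing sum_j (phi[j] . w - targets[j])^2
    via the normal equations (Phi^T Phi) w = Phi^T t,
    solved by Gaussian elimination with partial pivoting.
    phi: list of feature vectors phi(s_j); targets: list of sampled values."""
    k = len(phi[0])
    A = [[sum(p[i] * p[j] for p in phi) for j in range(k)] for i in range(k)]
    b = [sum(p[i] * t for p, t in zip(phi, targets)) for i in range(k)]
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, k):
            f = A[r][col] / A[col][col]
            for c in range(col, k):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    w = [0.0] * k
    for i in reversed(range(k)):
        w[i] = (b[i] - sum(A[i][j] * w[j] for j in range(i + 1, k))) / A[i][i]
    return w
```

With features [1, s] and noiseless targets from V(s) = 1 + 2s, the fit recovers w = [1, 2]; in practice the columns of phi would be the chosen basis functions (Gaussian kernels, wavelets, etc.) evaluated at sampled states.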

112 | Computing the shortest path: A* search meets graph theory
- Goldberg, Harrelson
- 2005
Citation Context: ...lly sparse (i.e., ℓ ≪ n²), using the Fibonacci heap provides significant computational gains. Furthermore, there exist various approximation algorithms which are computationally very efficient (see [15] and references therein). Analogous to OGKs, we need to extend GGKs to the state-action space for using them in the LSPI method. A naive way is to just employ Eq.(6), but this can cause a ‘shift’ i...

88 | Reinforcement learning with gaussian processes
- Engel, Mannor, et al.
- 2005
Citation Context: ...cality and smoothness [3], [9], [10]. Furthermore, Gaussian kernels have ‘centers’, which alleviates the difficulty of basis subset choice, e.g., uniform allocation [2] or sample-dependent allocation [11]. In this paper, we therefore define Gaussian kernels on graphs (which we call geodesic Gaussian kernels), and propose using them for value function approximation. Our definition of Gaussian kernels on...

79 | Diffusion wavelets
- Coifman, Maggioni
Citation Context: ...on graphs are given as minor eigenvectors of the graph-Laplacian matrix. However, their global nature implies that the overall accuracy of this method tends to be degraded by local noise. The article [7] defined diffusion wavelets, which possess a natural multi-resolution structure on graphs. The paper [8] showed that diffusion wavelets could be employed in value function approximation, although the iss...

55 | Toolbox for Matlab 5
- Vesanto
Citation Context: ...lf-Organizing Map (SOM) [16]. The number of nodes (states) in the graph is set to 696 (equivalent to a SOM map size of 24 × 29); this value is computed by the standard rule-of-thumb formula 5√n [17], where n is the number of samples. The connectivity of the graph is determined by the state transition probability computed from the samples, i.e., if there is a state transition from one node to ano...

41 | Proto-value functions: Developmental reinforcement learning
- Mahadevan
- 2005
Citation Context: ...rious different scales and may also be employed for approximating smooth functions with local discontinuity. Typical value functions in RL tasks are predominantly smooth with some discontinuous parts [5]. To illustrate this, let us consider a toy RL task of guiding an agent to a goal in a grid world (see Fig.1(a)). In this task, a state corresponds to a two-dimensional Cartesian position of the agent...
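The discontinuity point made in this excerpt can be illustrated numerically: two states on opposite sides of an obstacle can be close in Euclidean distance but far in geodesic (shortest-path) distance, so an ordinary Gaussian kernel smooths the value function across the obstacle while a geodesic one does not. The distances below are hypothetical:

```python
import math

def gaussian(dist, sigma=1.0):
    """Gaussian similarity as a function of a distance."""
    return math.exp(-dist ** 2 / (2.0 * sigma ** 2))

# Two states on opposite sides of a thin wall (hypothetical numbers):
euclidean = 1.0   # straight-line distance ignores the wall
geodesic = 9.0    # shortest path must detour around the wall

# An ordinary Gaussian kernel treats the states as highly similar,
# wrongly averaging values across the wall ...
ogk = gaussian(euclidean)
# ... while a geodesic Gaussian kernel treats them as nearly unrelated,
# preserving the discontinuity in the value function at the wall.
ggk = gaussian(geodesic)
```

Here ogk ≈ 0.61 while ggk is vanishingly small, which is the qualitative behavior the GGK construction is after.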

34 | Value function approximation with diffusion wavelets and Laplacian eigenfunctions
- Mahadevan, Maggioni
- 2006
Citation Context: ...implies that the overall accuracy of this method tends to be degraded by local noise. The article [7] defined diffusion wavelets, which possess a natural multi-resolution structure on graphs. The paper [8] showed that diffusion wavelets could be employed in value function approximation, although the issue of choosing a suitable subset of basis functions from the over-complete set is not discussed; this ...

11 | Geodesic Gaussian kernels for value function approximation
- Sugiyama, Hachiya, et al.
- 2006
Citation Context: ...In this section, we introduce a novel way of constructing basis functions by incorporating the graph structure, while the relation to existing graph-based methods is discussed in the separate report [14]. A. MDP-Induced Graph: Let G be a graph induced by an MDP, where states S are nodes of the graph and the transitions with non-zero transition probabilities from one node to another are edges. The edge...