## The parti-game algorithm for variable resolution reinforcement learning in multidimensional state-spaces (1995)

### Download Links

- [www.ri.cmu.edu]
- [ftp.gmd.de]
- DBLP

### Other Repositories/Bibliography

Venue: MACHINE LEARNING

Citations: 224 (7 self)

### BibTeX

@INPROCEEDINGS{Moore95theparti-game,
  author    = {Andrew W. Moore and Christopher G. Atkeson},
  title     = {The parti-game algorithm for variable resolution reinforcement learning in multidimensional state-spaces},
  booktitle = {MACHINE LEARNING},
  year      = {1995},
  pages     = {711--718},
  publisher = {Morgan Kaufmann}
}

### Abstract

Parti-game is a new algorithm for learning feasible trajectories to goal regions in high dimensional continuous state-spaces. In high dimensions it is essential that learning does not plan uniformly over a state-space. Parti-game maintains a decision-tree partitioning of state-space and applies techniques from game-theory and computational geometry to efficiently and adaptively concentrate high resolution only on critical areas. The current version of the algorithm is designed to find feasible paths or trajectories to goal regions in high dimensional spaces. Future versions will be designed to find a solution that optimizes a real-valued criterion. Many simulated problems have been tested, ranging from two-dimensional to nine-dimensional state-spaces, including mazes, path planning, non-linear dynamics, and planar snake robots in restricted spaces. In all cases, a good solution is found in less than ten trials and a few minutes.
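The variable-resolution partitioning the abstract describes can be illustrated with a minimal kd-tree-style cell structure. This is a sketch under assumptions, not the authors' implementation: the `Cell` class, the longest-axis split rule, and the "refine nearest-to-origin" criterion are all hypothetical stand-ins for parti-game's game-theoretic splitting criterion.

```python
# Sketch: a variable-resolution partition of state-space. Cells are
# axis-aligned boxes; only "critical" cells are split, so resolution
# concentrates where it is needed. All names here are illustrative.

class Cell:
    def __init__(self, lo, hi):
        self.lo, self.hi = lo, hi      # opposite corners of the box
        self.children = None           # None => this is a leaf cell

    def split(self):
        """Halve the cell along its longest axis (kd-tree style)."""
        widths = [h - l for l, h in zip(self.lo, self.hi)]
        ax = widths.index(max(widths))
        mid = (self.lo[ax] + self.hi[ax]) / 2
        hi1 = list(self.hi); hi1[ax] = mid
        lo2 = list(self.lo); lo2[ax] = mid
        self.children = (Cell(self.lo, tuple(hi1)),
                         Cell(tuple(lo2), self.hi))
        return self.children

    def leaves(self):
        if self.children is None:
            return [self]
        return [leaf for c in self.children for leaf in c.leaves()]

root = Cell((0.0, 0.0), (1.0, 1.0))
# Stand-in splitting criterion: repeatedly refine the leaf nearest the
# origin, as if that region were where trajectories get stuck.
for _ in range(3):
    target = min(root.leaves(), key=lambda c: sum(c.lo))
    target.split()

print(len(root.leaves()))   # 4 -- cells are small only near the origin
```

After three splits the partition has four leaves of different sizes, which is the point: resolution is high only near the (hypothetical) critical region, instead of uniformly across the whole space.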

### Citations

2626 |
Dynamic Programming
- Bellman
- 1957
Citation Context: ...ng [19], [27], [29], [3] is a promising method for robots to program and improve themselves. This paper addresses one of reinforcement learning's biggest stumbling blocks: the curse of dimensionality [5], in which costs increase exponentially with the number of state variables. These costs include both the computational effort required for planning and the physical amount of data that the control sys... |
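The exponential cost the quoted context describes is easy to make concrete: a uniform discretization with a fixed per-axis resolution needs `resolution**d` cells in `d` dimensions. The resolution value below is illustrative, and the dimensions match the 2-D to 9-D range of tasks mentioned in the abstract.

```python
# Curse of dimensionality in one line: uniform grids grow exponentially.
resolution = 10                  # cells per state variable (illustrative)
for d in (2, 4, 9):              # the paper's tasks range from 2-D to 9-D
    print(d, resolution ** d)    # 2 -> 100, 4 -> 10000, 9 -> 1000000000
```

A billion cells for a nine-dimensional task is why a uniform grid is hopeless there, and why parti-game refines the partition adaptively instead.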

1997 |
Robot Motion Planning
- Latombe
- 1991
Citation Context: ...Dayan's methods may eventually be applicable to other abstraction hierarchies. Geometric Decompositions have also been used fairly extensively in Robot Motion Planning (e.g. [7], [14]), summarized in [17]. The principal difference is that the Robot Motion Planning methods all assume that a model of the environment (typically in the form of a pre-programmed list of polygons) is supplied to the system i... |

1327 |
Learning from Delayed Rewards
- Watkins
- 1989
Citation Context: ... in less than ten trials and a few minutes. Keywords: Reinforcement Learning, Curse of Dimensionality, Learning Control, Robotics, kd-trees 1. Reinforcement Learning Reinforcement learning [19], [27], [29], [3] is a promising method for robots to program and improve themselves. This paper addresses one of reinforcement learning's biggest stumbling blocks: the curse of dimensionality [5], in which costs... |

625 |
Parallel and Distributed Computation: Numerical Methods
- Bertsekas, Tsitsiklis
- 1989
Citation Context: ...upon the application other sticking detectors are possible, such as an obstacle sensor on a mobile robot. Algorithm (1) works by constructing a discrete, deterministic Markov decision task (MDT) [5], [6] in which the discrete MDT states correspond to cells. Actions correspond to neighbors thus: action k in cell i corresponds to starting at the center of cell i and greedily aiming at the center of cel... |
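The cell-graph view in the quoted context — each cell is a discrete MDT state, and action k means "aim at the center of neighbor k" — can be sketched on a uniform grid. The grid, cell indexing, and 4-connectivity below are hypothetical simplifications of parti-game's adaptive partition.

```python
# Sketch: cells as discrete MDT states, actions as "aim at neighbor k".
# A fixed 3x2 grid stands in for the real variable-resolution partition.
W, H = 3, 2                                  # 3 x 2 grid of cells

def neighbors(i):
    """Indices of the cells adjacent to cell i (4-connected)."""
    x, y = i % W, i // W
    steps = [(1, 0), (-1, 0), (0, 1), (0, -1)]
    return [ny * W + nx
            for dx, dy in steps
            for nx, ny in [(x + dx, y + dy)]
            if 0 <= nx < W and 0 <= ny < H]

# Action k available in cell i corresponds to heading for neighbors(i)[k].
print(neighbors(0))   # corner cell 0 has two neighbors: [1, 3]
```

In parti-game proper the cells have different sizes, so the neighbor relation comes from box adjacency rather than grid arithmetic, but the state/action encoding is the same idea.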

615 | Some studies in machine learning using the game of checkers - Samuel - 1959 |

589 | An algorithm for finding best matches in logarithmic expected time - Friedman, Bentley, et al. - 1977 |

477 | Integrated architectures for learning, planning, and reacting based on approximating dynamic programming
- Sutton
- 1990
Citation Context: ...f artificial intelligence, such as planning and supervised learning. Much work has been performed with discrete state-spaces: in particular a class of Markov decision tasks known as grid worlds [29], [28]. Most potentially useful applications of reinforcement learning, however, take place in multidimensional continuous state-spaces. The obvious way to transform such state-spaces into discrete problems... |

316 | Prioritized Sweeping: Reinforcement Learning with Less Data and Less Real Time - Moore, Atkeson - 1993 |

305 |
Learning in embedded Systems
- Kaelbling
- 1990
Citation Context: ...ed long term sum of state dependent rewards. The state transitions are typically stochastic. This more conventional reinforcement learning formalism is described in many references such as [4], [29], [13], [3], [21]. Parti-game is restricted to a smaller class of tasks, but within that class is designed to attack reinforcement learning problems of much higher dimensionality than previous algorithms. I... |

254 | Feudal reinforcement learning
- Dayan, Hinton
- 1993
Citation Context: ...s paper. This paper studies the pitfalls of discretization during reinforcement learning and then introduces the parti-game algorithm. Some earlier work [26], [20], [8], [10] considered recursively partitioning state-space while learning from delayed rewards. The new ideas in the parti-game algorithm include (i) a game-theoretic splitting criterion to robustly choose spat... |

247 |
Temporal Credit Assignment in Reinforcement Learning (Doctoral Dissertation)
- Sutton
- 1984
Citation Context: ... found in less than ten trials and a few minutes. Keywords: Reinforcement Learning, Curse of Dimensionality, Learning Control, Robotics, kd-trees 1. Reinforcement Learning Reinforcement learning [19], [27], [29], [3] is a promising method for robots to program and improve themselves. This paper addresses one of reinforcement learning's biggest stumbling blocks: the curse of dimensionality [5], in which... |

124 |
Boxes: An Experiment in Adaptive Control
- Michie, Chambers
- 1968
Citation Context: ...ion is found in less than ten trials and a few minutes. Keywords: Reinforcement Learning, Curse of Dimensionality, Learning Control, Robotics, kd-trees 1. Reinforcement Learning Reinforcement learning [19], [27], [29], [3] is a promising method for robots to program and improve themselves. This paper addresses one of reinforcement learning's biggest stumbling blocks: the curse of dimensionality [5], in... |

118 |
Multilevel Adaptive Methods for Partial Differential Equations
- McCormick
- 1989
Citation Context: ...ate the convergence of solutions to partial differential equations. Adaptive multigrid methods that allow variations in resolution across the space typically use quad-tree or oct-tree data structures [18]. These approaches subdivide a cell by splitting in all dimensions simultaneously. [2] describe adaptive triangulation approaches. Multigrid approaches have been used for dynamic programming in solvin... |

109 |
Real-time learning and control using asynchronous dynamic programming (Technical Report 91-57)
- Barto, et al.
- 1991
Citation Context: ...ss than ten trials and a few minutes. Keywords: Reinforcement Learning, Curse of Dimensionality, Learning Control, Robotics, kd-trees 1. Reinforcement Learning Reinforcement learning [19], [27], [29], [3] is a promising method for robots to program and improve themselves. This paper addresses one of reinforcement learning's biggest stumbling blocks: the curse of dimensionality [5], in which costs incr... |

103 |
A subdivision algorithm in configuration space for findpath with rotation
- Brooks, Lozano-Pérez
- 1983
Citation Context: ...reas both Kaelbling's and Dayan's methods may eventually be applicable to other abstraction hierarchies. Geometric Decompositions have also been used fairly extensively in Robot Motion Planning (e.g. [7], [14]), summarized in [17]. The principal difference is that the Robot Motion Planning methods all assume that a model of the environment (typically in the form of a pre-programmed list of polygons) ... |

99 | Hierarchical Learning in Stochastic Domains: Preliminary Results
- Kaelbling
- 1993
Citation Context: ...ot assume continuous paths through state-space, and so they obtained only limited empirical success. In [10] a 2-dimensional hierarchical partitioning was used on a grid with 64 discrete squares, and [12] gives another hierarchical algorithm. These references both attempt a different goal than parti-game: they try to accelerate Q-learning [29] by providing it with a pre-programmed abstraction of the w... |

94 | Efficient Learning and Planning Within the Dyna Framework
- Peng, Williams
- 1993
Citation Context: ...tcome set has changed, then the JWC updates can take place incrementally in a series of finite time intervals interleaved with real time control decisions. Techniques like this are described in [28], [23], [21], [3]. The following conjecture has not been proved but we expect few difficulties: If a solution exists from all real-valued states in all cells, according to Algorithm (2), ... |

92 |
Sorting and Searching
- Knuth
- 1973
Citation Context: ...ns: J_SP(i) = 0 if i = GOAL, and 1 + min_{k ∈ NEIGHS(i)} J_SP(NEXT(i, k)) otherwise (3). The equations are solved by a shortest-path method such as dynamic programming [5], [6] or Dijkstra's algorithm [15]. The following policy is returned: Always aim for the neighbor with the lowest J_SP. Figure 2: Approximately 65% of the starting states (those in ... |
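The shortest-path equations quoted in the context above can be solved with Dijkstra's algorithm over the cell graph, since every cell-to-neighbor step costs 1. The toy graph, cell names, and `jsp` helper below are invented for illustration; only the recurrence (J_SP = 0 at the goal, else 1 + the best neighbor value) comes from the quoted text.

```python
# Sketch: solving J_SP(i) = 0 if i = GOAL, else 1 + min over neighbors,
# with Dijkstra on a tiny hand-made cell graph (unit cost per transition).
import heapq

NEIGHS = {                      # hypothetical cell adjacency (undirected)
    "START": ["A"],
    "A": ["START", "B", "GOAL"],
    "B": ["A"],
    "GOAL": ["A"],
}

def jsp(goal="GOAL"):
    """Distance-to-goal for every cell, computed outward from the goal."""
    J = {goal: 0}
    pq = [(0, goal)]
    while pq:
        d, i = heapq.heappop(pq)
        if d > J.get(i, float("inf")):
            continue                      # stale queue entry
        for j in NEIGHS[i]:               # relax each unit-cost edge
            if d + 1 < J.get(j, float("inf")):
                J[j] = d + 1
                heapq.heappush(pq, (d + 1, j))
    return J

J = jsp()
print(J["START"])   # 2: START -> A -> GOAL
```

The returned policy is then exactly the one the quote states: from any cell, aim for the neighbor with the lowest J_SP.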

79 |
Variable resolution dynamic programming: Efficiently learning action maps in multivariate real-valued state-spaces
- Moore
- 1991
Citation Context: ...iled in this paper. This paper studies the pitfalls of discretization during reinforcement learning and then introduces the parti-game algorithm. Some earlier work [26], [20], [8], [10] considered recursively partitioning state-space while learning from delayed rewards. The new ideas in the parti-game algorithm include (i) a game-theoretic splitting criterion to robustly ... |

55 |
Multiresolution path planning for mobile robots
- Kambhampati, Davis
- 1986
Citation Context: ...both Kaelbling's and Dayan's methods may eventually be applicable to other abstraction hierarchies. Geometric Decompositions have also been used fairly extensively in Robot Motion Planning (e.g. [7], [14]), summarized in [17]. The principal difference is that the Robot Motion Planning methods all assume that a model of the environment (typically in the form of a pre-programmed list of polygons) is sup... |

28 |
Optimum Systems Control
- Sage
- 1968
Citation Context: ...s. In dynamics problems a proof might need to assume local linearizability of the dynamics within cells, and could then use Linear Quadratic Gaussian (LQG) local control design (see, for example, [24]). For systems which can neither be characterized as geometric motion planning problems nor dynamics problems, it is also possible that optimality might be provable. Future research into this might in... |

24 |
Learning from Delayed Reinforcement in a Complex Domain
- Chapman, Kaelbling
- 1991
Citation Context: ...n this paper. This paper studies the pitfalls of discretization during reinforcement learning and then introduces the parti-game algorithm. Some earlier work [26], [20], [8], [10] considered recursively partitioning state-space while learning from delayed rewards. The new ideas in the parti-game algorithm include (i) a game-theoretic splitting criterion to robustly choos... |

22 |
A self-learning automaton with variable resolution for high precision assembly by industrial robots
- Simons, Brussel, et al.
- 1982
Citation Context: ...e detailed in this paper. This paper studies the pitfalls of discretization during reinforcement learning and then introduces the parti-game algorithm. Some earlier work [26], [20], [8], [10] considered recursively partitioning state-space while learning from delayed rewards. The new ideas in the parti-game algorithm include (i) a game-theoretic splitting criterion to rob... |

14 |
Problem-Solving Methods in Artificial Intelligence
- Nilsson
- 1971
Citation Context: ...ssible that optimality might be provable. Future research into this might incorporate admissible heuristics: a classical method in AI for formally reasoning about the optimality of proposed solutions [22]. 6.6. Multiple goals Because it builds an explicit model of all the possible state transitions between cells, it is a trivial matter for parti-game to change to a new goal. We have performed a number... |

7 |
Neuronlike adaptive elements that can solve difficult learning control problems
- Barto, Sutton, et al.
- 1983
Citation Context: ...ing the state-space into a multidimensional grid, and treating each box within the grid as an atomic object. Although this can be effective (see, for instance, the pole balancing experiments of [19], [4]), the naive grid approach has a number of dangers which will be detailed in this paper. This paper studies the pitfalls of discretization during reinforcement learning a... |

6 |
Dynamic Programming Complexity and Applications
- Akian, Chancelier, et al.
- 1988
Citation Context: ...ons simultaneously. [2] describe adaptive triangulation approaches. Multigrid approaches have been used for dynamic programming in solving for the value function specified by Bellman's equation [11], [1], [9]. As with the robot motion planning approaches described above, it is not yet clear ... |

6 |
Assessing the Quality of Local Linear Models
- Schaal, Atkeson
- 1994
Citation Context: ...and computational sense. It is particularly easy given our working assumption of deterministic system dynamics, but even in stochastic cases, developing a local linear model from data may not be hard [25]. 6.4. Dealing with an unknown goal state There is no difficulty for parti-game in removing the assumption that the location of the goal state is known. Convergence will be considerably slowed down if... |

4 |
Multigrid methods for Hamilton–Jacobi–Bellman equations
- Hoppe
- 1986
Citation Context: ...imensions simultaneously. [2] describe adaptive triangulation approaches. Multigrid approaches have been used for dynamic programming in solving for the value function specified by Bellman's equation [11], [1], [9]. As with the robot motion planning approaches described above, it is not yet clear ... |

1 |
Multigrid algorithms and complexity results for discrete-time stochastic control and related fixed-point problems
- Chow
- 1990
Citation Context: ...imultaneously. [2] describe adaptive triangulation approaches. Multigrid approaches have been used for dynamic programming in solving for the value function specified by Bellman's equation [11], [1], [9]. As with the robot motion planning approaches described above, it is not yet clear ... |

1 |
Complexity Analysis of Reinforcement Learning
- Koenig, Simmons
- 1993
Citation Context: ...e face of uncertainty" [13], [28] in which any unvisited state-action pair was assumed to be zero steps from the goal. In a deterministic problem, this strategy is guaranteed to find the optimal path [16]. The deterministic assumption also permitted Q-learning to use a learning rate α = 1. Prioritized sweeping was allowed 200 backups per transition. The results are shown in Table 1. Parti-game has co... |