## Locally Weighted Learning for Control (1996)

Citations: 160 (17 self)

### BibTeX

```bibtex
@MISC{Atkeson96locallyweighted,
  author = {Christopher G. Atkeson and Andrew W. Moore and Stefan Schaal},
  title  = {Locally Weighted Learning for Control},
  year   = {1996}
}
```


### Abstract

Lazy learning methods provide useful representations and training algorithms for learning about complex phenomena during autonomous adaptive control of complex systems. This paper surveys ways in which locally weighted learning, a type of lazy learning, has been applied by us to control tasks. We explain various forms that control tasks can take, and how this affects the choice of learning paradigm. The discussion section explores the interesting impact that explicitly remembering all previous experiences has on the problem of learning to control.

### Citations

2626 | Dynamic Programming - Bellman - 1957
Citation Context: …irical forward model x̂_{k+1} = f̂(x_k, u_k). A general-purpose solution can be obtained by discretizing state space into a multidimensional array of small cells, and performing a dynamic programming method (Bellman, 1957; Bertsekas and Tsitsiklis, 1989) such as value iteration or policy iteration to produce two things: 1. A value function, V(x), mapping cells onto the minimum possible sum of future costs if one start… |
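The value-iteration recipe quoted above can be sketched in a few lines. Everything below is invented for illustration (a 1-D grid of cells, unit step costs, left/right moves); it is not the paper's implementation.

```python
# Value iteration over a tiny discretized 1-D state space: cell values
# converge to the minimum summed cost-to-go, and a Bellman-greedy policy
# descends the value function. All names and costs here are made up.

def value_iteration(n_cells, goal, step_cost=1.0, tol=1e-9):
    """V[i] = min over actions of step_cost + V[next cell]; the goal costs 0."""
    V = [0.0] * n_cells
    while True:
        delta = 0.0
        for i in range(n_cells):
            if i == goal:
                continue
            # Actions: move one cell left or right (clamped at the borders).
            candidates = [max(i - 1, 0), min(i + 1, n_cells - 1)]
            best = min(step_cost + V[j] for j in candidates)
            delta = max(delta, abs(best - V[i]))
            V[i] = best
        if delta < tol:
            return V

def greedy_policy(V, i, goal, n_cells):
    """Act by descending the value function (Bellman-greedy)."""
    if i == goal:
        return 0
    left, right = max(i - 1, 0), min(i + 1, n_cells - 1)
    return -1 if V[left] < V[right] else +1

V = value_iteration(n_cells=5, goal=2)
print(V)  # each cell's value equals its distance (in steps) to the goal cell
```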

1327 | Learning from Delayed Rewards - Watkins - 1989
Citation Context: …case. A large literature on such learning control problems has sprung up in recent years, with the general name of reinforcement learning. Overviews may be found in (Sutton, 1988; Barto et al., 1990; Watkins, 1989; Barto et al., 1995; Moore and Atkeson, 1993). In this paper we will restrict discussion to the applications of lazy learning to these problems. Again, we proceed by learning an empirical forward mod… |

1231 | Learning to predict by the methods of temporal differences. Machine Learning 3:9--44 - Sutton - 1988 |

532 | Learning to act using real-time dynamic programming - Barto, Bradtke, et al. - 1995
Citation Context: …literature on such learning control problems has sprung up in recent years, with the general name of reinforcement learning. Overviews may be found in (Sutton, 1988; Barto et al., 1990; Watkins, 1989; Barto et al., 1995; Moore and Atkeson, 1993). In this paper we will restrict discussion to the applications of lazy learning to these problems. Again, we proceed by learning an empirical forward model x̂_{k+1} = f̂(x_k, u_k… |

530 | Active learning with statistical models - Cohn, Ghahramani, et al. - 1995
Citation Context: …se data points which, given the uncertainty inherent in the prediction, are considered most likely to achieve the desired outcome. This can considerably reduce the exploration required (Moore, 1991a; Cohn et al., 1995). 2.5 A Temporally Independent Task: Billiards. In order to explore the efficacy of lazy learning methods for the control of temporally independent tasks, the previously described approaches were impleme… |

477 | Integrated architectures for learning, planning, and reacting based on approximating dynamic programming - Sutton - 1990
Citation Context: …se. Less expensive forms of dynamic programming would normally perform value iteration only at the end of each trial (as we do in the example in Section 3.6.1), or as an incremental parallel process (Sutton, 1990; Moore and Atkeson, 1993; Peng and Williams, 1993). 3.6.1 A Simulation Example: The Puck. We illustrate this form of learning by means of a simple simulated example. Figure 12 depicts a frictionless p… |

473 | Toward memory-based reasoning - Stanfill, Waltz - 1986 |

455 | Locally weighted learning - Atkeson, Moore, et al. - 1997 |

454 | Computational Geometry - Preparata, Shamos - 1985
Citation Context: …ly weighted regression can be obtained without explicitly visiting every point in the database. There are a surprisingly large number of algorithms available for doing this, mostly based on kd-trees (Preparata and Shamos, 1985; Omohundro, 1987; Moore, 1990; Grosse, 1989; Quinlan, 1993; Omohundro, 1991; Deng and Moore, 1995). Is the curse of dimensionality a problem for lazy learning for control? The curse of dimensional… |
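The kd-tree lookups mentioned in this context are what keep lazy learning affordable as the experience database grows. Below is a minimal, illustrative kd-tree (build plus nearest-neighbour query with branch pruning) in pure Python; it is a textbook sketch, not any of the cited implementations, and the sample points are invented.

```python
# Minimal kd-tree: nearest-neighbour queries without scanning every stored
# point. Nodes are plain dicts; points are tuples of equal dimension.

def build(points, depth=0):
    if not points:
        return None
    axis = depth % len(points[0])           # cycle through the dimensions
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2                  # median split along this axis
    return {
        "point": points[mid],
        "axis": axis,
        "left": build(points[:mid], depth + 1),
        "right": build(points[mid + 1:], depth + 1),
    }

def nearest(node, query, best=None):
    if node is None:
        return best
    def dist2(p):
        return sum((a - b) ** 2 for a, b in zip(p, query))
    if best is None or dist2(node["point"]) < dist2(best):
        best = node["point"]
    axis = node["axis"]
    diff = query[axis] - node["point"][axis]
    near, far = (node["left"], node["right"]) if diff < 0 else (node["right"], node["left"])
    best = nearest(near, query, best)
    if diff ** 2 < dist2(best):             # far side could still hold a closer point
        best = nearest(far, query, best)
    return best

tree = build([(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)])
print(nearest(tree, (9, 2)))  # → (8, 1)
```

The pruning test (`diff ** 2 < dist2(best)`) is what lets the query skip whole subtrees, which is the source of the speedup over brute-force search.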

435 | Iterative solution of nonlinear equations in several variables - Ortega, Rheinboldt - 1970
Citation Context: …a monotonic relationship between u and y, the sequence of actions that are chosen are closely related to the Secant method (Conte and de Boor, 1980) for numerically finding the zero of a function. See (Ortega and Rheinboldt, 1970) for a good discussion of the multidimensional generalization of the Secant method. An inverse model, represented using locally weighted regression and trained initially with a feedback learner, has… |
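The one-dimensional Secant method this context alludes to fits a line through the last two (u, y) observations and solves for the zero; the control analogy is that each new action is chosen from the two most recent experiences. A hedged sketch (the example function is invented):

```python
# Secant method: find u with f(u) = 0 from two starting guesses,
# using only function values (no derivatives).

def secant(f, u0, u1, tol=1e-10, max_iter=50):
    y0, y1 = f(u0), f(u1)
    for _ in range(max_iter):
        if abs(y1) < tol:
            return u1
        # Linear (secant) model through the last two observations.
        u0, u1 = u1, u1 - y1 * (u1 - u0) / (y1 - y0)
        y0, y1 = y1, f(u1)
    return u1

# Example: the positive zero of u**2 - 2 is sqrt(2) ≈ 1.41421356...
root = secant(lambda u: u * u - 2.0, 1.0, 2.0)
print(round(root, 6))  # → 1.414214
```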

406 | Projection pursuit regression - Friedman, Stuetzle - 1981
Citation Context: …e.g., the distance function can be set to choose a projection direction, for example, but for multiple projection directions multiple distance functions must be used in additive locally weighted fits) (Friedman and Stuetzle, 1981). We expect locally weighted learning to degrade gracefully as the problem dimensionality increases. Lazy learning depends on having good representations already selected. Good representational choic… |

305 | Learning in Embedded Systems - Kaelbling - 1993
Citation Context: …esolve this dilemma, a number of useful exploration heuristics can be used, all based on the idea that it is worth exploring only where there is little confidence in the empirical model (Sutton, 1990; Kaelbling, 1993; Moore and Atkeson, 1993; Cohn et al., 1995). 4 Lazy Learning of Models: Pros and Cons. Lazy learning of models leads to new forms of autonomous control. The control algorithms explicitly perform e… |

299 | Forward models: Supervised learning with a distal teacher - Jordan, Rumelhart - 1992
Citation Context: …d outcomes is misinterpreted by the inverse model. Even if the inverse model had interpreted the data correctly, any locally weighted averaging on u would have led to incorrect actions (Moore, 1991a; Jordan and Rumelhart, 1992). In subsequent sections on temporally dependent tasks, we will discuss how sometimes the action selected by the inverse function is too aggressive. 2.2 Control Using Forward Models. The forward model… |

195 | Learning and sequential decision making - Barto, Sutton, et al. - 1989
Citation Context: …difficulties in the general case. A large literature on such learning control problems has sprung up in recent years, with the general name of reinforcement learning. Overviews may be found in (Sutton, 1988; Barto et al., 1990; Watkins, 1989; Barto et al., 1995; Moore and Atkeson, 1993). In this paper we will restrict discussion to the applications of lazy learning to these problems. Again, we proceed by learning an empiri… |

157 | Predicting chaotic time series - Farmer, Sidorowich - 1987 |

153 | Learning to predict by the methods of temporal differences - Sutton - 1988
Citation Context: …ational difficulties in the general case. A large literature on such learning control problems has sprung up in recent years, with the general name of reinforcement learning. Overviews may be found in (Sutton, 1988; Barto et al., 1990; Watkins, 1989; Barto et al., 1995; Moore and Atkeson, 1993). In this paper we will restrict discussion to the applications of lazy learning to these problems. Again, we proceed b… |

140 | Is learning the n-th thing any easier than learning the first - Thrun - 1996 |

128 | Efficient algorithms for minimizing cross validation error - Moore, Lee - 1994
Citation Context: …1995). Cheap cross validation makes search for model parameters routine, and we have explored procedures that take advantage of this (Atkeson et al., 1995; Maron and Moore, 1994; Moore et al., 1992; Moore and Lee, 1994). We have extended the locally weighted learning approach to give information about the reliability of the predictions and local linearizations generated, based on the local density and distribution… |

123 | Parallel and Distributed Computation - Bertsekas, Tsitsiklis - 1989
Citation Context: …model x̂_{k+1} = f̂(x_k, u_k). A general-purpose solution can be obtained by discretizing state space into a multidimensional array of small cells, and performing a dynamic programming method (Bellman, 1957; Bertsekas and Tsitsiklis, 1989) such as value iteration or policy iteration to produce two things: 1. A value function, V(x), mapping cells onto the minimum possible sum of future costs if one starts in that cell. 2. A policy, u(x… |

115 | Combining instance-based and model-based learning - Quinlan - 1993
Citation Context: …point in the database. There are a surprisingly large number of algorithms available for doing this, mostly based on kd-trees (Preparata and Shamos, 1985; Omohundro, 1987; Moore, 1990; Grosse, 1989; Quinlan, 1993; Omohundro, 1991; Deng and Moore, 1995). Is the curse of dimensionality a problem for lazy learning for control? The curse of dimensionality is the exponential dependence of needed resources on di… |

108 | Local Regression: Automatic Kernel Carpentry - Hastie, Loader - 1993
Citation Context: …an be shown that the linearizations returned by locally weighted learning accomplish a low-bias estimate of the true gradient with fewer data points than required for a low-bias prediction of a query (Hastie and Loader, 1993). Automatic confidence estimations. Locally weighted regression can also be modified to return a confidence interval along with its prediction. This can be done heuristically with the local density of th… |
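Locally weighted regression itself, the technique at the heart of the paper, fits a weighted model around each query. The sketch below is a one-input version with a Gaussian kernel; the kernel-mass "density" it returns is only a crude stand-in for the confidence machinery discussed in the context, and the data and bandwidth are invented.

```python
import math

# Illustrative 1-D locally weighted linear regression: fit y ≈ a + b*x with
# weights concentrated near the query, and report the total kernel weight
# as a rough local-density proxy (a heuristic, not the paper's intervals).

def lwr_predict(xs, ys, query, bandwidth=1.0):
    w = [math.exp(-((x - query) ** 2) / (2 * bandwidth ** 2)) for x in xs]
    sw = sum(w)
    # Closed-form weighted least squares for a single input dimension.
    mx = sum(wi * xi for wi, xi in zip(w, xs)) / sw
    my = sum(wi * yi for wi, yi in zip(w, ys)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, xs))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, xs, ys))
    b = sxy / sxx if sxx > 1e-12 else 0.0
    a = my - b * mx
    return a + b * query, sw    # prediction and total kernel weight

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [0.0, 1.1, 1.9, 3.2, 3.9]          # roughly y = x, with noise
pred, density = lwr_predict(xs, ys, 2.5)
print(round(pred, 2))                   # close to 2.5 for this near-linear data
```

Because the fit is local and linear, the slope `b` at the query doubles as the local linearization that the context says control algorithms exploit.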

101 | Hoeffding races: Accelerating model selection search for classification and function approximation - Maron, Moore - 1994 |

94 | Efficient Learning and Planning Within the Dyna Framework - Peng, Williams - 1993
Citation Context: …gramming would normally perform value iteration only at the end of each trial (as we do in the example in Section 3.6.1), or as an incremental parallel process (Sutton, 1990; Moore and Atkeson, 1993; Peng and Williams, 1993). 3.6.1 A Simulation Example: The Puck. We illustrate this form of learning by means of a simple simulated example. Figure 12 depicts a frictionless puck on a bumpy surface, whose objective is to driv… |

92 | Robot Juggling: Implementation of Memory-Based Learning - Schaal, Atkeson - 1994
Citation Context: …ation about the reliability of the predictions and local linearizations generated, based on the local density and distribution of the data and an estimate of the local variance (Atkeson et al., 1995; Schaal and Atkeson, 1994a,b). This allows a robot to monitor its own skill level, protect itself from its ignorance by designing robust policies, and guide its exploratory behavior. Another attractive feature of locally weig… |

91 | Instance-Based Utile Distinctions for Reinforcement Learning with Hidden State - McCallum - 1995 |

91 | Discovering structure in multiple learning tasks: The TC algorithm - Thrun, O’Sullivan - 1996 |

89 | Exploiting Chaos to Predict the Future and Reduce Noise - Farmer, Sidorowich - 1988 |

85 | Efficient algorithms with neural network behavior - Omohundro - 1987
Citation Context: …e obtained without explicitly visiting every point in the database. There are a surprisingly large number of algorithms available for doing this, mostly based on kd-trees (Preparata and Shamos, 1985; Omohundro, 1987; Moore, 1990; Grosse, 1989; Quinlan, 1993; Omohundro, 1991; Deng and Moore, 1995). Is the curse of dimensionality a problem for lazy learning for control? The curse of dimensionality is the expone… |

79 | Variable resolution dynamic programming: Efficiently learning action maps in multivariate real-valued state-spaces - Moore - 1991 |

66 | Learning to Control an Unstable System with Forward Modeling - Jordan, Jacobs - 1990
Citation Context: …can be eliminated much more quickly than approaches that incrementally update parameters. Nonlinear parametric models can be trained by 1) exposing the model to a new data point only once (e.g., (Jordan and Jacobs, 1990; Kuperstein, 1988)), or 2) by storing the data in a database and cycling through the training data repeatedly. In case 1, much more data must be collected, since the training effect of each data point… |

65 | Elementary Numerical Analysis - Conte, de Boor - 1988 |

61 | Multiresolution instance-based learning - Deng, Moore - 1995
Citation Context: …e a surprisingly large number of algorithms available for doing this, mostly based on kd-trees (Preparata and Shamos, 1985; Omohundro, 1987; Moore, 1990; Grosse, 1989; Quinlan, 1993; Omohundro, 1991; Deng and Moore, 1995). Is the curse of dimensionality a problem for lazy learning for control? The curse of dimensionality is the exponential dependence of needed resources on dimensionality found in many learning and… |

55 | Neural model of adaptive hand-eye coordination for single postures - Kuperstein - 1988
Citation Context: …ore quickly than approaches that incrementally update parameters. Nonlinear parametric models can be trained by 1) exposing the model to a new data point only once (e.g., (Jordan and Jacobs, 1990; Kuperstein, 1988)), or 2) by storing the data in a database and cycling through the training data repeatedly. In case 1, much more data must be collected, since the training effect of each data point is small. This le… |

53 | Reinforcement learning with highdimensional continuous actions - Baird, Klopf - 1993 |

51 | Using local trajectory optimizers to speed up global optimization in dynamic programming - Atkeson - 1994
Citation Context: …the reliability of the predictions and local linearizations generated, based on the local density and distribution of the data and an estimate of the local variance (Atkeson et al., 1995; Schaal and Atkeson, 1994a,b). This allows a robot to monitor its own skill level, protect itself from its ignorance by designing robust policies, and guide its exploratory behavior. Another attractive feature of locally weig… |

50 | Using local models to control movement - Atkeson - 1989
Citation Context: …y dependent tasks in later sections. The next sections describe inverse and forward models. 2.1 Control Using Inverse Models. An inverse model uses states and outcomes to predict the necessary action (Atkeson, 1990; Miller, 1989): u = f̂⁻¹(x, y) (2). This function specifies directly what action to take in each state, but does not specify what would happen given a state and an action. A lazy learner can represe… |
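A lazy inverse model in the spirit of Eq. (2) can be sketched very compactly: store experience triples and, given a desired outcome, return the action of the nearest stored (state, outcome) pair. The class, the scalar state/action/outcome simplification, and the toy plant below are all invented for illustration.

```python
# Minimal lazy inverse model u = f̂⁻¹(x, y): nearest neighbour in
# (state, outcome) space over remembered experience triples.

class LazyInverseModel:
    def __init__(self):
        self.memory = []                    # list of (state, action, outcome)

    def observe(self, x, u, y):
        self.memory.append((x, u, y))

    def action_for(self, x, y_desired):
        """Return the action of the closest stored (state, outcome) pair."""
        return min(
            self.memory,
            key=lambda m: (m[0] - x) ** 2 + (m[2] - y_desired) ** 2,
        )[1]

model = LazyInverseModel()
for u in [0.0, 0.5, 1.0, 1.5, 2.0]:
    model.observe(x=0.0, u=u, y=2.0 * u)    # pretend-unknown plant: y = 2u
print(model.action_for(x=0.0, y_desired=2.1))  # → 1.0 (nearest stored y is 2.0)
```

Note this lookup happily averages over whatever it has stored, which is exactly how the misinterpretation problem described in the Jordan and Rumelhart context can arise when the inverse is not a function.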

46 | Bumptrees for efficient function, constraint, and classification learning - Omohundro - 1991 |

44 | Enhancing transfer in reinforcement learning by building stochastic models of robot actions - Mahadevan - 1992
Citation Context: …idimensional discretization for computing the value function. Several researchers are investigating methods for reducing the cost of value iteration when a model has been learned (e.g., (Moore, 1991b; Mahadevan, 1992; Atkeson, 1994)). 3.6.2 Exploration. The approach we have described does not explicitly explore. If the learned model contains serious errors, a part of state space that wrongly looks unrewarding will… |

42 | An Empirical Investigation of Brute Force to choose Features, Smoothers and Function Approximators - Moore, Hill, et al. - 1992
Citation Context: …ery (Atkeson et al., 1995). Cheap cross validation makes search for model parameters routine, and we have explored procedures that take advantage of this (Atkeson et al., 1995; Maron and Moore, 1994; Moore et al., 1992; Moore and Lee, 1994). We have extended the locally weighted learning approach to give information about the reliability of the predictions and local linearizations generated, based on the local dens… |

41 | Stochastic optimization - Schneider, Kirkpatrick - 2006
Citation Context: …1995). We will see how the explicit representation of specific memories can speed up convergence and improve the robustness and autonomy of optimization and control algorithms (Atkeson et al., 1995; Moore and Schneider, 1995). It is frustrating to watch a robot repeat its mistakes, with only a slight improvement on each attempt. The goal of the learning algorithms described here is to improve performance as rapidly as po… |

39 | Real-Time Application of Neural Networks for Sensor-Based Control of Robots with Vision - Miller - 1989
Citation Context: …ks in later sections. The next sections describe inverse and forward models. 2.1 Control Using Inverse Models. An inverse model uses states and outcomes to predict the necessary action (Atkeson, 1990; Miller, 1989): u = f̂⁻¹(x, y) (2). This function specifies directly what action to take in each state, but does not specify what would happen given a state and an action. A lazy learner can represent an inverse m… |

38 | Acquisition of Dynamic Control Knowledge for a Robot Manipulator - Moore - 1990
Citation Context: …s how sometimes the action selected by the inverse function is too aggressive. 2.2 Control Using Forward Models. The forward model uses states and actions to predict outcomes (Miller, 1989; Mel, 1989; Moore, 1990; Jordan and Rumelhart, 1992): y = f̂(x, u) (3). This allows prediction of the effects of various actions (mental simulation) but does not prescribe the correct action to take. |
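Control with a forward model, Eq. (3), turns action selection into search: mentally simulate candidate actions through f̂ and keep the one whose predicted outcome is closest to the target. In this sketch a nearest-neighbour lookup stands in for locally weighted regression, and the plant y = u² and all names are invented.

```python
# Forward-model control sketch: y = f̂(x, u) predicted from remembered
# (state, action, outcome) triples, then "mental simulation" over a grid
# of candidate actions picks the action with the smallest predicted error.

def make_forward_model(memory):
    """f̂(x, u): outcome of the nearest stored (state, action) experience."""
    def f_hat(x, u):
        return min(memory, key=lambda m: (m[0] - x) ** 2 + (m[1] - u) ** 2)[2]
    return f_hat

# Remembered experiences from a pretend-unknown plant y = u**2, u in [0, 2].
memory = [(0.0, u / 10.0, (u / 10.0) ** 2) for u in range(21)]
f_hat = make_forward_model(memory)

def choose_action(x, y_desired, candidates):
    """Mental simulation: minimise the predicted outcome error."""
    return min(candidates, key=lambda u: (f_hat(x, u) - y_desired) ** 2)

u = choose_action(0.0, y_desired=1.0, candidates=[i / 10.0 for i in range(21)])
print(u)  # → 1.0, since f̂(0, 1.0) = 1.0 matches the target exactly
```

Unlike the inverse model, this formulation never has to average conflicting actions; ambiguity in the plant shows up only as a tie in the search.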

33 | Dynamics of Physical Systems - Cannon - 1967
Citation Context: …ns of infinite magnitude or cause the size of the required actions to grow without limit. One-step deadbeat control will fail on some non-minimum-phase systems, of which pole balancing is one example (Cannon, 1967). In these systems, one must move away from the goal to approach it later. In the case of the cart-pole system the cart must initially move away from the target position so that the pole leans in the… |

28 | Fast, robust adaptive control by learning only forward models - Moore - 1992
Citation Context: …In order to explore the efficacy of lazy learning methods for the control of temporally independent tasks, the previously described approaches were implemented on the billiards robot shown in Figure 4 (Moore, 1992; Moore et al., 1992). The equipment consists of a small (1.5 m × 0.75 m) pool table, a spring-actuated cue with a rotary joint under the control of a stepper motor, and two cameras attached to a Datacube… |

26 | Toward memory-based reasoning - Stanfill, Waltz - 1986 |

26 | Stochastic Optimal Control - Stengel - 1986
Citation Context: …ul techniques from the previous section, and ignore the temporal dependence. One-step deadbeat control chooses actions to (in expectation) cause the immediate next state to be the desired next state (Stengel, 1986). Assuming the next state is always attainable in one step, the action may be chosen without paying attention to future states, decisions, or performance. 3.1.1 An Implementation of Deadbeat Control:… |
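For a plant where the model can be inverted in closed form, one-step deadbeat control reduces to solving for the action that makes the predicted next state equal the target exactly. A minimal sketch under an assumed known linear plant x[k+1] = a·x[k] + b·u[k] (all values invented):

```python
# One-step deadbeat control for an assumed linear plant:
# solve a*x + b*u == x_desired for u, then verify with one simulated step.

def deadbeat_action(x, x_desired, a, b):
    """Choose u so that the next state equals the target in a single step."""
    return (x_desired - a * x) / b

a, b = 0.9, 0.5
x, target = 4.0, 1.0
u = deadbeat_action(x, target, a, b)
x_next = a * x + b * u              # simulate one plant step
print(u, x_next)  # → -5.2 1.0: the state reaches the target in one step
```

The context's caveat applies directly: when the target is not attainable in one step, or the system is non-minimum-phase, this solve demands arbitrarily large actions, which is why the paper also considers multi-step formulations.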

25 | Scaling up average reward reinforcement learning by approximating the domain models and the value function - Tadepalli, Ok - 1996 |

23 | Learning to catch: Applying nearest neighbor algorithms to dynamic control tasks - Aha, Salzberg - 1993 |

22 | Learning to control a dynamic physical system - Connell, Utgoff - 1987 |

21 | Bayesian model comparison and backprop nets - MacKay - 1992
Citation Context: …of estimating uncertainty with locally weighted methods is small. Nonlinear parametric representations such as multi-layer sigmoidal neural networks can also be adapted to return confidence intervals (MacKay, 1992; Pomerleau, 1994), but approximations are required, and the computational cost is larger. Worse, parametric models (e.g., global polynomial regression) that predict confidence statistically are typica… |