Results 1 - 4 of 4
Locally weighted learning
Artificial Intelligence Review, 1997
Abstract

Cited by 594 (53 self)
This paper surveys locally weighted learning, a form of lazy learning and memory-based learning, and focuses on locally weighted linear regression. The survey discusses distance functions, smoothing parameters, weighting functions, local model structures, regularization of the estimates and bias, assessing predictions, handling noisy data and outliers, improving the quality of predictions by tuning fit parameters, interference between old and new data, implementing locally weighted learning efficiently, and applications of locally weighted learning. A companion paper surveys how locally weighted learning can be used in robot learning and control.
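The central computation this survey describes — fitting a weighted least-squares line around each query point — can be sketched as follows. This is a minimal illustration assuming a single covariate and a Gaussian weighting function; the function name, signature, and default bandwidth are hypothetical, not the paper's notation:

```python
import numpy as np

def lwlr_predict(x_query, X, y, bandwidth=1.0):
    """Predict y at x_query with a locally weighted linear fit.

    Each training point receives a Gaussian weight based on its
    distance to the query; a weighted least-squares line is then
    fit and evaluated at the query point.
    """
    # Gaussian weighting function of distance to the query point
    w = np.exp(-0.5 * ((X - x_query) / bandwidth) ** 2)
    # Design matrix for the local linear model: intercept + slope
    A = np.column_stack([np.ones_like(X), X])
    W = np.diag(w)
    # Weighted least squares: beta = (A' W A)^{-1} A' W y
    beta = np.linalg.solve(A.T @ W @ A, A.T @ W @ y)
    return beta[0] + beta[1] * x_query
```

Because the model is refit at every query, all training data must be kept in memory — the "lazy" aspect the abstract refers to; the smoothing parameter (`bandwidth` here) controls how local the fit is.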
Powerful Choices: Tuning Parameter Selection Based on Power, 2005
Abstract

Cited by 2 (2 self)
We consider procedures which select the bandwidth in local linear regression by maximizing the limiting power for local Pitman alternatives to the hypothesis that µ(x) ≡ E(Y | X = x) is constant. The focus is on achieving high power near a covariate value x0, and we consider tests based on data with X restricted to an interval containing x0 with bandwidth h. The power-optimal bandwidth is shown to tend to zero as sample size goes to infinity if and only if the sequence of Pitman alternatives is such that the length of the interval centered at x0 on which µ(x) = µ_n(x) is nonconstant converges to zero as n → ∞. We show that tests which are based on local linear fits over asymmetric intervals of the form [x0 − (1 − λ)h, x0 + (1 + λ)h], where −1 ≤ λ ≤ 1, rather than the symmetric intervals [x0 − h, x0 + h], will give better asymptotic power. A simple procedure for selecting h and λ consists of using order-statistics intervals containing x0. Examples illustrate that the effect of these choices is not trivial: the power-optimal bandwidth can give much higher power than a bandwidth chosen to minimize mean squared error. Because we focus on power, rather than plotting estimates of µ(x) we plot a correlation curve ρ̂(x) which indicates the strength of the dependence between Y and X near each X = x. Extensions to ...
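The windowed test the abstract builds on can be illustrated with a small sketch. This shows only a t-statistic for the local linear slope over the asymmetric interval [x0 − (1 − λ)h, x0 + (1 + λ)h] — evidence against a constant µ(x) near x0 — not the paper's full power-optimal selection of h and λ; the helper name is hypothetical:

```python
import numpy as np

def local_slope_tstat(X, y, x0, h, lam=0.0):
    """t-statistic for the local linear slope of y on X over the
    asymmetric window [x0 - (1 - lam) * h, x0 + (1 + lam) * h].

    A large |t| is evidence against the hypothesis that the
    regression function is constant near x0.  lam = 0 recovers
    the symmetric window [x0 - h, x0 + h].
    """
    lo, hi = x0 - (1.0 - lam) * h, x0 + (1.0 + lam) * h
    mask = (X >= lo) & (X <= hi)
    x, yy = X[mask], y[mask]
    n = x.size
    if n < 3:
        return 0.0  # too little data in the window to test
    # Ordinary least-squares slope on the windowed data
    xc = x - x.mean()
    slope = (xc @ yy) / (xc @ xc)
    resid = yy - yy.mean() - slope * xc
    # Standard error of the slope estimate
    sigma2 = (resid @ resid) / (n - 2)
    se = np.sqrt(sigma2 / (xc @ xc))
    return slope / se
```

Varying h (and λ) changes which points enter the fit and hence the power of the test — the trade-off the bandwidth-selection procedure above optimizes.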
Selecting Local Models in Multiple Regression by Maximizing Power, 2006
Abstract

Cited by 1 (0 self)
This paper considers multiple regression procedures for analyzing the relationship between a response variable and a vector of d covariates in a nonparametric setting where both tuning parameters and the number of covariates need to be selected. We introduce an approach which handles the dilemma that with high-dimensional data the sparsity of data in regions of the sample space makes estimation of nonparametric curves and surfaces virtually impossible. This is accomplished by abandoning the goal of trying to estimate true underlying curves and instead estimating measures of dependence that can determine important relationships between variables. These dependence measures are based on local parametric fits on subsets of the covariate space that vary in both dimension and size within each dimension. The subset which maximizes a signal-to-noise ratio is chosen, where the signal is a local estimate of a dependence parameter which depends on the subset dimension and size, and the noise is an estimate of the standard error (SE) of the estimated signal. This approach of choosing the window size to maximize a signal-to-noise ratio lifts the curse of dimensionality because for regions with sparsity of data the SE is very large. It corresponds to asymptotically maximizing the probability of correctly finding nonspurious relationships between covariates and a response or, more precisely, maximizing asymptotic power among a class of asymptotic level-α t-tests indexed by subsets of the covariate space. Subsets that achieve this goal are called features. We investigate the properties ...
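The signal-to-noise criterion can be illustrated in a deliberately simplified one-covariate-at-a-time form — a hypothetical sketch, not the paper's full search over subsets varying in both dimension and size: for each covariate and candidate window, estimate a local slope (the signal) and its SE (the noise), and keep the pair with the largest |slope| / SE. Windows with sparse data get a huge SE and are automatically avoided:

```python
import numpy as np

def best_feature(X, y, x0, windows):
    """Pick the (covariate index, window half-width) pair whose
    local linear slope at x0 has the largest signal-to-noise
    ratio |slope| / SE(slope)."""
    best, best_snr = None, -np.inf
    n, d = X.shape
    for j in range(d):
        for h in windows:
            mask = np.abs(X[:, j] - x0[j]) <= h
            x, yy = X[mask, j], y[mask]
            if x.size < 3:
                continue  # window too sparse: SE effectively infinite
            xc = x - x.mean()
            sxx = xc @ xc
            if sxx <= 0:
                continue  # no spread in the covariate within the window
            # Local slope (signal) and its standard error (noise)
            slope = (xc @ yy) / sxx
            resid = yy - yy.mean() - slope * xc
            sigma2 = (resid @ resid) / (x.size - 2)
            if sigma2 <= 0:
                continue
            snr = abs(slope) / np.sqrt(sigma2 / sxx)
            if snr > best_snr:
                best, best_snr = (j, h), snr
    return best, best_snr
```

Maximizing this ratio is exactly a t-statistic comparison, which is how the selection connects to the class of level-α t-tests described in the abstract.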