Results 1-10 of 63
Locally weighted learning
Artificial Intelligence Review, 1997
Abstract

Cited by 448 (52 self)
This paper surveys locally weighted learning, a form of lazy learning and memory-based learning, and focuses on locally weighted linear regression. The survey discusses distance functions, smoothing parameters, weighting functions, local model structures, regularization of the estimates and bias, assessing predictions, handling noisy data and outliers, improving the quality of predictions by tuning fit parameters, interference between old and new data, implementing locally weighted learning efficiently, and applications of locally weighted learning. A companion paper surveys how locally weighted learning can be used in robot learning and control.
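The core of locally weighted linear regression can be sketched in a few lines: at each query point, fit a linear model by weighted least squares, with weights given by a kernel on the distance to the query. A minimal one-dimensional sketch (the Gaussian kernel and the bandwidth are illustrative choices, not prescriptions from the survey):

```python
import math

def lwr_predict(xs, ys, query, bandwidth=0.3):
    """Locally weighted linear regression at a single query point.

    Fits y ~ a + b*x by weighted least squares, weighting each sample
    by a Gaussian kernel on its distance to the query.
    """
    w = [math.exp(-0.5 * ((x - query) / bandwidth) ** 2) for x in xs]
    sw = sum(w)
    sx = sum(wi * x for wi, x in zip(w, xs))
    sy = sum(wi * y for wi, y in zip(w, ys))
    sxx = sum(wi * x * x for wi, x in zip(w, xs))
    sxy = sum(wi * x * y for wi, x, y in zip(w, xs, ys))
    # Closed-form solution of the 2x2 weighted normal equations.
    b = (sw * sxy - sx * sy) / (sw * sxx - sx * sx)
    a = (sy - b * sx) / sw
    return a + b * query

xs = [i / 10 for i in range(11)]    # grid on [0, 1]
ys = [2 * x + 1 for x in xs]        # exactly linear data
pred = lwr_predict(xs, ys, 0.5)     # ~ 2*0.5 + 1 = 2.0 on linear data
```

A separate local fit is computed per query, which is the "lazy" aspect the abstract mentions: no global model is ever stored.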
Regularization Theory and Neural Networks Architectures
Neural Computation, 1995
Abstract

Cited by 309 (31 self)
We had previously shown that regularization principles lead to approximation schemes which are equivalent to networks with one layer of hidden units, called Regularization Networks. In particular, standard smoothness functionals lead to a subclass of regularization networks, the well known Radial Basis Functions approximation schemes. This paper shows that regularization networks encompass a much broader range of approximation schemes, including many of the popular general additive models and some of the neural networks. In particular, we introduce new classes of smoothness functionals that lead to different classes of basis functions. Additive splines as well as some tensor product splines can be obtained from appropriate classes of smoothness functionals. Furthermore, the same generalization that extends Radial Basis Functions (RBF) to Hyper Basis Functions (HBF) also leads from additive models to ridge approximation models, containing as special cases Breiman's hinge functions, som...
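The regularization-network solution referred to above has a compact form in the radial-basis case: the fitted function is f(x) = Σ_i a_i K(x, x_i), with coefficients obtained from the linear system (K + λI)a = y, where K is the kernel (Gram) matrix on the data. A minimal sketch (the Gaussian kernel width and λ are illustrative choices):

```python
import math

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def fit_rbf(xs, ys, sigma=0.7, lam=1e-8):
    """Regularization network with Gaussian radial basis functions:
    solve (K + lam*I) a = y, then f(q) = sum_i a_i K(q, x_i)."""
    n = len(xs)
    K = [[math.exp(-(xi - xj) ** 2 / (2 * sigma ** 2)) for xj in xs] for xi in xs]
    for i in range(n):
        K[i][i] += lam
    a = solve(K, list(ys))
    return lambda q: sum(ai * math.exp(-(q - xi) ** 2 / (2 * sigma ** 2))
                         for ai, xi in zip(a, xs))

xs = [0.0, 0.5, 1.0, 1.5, 2.0]
ys = [math.sin(x) for x in xs]
f = fit_rbf(xs, ys)   # near-interpolant of sin at the data for tiny lam
```

With λ near zero the network interpolates the data; larger λ trades fidelity for smoothness, which is the regularization principle the paper generalizes to additive and ridge models.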
GTM: The generative topographic mapping
Neural Computation, 1998
Abstract

Cited by 275 (5 self)
Latent variable models represent the probability density of data in a space of several dimensions in terms of a smaller number of latent, or hidden, variables. A familiar example is factor analysis, which is based on a linear transformation between the latent space and the data space. In this paper we introduce a form of nonlinear latent variable model called the Generative Topographic Mapping, for which the parameters of the model can be determined using the EM algorithm. GTM provides a principled alternative to the widely used Self-Organizing Map (SOM) of Kohonen (1982), and overcomes most of the significant limitations of the SOM. We demonstrate the performance of the GTM algorithm on a toy problem and on simulated data from flow diagnostics for a multiphase oil pipeline.
Constructive Incremental Learning from Only Local Information, 1998
Abstract

Cited by 160 (37 self)
... This article illustrates the potential learning capabilities of purely local learning and offers an interesting and powerful approach to learning with receptive fields.
Local Regression: Automatic Kernel Carpentry
Statistical Science, 1993
Abstract

Cited by 107 (2 self)
A kernel smoother is an intuitive estimate of a regression function or conditional expectation; at each point x_0 the estimate of E(Y | x_0) is a weighted mean of the sample Y_i, with observations close to x_0 receiving the largest weights. Unfortunately this simplicity has flaws. At the boundary of the predictor space, the kernel neighborhood is asymmetric and the estimate may have substantial bias. Bias can be a problem in the interior as well if the predictors are nonuniform or if the regression function has substantial curvature. These problems are particularly severe when the predictors are multidimensional. A variety of kernel modifications have been proposed to provide approximate and asymptotic adjustment for these biases. Such methods generally place substantial restrictions on the regression problems that can be considered; in unfavorable situations, they can perform very poorly. Moreover, the necessary modifications are very difficult to implement in the multidimensional...
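The weighted-mean form of the kernel smoother, and the boundary bias it describes, are easy to exhibit numerically. In this sketch (the Gaussian kernel and bandwidth are illustrative choices), the kernel estimate reproduces a linear function at an interior point, where the neighborhood is symmetric, but is biased at the boundary, where it is one-sided:

```python
import math

def kernel_smooth(xs, ys, x0, h=0.2):
    """Kernel smoother: a kernel-weighted mean of the sample Y_i."""
    w = [math.exp(-0.5 * ((x - x0) / h) ** 2) for x in xs]
    return sum(wi * y for wi, y in zip(w, ys)) / sum(w)

xs = [i / 20 for i in range(21)]   # grid on [0, 1]
ys = xs[:]                         # true regression function m(x) = x

interior = kernel_smooth(xs, ys, 0.5)  # symmetric neighborhood: ~0.5
boundary = kernel_smooth(xs, ys, 0.0)  # one-sided neighborhood: biased above 0
```

The boundary estimate is pulled well above the true value m(0) = 0 because every neighbor lies to the right of the query; locally weighted linear fitting, the paper's "automatic kernel carpentry", removes exactly this first-order bias.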
Smoothing by Local Regression: Principles and Methods
Abstract

Cited by 88 (1 self)
In this paper we describe two adaptive procedures, one based on C_p and the other based on cross-validation. Still, when we have a final adaptive fit in hand, it is critical to subject it to graphical diagnostics to study its performance. The important implication of these statements is that the above choices must be tailored to each data set in practice; that is, the choices represent a modeling of the data. It is widely accepted that in global parametric regression there are a variety of choices that must be made, for example the parametric family to be fitted and the form of the distribution of the response, and that we must rely on our knowledge of the mechanism generating the data, on model selection diagnostics, and on graphical diagnostic methods to make the choices. The same is true for smoothing. Cleveland (1993) presents many examples of this modeling process. For example, in one application, oxides of nitrogen from an automobile engine are fitted to the equivalence ratio, E, of the fuel and the compression ratio, C, of the engine. Coplots show that it is reasonable to use quadratics as the local parametric family, but with the added assumption that given E the fitted f
The Specification of Conditional Expectations, 1991
Abstract

Cited by 43 (6 self)
This paper was written while the author was visiting the Graduate School of Business at the University of Chicago. This paper incorporates some results previously circulated in "Is the Expected Compensation for Market Volatility Constant Through Time?" and "On the Linearity of Conditionally Expected Returns." I have benefitted from the comments of Daniel Beneish, Marshall Blume, Doug Breeden, Wayne Ferson, Doug Foster, Mike Giarla, Mike Hemler, Ravi Jagannathan, Dan Nelson, Adrian Pagan, Tom Smith, Rob Stambaugh, S
Distribution-free consistency results in nonparametric discrimination and regression function estimation
Annals of Statistics, 1980
Abstract

Cited by 41 (9 self)
... (X_n, Y_n) be a random sample drawn from its distribution. We study the consistency properties of the kernel estimate m_n(x) of the regression function m(x) = E{Y | X = x} that is defined by m_n(x) = [Σ_{i=1}^n Y_i k((X_i − x)/h_n)] / [Σ_{i=1}^n k((X_i − x)/h_n)], where k is a bounded nonnegative function on R^d with compact support and (h_n) is a sequence of positive numbers satisfying h_n → 0 and n h_n^d → ∞. It is shown that E{∫ |m_n(x) − m(x)|^p μ(dx)} → 0 whenever E{|Y|^p} < ∞ (p ≥ 1). No other restrictions are placed on the distribution of (X, Y). The result is applied to verify the Bayes risk consistency of the corresponding discrimination rules. 1. Introduction and summary. In this paper we present consistency results for the nonparametric regression function estimation problem. Assume that (X, Y), (X_1, Y_1), ..., (X_n, Y_n) are independent identically distributed R^d × R-valued random vectors with E{|Y|} < ∞. The purpose is to estimate the regression function m(x) = E{Y | X = x}
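The flavor of the consistency statement can be checked numerically: for a fixed smooth regression function, the kernel estimate's average error shrinks as the sample grows while the bandwidth h_n shrinks. A deterministic sketch (noise-free data on a grid; the Gaussian kernel and bandwidth schedule are illustrative, not the paper's compactly supported k):

```python
import math

def kernel_estimate(xs, ys, x0, h):
    """The kernel regression estimate m_n(x0) from the abstract."""
    w = [math.exp(-0.5 * ((x - x0) / h) ** 2) for x in xs]
    return sum(wi * y for wi, y in zip(w, ys)) / sum(w)

def mean_abs_error(n, h):
    """Average |m_n(x) - m(x)| over a grid, for m(x) = x^2 on [0, 1]."""
    xs = [i / (n - 1) for i in range(n)]
    ys = [x * x for x in xs]
    return sum(abs(kernel_estimate(xs, ys, x, h) - x * x) for x in xs) / n

coarse = mean_abs_error(11, 0.3)    # few points, wide bandwidth
fine = mean_abs_error(201, 0.05)    # more points, narrower bandwidth
```

As the theorem's conditions suggest (h_n → 0 with n h_n → ∞ in one dimension), the error of the finer estimate is markedly smaller than that of the coarse one.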
From isolation to cooperation: An alternative view of a system of experts
Advances in Neural Information Processing Systems 8, 1996
Abstract

Cited by 27 (5 self)
We introduce a constructive, incremental learning system for regression problems that models data by means of locally linear experts. In contrast to other approaches, the experts are trained independently and do not compete for data during learning. Only when a prediction for a query is required do the experts cooperate by blending their individual predictions. Each expert is trained by minimizing a penalized local cross-validation error using second-order methods. In this way, an expert is able to find a local distance metric by adjusting the size and shape of the receptive field in which its predictions are valid, and also to detect relevant input features by adjusting its bias on the importance of individual input dimensions. We derive asymptotic results for our method. In a variety of simulations the properties of the algorithm are demonstrated with respect to interference, learning speed, prediction accuracy, feature detection, and task-oriented incremental learning.
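A toy version of the scheme can illustrate the train-in-isolation, predict-in-cooperation split: each expert fits a local linear model independently within its own receptive field, and only at query time are the experts' predictions blended by their receptive-field activations. The centers, fixed Gaussian receptive fields, and simple weighted least squares here are illustrative stand-ins for the paper's tuned distance metrics and second-order training:

```python
import math

def fit_local_linear(xs, ys, center, width):
    """Train one expert independently: weighted least squares for
    y ~ a + b*x with a Gaussian receptive field around `center`."""
    w = [math.exp(-0.5 * ((x - center) / width) ** 2) for x in xs]
    sw = sum(w)
    sx = sum(wi * x for wi, x in zip(w, xs))
    sy = sum(wi * y for wi, y in zip(w, ys))
    sxx = sum(wi * x * x for wi, x in zip(w, xs))
    sxy = sum(wi * x * y for wi, x, y in zip(w, xs, ys))
    b = (sw * sxy - sx * sy) / (sw * sxx - sx * sx)
    a = (sy - b * sx) / sw
    return a, b

def blend_predict(experts, query):
    """Experts cooperate only at prediction time: blend their individual
    predictions, weighted by each receptive field's activation."""
    num = den = 0.0
    for a, b, center, width in experts:
        act = math.exp(-0.5 * ((query - center) / width) ** 2)
        num += act * (a + b * query)
        den += act
    return num / den

xs = [i / 10 - 2 for i in range(41)]   # grid on [-2, 2]
ys = [abs(x) for x in xs]              # nonlinear target m(x) = |x|

experts = []
for center in (-1.0, 1.0):             # one expert per region
    a, b = fit_local_linear(xs, ys, center, 0.5)
    experts.append((a, b, center, 0.5))
```

Neither expert ever sees the other during training, yet the blended prediction tracks |x| across both regions; this is the "isolation to cooperation" contrast with competitive mixture-of-experts training.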
Local Polynomial Estimation of Regression Functions for Mixing Processes
Abstract

Cited by 25 (9 self)
Local polynomial fitting has many exciting statistical properties which were established in the i.i.d. setting. However, nonlinear time series modeling, the construction of predictive intervals, and the understanding of divergence of nonlinear time series require the development of the theory of local polynomial fitting for dependent data. In this paper, we study the problem of estimating conditional mean functions and their derivatives via a local polynomial fit. The functions include conditional moments, conditional distributions, as well as conditional density functions. Joint asymptotic normality for derivative estimation is established for both strongly mixing and ρ-mixing processes.
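The estimator itself is easy to sketch: fit a weighted polynomial in (x − x_0) around the point of interest, and the fitted coefficients estimate the function value and its derivatives there. A minimal local-quadratic sketch (the Gaussian weights and bandwidth are illustrative choices; the paper's contribution is the asymptotic theory under dependence, not this i.i.d.-style computation):

```python
import math

def solve(A, b):
    """Solve a small linear system by Gaussian elimination with pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def local_quadratic(xs, ys, x0, h=0.2):
    """Weighted quadratic fit around x0: beta[0] estimates m(x0) and
    beta[1] estimates the derivative m'(x0)."""
    w = [math.exp(-0.5 * ((x - x0) / h) ** 2) for x in xs]
    # Normal equations for the design [1, (x - x0), (x - x0)^2].
    S = [sum(wi * (x - x0) ** k for wi, x in zip(w, xs)) for k in range(5)]
    T = [sum(wi * (x - x0) ** k * y for wi, x, y in zip(w, xs, ys)) for k in range(3)]
    A = [[S[0], S[1], S[2]], [S[1], S[2], S[3]], [S[2], S[3], S[4]]]
    return solve(A, T)

xs = [i / 50 for i in range(101)]     # grid on [0, 2]
ys = [math.sin(x) for x in xs]
beta = local_quadratic(xs, ys, 1.0)   # beta[0] ~ sin(1), beta[1] ~ cos(1)
```

Replacing the grid with consecutive observations of a time series gives the dependent-data setting the paper analyzes; the fitting step is unchanged.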