Results 1 - 10 of 50
Locally weighted learning
- Artificial Intelligence Review, 1997
Cited by 370 (43 self)
This paper surveys locally weighted learning, a form of lazy learning and memory-based learning, and focuses on locally weighted linear regression. The survey discusses distance functions, smoothing parameters, weighting functions, local model structures, regularization of the estimates and bias, assessing predictions, handling noisy data and outliers, improving the quality of predictions by tuning fit parameters, interference between old and new data, implementing locally weighted learning efficiently, and applications of locally weighted learning. A companion paper surveys how locally weighted learning can be used in robot learning and control.
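As a rough illustration of the locally weighted linear regression this abstract focuses on, the sketch below fits a weighted straight line around a single query point. The function name `lwr_predict`, the Gaussian weighting function, the bandwidth value, and the toy data are all assumptions made for this example and are not taken from the paper.

```python
import math

def lwr_predict(xs, ys, query, bandwidth=0.2):
    """Locally weighted linear regression prediction at one query point (1-D)."""
    # Gaussian weighting of each sample by its distance to the query (assumed choice).
    w = [math.exp(-0.5 * ((x - query) / bandwidth) ** 2) for x in xs]
    sw = sum(w)
    # Weighted means of the inputs and outputs.
    mx = sum(wi * x for wi, x in zip(w, xs)) / sw
    my = sum(wi * y for wi, y in zip(w, ys)) / sw
    # Weighted least-squares slope of the local line.
    sxx = sum(wi * (x - mx) ** 2 for wi, x in zip(w, xs))
    sxy = sum(wi * (x - mx) * (y - my) for wi, x, y in zip(w, xs, ys))
    slope = sxy / sxx
    # Evaluate the local line at the query.
    return my + slope * (query - mx)

xs = [i / 10 for i in range(21)]   # toy inputs on [0, 2]
ys = [math.sin(x) for x in xs]     # noiseless toy targets
print(lwr_predict(xs, ys, 1.0))
```

The bandwidth plays the role of the smoothing parameter mentioned in the abstract: a smaller value makes the fit more local, a larger one makes it behave more like a single global line.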
Regularization Theory and Neural Networks Architectures
- Neural Computation, 1995
Cited by 257 (30 self)
We had previously shown that regularization principles lead to approximation schemes which are equivalent to networks with one layer of hidden units, called Regularization Networks. In particular, standard smoothness functionals lead to a subclass of regularization networks, the well known Radial Basis Functions approximation schemes. This paper shows that regularization networks encompass a much broader range of approximation schemes, including many of the popular general additive models and some of the neural networks. In particular, we introduce new classes of smoothness functionals that lead to different classes of basis functions. Additive splines as well as some tensor product splines can be obtained from appropriate classes of smoothness functionals. Furthermore, the same generalization that extends Radial Basis Functions (RBF) to Hyper Basis Functions (HBF) also leads from additive models to ridge approximation models, containing as special cases Breiman's hinge functions, som...
GTM: The generative topographic mapping
- Neural Computation, 1998
Cited by 234 (5 self)
Latent variable models represent the probability density of data in a space of several dimensions in terms of a smaller number of latent, or hidden, variables. A familiar example is factor analysis, which is based on a linear transformation between the latent space and the data space. In this paper we introduce a form of non-linear latent variable model called the Generative Topographic Mapping, for which the parameters of the model can be determined using the EM algorithm. GTM provides a principled alternative to the widely used Self-Organizing Map (SOM) of Kohonen (1982), and overcomes most of the significant limitations of the SOM. We demonstrate the performance of the GTM algorithm on a toy problem and on simulated data from flow diagnostics for a multi-phase oil pipeline. Copyright © MIT Press (1998).
Constructive Incremental Learning from Only Local Information
1998
Cited by 126 (35 self)
... This article illustrates the potential learning capabilities of purely local learning and offers an interesting and powerful approach to learning with receptive fields.
Local Regression: Automatic Kernel Carpentry
- Statistical Science, 1993
Cited by 93 (2 self)
A kernel smoother is an intuitive estimate of a regression function or conditional expectation; at each point $x_0$ the estimate of $E(Y \mid x_0)$ is a weighted mean of the sample $Y_i$, with observations close to $x_0$ receiving the largest weights. Unfortunately this simplicity has flaws. At the boundary of the predictor space, the kernel neighborhood is asymmetric and the estimate may have substantial bias. Bias can be a problem in the interior as well if the predictors are nonuniform or if the regression function has substantial curvature. These problems are particularly severe when the predictors are multidimensional. A variety of kernel modifications have been proposed to provide approximate and asymptotic adjustment for these biases. Such methods generally place substantial restrictions on the regression problems that can be considered; in unfavorable situations, they can perform very poorly. Moreover, the necessary modifications are very difficult to implement in the multidimensional...
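The weighted-mean estimate and the boundary bias described in this abstract can be seen in a few lines. The sketch below is only an illustration: the Epanechnikov kernel, the bandwidth `h=0.2`, the helper names, and the toy data with regression function m(x) = x are assumptions chosen for the example, not details from the paper.

```python
def epanechnikov(u):
    """Compactly supported kernel: nonzero only for |u| < 1."""
    return 0.75 * (1.0 - u * u) if abs(u) < 1.0 else 0.0

def kernel_smooth(xs, ys, x0, h):
    """Weighted mean of ys, with weights given by the kernel at (x - x0) / h."""
    w = [epanechnikov((x - x0) / h) for x in xs]
    total = sum(w)
    if total == 0.0:
        raise ValueError("no observations within bandwidth of x0")
    return sum(wi * y for wi, y in zip(w, ys)) / total

xs = [i / 100 for i in range(101)]   # evenly spaced predictors on [0, 1]
ys = list(xs)                        # noiseless responses, m(x) = x
interior = kernel_smooth(xs, ys, 0.5, h=0.2)
boundary = kernel_smooth(xs, ys, 0.0, h=0.2)
# Interior error is essentially zero; the boundary error is clearly positive,
# because every neighbor of x0 = 0 lies to its right.
print(interior - 0.5, boundary - 0.0)
```

At the interior point the neighborhood is symmetric, so the errors on either side cancel. At the left boundary all neighbors lie on one side, so the weighted mean is pulled toward larger responses.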
Smoothing by Local Regression: Principles and Methods
Cited by 69 (1 self)
this paper we describe two adaptive procedures, one based on $C_p$ and the other based on cross-validation. Still, when we have a final adaptive fit in hand, it is critical to subject it to graphical diagnostics to study its performance. The important implication of these statements is that the above choices must be tailored to each data set in practice; that is, the choices represent a modeling of the data. It is widely accepted that in global parametric regression there are a variety of choices that must be made (for example, the parametric family to be fitted and the form of the distribution of the response) and that we must rely on our knowledge of the mechanism generating the data, on model selection diagnostics, and on graphical diagnostic methods to make the choices. The same is true for smoothing. Cleveland (1993) presents many examples of this modeling process. For example, in one application, oxides of nitrogen from an automobile engine are fitted to the equivalence ratio, E, of the fuel and the compression ratio, C, of the engine. Coplots show that it is reasonable to use quadratics as the local parametric family but with the added assumption that given E the fitted f
Distribution-free consistency results in nonparametric discrimination and regression function estimation
- Ann. Statist., 1980
Cited by 37 (7 self)
$(X_n, Y_n)$ be a random sample drawn from its distribution. We study the consistency properties of the kernel estimate $m_n(x)$ of the regression function $m(x) = E\{Y \mid X = x\}$ that is defined by $m_n(x) = \sum_{i=1}^{n} Y_i\, k((X_i - x)/h_n) \big/ \sum_{i=1}^{n} k((X_i - x)/h_n)$, where $k$ is a bounded nonnegative function on $\mathbb{R}^d$ with compact support and $\{h_n\}$ is a sequence of positive numbers satisfying $h_n \to 0$ and $n h_n^d \to \infty$ as $n \to \infty$. It is shown that $E\{\int |m_n(x) - m(x)|^p \mu(dx)\} \to 0$ whenever $E(|Y|^p) < \infty$ $(p > 1)$. No other restrictions are placed on the distribution of $(X, Y)$. The result is applied to verify the Bayes risk consistency of the corresponding discrimination rules. 1. Introduction and summary. In this paper we present consistency results for the nonparametric regression function estimation problem. Assume that $(X, Y), (X_1, Y_1), \ldots, (X_n, Y_n)$ are independent identically distributed $\mathbb{R}^d \times \mathbb{R}$-valued random vectors with $E\{|Y|\} < \infty$. The purpose is to estimate the regression function $m(x) = E\{Y \mid X = x\}$
The Specification of Conditional Expectations
1991
Cited by 28 (4 self)
this paper was written while the author was visiting the Graduate School of Business at the University of Chicago. This paper incorporates some results previously circulated in "Is the Expected Compensation for Market Volatility Constant Through Time?" and "On the Linearity of Conditionally Expected Returns." I have benefitted from the comments of Daniel Beneish, Marshall Blume, Doug Breeden, Wayne Ferson, Doug Foster, Mike Giarla, Mike Hemler, Ravi Jagannathan, Dan Nelson, Adrian Pagan, Tom Smith, Rob Stambaugh, S
From isolation to cooperation: An alternative view of a system of experts
- Advances in Neural Information Processing Systems 8, 1996
Cited by 26 (4 self)
We introduce a constructive, incremental learning system for regression problems that models data by means of locally linear experts. In contrast to other approaches, the experts are trained independently and do not compete for data during learning. Only when a prediction for a query is required do the experts cooperate by blending their individual predictions. Each expert is trained by minimizing a penalized local cross validation error using second order methods. In this way, an expert is able to find a local distance metric by adjusting the size and shape of the receptive field in which its predictions are valid, and also to detect relevant input features by adjusting its bias on the importance of individual input dimensions. We derive asymptotic results for our method. In a variety of simulations the properties of the algorithm are demonstrated with respect to interference, learning speed, prediction accuracy, feature detection, and task oriented incremental learning.
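One plausible reading of "blending their individual predictions" is a normalized weighted average, where each locally linear expert is weighted by how strongly its receptive field responds to the query. The sketch below assumes that reading. The Gaussian receptive field, the tuple layout `(center, width, slope, intercept)`, and the name `blend` are hypothetical choices for illustration; the abstract does not specify them, and training of the experts is not shown.

```python
import math

def blend(experts, query):
    """Blend local linear experts given as (center, width, slope, intercept)."""
    num = den = 0.0
    for center, width, slope, intercept in experts:
        # Receptive-field activation (assumed Gaussian) sets this expert's weight.
        w = math.exp(-0.5 * ((query - center) / width) ** 2)
        # Local linear prediction, expressed relative to the expert's center.
        y = intercept + slope * (query - center)
        num += w * y
        den += w
    # Normalized weighted average of the individual predictions.
    return num / den

# Two hand-set experts; in the paper each would be trained independently.
experts = [(0.0, 0.3, 1.0, 0.0), (1.0, 0.3, -1.0, 1.0)]
print(blend(experts, 0.5))
```

Because the weights are computed only at query time, the experts never interact during training, which matches the abstract's point that they do not compete for data.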
Nonparametric Estimation and Testing of Interaction in Additive Models
2002
Cited by 18 (6 self)
We consider an additive model with second order interaction terms. Both marginal integration estimators and a combined backfitting-integration estimator are proposed for all components of the model and their derivatives. The corresponding asymptotic distributions are derived. Moreover, two test statistics for testing the presence of interactions are proposed. Asymptotics for the test functions and local power results are obtained. Since direct implementation of the test procedure based on the asymptotics would produce inaccurate results unless the number of observations is very large, a bootstrap procedure is provided, which is applicable for small or moderate sample sizes. Further, based on these methods a general test for additivity is developed. Estimation and testing methods are shown to work well in simulation studies. Finally, our methods

