Results 1-10 of 70
Stacked generalization
Neural Networks, 1992
Cited by 550 (7 self)
This paper introduces stacked generalization, a scheme for minimizing the generalization error rate of one or more generalizers. Stacked generalization works by deducing the biases of the generalizer(s) with respect to a provided learning set. This deduction proceeds by generalizing in a second space whose inputs are (for example) the guesses of the original generalizers when taught with part of the learning set and trying to guess the rest of it, and whose output is (for example) the correct guess. When used with multiple generalizers, stacked generalization can be seen as a more sophisticated version of cross-validation, exploiting a strategy more sophisticated than cross-validation’s crude winner-takes-all for combining the individual generalizers. When used with a single generalizer, stacked generalization is a scheme for estimating (and then correcting for) the error of a generalizer which has been trained on a particular learning set and then asked a particular question. After introducing stacked generalization and justifying its use, this paper presents two numerical experiments. The first demonstrates how stacked generalization improves upon a set of separate generalizers for the NETtalk task of translating text to phonemes. The second demonstrates how stacked generalization improves the performance of a single surface-fitter. With the other experimental evidence in the literature, the usual arguments supporting cross-validation, and the abstract justifications presented in this paper, the conclusion is that for almost any real-world generalization problem one should use some version of stacked generalization to minimize the generalization error rate. This paper ends by discussing some of the variations of stacked generalization, and how it touches on other fields like chaos theory. Key Words: generalization and induction, combining generalizers, learning set preprocessing, cross-validation, error estimation and correction.
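The "second space" construction in this abstract can be illustrated with a small sketch. It is not the paper's experimental setup: the two level-0 generalizers (`fit_linear`, `fit_nearest`), the leave-one-out partitioning, and the least-squares level-1 combiner are all assumptions chosen to keep the example short, and every function name is hypothetical.

```python
def fit_linear(xs, ys):
    """Level-0 generalizer: least-squares line, returned as a predictor."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    return lambda x: my + slope * (x - mx)


def fit_nearest(xs, ys):
    """Level-0 generalizer: return the output of the nearest stored input."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]


def fit_stacked(xs, ys, fit_a=fit_linear, fit_b=fit_nearest):
    """Combine two level-0 generalizers with weights learned from held-out guesses."""
    # Level-1 learning set: each point is guessed by generalizers taught on the rest.
    ga, gb = [], []
    for i in range(len(xs)):
        rest_x, rest_y = xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:]
        ga.append(fit_a(rest_x, rest_y)(xs[i]))
        gb.append(fit_b(rest_x, rest_y)(xs[i]))
    # Level-1 generalizer (an assumption): least-squares weights via 2x2 normal equations.
    a = sum(g * g for g in ga)
    b = sum(g * h for g, h in zip(ga, gb))
    c = sum(h * h for h in gb)
    d = sum(g * y for g, y in zip(ga, ys))
    e = sum(h * y for h, y in zip(gb, ys))
    det = a * c - b * b
    if abs(det) < 1e-12:  # guesses are collinear: fall back to a plain average
        wa = wb = 0.5
    else:
        wa, wb = (d * c - b * e) / det, (a * e - b * d) / det
    # Final predictor: level-0 generalizers retaught on the whole learning set.
    full_a, full_b = fit_a(xs, ys), fit_b(xs, ys)
    return (lambda x: wa * full_a(x) + wb * full_b(x)), (wa, wb)
```

Calling `fit_stacked(xs, ys)` with plain Python lists returns the combined predictor and the two learned weights. Unlike winner-takes-all cross-validation, both generalizers can contribute to the final guess.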
Locally weighted learning
Artificial Intelligence Review, 1997
Cited by 448 (52 self)
This paper surveys locally weighted learning, a form of lazy learning and memory-based learning, and focuses on locally weighted linear regression. The survey discusses distance functions, smoothing parameters, weighting functions, local model structures, regularization of the estimates and bias, assessing predictions, handling noisy data and outliers, improving the quality of predictions by tuning fit parameters, interference between old and new data, implementing locally weighted learning efficiently, and applications of locally weighted learning. A companion paper surveys how locally weighted learning can be used in robot learning and control.
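As a rough illustration of locally weighted linear regression, the sketch below fits a weighted least-squares line around each query point. The Gaussian weighting function, the one-dimensional inputs, and the names `lwr_predict` and `bandwidth` are assumptions made for this example; the survey covers many other distance and weighting choices.

```python
import math


def lwr_predict(xs, ys, query, bandwidth=1.0):
    """Predict at `query` with a line fitted by distance-weighted least squares."""
    # Weighting function (assumed Gaussian): nearby stored points count more.
    ws = [math.exp(-0.5 * ((x - query) / bandwidth) ** 2) for x in xs]
    sw = sum(ws)
    # Weighted means of inputs and outputs.
    mx = sum(w * x for w, x in zip(ws, xs)) / sw
    my = sum(w * y for w, y in zip(ws, ys)) / sw
    # Weighted variance and covariance give the slope of the local line.
    sxx = sum(w * (x - mx) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - mx) * (y - my) for w, x, y in zip(ws, xs, ys))
    slope = sxy / sxx if sxx > 0.0 else 0.0
    return my + slope * (query - mx)
```

Nothing is fitted ahead of time: the data are stored and a new local model is built for every query, which is what makes the method "lazy". The `bandwidth` argument plays the role of the smoothing parameter.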
A Theory of Networks for Approximation and Learning
Laboratory, Massachusetts Institute of Technology, 1989
Cited by 194 (24 self)
Learning an input-output mapping from a set of examples, of the type that many neural networks have been constructed to perform, can be regarded as synthesizing an approximation of a multidimensional function, that is, solving the problem of hypersurface reconstruction. From this point of view, this form of learning is closely related to classical approximation techniques, such as generalized splines and regularization theory. This paper considers the problems of an exact representation and, in more detail, of the approximation of linear and nonlinear mappings in terms of simpler functions of fewer variables. Kolmogorov's theorem concerning the representation of functions of several variables in terms of functions of one variable turns out to be almost irrelevant in the context of networks for learning. We develop a theoretical framework for approximation based on regularization techniques that leads to a class of three-layer networks that we call Generalized Radial Basis Functions (GRBF), since they are mathematically related to the well-known Radial Basis Functions, mainly used for strict interpolation tasks. GRBF networks are not only equivalent to generalized splines, but are also closely related to pattern recognition methods such as Parzen windows and potential functions and to several neural network algorithms, such as Kanerva's associative memory, backpropagation and Kohonen's topology preserving map. They also have an interesting interpretation in terms of prototypes that are synthesized and optimally combined during the learning stage. The paper introduces several extensions and applications of the technique and discusses intriguing analogies with neurobiological data.
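The relation between strict interpolation and regularization can be sketched with a small radial basis function fit. This is a simplification, not the GRBF scheme itself: it places one Gaussian basis function on every data point (GRBF networks use fewer, movable centers), and the names `fit_rbf`, `width`, and `lam` are hypothetical.

```python
import math


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_rbf(xs, ys, width=1.0, lam=0.0):
    """Fit f(x) = sum_i c_i * G(x, x_i) by solving (G + lam * I) c = y."""
    def kernel(u, v):
        return math.exp(-((u - v) / width) ** 2)

    gram = [[kernel(xi, xj) + (lam if i == j else 0.0)
             for j, xj in enumerate(xs)] for i, xi in enumerate(xs)]
    coeffs = solve(gram, ys)
    return lambda x: sum(c * kernel(x, xi) for c, xi in zip(coeffs, xs))
```

With `lam=0.0` the fit passes through every training point (strict interpolation). A positive `lam` trades exactness at the data for a smoother function.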
Constructive Incremental Learning from Only Local Information
1998
Cited by 160 (37 self)
... This article illustrates the potential learning capabilities of purely local learning and offers an interesting and powerful approach to learning with receptive fields.
Locally Weighted Learning for Control
1996
Cited by 159 (17 self)
Lazy learning methods provide useful representations and training algorithms for learning about complex phenomena during autonomous adaptive control of complex systems. This paper surveys ways in which locally weighted learning, a type of lazy learning, has been applied by us to control tasks. We explain various forms that control tasks can take, and how this affects the choice of learning paradigm. The discussion section explores the interesting impact that explicitly remembering all previous experiences has on the problem of learning to control.
Finding Chaos in Noisy Systems
1991
Cited by 50 (1 self)
In the past twenty years there has been much interest in the physical and biological sciences in nonlinear dynamical systems that appear to have random, unpredictable behavior. One important parameter of a dynamic system is the dominant Lyapunov exponent (LE). When the behavior of the system is compared for two similar initial conditions, this exponent is related to the rate at which the subsequent trajectories diverge. A bounded system with a positive LE is one operational definition of chaotic behavior. Most methods for determining the LE have assumed thousands of observations generated from carefully controlled physical experiments. Less attention has been given to estimating the LE for biological and economic systems that are subjected to random perturbations and observed over a limited amount of time. Using nonparametric regression techniques (Neural Networks and Thin Plate Splines) it is possible to consistently estimate the LE. The properties of these methods have been studied using simulated data and are applied to a biological time series: marten fur returns for the Hudson Bay Company (1820-1900). Based on a nonparametric analysis there is little evidence for low-dimensional chaos in these data. Although these methods appear to work well for systems perturbed by small amounts of noise, finding chaos in a system with a significant stochastic component may be difficult.
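To make the dominant Lyapunov exponent concrete, the sketch below averages the log of the local stretching rate along an orbit of the logistic map. The logistic map, the starting point, and the function name are assumptions made for illustration. Here the map's known derivative is used directly, whereas the paper estimates the map from noisy data by nonparametric regression.

```python
import math


def lyapunov_logistic(r, x0=0.3, n_transient=1000, n_steps=20000):
    """Estimate the Lyapunov exponent of x -> r * x * (1 - x) along one orbit."""
    x = x0
    # Discard a transient so the orbit settles onto its long-run behavior.
    for _ in range(n_transient):
        x = r * x * (1.0 - x)
    total = 0.0
    for _ in range(n_steps):
        # |f'(x)| is the factor by which a tiny perturbation grows in one step.
        deriv = abs(r * (1.0 - 2.0 * x))
        total += math.log(max(deriv, 1e-300))  # guard against log(0)
        x = r * x * (1.0 - x)
    return total / n_steps
```

A positive value means nearby trajectories separate exponentially on average, and a negative value means they converge. For `r = 4.0` the exact exponent is ln 2, so the estimate should be close to 0.69. For `r = 2.5` the orbit settles on a stable fixed point and the estimate is negative.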
Interdisciplinary application of nonlinear time series methods
Phys. Reports, 1998
Cited by 42 (5 self)
This paper reports on the application to field measurements of time series methods developed on the basis of the theory of deterministic chaos. The major difficulties are pointed out that arise when the data cannot be assumed to be purely deterministic and the potential that remains in this situation is discussed. For signals with weakly nonlinear structure, the presence of nonlinearity in a general sense has to be inferred statistically. The paper reviews the relevant methods and discusses the implications for deterministic modeling. Most field measurements yield nonstationary time series, which poses a severe problem for their analysis. Recent progress in the detection and understanding of nonstationarity is reported. If a clear signature of approximate determinism is found, the notions of phase space, attractors, invariant manifolds etc. provide a convenient framework for time series analysis. Although the results have to be interpreted with great care, superior performance can be achieved for typical signal processing tasks. In particular, prediction and filtering of signals are discussed, as well as the classification of system states by means of time series recordings.
The Maintenance of Uncertainty
in Control Systems, 1997
Cited by 27 (6 self)
It is important to remain uncertain, of observation, model and law. For the Fermi Summer School. Criticisms requested, email: lenny@maths.ox.ac.uk
Generalized Redundancies for Time Series Analysis
Physica D, 1995
Cited by 27 (0 self)
Extensions to various information-theoretic quantities (such as entropy, redundancy, and mutual information) are discussed in the context of their role in nonlinear time series analysis. We also discuss "linearized" versions of these quantities and their use as benchmarks in tests for nonlinearity. Many of these quantities can be expressed in terms of the generalized correlation integral, and this expression permits us to more clearly exhibit the relationships of these quantities to each other and to other commonly used nonlinear statistics (such as the BDS and Green-Savit statistics). Further, numerical estimation of these quantities is found to be more accurate and more efficient when the correlation integral is employed in the computation. Finally, we consider several "local" versions of these quantities, including a local Kolmogorov-Sinai entropy, which gives an estimate of variability of the short-term predictability.
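For orientation, the ordinary (order-2) correlation integral is the fraction of pairs of delay vectors that lie within a given radius of each other. The sketch below is a brute-force version under stated assumptions: a maximum norm, a "less than or equal" comparison, and hypothetical names `delay_embed` and `correlation_integral`. It does not implement the generalized form or the estimators discussed in the paper.

```python
def delay_embed(series, dim, delay=1):
    """Build delay vectors (x[i], x[i + delay], ..., x[i + (dim - 1) * delay])."""
    n = len(series) - (dim - 1) * delay
    return [tuple(series[i + k * delay] for k in range(dim)) for i in range(n)]


def correlation_integral(series, radius, dim=1, delay=1):
    """Fraction of distinct pairs of delay vectors within `radius` (max norm)."""
    points = delay_embed(series, dim, delay)
    n = len(points)
    close = 0
    for i in range(n):
        for j in range(i + 1, n):
            dist = max(abs(a - b) for a, b in zip(points[i], points[j]))
            if dist <= radius:
                close += 1
    return 2.0 * close / (n * (n - 1))
```

The double loop costs time quadratic in the series length, so this form only suits short series. Comparing the values at different radii and embedding dimensions is the usual starting point for statistics built on this quantity.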