Stacked generalization
 Neural Networks
, 1992
Cited by 556 (7 self)
Abstract: This paper introduces stacked generalization, a scheme for minimizing the generalization error rate of one or more generalizers. Stacked generalization works by deducing the biases of the generalizer(s) with respect to a provided learning set. This deduction proceeds by generalizing in a second space whose inputs are (for example) the guesses of the original generalizers when taught with part of the learning set and trying to guess the rest of it, and whose output is (for example) the correct guess. When used with multiple generalizers, stacked generalization can be seen as a more sophisticated version of cross-validation, exploiting a strategy more sophisticated than cross-validation’s crude winner-takes-all for combining the individual generalizers. When used with a single generalizer, stacked generalization is a scheme for estimating (and then correcting for) the error of a generalizer which has been trained on a particular learning set and then asked a particular question. After introducing stacked generalization and justifying its use, this paper presents two numerical experiments. The first demonstrates how stacked generalization improves upon a set of separate generalizers for the NETtalk task of translating text to phonemes. The second demonstrates how stacked generalization improves the performance of a single surface-fitter. With the other experimental evidence in the literature, the usual arguments supporting cross-validation, and the abstract justifications presented in this paper, the conclusion is that for almost any real-world generalization problem one should use some version of stacked generalization to minimize the generalization error rate. This paper ends by discussing some of the variations of stacked generalization, and how it touches on other fields like chaos theory. Key Words: generalization and induction, combining generalizers, learning set preprocessing, cross-validation, error estimation and correction.
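The "second space" described in this abstract can be illustrated with a small sketch. The code below is only one possible reading, under assumptions that are not in the abstract: the two level-0 generalizers (a least-squares line and a nearest-neighbour rule), the leave-one-out partitioning, and the linear level-1 combiner are all hypothetical choices made for illustration.

```python
import math

def fit_line(points):
    """Level-0 generalizer (assumed): least-squares straight line."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    slope = sxy / sxx if sxx else 0.0
    return lambda q: my + slope * (q - mx)

def fit_nearest(points):
    """Level-0 generalizer (assumed): output of the nearest training input."""
    return lambda q: min(points, key=lambda p: abs(p[0] - q))[1]

def level1_set(points, level0=(fit_line, fit_nearest)):
    """Build the second-space learning set: each generalizer is taught with
    all points but one and asked to guess the held-out point."""
    guesses, targets = [], []
    for i, (x, y) in enumerate(points):
        rest = points[:i] + points[i + 1:]
        guesses.append([fit(rest)(x) for fit in level0])
        targets.append(y)
    return guesses, targets

def fit_combiner(guesses, targets):
    """Level-1 generalizer (assumed): least-squares weights for two guesses,
    obtained from the 2x2 normal equations."""
    a11 = sum(g[0] * g[0] for g in guesses)
    a12 = sum(g[0] * g[1] for g in guesses)
    a22 = sum(g[1] * g[1] for g in guesses)
    b1 = sum(g[0] * t for g, t in zip(guesses, targets))
    b2 = sum(g[1] * t for g, t in zip(guesses, targets))
    det = a11 * a22 - a12 * a12
    return ((b1 * a22 - b2 * a12) / det, (a11 * b2 - a12 * b1) / det)

def stack(points, level0=(fit_line, fit_nearest)):
    """Return a predictor that combines the level-0 guesses with learned weights."""
    weights = fit_combiner(*level1_set(points, level0))
    models = [fit(points) for fit in level0]  # retrained on the full learning set
    return lambda q: sum(w * m(q) for w, m in zip(weights, models))

points = [(0.5 * i, math.sin(0.5 * i)) for i in range(13)]
predict = stack(points)
print(predict(1.25))
```

Setting the weights to (1, 0) or (0, 1) recovers the winner-takes-all choice of a single generalizer, so on the held-out guesses the fitted combination cannot do worse than either generalizer alone.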
A Theory of Networks for Approximation and Learning
 Laboratory, Massachusetts Institute of Technology
, 1989
Cited by 195 (24 self)
Learning an input-output mapping from a set of examples, of the type that many neural networks have been constructed to perform, can be regarded as synthesizing an approximation of a multidimensional function, that is, solving the problem of hypersurface reconstruction. From this point of view, this form of learning is closely related to classical approximation techniques, such as generalized splines and regularization theory. This paper considers the problems of an exact representation and, in more detail, of the approximation of linear and nonlinear mappings in terms of simpler functions of fewer variables. Kolmogorov's theorem concerning the representation of functions of several variables in terms of functions of one variable turns out to be almost irrelevant in the context of networks for learning. We develop a theoretical framework for approximation based on regularization techniques that leads to a class of three-layer networks that we call Generalized Radial Basis Functions (GRBF), since they are mathematically related to the well-known Radial Basis Functions, mainly used for strict interpolation tasks. GRBF networks are not only equivalent to generalized splines, but are also closely related to pattern recognition methods such as Parzen windows and potential functions and to several neural network algorithms, such as Kanerva's associative memory, backpropagation and Kohonen's topology preserving map. They also have an interesting interpretation in terms of prototypes that are synthesized and optimally combined during the learning stage. The paper introduces several extensions and applications of the technique and discusses intriguing analogies with neurobiological data.
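The contrast drawn here between strict interpolation and regularized approximation can be made concrete with a small one-dimensional sketch. Everything specific below is an assumption for illustration, not the paper's formulation: a Gaussian basis function, one centre per data point, and coefficients obtained by solving (G + lam·I)c = y, where lam = 0 gives strict interpolation and lam > 0 trades exactness for smoothness.

```python
import math

def gaussian(r, sigma=1.0):
    """Gaussian radial basis function of the distance r (assumed kernel)."""
    return math.exp(-(r * r) / (2.0 * sigma * sigma))

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def fit_rbf(xs, ys, lam=0.0, sigma=1.0):
    """Fit f(q) = sum_i c_i * gaussian(|q - x_i|) with (G + lam*I) c = y."""
    g = [[gaussian(abs(xi - xj), sigma) + (lam if i == j else 0.0)
          for j, xj in enumerate(xs)] for i, xi in enumerate(xs)]
    coeffs = solve(g, ys)
    return lambda q: sum(c * gaussian(abs(q - x), sigma) for c, x in zip(coeffs, xs))

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [0.0, 1.0, 0.0, -1.0, 0.0]
exact = fit_rbf(xs, ys)            # passes through every data point
smooth = fit_rbf(xs, ys, lam=1.0)  # regularized: no longer exact at the data
print([round(exact(x), 6) for x in xs])
print([round(smooth(x), 3) for x in xs])
```

The generalized networks named in the abstract use fewer centres than data points; placing one centre on each data point keeps this sketch short.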
An Unsupervised Ensemble Learning Method for Nonlinear Dynamic State-Space Models
 Neural Computation
, 2001
Cited by 87 (32 self)
A Bayesian ensemble learning method is introduced for unsupervised extraction of dynamic processes from noisy data. The data are assumed to be generated by an unknown nonlinear mapping from unknown factors. The dynamics of the factors are modeled using a nonlinear state-space model. The nonlinear mappings in the model are represented using multilayer perceptron networks. The proposed method is computationally demanding, but it allows the use of higher dimensional nonlinear latent variable models than other existing approaches. Experiments with chaotic data show that the new method is able to blindly estimate the factors and the dynamic process which have generated the data. It clearly outperforms currently available nonlinear prediction techniques in this very difficult test problem.
Nonlinear Prediction of Chaotic Time Series Using Support Vector Machines
 IEEE Workshop on Neural Networks for Signal Processing VII
, 1997
Cited by 87 (1 self)
A novel method for regression has been recently proposed by V. Vapnik et al. [8, 9]. The technique, called Support Vector Machine (SVM), is very well founded from the mathematical point of view and seems to provide a new insight in function approximation. We implemented the SVM and tested it on the same data base of chaotic time series that was used in [1] to compare the performances of different approximation techniques, including polynomial and rational approximation, local polynomial techniques, Radial Basis Functions, and Neural Networks. The SVM performs better than the approaches presented in [1]. We also study, for a particular time series, the variability in performance with respect to the few free parameters of SVM.
Nonlinear Gated Experts for Time Series: Discovering Regimes and Avoiding Overfitting
, 1995
Cited by 81 (5 self)
this paper: ftp://ftp.cs.colorado.edu/pub/TimeSeries/MyPapers/experts.ps.Z,
Spatial Transformations in the Parietal Cortex Using Basis Functions
, 1997
Cited by 70 (7 self)
Sensorimotor transformations are nonlinear mappings of sensory inputs to motor responses. We explore here the possibility that the responses of single neurons in the parietal cortex serve as basis functions for these transformations. Basis function decomposition is a general method for approximating nonlinear functions that is computationally efficient and well suited for adaptive modification. In particular, the responses of single parietal neurons can be approximated by the product of a Gaussian function of retinal location and a sigmoid function of eye position, called a gain field. A large set of such functions forms a basis set that can be used to perform an arbitrary motor response through a direct projection. We compare this hypothesis with other approaches that are commonly used to model population codes, such as computational maps and vectorial representations. Neither of these alternatives can fully account for the responses of parietal neurons, and they are computationally less efficient for nonlinear transformations. Basis functions also have the advantage of not depending on any coordinate system or reference frame. As a consequence, the position of an object can be represented in multiple reference frames simultaneously, a property consistent with the behavior of hemineglect patients with lesions in the parietal cortex.
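The single-neuron model stated in this abstract, a Gaussian function of retinal location multiplied by a sigmoid function of eye position, can be written down directly. The sketch below assumes one-dimensional positions in degrees; the parameter names and default widths (`sigma`, `slope`) are hypothetical and are not values from the paper.

```python
import math

def gain_field_response(retinal, eye, pref_retinal, eye_threshold,
                        sigma=10.0, slope=8.0):
    """Response of one model unit: Gaussian tuning to retinal location,
    multiplicatively scaled by a sigmoid of eye position (the gain field)."""
    tuning = math.exp(-((retinal - pref_retinal) ** 2) / (2.0 * sigma ** 2))
    gain = 1.0 / (1.0 + math.exp(-(eye - eye_threshold) / slope))
    return tuning * gain

def population_response(retinal, eye, pref_retinals, eye_thresholds):
    """Responses of a grid of units, one per (preferred location, threshold) pair."""
    return [gain_field_response(retinal, eye, p, t)
            for p in pref_retinals for t in eye_thresholds]

# A unit centred on the stimulus, with the eye at its threshold, fires at half gain.
print(gain_field_response(0.0, 0.0, pref_retinal=0.0, eye_threshold=0.0))
basis = population_response(5.0, -10.0, range(-40, 41, 10), range(-20, 21, 10))
print(len(basis))
```

A weighted sum of such population responses is what the abstract refers to as reading out a motor response through a direct projection; fitting those weights is left out of this sketch.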
Annealed Competition of Experts for a Segmentation and Classification of Switching Dynamics
, 1996
Cited by 67 (21 self)
We present a method for the unsupervised segmentation of data streams originating from different unknown sources which alternate in time. We use an architecture consisting of competing neural networks. Memory is included in order to resolve ambiguities of input-output relations. In order to obtain maximal specialization, the competition is adiabatically increased during training. Our method achieves almost perfect identification and segmentation in the case of switching chaotic dynamics where input manifolds overlap and input-output relations are ambiguous. Only a small dataset is needed for the training procedure. Applications to time series from complex systems demonstrate the potential relevance of our approach for time series analysis and short-term prediction. 1 Introduction Neural networks provide frameworks for the representation of relations present in data. Especially in the fields of classification and time series prediction, neural networks ...
A practical method for calculating largest Lyapunov exponents from small data sets
 PHYSICA D
, 1993
Cited by 67 (0 self)
Detecting the presence of chaos in a dynamical system is an important problem that is solved by measuring the largest Lyapunov exponent. Lyapunov exponents quantify the exponential divergence of initially close state-space trajectories and estimate the amount of chaos in a system. We present a new method for calculating the largest Lyapunov exponent from an experimental time series. The method follows directly from the definition of the largest Lyapunov exponent and is accurate because it takes advantage of all the available data. We show that the algorithm is fast, easy to implement, and robust to changes in the following quantities: embedding dimension, size of data set, reconstruction delay, and noise level. Furthermore, one may use the algorithm to calculate simultaneously the correlation dimension. Thus, one sequence of computations will yield an estimate of both the level of chaos and the system complexity.
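The abstract does not spell out the algorithm, so the following is only a sketch of one way to estimate a largest Lyapunov exponent "directly from the definition": embed the series with delays, pair every point with its nearest neighbour, follow each pair forward, and take the slope of the mean log separation against time. The logistic-map test series, the temporal exclusion window `min_sep`, and the fit length `steps` are assumptions chosen for illustration.

```python
import math

def logistic_series(n, x0=0.1, r=4.0):
    """Test data (assumed): iterates of the logistic map x -> r x (1 - x)."""
    xs, x = [], x0
    for _ in range(n):
        xs.append(x)
        x = r * x * (1.0 - x)
    return xs

def embed(series, dim, delay):
    """Delay-coordinate reconstruction of the state space."""
    count = len(series) - (dim - 1) * delay
    return [tuple(series[i + k * delay] for k in range(dim)) for i in range(count)]

def largest_lyapunov(series, dim=2, delay=1, min_sep=10, steps=5):
    """Slope of the mean log divergence of nearest-neighbour pairs, per sample."""
    pts = embed(series, dim, delay)
    usable = len(pts) - steps  # points that can be followed for `steps` samples
    neighbours = []
    for i in range(usable):
        best, best_d = None, float("inf")
        for j in range(usable):
            if abs(i - j) <= min_sep:  # skip temporally adjacent points
                continue
            d = math.dist(pts[i], pts[j])
            if 0.0 < d < best_d:
                best, best_d = j, d
        neighbours.append(best)
    curve = []
    for k in range(steps + 1):
        logs = []
        for i, j in enumerate(neighbours):
            if j is None:
                continue
            d = math.dist(pts[i + k], pts[j + k])
            if d > 0.0:  # log is undefined for coincident points
                logs.append(math.log(d))
        curve.append(sum(logs) / len(logs))
    ks = range(steps + 1)
    mk = sum(ks) / len(ks)
    mc = sum(curve) / len(curve)
    return (sum((k - mk) * (c - mc) for k, c in zip(ks, curve))
            / sum((k - mk) ** 2 for k in ks))

print(largest_lyapunov(logistic_series(600)))
```

For the logistic map with r = 4 the exact exponent is ln 2 per iteration, which gives a reference value to compare the printed estimate against. The fit must stop before the separations saturate at the size of the attractor, which is why `steps` is kept small here.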
Interdisciplinary application of nonlinear time series methods
 Phys. Reports
, 1998
Cited by 44 (5 self)
This paper reports on the application to field measurements of time series methods developed on the basis of the theory of deterministic chaos. The major difficulties are pointed out that arise when the data cannot be assumed to be purely deterministic and the potential that remains in this situation is discussed. For signals with weakly nonlinear structure, the presence of nonlinearity in a general sense has to be inferred statistically. The paper reviews the relevant methods and discusses the implications for deterministic modeling. Most field measurements yield nonstationary time series, which poses a severe problem for their analysis. Recent progress in the detection and understanding of nonstationarity is reported. If a clear signature of approximate determinism is found, the notions of phase space, attractors, invariant manifolds etc. provide a convenient framework for time series analysis. Although the results have to be interpreted with great care, superior performance can be achieved for typical signal processing tasks. In particular, prediction and filtering of signals are discussed, as well as the classification of system states by means of time series recordings.
Constrained-Realization Monte-Carlo method for Hypothesis Testing
 Physica D
Cited by 42 (1 self)
We compare two theoretically distinct approaches to generating artificial (or "surrogate") data for testing hypotheses about a given data set. The first and more straightforward approach is to fit a single "best" model to the original data, and then to generate surrogate data sets that are "typical realizations" of that model. The second approach concentrates not on the model but directly on the original data; it attempts to constrain the surrogate data sets so that they exactly agree with the original data for a specified set of sample statistics. Examples of these two approaches are provided for two simple cases: a test for deviations from a Gaussian distribution, and a test for serial dependence in a time series. Additionally, we consider tests for nonlinearity in time series based on a Fourier transform (FT) method and on more conventional autoregressive moving-average (ARMA) fits to the data. The comparative performance of hypothesis testing schemes based on these two approaches...
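The two approaches can be contrasted on the serial-dependence test mentioned above. The sketch below is an illustration under stated assumptions rather than the authors' procedure: the "typical realization" surrogate draws fresh values from a Gaussian fitted to the data, the "constrained realization" surrogate is a random shuffle (which reproduces the sample values exactly and destroys only their order), and the lag-1 autocorrelation is an assumed choice of discriminating statistic.

```python
import random
import statistics

def lag1_autocorr(xs):
    """Sample lag-1 autocorrelation, used here as the test statistic."""
    m = statistics.fmean(xs)
    num = sum((a - m) * (b - m) for a, b in zip(xs, xs[1:]))
    den = sum((a - m) ** 2 for a in xs)
    return num / den

def typical_surrogate(xs, rng):
    """Typical realization: fit a Gaussian to the data, then sample from it."""
    mu, sd = statistics.fmean(xs), statistics.stdev(xs)
    return [rng.gauss(mu, sd) for _ in xs]

def constrained_surrogate(xs, rng):
    """Constrained realization: a shuffle keeps every sample value exactly."""
    ys = list(xs)
    rng.shuffle(ys)
    return ys

def monte_carlo_p(xs, make_surrogate, n_surr=99, seed=0):
    """Fraction of data sets (original included) whose statistic is at least
    as extreme as the original's."""
    rng = random.Random(seed)
    orig = abs(lag1_autocorr(xs))
    exceed = sum(abs(lag1_autocorr(make_surrogate(xs, rng))) >= orig
                 for _ in range(n_surr))
    return (exceed + 1) / (n_surr + 1)

# Hypothetical test series with strong serial dependence: x[t] = 0.8 x[t-1] + noise.
rng = random.Random(1)
series = [0.0]
for _ in range(199):
    series.append(0.8 * series[-1] + rng.gauss(0.0, 1.0))

print(monte_carlo_p(series, typical_surrogate))
print(monte_carlo_p(series, constrained_surrogate))
```

Both variants test the same null hypothesis of independent samples; they differ in whether the surrogates merely resemble the data's distribution or reproduce its sample values exactly.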