Results 1-10 of 55
Stacked generalization
Neural Networks, 1992
Cited by 550 (7 self)
Abstract: This paper introduces stacked generalization, a scheme for minimizing the generalization error rate of one or more generalizers. Stacked generalization works by deducing the biases of the generalizer(s) with respect to a provided learning set. This deduction proceeds by generalizing in a second space whose inputs are (for example) the guesses of the original generalizers when taught with part of the learning set and trying to guess the rest of it, and whose output is (for example) the correct guess. When used with multiple generalizers, stacked generalization can be seen as a more sophisticated version of cross-validation, exploiting a strategy more sophisticated than cross-validation’s crude winner-takes-all for combining the individual generalizers. When used with a single generalizer, stacked generalization is a scheme for estimating (and then correcting for) the error of a generalizer which has been trained on a particular learning set and then asked a particular question. After introducing stacked generalization and justifying its use, this paper presents two numerical experiments. The first demonstrates how stacked generalization improves upon a set of separate generalizers for the NETtalk task of translating text to phonemes. The second demonstrates how stacked generalization improves the performance of a single surface-fitter. With the other experimental evidence in the literature, the usual arguments supporting cross-validation, and the abstract justifications presented in this paper, the conclusion is that for almost any real-world generalization problem one should use some version of stacked generalization to minimize the generalization error rate. This paper ends by discussing some of the variations of stacked generalization, and how it touches on other fields like chaos theory.
Key Words: generalization and induction, combining generalizers, learning set preprocessing, cross-validation, error estimation and correction.
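The second-space construction described in this abstract can be illustrated with a small sketch. It is not the paper's implementation: the two level-0 generalizers (a nearest-neighbour rule and a least-squares line), the leave-one-out partitioning, the linear level-1 combiner, and all function names are assumptions chosen to keep the example short.

```python
def nearest_neighbor(train, x):
    """Level-0 generalizer A (assumed): y of the closest training input."""
    return min(train, key=lambda p: abs(p[0] - x))[1]


def linear_fit(train, x):
    """Level-0 generalizer B (assumed): least-squares line through train."""
    n = len(train)
    mx = sum(p[0] for p in train) / n
    my = sum(p[1] for p in train) / n
    sxx = sum((p[0] - mx) ** 2 for p in train)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in train)
    slope = sxy / sxx if sxx else 0.0
    return my + slope * (x - mx)


def level1_training_set(learning_set, generalizers):
    """Teach each generalizer with part of the learning set (all but one
    point) and record its guess for the held-out point. Each row pairs the
    guesses (level-1 input) with the correct answer (level-1 output)."""
    rows = []
    for i, (x, y) in enumerate(learning_set):
        rest = learning_set[:i] + learning_set[i + 1:]
        rows.append(([g(rest, x) for g in generalizers], y))
    return rows


def fit_combiner(rows):
    """Level-1 generalizer (assumed): least-squares weights for exactly two
    level-0 guesses, from the 2x2 normal equations."""
    a11 = sum(g[0] * g[0] for g, _ in rows)
    a12 = sum(g[0] * g[1] for g, _ in rows)
    a22 = sum(g[1] * g[1] for g, _ in rows)
    b1 = sum(g[0] * y for g, y in rows)
    b2 = sum(g[1] * y for g, y in rows)
    det = a11 * a22 - a12 * a12
    if abs(det) < 1e-12:
        return (0.5, 0.5)  # degenerate case: fall back to a plain average
    return ((b1 * a22 - b2 * a12) / det, (a11 * b2 - a12 * b1) / det)


def stacked_predict(learning_set, generalizers, weights, x):
    """Answer a new question: retrain level 0 on the full learning set and
    combine the guesses with the learned level-1 weights."""
    return sum(w * g(learning_set, x) for w, g in zip(weights, generalizers))


if __name__ == "__main__":
    data = [(float(x), 2.0 * x + 1.0) for x in range(6)]
    gens = [nearest_neighbor, linear_fit]
    weights = fit_combiner(level1_training_set(data, gens))
    print(weights, stacked_predict(data, gens, weights, 10.0))
```

On exactly linear data the combiner learns to rely on the line fit rather than the nearest-neighbour rule, which is the kind of weighting that a winner-takes-all choice cannot express when neither generalizer dominates.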
Regularization Theory and Neural Networks Architectures
Neural Computation, 1995
Cited by 309 (31 self)
We had previously shown that regularization principles lead to approximation schemes which are equivalent to networks with one layer of hidden units, called Regularization Networks. In particular, standard smoothness functionals lead to a subclass of regularization networks, the well-known Radial Basis Functions approximation schemes. This paper shows that regularization networks encompass a much broader range of approximation schemes, including many of the popular general additive models and some of the neural networks. In particular, we introduce new classes of smoothness functionals that lead to different classes of basis functions. Additive splines as well as some tensor product splines can be obtained from appropriate classes of smoothness functionals. Furthermore, the same generalization that extends Radial Basis Functions (RBF) to Hyper Basis Functions (HBF) also leads from additive models to ridge approximation models, containing as special cases Breiman's hinge functions, som...
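For orientation, the standard regularization setup that this abstract builds on can be written as follows. The notation is assumed rather than taken from the listing: data pairs (x_i, y_i), a smoothness functional φ[f], a regularization parameter λ > 0, and a basis function G determined by φ. The sketch ignores the optional null-space (polynomial) term.

```latex
% Functional to minimize: data fit plus smoothness penalty (assumed notation).
H[f] = \sum_{i=1}^{N} \bigl( y_i - f(\mathbf{x}_i) \bigr)^2 + \lambda \, \phi[f]

% For a translation-invariant stabilizer the minimizer is a one-hidden-layer
% network with one unit centred on each data point:
f(\mathbf{x}) = \sum_{i=1}^{N} c_i \, G(\mathbf{x} - \mathbf{x}_i),
\qquad (G + \lambda I)\,\mathbf{c} = \mathbf{y},
\qquad G_{ij} = G(\mathbf{x}_i - \mathbf{x}_j)
```

Choosing a radial G, for example a Gaussian, gives the Radial Basis Functions scheme mentioned in the abstract.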
Inferring Body Pose without Tracking Body Parts
In CVPR, 1999
Cited by 110 (3 self)
A novel approach for estimating articulated body posture and motion from monocular video sequences is proposed. Human pose is defined as the instantaneous two-dimensional configuration (i.e., the projection onto the image plane) of a single articulated body in terms of the position of a predetermined set of joints. First, statistical segmentation of the human bodies from the background is performed and low-level visual features are found given the segmented body shape. The goal is to be able to map these, generally low-level, visual features to body configurations. The system estimates different mappings, each one with a specific cluster in the visual feature space. Given a set of body motion sequences for training, unsupervised clustering is obtained via the Expectation Maximization algorithm. For each of the clusters, a function is estimated to build the mapping from low-level features to 2D pose. Given new visual features, a mapping from each cluster is performed to yield a set of possible poses. From this set, the system selects the most likely pose given the learned probability distribution and the visual feature similarity between hypothesis and input. Performance of the proposed approach is characterized using real and artificially generated body postures, showing promising results.
Approximation theory of the MLP model in neural networks
Acta Numerica, 1999
Cited by 39 (3 self)
In this survey we discuss various approximation-theoretic problems that arise in the multilayer feedforward perceptron (MLP) model in neural networks. Mathematically it is one of the simpler models. Nonetheless the mathematics of this model is not well understood, and many of these problems are approximation-theoretic in character. Most of the research we will discuss is of very recent vintage. We will report on what has been done and on various unanswered questions. We will not be presenting practical (algorithmic) methods. We will, however, be exploring the capabilities and limitations of this model. In the first
A developmental model for the evolution of artificial neural networks
2001
Cited by 38 (1 self)
We present a model of decentralized growth and development for artificial neural networks (ANNs), inspired by developmental biology and the physiology of nervous systems. In this model, each individual artificial neuron is an autonomous unit whose behavior is determined only by the genetic information it harbors and local concentrations of substrates. The chemicals and substrates, in turn, are modeled by a simple artificial chemistry. While the system is designed to allow for the evolution of complex networks, we demonstrate the power of the artificial chemistry by analyzing engineered (handwritten) genomes that lead to the growth of simple networks with behaviors known from physiology. To evolve more complex structures, a Java-based, platform-independent, asynchronous, distributed genetic algorithm (GA) has been implemented that allows users to participate in evolutionary experiments via the World Wide Web.
Large Vocabulary Recognition of Online Handwritten Cursive Words
1995
Cited by 20 (1 self)
A critical feature of any computer system is its interface with the user. This has led to the development of user interface technologies such as mouse, touchscreen and pen-based input devices. Since handwriting is one of the most familiar communication media, pen-based interfaces combined with automatic handwriting recognition offer a very easy and natural input method. Pen-based interfaces are also essential in mobile computing because they are scalable. Recent advances in pen-based hardware and wireless communication have been influential factors in the renewed interest in online recognition systems. Online handwriting recognition is fundamentally a pattern classification task; the objective is to take an input pattern, the handwritten signal collected online via a digitizing device, and classify it as one of a prespecified set of words (i.e., the system's lexicon). Because exact recognition is very difficult, a lexicon is used to constrain the recognition output to a known vocab...
Multiobjective evolutionary optimization of the size, shape, and position parameters of radial basis function networks for function approximation
IEEE Transactions on Neural Networks
Cited by 19 (6 self)
Abstract: This paper presents a multiobjective evolutionary algorithm to optimize radial basis function neural networks (RBFNNs) in order to approach target functions from a set of input-output pairs. The procedure allows the application of heuristics to improve the solution of the problem at hand by including some new genetic operators in the evolutionary process. These new operators are based on two well-known matrix transformations: singular value decomposition (SVD) and orthogonal least squares (OLS), which have been used to define new mutation operators that produce local or global modifications in the radial basis functions (RBFs) of the networks (the individuals in the population in the evolutionary procedure). After analyzing the efficiency of the different operators, we have shown that the global mutation operators yield an improved procedure to adjust the parameters of the RBFNNs.
Index Terms: Evolutionary computation, neural networks, radial basis functions (RBFs), orthogonal transformations, heuristics.
A Theory Of Classifier Combination: The Neural Network Approach
1995
Cited by 18 (0 self)
There is a trend in recent OCR development to improve system performance by combining recognition results of several complementary algorithms. This thesis examines the classifier combination problem under strict separation of the classifier and combinator design. None other than the fact that every classifier has the same input and output specification is assumed about the training, design or implementation of the classifiers. A general theory of combination should possess the following properties. It must be able to combine any type of classifiers regardless of the level of information contents in the outputs. In addition, a general combinator must be able to combine any mixture of classifier types and utilize all information available. Since classifier independence is difficult to achieve and to detect, it is essential for a combinator to handle correlated classifiers robustly. Although the performance of a robust (against correlation) combinator can be improved by adding classifiers indiscriminately, it is generally of interest to achieve comparable performance with the minimum number of classifiers. Therefore, the combinator should have the ability to eliminate redundant classifiers. Furthermore, it is desirable to have a complexity control mechanism for the combinator. In the past, simplifications came from assumptions and constraints imposed by the system designers. In the general theory, there should be a mechanism to reduce solution complexity by exercising non-classifier-specific constraints. Finally, a combinator should capture classifier/image dependencies. Nearly all combination methods have ignored the fact that classifier performances (and outputs) depend on various image characteristics, and this dependency is manifested in classifier output patterns in relation to input imag...
Square Unit Augmented, Radially Extended, Multilayer Perceptrons
Neural Networks: Tricks of the Trade
Cited by 17 (1 self)
Consider a multilayer perceptron (MLP) with d inputs, a single hidden sigmoidal layer and a linear output. By adding an additional d inputs to the network with values set to the square of the first d inputs, properties reminiscent of higher-order neural networks and radial basis function networks (RBFN) are added to the architecture with little added expense in terms of weight requirements. Of particular interest, this architecture has the ability to form localized features in a d-dimensional space with a single hidden node but can also span large volumes of the input space; thus, the architecture has the localized properties of an RBFN but does not suffer as badly from the curse of dimensionality. I refer to a network of this type as a SQuare Unit Augmented, Radially Extended, MultiLayer Perceptron (SQUARE-MLP or SMLP).
1 Introduction and Motivation
When faced with a new and challenging problem, the most crucial decision that a neural network researcher must make is in...
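The claim that a single hidden node can form a localized feature once squared inputs are available can be checked with a short sketch. The function names, the hand-picked weights, and the `sharpness` parameter are illustrative assumptions, not values from the paper; the weights are chosen so that the node's pre-activation equals sharpness * (radius^2 - ||x - center||^2).

```python
import math


def augment_with_squares(x):
    """Append the square of each input, turning d inputs into 2d inputs."""
    return list(x) + [v * v for v in x]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def bump_unit(x, center, radius, sharpness=1.0):
    """One sigmoidal hidden node on the augmented inputs (hypothetical weights).

    Expanding radius^2 - sum((x_i - c_i)^2) gives
        (radius^2 - sum(c_i^2)) + sum(2 * c_i * x_i) - sum(x_i^2),
    which is linear in the augmented vector [x, x^2].
    """
    d = len(x)
    weights = [2.0 * sharpness * c for c in center] + [-sharpness] * d
    bias = sharpness * (radius ** 2 - sum(c * c for c in center))
    z = bias + sum(w * v for w, v in zip(weights, augment_with_squares(x)))
    return sigmoid(z)


if __name__ == "__main__":
    center = (1.0, 2.0)
    for point in [(1.0, 2.0), (2.0, 2.0), (4.0, 6.0)]:
        print(point, bump_unit(point, center, radius=1.0, sharpness=5.0))
```

The node responds strongly near the chosen center and falls toward zero away from it. Setting the squared-input weights to zero recovers an ordinary sigmoidal ridge unit, which is how the same architecture can also span large volumes of the input space.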
The Informational Complexity of Learning from Examples
1996
Cited by 13 (4 self)
This thesis attempts to quantify the amount of information needed to learn certain tasks. The tasks chosen vary from learning functions in a Sobolev space using radial basis function networks to learning grammars in the principles and parameters framework of modern linguistic theory. These problems are analyzed from the perspective of computational learning theory and certain unifying perspectives emerge.
Copyright © Massachusetts Institute of Technology, 1996. This report describes research done within the Center for Biological and Computational Learning in the Department of Brain and Cognitive Sciences and at the Artificial Intelligence Laboratory at the Massachusetts Institute of Technology. This research is sponsored by a grant from the National Science Foundation under contract ASC9217041 (this award includes funds from ARPA provided under the HPCC program); and by a grant from ARPA/ONR under contract N0001492J1879. Additional support has been provided by Siemens Co...