Results 1-10 of 11
When Networks Disagree: Ensemble Methods for Hybrid Neural Networks
1993
Cited by 290 (2 self)
This paper presents a general theoretical framework for ensemble methods of constructing significantly improved regression estimates. Given a population of regression estimators, we construct a hybrid estimator which is as good or better in the MSE sense than any estimator in the population. We argue that the ensemble method presented has several properties: 1) It efficiently uses all the networks of a population; none of the networks need be discarded. 2) It efficiently uses all the available data for training without overfitting. 3) It inherently performs regularization by smoothing in functional space, which helps to avoid overfitting. 4) It utilizes local minima to construct improved estimates, whereas other neural network algorithms are hindered by local minima. 5) It is ideally suited for parallel computation. 6) It leads to a very useful and natural measure of the number of distinct estimators in a population. 7) The optimal parameters of the ensemble estimator are given in clo...
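As a rough illustration of the averaging idea in this abstract, the sketch below combines several noisy regression estimates by plain averaging and compares mean squared errors. It is not the paper's own estimator: the toy population, the noise level, and all function names are assumptions made for the example. By convexity of the squared error, the MSE of the averaged estimate cannot exceed the mean of the individual MSEs.

```python
import random


def mse(predictions, targets):
    """Mean squared error between two equal-length sequences."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)


def ensemble_average(all_predictions):
    """Average the predictions of several estimators point by point."""
    n = len(all_predictions)
    return [sum(column) / n for column in zip(*all_predictions)]


# Hypothetical population: each "estimator" is the true target plus
# independent Gaussian noise (an assumption, chosen only for illustration).
rng = random.Random(0)
targets = [x / 10 for x in range(50)]
population = [[t + rng.gauss(0, 0.5) for t in targets] for _ in range(10)]

individual = [mse(p, targets) for p in population]
mean_individual = sum(individual) / len(individual)
combined = mse(ensemble_average(population), targets)

print("mean individual MSE:", mean_individual)
print("ensemble MSE:       ", combined)
```

Plain averaging weights every estimator equally; the abstract refers to optimal ensemble parameters, which this sketch does not attempt to reproduce.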
Improving Regression Estimation: Averaging Methods for Variance Reduction with Extensions to General Convex Measure Optimization
1993
Discovering Neural Nets With Low Kolmogorov Complexity And High Generalization Capability
Neural Networks, 1997
Cited by 50 (31 self)
Many neural net learning algorithms aim at finding "simple" nets to explain training data. The expectation is: the "simpler" the networks, the better the generalization on test data (→ Occam's razor). Previous implementations, however, use measures for "simplicity" that lack the power, universality and elegance of those based on Kolmogorov complexity and Solomonoff's algorithmic probability. Likewise, most previous approaches (especially those of the "Bayesian" kind) suffer from the problem of choosing appropriate priors. This paper addresses both issues. It first reviews some basic concepts of algorithmic complexity theory relevant to machine learning, and how the Solomonoff-Levin distribution (or universal prior) deals with the prior problem. The universal prior leads to a probabilistic method for finding "algorithmically simple" problem solutions with high generalization capability. The method is based on Levin complexity (a time-bounded generalization of Kolmogorov comple...
Engineering Multiversion Neural-Net Systems
Neural Computation, 1995
Cited by 47 (6 self)
In this paper we address the problem of constructing reliable neural-net implementations, given the assumption that any particular implementation will not be totally correct. The approach taken in this paper is to organize the inevitable errors so as to minimize their impact in the context of a multiversion system, i.e. the system functionality is reproduced in multiple versions which together will constitute the neural-net system. The unique characteristics of neural computing are exploited in order to engineer reliable systems in the form of diverse, multiversion systems which are used together with a "decision strategy" (such as majority vote). Theoretical notions of "methodological diversity" contributing to the improvement of system performance are implemented and tested. An important aspect of the engineering of an optimal system is to overproduce the components and then choose an optimal subset. Three general techniques for choosing final system components are implemented an...
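The majority-vote decision strategy mentioned in this abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name, the label values, and the tie-breaking rule (first label encountered wins) are assumptions.

```python
from collections import Counter


def majority_vote(version_outputs):
    """Return the label produced by the most versions.

    Ties are broken in favour of the label that appears first in
    `version_outputs` (an arbitrary choice made for this sketch).
    """
    if not version_outputs:
        raise ValueError("at least one version output is required")
    label, _count = Counter(version_outputs).most_common(1)[0]
    return label


# Three hypothetical network versions classify the same input;
# one of them disagrees, and the vote masks its error.
outputs = ["class_a", "class_b", "class_a"]
decision = majority_vote(outputs)
print(decision)
```

The vote only helps when the versions fail on different inputs, which is why the abstract stresses diversity among the versions.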
Flat Minima
1997
Cited by 32 (14 self)
... this paper (available on the World Wide Web; see our home pages) contains pseudocode of an efficient implementation. It is based on fast multiplication of the Hessian and a vector due to Pearlmutter (1994) and Møller (1993).
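The fragment above refers to multiplying the Hessian by a vector without forming the Hessian. The sketch below shows a different and simpler stand-in for that operation: a central finite-difference approximation built from two gradient evaluations. It is not the exact method cited in the entry, and the toy quadratic, its gradient, and the step size `eps` are assumptions.

```python
def hessian_vector_product(grad, w, v, eps=1e-5):
    """Approximate H(w) @ v as (grad(w + eps*v) - grad(w - eps*v)) / (2*eps).

    `grad` maps a list of weights to a list of partial derivatives.
    The full Hessian is never built; only two gradient calls are needed.
    """
    w_plus = [wi + eps * vi for wi, vi in zip(w, v)]
    w_minus = [wi - eps * vi for wi, vi in zip(w, v)]
    g_plus = grad(w_plus)
    g_minus = grad(w_minus)
    return [(gp - gm) / (2 * eps) for gp, gm in zip(g_plus, g_minus)]


def toy_grad(w):
    """Gradient of the toy function f(w) = w0^2 + 3*w0*w1 + 2*w1^2.

    Its Hessian is the constant matrix [[2, 3], [3, 4]].
    """
    return [2 * w[0] + 3 * w[1], 3 * w[0] + 4 * w[1]]


hv = hessian_vector_product(toy_grad, [0.5, -1.0], [1.0, 2.0])
print(hv)  # close to [8.0, 11.0], i.e. [[2, 3], [3, 4]] applied to [1, 2]
```

For a network with many weights the appeal is the cost: each product needs only gradient-sized work rather than storage quadratic in the number of weights.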
Combining Exploratory Projection Pursuit And Projection Pursuit Regression With Application To Neural Networks
Neural Computation, 1992
Cited by 17 (9 self)
 Add to MetaCart
We present a novel classification and regression method that combines exploratory projection pursuit (unsupervised training) with projection pursuit regression (supervised training), to yield a new family of cost/complexity penalty terms. Some improved generalization properties are demonstrated on real-world problems.
1 Introduction
Parameter estimation becomes difficult in high-dimensional spaces due to the increasing sparseness of the data. Therefore, when a low-dimensional representation is embedded in the data, dimensionality reduction methods become useful. One such method, projection pursuit regression (PPR; Friedman and Stuetzle, 1981), is capable of performing dimensionality reduction by composition; namely, it constructs an approximation to the desired response function using a composition of lower-dimensional smooth functions. These functions depend on low-dimensional projections through the data. When the dimensionality of the problem is in the thousands, even projection...
Discovering Problem Solutions with Low Kolmogorov Complexity and High Generalization Capability
Machine Learning: Proceedings of the Twelfth International Conference, 1994
Cited by 16 (8 self)
Many machine learning algorithms aim at finding "simple" rules to explain training data. The expectation is: the "simpler" the rules, the better the generalization on test data (→ Occam's razor). Most practical implementations, however, use measures for "simplicity" that lack the power, universality and elegance of those based on Kolmogorov complexity and Solomonoff's algorithmic probability. Likewise, most previous approaches (especially those of the "Bayesian" kind) suffer from the problem of choosing appropriate priors. This paper addresses both issues. It first reviews some basic concepts of algorithmic complexity theory relevant to machine learning, and how the Solomonoff-Levin distribution (or universal prior) deals with the prior problem. The universal prior leads to a probabilistic method for finding "algorithmically simple" problem solutions with high generalization capability. The method is based on Levin complexity (a time-bounded generalization of Kolmogorov complexity) and...
Flat Minimum Search Finds Simple Nets
1994
Cited by 3 (2 self)
We present a new algorithm for finding low-complexity neural networks with high generalization capability. The algorithm searches for a "flat" minimum of the error function. A flat minimum is a large connected region in weight space where the error remains approximately constant. An MDL-based argument shows that flat minima correspond to low expected overfitting. Although our algorithm requires the computation of second-order derivatives, it has backprop's order of complexity. It automatically and effectively prunes units, weights, and input lines. Various experiments with feedforward and recurrent nets are described. In an application to stock market prediction, flat minimum search outperforms (1) conventional backprop, (2) weight decay, and (3) "optimal brain surgeon" / "optimal brain damage".
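The notion of flatness described in this abstract (error staying approximately constant across a region of weight space) can be probed crudely by random perturbation. The sketch below is only such a probe, not the flat minimum search algorithm itself; the two toy error functions, the box radius, and the number of trials are assumptions.

```python
import random


def max_error_change(error, w, radius, trials=200, seed=0):
    """Largest observed |error(w') - error(w)| over random points w'
    drawn uniformly from a box of half-width `radius` around w."""
    rng = random.Random(seed)
    base = error(w)
    worst = 0.0
    for _ in range(trials):
        perturbed = [wi + rng.uniform(-radius, radius) for wi in w]
        worst = max(worst, abs(error(perturbed) - base))
    return worst


def sharp_error(w):
    """Toy error surface with a narrow minimum at the origin."""
    return 100.0 * (w[0] ** 2 + w[1] ** 2)


def flat_error(w):
    """Toy error surface with a wide, flat minimum at the origin."""
    return 0.01 * (w[0] ** 2 + w[1] ** 2)


minimum = [0.0, 0.0]
sharp_change = max_error_change(sharp_error, minimum, radius=0.1)
flat_change = max_error_change(flat_error, minimum, radius=0.1)
print("sharp minimum:", sharp_change)
print("flat minimum: ", flat_change)
```

A small change in error over the box means the weights can be specified with low precision, which is the intuition behind linking flatness to low complexity.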
Simplifying Neural Nets by Discovering Flat Minima
Advances in Neural Information Processing Systems 7, 1995
Cited by 1 (1 self)
We present a new algorithm for finding low-complexity networks with high generalization capability. The algorithm searches for large connected regions of so-called "flat" minima of the error function. In the weight-space environment of a "flat" minimum, the error remains approximately constant. Using an MDL-based argument, flat minima can be shown to correspond to low expected overfitting. Although our algorithm requires the computation of second-order derivatives, it has backprop's order of complexity. Experiments with feedforward and recurrent nets are described.
Interpreting Neural-Network Models
1993
Cited by 1 (0 self)
Artificial neural networks seem very promising for regression and classification, especially for large covariate spaces. These methods represent a nonlinear function as a composition of low-dimensional ridge functions and therefore appear to be less sensitive to the dimensionality of the covariate space. It is possible to show that some of these methods extend the well-known logistic regression model. In this paper we propose a method for model interpretation and prediction via neural networks that takes advantage of the possible high nonlinearity of the model.
1 Introduction
Traditional statistical models are valuable in that they are interpretable, and in that the statistical properties of the estimators are known. This enables an analyst to explore data in terms of model structure, to talk about the robustness of a model under different samples, and to interpret the findings. On the other hand, previous work on health outcomes (e.g. response to medical treatment, death, etc.) have res...