Results 1  10
of
140
When Networks Disagree: Ensemble Methods for Hybrid Neural Networks
, 1993
"... This paper presents a general theoretical framework for ensemble methods of constructing significantly improved regression estimates. Given a population of regression estimators, we construct a hybrid estimator which is as good or better in the MSE sense than any estimator in the population. We argu ..."
Abstract

Cited by 347 (3 self)
 Add to MetaCart
(Show Context)
This paper presents a general theoretical framework for ensemble methods of constructing significantly improved regression estimates. Given a population of regression estimators, we construct a hybrid estimator which is as good or better in the MSE sense than any estimator in the population. We argue that the ensemble method presented has several properties: 1) It efficiently uses all the networks of a population  none of the networks need be discarded. 2) It efficiently uses all the available data for training without overfitting. 3) It inherently performs regularization by smoothing in functional space which helps to avoid overfitting. 4) It utilizes local minima to construct improved estimates whereas other neural network algorithms are hindered by local minima. 5) It is ideally suited for parallel computation. 6) It leads to a very useful and natural measure of the number of distinct estimators in a population. 7) The optimal parameters of the ensemble estimator are given in clo...
Operations for Learning with Graphical Models
 Journal of Artificial Intelligence Research
, 1994
"... This paper is a multidisciplinary review of empirical, statistical learning from a graphical model perspective. Wellknown examples of graphical models include Bayesian networks, directed graphs representing a Markov chain, and undirected networks representing a Markov field. These graphical models ..."
Abstract

Cited by 277 (13 self)
 Add to MetaCart
(Show Context)
This paper is a multidisciplinary review of empirical, statistical learning from a graphical model perspective. Wellknown examples of graphical models include Bayesian networks, directed graphs representing a Markov chain, and undirected networks representing a Markov field. These graphical models are extended to model data analysis and empirical learning using the notation of plates. Graphical operations for simplifying and manipulating a problem are provided including decomposition, differentiation, and the manipulation of probability models from the exponential family. Two standard algorithm schemas for learning are reviewed in a graphical framework: Gibbs sampling and the expectation maximization algorithm. Using these operations and schemas, some popular algorithms can be synthesized from their graphical specification. This includes versions of linear regression, techniques for feedforward networks, and learning Gaussian and discrete Bayesian networks from data. The paper conclu...
Learning classification trees
 Statistics and Computing
, 1992
"... Algorithms for learning cIassification trees have had successes in artificial intelligence and statistics over many years. This paper outlines how a tree learning algorithm can be derived using Bayesian statistics. This iutroduces Bayesian techniques for splitting, smoothing, and tree averaging. T ..."
Abstract

Cited by 145 (8 self)
 Add to MetaCart
Algorithms for learning cIassification trees have had successes in artificial intelligence and statistics over many years. This paper outlines how a tree learning algorithm can be derived using Bayesian statistics. This iutroduces Bayesian techniques for splitting, smoothing, and tree averaging. The splitting rule is similar to QuinIan’s information gain, while smoothing and averaging replace pruning. Comparative experiments with reimplementations of a minimum encoding approach, Quinlan’s C4 (1987) and Breiman et aL’s CART (1984) show the full Bayesian algorithm produces more accurate predictions than versions
Nonlinear Gated Experts for Time Series: Discovering Regimes and Avoiding Overfitting
, 1995
"... this paper: ftp://ftp.cs.colorado.edu/pub/TimeSeries/MyPapers/experts.ps.Z, ..."
Abstract

Cited by 108 (5 self)
 Add to MetaCart
this paper: ftp://ftp.cs.colorado.edu/pub/TimeSeries/MyPapers/experts.ps.Z,
Constructive Algorithms for Structure Learning in Feedforward Neural Networks for Regression Problems
 IEEE Transactions on Neural Networks
, 1997
"... In this survey paper, we review the constructive algorithms for structure learning in feedforward neural networks for regression problems. The basic idea is to start with a small network, then add hidden units and weights incrementally until a satisfactory solution is found. By formulating the whole ..."
Abstract

Cited by 87 (2 self)
 Add to MetaCart
(Show Context)
In this survey paper, we review the constructive algorithms for structure learning in feedforward neural networks for regression problems. The basic idea is to start with a small network, then add hidden units and weights incrementally until a satisfactory solution is found. By formulating the whole problem as a state space search, we first describe the general issues in constructive algorithms, with special emphasis on the search strategy. A taxonomy, based on the differences in the state transition mapping, the training algorithm and the network architecture, is then presented. Keywords Constructive algorithm, structure learning, state space search, dynamic node creation, projection pursuit regression, cascadecorrelation, resourceallocating network, group method of data handling. I. Introduction A. Problems with Fixed Size Networks I N recent years, many neural network models have been proposed for pattern classification, function approximation and regression problems. Among...
Comparison of Approximate Methods for Handling Hyperparameters
 NEURAL COMPUTATION
"... I examine two approximate methods for computational implementation of Bayesian hierarchical models, that is, models which include unknown hyperparameters such as regularization constants and noise levels. In the 'evidence framework' the model parameters are integrated over, and the resu ..."
Abstract

Cited by 87 (1 self)
 Add to MetaCart
I examine two approximate methods for computational implementation of Bayesian hierarchical models, that is, models which include unknown hyperparameters such as regularization constants and noise levels. In the 'evidence framework' the model parameters are integrated over, and the resulting evidence is maximized over the hyperparameters. The optimized
Improving Regression Estimation: Averaging Methods for Variance Reduction with Extensions to General Convex Measure Optimization
, 1993
"... ..."
Bayesian and Regularization Methods for Hyperparameter Estimation in Image Restoration
 IEEE Trans. Image Processing
, 1999
"... In this paper, we propose the application of the hierarchical Bayesian paradigm to the image restoration problem. We derive expressions for the iterative evaluation of the two hyperparameters applying the evidence and maximum a posteriori (MAP) analysis within the hierarchical Bayesian paradigm. We ..."
Abstract

Cited by 77 (28 self)
 Add to MetaCart
In this paper, we propose the application of the hierarchical Bayesian paradigm to the image restoration problem. We derive expressions for the iterative evaluation of the two hyperparameters applying the evidence and maximum a posteriori (MAP) analysis within the hierarchical Bayesian paradigm. We show analytically that the analysis provided by the evidence approach is more realistic and appropriate than the MAP approach for the image restoration problem. We furthermore study the relationship between the evidence and an iterative approach resulting from the set theoretic regularization approach for estimating the two hyperparameters, or their ratio, defined as the regularization parameter. Finally the proposed algorithms are tested experimentally.
Ensemble Learning using Decorrelated Neural Networks
 Connection Science
, 1996
"... We describe a decorrelation network training method for improving the quality of regression learning in "ensemble " neural networks that are composed of linear combinations of individual neural networks. In this method, individual networks are trained by backpropagation to not only reprodu ..."
Abstract

Cited by 77 (0 self)
 Add to MetaCart
(Show Context)
We describe a decorrelation network training method for improving the quality of regression learning in "ensemble " neural networks that are composed of linear combinations of individual neural networks. In this method, individual networks are trained by backpropagation to not only reproduce a desired output, but also to have their errors be linearly decorrelated with the other networks. Outputs from the individual networks are then linearly combined to produce the output of the ensemble network. We demonstrate the performances of decorrelated network training on learning the "3 Parity" logic function, a noisy sine function, and a one dimensional nonlinear function, and compare the results with the ensemble networks composed of independently trained individual networks (without decorrelation training). Empirical results show that when individual networks are forced to be decorrelated with one another the resulting ensemble neural networks have lower mean squared errors than the ensembl...
Learning from incomplete data
, 1994
"... Realworld learning tasks often involve highdimensional data sets with complex patterns of missing features. In this paper we review the problem of learning from incomplete data from two statistical perspectivesthe likelihoodbased and the Bayesian. The goal is twofold: to place current neura ..."
Abstract

Cited by 66 (0 self)
 Add to MetaCart
(Show Context)
Realworld learning tasks often involve highdimensional data sets with complex patterns of missing features. In this paper we review the problem of learning from incomplete data from two statistical perspectivesthe likelihoodbased and the Bayesian. The goal is twofold: to place current neural network approaches to missing data within a statistical framework, and to describe a set of algorithms, derived from the likelihoodbased framework, that handle clustering, classification, and function approximation from incomplete data in a principled and efficient manner. These algorithms are based on mixture modeling and maketwo distinct appeals to the ExpectationMaximization (EM) principle (Dempster et al., 1977)both for the estimation of mixture components and for coping with the missing data.