When Networks Disagree: Ensemble Methods for Hybrid Neural Networks
1993
Cited by 267 (2 self)
This paper presents a general theoretical framework for ensemble methods of constructing significantly improved regression estimates. Given a population of regression estimators, we construct a hybrid estimator which is as good or better in the MSE sense than any estimator in the population. We argue that the ensemble method presented has several properties: 1) It efficiently uses all the networks of a population - none of the networks need be discarded. 2) It efficiently uses all the available data for training without over-fitting. 3) It inherently performs regularization by smoothing in functional space which helps to avoid over-fitting. 4) It utilizes local minima to construct improved estimates whereas other neural network algorithms are hindered by local minima. 5) It is ideally suited for parallel computation. 6) It leads to a very useful and natural measure of the number of distinct estimators in a population. 7) The optimal parameters of the ensemble estimator are given in clo...
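As a rough illustration of why combining regression estimators can help in the MSE sense, the sketch below averages a small population of imperfect estimators. It assumes plain unweighted averaging, which is a simplification and not necessarily the combination rule the paper derives. The target function, the perturbed estimators, and all names (`make_estimator`, `average_ensemble`) are hypothetical. By convexity of the squared error, the averaged estimator's MSE cannot exceed the mean of the individual MSEs. This sketch does not show the stronger claim of matching the best member.

```python
import math
import random


def mse(predict, xs, ys):
    """Mean squared error of a predictor on paired samples."""
    return sum((predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def make_estimator(bias, slope_error):
    """Hypothetical imperfect estimator of sin(x): adds a bias and a linear drift."""
    return lambda x: math.sin(x) + bias + slope_error * x


def average_ensemble(estimators):
    """Unweighted average of the population's predictions (an assumed rule)."""
    return lambda x: sum(f(x) for f in estimators) / len(estimators)


rng = random.Random(0)
xs = [rng.uniform(-3.0, 3.0) for _ in range(200)]
ys = [math.sin(x) for x in xs]

# A population of ten estimators, each wrong in a different way.
estimators = [
    make_estimator(rng.gauss(0.0, 0.3), rng.gauss(0.0, 0.1)) for _ in range(10)
]

individual = [mse(f, xs, ys) for f in estimators]
mean_individual = sum(individual) / len(individual)
ensemble_mse = mse(average_ensemble(estimators), xs, ys)

print(f"mean individual MSE: {mean_individual:.4f}")
print(f"ensemble MSE:        {ensemble_mse:.4f}")
```

Every estimator contributes to the average and none is discarded, which loosely mirrors property 1 in the abstract.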
Operations for Learning with Graphical Models
Journal of Artificial Intelligence Research, 1994
Cited by 214 (13 self)
This paper is a multidisciplinary review of empirical, statistical learning from a graphical model perspective. Well-known examples of graphical models include Bayesian networks, directed graphs representing a Markov chain, and undirected networks representing a Markov field. These graphical models are extended to model data analysis and empirical learning using the notation of plates. Graphical operations for simplifying and manipulating a problem are provided including decomposition, differentiation, and the manipulation of probability models from the exponential family. Two standard algorithm schemas for learning are reviewed in a graphical framework: Gibbs sampling and the expectation maximization algorithm. Using these operations and schemas, some popular algorithms can be synthesized from their graphical specification. This includes versions of linear regression, techniques for feed-forward networks, and learning Gaussian and discrete Bayesian networks from data. The paper conclu...
Learning classification trees
Statistics and Computing, 1992
Cited by 112 (8 self)
Algorithms for learning classification trees have had successes in artificial intelligence and statistics over many years. This paper outlines how a tree learning algorithm can be derived using Bayesian statistics. This introduces Bayesian techniques for splitting, smoothing, and tree averaging. The splitting rule is similar to Quinlan’s information gain, while smoothing and averaging replace pruning. Comparative experiments with reimplementations of a minimum encoding approach, Quinlan’s C4 (1987) and Breiman et al.’s CART (1984) show the full Bayesian algorithm produces more accurate predictions than versions
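The abstract compares its splitting rule to information gain. As a point of reference, the following minimal sketch computes the standard entropy-based information gain of a candidate split. It is not the paper's Bayesian splitting rule, and the function names are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(labels, groups):
    """Entropy reduction from partitioning `labels` into the given `groups`.

    `groups` is a list of label lists, one per branch of the candidate split.
    """
    n = len(labels)
    remainder = sum(len(g) / n * entropy(g) for g in groups if g)
    return entropy(labels) - remainder


labels = ["a", "a", "b", "b"]
perfect_split = [["a", "a"], ["b", "b"]]   # each branch is pure
useless_split = [["a", "b"], ["a", "b"]]   # branches mirror the parent

print(information_gain(labels, perfect_split))
print(information_gain(labels, useless_split))
```

A tree learner built on this criterion would evaluate every candidate split and keep the one with the highest gain.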
Nonlinear Gated Experts for Time Series: Discovering Regimes and Avoiding Overfitting
1995
Cited by 74 (5 self)
this paper: ftp://ftp.cs.colorado.edu/pub/Time-Series/MyPapers/experts.ps.Z,
Improving Regression Estimation: Averaging Methods for Variance Reduction with Extensions to General Convex Measure Optimization
1993
Ensemble Learning using Decorrelated Neural Networks
Connection Science, 1996
Cited by 63 (0 self)
We describe a decorrelation network training method for improving the quality of regression learning in "ensemble" neural networks that are composed of linear combinations of individual neural networks. In this method, individual networks are trained by backpropagation not only to reproduce a desired output, but also to have their errors linearly decorrelated with those of the other networks. Outputs from the individual networks are then linearly combined to produce the output of the ensemble network. We demonstrate the performance of decorrelated network training on learning the "3 Parity" logic function, a noisy sine function, and a one-dimensional nonlinear function, and compare the results with ensemble networks composed of independently trained individual networks (without decorrelation training). Empirical results show that when individual networks are forced to be decorrelated with one another, the resulting ensemble neural networks have lower mean squared errors than the ensembl...
Learning from incomplete data
1994
Cited by 49 (0 self)
Real-world learning tasks often involve high-dimensional data sets with complex patterns of missing features. In this paper we review the problem of learning from incomplete data from two statistical perspectives: the likelihood-based and the Bayesian. The goal is two-fold: to place current neural network approaches to missing data within a statistical framework, and to describe a set of algorithms, derived from the likelihood-based framework, that handle clustering, classification, and function approximation from incomplete data in a principled and efficient manner. These algorithms are based on mixture modeling and make two distinct appeals to the Expectation-Maximization (EM) principle (Dempster et al., 1977), both for the estimation of mixture components and for coping with the missing data.
Comparison of Approximate Methods for Handling Hyperparameters
Neural Computation
Cited by 49 (1 self)
I examine two approximate methods for computational implementation of Bayesian hierarchical models, that is, models which include unknown hyperparameters such as regularization constants and noise levels. In the 'evidence framework' the model parameters are integrated over, and the resulting evidence is maximized over the hyperparameters. The optimized
Constructive Algorithms for Structure Learning in Feedforward Neural Networks for Regression Problems
IEEE Transactions on Neural Networks, 1997
Cited by 47 (2 self)
In this survey paper, we review the constructive algorithms for structure learning in feedforward neural networks for regression problems. The basic idea is to start with a small network, then add hidden units and weights incrementally until a satisfactory solution is found. By formulating the whole problem as a state space search, we first describe the general issues in constructive algorithms, with special emphasis on the search strategy. A taxonomy, based on the differences in the state transition mapping, the training algorithm and the network architecture, is then presented.

Keywords: Constructive algorithm, structure learning, state space search, dynamic node creation, projection pursuit regression, cascade-correlation, resource-allocating network, group method of data handling.

I. Introduction
A. Problems with Fixed Size Networks
In recent years, many neural network models have been proposed for pattern classification, function approximation and regression problems. Among...
Selecting Input Variables Using Mutual Information and Nonparametric Density Estimation
1996
Cited by 43 (2 self)
In learning problems where a connectionist network is trained with a finite-sized training set, better generalization performance is often obtained when unneeded weights in the network are eliminated. One source of unneeded weights comes from the inclusion of input variables that provide little information about the output variables. We propose a method for identifying and eliminating these input variables. The method first determines the relationship between input and output variables using nonparametric density estimation and then measures the relevance of input variables using the information theoretic concept of mutual information. We present results from our method on a simple toy problem and a nonlinear time series.

1 INTRODUCTION
Generalization performance on a fixed-size training set is closely related to the number of free parameters in a network. Selecting too many free parameters can lead to poor generalization performance (Baum & Haussler, 1989; Geman, Bienenstock, & Dours...
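The idea of ranking input variables by mutual information can be sketched as follows. This example assumes a simple equal-width histogram as the density estimate, which may differ from the estimator used in the paper. The toy data, the bin count, and the function name `mutual_information` are all hypothetical.

```python
import math
import random
from collections import Counter


def mutual_information(xs, ys, bins=8):
    """Plug-in mutual information estimate (in bits) from equal-width histograms."""

    def binned(values):
        lo, hi = min(values), max(values)
        width = (hi - lo) / bins or 1.0  # guard against constant inputs
        return [min(int((v - lo) / width), bins - 1) for v in values]

    bx, by = binned(xs), binned(ys)
    n = len(xs)
    pxy, px, py = Counter(zip(bx, by)), Counter(bx), Counter(by)
    # Sum over occupied cells of p(x,y) * log2( p(x,y) / (p(x) p(y)) ).
    return sum(
        (c / n) * math.log2(c * n / (px[i] * py[j])) for (i, j), c in pxy.items()
    )


rng = random.Random(0)
n = 2000
x_relevant = [rng.uniform(-3.0, 3.0) for _ in range(n)]
x_irrelevant = [rng.uniform(-3.0, 3.0) for _ in range(n)]
# The output depends only on x_relevant, plus a little noise.
y = [math.sin(x) + rng.gauss(0.0, 0.1) for x in x_relevant]

mi_relevant = mutual_information(x_relevant, y)
mi_irrelevant = mutual_information(x_irrelevant, y)

print(f"I(x_relevant; y)   = {mi_relevant:.3f} bits")
print(f"I(x_irrelevant; y) = {mi_irrelevant:.3f} bits")
```

An input whose estimated mutual information with the output is near zero is a candidate for elimination. Histogram estimates carry a small positive bias, so in practice the threshold would need to account for sample size and bin count.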

