## Error Correlation And Error Reduction In Ensemble Classifiers (1996)

### Download Links

- [ftp.lans.ece.utexas.edu]
- [web.engr.oregonstate.edu]
- DBLP

Citations: 162 (22 self)

### BibTeX

@MISC{Tumer96errorcorrelation,
  author = {Kagan Tumer and Joydeep Ghosh},
  title = {Error Correlation And Error Reduction In Ensemble Classifiers},
  year = {1996}
}

### Abstract

Using an ensemble of classifiers, instead of a single classifier, can lead to improved generalization. The gains obtained by combining, however, are often affected more by the selection of what is presented to the combiner than by the actual combining method that is chosen. In this paper we focus on data selection and classifier training methods, in order to "prepare" classifiers for combining. We review a combining framework for classification problems that quantifies the need for reducing the correlation among individual classifiers. Then, we discuss several methods that make the classifiers in an ensemble more complementary. Experimental results are provided to illustrate the benefits and pitfalls of reducing the correlation among classifiers, especially when the training data is in limited supply. 1 Introduction: A classifier's ability to meaningfully respond to novel patterns, or generalize, is perhaps its most important property (Levin et al., 1990; Wolpert, 1990). In...

### Citations

4204 |
Neural Networks, a comprehensive foundation
- Haykin
Citation Context: ...voting is used. 4.4 Weighted Experts The final method of correlation reduction that we present has a different flavor than the previous ones. The method is based on the mixture of experts framework (Haykin, 1994; Jacobs et al., 1991; Xu et al., 1995), where the output is a weighted sum of the outputs of individual networks or “experts”. The weights are determined by a gating network, and are a function of th...
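The gated weighted sum described in this context can be sketched as follows. This is a minimal NumPy illustration, not the networks from the cited papers; the `experts` and `gate` callables below are hypothetical stand-ins for trained expert and gating networks:

```python
import numpy as np

def softmax(z):
    """Normalize gate scores into mixture weights that sum to 1."""
    e = np.exp(z - z.max())
    return e / e.sum()

def mixture_of_experts(x, experts, gate):
    """Weighted sum of expert outputs; the weights come from a gating
    function of the input, as in the mixture-of-experts framework."""
    weights = softmax(gate(x))               # one weight per expert
    outputs = np.array([f(x) for f in experts])
    return weights @ outputs                 # convex combination of outputs

# Toy stand-ins: two "experts" and a gate that favors expert 1 for x > 0.
experts = [lambda x: np.array([0.9, 0.1]),
           lambda x: np.array([0.2, 0.8])]
gate = lambda x: np.array([-x, x])           # raw scores, not probabilities
print(mixture_of_experts(1.0, experts, gate))  # approximately [0.283 0.717]
```

Because the gate weights depend on the input, each expert can specialize in a region of input space, which is what makes this scheme different in flavor from the data-resampling methods above.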

2721 | Bagging Predictors
- Breiman
- 1996
Citation Context: ...nt classifiers are worth as much as N dependent classifiers (Jacobs, 1995). Breiman also addresses this issue, and discusses methods aimed at reducing the correlation among estimators (Breiman, 1993; Breiman, 1996). Krogh and Vedelsby discuss how cross-validation can be used to improve ensemble performance (Krogh and Vedelsby, 1995). The influence of the amount of training on ensemble performance is studied in...

832 |
Adaptive mixtures of local experts
- Jacobs, Jordan, et al.
- 1991
Citation Context: ...d. 4.4 Weighted Experts The final method of correlation reduction that we present has a different flavor than the previous ones. The method is based on the mixture of experts framework (Haykin, 1994; Jacobs et al., 1991; Xu et al., 1995), where the output is a weighted sum of the outputs of individual networks or “experts”. The weights are determined by a gating network, and are a function of the inputs. In a given...

811 |
Cross-Validatory Choice and Assessment of Statistical Predictions
- Stone
- 1974
Citation Context: ...he same classifier trained on the same training set are needed. 4.1 Combining k−1-of-k Trained Classifiers Cross-validation, a statistical method aimed at estimating the “true” error (Friedman, 1994; Stone, 1974; Weiss and Kulikowski, 1991), provides a method for lowering correlations. In k-fold cross-validation, the training set is divided into k subsets. Then, k−1 of these subsets are used to train the netw...
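The k−1-of-k training-set construction described in this context can be sketched as follows (a stdlib-only illustration of the index bookkeeping; the classifier training itself is omitted):

```python
def k_minus_1_of_k_splits(n_patterns, k):
    """Partition pattern indices into k folds; each of the k classifiers
    trains on the k-1 folds that exclude its own held-out fold. Any two
    training sets overlap in exactly k-2 folds, which lowers (but does
    not remove) the correlation among the classifiers."""
    folds = [list(range(i, n_patterns, k)) for i in range(k)]
    training_sets = []
    for held_out in range(k):
        train = [idx for f, fold in enumerate(folds) if f != held_out
                 for idx in fold]
        training_sets.append(sorted(train))
    return training_sets

sets = k_minus_1_of_k_splits(n_patterns=12, k=4)
# 4 training sets of 9 patterns each; every pattern is held out exactly once
```

Each of the k index lists would then be used to train one member of the ensemble before combining.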

766 |
The Jackknife, the Bootstrap and Other Resampling Plans
- Efron
- 1982
Citation Context: ...ng sets for each classifier by resampling the original set. This resampling method is called bootstrapping. It is generally used for estimating the true error rate for problems with very little data (Efron, 1982; Efron, 1983; Jain et al., 1987; Weiss and Kulikowski, 1991). Breiman first used this idea to improve the performance of predictors, and dubbed it “bagging” predictors (Breiman, 1996). For regression...
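The bootstrap resampling and bagged-vote scheme this context describes can be sketched as follows (a stdlib-only sketch; the classifiers here are hypothetical callables standing in for bootstrap-trained networks):

```python
import random

def bootstrap_sample(patterns, rng):
    """Draw n patterns with replacement: each classifier is trained on a
    different resampled version of the same original training set."""
    n = len(patterns)
    return [patterns[rng.randrange(n)] for _ in range(n)]

def bagged_predict(classifiers, x):
    """For classification, bagging takes a plurality vote over the
    bootstrap-trained classifiers (regression would average instead)."""
    votes = [clf(x) for clf in classifiers]
    return max(set(votes), key=votes.count)

rng = random.Random(0)
data = list(range(10))
samples = [bootstrap_sample(data, rng) for _ in range(3)]
# Each sample has 10 entries drawn from the original 10; duplicates are likely,
# and on average about a third of the original patterns are left out of each.
```

Because each classifier sees a different resample, their errors are less correlated than if all were trained on the identical set.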

543 |
Neural network ensembles
- Hansen, Salamon
- 1990
Citation Context: ...non-linear combiners using rank-based information (Ho et al., 1994; Al-Ghoneim and Vijaya Kumar, 1995), belief-based methods (Rogova, 1994; Yang and Singh, 1994; Xu et al., 1992), or voting schemes (Hansen and Salamon, 1990; Battiti and Colla, 1994). We have introduced “order statistics” combiners, and analyzed their properties (Tumer and Ghosh, 1995b; Tumer and Ghosh, 1995c). Wolpert introduced the concept of “stacking...

413 | Neural network ensembles, cross validation, and active learning
- Krogh, Vedelsby
- 1995
Citation Context: ..., and discusses methods aimed at reducing the correlation among estimators (Breiman, 1993; Breiman, 1996). Krogh and Vedelsby discuss how cross-validation can be used to improve ensemble performance (Krogh and Vedelsby, 1995). The influence of the amount of training on ensemble performance is studied in (Sollich and Krogh, 1996), and the selection of individual classifiers through a genetic algorithm is suggested in (Opit...

399 |
Methods of combining multiple classifiers and their applications to handwriting recognition
- Xu, Krzyzak, et al.
- 1992
Citation Context: ...). Some researchers have investigated non-linear combiners using rank-based information (Ho et al., 1994; Al-Ghoneim and Vijaya Kumar, 1995), belief-based methods (Rogova, 1994; Yang and Singh, 1994; Xu et al., 1992), or voting schemes (Hansen and Salamon, 1990; Battiti and Colla, 1994). We have introduced “order statistics” combiners, and analyzed their properties (Tumer and Ghosh, 1995b; Tumer and Ghosh, 1995c...

375 |
Computer Systems that Learn
- Weiss, M, et al.
- 1991
Citation Context: ...ifier trained on the same training set are needed. 4.1 Combining k−1-of-k Trained Classifiers Cross-validation, a statistical method aimed at estimating the “true” error (Friedman, 1994; Stone, 1974; Weiss and Kulikowski, 1991), provides a method for lowering correlations. In k-fold cross-validation, the training set is divided into k subsets. Then, k−1 of these subsets are used to train the network and results are tested o...

340 |
Stacked regressions
- Breiman
- 1996
Citation Context: ...′ ≤ N independent classifiers are worth as much as N dependent classifiers (Jacobs, 1995). Breiman also addresses this issue, and discusses methods aimed at reducing the correlation among estimators (Breiman, 1993; Breiman, 1996). Krogh and Vedelsby discuss how cross-validation can be used to improve ensemble performance (Krogh and Vedelsby, 1995). The influence of the amount of training on ensemble performanc...

330 | Decision combination in multiple classifier systems
- Ho, Hull, et al.
- 1994
Citation Context: ..., 1995c; Tumer and Ghosh, 1996), and regression problems (Perrone and Cooper, 1993a; Hashem and Schmeiser, 1993). Some researchers have investigated non-linear combiners using rank-based information (Ho et al., 1994; Al-Ghoneim and Vijaya Kumar, 1995), belief-based methods (Rogova, 1994; Yang and Singh, 1994; Xu et al., 1992), or voting schemes (Hansen and Salamon, 1990; Battiti and Colla, 1994). We have introdu...

306 | When Networks Disagree: Ensemble Method for Neural Networks
- Perrone, Cooper
- 1993
Citation Context: ...sifiers are pooled before a decision is made. Currently, the most popular way of combining multiple classifiers is via simple averaging of the corresponding output values (Lincoln and Skrzypek, 1990; Perrone and Cooper, 1993b; Tumer and Ghosh, 1996). Weighted averaging has also been proposed, and different methods for computing the proper classifier weights have been studied (Benediktsson et al., 1994; Hashem and Schmeis...
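The simple averaging combiner mentioned in this context can be sketched as follows (a minimal NumPy illustration, assuming each classifier emits per-class output values such as posterior estimates):

```python
import numpy as np

def average_combiner(outputs):
    """Average the per-class outputs of N classifiers, then pick the
    class with the highest combined score (the "ave" combiner)."""
    combined = np.mean(outputs, axis=0)    # shape: (n_classes,)
    return combined, int(np.argmax(combined))

# Three classifiers scoring two classes; classifier 3 disagrees with the rest.
outputs = np.array([[0.6, 0.4],
                    [0.7, 0.3],
                    [0.4, 0.6]])
combined, decision = average_combiner(outputs)
# combined is approximately [0.567, 0.433]; the ensemble picks class 0
```

Averaging smooths out the individual classifiers' output noise, which is exactly where decorrelated errors pay off.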

281 |
Neural Network Classifiers Estimate Bayesian a posteriori Probabilities
- Richard, Lippmann
- 1991
Citation Context: ...at are trained to minimize a cross-entropy or mean square error (MSE) function, given “one-of-L” desired output patterns, approximate the a posteriori probability densities of the corresponding class (Richard and Lippmann, 1991; Ruck et al., 1990). Therefore, the ith output unit of a one-of-L classifier network to a given input x can be modeled as: fi(x) = p(ci|x) + ηi(x), (1) where p(ci|x) is the a posteriori probability...

152 |
Stacked generalization. Neural networks
- Wolpert
- 1992
Citation Context: ...analyzed their properties (Tumer and Ghosh, 1995b; Tumer and Ghosh, 1995c). Wolpert introduced the concept of “stacking” classifiers, allowing each stage to correct the mistakes of the previous one (Wolpert, 1992). Combiners have also been successfully applied to a multitude of real world problems (Baxt, 1992; Ghosh et al., 1996; Lee et al., 1991). Most research in this area focuses on finding the types of co...

147 |
Methods for combining experts’ probability assessments
- Jacobs
- 1995
Citation Context: ...nd Ghosh, 1996). Weighted averaging has also been proposed, and different methods for computing the proper classifier weights have been studied (Benediktsson et al., 1994; Hashem and Schmeiser, 1993; Jacobs, 1995; Lincoln and Skrzypek, 1990). Such linear combining techniques have been mathematically analyzed both for classification (Tumer and Ghosh, 1995c; Tumer and Ghosh, 1996), and regression problems (Perr...

138 |
Multisurface method of pattern separation for medical diagnosis applied to breast cytology
- Wolberg, Mangasarian
- 1990
Citation Context: ...the same notation as in the Proben1 benchmarks. CANCER1 is based on breast cancer data, obtained from the University of Wisconsin Hospitals, from Dr. William H. Wolberg (Mangasarian et al., 1990; Wolberg and Mangasarian, 1990). This set has 9 inputs, 2 outputs and 699 patterns, of which 350 are used for training. GENE1 is based on intron/exon boundary detection, or the detection of splice junctions in DNA sequences (Noord...

112 |
Combining the results of several neural network classifiers
- Rogova
Citation Context: ...r, 1993a; Hashem and Schmeiser, 1993). Some researchers have investigated non-linear combiners using rank-based information (Ho et al., 1994; Al-Ghoneim and Vijaya Kumar, 1995), belief-based methods (Rogova, 1994; Yang and Singh, 1994; Xu et al., 1992), or voting schemes (Hansen and Salamon, 1990; Battiti and Colla, 1994). We have introduced “order statistics” combiners, and analyzed their properties (Tumer a...

110 | Generating accurate and diverse members of a neural-network ensemble
- Opitz, Shavlik
- 1996
Citation Context: ...1995). The influence of the amount of training on ensemble performance is studied in (Sollich and Krogh, 1996), and the selection of individual classifiers through a genetic algorithm is suggested in (Opitz and Shavlik, 1996). For classification problems, the influence of the correlation among the classifiers on the error rate of multiple classifiers was quantified by Tumer and Ghosh (Tumer and Ghosh, 1995c; Tumer and Gh...

105 |
PROBEN1 – A Set of Benchmarks and Benchmarking Rules for Neural Network Training Algorithms, Universitaet Karlsruhe
- Prechelt
- 1994
Citation Context: ...imental results on one difficult data set, outlining all the relevant design steps/parameters. Then we will summarize results on some other data sets taken from the UCI repository/Proben1 benchmarks (Prechelt, 1994), and discuss the implications of those results. 5.1 Underwater Sonar Data: In order to examine the benefits of combining and the effect of correlation on combining results, we use a difficult data...

91 |
The multilayer perceptron as an approximation to a bayes optimal discriminant function
- Ruck, Rogers, et al.
- 1990
Citation Context: ...cross-entropy or mean square error (MSE) function, given “one-of-L” desired output patterns, approximate the a posteriori probability densities of the corresponding class (Richard and Lippmann, 1991; Ruck et al., 1990). Therefore, the ith output unit of a one-of-L classifier network to a given input x can be modeled as: fi(x) = p(ci|x) + ηi(x), (1) where p(ci|x) is the a posteriori probability distribution of the...

86 |
Democracy in neural nets: voting schemes for classification
- Battiti, Colla
- 1994
Citation Context: ...g rank-based information (Ho et al., 1994; Al-Ghoneim and Vijaya Kumar, 1995), belief-based methods (Rogova, 1994; Yang and Singh, 1994; Xu et al., 1992), or voting schemes (Hansen and Salamon, 1990; Battiti and Colla, 1994). We have introduced “order statistics” combiners, and analyzed their properties (Tumer and Ghosh, 1995b; Tumer and Ghosh, 1995c). Wolpert introduced the concept of “stacking” classifiers, allowing e...

79 | Analysis of decision boundaries in linearly combined neural classifiers
- Tumer, Ghosh
- 1996
Citation Context: ...two posteriors. For analyzing the error regions after combining, and comparing them to the single classifier case, one needs to determine the variance of the boundary obtained with the combiner. In (Tumer and Ghosh, 1996), we show that when the classifier errors are i.i.d., combining reduces the added error by N, or that E_add^ave = (1/N) E_add. In the next section we derive the added error of a combiner when the assumpti...
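The quoted result can be written out explicitly (a restatement of the cited claim under its stated i.i.d. assumption, not a derivation):

```latex
E_{\mathrm{add}}^{\mathrm{ave}} \;=\; \frac{1}{N}\, E_{\mathrm{add}}
\qquad \text{(classifier errors i.i.d.)}
```

That is, averaging N classifiers whose boundary errors are independent and identically distributed divides the error added beyond the Bayes rate by N; correlated errors erode this factor, which is why the paper focuses on correlation reduction.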

78 |
Boosting and other ensemble methods
- Drucker, Cortes, et al.
- 1994
Citation Context: ...rror reductions in the context of decision trees (Ali and Pazzani, 1995). The Boosting algorithm trains subsequent classifiers on training patterns that have been “selected” by earlier classifiers (Drucker et al., 1994), thus reducing the correlation among them. However, one can quickly run out of training data in practice if this approach is used. Twomey and Smith discuss combining and resampling in the context of...

77 | Prediction risk and architecture selection for neural networks
- Moody
- 1993
Citation Context: ...trained on the full (and identical) training set. We will call this “k−1-of-k” training. Note that if cross-validation is being used anyway to determine when to stop training for best generalization (Moody, 1994), we already get k trained classifiers, i.e. the extra overhead of combining is very little (Lippmann, 1995). 4.2 Input Decimation Combining: Another approach to reducing the correlation of classifier...

73 | Pattern recognition via linear programming: Theory and application to medical diagnosis
- Mangasarian, Setiono, et al.
- 1990
Citation Context: ...94-21.ps.Z. We are using the same notation as in the Proben1 benchmarks. CANCER1 is based on breast cancer data, obtained from the University of Wisconsin Hospitals, from Dr. William H. Wolberg (Mangasarian et al., 1990; Wolberg and Mangasarian, 1990). This set has 9 inputs, 2 outputs and 699 patterns, of which 350 are used for training. GENE1 is based on intron/exon boundary detection, or the detection of splice ju...

71 |
An overview of predictive learning and function approximation
- Friedman
- 1994
Citation Context: ...N instances of the same classifier trained on the same training set are needed. 4.1 Combining k−1-of-k Trained Classifiers Cross-validation, a statistical method aimed at estimating the “true” error (Friedman, 1994; Stone, 1974; Weiss and Kulikowski, 1991), provides a method for lowering correlations. In k-fold cross-validation, the training set is divided into k subsets. Then, k−1 of these subsets are used to t...

70 |
Training knowledge-based neural networks to recognize genes in DNA sequences
- Noordewier, Towell, et al.
- 1991
Citation Context: ...1990). This set has 9 inputs, 2 outputs and 699 patterns, of which 350 are used for training. GENE1 is based on intron/exon boundary detection, or the detection of splice junctions in DNA sequences (Noordewier et al., 1991; Towell and Shavlik, 1992). 120 inputs are used to determine whether a DNA section is a donor, an acceptor or neither. There are 3175 examples, of which 1588 are used for training. The GLASS1 data se...

67 | An Alternative Model for Mixtures of Experts
- Xu, Jordan, et al.
- 1995
Citation Context: ...ts The final method of correlation reduction that we present has a different flavor than the previous ones. The method is based on the mixture of experts framework (Haykin, 1994; Jacobs et al., 1991; Xu et al., 1995), where the output is a weighted sum of the outputs of individual networks or “experts”. The weights are determined by a gating network, and are a function of the inputs. In a given region of the inp...

61 |
Synergy of clustering multiple back propagation networks
- Lincoln, Skrzypek
- 1990
Citation Context: ...the outputs of several classifiers are pooled before a decision is made. Currently, the most popular way of combining multiple classifiers is via simple averaging of the corresponding output values (Lincoln and Skrzypek, 1990; Perrone and Cooper, 1993b; Tumer and Ghosh, 1996). Weighted averaging has also been proposed, and different methods for computing the proper classifier weights have been studied (Benediktsson et al....

57 | Learning with ensembles: how overfitting can be useful
- Sollich, Krogh
- 1996
Citation Context: ...Krogh and Vedelsby discuss how cross-validation can be used to improve ensemble performance (Krogh and Vedelsby, 1995). The influence of the amount of training on ensemble performance is studied in (Sollich and Krogh, 1996), and the selection of individual classifiers through a genetic algorithm is suggested in (Opitz and Shavlik, 1996). For classification problems, the influence of the correlation among the classifiers...

57 | Interpretation of Artificial Neural Networks
- Towell, Shavlik
- 1992
Citation Context: ...puts, 2 outputs and 699 patterns, of which 350 are used for training. GENE1 is based on intron/exon boundary detection, or the detection of splice junctions in DNA sequences (Noordewier et al., 1991; Towell and Shavlik, 1992). 120 inputs are used to determine whether a DNA section is a donor, an acceptor or neither. There are 3175 examples, of which 1588 are used for training. The GLASS1 data set is based on the chemical...

56 |
A statistical approach to learning and generalization in layered neural networks
- Levin, Tishby, et al.
- 1989
Citation Context: ...CS 9307632, and ARO contracts DAAH 04-94-G0417 and 04-95-10494. 1 Introduction: A classifier’s ability to meaningfully respond to novel patterns, or generalize, is perhaps its most important property (Levin et al., 1990; Wolpert, 1990). In general however, the generalization is not unique, and different classifiers provide different generalizations by realizing different decision boundaries (Ghosh and Tumer, 1994). ...

41 |
Bootstrap techniques for error estimation
- Jain, Dubes, et al.
- 1987
Citation Context: ...r by resampling the original set. This resampling method is called bootstrapping. It is generally used for estimating the true error rate for problems with very little data (Efron, 1982; Efron, 1983; Jain et al., 1987; Weiss and Kulikowski, 1991). Breiman first used this idea to improve the performance of predictors, and dubbed it “bagging” predictors (Breiman, 1996). For regression problems, bagging uses the aver...

37 |
Improving the accuracy of an artificial neural network using multiple differently trained networks
- Baxt
- 1992
Citation Context: ...oncept of “stacking” classifiers, allowing each stage to correct the mistakes of the previous one (Wolpert, 1992). Combiners have also been successfully applied to a multitude of real world problems (Baxt, 1992; Ghosh et al., 1996; Lee et al., 1991). Most research in this area focuses on finding the types of combiners that improve performance. Yet, it is important to note that if the classifiers to be combi...

33 | Structural adaptation and generalization in supervised feedforward networks
- Ghosh, Tumer
- 1994
Citation Context: ...erty (Levin et al., 1990; Wolpert, 1990). In general however, the generalization is not unique, and different classifiers provide different generalizations by realizing different decision boundaries (Ghosh and Tumer, 1994). For example, when classification is performed using a multilayered, feed-forward artificial neural network, different weight initializations, or different architectures (number of hidden units, hid...

30 | Theoretical foundations of linear and order statistics combiners for neural pattern classifiers
- Tumer, Ghosh
- 1996
Citation Context: ...studied (Benediktsson et al., 1994; Hashem and Schmeiser, 1993; Jacobs, 1995; Lincoln and Skrzypek, 1990). Such linear combining techniques have been mathematically analyzed both for classification (Tumer and Ghosh, 1995c; Tumer and Ghosh, 1996), and regression problems (Perrone and Cooper, 1993a; Hashem and Schmeiser, 1993). Some researchers have investigated non-linear combiners using rank-based information (Ho et ...

25 | On the link between error correlation and error reduction in decision tree ensembles
- Ali, Pazzani
- 1995
Citation Context: ...re weakened if the networks are not independent (Perrone and Cooper, 1993b). Ali and Pazzani discuss the relationship between error correlations and error reductions in the context of decision trees (Ali and Pazzani, 1995). The Boosting algorithm trains subsequent classifiers on training patterns that have been “selected” by earlier classifiers (Drucker et al., 1994), thus reducing the correlation among them. Howeve...

24 | Approximating a function and its derivatives using MSE-optimal linear combinations of trained feedforward neural networks
- Hashem, Schmeiser
- 1993
Citation Context: ...e and Cooper, 1993b; Tumer and Ghosh, 1996). Weighted averaging has also been proposed, and different methods for computing the proper classifier weights have been studied (Benediktsson et al., 1994; Hashem and Schmeiser, 1993; Jacobs, 1995; Lincoln and Skrzypek, 1990). Such linear combining techniques have been mathematically analyzed both for classification (Tumer and Ghosh, 1995c; Tumer and Ghosh, 1996), and regression ...

21 |
A Mathematical Theory of Generalization
- Wolpert
- 1989
Citation Context: ...contracts DAAH 04-94-G0417 and 04-95-10494. 1 Introduction: A classifier’s ability to meaningfully respond to novel patterns, or generalize, is perhaps its most important property (Levin et al., 1990; Wolpert, 1990). In general however, the generalization is not unique, and different classifiers provide different generalizations by realizing different decision boundaries (Ghosh and Tumer, 1994). For example, wh...

18 | Evidence combination techniques for robust classification of short-duration oceanic signals
- Ghosh, Beck, et al.
- 1992
Citation Context: ...ts, we use a difficult data set extracted from underwater acoustic signals. From the original passive sonar returns from four different underwater objects, a 25-dimensional feature set was extracted (Ghosh et al., 1992; Ghosh et al., 1996). Each pattern consists of 16 Gabor wavelet coefficients, 8 temporal descriptors and spectral measurements and 1 value denoting signal duration. There were 496 patterns in the tr...

15 | Integration of neural classifiers for passive sonar signals
- Ghosh, Tumer, et al.
- 1996
Citation Context: ...“stacking” classifiers, allowing each stage to correct the mistakes of the previous one (Wolpert, 1992). Combiners have also been successfully applied to a multitude of real world problems (Baxt, 1992; Ghosh et al., 1996; Lee et al., 1991). Most research in this area focuses on finding the types of combiners that improve performance. Yet, it is important to note that if the classifiers to be combined repeatedly provi...

15 | Order statistics combiners for neural classifiers
- Tumer, Ghosh
- 1995
Citation Context: ...e.utexas.edu. This data set was selected because: • The classification task is reasonably complex; • The input dimensionality is high; • We have an estimate of the Bayes error rate (Ebay ≈ 3.61, see (Tumer and Ghosh, 1995a)), and thus a yardstick to measure classifier performance. • The number of training patterns is moderate, allowing various methods to be tested without biasing the experiments towards highly data in...

12 | Learning ranks with neural networks (Invited paper)
- Al-Ghoneim, Kumar
- 1995

12 |
Learning from what’s been learned: Supervised learning in multi-neural network systems
- Perrone, Cooper
Citation Context: ...sifiers are pooled before a decision is made. Currently, the most popular way of combining multiple classifiers is via simple averaging of the corresponding output values (Lincoln and Skrzypek, 1990; Perrone and Cooper, 1993b; Tumer and Ghosh, 1996). Weighted averaging has also been proposed, and different methods for computing the proper classifier weights have been studied (Benediktsson et al., 1994; Hashem and Schmeis...

10 |
Integration of neural networks and decision tree classifiers for automated cytology screening
- Lee, Hwang, et al.
- 1991
Citation Context: ..., allowing each stage to correct the mistakes of the previous one (Wolpert, 1992). Combiners have also been successfully applied to a multitude of real world problems (Baxt, 1992; Ghosh et al., 1996; Lee et al., 1991). Most research in this area focuses on finding the types of combiners that improve performance. Yet, it is important to note that if the classifiers to be combined repeatedly provide the same (eithe...

9 |
Estimating the error rate of a prediction rule
- Efron
- 1983
Citation Context: ...ach classifier by resampling the original set. This resampling method is called bootstrapping. It is generally used for estimating the true error rate for problems with very little data (Efron, 1982; Efron, 1983; Jain et al., 1987; Weiss and Kulikowski, 1991). Breiman first used this idea to improve the performance of predictors, and dubbed it “bagging” predictors (Breiman, 1996). For regression problems, ba...

8 | Advances in using Hierarchical Mixture of Experts for Signal Classification
- Ramamurti, Ghosh
- 1996
Citation Context: ...typical mixture of experts, each individual network is single layered (Haykin, 1994), and thus has limited capabilities. Consequently, a large number of experts may be needed for realistic problems (Ramamurti and Ghosh, 1996). To make a fair comparison with the training and combining schemes analyzed earlier, we use a smaller number of more powerful experts, namely MLP or RBF networks. The localized network of (Xu et al....

7 |
Parallel consensual neural networks with optimally weighted outputs
- Benediktsson, Sveinsson, et al.
- 1994
Citation Context: ...and Skrzypek, 1990; Perrone and Cooper, 1993b; Tumer and Ghosh, 1996). Weighted averaging has also been proposed, and different methods for computing the proper classifier weights have been studied (Benediktsson et al., 1994; Hashem and Schmeiser, 1993; Jacobs, 1995; Lincoln and Skrzypek, 1990). Such linear combining techniques have been mathematically analyzed both for classification (Tumer and Ghosh, 1995c; Tumer and G...

3 |
Bayes error rate estimation through classifier combining
- Tumer, Ghosh
- 1995
Citation Context: ...studied (Benediktsson et al., 1994; Hashem and Schmeiser, 1993; Jacobs, 1995; Lincoln and Skrzypek, 1990). Such linear combining techniques have been mathematically analyzed both for classification (Tumer and Ghosh, 1995c; Tumer and Ghosh, 1996), and regression problems (Perrone and Cooper, 1993a; Hashem and Schmeiser, 1993). Some researchers have investigated non-linear combiners using rank-based information (Ho et ...

2 |
Bias, variance, and the combination of estimators; the case of least linear squares
- Meir
- 1995
Citation Context: ...oach is used. Twomey and Smith discuss combining and resampling in the context of a 1-d regression problem (Twomey and Smith, 1995). Meir discusses the effect of independence on combiner performance (Meir, 1995), and Jacobs reports that N′ ≤ N independent classifiers are worth as much as N dependent classifiers (Jacobs, 1995). Breiman also addresses this issue, and discusses methods aimed at reducing the c...