## Classifier Combining: Analytical Results and Implications (1995)

Venue: Proceedings of the AAAI-96 Workshop on Integrating Multiple Learned Models for Improving and Scaling Machine Learning Algorithms

Citations: 16 (0 self)

### BibTeX

```bibtex
@INPROCEEDINGS{Tumer95classifiercombining,
  author    = {Kagan Tumer and Joydeep Ghosh},
  title     = {Classifier Combining: Analytical Results and Implications},
  booktitle = {Proceedings of the AAAI-96 Workshop on Integrating Multiple Learned Models for Improving and Scaling Machine Learning Algorithms},
  year      = {1995},
  pages     = {126--132},
  publisher = {AAAI Press}
}
```

### Abstract

Several researchers have experimentally shown that substantial improvements can be obtained in difficult pattern recognition problems by combining or integrating the outputs of multiple classifiers. This paper summarizes our recent theoretical results that quantify the improvements due to multiple classifier combining. Furthermore, we present an extension of this theory that leads to an estimate of the Bayes error rate. Practical aspects such as expressing the confidences in decisions and determining the best data partition/classifier selection are also discussed.

Keywords: linear combining, order statistics combining, Bayes error, error correlation, error reduction, ensemble networks, performance limits.

Introduction: Given infinite training data, consistent classifiers approximate the Bayesian decision boundaries to arbitrary precision, therefore providing similar generalizations (Geman, Bienenstock, & Doursat 1992). However, often only a limited portion of the pattern space is avai...
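As a concrete illustration of the linear combining the abstract refers to: the simplest linear combiner averages the classifiers' estimated posteriors and picks the class with the highest average. This is a minimal sketch; the classifier outputs below are made-up numbers, not from the paper.

```python
import numpy as np

def average_combiner(posteriors):
    """Linearly combine classifier outputs by averaging.

    posteriors: array of shape (n_classifiers, n_classes) holding each
    classifier's estimated a posteriori class probabilities for one input.
    Returns the index of the class with the highest averaged posterior.
    """
    avg = np.mean(posteriors, axis=0)
    return int(np.argmax(avg))

# Three hypothetical classifiers scoring one input on a two-class problem:
outputs = np.array([[0.6, 0.4],
                    [0.4, 0.6],
                    [0.7, 0.3]])
print(average_combiner(outputs))  # prints 0: averaged posteriors are [0.567, 0.433]
```

Even though the second classifier disagrees, the averaged posterior still favors class 0, which is the kind of error-smoothing the paper's analysis quantifies.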

### Citations

3921 | Pattern Classification and Scene Analysis - Duda, Hart - 1973
Citation Context: ...daries to arbitrary precision, therefore providing similar generalizations (Geman, Bienenstock, & Doursat 1992). However, often only a limited portion of the pattern space is available or observable (Duda & Hart 1973; Fukunaga 1990). Given a finite and noisy data set, different classifiers typically provide different generalizations by realizing different decision boundaries (Ghosh & Tumer 1994). For example, whe...

2649 | Introduction to Statistical Pattern Recognition, 2nd edition - Fukunaga - 1990
Citation Context: ...y precision, therefore providing similar generalizations (Geman, Bienenstock, & Doursat 1992). However, often only a limited portion of the pattern space is available or observable (Duda & Hart 1973; Fukunaga 1990). Given a finite and noisy data set, different classifiers typically provide different generalizations by realizing different decision boundaries (Ghosh & Tumer 1994). For example, when classificatio...

2492 | Bagging predictors - Breiman - 1996
Citation Context: ...erent partition of the data, the correlation among them is reduced, and combining the k classifiers provides the final decision. An alternative to this method is found in bootstrapping and combining (Breiman 1994; Tumer & Ghosh 1996b). In this scheme training sets are generated by resampling the original set with replacement, leading to classifiers trained on different data partitions. In our experiments we ...
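The bootstrap-and-combine scheme described in this context (Breiman's bagging) can be sketched as follows. The base learner (`train_fn`, a nearest-centroid classifier) and the toy data are hypothetical stand-ins, and the combiner here is a plurality vote over the bootstrapped classifiers.

```python
import numpy as np

def bagged_predict(train_fn, X, y, X_test, k=5, seed=0):
    """Train k classifiers on bootstrap resamples (drawn with replacement)
    and combine their predictions by plurality vote, as in bagging.

    train_fn(X, y) must return a predict(X_test) -> labels callable; it is
    a hypothetical stand-in for any base learner.
    """
    rng = np.random.default_rng(seed)
    votes = []
    for _ in range(k):
        idx = rng.integers(0, len(X), size=len(X))   # resample with replacement
        votes.append(train_fn(X[idx], y[idx])(X_test))
    votes = np.array(votes)
    # plurality vote over the k classifiers for each test point
    return np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, votes)

# Toy base learner: classify by the nearest class centroid.
def train_fn(X, y):
    cents = {c: X[y == c].mean(axis=0) for c in np.unique(y)}
    labels = np.array(sorted(cents))
    C = np.stack([cents[c] for c in labels])
    return lambda Xt: labels[np.argmin(((Xt[:, None] - C) ** 2).sum(-1), axis=1)]

X = np.array([[0.0], [0.1], [0.2], [1.0], [1.1], [1.2]])
y = np.array([0, 0, 0, 1, 1, 1])
print(bagged_predict(train_fn, X, y, np.array([[0.05], [1.15]])))  # -> [0 1]
```

Because each resample sees a different multiset of the training data, the resulting classifiers make partly decorrelated errors, which is exactly the correlation reduction the context describes.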

893 | Nearest neighbor pattern classification - Cover, Hart - 1967
Citation Context: ...ese bounds can be recursively extended to multi-class problems (Garber & Djouadi 1988). A non-parametric method based on the nearest neighbor classifier also provides bounds for the Bayes error (Cover & Hart 1967; Fukunaga 1985). An alternative method that estimates the Bayes error is based on the classifier combining strategy (Tumer... [Table 2: Bayes Error Estimates for Artificial Data] ...

645 | Pattern recognition: A statistical approach - Devijver, Kittler - 1982
Citation Context: ...uestion that arises, of course, is how to determine the optimum classification rate. Since the Bayes decision provides the lowest error rates, the problem is equivalent to determining the Bayes error (Devijver & Kittler 1982; Schalkoff 1992; Young & Calvert 1974), which can be expressed as (Fukunaga 1990; Garber & Djouadi 1988): $E_{bay} = 1 - \sum_{i=1}^{L} \int_{C_i} P(c_i)\,p(x|c_i)\,dx$ (9), where $C_i$ is the region where class i ...
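For intuition about the Bayes error integral in Eq. (9), it can be estimated by Monte Carlo on a toy problem: two equiprobable classes with unit-variance Gaussian class densities centered at 0 and 2 (an illustrative setup, not the paper's experiments). The Bayes rule then places the boundary at x = 1, and analytically the error is Phi(-1), roughly 0.159.

```python
import math
import numpy as np

# Monte Carlo estimate of E_bay = 1 - sum_i \int_{C_i} P(c_i) p(x|c_i) dx
# for p(x|c_0) = N(0,1), p(x|c_1) = N(2,1), P(c_0) = P(c_1) = 0.5.
rng = np.random.default_rng(0)
n = 500_000
labels = rng.integers(0, 2, n)                 # sample the true class (equal priors)
x = rng.standard_normal(n) + 2.0 * labels      # sample x from that class's density
bayes_decision = (x > 1.0).astype(int)         # region C_1 is x > 1 for this setup
e_bay = float(np.mean(bayes_decision != labels))

print(round(e_bay, 3))                         # close to Phi(-1) ~= 0.159
```

With 500k samples the estimate matches the closed form Phi(-1) = 0.5 * (1 + erf(-1/sqrt(2))) to about three decimals, which is the kind of ground truth the paper's combiner-based estimator is compared against on artificial data.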

612 | The jackknife, the bootstrap, and other resampling plans - Efron - 1982
Citation Context: ..., where the combiner provides a decision different than either of the individual classifiers. Statistical methods such as cross-validation (Friedman 1994; Weiss & Kulikowski 1991), and bootstrapping (Efron 1982; Jain, Dubes, & Chen 1987) that aim at deducing the "true error" of a classifier can also be used to artificially reduce the correlation among classifiers. For example, in (Tumer & Ghosh 1996b), we d...

381 | Order Statistics - David, Nagaraja - 2003
Citation Context: ...$(x) \leq \cdots \leq f_i^{N:N}(x)$. The kth order statistic, which will be denoted $f_i^{k:N}(x)$, is the kth value in this progression. An excellent introduction to order statistics is provided in (David 1970). Now, let us define the max, med and min combiners as follows: $f_i^{max}(x) = f_i^{N:N}(x)$ (5); $f_i^{med}(x) = \frac{1}{2}\big(f_i^{N/2:N}(x) + f_i^{N/2+1:N}(x)\big)$ for N even and $f_i^{(N+1)/2:N}(x)$ for N odd (6); $f_i^{min}$ ...
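The max, med, and min combiners quoted above amount to sorting the N classifier outputs for class i and taking the largest, middle, or smallest value; a minimal sketch (the output values are made-up):

```python
import numpy as np

def os_combiners(outputs):
    """Order-statistics combiners over N classifier outputs for one class.

    outputs: length-N array of f_i(x) values from N classifiers.
    Returns (min, median, max): the min combiner takes the smallest order
    statistic, the med combiner the middle value (mean of the two middle
    values for even N), and the max combiner the largest.
    """
    s = np.sort(outputs)
    n = len(s)
    med = (s[n // 2 - 1] + s[n // 2]) / 2 if n % 2 == 0 else s[n // 2]
    return float(s[0]), float(med), float(s[-1])

print(os_combiners(np.array([0.2, 0.9, 0.5])))       # -> (0.2, 0.5, 0.9)
print(os_combiners(np.array([0.2, 0.9, 0.5, 0.7])))  # -> (0.2, 0.6, 0.9)
```

In a full classifier, one of these combiners is applied per class and the decision goes to the class with the largest combined output.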

368 | Methods of combining multiple classifiers and their applications to handwriting recognition - Xu, Krzyzak, et al. - 1992

365 | Computer Systems that Learn - Weiss, Kulikowski - 1995
Citation Context: ...cally possible to have a 0-confidence decision, where the combiner provides a decision different than either of the individual classifiers. Statistical methods such as cross-validation (Friedman 1994; Weiss & Kulikowski 1991), and bootstrapping (Efron 1982; Jain, Dubes, & Chen 1987) that aim at deducing the "true error" of a classifier can also be used to artificially reduce the correlation among classifiers. For example...

313 | Decision combination in multiple classifier systems - Ho, Hull, et al. - 1994

307 | Stacked regressions - Breiman - 1996
Citation Context: ...ucing the improvements due to each extra classifier. This has been recently observed by some researchers, such as Breiman, who developed bootstrap methods for achieving independence among estimators (Breiman 1993; 1994), and by Jacobs (Jacobs 1995). Acknowledgements: This research was supported in part by AFOSR contract F49620-93-1-0307, NSF grant ECS 9307632, and ARO contracts DAAH 04-94-G0417 and 04-95-1049...

290 | When networks disagree: Ensemble methods for hybrid neural networks - Perrone, Cooper - 1993
Citation Context: ...combining has been studied in the connectionist framework in several forms, including "stacked generalization" (Wolpert 1992), and ensemble methods (Hampshire & Waibel 1992; Lincoln & Skrzypek 1990; Perrone & Cooper 1993; Tumer & Ghosh 1996c). Concepts such as "weighted averaging" (Benediktsson et al. 1994; Hashem & Schmeiser 1993; Jacobs 1995; Lincoln & Skrzypek 1990), "rank based combining" (Al-Ghoneim & Vijaya Kum...

267 | Neural network classifiers estimate Bayesian a posteriori probabilities - Richard, Lippmann - 1991
Citation Context: ...s. Finally, we discuss the implications of these results. Linear Combining: Consider a reasonably well trained classifier, whose outputs approximate the corresponding a posteriori class probabilities (Richard & Lippmann 1991). [Figure 1: Error regions associated with approximating the a posteriori probabilities.] The output...

171 | Pattern Recognition: Statistical, Structural and Neural Approaches - Schalkoff - 1992
Citation Context: ...urse, is how to determine the optimum classification rate. Since the Bayes decision provides the lowest error rates, the problem is equivalent to determining the Bayes error (Devijver & Kittler 1982; Schalkoff 1992; Young & Calvert 1974), which can be expressed as (Fukunaga 1990; Garber & Djouadi 1988): $E_{bay} = 1 - \sum_{i=1}^{L} \int_{C_i} P(c_i)\,p(x|c_i)\,dx$ (9), where $C_i$ is the region where class i has the highest ...

155 | Error correlation and error reduction in ensemble classifiers - Tumer, Ghosh
Citation Context: ...died in the connectionist framework in several forms, including "stacked generalization" (Wolpert 1992), and ensemble methods (Hampshire & Waibel 1992; Lincoln & Skrzypek 1990; Perrone & Cooper 1993; Tumer & Ghosh 1996c). Concepts such as "weighted averaging" (Benediktsson et al. 1994; Hashem & Schmeiser 1993; Jacobs 1995; Lincoln & Skrzypek 1990), "rank based combining" (Al-Ghoneim & Vijaya Kumar 1995; Ho, Hull, &...

140 | Methods of combining experts' probability assessments - Jacobs - 1995
Citation Context: ...emble methods (Hampshire & Waibel 1992; Lincoln & Skrzypek 1990; Perrone & Cooper 1993; Tumer & Ghosh 1996c). Concepts such as "weighted averaging" (Benediktsson et al. 1994; Hashem & Schmeiser 1993; Jacobs 1995; Lincoln & Skrzypek 1990), "rank based combining" (Al-Ghoneim & Vijaya Kumar 1995; Ho, Hull, & Srihari 1994), "belief based combining" (Rogova 1994; Xu, Krzyzak, & Suen 1992; Yang & Singh 1994) have ...

131 | A First Course in Order Statistics - Arnold, Balakrishnan, et al. - 1993

105 | Combining the results of several neural network classifiers - Rogova - 1994
Citation Context: ...(Benediktsson et al. 1994; Hashem & Schmeiser 1993; Jacobs 1995; Lincoln & Skrzypek 1990), "rank based combining" (Al-Ghoneim & Vijaya Kumar 1995; Ho, Hull, & Srihari 1994), "belief based combining" (Rogova 1994; Xu, Krzyzak, & Suen 1992; Yang & Singh 1994) have also been analyzed. Combining has also been studied in other fields such as econometrics (Granger 1989), and machine learning (Barnett 1981; Garvey,...

95 | Neural networks and the bias/variance dilemma - Geman, Bienenstock, et al. - 1992

75 | Analysis of decision boundaries in linearly combined neural classifiers - Tumer, Ghosh - 1996
Citation Context: ...died in the connectionist framework in several forms, including "stacked generalization" (Wolpert 1992), and ensemble methods (Hampshire & Waibel 1992; Lincoln & Skrzypek 1990; Perrone & Cooper 1993; Tumer & Ghosh 1996c). Concepts such as "weighted averaging" (Benediktsson et al. 1994; Hashem & Schmeiser 1993; Jacobs 1995; Lincoln & Skrzypek 1990), "rank based combining" (Al-Ghoneim & Vijaya Kumar 1995; Ho, Hull, &...

74 | Hybrid System for Protein Secondary Structure Prediction - Zhang - 1992

68 | An overview of predictive learning and function approximation. In: From Statistics to Neural Networks: Theory and Pattern Recognition Applications - Friedman - 1995
Citation Context: ...it is theoretically possible to have a 0-confidence decision, where the combiner provides a decision different than either of the individual classifiers. Statistical methods such as cross-validation (Friedman 1994; Weiss & Kulikowski 1991), and bootstrapping (Efron 1982; Jain, Dubes, & Chen 1987) that aim at deducing the "true error" of a classifier can also be used to artificially reduce the correlation among...

64 | Computational methods for a mathematical theory of evidence - Barnett
Citation Context: ...mbining" (Rogova 1994; Xu, Krzyzak, & Suen 1992; Yang & Singh 1994) have also been analyzed. Combining has also been studied in other fields such as econometrics (Granger 1989), and machine learning (Barnett 1981; Garvey, Lowrance, & Fischler 1981). This paper summarizes our recent theoretical results that quantify the improvements due to multiple classifier combining. First, we summarize linear combiners, an...

59 | Synergy of clustering multiple backpropagation networks - Lincoln, Skrzypek - 1990
Citation Context: ...patterns. The concept of combining has been studied in the connectionist framework in several forms, including "stacked generalization" (Wolpert 1992), and ensemble methods (Hampshire & Waibel 1992; Lincoln & Skrzypek 1990; Perrone & Cooper 1993; Tumer & Ghosh 1996c). Concepts such as "weighted averaging" (Benediktsson et al. 1994; Hashem & Schmeiser 1993; Jacobs 1995; Lincoln & Skrzypek 1990), "rank based combining" (...

39 | An inference technique for integrating knowledge from disparate sources - Garvey, Lowrance, et al.

39 | Bootstrap Techniques for Error Estimation - Jain, Dubes, et al. - 1987

31 | Structural adaptation and generalization in supervised feedforward networks - Ghosh, Tumer - 1994
Citation Context: ...lable or observable (Duda & Hart 1973; Fukunaga 1990). Given a finite and noisy data set, different classifiers typically provide different generalizations by realizing different decision boundaries (Ghosh & Tumer 1994). For example, when classification is performed using a multilayered, feed-forward artificial neural network, varying weight initializations, or network architectures (number of hidden units, hidden ...

29 | Theoretical foundations of linear and order statistics combiners for neural pattern classifiers - Tumer, Ghosh - 1995
Citation Context: ...died in the connectionist framework in several forms, including "stacked generalization" (Wolpert 1992), and ensemble methods (Hampshire & Waibel 1992; Lincoln & Skrzypek 1990; Perrone & Cooper 1993; Tumer & Ghosh 1996c). Concepts such as "weighted averaging" (Benediktsson et al. 1994; Hashem & Schmeiser 1993; Jacobs 1995; Lincoln & Skrzypek 1990), "rank based combining" (Al-Ghoneim & Vijaya Kumar 1995; Ho, Hull, &...

23 | Approximating a function and its derivatives using MSE-optimal linear combinations of trained feedforward neural networks - Hashem, Schmeiser
Citation Context: ..." (Wolpert 1992), and ensemble methods (Hampshire & Waibel 1992; Lincoln & Skrzypek 1990; Perrone & Cooper 1993; Tumer & Ghosh 1996c). Concepts such as "weighted averaging" (Benediktsson et al. 1994; Hashem & Schmeiser 1993; Jacobs 1995; Lincoln & Skrzypek 1990), "rank based combining" (Al-Ghoneim & Vijaya Kumar 1995; Ho, Hull, & Srihari 1994), "belief based combining" (Rogova 1994; Xu, Krzyzak, & Suen 1992; Yang & Sing...

19 | Combining Forecasts: Twenty Years Later - Granger - 1989
Citation Context: ...ll, & Srihari 1994), "belief based combining" (Rogova 1994; Xu, Krzyzak, & Suen 1992; Yang & Singh 1994) have also been analyzed. Combining has also been studied in other fields such as econometrics (Granger 1989), and machine learning (Barnett 1981; Garvey, Lowrance, & Fischler 1981). This paper summarizes our recent theoretical results that quantify the improvements due to multiple classifier combining. Fir...

18 | Estimation of location and scale parameters by order statistics from singly and doubly censored samples - Sarhan, Greenberg - 1956
Citation Context: ...found in tabulated form (Arnold, Balakrishnan, & Nagaraja 1992). For example, Table 1 provides α values for all three OS combiners for a Gaussian distribution (Arnold, Balakrishnan, & Nagaraja 1992; Sarhan & Greenberg 1956). Table 1: Reduction factors α for the min, max and med combiners under a Gaussian error model:

| N  | minimum/maximum | median |
|----|-----------------|--------|
| 2  | .682            | .682   |
| 3  | .560            | .449   |
| 4  | .492            | .361   |
| 5  | .448            | .287   |
| 10 | .344            | .151   |
| 15 | .301...         |        |
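If the reduction factor α is read as the variance of the chosen order statistic of N i.i.d. standard normal errors, relative to the unit variance of a single error (an interpretation consistent with the tabulated values, e.g. .682 = 1 - 1/π for N = 2), a quick simulation reproduces the N = 3 row of the table:

```python
import numpy as np

# Monte Carlo check of Table 1 at N = 3: alpha is taken here as the
# variance of the selected order statistic of N i.i.d. standard normal
# errors; a single classifier's error has unit variance by construction.
rng = np.random.default_rng(0)
errors = rng.standard_normal((200_000, 3))       # N = 3 classifiers

alpha_max = errors.max(axis=1).var()             # Table 1 lists .560
alpha_med = np.median(errors, axis=1).var()      # Table 1 lists .449
print(round(alpha_max, 3), round(alpha_med, 3))  # both close to the tabulated values
```

By symmetry of the Gaussian, the min combiner's factor equals the max combiner's, which is why the table shows a single minimum/maximum column.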

18 | Stacked generalization. Neural Networks 5:241-259 - Wolpert - 1992
Citation Context: ...limited number of training data, or unusually high dimensional patterns. The concept of combining has been studied in the connectionist framework in several forms, including "stacked generalization" (Wolpert 1992), and ensemble methods (Hampshire & Waibel 1992; Lincoln & Skrzypek 1990; Perrone & Cooper 1993; Tumer & Ghosh 1996c). Concepts such as "weighted averaging" (Benediktsson et al. 1994; Hashem & Schmei...

15 | Integration of neural classifiers for passive sonar signals - Ghosh, Tumer, et al. - 1996
Citation Context: ...all the available data (Tumer & Ghosh 1996b). Discussion: Combining the outputs of several classifiers before making the classification decision has led to improved performance in many applications (Ghosh et al. 1996; Xu, Krzyzak, & Suen 1992; Zhang, Mesirov, & Waltz 1992). This paper summarizes our recent results that: quantify such improvements; lead to the estimation of the Bayes error rate; express the decisi...

15 | An evidential reasoning approach for multiple-attribute decision making with uncertainty - Yang, Singh - 1994
Citation Context: ...meiser 1993; Jacobs 1995; Lincoln & Skrzypek 1990), "rank based combining" (Al-Ghoneim & Vijaya Kumar 1995; Ho, Hull, & Srihari 1994), "belief based combining" (Rogova 1994; Xu, Krzyzak, & Suen 1992; Yang & Singh 1994) have also been analyzed. Combining has also been studied in other fields such as econometrics (Granger 1989), and machine learning (Barnett 1981; Garvey, Lowrance, & Fischler 1981). This paper summa...

12 | Learning ranks with neural networks (invited paper) - Al-Ghoneim, Kumar - 1995

8 | Bounds on the Bayes classification error based on pairwise risk functions - Garber, Djouadi - 1988
Citation Context: ...on provides the lowest error rates, the problem is equivalent to determining the Bayes error (Devijver & Kittler 1982; Schalkoff 1992; Young & Calvert 1974), which can be expressed as (Fukunaga 1990; Garber & Djouadi 1988): $E_{bay} = 1 - \sum_{i=1}^{L} \int_{C_i} P(c_i)\,p(x|c_i)\,dx$ (9), where $C_i$ is the region where class i has the highest posterior, $P(c_i)$ is the a priori class probability of class i, $1 \leq i \leq L$, and $p(x|c_i)$ i...

8 | The Meta-Pi network: Building distributed representations for robust multisource pattern recognition - Hampshire, Waibel - 1992
Citation Context: ...nusually high dimensional patterns. The concept of combining has been studied in the connectionist framework in several forms, including "stacked generalization" (Wolpert 1992), and ensemble methods (Hampshire & Waibel 1992; Lincoln & Skrzypek 1990; Perrone & Cooper 1993; Tumer & Ghosh 1996c). Concepts such as "weighted averaging" (Benediktsson et al. 1994; Hashem & Schmeiser 1993; Jacobs 1995; Lincoln & Skrzypek 1990),...

7 | Parallel consensual neural networks with optimally weighted outputs - Benediktsson, Sveinsson, et al. - 1994
Citation Context: ...ng "stacked generalization" (Wolpert 1992), and ensemble methods (Hampshire & Waibel 1992; Lincoln & Skrzypek 1990; Perrone & Cooper 1993; Tumer & Ghosh 1996c). Concepts such as "weighted averaging" (Benediktsson et al. 1994; Hashem & Schmeiser 1993; Jacobs 1995; Lincoln & Skrzypek 1990), "rank based combining" (Al-Ghoneim & Vijaya Kumar 1995; Ho, Hull, & Srihari 1994), "belief based combining" (Rogova 1994; Xu, Krzyzak,...

6 | The quality of training-sample estimates of the Bhattacharyya coefficient - Djouadi, Snorrason, et al. - 1990

6 | The estimation of the Bayes error by the k-nearest neighbor approach - Fukunaga - 1985
Citation Context: ...ecursively extended to multi-class problems (Garber & Djouadi 1988). A non-parametric method based on the nearest neighbor classifier also provides bounds for the Bayes error (Cover & Hart 1967; Fukunaga 1985). An alternative method that estimates the Bayes error is based on the classifier combining strategy (Tumer... [Table 2: Bayes Error Estimates for Artificial Data] ...

4 | Limits to performance gains in combined neural classifiers - Tumer, Ghosh - 1995
Citation Context: ...and the estimated correlation among them. Table 2 shows the combiner based Bayes error estimate, as well as some classical estimates for two well-known artificial problems detailed in (Fukunaga 1990; Tumer & Ghosh 1995a). Such experimental results suggest that the combiner based method provides reliable estimates of the Bayes error rate (Tumer & Ghosh 1995a). The Plurality Limit: The previous section focused on esti...

3 | Bayes error rate estimation through classifier combining - Tumer, Ghosh - 1995
Citation Context: ...and the estimated correlation among them. Table 2 shows the combiner based Bayes error estimate, as well as some classical estimates for two well-known artificial problems detailed in (Fukunaga 1990; Tumer & Ghosh 1995a). Such experimental results suggest that the combiner based method provides reliable estimates of the Bayes error rate (Tumer & Ghosh 1995a). The Plurality Limit: The previous section focused on esti...