Results 1 - 10
of
96
Error Correlation And Error Reduction In Ensemble Classifiers
, 1996
"... Using an ensemble of classifiers, instead of a single classifier, can lead to improved generalization. The gains obtained by combining however, are often affected more by the selection of what is presented to the combiner, than by the actual combining method that is chosen. In this paper we focus ..."
Abstract
-
Cited by 139 (21 self)
- Add to MetaCart
Using an ensemble of classifiers, instead of a single classifier, can lead to improved generalization. The gains obtained by combining however, are often affected more by the selection of what is presented to the combiner, than by the actual combining method that is chosen. In this paper we focus on data selection and classifier training methods, in order to "prepare" classifiers for combining. We review a combining framework for classification problems that quantifies the need for reducing the correlation among individual classifiers. Then, we discuss several methods that make the classifiers in an ensemble more complementary. Experimental results are provided to illustrate the benefits and pitfalls of reducing the correlation among classifiers, especially when the training data is in limited supply. 2 1 Introduction A classifier's ability to meaningfully respond to novel patterns, or generalize, is perhaps its most important property (Levin et al., 1990; Wolpert, 1990). In...
Face Recognition: A Convolutional Neural Network Approach
- IEEE Transactions on Neural Networks
, 1997
"... Faces represent complex, multidimensional, meaningful visual stimuli and developing a computational model for face recognition is difficult [43]. We present a hybrid neural network solution which compares favorably with other methods. The system combines local image sampling, a self-organizing map n ..."
Abstract
-
Cited by 127 (0 self)
- Add to MetaCart
Faces represent complex, multidimensional, meaningful visual stimuli and developing a computational model for face recognition is difficult [43]. We present a hybrid neural network solution which compares favorably with other methods. The system combines local image sampling, a self-organizing map neural network, and a convolutional neural network. The self-organizing map provides a quantization of the image samples into a topological space where inputs that are nearby in the original space are also nearby in the output space, thereby providing dimensionality reduction and invariance to minor changes in the image sample, and the convolutional neural network provides for partial invariance to translation, rotation, scale, and deformation. The convolutional network extracts successively larger features in a hierarchical set of layers. We present results using the Karhunen-Loeve transform in place of the self-organizing map, and a multi-layer perceptron in place of the convolutional netwo...
Decision templates for multiple classifier fusion: an experimental comparison
- Pattern Recognition
, 2001
"... Multiple classifier fusion may generate more accurate classification than each of the constituent classifiers. Fusion is often based on fixed combination rules like the product and average. Only under strict probabilistic conditions can these rules be justified. We present here a simple rule for ada ..."
Abstract
-
Cited by 77 (7 self)
- Add to MetaCart
Multiple classifier fusion may generate more accurate classification than each of the constituent classifiers. Fusion is often based on fixed combination rules like the product and average. Only under strict probabilistic conditions can these rules be justified. We present here a simple rule for adapting the class combiner to the application. c decision templates (one per class) are estimated with the same training set that is used for the set of classifiers. These templates are then matched to the decision profile of new incoming objects by some similarity measure. We compare 11 versions of our model with 14 other techniques for classifier fusion on the Satimage and Phoneme datasets from the database ELENA. Our results show that decision templates based on integral type measures of similarity are superior to the other schemes on both data sets.
Nonlinear Gated Experts for Time Series: Discovering Regimes and Avoiding Overfitting
, 1995
"... this paper: ftp://ftp.cs.colorado.edu/pub/Time-Series/MyPapers/experts.ps.Z, ..."
Abstract
-
Cited by 74 (5 self)
- Add to MetaCart
this paper: ftp://ftp.cs.colorado.edu/pub/Time-Series/MyPapers/experts.ps.Z,
Issues in Stacked Generalization
- Journal of Artificial Intelligence Research
, 1999
"... Stacked generalization is a general method of using a high-level model to combine lower-level models to achieve greater predictive accuracy. In this paper we address two crucial issues which have been considered to be a `black art' in classification tasks ever since the introduction of stacked gener ..."
Abstract
-
Cited by 71 (1 self)
- Add to MetaCart
Stacked generalization is a general method of using a high-level model to combine lower-level models to achieve greater predictive accuracy. In this paper we address two crucial issues which have been considered to be a `black art' in classification tasks ever since the introduction of stacked generalization in 1992 by Wolpert: the type of generalizer that is suitable to derive the higher-level model, and the kind of attributes that should be used as its input. We find that best results are obtained when the higher-level model combines the confidence (and not just the predictions) of the lower-level ones.
On Combining Artificial Neural Nets
- Connection Science
, 1996
"... This paper reviews research on combining artificial neural nets, and provides an overview of, and an introduction to, the papers contained this Special Issue, and its companion (Connection Science, 9, 1). Two main approaches, ensemble-based, and modular, are identified and considered. An ensembl ..."
Abstract
-
Cited by 67 (3 self)
- Add to MetaCart
This paper reviews research on combining artificial neural nets, and provides an overview of, and an introduction to, the papers contained this Special Issue, and its companion (Connection Science, 9, 1). Two main approaches, ensemble-based, and modular, are identified and considered. An ensemble, or committee, is made up of a set of nets, each of which is a general function approximator. The members of the ensemble are combined in order to obtain better generalisation performance than would be achieved by any of the individual nets. The main issues considered here under the heading of ensemble-based approaches, are (a) how to combine the outputs of the ensemble members (b) how to create candidate ensemble members and (c) which methods lead to the most effective ensembles? Under the heading of modular approaches we begin by considering a divide-and-conquer approach by which a function is automatically decomposed into a number of subfunctions which are treated by specialis...
Combining Multiple Classifiers By Averaging Or By Multiplying?
, 2000
"... In classification tasks it may be wise to combine observations from di!erent sources. Not only it decreases the training time but it can also increase the robustness and the performance of the classi"cation. Combining is often done by just (weighted) averaging of the outputs of the di!erent classi"e ..."
Abstract
-
Cited by 60 (1 self)
- Add to MetaCart
In classification tasks it may be wise to combine observations from di!erent sources. Not only it decreases the training time but it can also increase the robustness and the performance of the classi"cation. Combining is often done by just (weighted) averaging of the outputs of the di!erent classi"ers. Using equal weights for all classi"ers then results in the mean combination rule. This works very well in practice, but the combination strategy lacks a fundamental basis as it cannot readily be derived from the joint probabilities. This contrasts with the product combination rule which can be obtained from the joint probability under the assumption of independency. In this paper we will show di!erences and similarities between this mean combination rule and the product combination rule in theory and in practice. # 2000 Pattern Recognition Society. Published by Elsevier Science Ltd. All rights reserved.
Linear and Order Statistics Combiners for Pattern Classification
- Combining Artificial Neural Nets
, 1999
"... Several researchers have experimentally shown that substantial improvements can be obtained in difficult pattern recognition problems by combining or integrating the outputs of multiple classifiers. This chapter provides an analytical framework to quantify the improvements in classification resul ..."
Abstract
-
Cited by 56 (6 self)
- Add to MetaCart
Several researchers have experimentally shown that substantial improvements can be obtained in difficult pattern recognition problems by combining or integrating the outputs of multiple classifiers. This chapter provides an analytical framework to quantify the improvements in classification results due to combining. The results apply to both linear combiners and order statistics combiners. We first show that to a first order approximation, the error rate obtained over and above the Bayes error rate, is directly proportional to the variance of the actual decision boundaries around the Bayes optimum boundary. Combining classifiers in output space reduces this variance, and hence reduces the "added" error. If N unbiased classifiers are combined by simple averaging, the added error rate can be reduced by a factor of N if the individual errors in approximating the decision boundaries are uncorrelated. Expressions are then derived for linear combiners which are biased or correlated, and the effect of output correlations on ensemble performance is quantified. For order statistics based non-linear combiners, we derive expressions that indicate how much the median, the maximum and in general the ith order statistic can improve classifier performance. The analysis presented here facilitates the understanding of the relationships among error rates, classifier boundary distributions, and combining in output space. Experimental results on several public domain data sets are provided to illustrate the benefits of combining and to support the analytical results.
Graphical models and automatic speech recognition
- Mathematical Foundations of Speech and Language Processing
, 2003
"... Graphical models provide a promising paradigm to study both existing and novel techniques for automatic speech recognition. This paper first provides a brief overview of graphical models and their uses as statistical models. It is then shown that the statistical assumptions behind many pattern recog ..."
Abstract
-
Cited by 49 (10 self)
- Add to MetaCart
Graphical models provide a promising paradigm to study both existing and novel techniques for automatic speech recognition. This paper first provides a brief overview of graphical models and their uses as statistical models. It is then shown that the statistical assumptions behind many pattern recognition techniques commonly used as part of a speech recognition system can be described by a graph – this includes Gaussian distributions, mixture models, decision trees, factor analysis, principle component analysis, linear discriminant analysis, and hidden Markov models. Moreover, this paper shows that many advanced models for speech recognition and language processing can also be simply described by a graph, including many at the acoustic-, pronunciation-, and language-modeling levels. A number of speech recognition techniques born directly out of the graphical-models paradigm are also surveyed. Additionally, this paper includes a novel graphical analysis regarding why derivative (or delta) features improve hidden Markov model-based speech recognition by improving structural discriminability. It also includes an example where a graph can be used to represent language model smoothing constraints. As will be seen, the space of models describable by a graph is quite large. A thorough exploration of this space should yield techniques that ultimately will supersede the hidden Markov model.

