Results 1–10 of 44
Error Correlation And Error Reduction In Ensemble Classifiers
, 1996
Abstract

Cited by 164 (22 self)
Using an ensemble of classifiers, instead of a single classifier, can lead to improved generalization. The gains obtained by combining, however, are often affected more by the selection of what is presented to the combiner than by the actual combining method that is chosen. In this paper we focus on data selection and classifier training methods, in order to "prepare" classifiers for combining. We review a combining framework for classification problems that quantifies the need for reducing the correlation among individual classifiers. Then, we discuss several methods that make the classifiers in an ensemble more complementary. Experimental results are provided to illustrate the benefits and pitfalls of reducing the correlation among classifiers, especially when the training data is in limited supply.
1 Introduction
A classifier's ability to meaningfully respond to novel patterns, or generalize, is perhaps its most important property (Levin et al., 1990; Wolpert, 1990). In...
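The error correlation this paper quantifies can be illustrated with a toy computation. This is a minimal sketch, not the paper's framework: the error rates, the error-sharing mechanism, and the sample size are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def error_correlation(err_a, err_b):
    """Pearson correlation between two classifiers' 0/1 error indicators."""
    ea, eb = np.asarray(err_a, float), np.asarray(err_b, float)
    sa, sb = ea.std(), eb.std()
    if sa == 0 or sb == 0:
        return 0.0
    return float(((ea - ea.mean()) * (eb - eb.mean())).mean() / (sa * sb))

# Hypothetical error indicators: classifier B shares most of A's mistakes,
# classifier C errs independently of A.
base_err = rng.random(200) < 0.2                 # A: 20% error rate
corr_err = base_err ^ (rng.random(200) < 0.05)   # B: A's errors, 5% flipped
ind_err = rng.random(200) < 0.2                  # C: independent errors

rho_ab = error_correlation(base_err, corr_err)   # high: correlated mistakes
rho_ac = error_correlation(base_err, ind_err)    # near zero
```

A combiner gains little from adding B (its mistakes mostly duplicate A's) but can gain substantially from adding C, which is the intuition behind decorrelating ensemble members.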
An empirical comparison of pattern recognition, neural nets, and machine learning classification methods
 In Proceedings of the Eleventh International Joint Conference on Artificial Intelligence
, 1989
Abstract

Cited by 130 (2 self)
Classification methods from statistical pattern recognition, neural nets, and machine learning were applied to four real-world data sets. Each of these data sets has been previously analyzed and reported in the statistical, medical, or machine learning literature. The data sets are characterized by statistical uncertainty; there is no completely accurate solution to these problems. Training and testing or resampling techniques are used to estimate the true error rates of the classification methods. Detailed attention is given to the analysis of performance of the neural nets using back propagation. For these problems, which have relatively few hypotheses and features, the machine learning procedures for rule induction or tree induction clearly performed best.
Dimensionality Reduction Using Genetic Algorithms
, 2000
Abstract

Cited by 89 (8 self)
Pattern recognition generally requires that objects be described in terms of a set of measurable features. The selection and quality of the features representing each pattern has a considerable bearing on the success of subsequent pattern classification. Feature extraction is the process of deriving new features from the original features in order to reduce the cost of feature measurement, increase classifier efficiency, and allow higher classification accuracy. Many current feature extraction techniques involve linear transformations of the original pattern vectors to new vectors of lower dimensionality. While this is useful for data visualization and increasing classification efficiency, it does not necessarily reduce the number of features that must be measured, since each new feature may be a linear combination of all of the features in the original pattern vector. Here we present a new approach to feature extraction in which feature selection, feature extraction, and classifier training are performed simultaneously using a genetic algorithm. The genetic algorithm optimizes a vector of feature weights, which are used to scale the individual features in the original pattern vectors in either a linear or a nonlinear fashion. A masking vector is also employed to perform simultaneous selection of a subset of the features. We employ this technique in combination with the k-nearest-neighbor classification rule, and compare the results with classical feature selection and extraction techniques, including sequential floating forward feature selection, and linear discriminant analysis. We also present results for identification of favorable water binding sites on protein surfaces, an important problem in biochemistry and drug design.
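The combination of feature weighting, masking, and k-NN fitness evaluation can be sketched as a toy genetic search. Everything below is an illustrative assumption, not the paper's setup: the synthetic data generator, the population size, the mutation rates, and the simple truncation-selection scheme.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic data: 2 informative features, 4 pure-noise features.
n = 120
y = rng.integers(0, 2, n)
X = rng.normal(size=(n, 6))
X[:, 0] += 2.0 * y   # informative
X[:, 1] -= 2.0 * y   # informative

def knn_accuracy(Xw, labels, k=3):
    """Leave-one-out accuracy of k-NN on (weighted, masked) features."""
    d = np.linalg.norm(Xw[:, None, :] - Xw[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)          # exclude each point from its own vote
    nbrs = np.argsort(d, axis=1)[:, :k]
    votes = labels[nbrs].mean(axis=1) > 0.5
    return float((votes == labels).mean())

def fitness(weights, mask):
    # Linear scaling by weights, hard feature selection by 0/1 mask.
    return knn_accuracy(X * (weights * mask), y)

# Toy genetic search: keep the 4 fittest, mutate each into 2 children.
pop = [(rng.random(6), rng.integers(0, 2, 6)) for _ in range(12)]
for _ in range(15):
    pop.sort(key=lambda wm: -fitness(*wm))
    parents = pop[:4]
    children = []
    for w, m in parents:
        w2 = np.clip(w + rng.normal(0, 0.2, 6), 0, 1)  # perturb weights
        m2 = m ^ (rng.random(6) < 0.15)                # flip mask bits
        children.append((w2, m2))
        children.append((w.copy(), m.copy()))
    pop = parents + children
best_w, best_m = max(pop, key=lambda wm: fitness(*wm))
```

On this synthetic problem the search tends to keep the two informative features weighted highly; a faithful implementation would add crossover and the nonlinear scaling option the abstract mentions.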
Linear and Order Statistics Combiners for Pattern Classification
 Combining Artificial Neural Nets
, 1999
Abstract

Cited by 69 (7 self)
Several researchers have experimentally shown that substantial improvements can be obtained in difficult pattern recognition problems by combining or integrating the outputs of multiple classifiers. This chapter provides an analytical framework to quantify the improvements in classification results due to combining. The results apply to both linear combiners and order statistics combiners. We first show that, to a first order approximation, the error rate obtained over and above the Bayes error rate is directly proportional to the variance of the actual decision boundaries around the Bayes optimum boundary. Combining classifiers in output space reduces this variance, and hence reduces the "added" error. If N unbiased classifiers are combined by simple averaging, the added error rate can be reduced by a factor of N if the individual errors in approximating the decision boundaries are uncorrelated. Expressions are then derived for linear combiners which are biased or correlated, and the effect of output correlations on ensemble performance is quantified. For order statistics based nonlinear combiners, we derive expressions that indicate how much the median, the maximum, and in general the i-th order statistic can improve classifier performance. The analysis presented here facilitates the understanding of the relationships among error rates, classifier boundary distributions, and combining in output space. Experimental results on several public domain data sets are provided to illustrate the benefits of combining and to support the analytical results.
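The factor-of-N reduction for N unbiased, uncorrelated classifiers can be checked numerically. The boundary-estimate model below (true boundary plus unit-variance Gaussian noise) is a stand-in assumption for the sketch, not the chapter's derivation.

```python
import numpy as np

rng = np.random.default_rng(2)

true_boundary = 0.0
N, trials = 8, 20000

# One classifier: boundary estimate = truth + unbiased noise.
single = true_boundary + rng.normal(0, 1.0, size=trials)

# Ensemble: simple average of N classifiers with uncorrelated noise.
ensemble = (true_boundary + rng.normal(0, 1.0, size=(trials, N))).mean(axis=1)

var_single = single.var()
var_ensemble = ensemble.var()
reduction = var_single / var_ensemble   # ~ N when errors are uncorrelated
```

With correlated noise (e.g. a shared component across the N columns) the reduction falls below N, which is the effect the chapter quantifies.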
Cross-validation and the bootstrap: estimating the error rate of the predicting rule
, 1995
Small Sample Statistics for Classification Error Rates I: Error Rate Measurements
 Dept. of Inf. and Comp. Sci
, 1996
Abstract

Cited by 31 (1 self)
Several methods (independent subsamples, leave-one-out, cross-validation, and bootstrapping) have been proposed for estimating the error rates of classifiers. The rationale behind the various estimators and the causes of the sometimes conflicting claims regarding their bias and precision are explored in this paper. The biases and variances of each of the estimators are examined empirically. Cross-validation, 10-fold or greater, seems to be the best approach; the other methods are biased, have poorer precision, or are inconsistent. Though unbiased for linear discriminant classifiers, the 632b bootstrap estimator is biased for nearest neighbors classifiers, more so for single nearest neighbor than for three nearest neighbors. The 632b estimator is also biased for CART-style decision trees. Weiss' loo* estimator is unbiased and has better precision than cross-validation for discriminant and nearest neighbors classifiers, but its lack of bias and improved precision for those classifiers do...
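A minimal sketch of the k-fold estimator the paper favors, applied to a 1-nearest-neighbor classifier. The synthetic two-Gaussian data and the fold count are illustrative assumptions, not the paper's experimental setup.

```python
import numpy as np

rng = np.random.default_rng(3)

def kfold_error(X, y, k_folds=10):
    """k-fold cross-validated error rate of a 1-nearest-neighbor classifier."""
    n = len(y)
    idx = rng.permutation(n)
    fold_errors = []
    for fold in np.array_split(idx, k_folds):
        test = np.zeros(n, bool)
        test[fold] = True
        Xtr, ytr, Xte, yte = X[~test], y[~test], X[test], y[test]
        # Predict each test point with the label of its nearest training point.
        d = np.linalg.norm(Xte[:, None, :] - Xtr[None, :, :], axis=2)
        pred = ytr[d.argmin(axis=1)]
        fold_errors.append((pred != yte).mean())
    return float(np.mean(fold_errors))

# Synthetic two-class problem with overlapping Gaussians (nonzero Bayes error).
n = 200
y = rng.integers(0, 2, n)
X = rng.normal(size=(n, 2)) + 1.5 * y[:, None]
err = kfold_error(X, y)
```

Each point is tested exactly once and trained on in the other nine folds, which is what keeps the 10-fold estimate nearly unbiased for the error rate at sample size ~0.9n.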
Classifier Combining: Analytical Results and Implications
 In Proceedings of the AAAI-96 Workshop on Integrating Multiple Learned Models for Improving and Scaling Machine Learning Algorithms
, 1995
Abstract

Cited by 18 (0 self)
Several researchers have experimentally shown that substantial improvements can be obtained in difficult pattern recognition problems by combining or integrating the outputs of multiple classifiers. This paper summarizes our recent theoretical results that quantify the improvements due to multiple classifier combining. Furthermore, we present an extension of this theory that leads to an estimate of the Bayes error rate. Practical aspects such as expressing the confidences in decisions and determining the best data partition/classifier selection are also discussed.
Keywords: Linear combining, order statistics combining, Bayes error, error correlation, error reduction, ensemble networks, performance limits.
Introduction
Given infinite training data, consistent classifiers approximate the Bayesian decision boundaries to arbitrary precision, therefore providing similar generalizations (Geman, Bienenstock, & Doursat 1992). However, often only a limited portion of the pattern space is avai...
Unbiased Estimation of Ellipses by Bootstrapping
 IEEE PAMI
, 1996
Abstract

Cited by 17 (2 self)
A general method for eliminating the bias of nonlinear estimators using bootstrap is presented. Instead of the traditional mean bias we consider the definition of bias based on the median. The method is applied to the problem of fitting ellipse segments to noisy data. No assumption beyond being independent identically distributed (i.i.d.) is made about the error distribution, and experiments with both synthetic and real data prove the effectiveness of the technique.
Index terms: implicit models, curve fitting, bootstrap, low-level processing.
1 Conic Fitting
Image formation is a perspective projection of the 3D visual environment. Features extracted from a 2D image can be useful only if they preserve some of the geometric properties of the 3D object they correspond to. Collinearity and conicity are such properties, and therefore line and conic segments are widely used as geometric primitives in computer vision. Let f(u; θ) = 0 be the implicit model of a geometric primitive in the ima...
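The median-based correction can be sketched in a few lines. The biased estimator here (maximum-likelihood variance, which divides by n) is a stand-in for the nonlinear ellipse-fitting estimator in the paper; the sample size and bootstrap count are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(4)

def median_bias_correct(data, estimator, B=2000):
    """Bootstrap median-bias correction: shift the estimate by the median
    deviation of bootstrap replicates from the original estimate."""
    theta = estimator(data)
    boots = np.array([estimator(rng.choice(data, size=len(data)))
                      for _ in range(B)])
    bias = np.median(boots) - theta   # median bias, not mean bias
    return theta - bias

# Toy biased estimator: ML variance underestimates the true variance.
data = rng.normal(0, 2.0, size=30)    # true variance = 4
raw = float(np.var(data))             # biased low on average
corrected = float(median_bias_correct(data, np.var))
```

Using the median of the bootstrap replicates, rather than their mean, makes the correction robust to the skewed replicate distributions that nonlinear estimators typically produce.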
Learning pattern classification  A survey
 IEEE TRANS. INFORM. THEORY
, 1998
Abstract

Cited by 16 (4 self)
Classical and recent results in statistical pattern recognition and learning theory are reviewed in a two-class pattern classification setting. This basic model best illustrates intuition and analysis techniques while still containing the essential features and serving as a prototype for many applications. Topics discussed include nearest neighbor, kernel, and histogram methods, Vapnik–Chervonenkis theory, and neural networks. The presentation and the large (though nonexhaustive) list of references are geared to provide a useful overview of this field for both specialists and nonspecialists.
Bias and Variance of Validation Methods for Function Approximation Neural Networks Under Conditions of Sparse Data
 IEEE Transactions on Systems, Man, and Cybernetics, Part C
, 1998
Abstract

Cited by 15 (8 self)
Neural networks must be constructed and validated with strong empirical dependence, which is difficult under conditions of sparse data. This paper examines the most common methods of neural network validation, along with several general validation methods from the statistical resampling literature, as applied to function approximation networks with small sample sizes. It is shown that the increase in computation required by the statistical resampling methods produces networks that perform better than those constructed in the traditional manner. The statistical resampling methods also result in lower variance of validation; however, some of the methods are biased in estimating network error.
1. INTRODUCTION
To be beneficial, system models must be validated to assure the users that the model emulates the actual system in the desired manner. This is especially true of empirical models, such as neural network and statistical models, which rely primarily on observed data rather th...
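The paper's claim that resampling lowers the variance of the validation estimate can be demonstrated with a toy stand-in: a mean predictor replaces the function approximation network, and the sample sizes and repetition counts are arbitrary assumptions made for the sketch.

```python
import numpy as np

rng = np.random.default_rng(5)

def fit_mean_predict(train, test):
    """Trivial stand-in 'model': predict the training mean; report test MSE."""
    return float(np.mean((test - train.mean()) ** 2))

def holdout(data):
    """Single 50/50 train/test split (the traditional validation approach)."""
    half = len(data) // 2
    return fit_mean_predict(data[:half], data[half:])

def repeated_split(data, reps=20):
    """Average the error estimate over many random 50/50 splits."""
    ests = []
    for _ in range(reps):
        p = rng.permutation(data)
        half = len(p) // 2
        ests.append(fit_mean_predict(p[:half], p[half:]))
    return float(np.mean(ests))

# Variance of each validation estimate across many small (sparse) samples.
holdout_ests, resample_ests = [], []
for _ in range(400):
    data = rng.normal(size=20)       # sparse-data regime
    holdout_ests.append(holdout(data))
    resample_ests.append(repeated_split(data))

var_holdout = float(np.var(holdout_ests))
var_resample = float(np.var(resample_ests))
```

The repeated-split estimate costs 20x the computation of a single holdout but has noticeably lower variance across samples, mirroring the computation-for-precision trade-off the paper reports; its bias behavior is a separate question the paper also examines.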