Results 1-10 of 28
On bias, variance, 0/1-loss, and the curse-of-dimensionality
 Data Mining and Knowledge Discovery
, 1997
"... Abstract. The classification problem is considered in which an output variable y assumes discrete values with respective probabilities that depend upon the simultaneous values of a set of input variables x ={x1,...,xn}.At issue is how error in the estimates of these probabilities affects classificat ..."
Abstract

Cited by 193 (1 self)
 Add to MetaCart
Abstract. The classification problem is considered in which an output variable y assumes discrete values with respective probabilities that depend upon the simultaneous values of a set of input variables x = {x1, ..., xn}. At issue is how error in the estimates of these probabilities affects classification error when the estimates are used in a classification rule. These effects are seen to be somewhat counterintuitive in both their strength and nature. In particular, the bias and variance components of the estimation error combine to influence classification in a very different way than with squared error on the probabilities themselves. Certain types of (very high) bias can be canceled by low variance to produce accurate classification. This can dramatically mitigate the effect of the bias associated with some simple estimators like “naive” Bayes, and the bias induced by the curse-of-dimensionality on nearest-neighbor procedures. This helps explain why such simple methods are often competitive with, and sometimes superior to, more sophisticated ones for classification, and why “bagging/aggregating” classifiers can often improve accuracy. These results also suggest simple modifications to these procedures that can (sometimes dramatically) further improve their classification performance.
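The paper's central point, that large bias in the probability estimates can be harmless for 0/1 loss as long as the estimates stay on the correct side of the decision boundary, can be illustrated with a small simulation. The distributions and parameters below are illustrative choices, not taken from the paper:

```python
import random

random.seed(0)
p_true = 0.9  # true P(y=1|x); the Bayes rule predicts class 1

def classification_error(estimates, true_class=1):
    # fraction of probability estimates that put the decision
    # on the wrong side of the 1/2 threshold
    return sum(1 for p in estimates
               if (p > 0.5) != (true_class == 1)) / len(estimates)

# Estimator A: unbiased but high variance around p_true
est_a = [random.gauss(p_true, 0.45) for _ in range(10000)]
# Estimator B: heavily biased (centred at 0.6) but low variance
est_b = [random.gauss(0.6, 0.02) for _ in range(10000)]

err_a = classification_error(est_a)
err_b = classification_error(est_b)
# B's bias of 0.3 is nearly harmless: its estimates almost never cross 1/2,
# while A's unbiased but noisy estimates often do.
```

Under squared error on the probabilities, estimator B would look far worse than A; under 0/1 loss the ranking reverses, which is exactly the counterintuitive effect the abstract describes.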
Data Mining using MLC++: A Machine Learning Library in C++
 INTERNATIONAL JOURNAL ON ARTIFICIAL INTELLIGENCE TOOLS
, 1997
"... Data mining algorithmsincluding machine learning, statistical analysis, and pattern recognition techniques can greatly improve our understanding of data warehouses that are now becoming more widespread. In this paper, we focus on classification algorithms and review the need for multiple classificat ..."
Abstract

Cited by 154 (16 self)
 Add to MetaCart
Data mining algorithms, including machine learning, statistical analysis, and pattern recognition techniques, can greatly improve our understanding of data warehouses that are now becoming more widespread. In this paper, we focus on classification algorithms and review the need for multiple classification algorithms. We describe a system called MLC++, which was designed to help choose the appropriate classification algorithm for a given dataset by making it easy to compare the utility of different algorithms on a specific dataset of interest. MLC++ not only provides a workbench for such comparisons, but also provides a library of C++ classes to aid in the development of new algorithms, especially hybrid algorithms and multi-strategy algorithms. Such algorithms are generally hard to code from scratch. We discuss design issues, interfaces to other programs, and visualization of the resulting classifiers. 1 Introduction Data warehouses containing massive amounts of data have been b...
Wrappers For Performance Enhancement And Oblivious Decision Graphs
, 1995
"... In this doctoral dissertation, we study three basic problems in machine learning and two new hypothesis spaces with corresponding learning algorithms. The problems we investigate are: accuracy estimation, feature subset selection, and parameter tuning. The latter two problems are related and are stu ..."
Abstract

Cited by 107 (8 self)
 Add to MetaCart
In this doctoral dissertation, we study three basic problems in machine learning and two new hypothesis spaces with corresponding learning algorithms. The problems we investigate are: accuracy estimation, feature subset selection, and parameter tuning. The latter two problems are related and are studied under the wrapper approach. The hypothesis spaces we investigate are: decision tables with a default majority rule (DTMs) and oblivious read-once decision graphs (OODGs).
Bias, Variance and Prediction Error for Classification Rules
, 1996
"... We study the notions of bias and variance for classification rules. Following Efron (1978) we develop a decomposition of prediction error into its natural components. Then we derive bootstrap estimates of these components and illustrate how they can be used to describe the error behaviour of a class ..."
Abstract

Cited by 33 (1 self)
 Add to MetaCart
We study the notions of bias and variance for classification rules. Following Efron (1978) we develop a decomposition of prediction error into its natural components. Then we derive bootstrap estimates of these components and illustrate how they can be used to describe the error behaviour of a classifier in practice. In the process we also obtain a bootstrap estimate of the error of a "bagged" classifier. Keywords: classification, prediction error, bias, variance, bootstrap. 1 Introduction This article concerns classification rules that have been constructed from a set of training data. The training set X = (x1, x2, ..., xn) consists of n observations xi = (ti, gi), with ti being the predictor or feature vector and gi being the response, taking values in {1, 2, ..., K}. On the basis of X the statistician constructs a classification rule C(t, X). Our objective here is to unde...
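The bootstrap error estimation described above can be sketched as follows: train the rule on bootstrap replicas of the training set and evaluate each replica's classifier on the points it did not sample. The 1-D threshold classifier and toy data are hypothetical stand-ins for an arbitrary rule C(t, X):

```python
import random

random.seed(1)

# toy training set: 1-D feature t, two classes with means -1 and +1
data = [(random.gauss(-1 if g == 0 else 1, 1.0), g)
        for g in (0, 1) for _ in range(50)]

def train_threshold(sample):
    # fit a threshold rule C(t) = 1[t > cut] at the midpoint of class means
    m0 = sum(t for t, g in sample if g == 0) / max(1, sum(1 for _, g in sample if g == 0))
    m1 = sum(t for t, g in sample if g == 1) / max(1, sum(1 for _, g in sample if g == 1))
    cut = (m0 + m1) / 2
    return lambda t: int(t > cut)

def error(clf, sample):
    return sum(1 for t, g in sample if clf(t) != g) / len(sample)

B = 50
boot_errs = []
for _ in range(B):
    boot = [random.choice(data) for _ in data]   # bootstrap replica of X
    clf = train_threshold(boot)
    out = [x for x in data if x not in boot]     # points outside the replica
    if out:
        boot_errs.append(error(clf, out))
err_boot = sum(boot_errs) / len(boot_errs)       # bootstrap error estimate
```

Evaluating only on out-of-bootstrap points avoids the optimism of resubstitution error; this is the leave-one-out-bootstrap idea the paper builds its decomposition estimates on.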
Classification of seismic signals by integrating ensembles of neural networks
 in Proc. Int. Conf. Neural Inform. Process., Hong Kong
, 1996
"... Abstract—We examine a classification problem in which seismic waveforms of natural earthquakes are to be distinguished from waveforms of manmade explosions. We present an integrated classification machine (ICM), which is a hierarchy of artificial neural networks (ANN’s) that are trained to classify ..."
Abstract

Cited by 19 (0 self)
 Add to MetaCart
Abstract—We examine a classification problem in which seismic waveforms of natural earthquakes are to be distinguished from waveforms of man-made explosions. We present an integrated classification machine (ICM), which is a hierarchy of artificial neural networks (ANNs) that are trained to classify the seismic waveforms. In order to maximize the gain of combining the multiple ANNs, we suggest construction of a redundant classification environment (RCE) that consists of several “experts” whose expertise depends on the different input representations to which they are exposed. In the proposed scheme, the experts are ensembles of ANNs, trained on different bootstrap replicas. We use various network architectures, different time–frequency decompositions of the seismic waveforms, and various smoothing levels in order to achieve an RCE. A confidence measure for the ensemble's classification is defined based on the agreement (variance) within the ensembles, and an algorithm for a nonlinear integration of the ensembles using this measure is presented. An implementation on a data set of 380 seismic events is described, where the proposed ICM correctly classified 92% of the testing signals. The comparison we made with classical methods indicates that combining a collection of ensembles of ANNs can be used to handle complex high-dimensional classification problems. Index Terms: averaging, bootstrap, classification, combining estimators, ensembles.
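The agreement-based confidence measure and nonlinear integration can be sketched as follows. The inverse-variance weighting used here is an illustrative choice, not necessarily the paper's exact measure:

```python
import statistics

def ensemble_vote(member_outputs):
    # aggregate one ensemble by averaging its members (as in bagging);
    # low variance among members means high agreement, hence high confidence
    mean = statistics.mean(member_outputs)
    confidence = 1.0 / (1e-6 + statistics.variance(member_outputs))
    return mean, confidence

def integrate(ensembles):
    # signal-adaptive integration: weight each ensemble's vote
    # by its agreement-based confidence, then threshold at 1/2
    votes = [ensemble_vote(members) for members in ensembles]
    total = sum(c for _, c in votes)
    score = sum(mean * c for mean, c in votes) / total
    return 1 if score > 0.5 else 0

# ensemble A agrees strongly on class 0; ensemble B is split, so A dominates
decision = integrate([[0.10, 0.12, 0.09], [0.9, 0.3, 0.6]])
```

Because the weighting is recomputed per input signal, a confident minority of ensembles can override an uncertain majority, which is the point of integrating nonlinearly rather than simply averaging everything.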
Classifying Seismic Signals by Integrating Ensembles of Neural Networks
 Proceedings of ICONIP Hong Kong. Progress in Neural Information Processing
, 1996
"... This paper proposes a classification scheme based on integration of multiple Ensembles of ANNs. It is demonstrated on a classification problem, in which seismic signals of Natural Earthquakes must be distinguished from seismic signals of Artificial Explosions. A Redundant Classification Environment ..."
Abstract

Cited by 8 (4 self)
 Add to MetaCart
This paper proposes a classification scheme based on the integration of multiple Ensembles of ANNs. It is demonstrated on a classification problem in which seismic signals of Natural Earthquakes must be distinguished from seismic signals of Artificial Explosions. A Redundant Classification Environment consisting of several Ensembles of Neural Networks is created and trained on Bootstrap Sample Sets, using various data representations and architectures. The ANNs within the Ensembles are aggregated (as in Bagging) while the Ensembles are integrated nonlinearly, in a signal-adaptive manner, using a posterior confidence measure based on the agreement (variance) within the Ensembles. The proposed Integrated Classification Machine achieved 92.1% correct classifications on the seismic test data. Cross-Validation evaluations and comparisons indicate that such integration of a collection of ANN Ensembles is a robust way of handling high-dimensional problems with a complex nonstationary signal ...
Automatic Model Selection in Cost-sensitive Boosting
 Information Fusion
, 2003
"... This paper introduces SSTBoost, a predictive classification methodology designed to target the accuracy of a modified boosting algorithm towards required sensitivity and specificity constraints. The SSTBoost method is demonstrated in practice for the automated medical diagnosis of cancer on a set of ..."
Abstract

Cited by 7 (0 self)
 Add to MetaCart
This paper introduces SSTBoost, a predictive classification methodology designed to target the accuracy of a modified boosting algorithm towards required sensitivity and specificity constraints. The SSTBoost method is demonstrated in practice for the automated medical diagnosis of cancer on a set of skin lesions (42 melanomas and 110 naevi) described by geometric and colorimetric features. A cost-sensitive variant of the AdaBoost algorithm is combined with a procedure for the automatic selection of optimal cost parameters. Within each boosting step, different weights are considered for errors on false negatives and false positives, and differently updated for negatives and positives...
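The asymmetric reweighting idea can be sketched as a single boosting-style update step. The exponents and cost values below are hypothetical illustrations of the general mechanism, not SSTBoost's actual update rule:

```python
import math

def costsensitive_reweight(weights, labels, predictions, c_fn=2.0, c_fp=1.0):
    # one boosting-style update: errors on positives (false negatives)
    # are up-weighted more strongly than errors on negatives (false positives)
    new = []
    for w, y, p in zip(weights, labels, predictions):
        if p != y:
            cost = c_fn if y == 1 else c_fp
            new.append(w * math.exp(cost))
        else:
            new.append(w * math.exp(-1.0))
    z = sum(new)                      # renormalize to a distribution
    return [w / z for w in new]

# labels [1,1,0,0], predictions [0,1,1,0]: one false negative, one false positive
w = costsensitive_reweight([0.25] * 4, [1, 1, 0, 0], [0, 1, 1, 0])
# the missed positive (index 0) now carries more weight than the
# false positive (index 2), steering the next weak learner toward sensitivity
```

Tuning the ratio c_fn/c_fp shifts the final classifier along the sensitivity/specificity trade-off, which is the quantity the paper's automatic cost-parameter selection searches over.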
Selection of Tree-based Classifiers with the Bootstrap 632+ Rule
 Biometrical Journal
, 1997
"... This paper introduces a novel model selection procedure for treebased classifiers. The method is based on the bootstrap 632+ rule recently proposed by Efron and Tibishirani. The rule allows selecting compact, nonoverfitting classification trees by reweighting the contributions of the resubstitutio ..."
Abstract

Cited by 5 (4 self)
 Add to MetaCart
This paper introduces a novel model selection procedure for tree-based classifiers. The method is based on the bootstrap 632+ rule recently proposed by Efron and Tibshirani. The rule allows selecting compact, non-overfitting classification trees by reweighting the contributions of the resubstitution and standard bootstrap estimated errors. The proposed method is applied in a medical entomology problem for modeling the risk of parasite presence. Keywords: bootstrap 632+, model selection, classification and regression trees. 1 Introduction
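For reference, the 632+ rule blends the optimistic resubstitution error with the pessimistic leave-one-out bootstrap error, with a weight that grows with the amount of overfitting. A minimal sketch (omitting Efron and Tibshirani's cap on the bootstrap error term):

```python
def err_632plus(err_resub, err_boot, gamma):
    # err_resub: resubstitution (training) error
    # err_boot:  leave-one-out bootstrap error
    # gamma:     no-information error rate
    # r is the relative overfitting rate, clamped to [0, 1]
    r = (err_boot - err_resub) / (gamma - err_resub) if gamma > err_resub else 0.0
    r = min(max(r, 0.0), 1.0)
    w = 0.632 / (1.0 - 0.368 * r)
    return (1.0 - w) * err_resub + w * err_boot

# an overfit tree: tiny training error, large bootstrap error
e = err_632plus(err_resub=0.05, err_boot=0.30, gamma=0.5)
```

With no overfitting (r = 0) the weight reduces to the plain 0.632 estimate; as overfitting grows, the estimate leans further on the bootstrap error, which is what penalizes the larger trees during selection.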
ESTIMATED ACCURACY OF CLASSIFICATION OF DEFECTS DETECTED IN WELDED JOINTS BY RADIOGRAPHIC TESTS
"... Abstract: This work is a study to estimate the accuracy of classification of the main classes of weld defects detected by radiography test, such as: undercut, lack of penetration, porosity, slag inclusion, crack or lack of fusion. To carry out this work nonlinear pattern classifiers were developed, ..."
Abstract

Cited by 3 (0 self)
 Add to MetaCart
Abstract: This work is a study to estimate the accuracy of classification of the main classes of weld defects detected by radiographic testing, such as: undercut, lack of penetration, porosity, slag inclusion, crack or lack of fusion. To carry out this work, nonlinear pattern classifiers were developed using neural networks, and the largest possible number of radiographic patterns was used, together with statistical inference techniques of random selection of samples with and without replacement (bootstrap), in order to estimate the accuracy of the classification. The results pointed to an estimated accuracy of around 80% for the classes of defects analyzed. Introduction: The nondestructive radiographic method of inspection has been widely used over the decades to evaluate the integrity of material and equipment in a wide range of industries. In the specific case of radiographs of welded materials, research on the development of an automatic or semiautomatic system for the analysis of radiographs of welded joints has grown considerably in recent years, especially in the last 10 to 15 years [1-10]. The latest
On CrossValidation and Stacking: Building seemingly predictive models on random data
"... A number of times when using crossvalidation (CV) while trying to do classification/probability estimation we have observed surprisingly low AUC’s on real data with very few positive examples. AUC is the area under the ROC and measures the ranking ability and corresponds to the probability that a p ..."
Abstract

Cited by 3 (0 self)
 Add to MetaCart
A number of times when using cross-validation (CV) for classification/probability estimation, we have observed surprisingly low AUCs on real data with very few positive examples. AUC is the area under the ROC curve; it measures ranking ability and corresponds to the probability that a positive example receives a higher model score than a negative example. Intuition suggests that no reasonable methodology should ever result in a model with an AUC significantly below 0.5. The focus of this paper is not on the estimator properties of CV (bias/variance/significance), but rather on the properties of the ‘holdout’ predictions from which the CV performance of a model is calculated. We show that CV creates predictions that have an ‘inverse’ ranking, with AUC well below 0.25, using features that were initially entirely unpredictive and models that can only perform monotonic transformations. In the extreme, combining CV with bagging (repeated averaging of out-of-sample predictions) generates ‘holdout’ predictions with perfectly opposite rankings on random data. While this would raise immediate suspicion upon inspection, we would like to caution the data mining community against using CV for stacking or in currently popular ensemble methods. They can reverse the predictions by assigning negative weights and ultimately produce a model that appears to have close to perfect predictability while in reality the data was random.
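The mechanism behind the inverse ranking can be demonstrated with no features at all: under cross-validation, holding out a positive example removes one positive from the training fold, so a model that can only output the training-fold class prior scores every positive strictly lower than every negative. A minimal sketch of that effect:

```python
import random

random.seed(2)

def auc(scores, labels):
    # P(score of a random positive > score of a random negative); ties count 1/2
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# purely random labels; the "model" just predicts the training-fold prior
labels = [random.randint(0, 1) for _ in range(40)]
scores = []
for i in range(len(labels)):                  # leave-one-out CV
    train = labels[:i] + labels[i + 1:]
    scores.append(sum(train) / len(train))    # class prior of the training fold

# every positive scores (P-1)/39, every negative P/39, so the
# holdout predictions rank the classes perfectly backwards
a = auc(scores, labels)
```

Here the holdout AUC is exactly 0 on random data, which is the pathology the paper warns about when such predictions are fed into stacking or other ensemble combiners.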