Experiments with a New Boosting Algorithm
1996
Cited by 1671 (20 self)
In an earlier paper, we introduced a new “boosting” algorithm called AdaBoost which, theoretically, can be used to significantly reduce the error of any learning algorithm that consistently generates classifiers whose performance is a little better than random guessing. We also introduced the related notion of a “pseudoloss”, which is a method for forcing a learning algorithm of multilabel concepts to concentrate on the labels that are hardest to discriminate. In this paper, we describe experiments we carried out to assess how well AdaBoost, with and without pseudoloss, performs on real learning problems. We performed two sets of experiments. The first set compared boosting to Breiman’s “bagging” method when used to aggregate various classifiers (including decision trees and single attribute-value tests). We compared the performance of the two methods on a collection of machine-learning benchmarks. In the second set of experiments, we studied in more detail the performance of boosting using a nearest-neighbor classifier on an OCR problem.
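The basic boosting loop described above can be sketched as follows. This is a minimal binary version written for illustration, assuming labels in {-1, +1} and one-dimensional threshold stumps as the weak learner; the helpers `best_stump`, `stump_predict`, `adaboost` and `predict` are hypothetical names, and the sketch does not reproduce the pseudoloss variant, the multiclass algorithms, or the weak learners used in the paper's experiments.

```python
import math
from typing import List, Tuple

# A stump is (threshold, polarity): it predicts `polarity` when x > threshold, else -polarity.
Stump = Tuple[float, int]


def stump_predict(stump: Stump, x: float) -> int:
    threshold, polarity = stump
    return polarity if x > threshold else -polarity


def best_stump(xs: List[float], ys: List[int], weights: List[float]) -> Tuple[Stump, float]:
    """Exhaustively search thresholds and polarities for the lowest weighted error."""
    values = sorted(set(xs))
    thresholds = [values[0] - 1.0] + values  # the first threshold puts every point on the "> t" side
    best: Stump = (thresholds[0], 1)
    best_err = float("inf")
    for t in thresholds:
        for polarity in (1, -1):
            err = sum(w for x, y, w in zip(xs, ys, weights)
                      if stump_predict((t, polarity), x) != y)
            if err < best_err:
                best, best_err = (t, polarity), err
    return best, best_err


def adaboost(xs: List[float], ys: List[int], rounds: int) -> List[Tuple[float, Stump]]:
    n = len(xs)
    weights = [1.0 / n] * n  # start from the uniform distribution over examples
    ensemble: List[Tuple[float, Stump]] = []
    for _ in range(rounds):
        stump, err = best_stump(xs, ys, weights)
        if err == 0.0:
            return [(1.0, stump)]  # a perfect stump classifies the training set on its own
        if err >= 0.5:
            break  # the weak learner is no better than random guessing
        alpha = 0.5 * math.log((1.0 - err) / err)
        ensemble.append((alpha, stump))
        # Misclassified examples gain weight, correctly classified ones lose it.
        weights = [w * math.exp(-alpha * y * stump_predict(stump, x))
                   for x, y, w in zip(xs, ys, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def predict(ensemble: List[Tuple[float, Stump]], x: float) -> int:
    """Weighted majority vote of the boosted stumps."""
    score = sum(alpha * stump_predict(stump, x) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1
```

On xs = [1, 2, 3, 4, 5, 6] with ys = [1, 1, -1, -1, 1, 1], no single stump is error-free, but three boosting rounds combine stumps that classify all six points correctly.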
Improved Boosting Algorithms Using Confidence-rated Predictions
Machine Learning, 1999
Cited by 705 (26 self)
We describe several improvements to Freund and Schapire’s AdaBoost boosting algorithm, particularly in a setting in which hypotheses may assign confidences to each of their predictions. We give a simplified analysis of AdaBoost in this setting, and we show how this analysis can be used to find improved parameter settings as well as a refined criterion for training weak hypotheses. We give a specific method for assigning confidences to the predictions of decision trees, a method closely related to one used by Quinlan. This method also suggests a technique for growing decision trees which turns out to be identical to one proposed by Kearns and Mansour. We focus next on how to apply the new boosting algorithms to multiclass classification problems, particularly to the multilabel case in which each example may belong to more than one class. We give two boosting methods for this problem, plus a third method based on output coding. One of these leads to a new method for handling the single-label case which is simpler than, but as effective as, techniques suggested by Freund and Schapire. Finally, we give some experimental results comparing a few of the algorithms discussed in this paper.
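For weak hypotheses that partition the input into blocks (such as the leaves of a decision tree), the confidence assignment can be sketched as below. We assume the formulas as we understand them from this line of analysis: block j predicts c_j = 0.5 * ln((W+_j + eps) / (W-_j + eps)), where W+_j and W-_j are the total weights of positive and negative examples in block j, and the weak learner prefers partitions with a small normaliser Z = 2 * sum_j sqrt(W+_j * W-_j). The smoothing constant `eps` and the function names are illustrative assumptions.

```python
import math
from collections import defaultdict
from typing import Dict, Hashable, Sequence, Tuple


def block_weights(blocks: Sequence[Hashable], labels: Sequence[int],
                  weights: Sequence[float]) -> Tuple[Dict[Hashable, float], Dict[Hashable, float]]:
    """Total weight of positive (+1) and negative (-1) examples in each block."""
    w_pos: Dict[Hashable, float] = defaultdict(float)
    w_neg: Dict[Hashable, float] = defaultdict(float)
    for block, y, w in zip(blocks, labels, weights):
        if y == 1:
            w_pos[block] += w
        else:
            w_neg[block] += w
    return w_pos, w_neg


def confidence_rated_stump(blocks: Sequence[Hashable], labels: Sequence[int],
                           weights: Sequence[float], eps: float) -> Tuple[Dict[Hashable, float], float]:
    """Return per-block real-valued predictions and the normaliser Z for this partition."""
    w_pos, w_neg = block_weights(blocks, labels, weights)
    keys = set(w_pos) | set(w_neg)
    # Smoothed half log-odds; eps keeps the prediction finite when a block is pure.
    predictions = {j: 0.5 * math.log((w_pos[j] + eps) / (w_neg[j] + eps)) for j in keys}
    # Unsmoothed Z; a smaller Z means a larger drop in the bound on training error.
    z = 2.0 * sum(math.sqrt(w_pos[j] * w_neg[j]) for j in keys)
    return predictions, z
```

The sign of each block's prediction gives the label and its magnitude the confidence; a pure block still gets a large but finite value because of `eps`.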
Reducing Multiclass to Binary: A Unifying Approach for Margin Classifiers
Journal of Machine Learning Research, 2000
Cited by 429 (20 self)
We present a unifying framework for studying the solution of multiclass categorization problems by reducing them to multiple binary problems that are then solved using a margin-based binary learning algorithm. The proposed framework unifies some of the most popular approaches in which each class is compared against all others, or in which all pairs of classes are compared to each other, or in which output codes with error-correcting properties are used. We propose a general method for combining the classifiers generated on the binary problems, and we prove a general empirical multiclass loss bound given the empirical loss of the individual binary learning algorithms. The scheme and the corresponding bounds apply to many popular classification learning algorithms including support-vector machines, AdaBoost, regression, logistic regression and decision-tree algorithms. We also give a multiclass generalization error analysis for general output codes with AdaBoost as the binary learner. Experimental results with SVM and AdaBoost show that our scheme provides a viable alternative to the most commonly used multiclass algorithms.
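The combination step can be illustrated with two decoding rules for a code matrix M with entries in {-1, 0, +1}: Hamming decoding, which counts sign disagreements between a class's code row and the binary outputs f_s(x), and loss-based decoding, which sums the binary learner's loss on each margin M(r, s) * f_s(x). The sketch assumes exponential loss (the one AdaBoost minimises) and a one-against-all code; the matrix, scores and function names are made up for illustration.

```python
import math
from typing import Callable, Sequence

# Rows are classes, columns are binary problems; entries are in {-1, 0, +1}.
# A 0 entry means the class is ignored by that binary learner.
CodeMatrix = Sequence[Sequence[int]]


def hamming_decode(code: CodeMatrix, scores: Sequence[float]) -> int:
    """Pick the row whose signs disagree with the fewest binary outputs (zeros count 1/2)."""
    def sign(z: float) -> int:
        return (z > 0) - (z < 0)
    distances = [sum((1 - sign(m * f)) / 2 for m, f in zip(row, scores)) for row in code]
    return min(range(len(code)), key=distances.__getitem__)


def loss_based_decode(code: CodeMatrix, scores: Sequence[float],
                      loss: Callable[[float], float] = lambda z: math.exp(-z)) -> int:
    """Pick the row with the smallest total loss over the margins M(r, s) * f_s(x)."""
    totals = [sum(loss(m * f) for m, f in zip(row, scores)) for row in code]
    return min(range(len(code)), key=totals.__getitem__)


# One-against-all code for three classes.
ONE_VS_ALL = [[1, -1, -1],
              [-1, 1, -1],
              [-1, -1, 1]]
```

With binary scores (0.1, 0.2, -3.0), Hamming decoding ties classes 0 and 1 and falls back to the first, while loss-based decoding uses the size of the margins and picks class 1.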
The Foundations of Cost-Sensitive Learning
In Proceedings of the Seventeenth International Joint Conference on Artificial Intelligence, 2001
Cited by 275 (5 self)
This paper revisits the problem of optimal learning and decision-making when different misclassification errors incur different penalties. We characterize precisely but intuitively when a cost matrix is reasonable, and we show how to avoid the mistake of defining a cost matrix that is economically incoherent. For the two-class case, we prove a theorem that shows how to change the proportion of negative examples in a training set in order to make optimal cost-sensitive classification decisions using a classifier learned by a standard non-cost-sensitive learning method. However, we then argue that changing the balance of negative and positive training examples has little effect on the classifiers produced by standard Bayesian and decision tree learning methods. Accordingly, the recommended way of applying one of these methods in a domain with differing misclassification costs is to learn a classifier from the training set as given, and then to compute optimal decisions ...
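The two-class decision rule and the rebalancing idea can be sketched as follows. We assume the convention that c_ij is the cost of predicting class i when the true class is j, with class 1 positive. Predicting 1 is optimal when its expected cost is no larger than that of predicting 0, which gives the threshold p* below. The resampling factor follows from the observation that multiplying the number of negatives by k divides the learned posterior odds by k; the function names are hypothetical.

```python
def optimal_threshold(c00: float, c01: float, c10: float, c11: float) -> float:
    """Smallest P(y=1 | x) at which predicting 1 costs no more in expectation than predicting 0.

    c_ij is the cost of predicting class i when the true class is j.
    """
    return (c10 - c00) / ((c10 - c00) + (c01 - c11))


def cost_sensitive_decision(p_positive: float, threshold: float) -> int:
    """Label 1 when the estimated probability of the positive class reaches the threshold."""
    return 1 if p_positive >= threshold else 0


def negative_weight_factor(p_star: float, p0: float = 0.5) -> float:
    """Factor by which to multiply the number of negative training examples so that a
    learner thresholding its probabilities at p0 behaves like one thresholding at p_star."""
    return (p_star / (1.0 - p_star)) * ((1.0 - p0) / p0)
```

With zero cost for correct decisions and a false negative four times as costly as a false positive, p* = 0.2, so an example with estimated probability 0.3 should be labelled positive; equivalently, keeping a quarter of the negatives would let a learner with a 0.5 threshold mimic that decision.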
Automatic Construction of Decision Trees from Data: A Multi-Disciplinary Survey
Data Mining and Knowledge Discovery, 1997
Cited by 153 (1 self)
Decision trees have proved to be valuable tools for the description, classification and generalization of data. Work on constructing decision trees from data exists in multiple disciplines such as statistics, pattern recognition, decision theory, signal processing, machine learning and artificial neural networks. Researchers in these disciplines, sometimes working on quite different problems, identified similar issues and heuristics for decision tree construction. This paper surveys existing work on decision tree construction, attempting to identify the important issues involved, directions the work has taken and the current state of the art.
Keywords: classification, tree-structured classifiers, data compaction
1. Introduction
Advances in data collection methods, storage and processing technology are providing a unique challenge and opportunity for automated data exploration techniques. Enormous amounts of data are being collected daily from major scientific projects, e.g., Human Genome...
Obtaining calibrated probability estimates from decision trees and naive Bayesian classifiers
In Proceedings of the Eighteenth International Conference on Machine Learning, 2001
Cited by 101 (4 self)
Accurate, well-calibrated estimates of class membership probabilities are needed in many supervised learning applications, in particular when a cost-sensitive decision must be made about examples with example-dependent costs. This paper presents simple but successful methods for obtaining calibrated probability estimates from decision tree and naive Bayesian classifiers. Using the large and challenging KDD'98 contest dataset as a testbed, we report the results of a detailed experimental comparison of ten methods, according to four evaluation measures. We conclude that binning succeeds in significantly improving naive Bayesian probability estimates, while for improving decision tree probability estimates, we recommend smoothing by m-estimation and a new variant of pruning that we call curtailment.
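Two of the calibration ideas named above can be sketched briefly: equal-frequency binning of classifier scores, and smoothing of decision tree leaf frequencies, assumed here to take the m-estimate form p = (k + b*m) / (n + m), where k of the n training examples in a leaf are positive and b is the base rate of positives. The number of bins and the value of m are left to the caller; the paper's exact settings are not reproduced, and the function names are hypothetical.

```python
from bisect import bisect_left
from typing import List, Sequence, Tuple


def fit_binning(scores: Sequence[float], labels: Sequence[int],
                n_bins: int) -> Tuple[List[float], List[float]]:
    """Equal-frequency binning: sort by score, split into n_bins groups of (nearly)
    equal size, and estimate P(y=1) in each group by its fraction of positives."""
    if not 0 < n_bins <= len(scores):
        raise ValueError("need 1 <= n_bins <= number of examples")
    pairs = sorted(zip(scores, labels))
    n = len(pairs)
    upper_edges, rates = [], []
    for b in range(n_bins):
        chunk = pairs[b * n // n_bins:(b + 1) * n // n_bins]
        rates.append(sum(y for _, y in chunk) / len(chunk))
        upper_edges.append(chunk[-1][0])  # largest training score in this bin
    return upper_edges, rates


def calibrate(score: float, upper_edges: List[float], rates: List[float]) -> float:
    """Map a raw score to the positive rate of the bin it falls into."""
    i = bisect_left(upper_edges, score)
    return rates[min(i, len(rates) - 1)]  # scores above every edge go to the top bin


def m_estimate(positives: int, total: int, base_rate: float, m: float) -> float:
    """Smoothed leaf probability (k + b*m) / (n + m); small leaves are pulled towards b."""
    return (positives + base_rate * m) / (total + m)
```

For example, a leaf with no positives among two training examples gets an estimate of about 0.05 rather than 0 when b = 0.05 and m = 200, whereas a large leaf is barely affected.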
Learning Decision Trees Using the Area Under the ROC Curve
Proceedings of the 19th International Conference on Machine Learning, 2002
Cited by 93 (16 self)
ROC analysis is increasingly being recognised as an important tool for evaluation and comparison of classifiers when the operating characteristics (i.e. class distribution and cost parameters) are not known at training time. Usually, each classifier is characterised by its estimated true and false positive rates and is represented by a single point in the ROC diagram. In this paper, we show how a single decision tree can represent a set of classifiers by choosing different labellings of its leaves, or equivalently, an ordering on the leaves. In this setting, rather than estimating the accuracy of a single tree, it makes more sense to use the area under the ROC curve (AUC) as a quality metric. We also propose a novel splitting criterion which chooses the split with the highest local AUC. To the best of our knowledge, this is the first probabilistic splitting criterion that is not based on weighted average impurity. We present experiments suggesting that the AUC splitting criterion leads to trees with equal or better AUC value, without sacrificing accuracy if a single labelling is chosen.
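The leaf-ordering idea can be made concrete. The sketch below ranks leaves by the fraction of positive training examples they contain, computes the AUC of that ranking from per-leaf counts, and scores a candidate split by the AUC of its children alone, which is how we read "local AUC". This reading, the tie handling, and the function names are our assumptions, not the paper's code.

```python
from typing import List, Tuple

Leaf = Tuple[int, int]  # (positives, negatives) among the training examples reaching the leaf


def leaf_ordering_auc(leaves: List[Leaf]) -> float:
    """AUC of the ranking obtained by ordering leaves by decreasing positive rate.

    A negative in a leaf is outranked by every positive in earlier leaves and
    ties with the positives in its own leaf (a tie counts as half).
    """
    leaves = [leaf for leaf in leaves if leaf[0] + leaf[1] > 0]
    total_pos = sum(p for p, _ in leaves)
    total_neg = sum(n for _, n in leaves)
    if total_pos == 0 or total_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative example")
    ordered = sorted(leaves, key=lambda leaf: leaf[0] / (leaf[0] + leaf[1]), reverse=True)
    correctly_ranked = 0.0
    positives_above = 0
    for p, n in ordered:
        correctly_ranked += n * (positives_above + p / 2)
        positives_above += p
    return correctly_ranked / (total_pos * total_neg)


def best_split_by_auc(candidate_splits: List[List[Leaf]]) -> int:
    """Index of the candidate split whose children give the highest local AUC."""
    return max(range(len(candidate_splits)),
               key=lambda i: leaf_ordering_auc(candidate_splits[i]))
```

For children with (3, 1) and (1, 3) positives and negatives, the ordering AUC is 0.75; a pure split into (4, 0) and (0, 4) reaches 1.0 and would be preferred.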
Extracting Comprehensible Models from Trained Neural Networks
1996
Cited by 69 (3 self)
To Mom, Dad, and Susan, for their support and encouragement.
The geometry of ROC space: understanding machine learning metrics through ROC isometrics
In Proceedings of the Twentieth International Conference on Machine Learning, 2003
Cited by 64 (7 self)
Many different metrics are used in machine learning and data mining to build and evaluate models. However, there is no general theory of machine learning metrics that could answer questions such as: When we simultaneously want to optimise two criteria, how can or should they be traded off? Some metrics are inherently independent of class and misclassification cost distributions, while others are not; can this be made more precise? This paper provides a derivation of ROC space from first principles through 3D ROC space and the skew ratio, and redefines metrics in these dimensions. The paper demonstrates that the graphical depiction of machine learning metrics by means of ROC isometrics gives many useful insights into the characteristics of these metrics, and provides a foundation on which a theory of machine learning metrics can be built.
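To illustrate ROC isometrics, the sketch below writes accuracy and precision as functions of an ROC point (fpr, tpr) and a skew c, which we take to be the ratio of negatives to positives; the paper's skew ratio may also fold in misclassification costs, which this sketch ignores. Accuracy is constant along lines of slope c in ROC space, while precision is constant along lines through the origin. The function names are hypothetical.

```python
def accuracy(tpr: float, fpr: float, c: float) -> float:
    """Accuracy at ROC point (fpr, tpr) when there are c negatives per positive."""
    return (tpr + c * (1.0 - fpr)) / (1.0 + c)


def precision(tpr: float, fpr: float, c: float) -> float:
    """Precision at ROC point (fpr, tpr) when there are c negatives per positive."""
    return tpr / (tpr + c * fpr)


def accuracy_isometric_tpr(acc: float, fpr: float, c: float) -> float:
    """The tpr that keeps accuracy equal to `acc` at a given fpr: a line of slope c."""
    return (1.0 + c) * acc - c * (1.0 - fpr)
```

With c = 2, the points (fpr, tpr) = (0.1, 0.5) and (0.2, 0.7) lie on one accuracy isometric, and (0.1, 0.3) and (0.2, 0.6) lie on one precision isometric with precision 0.6.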
Exploiting the cost (in)sensitivity of decision tree splitting criteria
In Proceedings of the Seventeenth International Conference on Machine Learning, 2000
Cited by 49 (4 self)
This paper investigates how the splitting criteria and pruning methods of decision tree learning algorithms are influenced by misclassification costs or changes to the class distribution. Splitting criteria that are relatively insensitive to costs (class distributions) are found to perform as well as or better than, in terms of expected misclassification cost, splitting criteria that are cost-sensitive. Consequently there are two opposite ways of dealing with imbalance. One is to combine a cost-insensitive splitting criterion with a cost-insensitive pruning method to produce a decision tree algorithm little affected by cost or prior class distribution. The other is to grow a cost-independent tree which is then pruned in a cost-sensitive manner.
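What it means for a splitting criterion to be insensitive to costs or class distribution can be illustrated numerically. Under the square-root impurity 2 * sqrt(p * (1 - p)), commonly attributed to Dietterich, Kearns and Mansour and called DKM, reweighting every positive example by w scales each node's count-weighted impurity by the same factor sqrt(w), so the relative gain of a split does not change; Gini impurity has no such property. We take DKM as a representative insensitive criterion for this sketch; the counts and function names are made up for illustration.

```python
import math
from typing import Callable, List, Tuple

Node = Tuple[float, float]  # (positive weight, negative weight) reaching a node


def dkm(pos: float, neg: float) -> float:
    """Count-weighted DKM impurity: n * 2*sqrt(p*(1-p)) simplifies to 2*sqrt(pos*neg)."""
    return 2.0 * math.sqrt(pos * neg)


def gini(pos: float, neg: float) -> float:
    """Count-weighted Gini impurity: n * 2*p*(1-p) simplifies to 2*pos*neg/n."""
    n = pos + neg
    return 0.0 if n == 0 else 2.0 * pos * neg / n


def relative_gain(impurity: Callable[[float, float], float], parent: Node,
                  children: List[Node], pos_weight: float = 1.0) -> float:
    """Fraction of the parent's impurity removed by a split after every positive
    example's weight is multiplied by pos_weight (a shift in class distribution or cost)."""
    def scaled(node: Node) -> float:
        return impurity(node[0] * pos_weight, node[1])
    before = scaled(parent)
    after = sum(scaled(child) for child in children)
    return (before - after) / before
```

For a parent with 10 positives and 10 negatives split into (8, 2) and (2, 8), the DKM gain stays at 0.2 when positives are weighted five times more heavily, while the Gini gain falls from 0.36 to about 0.24.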