Results 1–10 of 42
ROC graphs: Notes and practical considerations for data mining researchers
2003
"... Receiver Operating Characteristics (ROC) graphs are a useful technique for organizing classifiers and visualizing their performance. ROC graphs are commonly used in medical decision making, and in recent years have been increasingly adopted in the machine learning and data mining research communitie ..."
Abstract

Cited by 169 (0 self)
 Add to MetaCart
(Show Context)
Receiver Operating Characteristics (ROC) graphs are a useful technique for organizing classifiers and visualizing their performance. ROC graphs are commonly used in medical decision making, and in recent years have been increasingly adopted in the machine learning and data mining research communities. Although ROC graphs are apparently simple, there are some common misconceptions and pitfalls when using them in practice. This article serves both as a tutorial introduction to ROC graphs and as a practical guide for using them in research.
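
As a rough illustration of the construction this tutorial covers, here is a minimal Python sketch of tracing an ROC curve from classifier scores. The function name and structure are ours, not the paper's:

    def roc_points(scores, labels):
        """Trace the ROC curve of a scoring classifier.

        scores: higher score = more confidence in the positive class.
        labels: 1 for positive, 0 for negative.
        Returns (false positive rate, true positive rate) points.
        """
        pairs = sorted(zip(scores, labels), key=lambda p: -p[0])
        n_pos = sum(labels)
        n_neg = len(labels) - n_pos
        points, tp, fp, prev = [], 0, 0, None
        for s, y in pairs:
            if s != prev:  # emit a point only when the threshold changes
                points.append((fp / n_neg, tp / n_pos))
                prev = s
            tp += y == 1
            fp += y == 0
        points.append((1.0, 1.0))
        return points

Emitting a point only when the score value changes is what keeps tied scores from producing misleading intermediate points, one of the practical pitfalls a tutorial like this addresses.
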
Obtaining calibrated probability estimates from decision trees and naive Bayesian classifiers
In Proceedings of the Eighteenth International Conference on Machine Learning, 2001
"... Accurate, wellcalibrated estimates of class membership probabilities are needed in many supervised learning applications, in particular when a costsensitive decision must be made about examples with exampledependent costs. This paper presents simple but successful methods for obtaining calibrated ..."
Abstract

Cited by 106 (4 self)
 Add to MetaCart
Accurate, well-calibrated estimates of class membership probabilities are needed in many supervised learning applications, in particular when a cost-sensitive decision must be made about examples with example-dependent costs. This paper presents simple but successful methods for obtaining calibrated probability estimates from decision tree and naive Bayesian classifiers. Using the large and challenging KDD'98 contest dataset as a testbed, we report the results of a detailed experimental comparison of ten methods, according to four evaluation measures. We conclude that binning succeeds in significantly improving naive Bayesian probability estimates, while for improving decision tree probability estimates, we recommend smoothing by m-estimation and a new variant of pruning that we call curtailment.
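
A hedged sketch of the binning approach the authors find effective for naive Bayes: sort training examples by raw score, cut them into equal-frequency bins, and use each bin's observed fraction of positives as the calibrated probability. The bin count and tie handling below are our assumptions, not the paper's exact settings:

    import bisect

    def fit_binning(scores, labels, n_bins=10):
        """Histogram binning: map raw classifier scores to empirical
        positive rates. n_bins=10 is an illustrative choice."""
        pairs = sorted(zip(scores, labels))
        size = max(1, len(pairs) // n_bins)
        boundaries, probs = [], []
        for i in range(0, len(pairs), size):
            chunk = pairs[i:i + size]
            boundaries.append(chunk[-1][0])  # top score in this bin
            probs.append(sum(y for _, y in chunk) / len(chunk))
        return boundaries, probs

    def calibrate(score, boundaries, probs):
        """Return the calibrated probability for a new raw score."""
        i = min(bisect.bisect_left(boundaries, score), len(probs) - 1)
        return probs[i]
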
3D-Jury: A simple approach to improve protein structure predictions
 Bioinformatics
"... Motivation: Consensus structure prediction methods (metapredictors) have higher accuracy than individual structure prediction algorithms (their components). The goal for the development of the 3DJury system is to create a simple but powerful procedure for generating metapredictions using variable ..."
Abstract

Cited by 99 (13 self)
 Add to MetaCart
(Show Context)
Motivation: Consensus structure prediction methods (meta-predictors) have higher accuracy than individual structure prediction algorithms (their components). The goal for the development of the 3D-Jury system is to create a simple but powerful procedure for generating meta-predictions using variable sets of models obtained from diverse sources. The resulting protocol should help to improve the quality of structural annotations of novel proteins. Results: The 3D-Jury system generates meta-predictions from sets of models created using variable methods. It is not necessary to know prior characteristics of the methods. The system is able to immediately utilize new components (additional prediction providers). The accuracy of the system is comparable with other well-tuned prediction servers. The algorithm resembles methods of selecting models generated using ab initio folding simulations. It is simple and offers a portable solution to improve the accuracy of other protein structure prediction protocols. Availability: The 3D-Jury system is available via the Structure Prediction Meta Server.
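
The consensus idea can be sketched in a few lines: score each candidate model by its summed structural similarity to all other models in the pool, so that structures many independent servers agree on rise to the top. The similarity function and cutoff below are placeholders for whatever structural comparison score is used; this is a schematic reading of the abstract, not the published algorithm's exact parameters:

    def jury_scores(models, similarity, cutoff=0.0):
        """Rank candidate structure models by mutual agreement.

        similarity(a, b): caller-supplied structural similarity of two
        models (e.g. a count of matching C-alpha positions).
        cutoff: pairs scoring below it are ignored (placeholder value).
        """
        ranked = []
        for i, model in enumerate(models):
            total = 0.0
            for j, other in enumerate(models):
                if i != j:
                    s = similarity(model, other)
                    if s >= cutoff:
                        total += s
            ranked.append((total, i))
        return sorted(ranked, reverse=True)  # best consensus model first
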
The Effect of Class Distribution on Classifier Learning: An Empirical Study
2001
"... In this article we analyze the effect of class distribution on classifier learning. We begin by describing the different ways in which class distribution affects learning and how it affects the evaluation of learned classifiers. We then present the results of two comprehensive experimental studie ..."
Abstract

Cited by 86 (2 self)
 Add to MetaCart
In this article we analyze the effect of class distribution on classifier learning. We begin by describing the different ways in which class distribution affects learning and how it affects the evaluation of learned classifiers. We then present the results of two comprehensive experimental studies. The first study compares the performance of classifiers generated from unbalanced data sets with the performance of classifiers generated from balanced versions of the same data sets. This comparison allows us to isolate and quantify the effect that the training set's class distribution has on learning and to contrast the performance of the classifiers on the minority and majority classes. The second study assesses what distribution is "best" for training, with respect to two performance measures: classification accuracy and the area under the ROC curve (AUC). A tacit assumption behind much research on classifier induction is that the class distribution of the training data should match the "natural" distribution of the data. This study shows that the naturally occurring class distribution often is not best for learning, and that substantially better performance can often be obtained by using a different class distribution. Understanding how classifier performance is affected by class distribution can help practitioners to choose training data: in real-world situations the number of training examples often must be limited due to computational costs or the costs associated with procuring and preparing the data.
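
The experimental manipulation at the heart of such a study is easy to sketch: draw a fixed-size training sample whose class ratio is set by the experimenter rather than by nature. The protocol below (sampling without replacement within each class) is an illustrative assumption, not the paper's exact design:

    import random

    def sample_to_distribution(data, pos_fraction, n, seed=0):
        """Draw a size-n training set with a chosen class distribution.

        data: list of (features, label) pairs, label 1 or 0.
        pos_fraction: desired fraction of positives, e.g. 0.5 for a
        balanced version to compare against the natural distribution.
        """
        rng = random.Random(seed)
        pos = [d for d in data if d[1] == 1]
        neg = [d for d in data if d[1] == 0]
        n_pos = min(round(n * pos_fraction), len(pos))
        n_neg = min(n - n_pos, len(neg))
        sample = rng.sample(pos, n_pos) + rng.sample(neg, n_neg)
        rng.shuffle(sample)
        return sample
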
Using Rule Sets to Maximize ROC Performance
2001
"... Rules are commonly used for classification because they are modular, intelligible and easy to learn. Existing work in classification rule learning assumes the goal is to produce categorical classifications to maximize classification accuracy. Recent work in machine learning has pointed out the limit ..."
Abstract

Cited by 38 (3 self)
 Add to MetaCart
Rules are commonly used for classification because they are modular, intelligible and easy to learn. Existing work in classification rule learning assumes the goal is to produce categorical classifications to maximize classification accuracy. Recent work in machine learning has pointed out the limitations of classification accuracy: when class distributions are skewed, or error costs are unequal, an accuracy-maximizing rule set can perform poorly. A more flexible use of a rule set is to produce instance scores indicating the likelihood that an instance belongs to a given class. With such an ability, we can apply rule sets effectively when distributions are skewed or error costs are unequal. This paper empirically investigates different strategies for evaluating rule sets when the goal is to maximize the scoring (ROC) performance.
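
One simple scoring strategy of the kind the paper compares can be sketched as follows: let every rule carry a training-set precision estimate and score an instance by the best rule that fires on it. The representation and the fallback value are our assumptions; the paper evaluates several such strategies empirically:

    def score_instance(x, rules, default=0.0):
        """Score an instance with a rule set rather than hard-classifying it.

        rules: list of (predicate, precision) pairs; predicate(x) says
        whether the rule fires, and precision is the rule's estimated
        accuracy on the positive class from training data.
        """
        fired = [prec for pred, prec in rules if pred(x)]
        return max(fired) if fired else default

    # Hypothetical two-rule set:
    rules = [(lambda x: x["mcv"] > 90, 0.83),
             (lambda x: x["age"] < 25, 0.61)]
    print(score_instance({"mcv": 95, "age": 30}, rules))  # -> 0.83
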
Bayesian approaches to failure prediction for disk drives
In Proc. 18th ICML, 2001
"... Hard disk drive failures are rare but are often costly. The ability to predict failures is important to consumers, drive manufacturers, and computer system manufacturers alike. In this paper we investigate the abilities of two Bayesian methods to predict disk drive failures based on measurements of ..."
Abstract

Cited by 33 (2 self)
 Add to MetaCart
(Show Context)
Hard disk drive failures are rare but are often costly. The ability to predict failures is important to consumers, drive manufacturers, and computer system manufacturers alike. In this paper we investigate the abilities of two Bayesian methods to predict disk drive failures based on measurements of drive internal conditions. We first view the problem from an anomaly detection stance. We introduce a mixture model of naive Bayes submodels (i.e. clusters) that is trained using expectation-maximization. The second method is a naive Bayes classifier, a supervised learning approach. Both methods are tested on real-world data concerning 1936 drives. The predictive accuracy of both algorithms is far higher than the accuracy of thresholding methods used in the disk drive industry today.
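
A single-component simplification of the anomaly detection view described above, assuming the drive attributes have been discretized: fit per-attribute distributions to healthy drives only, then flag a drive whose measurements are too unlikely under that model. The paper's actual first method is a mixture of such naive Bayes sub-models fitted with EM, and its threshold would be tuned, so everything below is a sketch with names of our own:

    import math

    def train_naive_bayes(healthy_rows, n_levels, alpha=1.0):
        """Laplace-smoothed per-attribute multinomials from healthy drives.

        healthy_rows: tuples of discretized attribute values, each an
        integer in range(n_levels)."""
        n_attrs = len(healthy_rows[0])
        counts = [[alpha] * n_levels for _ in range(n_attrs)]
        for row in healthy_rows:
            for a, v in enumerate(row):
                counts[a][v] += 1
        return [[c / sum(attr) for c in attr] for attr in counts]

    def log_likelihood(row, model):
        return sum(math.log(model[a][v]) for a, v in enumerate(row))

    def looks_failing(row, model, threshold=-25.0):
        """Flag drives that look unlike the healthy population.
        threshold is an illustrative value, not the paper's."""
        return log_likelihood(row, model) < threshold
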
Regression Error Characteristic Curves
In Proceedings of the 20th International Conference on Machine Learning, 2003
"... Receiver Operating Characteristic (ROC) curves provide a powerful tool for visualizing and comparing classification results. Regression Error Characteristic (REC) curves generalize ROC curves to regression. REC curves plot the error tolerance on the xaxis versus the percentage of points predicted wi ..."
Abstract

Cited by 22 (0 self)
 Add to MetaCart
(Show Context)
Receiver Operating Characteristic (ROC) curves provide a powerful tool for visualizing and comparing classification results. Regression Error Characteristic (REC) curves generalize ROC curves to regression. REC curves plot the error tolerance on the x-axis versus the percentage of points predicted within the tolerance on the y-axis. The resulting curve estimates the cumulative distribution function of the error. The REC curve visually presents commonly used statistics. The area over the curve (AOC) is a biased estimate of the expected error. The R² value can be estimated using the ratio of the AOC for a given model to the AOC for the null model. Users can quickly assess the relative merits of many regression functions by examining the relative position of their REC curves. The shape of the curve reveals additional information that can be used to guide modeling.
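
The curve itself follows in a few lines from the abstract's definition: it is the empirical CDF of the absolute residuals. A minimal sketch, including the AOC (helper names are ours):

    def rec_curve(y_true, y_pred):
        """REC curve: for each tolerance, the fraction of points whose
        absolute error falls within it (empirical CDF of the errors)."""
        errors = sorted(abs(t - p) for t, p in zip(y_true, y_pred))
        n = len(errors)
        return [(e, (i + 1) / n) for i, e in enumerate(errors)]

    def area_over_curve(points, max_tolerance):
        """Area between the REC step curve and accuracy = 1, up to
        max_tolerance; per the abstract, a biased estimate of the
        expected error. Assumes max_tolerance covers the largest error."""
        area, prev_e, prev_acc = 0.0, 0.0, 0.0
        for e, acc in points:
            area += (e - prev_e) * (1.0 - prev_acc)
            prev_e, prev_acc = e, acc
        return area + (max_tolerance - prev_e) * (1.0 - prev_acc)

Following the abstract's recipe, R² can then be approximated from the ratio of a model's AOC to the AOC of a null model, such as one that always predicts the mean.
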
Combining decision trees and neural networks for drug discovery
In Genetic Programming: Proceedings of the 5th European Conference, EuroGP 2002, 2002
"... Abstract. Genetic programming (GP) offers a generic method of automatically fusing together classifiers using their receiver operating characteristics (ROC) to yield superior ensembles. We combine decision trees (C4.5) and artificial neural networks (ANN) on a difficult pharmaceutical data mining (K ..."
Abstract

Cited by 9 (2 self)
 Add to MetaCart
Genetic programming (GP) offers a generic method of automatically fusing together classifiers using their receiver operating characteristics (ROC) to yield superior ensembles. We combine decision trees (C4.5) and artificial neural networks (ANN) on a difficult pharmaceutical data mining (KDD) drug discovery application: specifically, predicting inhibition of a P450 enzyme. Training data came from high throughput screening (HTS) runs. The evolved model may be used to predict the behaviour of virtual (i.e. yet to be manufactured) chemicals. Measures to reduce overfitting are also described.
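
The GP machinery itself is too large for a sketch, but the loop it drives is simple: propose a rule that fuses the base classifiers' scores, then grade the candidate by the area under its ROC curve. Below, the candidate is a fixed weighted average rather than an evolved program, and the AUC is computed by the rank-sum shortcut (ties ignored); all names are ours:

    def auc(scores, labels):
        """Area under the ROC curve via the Mann-Whitney rank sum."""
        order = sorted(range(len(scores)), key=lambda i: scores[i])
        pos_rank_sum = sum(r + 1 for r, i in enumerate(order)
                           if labels[i] == 1)
        n_pos = sum(labels)
        n_neg = len(labels) - n_pos
        return (pos_rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

    def fuse(score_lists, weights):
        """One candidate fusion rule: a weighted average of the base
        classifiers' scores. GP would search over such expressions."""
        return [sum(w * s for w, s in zip(weights, column))
                for column in zip(*score_lists)]

    # Fitness of a candidate ensemble, as a GP run might evaluate it:
    # fitness = auc(fuse([c45_scores, ann_scores], [0.5, 0.5]), labels)
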
Comparison of AdaBoost and genetic programming for combining neural networks for drug discovery
University of Essex, UK, 2003
"... Abstract. Genetic programming (GP) based data fusion and AdaBoost can both improve in vitro prediction of Cytochrome P450 activity by combining artificial neural networks (ANN). Pharmaceutical drug design data provided by high throughput screening (HTS) is used to train many base ANN classifiers. In ..."
Abstract

Cited by 5 (2 self)
 Add to MetaCart
Genetic programming (GP) based data fusion and AdaBoost can both improve in vitro prediction of Cytochrome P450 activity by combining artificial neural networks (ANN). Pharmaceutical drug design data provided by high throughput screening (HTS) is used to train many base ANN classifiers. In data mining (KDD) we must avoid overfitting. The ensembles do extrapolate from the training data to other unseen molecules; that is, they predict inhibition of a P450 enzyme by compounds unlike the chemicals used to train them. Thus the models might provide in silico screens of virtual chemicals as well as physical ones from GlaxoSmithKline (GSK)'s cheminformatics database. The receiver operating characteristics (ROC) of the boosted and evolved ensembles are given.