Results 1–10 of 82
An Empirical Comparison of Supervised Learning Algorithms
 In Proc. 23rd Intl. Conf. Machine Learning (ICML'06), 2006
Cited by 199 (6 self)

Abstract:
A number of supervised learning methods have been introduced in the last decade. Unfortunately, the last comprehensive empirical evaluation of supervised learning was the Statlog Project in the early 90's. We present a large-scale empirical comparison between ten supervised learning methods: SVMs, neural nets, logistic regression, naive Bayes, memory-based learning, random forests, decision trees, bagged trees, boosted trees, and boosted stumps. We also examine the effect that calibrating the models via Platt Scaling and Isotonic Regression has on their performance. An important aspect of our study is the use of a variety of performance criteria to evaluate the learning methods.
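The Platt Scaling calibration step mentioned in the abstract can be sketched in a few lines: fit a sigmoid mapping from raw classifier scores to probabilities by minimizing log loss. This is a toy illustration (the toy scores, labels, and hyperparameters below are mine, not the paper's):

```python
import math

def fit_platt(scores, labels, lr=0.01, epochs=5000):
    """Fit p(y=1|s) = 1 / (1 + exp(A*s + B)) by gradient descent on log loss.

    A minimal version of the sigmoid map Platt Scaling uses to turn raw
    classifier scores into calibrated probabilities (no regularization).
    """
    A, B = -1.0, 0.0
    for _ in range(epochs):
        gA = gB = 0.0
        for s, y in zip(scores, labels):
            p = 1.0 / (1.0 + math.exp(A * s + B))
            # d(logloss)/dA = -(p - y) * s, d(logloss)/dB = -(p - y)
            gA += -(p - y) * s
            gB += -(p - y)
        A -= lr * gA
        B -= lr * gB
    return lambda s: 1.0 / (1.0 + math.exp(A * s + B))

# Toy data: negative examples score low, positives score high.
scores = [-3.0, -2.0, -1.0, 1.0, 2.0, 3.0]
labels = [0, 0, 0, 1, 1, 1]
calibrate = fit_platt(scores, labels)
```

Isotonic Regression, the paper's other calibration method, replaces the sigmoid with a non-parametric monotone step function fit by pool-adjacent-violators.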
AUC optimization vs. error rate minimization
 Advances in Neural Information Processing Systems, 2004
Cited by 149 (2 self)

Abstract:
The area under an ROC curve (AUC) is a criterion used in many applications to measure the quality of a classification algorithm. However, the objective function optimized in most of these algorithms is the error rate and not the AUC value. We give a detailed statistical analysis of the relationship between the AUC and the error rate, including the first exact expression of the expected value and the variance of the AUC for a fixed error rate. Our results show that the average AUC is monotonically increasing as a function of the classification accuracy, but that the standard deviation for uneven distributions and higher error rates can be large. Thus, algorithms designed to minimize the error rate may not lead to the best possible AUC values. We show that under certain conditions the global function optimized by the RankBoost algorithm is exactly the AUC. We report results of our experiments with RankBoost on several datasets that demonstrate the benefits of an algorithm specifically designed to globally optimize the AUC over other existing algorithms optimizing an approximation of the AUC or only locally optimizing the AUC.
Active Sampling for Class Probability Estimation and Ranking
 Machine Learning, 2004
Cited by 78 (9 self)

Abstract:
In many cost-sensitive environments class probability estimates are used by decision makers to evaluate the expected utility from a set of alternatives. Supervised learning can be used to build class probability estimates; however, it often is very costly to obtain training data with class labels. Active learning acquires data incrementally, at each phase identifying especially useful additional data for labeling, and can be used to economize on examples needed for learning. We outline the critical features of an active learner and present a sampling-based active learning method for estimating class probabilities and class-based rankings. BOOTSTRAP-LV identifies particularly informative new data for learning based on the variance in probability estimates, and uses weighted sampling to account for a potential example's informative value for the rest of the input space. We show empirically that the method reduces the number of data items that must be obtained and labeled, across a wide variety of domains. We investigate the contribution of the components of the algorithm and show that each provides valuable information to help identify informative examples. We also compare BOOTSTRAP-LV with UNCERTAINTY SAMPLING, an existing active learning method designed to maximize classification accuracy. The results show that BOOTSTRAP-LV uses fewer examples to reach a given estimation accuracy and provide insights into the behavior of the algorithms. Finally, we experiment with another new active sampling algorithm drawing from both UNCERTAINTY SAMPLING and BOOTSTRAP-LV and show that it is significantly more competitive with BOOTSTRAP-LV compared to UNCERTAINTY SAMPLING. The analysis suggests more general implications for improving existing active sampling ...
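The core BOOTSTRAP-LV signal, variance of class-probability estimates across bootstrap resamples of the labeled data, can be sketched as follows. The base estimator here (a 1-D nearest-neighbor probability) and all of the toy data are stand-ins of my own; the paper's actual learner and weighted-sampling scheme are richer:

```python
import random
import statistics

def knn_prob(labeled, x, k=3):
    """Toy class-probability estimator: fraction of positives among the
    k nearest labeled points (1-D). Stands in for the real base learner."""
    nearest = sorted(labeled, key=lambda pt: abs(pt[0] - x))[:k]
    return sum(y for _, y in nearest) / k

def local_variance(labeled, pool, n_boot=100, k=3, seed=0):
    """Score each unlabeled pool point by the variance of its probability
    estimate across bootstrap resamples of the labeled set: the
    informativeness signal BOOTSTRAP-LV samples from."""
    rng = random.Random(seed)
    scores = []
    for x in pool:
        estimates = []
        for _ in range(n_boot):
            resample = [rng.choice(labeled) for _ in labeled]
            estimates.append(knn_prob(resample, x, k))
        scores.append(statistics.pvariance(estimates))
    return scores

# Toy data: negatives cluster near 0-3, positives near 12-15.
labeled = [(0.0, 0), (1.0, 0), (2.0, 0), (3.0, 0),
           (12.0, 1), (13.0, 1), (14.0, 1), (15.0, 1)]
# Pool point 1.0 sits deep inside a class; 7.5 sits on the boundary.
variances = local_variance(labeled, [1.0, 7.5])
```

The boundary point gets the larger variance, so it would be the preferred labeling candidate; the full method then samples pool points with probability proportional to such scores rather than always taking the maximum.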
Toward Intelligent Assistance for a Data Mining Process: An Ontology-Based Approach for Cost-Sensitive Classification
 IEEE Transactions on Knowledge and Data Engineering, 2005
Learning ensembles from bites: A scalable and accurate approach
Cited by 48 (7 self)

Abstract:
Bagging and boosting are two popular ensemble methods that typically achieve better accuracy than a single classifier. These techniques have limitations on massive datasets, as the size of the dataset can be a bottleneck. Voting many classifiers built on small subsets of data ("pasting small votes") is a promising approach for learning from massive datasets, one that can utilize the power of boosting and bagging. We propose a framework for building hundreds or thousands of such classifiers on small subsets of data in a distributed environment. Experiments show this approach is fast, accurate, and scalable.
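The "pasting small votes" idea can be sketched with decision stumps: train many weak classifiers, each on a small random "bite" of the data, and combine them by majority vote. The 1-D stump learner and toy data below are illustrative, not the paper's distributed framework:

```python
import random

def train_stump(data):
    """Fit a 1-D decision stump (predict 1 iff x > t) minimizing training error."""
    xs = sorted(x for x, _ in data)
    candidates = ([xs[0] - 1.0]
                  + [(a + b) / 2 for a, b in zip(xs, xs[1:])]
                  + [xs[-1] + 1.0])
    best_t = min(candidates,
                 key=lambda t: sum((x > t) != bool(y) for x, y in data))
    return lambda x, t=best_t: 1 if x > t else 0

def paste_small_votes(data, n_models=51, bite_size=4, seed=0):
    """Train many stumps, each on a small random 'bite' of the data,
    and return their majority-vote classifier."""
    rng = random.Random(seed)
    stumps = [train_stump([rng.choice(data) for _ in range(bite_size)])
              for _ in range(n_models)]
    return lambda x: 1 if sum(s(x) for s in stumps) * 2 > len(stumps) else 0

# Toy 1-D data: class 0 below 5, class 1 above.
data = ([(float(i), 0) for i in range(5)]
        + [(float(i), 1) for i in range(6, 11)])
classify = paste_small_votes(data)
```

Because each bite is tiny, the individual stumps are cheap and can be trained in parallel on separate machines; the vote recovers accuracy, which is the scalability argument of the paper.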
Distribution-based aggregation for relational learning with identifier attributes
 Machine Learning, 2004
Cited by 42 (10 self)

Abstract:
Feature construction through aggregation plays an essential role in modeling relational domains with one-to-many relationships between tables. One-to-many relationships lead to bags (multisets) of related entities, from which predictive information must be captured. This paper focuses on aggregation from categorical attributes that can take many values (e.g., object identifiers). We present a novel aggregation method as part of the relational learning system ACORA, which combines the use of vector distance and metadata about the class-conditional distributions of attribute values. We provide a theoretical foundation for this approach, deriving a "relational fixed-effect" model within a Bayesian framework, and discuss the implications of identifier aggregation on the expressive power of the induced model. One advantage of using identifier attributes is the circumvention of limitations caused either by missing/unobserved object properties or by independence assumptions. Finally, we show empirically that the novel aggregators can generalize in the presence of identifier (and other high-dimensional) attributes, and also explore the limitations of the applicability of the methods.
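The distance-to-class-profile aggregation can be sketched as follows: accumulate class-conditional counts of identifier values over the training bags, then turn a new bag into features by measuring its similarity to each class profile. All names and toy bags here are mine; ACORA's actual aggregators are more elaborate:

```python
import math
from collections import Counter

def cosine(u, v):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(u[k] * v.get(k, 0) for k in u)
    nu = math.sqrt(sum(c * c for c in u.values()))
    nv = math.sqrt(sum(c * c for c in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def class_profiles(bags, labels):
    """Class-conditional distributions of identifier values over all bags."""
    pos, neg = Counter(), Counter()
    for bag, y in zip(bags, labels):
        (pos if y else neg).update(bag)
    return pos, neg

def distance_features(bag, pos, neg):
    """Constructed features: similarity of this bag's identifier
    distribution to each class-conditional profile."""
    b = Counter(bag)
    return cosine(b, pos), cosine(b, neg)

# Toy training bags of identifiers: positives share {a, b}, negatives {x, y, z}.
pos_profile, neg_profile = class_profiles(
    [["a", "b"], ["a", "b", "c"], ["x", "y"], ["x", "z"]],
    [1, 1, 0, 0])
f_pos, f_neg = distance_features(["a", "b"], pos_profile, neg_profile)
```

The two similarity scores become ordinary numeric features for a downstream classifier, which is how a high-cardinality identifier attribute is made usable.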
Handling Missing Values when Applying Classification Models
 Journal of Machine Learning Research, 2007
Cited by 37 (5 self)

Abstract:
Much work has studied the effect of different treatments of missing values on model induction, but little work has analyzed treatments for the common case of missing values at prediction time. This paper first compares several methods for applying classification trees to instances with missing values: predictive value imputation, the distribution-based imputation used by C4.5, and reduced models (and also shows evidence that the results generalize to bagged trees and to logistic regression). The results show that for the two most popular treatments, each is preferable under different conditions. Strikingly, the reduced-models approach, seldom mentioned or used, consistently outperforms the other two methods, sometimes by a large margin. The lack of attention to reduced modeling may be due in part to its (perceived) expense in terms of computation or storage. Therefore, we then introduce and evaluate alternative, hybrid approaches that allow users to balance between more accurate but computationally expensive reduced modeling and the other, less accurate but less computationally expensive treatments. The results show that the hybrid methods scale gracefully with the amount of computation/storage invested, and that they outperform imputation even for small investments.
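The reduced-models idea is simply to train (or route to) a model that uses only the features actually observed at prediction time, instead of imputing the missing ones. A minimal sketch, with a toy 1-NN standing in for the real learner and all names illustrative:

```python
def reduced_model(train, feats):
    """A 'reduced' model: 1-NN restricted to the feature subset that will
    be observed at prediction time (toy stand-in for refitting the learner)."""
    feats = list(feats)
    def predict(x):
        def dist(example):
            row, _ = example
            return sum((row[f] - x[f]) ** 2 for f in feats)
        return min(train, key=dist)[1]
    return predict

def predict_with_missing(train, x):
    """Route an instance with missing values (None) to the model built
    on exactly its observed feature subset."""
    observed = [f for f, v in x.items() if v is not None]
    return reduced_model(train, observed)(x)

# Toy training data over two features.
train = [({"f1": 0.0, "f2": 0.0}, 0),
         ({"f1": 1.0, "f2": 1.0}, 1)]
```

The hybrid approaches in the paper trade off how many such feature-subset models are actually trained and stored versus falling back to cheaper imputation.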
ROC Confidence Bands: An Empirical Evaluation
 In Proceedings of the Twenty-Second International Conference on Machine Learning, 2005
Cited by 21 (1 self)

Abstract:
This paper is about constructing confidence bands around ROC curves. We first introduce to the machine learning community three band-generating methods from the medical field, and evaluate how well they perform. Such confidence bands represent the region where the "true" ROC curve is expected to reside, with the designated confidence level. To assess the containment of the bands we begin with a synthetic world where we know the true ROC curve: specifically, where the class-conditional model scores are normally distributed. The only method that attains reasonable containment out of the box produces non-parametric, "fixed-width" bands (FWBs). Next we move to a context more appropriate for machine learning evaluations: bands that with a certain confidence level will bound the performance of the model on future data. We introduce a correction to account for the larger uncertainty, and the widened FWBs continue to have reasonable containment. Finally, we assess the bands on 10 relatively large benchmark data sets. We conclude by recommending these FWBs, noting that being non-parametric they are especially attractive for machine learning studies, where the score distributions (1) clearly are not normal, and (2) even for the same data set vary substantially from learning method to learning method.
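An empirical ROC curve and a fixed-width band around it are straightforward to compute; the sketch below (toy scores mine; the paper's procedure for choosing the half-width w from bootstrap curves is not shown) illustrates the shape of the FWB construction:

```python
def roc_points(scores, labels):
    """(FPR, TPR) points of the empirical ROC curve, sweeping the
    threshold from the highest score down."""
    P = sum(labels)
    N = len(labels) - P
    pts = [(0.0, 0.0)]
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        pts.append((fp / N, tp / P))
    return pts

def fixed_width_band(pts, w):
    """Fixed-width band: the same half-width w above and below the TPR
    at every FPR, clipped to [0, 1]."""
    return [(f, max(0.0, t - w), min(1.0, t + w)) for f, t in pts]

# Toy scores and labels for a 6-example test set.
pts = roc_points([0.9, 0.8, 0.7, 0.3, 0.2, 0.1], [1, 1, 0, 1, 0, 0])
band = fixed_width_band(pts, 0.1)
```

In the paper, w is calibrated (e.g., from resampled ROC curves) so that the band contains the true curve at the designated confidence level; here it is an arbitrary constant.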
Experiment databases: Towards an improved experimental methodology in machine learning
 Proceedings of the 11th European Conference on Principles and Practice of Knowledge Discovery in Databases, 2007
Cited by 21 (7 self)

Abstract:
Machine learning research often has a large experimental component. While the experimental methodology employed in machine learning has improved much over the years, repeatability of experiments and generalizability of results remain a concern. In this paper we propose a methodology based on the use of experiment databases. Experiment databases facilitate large-scale experimentation, guarantee repeatability of experiments, improve reusability of experiments, help make explicit the conditions under which certain results are valid, and support quick hypothesis testing as well as hypothesis generation. We show that they have the potential to significantly increase the ease with which new results in machine learning can be obtained and correctly interpreted.

Introduction: Given that empirical assessment is central to machine learning research, it has repeatedly been argued that care should be taken to ensure that (published) experimental results can be interpreted correctly. To this end, it should first be clear how the experiments can be reproduced, which can be achieved by providing a complete description of both the experimental setup (which algorithms to run with which parameters on which datasets, including how these settings were chosen) and the experimental procedure
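The experiment-database idea reduces, at its smallest, to recording every run with its complete configuration in a queryable store. A minimal sketch using SQLite; the schema and field names are illustrative, not the paper's design:

```python
import json
import sqlite3

# Every run is stored with its full configuration so it can later be
# reproduced exactly and queried alongside other runs.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE experiments (
        id INTEGER PRIMARY KEY,
        algorithm TEXT NOT NULL,
        parameters TEXT NOT NULL,  -- JSON, so the exact setup is recoverable
        dataset TEXT NOT NULL,
        accuracy REAL NOT NULL
    )
""")

def log_experiment(conn, algorithm, parameters, dataset, accuracy):
    """Record one run; parameters are serialized deterministically."""
    conn.execute(
        "INSERT INTO experiments (algorithm, parameters, dataset, accuracy) "
        "VALUES (?, ?, ?, ?)",
        (algorithm, json.dumps(parameters, sort_keys=True), dataset, accuracy),
    )

log_experiment(conn, "svm", {"C": 1.0, "kernel": "rbf"}, "iris", 0.95)
log_experiment(conn, "tree", {"max_depth": 5}, "iris", 0.91)

# Hypothesis testing becomes a query over past runs.
best = conn.execute(
    "SELECT algorithm, accuracy FROM experiments "
    "WHERE dataset = ? ORDER BY accuracy DESC LIMIT 1",
    ("iris",),
).fetchone()
```

Storing the parameters as canonical JSON is one simple way to make "same setup" a testable equality rather than a judgment call.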