Results 1  10
of
33
Statistical Comparisons of Classifiers over Multiple Data Sets
, 2006
"... While methods for comparing two learning algorithms on a single data set have been scrutinized for quite some time already, the issue of statistical tests for comparisons of more algorithms on multiple data sets, which is even more essential to typical machine learning studies, has been all but igno ..."
Abstract

Cited by 328 (0 self)
 Add to MetaCart
While methods for comparing two learning algorithms on a single data set have been scrutinized for quite some time already, the issue of statistical tests for comparisons of more algorithms on multiple data sets, which is even more essential to typical machine learning studies, has been all but ignored. This article reviews the current practice and then theoretically and empirically examines several suitable tests. Based on that, we recommend a set of simple, yet safe and robust nonparametric tests for statistical comparisons of classifiers: the Wilcoxon signed ranks test for comparison of two classifiers and the Friedman test with the corresponding posthoc tests for comparison of more classifiers over multiple data sets. Results of the latter can also be neatly presented with the newly introduced CD (critical difference) diagrams.
Data Mining Static Code Attributes to Learn Defect Predictors
 IEEE Transactions on Software Engineering
, 2007
"... ..."
Supervised Machine Learning: A Review of Classification Techniques. Informatica 31:249–268
, 2007
"... Supervised machine learning is the search for algorithms that reason from externally supplied instances to produce general hypotheses, which then make predictions about future instances. In other words, the goal of supervised learning is to build a concise model of the distribution of class labels i ..."
Abstract

Cited by 70 (0 self)
 Add to MetaCart
Supervised machine learning is the search for algorithms that reason from externally supplied instances to produce general hypotheses, which then make predictions about future instances. In other words, the goal of supervised learning is to build a concise model of the distribution of class labels in terms of predictor features. The resulting classifier is then used to assign class labels to the testing instances where the values of the predictor features are known, but the value of the class label is unknown. This paper describes various supervised machine learning classification techniques. Of course, a single article cannot be a complete review of all supervised machine learning classification algorithms (also known induction classification algorithms), yet we hope that the references cited will cover the major theoretical issues, guiding the researcher in interesting research directions and suggesting possible bias combinations that have yet to be explored. Povzetek: Podan je pregled metod strojnega učenja. 1
Evaluating the replicability of significance tests for comparing learning algorithms
 In PAKDD
, 2004
"... Abstract. Empirical research in learning algorithms for classification tasks generally requires the use of significance tests. The quality of a test is typically judged on Type I error (how often the test indicates a difference when it should not) and Type II error (how often it indicates no differe ..."
Abstract

Cited by 30 (0 self)
 Add to MetaCart
Abstract. Empirical research in learning algorithms for classification tasks generally requires the use of significance tests. The quality of a test is typically judged on Type I error (how often the test indicates a difference when it should not) and Type II error (how often it indicates no difference when it should). In this paper we argue that the replicability of a test is also of importance. We say that a test has low replicability if its outcome strongly depends on the particular random partitioning of the data that is used to perform it. We present empirical measures of replicability and use them to compare the performance of several popular tests in a realistic setting involving standard learning algorithms and benchmark datasets. Based on our results we give recommendations on which test to use. 1
Lookaheadbased Algorithms for Anytime Induction of Decision Trees
 In ICML’04
, 2004
"... The majority of the existing algorithms for learning decision trees are greedya tree is induced topdown, making locally optimal decisions at each node. In most cases, however, the constructed tree is not globally optimal. Furthermore, the greedy algorithms require a fixed amount of time and are no ..."
Abstract

Cited by 16 (4 self)
 Add to MetaCart
The majority of the existing algorithms for learning decision trees are greedya tree is induced topdown, making locally optimal decisions at each node. In most cases, however, the constructed tree is not globally optimal. Furthermore, the greedy algorithms require a fixed amount of time and are not able to generate a better tree if additional time is available. To overcome this problem, we present two lookaheadbased algorithms for anytime induction of decision trees, thus allowing tradeoff between tree quality and learning time. The first one is depthk lookahead, where a larger time allocation permits larger k. The second algorithm uses a novel strategy for evaluating candidate splits; a stochastic version of ID3 is repeatedly invoked to estimate the size of the tree in which each split results, and the one that minimizes the expected size is preferred. Experimental results indicate that for several hard concepts, our proposed approach exhibits good anytime behavior and yields significantly better decision trees when more time is available.
Anytime learning of decision trees
 Journal of Machine Learning Research
"... The majority of existing algorithms for learning decision trees are greedy—a tree is induced topdown, making locally optimal decisions at each node. In most cases, however, the constructed tree is not globally optimal. Even the few nongreedy learners cannot learn good trees when the concept is diff ..."
Abstract

Cited by 10 (3 self)
 Add to MetaCart
The majority of existing algorithms for learning decision trees are greedy—a tree is induced topdown, making locally optimal decisions at each node. In most cases, however, the constructed tree is not globally optimal. Even the few nongreedy learners cannot learn good trees when the concept is difficult. Furthermore, they require a fixed amount of time and are not able to generate a better tree if additional time is available. We introduce a framework for anytime induction of decision trees that overcomes these problems by trading computation speed for better tree quality. Our proposed family of algorithms employs a novel strategy for evaluating candidate splits. A biased sampling of the space of consistent trees rooted at an attribute is used to estimate the size of the minimal tree under that attribute, and an attribute with the smallest expected tree is selected. We present two types of anytime induction algorithms: a contract algorithm that determines the sample size on the basis of a pregiven allocation of time, and an interruptible algorithm that starts with a greedy tree and continuously improves subtrees by additional sampling. Experimental results indicate that, for several hard concepts, our proposed approach exhibits good anytime behavior and yields significantly better decision trees when more time is available.
Estimating Replicability of Classifier Learning Experiments
 In Proceedings of ICML
, 2004
"... Replicability of machine learning experiments measures how likely it is that the outcome of one experiment is repeated when performed with a di#erent randomization of the data. In this paper, we present an estimator of replicability of an experiment that is e#cient. More precisely, the estimat ..."
Abstract

Cited by 10 (0 self)
 Add to MetaCart
Replicability of machine learning experiments measures how likely it is that the outcome of one experiment is repeated when performed with a di#erent randomization of the data. In this paper, we present an estimator of replicability of an experiment that is e#cient. More precisely, the estimator is unbiased and has lowest variance in the class of estimators formed by a linear combination of outcomes of experiments on a given data set.
Collective Classification of Congressional FloorDebate Transcripts
"... This paper explores approaches to sentiment classification of U.S. Congressional floordebate transcripts. Collective classification techniques are used to take advantage of the informal citation structure present in the debates. We use a range of methods based on local and global formulations and in ..."
Abstract

Cited by 8 (0 self)
 Add to MetaCart
This paper explores approaches to sentiment classification of U.S. Congressional floordebate transcripts. Collective classification techniques are used to take advantage of the informal citation structure present in the debates. We use a range of methods based on local and global formulations and introduce novel approaches for incorporating the outputs of machine learners into collective classification algorithms. Our experimental evaluation shows that the meanfield algorithm obtains the best results for the task, significantly outperforming the benchmark technique. 1
Ordering and finding the best of K>2 supervised learning algorithms
 IEEE T. Pattern. Anal
, 2006
"... Abstract—Given a data set and a number of supervised learning algorithms, we would like to find the algorithm with the smallest expected error. Existing pairwise tests allow a comparison of two algorithms only; range tests and ANOVA check whether multiple algorithms have the same expected error and ..."
Abstract

Cited by 6 (2 self)
 Add to MetaCart
Abstract—Given a data set and a number of supervised learning algorithms, we would like to find the algorithm with the smallest expected error. Existing pairwise tests allow a comparison of two algorithms only; range tests and ANOVA check whether multiple algorithms have the same expected error and cannot be used for finding the smallest. We propose a methodology, the MultiTest algorithm, whereby we order supervised learning algorithms taking into account 1) the result of pairwise statistical tests on expected error (what the data tells us), and 2) our prior preferences, e.g., due to complexity. We define the problem in graphtheoretic terms and propose an algorithm to find the “best ” learning algorithm in terms of these two criteria, or in the more general case, order learning algorithms in terms of their “goodness. ” Simulation results using five classification algorithms on 30 data sets indicate the utility of the method. Our proposed method can be generalized to regression and other loss functions by using a suitable pairwise test. Index Terms—Machine learning, classifier design and evaluation, experimental design. æ 1