Best-first decision tree learning
 University of Waikato
, 2007
Cited by 14 (0 self)
Decision trees are potentially powerful predictors and explicitly represent the structure of a dataset. Standard decision tree learners such as C4.5 expand nodes in depth-first order (Quinlan, 1993), while in best-first decision tree learners the "best" node is expanded first. The "best" node is the node whose split leads to maximum reduction of impurity (e.g. Gini index or information in this thesis) among all nodes available for splitting. The resulting tree will be the same when fully grown, just the order in which it is built is different. In practice, some branches of a fully expanded tree do not truly reflect the underlying information in the domain. This problem is known as overfitting and is mainly caused by noisy data. Pruning is necessary to avoid overfitting the training data, and discards those parts that are not predictive of future data. Best-first node expansion enables us to investigate new pruning techniques by determining the number of expansions performed based on cross-validation. This thesis first introduces the algorithm for building binary best-first decision trees for classification problems. Then, it investigates two new pruning methods that
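The best-first expansion order described in this abstract can be illustrated with a small sketch. It is not the thesis's implementation: the function names, the size-weighted Gini reduction used to compare nodes, and the exhaustive threshold search are all assumptions made for illustration. A priority queue always expands the open node whose best split reduces impurity the most, and `max_expansions` stands in for the number of expansions that the abstract says is chosen by cross-validation.

```python
import heapq
from collections import Counter
from itertools import count


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(rows, labels):
    """Return (reduction, feature, threshold) for the best binary split, or None.

    Assumption of this sketch: the reduction is weighted by node size so that
    candidate splits from different nodes are comparable on one scale.
    """
    n = len(rows)
    parent = gini(labels)
    best = None
    for f in range(len(rows[0])):
        # Dropping the largest value keeps both sides of every split non-empty.
        for t in sorted({r[f] for r in rows})[:-1]:
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            reduction = n * parent - len(left) * gini(left) - len(right) * gini(right)
            if reduction > 0 and (best is None or reduction > best[0]):
                best = (reduction, f, t)
    return best


def best_first_splits(rows, labels, max_expansions):
    """Expand at most max_expansions nodes, always taking the best open node.

    Returns the chosen splits as (reduction, feature, threshold) tuples in
    the order in which they were expanded.
    """
    tie = count()  # tie-breaker so the heap never has to compare row lists
    heap = []

    def push(node_rows, node_labels):
        split = best_split(node_rows, node_labels) if node_rows else None
        if split is not None:
            # heapq is a min-heap, so the reduction is stored negated.
            heapq.heappush(heap, (-split[0], next(tie), split, node_rows, node_labels))

    push(rows, labels)
    order = []
    while heap and len(order) < max_expansions:
        _, _, split, node_rows, node_labels = heapq.heappop(heap)
        _, f, t = split
        order.append(split)
        pairs = list(zip(node_rows, node_labels))
        for part in ([p for p in pairs if p[0][f] <= t],
                     [p for p in pairs if p[0][f] > t]):
            push([r for r, _ in part], [y for _, y in part])
    return order
```

Stopping after a fixed number of expansions is what distinguishes this from depth-first growth: a depth-first learner stopped early would leave one branch fully grown and the others untouched, whereas here the most useful splits are made first wherever they occur in the tree.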
An Analysis of Reduced Error Pruning
 Journal of Artificial Intelligence Research
, 2001
Cited by 13 (4 self)
Top-down induction of decision trees has been observed to suffer from the inadequate functioning of the pruning phase. In particular, it is known that the size of the resulting tree grows linearly with the sample size, even though the accuracy of the tree does not improve. Reduced Error Pruning is an algorithm that has been used as a representative technique in attempts to explain the problems of decision tree learning. In this paper we present analyses of Reduced Error Pruning in three different settings. First we study the basic algorithmic properties of the method, properties that hold independent of the input decision tree and pruning examples. Then we examine a situation that intuitively should lead to the subtree under consideration being replaced by a leaf node, one in which the class label and attribute values of the pruning examples are independent of each other. This analysis is conducted under two different assumptions. The general analysis shows that the pruning probability of a node fitting pure noise is bounded by a function that decreases exponentially as the size of the tree grows. In a specific analysis we assume that the examples are distributed uniformly to the tree. This assumption lets us approximate the number of subtrees that are pruned because they do not receive any pruning examples. This paper clarifies the different variants of the Reduced Error Pruning algorithm, brings new insight to its algorithmic properties, analyses the algorithm with fewer imposed assumptions than before, and includes the previously overlooked empty subtrees in the analysis.
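For readers unfamiliar with the method, here is a minimal sketch of one common bottom-up variant of Reduced Error Pruning. It is an illustration only, not the formulation analysed in the paper, which itself notes that several variants exist. The `Node` class, the stored training-majority `label`, and the rule of pruning on ties are assumptions of this sketch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    label: int                      # majority class at this node (assumed to come from training data)
    feature: Optional[int] = None   # None marks a leaf
    threshold: float = 0.0
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def predict(node, x):
    """Route x down the tree and return the label of the leaf it reaches."""
    while node.feature is not None:
        node = node.left if x[node.feature] <= node.threshold else node.right
    return node.label


def errors(node, examples):
    """Number of (x, y) pruning examples the subtree misclassifies."""
    return sum(predict(node, x) != y for x, y in examples)


def reduced_error_prune(node, examples):
    """Prune bottom-up and return the (possibly collapsed) subtree.

    A subtree is replaced by a leaf when the leaf makes no more errors on the
    pruning examples than the subtree does. A subtree that receives no pruning
    examples is therefore always collapsed (0 <= 0).
    """
    if node.feature is None:
        return node
    left_ex = [(x, y) for x, y in examples if x[node.feature] <= node.threshold]
    right_ex = [(x, y) for x, y in examples if x[node.feature] > node.threshold]
    node.left = reduced_error_prune(node.left, left_ex)
    node.right = reduced_error_prune(node.right, right_ex)
    leaf_errors = sum(y != node.label for _, y in examples)
    if leaf_errors <= errors(node, examples):
        return Node(label=node.label)
    return node
```

Under the tie rule assumed here, a subtree that receives no pruning examples is always collapsed, which is the "empty subtree" case the abstract says earlier analyses overlooked.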
Permutation Tests for Studying Classifier Performance
Cited by 5 (0 self)
We explore the framework of permutation-based p-values for assessing the behavior of the classification error. In this paper we study two simple permutation tests. The first test estimates the null distribution by permuting the labels in the data; this has been used extensively in classification problems in computational biology. The second test produces permutations of the features within classes, inspired by restricted randomization techniques traditionally used in statistics. We study the properties of these tests and present an extensive empirical evaluation on real and synthetic data. Our analysis shows that studying the classification error via permutation tests is effective; in particular, the restricted permutation test clearly reveals whether the classifier exploits the interdependency between the features in the data. Keywords: classification, labeled data, permutation tests, restricted randomization, significance testing
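The first test, label permutation, can be sketched in a few lines. This is a generic illustration rather than the paper's code: `error_fn` is a hypothetical callable that stands for "train and evaluate a classifier and return its error", `sign_rule_error` is a toy stand-in for it, and the add-one correction in the p-value is a common convention assumed here.

```python
import random


def permutation_p_value(error_fn, X, y, n_permutations=1000, seed=0):
    """Empirical p-value of the observed error under random label permutations.

    error_fn(X, y) is a hypothetical callable returning a classification error.
    The p-value is the (add-one corrected) fraction of permuted datasets on
    which the error is at most the error observed on the original labels.
    """
    rng = random.Random(seed)
    observed = error_fn(X, y)
    at_least_as_good = 0
    for _ in range(n_permutations):
        permuted = list(y)
        rng.shuffle(permuted)  # breaks any real link between features and labels
        if error_fn(X, permuted) <= observed:
            at_least_as_good += 1
    return (at_least_as_good + 1) / (n_permutations + 1)


def sign_rule_error(X, y):
    """Toy stand-in for a classifier: error of the fixed rule 'x > 0 means class 1'."""
    wrong = sum((1 if x > 0 else 0) != label for x, label in zip(X, y))
    return wrong / len(y)
```

A small p-value suggests that the classifier's error is unlikely to arise when the labels carry no information about the features. The second, restricted test from the abstract would instead permute feature values within each class; it is not shown here.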
Anytime induction of low-cost, low-error classifiers: a sampling-based approach
, 2008
Cited by 4 (0 self)
Machine learning techniques are gaining prevalence in the production of a wide range of classifiers for complex real-world applications with nonuniform testing and misclassification costs. The increasing complexity of these applications poses a real challenge to resource management during learning and classification. In this work we introduce ACT (anytime cost-sensitive tree learner), a novel framework for operating in such complex environments. ACT is an anytime algorithm that allows learning time to be increased in return for lower classification costs. It builds a tree top-down and exploits additional time resources to obtain better estimations for the utility of the different candidate splits. Using sampling techniques, ACT approximates the cost of the subtree under each candidate split and favors the one with a minimal cost. As a stochastic algorithm, ACT is expected to be able to escape local minima, into which greedy methods may be trapped. Experiments with a variety of datasets were conducted to compare ACT to the state-of-the-art cost-sensitive tree learners. The results show that for the majority of domains ACT produces significantly less costly trees. ACT also exhibits good anytime behavior with diminishing returns.
Applying Machine Learning To Programming By Demonstration
, 2004
Cited by 1 (0 self)
'Familiar' is a tool that helps end-users automate iterative tasks in their applications by showing examples of what they want to do. It observes the user's actions, predicts what they will do next, and then offers to complete their task. Familiar learns in two ways. First, it creates a model, based on data gathered from training tasks, that selects the best prediction from among several candidates. Experiments show that decision trees outperform heuristic methods, and can be further improved by incrementally updating the classifier at task time. Second, it uses decision stumps inferred from analogous examples in the event trace to predict the parameters of conditional rules. Because data is sparse (for most users balk at giving more than a few training examples), permutation tests are used to calculate the statistical significance of each stump, successfully eliminating bias towards attributes with many different values.
Structure and Majority Classes in Decision Tree Learning
To provide good classification accuracy on unseen examples, a decision tree, learned by an algorithm such as ID3, must have sufficient structure and also identify the correct majority class in each of its leaves. If there are inadequacies in respect of either of these, the tree will have a percentage classification rate below that of the maximum possible for the domain, namely (100 - Bayes error rate). An error decomposition is introduced which enables the relative contributions of deficiencies in structure and in incorrect determination of majority class to be isolated and quantified. A sub-decomposition of majority class error permits separation of the sampling error at the leaves from the possible bias introduced by the attribute selection method of the induction algorithm. It is shown that sampling error can extend to 25% when there are more than two classes. Decompositions are obtained from experiments on several data sets. For ID3, the effect of selection bias is shown to vary from being statistically non-significant to being quite substantial, with the latter appearing to be associated with a simple underlying model.
Inferring and Revising Theories with Confidence: Analyzing the 1901 Canadian Census
, 2000
This paper shows how machine learning can help historians analyze and understand important social phenomena. Using data from the Canadian census of 1901, we discover the influences on bilingualism in Canada at the beginning of the last century. The discovered theories partly agree with, and partly complement, the existing views of historians on this question. Our approach, based around a decision tree, not only infers theories directly from data but also evaluates existing theories and revises them to improve their consistency with the data. One novel aspect of this work is the use of confidence intervals to determine which factors are both statistically and practically significant, and thus contribute appreciably to the overall accuracy of the theory. When inducing a decision tree directly from data, confidence intervals determine when new tests should be added. If an existing theory is being evaluated, confidence intervals also determine when old tests should be replaced or deleted to improve the theory. Our aim is to minimize the changes made to an existing theory to accommodate the new data. To this end, we propose a semantic measure of similarity between trees and demonstrate how this can be used to limit the changes made.
Is Error-Based Pruning Redeemable?
 World Scientific Publishing Company
, 2003
Error-based pruning can be used to prune a decision tree and it does not require the use of validation data. It is implemented in the widely used C4.5 decision tree software. It uses a parameter, the certainty factor, that affects the size of the pruned tree. Several researchers have compared error-based pruning with other approaches, and have shown results that suggest that error-based pruning results in larger trees that give no increase in accuracy. They further suggest that as more data is added to the training set, the tree size after applying error-based pruning continues to grow even though there is no increase in accuracy. It appears that these results were obtained with the default certainty factor value. Here, we show that varying the certainty factor allows significantly smaller trees to be obtained with minimal or no accuracy loss. Also, the growth of tree size with added data can be halted with an appropriate choice of certainty factor. Methods of determining the certainty factor are discussed for both small and large data sets. Experimental results support the conclusion that error-based pruning can be used to produce appropriately sized trees with good accuracy when compared with reduced error pruning.