Results 1–10 of 14
Best-first decision tree learning
University of Waikato, 2007
Abstract

Cited by 27 (0 self)
Decision trees are potentially powerful predictors and explicitly represent the structure of a dataset. Standard decision tree learners such as C4.5 expand nodes in depth-first order (Quinlan, 1993), while in best-first decision tree learners the "best" node is expanded first. The "best" node is the node whose split leads to the maximum reduction of impurity (e.g. Gini index or information in this thesis) among all nodes available for splitting. The resulting tree will be the same when fully grown; only the order in which it is built differs. In practice, some branches of a fully-expanded tree do not truly reflect the underlying information in the domain. This problem is known as overfitting and is mainly caused by noisy data. Pruning is necessary to avoid overfitting the training data; it discards those parts that are not predictive of future data. Best-first node expansion enables us to investigate new pruning techniques by determining the number of expansions performed based on cross-validation. This thesis first introduces the algorithm for building binary best-first decision trees for classification problems. Then, it investigates two new pruning methods that …
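As a rough illustration of the best-first expansion described above, the sketch below keeps a priority queue of expandable nodes ordered by Gini impurity reduction, so the "best" node is always expanded next. The helper names and the exhaustive binary-split search are illustrative assumptions, not the thesis's actual implementation:

```python
import heapq
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels):
    """Find the single binary split (feature, threshold) that most reduces
    Gini impurity; return (reduction, feature, threshold, left_idx,
    right_idx), or None if no split reduces impurity."""
    n, best = len(labels), None
    parent = gini(labels)
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [i for i in range(n) if rows[i][f] <= t]
            right = [i for i in range(n) if rows[i][f] > t]
            if not left or not right:
                continue
            child = (len(left) * gini([labels[i] for i in left]) +
                     len(right) * gini([labels[i] for i in right])) / n
            red = parent - child
            if best is None or red > best[0]:
                best = (red, f, t, left, right)
    return best if best and best[0] > 0 else None

def best_first_expand(rows, labels, max_expansions):
    """Expand nodes in order of decreasing impurity reduction; the cap on
    the number of expansions is what the thesis tunes by cross-validation."""
    heap, order, counter = [], [], 0
    root = best_split(rows, labels)
    if root:
        heapq.heappush(heap, (-root[0], counter, rows, labels, root))
    expansions = 0
    while heap and expansions < max_expansions:
        _, _, rws, lbs, (red, f, t, li, ri) = heapq.heappop(heap)
        order.append((f, t, red))
        expansions += 1
        for idx in (li, ri):
            sub_rows = [rws[i] for i in idx]
            sub_lbs = [lbs[i] for i in idx]
            s = best_split(sub_rows, sub_lbs)
            if s:
                counter += 1
                heapq.heappush(heap, (-s[0], counter, sub_rows, sub_lbs, s))
    return order
```

Because the queue is global across the whole frontier, stopping after k expansions prunes the tree by expansion count rather than by depth, which is the property the new pruning methods exploit.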
Permutation Tests for Studying Classifier Performance
Abstract

Cited by 19 (0 self)
We explore the framework of permutation-based p-values for assessing the behavior of the classification error. In this paper we study two simple permutation tests. The first test estimates the null distribution by permuting the labels in the data; this has been used extensively in classification problems in computational biology. The second test produces permutations of the features within classes, inspired by restricted randomization techniques traditionally used in statistics. We study the properties of these tests and present an extensive empirical evaluation on real and synthetic data. Our analysis shows that studying the classification error via permutation tests is effective; in particular, the restricted permutation test clearly reveals whether the classifier exploits the interdependency between the features in the data. Keywords: classification, labeled data, permutation tests, restricted randomization, significance testing.
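A minimal sketch of the first (label-permutation) test: train on shuffled labels many times and ask how often the shuffled error is at least as low as the observed one. The nearest-centroid classifier and the use of training error as the statistic are simplifying assumptions, not the paper's setup:

```python
import numpy as np

def nearest_centroid_error(X, y):
    """Training error of a nearest-centroid classifier."""
    classes = np.unique(y)
    centroids = np.array([X[y == c].mean(axis=0) for c in classes])
    d = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    pred = classes[d.argmin(axis=1)]
    return (pred != y).mean()

def label_permutation_pvalue(X, y, n_perm=200, seed=0):
    """Permutation p-value: fraction of label shufflings on which the
    classifier does at least as well as on the real labels."""
    rng = np.random.default_rng(seed)
    observed = nearest_centroid_error(X, y)
    count = 0
    for _ in range(n_perm):
        if nearest_centroid_error(X, rng.permutation(y)) <= observed:
            count += 1
    # add-one correction keeps the p-value strictly positive
    return (count + 1) / (n_perm + 1)
```

A small p-value indicates the classifier found genuine label structure; the paper's second test would instead permute features within each class to probe feature interdependence.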
An Analysis of Reduced Error Pruning
Journal of Artificial Intelligence Research, 2001
Abstract

Cited by 16 (4 self)
Top-down induction of decision trees has been observed to suffer from the inadequate functioning of the pruning phase. In particular, it is known that the size of the resulting tree grows linearly with the sample size, even though the accuracy of the tree does not improve. Reduced Error Pruning is an algorithm that has been used as a representative technique in attempts to explain the problems of decision tree learning. In this paper we present analyses of Reduced Error Pruning in three different settings. First we study the basic algorithmic properties of the method, properties that hold independent of the input decision tree and pruning examples. Then we examine a situation that intuitively should lead to the subtree under consideration being replaced by a leaf node: one in which the class labels and attribute values of the pruning examples are independent of each other. This analysis is conducted under two different assumptions. The general analysis shows that the probability of pruning a node that fits pure noise is bounded by a function that decreases exponentially as the size of the tree grows. In a specific analysis we assume that the examples are distributed uniformly across the tree. This assumption lets us approximate the number of subtrees that are pruned because they receive no pruning examples. This paper clarifies the different variants of the Reduced Error Pruning algorithm, brings new insight into its algorithmic properties, analyses the algorithm with fewer imposed assumptions than before, and includes the previously overlooked empty subtrees in the analysis.
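A toy version of Reduced Error Pruning under its common bottom-up formulation: replace a subtree with its majority-class leaf whenever that does not increase error on the pruning set routed to it. The tuple-based tree encoding is an assumption for illustration; note that a subtree receiving no pruning examples is pruned here, which is exactly the "empty subtrees" case the paper brings into the analysis:

```python
# Tree encoding (an illustrative assumption):
#   leaf:     ('leaf', predicted_class)
#   internal: ('node', feature, threshold, left_subtree, right_subtree)

def predict(tree, x):
    """Route example x down the tree to a leaf's class."""
    while tree[0] == 'node':
        _, f, t, left, right = tree
        tree = left if x[f] <= t else right
    return tree[1]

def majority(labels, default):
    return max(set(labels), key=labels.count) if labels else default

def reduced_error_prune(tree, rows, labels, default=0):
    """Bottom-up REP: prune children first, then replace this node with a
    majority-class leaf if that does not hurt pruning-set accuracy."""
    if tree[0] == 'leaf':
        return tree
    _, f, t, left, right = tree
    li = [i for i in range(len(rows)) if rows[i][f] <= t]
    ri = [i for i in range(len(rows)) if rows[i][f] > t]
    left = reduced_error_prune(left, [rows[i] for i in li],
                               [labels[i] for i in li], default)
    right = reduced_error_prune(right, [rows[i] for i in ri],
                                [labels[i] for i in ri], default)
    node = ('node', f, t, left, right)
    node_err = sum(predict(node, rows[i]) != labels[i]
                   for i in range(len(rows)))
    leaf = ('leaf', majority(labels, default))
    leaf_err = sum(leaf[1] != labels[i] for i in range(len(rows)))
    # an empty pruning set gives leaf_err == node_err == 0, so we prune
    return leaf if leaf_err <= node_err else node
```

The `leaf_err <= node_err` tie-break (prune on equality) is one of the algorithm variants the paper distinguishes.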
Anytime induction of low-cost, low-error classifiers: a sampling-based approach
, 2008
Abstract

Cited by 4 (0 self)
Machine learning techniques are gaining prevalence in the production of a wide range of classifiers for complex real-world applications with non-uniform testing and misclassification costs. The increasing complexity of these applications poses a real challenge to resource management during learning and classification. In this work we introduce ACT (anytime cost-sensitive tree learner), a novel framework for operating in such complex environments. ACT is an anytime algorithm that allows learning time to be increased in return for lower classification costs. It builds a tree top-down and exploits additional time resources to obtain better estimations of the utility of the different candidate splits. Using sampling techniques, ACT approximates the cost of the subtree under each candidate split and favors the one with minimal cost. As a stochastic algorithm, ACT is expected to be able to escape the local minima in which greedy methods may become trapped. Experiments with a variety of datasets were conducted to compare ACT to state-of-the-art cost-sensitive tree learners. The results show that for the majority of domains ACT produces significantly less costly trees. ACT also exhibits good anytime behavior with diminishing returns.
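ACT samples full random subtrees under each candidate split; the sketch below caps the lookahead at a single randomly chosen extra attribute, a drastic simplification kept only to show the shape of the idea (sample completions, keep the cheapest, favor the candidate with minimal sampled cost). The function names and the test-plus-misclassification cost model are assumptions, not ACT's actual estimator:

```python
import random

def leaf_errors(labels):
    """Errors of a majority-class leaf on its labels."""
    if not labels:
        return 0
    maj = max(set(labels), key=labels.count)
    return sum(1 for l in labels if l != maj)

def two_level_cost(rows, labels, f, g, feature_costs, mc_cost):
    """Total cost of splitting on attribute f, then g in every branch:
    every example pays both test costs, plus mc_cost per residual error."""
    cost = (feature_costs[f] + feature_costs[g]) * len(rows)
    for vf in {r[f] for r in rows}:
        for vg in {r[g] for r in rows}:
            branch = [labels[i] for i in range(len(rows))
                      if rows[i][f] == vf and rows[i][g] == vg]
            cost += mc_cost * leaf_errors(branch)
    return cost

def choose_split(rows, labels, feature_costs, mc_cost,
                 n_samples=10, seed=0):
    """Sample random one-attribute completions under each candidate split,
    keep the cheapest sample, and pick the candidate with minimal cost."""
    rng = random.Random(seed)
    n_feat = len(rows[0])
    best_f, best_cost = None, None
    for f in range(n_feat):
        others = [g for g in range(n_feat) if g != f]
        est = min(two_level_cost(rows, labels, f, rng.choice(others),
                                 feature_costs, mc_cost)
                  for _ in range(n_samples))
        if best_cost is None or est < best_cost:
            best_f, best_cost = f, est
    return best_f
```

More time buys more samples per candidate, which is the anytime trade-off: the estimate of each candidate's subtree cost improves as `n_samples` grows.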
Applying Machine Learning To Programming By Demonstration
, 2004
Abstract

Cited by 1 (0 self)
Familiar is a tool that helps end-users automate iterative tasks in their applications by showing examples of what they want to do. It observes the user's actions, predicts what they will do next, and then offers to complete their task. Familiar learns in two ways. First, it creates a model, based on data gathered from training tasks, that selects the best prediction from among several candidates. Experiments show that decision trees outperform heuristic methods, and can be further improved by incrementally updating the classifier at task time. Second, it uses decision stumps inferred from analogous examples in the event trace to predict the parameters of conditional rules. Because data is sparse (most users balk at giving more than a few training examples), permutation tests are used to calculate the statistical significance of each stump, successfully eliminating bias towards attributes with many different values.
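A hedged sketch of the stump-plus-permutation-test idea: a many-valued attribute (for example, a unique identifier) scores perfectly even on shuffled labels, so its stump comes out non-significant, which is how the bias toward many-valued attributes is eliminated. These functions are illustrative, not Familiar's code:

```python
import random
from collections import Counter

def stump_accuracy(feature_vals, labels):
    """Accuracy of a one-attribute stump that predicts each attribute
    value's majority class."""
    correct = 0
    for v in set(feature_vals):
        branch = [labels[i] for i in range(len(labels))
                  if feature_vals[i] == v]
        correct += Counter(branch).most_common(1)[0][1]
    return correct / len(labels)

def stump_significance(feature_vals, labels, n_perm=500, seed=0):
    """Permutation p-value for a stump: probability that shuffled labels
    yield accuracy at least as high as the observed one."""
    rng = random.Random(seed)
    observed = stump_accuracy(feature_vals, labels)
    hits = sum(1 for _ in range(n_perm)
               if stump_accuracy(feature_vals,
                                 rng.sample(labels, len(labels)))
               >= observed)
    return (hits + 1) / (n_perm + 1)
```

A genuinely predictive attribute yields a small p-value; an identifier-like attribute yields a p-value near 1 because every permutation matches its perfect score.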
Tree Pruning With Subadditive Penalties
Abstract
In this paper we study the problem of pruning a binary tree by minimizing, over all pruned subtrees of the given tree, an objective function that combines an additive cost term with a penalty term that depends only on tree size. We present algorithms for general size-based penalties, although our focus is on subadditive penalties (roughly, penalties that grow more slowly than linear penalties with increasing tree size). Such penalties are motivated by recent results in statistical learning theory for decision trees, but may have wider application as well. We show that the family of pruned subtrees induced by a subadditive penalty is a subset of the family induced by an additive penalty. This implies (by known results about additive penalties) that the family induced by a subadditive penalty 1) is nested; 2) is unique; and 3) can be computed efficiently. It also implies that, when a single tree is to be selected by cross-validation from the family of prunings, subadditive penalties will never present a richer set of options than an additive penalty. Index Terms: Decision trees, nonadditive penalties, subadditive penalties, tree pruning.
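For contrast with the subadditive case, the additive penalty alpha × (number of leaves) admits a simple bottom-up dynamic program; this is the classical cost-complexity recursion, not the paper's algorithm, and the tuple tree encoding with a per-node "cost if pruned to a leaf" field is an assumption:

```python
# Tree encoding (an illustrative assumption):
#   leaf:     ('leaf', cost)
#   internal: ('node', cost_if_pruned_here, left_subtree, right_subtree)

def prune_additive(tree, alpha):
    """Return (optimal objective, optimally pruned tree) for the additive
    objective: sum of leaf costs + alpha * (number of leaves). For
    subadditive penalties this local, bottom-up recursion is no longer
    exact, which is why a different algorithm is needed there."""
    if tree[0] == 'leaf':
        return tree[1] + alpha, tree
    _, cost, left, right = tree
    l_obj, l_tree = prune_additive(left, alpha)
    r_obj, r_tree = prune_additive(right, alpha)
    keep_obj = l_obj + r_obj
    prune_obj = cost + alpha  # collapse this node to a single leaf
    if prune_obj <= keep_obj:
        return prune_obj, ('leaf', cost)
    return keep_obj, ('node', cost, l_tree, r_tree)
```

Sweeping alpha upward produces the nested family of prunings mentioned in the abstract; the paper's subset result says a subadditive penalty can only ever select trees from this same family.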
Structure and Majority Classes in Decision Tree Learning
Abstract
To provide good classification accuracy on unseen examples, a decision tree, learned by an algorithm such as ID3, must have sufficient structure and also identify the correct majority class in each of its leaves. If there are inadequacies in either of these respects, the tree will have a percentage classification rate below the maximum possible for the domain, namely (100 - Bayes error rate)%. An error decomposition is introduced which enables the relative contributions of deficiencies in structure and of incorrect determination of the majority class to be isolated and quantified. A sub-decomposition of majority-class error permits separation of the sampling error at the leaves from the possible bias introduced by the attribute selection method of the induction algorithm. It is shown that sampling error can extend to 25% when there are more than two classes. Decompositions are obtained from experiments on several data sets. For ID3, the effect of selection bias is shown to vary from being statistically non-significant to being quite substantial, with the latter appearing to be associated with a simple underlying model.
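A small sketch of the kind of decomposition the abstract describes, assuming each leaf is summarized by its probability mass, its true class distribution, and its predicted class. Per leaf, total error splits exactly into a Bayes floor plus a majority-class excess (the leaf predicting something other than its locally most probable class); this is an illustration only, since the paper's decomposition further separates structure error and sampling error:

```python
def error_decomposition(leaves):
    """leaves: list of (weight, class_probs, predicted_class_index).
    Returns (total error, Bayes floor, majority-class excess), where
    total = bayes + majority holds term by term:
      w*(1 - p[pred]) = w*(1 - max(p)) + w*(max(p) - p[pred])."""
    bayes = sum(w * (1 - max(p)) for w, p, _ in leaves)
    majority = sum(w * (max(p) - p[pred]) for w, p, pred in leaves)
    total = sum(w * (1 - p[pred]) for w, p, pred in leaves)
    return total, bayes, majority
```

In the second leaf below, class 1 is locally more probable but class 0 is predicted, so the 0.2 excess above the Bayes floor is attributable entirely to wrong majority-class determination.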
Decision Tree Induction using Adaptive FSA
Abstract
This paper introduces a new algorithm for the induction of decision trees, based on adaptive techniques. One of the main features of this algorithm is the application of automata theory to formalize the problem of decision tree induction, together with the use of a hybrid approach that integrates both syntactical and statistical strategies. Some experimental results are also presented, indicating that the adaptive approach is useful in the construction of efficient learning algorithms.
Keywords: Machine Learning, Decision Tree Induction, Adaptive Automata.