Regression Trees With Unbiased Variable Selection and Interaction Detection
- Statistica Sinica, 2002
Cited by 40 (9 self)
We propose an algorithm for regression tree construction called GUIDE. It is specifically designed to eliminate variable selection bias, a problem that can undermine the reliability of inferences from a tree structure. GUIDE controls bias by employing chi-square analysis of residuals and bootstrap calibration of significance probabilities. This approach allows fast computation speed, natural extension to data sets with categorical variables, and direct detection of local two-variable interactions. Previous algorithms are not unbiased and are insensitive to local interactions during split selection. The speed of GUIDE enables two further enhancements: complex modeling at the terminal nodes, such as polynomial or best simple linear models, and bagging. In an experiment with real data sets, the prediction mean square error of the piecewise constant GUIDE model is within ±20% of that of CART®. Piecewise linear GUIDE models are more accurate; with bagging they can outperform the spline-based MARS® method.
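As a rough illustration of the chi-square analysis of residuals mentioned in this abstract, the sketch below fits a constant (the mean), records the sign of each residual, and cross-tabulates those signs against quartile groups of each numeric predictor. It is not the GUIDE implementation: the function names, the quartile binning, and the direct comparison of raw statistics are assumptions made for illustration. Comparing raw statistics is only meaningful here because every table has the same shape; the bootstrap calibration of significance probabilities is omitted entirely.

```python
from statistics import mean

def quartile_bins(values):
    """Assign each value to one of four groups using sample quartile cutpoints."""
    ranked = sorted(values)
    n = len(ranked)
    cuts = [ranked[n // 4], ranked[n // 2], ranked[(3 * n) // 4]]
    # The group index is the number of cutpoints the value exceeds (0 to 3).
    return [sum(v > c for c in cuts) for v in values]

def chi_square_stat(rows, cols, n_rows, n_cols):
    """Pearson chi-square statistic for a two-way contingency table."""
    table = [[0] * n_cols for _ in range(n_rows)]
    for r, c in zip(rows, cols):
        table[r][c] += 1
    total = len(rows)
    row_sums = [sum(row) for row in table]
    col_sums = [sum(table[r][c] for r in range(n_rows)) for c in range(n_cols)]
    stat = 0.0
    for r in range(n_rows):
        for c in range(n_cols):
            expected = row_sums[r] * col_sums[c] / total
            if expected > 0:
                stat += (table[r][c] - expected) ** 2 / expected
    return stat

def select_split_variable(X, y):
    """Return (index, statistics): the predictor most associated with residual signs."""
    y_bar = mean(y)
    signs = [1 if yi > y_bar else 0 for yi in y]  # sign of residual from the mean
    stats = []
    for j in range(len(X[0])):
        column = [row[j] for row in X]
        stats.append(chi_square_stat(signs, quartile_bins(column), 2, 4))
    best = max(range(len(stats)), key=stats.__getitem__)
    return best, stats
```

Because the variable is chosen from a fixed-size table rather than from a search over all split points, a predictor with many distinct values gains no advantage at this stage.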
Unbiased recursive partitioning: A conditional inference framework
- Journal of Computational and Graphical Statistics, 2006
Cited by 25 (3 self)
Recursive binary partitioning is a popular tool for regression analysis. Two fundamental problems of exhaustive search procedures usually applied to fit such models have been known for a long time: overfitting and a selection bias towards covariates with many possible splits or missing values. While pruning procedures are able to solve the overfitting problem, the variable selection bias still seriously affects the interpretability of tree-structured regression models. For some special cases unbiased procedures have been suggested, but they lack a common theoretical foundation. We propose a unified framework for recursive partitioning which embeds tree-structured regression models into a well-defined theory of conditional inference procedures. Stopping criteria based on multiple test procedures are implemented and it is shown that the predictive performance of the resulting trees is as good as the performance of established exhaustive search procedures. It turns out that the partitions and therefore the models induced by both approaches are structurally different, confirming the need for an unbiased variable selection. Moreover, it is shown that the prediction accuracy of trees with early stopping is equivalent to the prediction accuracy of pruned trees with unbiased variable selection. The methodology presented here is applicable to all kinds of regression problems, including nominal, ordinal, numeric, censored as well as multivariate response variables and arbitrary measurement scales of the covariates. Data from studies on glaucoma classification, node positive breast cancer survival and mammography experience are re-analyzed.
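The idea of separating variable selection from split search, and of stopping via a multiple-testing criterion, can be sketched as follows. This is only an illustration under stated assumptions: it uses Monte Carlo permutations of a simple centered linear statistic and a Bonferroni adjustment, whereas the abstract does not specify these details, and all function names and default values are hypothetical.

```python
import random

def centered(values):
    """Subtract the sample mean from every value."""
    m = sum(values) / len(values)
    return [v - m for v in values]

def association(x, y):
    """Absolute value of the centered linear statistic sum((x - mean_x) * (y - mean_y))."""
    return abs(sum(a * b for a, b in zip(centered(x), centered(y))))

def permutation_p_value(x, y, n_perm, rng):
    """Monte Carlo p-value for independence of x and y, permuting the response."""
    observed = association(x, y)
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if association(x, shuffled) >= observed:
            hits += 1
    # The observed arrangement counts as one permutation, so p is never zero.
    return (hits + 1) / (n_perm + 1)

def choose_covariate(columns, y, alpha=0.05, n_perm=199, seed=0):
    """Return (index, adjusted_p); index is None when no covariate is significant."""
    rng = random.Random(seed)
    p_values = [permutation_p_value(col, y, n_perm, rng) for col in columns]
    best = min(range(len(columns)), key=p_values.__getitem__)
    adjusted = min(1.0, p_values[best] * len(columns))  # Bonferroni adjustment
    return (best if adjusted <= alpha else None), adjusted
```

Returning `None` plays the role of the stopping criterion: the node is not split when no covariate shows a significant association with the response. Only after a covariate has been chosen would a split point be searched for within it.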
Unbiased split selection for classification trees based on the Gini Index
- 2006
Cited by 9 (3 self)
The Gini gain is one of the most common variable selection criteria in machine learning. We derive the exact distribution of the maximally selected Gini gain in the context of binary classification using continuous predictors by means of a combinatorial approach. This distribution provides formal support for the variable selection bias in favor of variables with many missing values when the Gini gain is used as the split selection criterion, and we suggest using the resulting p-value as an unbiased split selection criterion in recursive partitioning algorithms. We demonstrate the efficiency of our novel method in simulation studies and in real-data studies from veterinary gynecology, in the context of binary classification and continuous predictor variables with different numbers of missing values. Our method can be extended to categorical and ordinal predictor variables and to other split selection criteria such as the cross-entropy criterion.
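For readers unfamiliar with the quantity whose distribution is derived here, the following sketch computes the maximally selected Gini gain for one continuous predictor and a binary response. It does not compute the exact distribution or the p-value proposed in the paper. The function names are hypothetical, and dropping observations with a missing predictor value (`None`) is an assumed convention.

```python
def gini(labels):
    """Gini impurity 2 * p * (1 - p) for a list of 0/1 class labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)

def max_gini_gain(x, y):
    """Largest Gini gain over all binary splits x <= c, and the cutpoint c.

    Observations whose predictor value is None are dropped before the search.
    """
    pairs = sorted((xi, yi) for xi, yi in zip(x, y) if xi is not None)
    labels = [yi for _, yi in pairs]
    n = len(labels)
    parent = gini(labels)
    best_gain, best_cut = 0.0, None
    for k in range(1, n):
        if pairs[k - 1][0] == pairs[k][0]:
            continue  # no cutpoint exists between tied predictor values
        left, right = labels[:k], labels[k:]
        gain = parent - (k / n) * gini(left) - ((n - k) / n) * gini(right)
        if gain > best_gain:
            best_gain = gain
            best_cut = (pairs[k - 1][0] + pairs[k][0]) / 2
    return best_gain, best_cut
```

Because the gain is maximized over every available cutpoint, its null distribution depends on how many cutpoints and observations a predictor contributes, which is why raw gains are not directly comparable across predictors.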
Anytime learning of decision trees
- Journal of Machine Learning Research
Cited by 6 (3 self)
The majority of existing algorithms for learning decision trees are greedy: a tree is induced top-down, making locally optimal decisions at each node. In most cases, however, the constructed tree is not globally optimal. Even the few non-greedy learners cannot learn good trees when the concept is difficult. Furthermore, they require a fixed amount of time and are not able to generate a better tree if additional time is available. We introduce a framework for anytime induction of decision trees that overcomes these problems by trading computation speed for better tree quality. Our proposed family of algorithms employs a novel strategy for evaluating candidate splits. A biased sampling of the space of consistent trees rooted at an attribute is used to estimate the size of the minimal tree under that attribute, and an attribute with the smallest expected tree is selected. We present two types of anytime induction algorithms: a contract algorithm that determines the sample size on the basis of a pre-given allocation of time, and an interruptible algorithm that starts with a greedy tree and continuously improves subtrees by additional sampling. Experimental results indicate that, for several hard concepts, our proposed approach exhibits good anytime behavior and yields significantly better decision trees when more time is available.
A note on split selection bias in classification trees
- Computational Statistics and Data Analysis, 2004
Cited by 4 (0 self)
A common approach to split selection in classification trees is to search through all possible splits generated by predictor variables. A splitting criterion is then used to evaluate those splits and the one with the largest criterion value is usually chosen to actually channel samples into corresponding subnodes. However, this greedy method is biased in variable selection when the numbers of the available split points for each variable are different. Such a result may thus hamper the intuitively appealing nature of classification trees. The problem of the split selection bias for two-class tasks with numerical predictors is examined. The statistical explanation of its existence is given and a solution based on P-values is provided for the case where the Pearson chi-square statistic is used as the splitting criterion.
Keywords: Cramér's V² statistic; Kolmogorov-Smirnov statistic; P-value; Pearson chi-square statistic
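The bias described in this abstract can be seen in a small null simulation. The sketch below is an illustration only, not the paper's experiment: sample size, replication count, seed, and function names are all assumptions. It generates a binary class label and two predictors that are both unrelated to it, one binary (a single split point) and one continuous (up to n - 1 split points), then records how often the continuous predictor attains the larger maximal Pearson chi-square statistic.

```python
import random

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom

def max_chi_square(x, y):
    """Maximum chi-square statistic over all splits x <= c for 0/1 labels y."""
    pairs = sorted(zip(x, y))
    total_pos = sum(y)
    n = len(y)
    best = 0.0
    left_pos = 0
    for k in range(1, n):
        left_pos += pairs[k - 1][1]
        if pairs[k - 1][0] == pairs[k][0]:
            continue  # tied values offer no split point here
        a, b = left_pos, k - left_pos  # left node: class 1 count, class 0 count
        c = total_pos - left_pos       # right node: class 1 count
        d = (n - k) - c                # right node: class 0 count
        best = max(best, chi_square_2x2(a, b, c, d))
    return best

def null_selection_rate(n=60, reps=300, seed=1):
    """Share of replications in which the continuous predictor beats the binary
    one, although neither predictor is related to the class label."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(reps):
        y = [rng.randint(0, 1) for _ in range(n)]
        binary = [rng.randint(0, 1) for _ in range(n)]
        continuous = [rng.random() for _ in range(n)]
        if max_chi_square(continuous, y) > max_chi_square(binary, y):
            wins += 1
    return wins / reps
```

An unbiased selection rule would pick each uninformative predictor about half the time. A rate well above one half for the continuous predictor reflects the multiple-comparison effect of maximizing over many split points, which is the motivation for comparing P-values instead of raw criterion values.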
Statistical sources of variable selection bias in classification tree algorithms based on the Gini Index
- 2005
Cited by 3 (1 self)
Evidence for variable selection bias in classification tree algorithms based on the Gini Index is reviewed from the literature and embedded into a broader explanatory scheme: Variable selection bias in classification tree algorithms based on the Gini Index can be caused not only by the statistical effect of multiple comparisons, but also by an increasing estimation bias and variance of the splitting criterion when plug-in estimates of entropy measures like the Gini Index are employed. The relevance of these sources of variable selection bias in the different simulation study designs is examined. Variable selection bias due to the explored sources applies to all classification tree algorithms based on empirical entropy measures like the Gini Index, Deviance and Information Gain, and to both binary and multiway splitting algorithms.
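The estimation bias of a plug-in Gini estimate can be checked with a standard binomial calculation, which is not taken from the paper itself. For a node with n observations and true class probability p, the plug-in estimate 2 * p_hat * (1 - p_hat) has expectation ((n - 1) / n) * 2 * p * (1 - p), so impurity is underestimated, and more strongly in small nodes. The sketch below computes the expectation exactly by summing over the binomial distribution; the function names are illustrative.

```python
from math import comb

def gini_index(p):
    """Gini index 2 * p * (1 - p) for a two-class problem."""
    return 2.0 * p * (1.0 - p)

def expected_plugin_gini(n, p):
    """Exact expectation of the plug-in estimate 2 * p_hat * (1 - p_hat),
    where p_hat = K / n and K follows a Binomial(n, p) distribution."""
    return sum(
        comb(n, k) * p**k * (1.0 - p) ** (n - k) * gini_index(k / n)
        for k in range(n + 1)
    )

def plugin_bias(n, p):
    """Expected plug-in estimate minus the true Gini index (never positive)."""
    return expected_plugin_gini(n, p) - gini_index(p)
```

Since the size of this bias depends on the node size, child nodes produced by different candidate splits are not assessed on an equal footing, which is one way an estimation effect can feed into variable selection.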
Combining methods in supervised classification: a comparative study on discrete and continuous problems
- REVSTAT
Cited by 1 (0 self)
Often in discriminant analysis several models are estimated but, based on some validation criterion, a single model is selected. To take advantage of several candidate models, this article considers classification rules that combine models. More precisely, two ways of combining models are considered: a serial combining method and a hierarchical combining method. Serial combining is a convex linear combination of a finite number of models. Hierarchical combining leads to nested models structured in a binary tree. In this paper, several combining methods arising from both points of view are presented and their performances are assessed on discrete and continuous classification problems.
Key-Words: Gaussian classification; eigenvalue decomposition; multinomial classification; conditional independence model; convex combining; hierarchical combining.
Model Selection in Omnivariate Decision Trees
Cited by 1 (1 self)
We propose an omnivariate decision tree architecture which contains univariate, multivariate linear or nonlinear nodes, matching the complexity of the node to the complexity of the data reaching that node. We compare the use of different model selection techniques including AIC, BIC, and CV to choose between the three types of nodes on standard datasets from the UCI repository and see that such omnivariate trees with a small percentage of multivariate nodes close to the root generalize better than pure trees with the same type of node everywhere. CV produces simpler trees than AIC and BIC without sacrificing expected error. The only disadvantage of CV is its longer training time.
Growing and Visualizing Prediction Paths Trees in Market Basket Analysis
- 2002
This paper provides a new approach to Market Basket Analysis that takes the monetary value of each choice in the transaction as the unit of measure. Furthermore, instead of association rules, suitable prediction rules are defined in order to consider the causal links between different items. Two methodologies are presented for growing predictive paths through oriented graphs (either trees or neural networks), which facilitate the visual comparison among different basket typologies.

