Results 1 - 5 of 5
Split Selection Methods for Classification Trees
- Statistica Sinica, 1997
Cited by 53 (7 self)
Classification trees based on exhaustive search algorithms tend to be biased towards selecting variables that afford more splits. As a result, such trees should be interpreted with caution. This article presents an algorithm called QUEST that has negligible bias. Its split selection strategy shares similarities with the FACT method, but it yields binary splits and the final tree can be selected by a direct stopping rule or by pruning. Real and simulated data are used to compare QUEST with the exhaustive search approach. QUEST is shown to be substantially faster and the size and classification accuracy of its trees are typically comparable to those of exhaustive search.
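The selection bias described in this abstract can be illustrated with a small simulation. The sketch below is not QUEST and is not taken from the paper; it is a minimal exhaustive search using Gini impurity, and the function names, sample size, and trial count are all assumptions chosen for illustration. Both predictors are pure noise, so an unbiased method would pick each about half the time. Because the continuous predictor offers up to n - 1 candidate splits and the binary predictor offers only one, the exhaustive search is expected to choose the continuous predictor far more often.

```python
import random


def gini(labels):
    """Gini impurity of a list of 0/1 class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2.0 * p * (1.0 - p)


def best_gain(x, y):
    """Largest Gini impurity decrease over all binary splits of the form x <= c."""
    n = len(y)
    parent = gini(y)
    pairs = sorted(zip(x, y))
    best = 0.0
    for i in range(1, n):
        # A split is only possible between two distinct x values.
        if pairs[i - 1][0] == pairs[i][0]:
            continue
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        children = (len(left) * gini(left) + len(right) * gini(right)) / n
        best = max(best, parent - children)
    return best


def fraction_continuous_chosen(trials=300, n=60, seed=0):
    """Fraction of trials in which exhaustive search prefers the continuous
    noise predictor over the binary noise predictor (hypothetical setup)."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        y = [rng.randint(0, 1) for _ in range(n)]
        x_binary = [rng.randint(0, 1) for _ in range(n)]  # one candidate split
        x_cont = [rng.random() for _ in range(n)]         # up to n - 1 splits
        if best_gain(x_cont, y) > best_gain(x_binary, y):
            wins += 1
    return wins / trials
```

Calling `fraction_continuous_chosen()` returns the observed selection frequency for the continuous predictor; a value well above 0.5 reflects the bias toward variables that afford more splits.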
Regression Trees With Unbiased Variable Selection and Interaction Detection
- Statistica Sinica, 2002
Cited by 40 (9 self)
We propose an algorithm for regression tree construction called GUIDE. It is specifically designed to eliminate variable selection bias, a problem that can undermine the reliability of inferences from a tree structure. GUIDE controls bias by employing chi-square analysis of residuals and bootstrap calibration of significance probabilities. This approach allows fast computation speed, natural extension to data sets with categorical variables, and direct detection of local two-variable interactions. Previous algorithms are not unbiased and are insensitive to local interactions during split selection. The speed of GUIDE enables two further enhancements: complex modeling at the terminal nodes, such as polynomial or best simple linear models, and bagging. In an experiment with real data sets, the prediction mean square error of the piecewise constant GUIDE model is within ±20% of that of CART®. Piecewise linear GUIDE models are more accurate; with bagging they can outperform the spline-based MARS® method.
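The idea of choosing a split variable by a chi-square analysis of residuals can be sketched as follows. This is a simplified illustration, not the GUIDE implementation: it assumes a constant (mean) fit at the node, groups each numeric predictor into rank quartiles, cross-tabulates residual signs against those groups, and picks the predictor with the largest Pearson chi-square statistic. Comparing raw statistics is only reasonable here because every table has the same dimensions; the bootstrap calibration and the interaction tests mentioned in the abstract are omitted. All function names are hypothetical.

```python
def chi_square_stat(table):
    """Pearson chi-square statistic for a contingency table (list of rows)."""
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    n = sum(row_tot)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n
            if expected > 0:
                stat += (observed - expected) ** 2 / expected
    return stat


def quartile_group(x):
    """Assign each value a group 0..3 according to its rank quartile."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    groups = [0] * len(x)
    for rank, i in enumerate(order):
        groups[i] = min(3, 4 * rank // len(x))
    return groups


def residual_sign_stat(x, y):
    """Chi-square statistic of residual sign (from a mean fit) versus quartile of x."""
    mean_y = sum(y) / len(y)
    table = [[0] * 4 for _ in range(2)]  # rows: residual <= 0, residual > 0
    for group, yi in zip(quartile_group(x), y):
        table[1 if yi - mean_y > 0 else 0][group] += 1
    return chi_square_stat(table)


def select_split_variable(columns, y):
    """Return the predictor name with the largest statistic, plus all statistics."""
    stats = {name: residual_sign_stat(x, y) for name, x in columns.items()}
    return max(stats, key=stats.get), stats
```

Because each predictor contributes a single test regardless of how many distinct values it has, the selection step does not automatically favor variables with more possible split points. The split point itself would be chosen in a separate step.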
Piecewise-polynomial regression trees
- Statistica Sinica, 1994
Cited by 23 (4 self)
A nonparametric function estimation method called SUPPORT (“Smoothed and Unsmoothed Piecewise-Polynomial Regression Trees”) is described. The estimate is typically made up of several pieces, each piece being obtained by fitting a polynomial regression to the observations in a subregion of the data space. Partitioning is carried out recursively as in a tree-structured method. If the estimate is required to be smooth, the polynomial pieces may be glued together by means of weighted averaging. The smoothed estimate is thus obtained in three steps. In the first step, the regressor space is recursively partitioned until the data in each piece are adequately fitted by a polynomial of a fixed order. Partitioning is guided by analysis of the distributions of residuals and cross-validation estimates of prediction mean square error. In the second step, the data within a neighborhood of each partition are fitted by a polynomial. The final estimate of the regression function is obtained by averaging the polynomial pieces, using smooth weight functions each of which diminishes rapidly to zero outside its associated partition. Estimates of derivatives of the regression function may be
Tree-Structured Logistic Model for Over-Dispersed Binomial Data with Application to Modeling Developmental Effects
- Biometrics, 1997
Cited by 3 (1 self)
This article proposes tree-structured logistic regression modeling for over-dispersed binomial data. Recursive partitioning is performed using a combination of statistical tests and residual analysis. The splitting criterion in cross-validation is based on the deviance function. A nested grid algorithm to estimate the bootstrap parameters is developed. The regression tree procedure provides a new approach to explore the relationship between the binomial response and explanatory variables in detail. The proposed procedure is applied to model the relationship between the incidence of malformation, and dose and fetal weight using data from a developmental experiment conducted at the National Center for Toxicological Research. A conditional Gaussian chain model is used to account for the effect of fetal weight by dose.
1 Introduction
Recently, tree-based methods have been developed by many researchers. The tree-structured approaches are used for classification (Breiman et al., 19...
Clinical
Survival of patients with nonseminomatous germ cell cancer: a review of the IGCC classification by Cox regression and recursive partitioning

