Split Selection Methods for Classification Trees
Statistica Sinica, 1997
Cited by 75 (9 self)
Classification trees based on exhaustive search algorithms tend to be biased towards selecting variables that afford more splits. As a result, such trees should be interpreted with caution. This article presents an algorithm called QUEST that has negligible bias. Its split selection strategy shares similarities with the FACT method, but it yields binary splits and the final tree can be selected by a direct stopping rule or by pruning. Real and simulated data are used to compare QUEST with the exhaustive search approach. QUEST is shown to be substantially faster and the size and classification accuracy of its trees are typically comparable to those of exhaustive search.
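The selection bias described in this abstract can be illustrated with a small simulation. The sketch below is not QUEST and is not taken from the paper; it only assumes a Gini-based exhaustive search over splits of the form x <= c, a binary class label that is independent of both predictors, and hypothetical helper names (`gini`, `best_gain`, `simulate`). Because neither predictor carries any signal, an unbiased method would pick each about equally often.

```python
import random


def gini(labels):
    """Gini impurity of a list of 0/1 class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2.0 * p * (1.0 - p)


def best_gain(x, y):
    """Largest impurity decrease over all splits of the form x <= c."""
    n = len(y)
    parent = gini(y)
    best = 0.0
    # Every distinct value except the largest is a candidate cut point.
    for c in sorted(set(x))[:-1]:
        left = [yi for xi, yi in zip(x, y) if xi <= c]
        right = [yi for xi, yi in zip(x, y) if xi > c]
        child = (len(left) * gini(left) + len(right) * gini(right)) / n
        best = max(best, parent - child)
    return best


def simulate(reps=200, n=40, seed=0):
    """Count how often each pure-noise predictor wins the exhaustive search."""
    rng = random.Random(seed)
    wins = {"binary": 0, "continuous": 0}
    for _ in range(reps):
        y = [rng.randint(0, 1) for _ in range(n)]       # labels, no signal
        x_bin = [rng.randint(0, 1) for _ in range(n)]   # one possible split
        x_cont = [rng.random() for _ in range(n)]       # up to n - 1 splits
        if best_gain(x_cont, y) > best_gain(x_bin, y):
            wins["continuous"] += 1
        else:
            wins["binary"] += 1
    return wins


wins = simulate()
print(wins)
```

Under these assumptions the continuous predictor should win most of the repetitions simply because it offers more candidate splits, which is the effect the abstract warns about.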
Regression Trees With Unbiased Variable Selection and Interaction Detection
Statistica Sinica, 2002
Cited by 54 (14 self)
We propose an algorithm for regression tree construction called GUIDE. It is specifically designed to eliminate variable selection bias, a problem that can undermine the reliability of inferences from a tree structure. GUIDE controls bias by employing chi-square analysis of residuals and bootstrap calibration of significance probabilities. This approach allows fast computation speed, natural extension to data sets with categorical variables, and direct detection of local two-variable interactions. Previous algorithms are not unbiased and are insensitive to local interactions during split selection. The speed of GUIDE enables two further enhancements: complex modeling at the terminal nodes, such as polynomial or best simple linear models, and bagging. In an experiment with real data sets, the prediction mean square error of the piecewise constant GUIDE model is within ±20% of that of CART. Piecewise linear GUIDE models are more accurate; with bagging they can outperform the spline-based MARS method.
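A rough sketch of what a chi-square analysis of residuals might look like for a piecewise constant fit is given below. It is an illustration under stated assumptions, not the GUIDE implementation: the node model is the sample mean, residual signs are cross-tabulated against four rank-based groups of each numeric predictor, and the predictor with the largest Pearson statistic is chosen. The bootstrap calibration mentioned in the abstract is omitted, and comparing raw statistics is only reasonable here because every table has the same shape. All function names are hypothetical.

```python
import random


def chi_square(table):
    """Pearson chi-square statistic for a two-way table of counts."""
    row_tot = [sum(r) for r in table]
    col_tot = [sum(c) for c in zip(*table)]
    total = sum(row_tot)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            expected = row_tot[i] * col_tot[j] / total
            if expected > 0:
                stat += (obs - expected) ** 2 / expected
    return stat


def quartile_group(x):
    """Assign each value to one of four nearly equal-sized groups by rank."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    group = [0] * len(x)
    for rank, i in enumerate(order):
        group[i] = 4 * rank // len(x)
    return group


def select_variable(columns, y):
    """Return the predictor whose groups are most associated with residual sign."""
    mean = sum(y) / len(y)
    signs = [1 if yi > mean else 0 for yi in y]   # residuals from a constant fit
    stats = {}
    for name, x in columns.items():
        table = [[0] * 4 for _ in range(2)]       # sign (rows) by group (columns)
        for s, g in zip(signs, quartile_group(x)):
            table[s][g] += 1
        stats[name] = chi_square(table)
    return max(stats, key=stats.get), stats


# Synthetic data (an assumption for illustration): y depends on x1 only.
rng = random.Random(1)
x1 = [rng.random() for _ in range(200)]
x2 = [rng.random() for _ in range(200)]
y = [(1.0 if a > 0.5 else 0.0) + rng.gauss(0.0, 0.1) for a in x1]

chosen, stats = select_variable({"x1": x1, "x2": x2}, y)
print(chosen, stats)
```

Because each predictor is reduced to the same number of groups before testing, a predictor with many distinct values gets no advantage from offering more candidate cut points.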
Piecewise-polynomial regression trees
Statistica Sinica, 1994
Cited by 30 (7 self)
A nonparametric function estimation method called SUPPORT (“Smoothed and Unsmoothed Piecewise-Polynomial Regression Trees”) is described. The estimate is typically made up of several pieces, each piece being obtained by fitting a polynomial regression to the observations in a subregion of the data space. Partitioning is carried out recursively as in a tree-structured method. If the estimate is required to be smooth, the polynomial pieces may be glued together by means of weighted averaging. The smoothed estimate is thus obtained in three steps. In the first step, the regressor space is recursively partitioned until the data in each piece are adequately fitted by a polynomial of a fixed order. Partitioning is guided by analysis of the distributions of residuals and cross-validation estimates of prediction mean square error. In the second step, the data within a neighborhood of each partition are fitted by a polynomial. The final estimate of the regression function is obtained by averaging the polynomial pieces, using smooth weight functions each of which diminishes rapidly to zero outside its associated partition. Estimates of derivatives of the regression function may be
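The final averaging step can be sketched in one dimension as follows. This is only an illustration of the idea of gluing pieces together with weights that decay outside each partition; the Gaussian-type weight, the `bandwidth` parameter, and the two hand-written linear pieces are assumptions, not the weight functions or fits used by SUPPORT.

```python
import math


def distance_to_interval(x, lo, hi):
    """Distance from x to the interval [lo, hi]; zero inside the interval."""
    if x < lo:
        return lo - x
    if x > hi:
        return x - hi
    return 0.0


def weight(x, lo, hi, bandwidth=0.1):
    """Equal to 1 inside [lo, hi], decaying rapidly toward 0 outside it."""
    d = distance_to_interval(x, lo, hi)
    return math.exp(-(d / bandwidth) ** 2)


def smoothed_estimate(x, pieces, bandwidth=0.1):
    """Weighted average of the polynomial pieces evaluated at x.

    Assumes x lies in or near the region covered by the pieces; far outside
    it every weight underflows to zero and the ratio is undefined.
    """
    num = 0.0
    den = 0.0
    for lo, hi, poly in pieces:
        w = weight(x, lo, hi, bandwidth)
        num += w * poly(x)
        den += w
    return num / den


# Two hypothetical linear pieces that disagree at the boundary x = 1.
pieces = [
    (0.0, 1.0, lambda x: 1.0 + 2.0 * x),   # value 3.0 at x = 1
    (1.0, 2.0, lambda x: 4.0 - 0.5 * x),   # value 3.5 at x = 1
]

for x in (0.2, 0.9, 1.0, 1.1, 1.8):
    print(x, smoothed_estimate(x, pieces))
```

At the shared boundary both weights equal 1, so the estimate is the plain average of the two pieces (3.25); well inside a partition the neighboring piece has negligible weight and the estimate follows the local polynomial.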
Tree-Structured Logistic Model for Over-Dispersed Binomial Data with Application to Modeling Developmental Effects
Biometrics, 1997
Cited by 4 (1 self)
This article proposes tree-structured logistic regression modeling for over-dispersed binomial data. Recursive partitioning is performed using a combination of statistical tests and residual analysis. The splitting criterion in cross-validation is based on the deviance function. A nested grid algorithm to estimate the bootstrap parameters is developed. The regression tree procedure provides a new approach to explore the relationship between the binomial response and explanatory variables in detail. The proposed procedure is applied to model the relationship between the incidence of malformation, and dose and fetal weight using data from a developmental experiment conducted at the National Center for Toxicological Research. A conditional Gaussian chain model is used to account for the effect of fetal weight by dose.
1 Introduction
Recently, tree-based methods have been developed by many researchers. The tree-structured approaches are used for classification (Breiman et al., 19...
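As a minimal illustration of a deviance-based split criterion for grouped binomial data, the sketch below fits a single constant probability in each node and measures how much a split on dose reduces the binomial deviance. It ignores over-dispersion and fits no logistic regression within nodes, so it is a simplification rather than the procedure of the article; the litter data and all function names are made up for the example.

```python
import math


def term(observed, expected):
    """observed * log(observed / expected), with the convention 0 * log 0 = 0."""
    if observed == 0:
        return 0.0
    return observed * math.log(observed / expected)


def deviance(groups):
    """Binomial deviance of a constant-probability fit.

    `groups` is a list of (y, m) pairs: y events out of m trials.
    """
    if not groups:
        return 0.0
    p = sum(y for y, _ in groups) / sum(m for _, m in groups)
    total = 0.0
    for y, m in groups:
        total += term(y, m * p) + term(m - y, m * (1.0 - p))
    return 2.0 * total


def deviance_gain(data, cut):
    """Deviance reduction from splitting (dose, y, m) records at dose <= cut."""
    left = [(y, m) for dose, y, m in data if dose <= cut]
    right = [(y, m) for dose, y, m in data if dose > cut]
    parent = [(y, m) for _, y, m in data]
    return deviance(parent) - deviance(left) - deviance(right)


# Hypothetical litters: (dose, malformed fetuses, litter size).
data = [
    (0, 1, 10), (0, 0, 12), (1, 2, 11),
    (2, 7, 10), (2, 8, 12), (3, 9, 11),
]

for cut in (0, 1, 2):
    print(cut, deviance_gain(data, cut))
```

With these made-up counts the split at dose <= 1 separates the low-incidence litters from the high-incidence ones and yields the largest reduction among the candidate cuts.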
Mining event histories: a social science perspective
Cited by 3 (3 self)
We explore how recent data-mining-based tools developed in domains such as biomedicine or text mining for extracting interesting knowledge from sequence data could be applied to personal life course data. We focus on two types of approaches: ‘Survival’ trees that attempt to partition the data into homogeneous groups regarding their survival characteristics, i.e. the duration until a given event occurs, and the mining of typical discriminating episodes. We show how these approaches may fruitfully complement the outcome of more classical event history analyses and single out some specific issues raised by their application to sociodemographic data.
Clinical
Survival of patients with nonseminomatous germ cell cancer: a review of the IGCC classification by Cox regression and recursive partitioning
From
, 2007
Please, do not quote without author’s permission. Comparing and classifying personal life courses:
Nonparametric Tree-Structured Modeling for Interval-Censored Survival Data
Joint Statistical Meetings, Biometrics Section (to include ENAR & WNAR), 2002
Survival analysis has been a major research area in statistics. In survival analysis, time to some event is usually the outcome variable. The objective of a survival study is to identify the relationship between
Mining event histories: A social scientist view
Individual longitudinal or sequence data are common to many fields. For instance, they are essential for understanding and predicting the evolution of a patient’s disease after it has been diagnosed (survival analysis), the behavior of a visitor of a web site (web log mining), but also for categorizing or clustering signal sequences in domains such as telecommunication. This paper focuses on the analysis of individual longitudinal data within social sciences, especially in population science where we are interested in describing and understanding life courses. A life event can be seen as the change of state of some discrete variable, e.g. the marital status, the number of children, the job, the place of residence. Such life history data are collected mainly in two ways: as a collection of time-stamped events or as state sequences. The former is used for instance by survival analysis, which focuses on a given type of event and is concerned with its hazard rate or, equivalently, the duration until it happens. Sequence analysis, on the other hand, is concerned with the sequencing of the events and is best suited for characterizing whole life trajectories. We consider using data-mining-based approaches borrowed from other fields for analysing life courses with both a survival and a sequence perspective. We put stress on the social scientist’s expectations and address some of the statistical challenges they raise.
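The two data layouts mentioned in this abstract, time-stamped events and state sequences, can be converted into one another. The sketch below assumes a single discrete variable (marital status), yearly observation by age, and a known initial state; the function names and the example life course are hypothetical and serve only to make the distinction concrete.

```python
def events_to_states(events, start, end, initial):
    """Expand (age, new_state) events into one state per age in [start, end]."""
    pending = sorted(events)
    states = []
    current = initial
    k = 0
    for age in range(start, end + 1):
        # Apply every event that has occurred up to and including this age.
        while k < len(pending) and pending[k][0] <= age:
            current = pending[k][1]
            k += 1
        states.append(current)
    return states


def states_to_events(states, start):
    """Recover (age, new_state) events from a yearly state sequence."""
    events = []
    for offset in range(1, len(states)):
        if states[offset] != states[offset - 1]:
            events.append((start + offset, states[offset]))
    return events


# Hypothetical life course: single until 22, married at 22, divorced at 26.
events = [(22, "married"), (26, "divorced")]
sequence = events_to_states(events, start=20, end=28, initial="single")
print(sequence)
print(states_to_events(sequence, start=20))
```

The event form is the natural input for duration-based (survival) analysis of one event type, while the state sequence keeps the whole trajectory in a form suited to sequence comparison.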