Automatic Construction of Decision Trees from Data: A Multi-Disciplinary Survey
Data Mining and Knowledge Discovery, 1997
Cited by 164 (1 self)
Decision trees have proved to be valuable tools for the description, classification and generalization of data. Work on constructing decision trees from data exists in multiple disciplines such as statistics, pattern recognition, decision theory, signal processing, machine learning and artificial neural networks. Researchers in these disciplines, sometimes working on quite different problems, identified similar issues and heuristics for decision tree construction. This paper surveys existing work on decision tree construction, attempting to identify the important issues involved, directions the work has taken and the current state of the art. Keywords: classification, tree-structured classifiers, data compaction 1. Introduction Advances in data collection methods, storage and processing technology are providing a unique challenge and opportunity for automated data exploration techniques. Enormous amounts of data are being collected daily from major scientific projects e.g., Human Genome...
Split Selection Methods for Classification Trees
STATISTICA SINICA, 1997
Cited by 94 (9 self)
Classification trees based on exhaustive search algorithms tend to be biased towards selecting variables that afford more splits. As a result, such trees should be interpreted with caution. This article presents an algorithm called QUEST that has negligible bias. Its split selection strategy shares similarities with the FACT method, but it yields binary splits and the final tree can be selected by a direct stopping rule or by pruning. Real and simulated data are used to compare QUEST with the exhaustive search approach. QUEST is shown to be substantially faster and the size and classification accuracy of its trees are typically comparable to those of exhaustive search.
Regression Trees With Unbiased Variable Selection and Interaction Detection
STATISTICA SINICA, 2002
Cited by 68 (14 self)
We propose an algorithm for regression tree construction called GUIDE. It is specifically designed to eliminate variable selection bias, a problem that can undermine the reliability of inferences from a tree structure. GUIDE controls bias by employing chi-square analysis of residuals and bootstrap calibration of significance probabilities. This approach allows fast computation speed, natural extension to data sets with categorical variables, and direct detection of local two-variable interactions. Previous algorithms are not unbiased and are insensitive to local interactions during split selection. The speed of GUIDE enables two further enhancements: complex modeling at the terminal nodes, such as polynomial or best simple linear models, and bagging. In an experiment with real data sets, the prediction mean square error of the piecewise constant GUIDE model is within ±20% of that of CART. Piecewise linear GUIDE models are more accurate; with bagging they can outperform the spline-based MARS method.
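The chi-square analysis of residuals mentioned in this abstract can be illustrated with a minimal sketch. It assumes a constant (mean) fit at the node, groups each numeric predictor into four rank-based bins, and compares Pearson chi-square statistics of residual sign against bin. The function names and the four-bin choice are assumptions made for illustration, the bootstrap calibration step is omitted, and this is not the GUIDE implementation.

```python
from statistics import mean


def chi_square_stat(table):
    """Pearson chi-square statistic for a contingency table (list of rows)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            if expected > 0:  # skip empty margins
                stat += (observed - expected) ** 2 / expected
    return stat


def residual_sign_stat(x, y, n_groups=4):
    """Cross-tabulate residual signs (around the mean of y) against
    rank-based groups of x. The grouping rule is an assumption; ties in x
    are broken by input order."""
    y_bar = mean(y)
    order = sorted(range(len(x)), key=lambda i: x[i])
    table = [[0] * n_groups for _ in range(2)]
    for rank, i in enumerate(order):
        group = rank * n_groups // len(x)
        sign_row = 0 if y[i] - y_bar >= 0 else 1
        table[sign_row][group] += 1
    return chi_square_stat(table)


def select_split_variable(columns, y):
    """Return the predictor with the largest statistic, plus all statistics.
    Every table has the same shape, so raw statistics are comparable."""
    stats = {name: residual_sign_stat(x, y) for name, x in columns.items()}
    return max(stats, key=stats.get), stats
```

Because the variable is chosen by a significance-style test rather than by searching over all candidate split points, a predictor with many distinct values gains no automatic advantage in this sketch, which is the intuition behind bias control.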
Bayesian Treed Models
Machine Learning, 2000
Cited by 34 (1 self)
When simple parametric models such as linear regression fail to adequately approximate a function across an entire set of data, an alternative may be to consider a partition of the data, and then use a separate simple model within each subset of the partition. Such an alternative is provided by a treed model which uses a binary tree to identify such a partition. However, treed models go further than conventional trees (e.g., CART, C4.5) by fitting models rather than simple means or proportions across the partition. In this paper, we propose a Bayesian approach for finding and fitting parametric treed models, in particular focusing on Bayesian treed regression. The potential of this approach is illustrated by a cross-validation comparison of predictive performance with neural nets, MARS, and conventional trees on simulated and real data sets. Keywords: binary trees, Markov chain Monte Carlo, model selection, stochastic search. 1 Hugh Chipman is Associate Professor of Statistics...
An Adaptive Estimation of Dimension Reduction Space
2002
Cited by 27 (2 self)
Searching for an effective dimension reduction space is an important problem in regression, especially for high dimensional data. In this paper, we propose an adaptive approach based on semiparametric models, which we call the minimum average (conditional) variance estimation (MAVE) method, within quite a general setting. The MAVE method has the following advantages: (1) Most existing methods have to undersmooth the nonparametric link function estimator in order to achieve a faster rate of consistency for the estimator of the parameters (than for that of the nonparametric function). In contrast, a faster consistency rate can be achieved by the MAVE method even without undersmoothing the nonparametric link function estimator. (2) The MAVE method is applicable to a wide range of models, with fewer restrictions on the distribution of the covariates, to the extent that even time series can be included. (3) Because of the faster rate of consistency for the parameter estimators, it is possible for us to estimate the dimension of the space consistently.
SECRET: A scalable linear regression tree algorithm
In Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2002
Cited by 15 (1 self)
Recently there has been an increasing interest in developing regression models for large datasets that are both accurate and easy to interpret. Regressors that have these properties are regression trees with linear models in the leaves, but so far, the algorithms proposed for constructing them are not scalable. In this paper we propose a novel regression tree construction algorithm that is both accurate and can truly scale to very large datasets. The main idea is, for every intermediate node, to use the EM algorithm for Gaussian mixtures to find two clusters in the data and to locally transform the regression problem into a classification problem based on closeness to these clusters. Goodness-of-split measures, like the Gini gain, can then be used to determine the split variable and the split point much like in classification tree construction. Scalability of the algorithm can be enhanced by employing scalable versions of the EM and the classification tree construction algorithms. Tests on real and artificial data show that the proposed algorithm has accuracy comparable to other linear regression tree algorithms but requires orders of magnitude less computation time for large datasets.
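The Gini gain step described in this abstract can be sketched as follows. The sketch assumes the two cluster labels have already been produced (for example by EM on a two-component Gaussian mixture, which is not shown here) and only scores candidate split points on one variable. The function names are hypothetical and this is not the SECRET implementation.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_gain(values, labels, threshold):
    """Impurity reduction from splitting on `value <= threshold`."""
    left = [lab for v, lab in zip(values, labels) if v <= threshold]
    right = [lab for v, lab in zip(values, labels) if v > threshold]
    n = len(labels)
    weighted = len(left) / n * gini(left) + len(right) / n * gini(right)
    return gini(labels) - weighted


def best_threshold(values, labels):
    """Candidate thresholds are the distinct values except the largest,
    so both sides of the split are non-empty. Returns None if no split
    is possible."""
    candidates = sorted(set(values))[:-1]
    if not candidates:
        return None
    return max(candidates, key=lambda t: gini_gain(values, labels, t))
```

Here `labels` stands in for the cluster assignments (0 or 1) of the points at a node; repeating the search over every variable and keeping the largest gain would give both the split variable and the split point.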
Incremental Learning of Linear Model Trees
Machine Learning, 2005
Cited by 14 (2 self)
A linear model tree is a decision tree with a linear functional model in each leaf. Previous model tree induction algorithms have operated on the entire training set; however, there are many situations in which an incremental learner is advantageous. In this paper we demonstrate that model trees can be induced incrementally using an algorithm that scales linearly with the number of examples.
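To illustrate what an incrementally updated leaf model might look like, here is a minimal sketch of a one-predictor least-squares fit maintained from running sums, so each new example costs constant time and the total cost grows linearly with the number of examples. The class name and the restriction to a single predictor are assumptions for illustration; the paper's tree induction algorithm is not reproduced here.

```python
class IncrementalLinearLeaf:
    """Simple linear regression y = slope * x + intercept, updated one
    example at a time from sufficient statistics (hypothetical helper)."""

    def __init__(self):
        self.n = 0
        self.sum_x = 0.0
        self.sum_y = 0.0
        self.sum_xx = 0.0
        self.sum_xy = 0.0

    def update(self, x, y):
        """Absorb one example in O(1) time without storing it."""
        self.n += 1
        self.sum_x += x
        self.sum_y += y
        self.sum_xx += x * x
        self.sum_xy += x * y

    def coefficients(self):
        """Return (slope, intercept), or None if the fit is undetermined."""
        denom = self.n * self.sum_xx - self.sum_x ** 2
        if self.n < 2 or denom == 0:
            return None
        slope = (self.n * self.sum_xy - self.sum_x * self.sum_y) / denom
        intercept = (self.sum_y - slope * self.sum_x) / self.n
        return slope, intercept
```

A leaf could call `update` for each arriving example and read `coefficients` only when a prediction is needed.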
Visual exploration of high dimensional scalar functions
IEEE TRANS. VISUALIZATION AND COMPUTER GRAPHICS, 2010
Cited by 13 (8 self)
An important goal of scientific data analysis is to understand the behavior of a system or process based on a sample of the system. In many instances it is possible to observe both input parameters and system outputs, and characterize the system as a high-dimensional function. Such data sets arise, for instance, in large numerical simulations, as energy landscapes in optimization problems, or in the analysis of image data relating to biological or medical parameters. This paper proposes an approach to analyzing and visualizing such data sets. The proposed method combines topological and geometric techniques to provide interactive visualizations of discretely sampled high-dimensional scalar fields. The method relies on a segmentation of the parameter space using an approximate Morse-Smale complex on the cloud of point samples. For each crystal of the Morse-Smale complex, a regression of the system parameters with respect to the output yields a curve in the parameter space. The result is a simplified geometric representation of the Morse-Smale complex in the high-dimensional input domain. Finally, the geometric representation is embedded in 2D, using dimension reduction, to provide a visualization platform. The geometric properties of the regression curves enable the visualization of additional information about each crystal such as local and global shape, width, length, and sampling densities. The method is illustrated on several synthetic examples of two-dimensional functions. Two use cases, using data sets from the UCI machine learning repository, demonstrate the utility of the proposed approach on real data. Finally, in collaboration with domain experts the proposed
High dimensional data analysis via the SIR/PHD approach
2000
Cited by 13 (0 self)
Dimensionality is an issue that can arise in every scientific field. Generally speaking, the difficulty lies in how to visualize a high dimensional function or data set. This is an area which has become increasingly important due to the advent of computer and graphics technology. People often ask: “How do they look?”, “What structures are there?”, “What model should be used?” Aside from the differences that underlie the various scientific contexts, such questions do have a common root in Statistics. This should be the driving force for the study of high dimensional data analysis. Sliced inverse regression (SIR) and principal Hessian direction (PHD) are two basic dimension reduction methods. They are useful for the extraction of geometric information underlying noisy data of several dimensions, a crucial step in empirical model building which has been overlooked in the literature. In these Lecture Notes, I will review the theory of SIR/PHD and describe some ongoing research in various application areas. There are two parts. The first part is based on materials that have already appeared in the literature. The second part is just a collection of some manuscripts which are not yet published. They are included here for completeness.