## An Empirical Comparison of Decision Trees and Other Classification Methods (1998)

Citations: 12 (1 self)

### BibTeX

```bibtex
@TECHREPORT{Lim98anempirical,
  author      = {Tjen-sien Lim and Wei-yin Loh and Yu-Shan Shih},
  title       = {An Empirical Comparison of Decision Trees and Other Classification Methods},
  institution = {},
  year        = {1998}
}
```

### Abstract

Twenty-two decision tree, nine statistical, and two neural network classifiers are compared on thirty-two datasets in terms of classification error rate, computational time, and (in the case of trees) number of terminal nodes. It is found that the differences in average error rates among a majority of the classifiers are not statistically significant, but the computational times of the classifiers differ over a wide range. The statistical POLYCLASS classifier, based on a logistic regression spline algorithm, has the lowest average error rate. However, it is also one of the most computationally intensive. The classifier based on standard polytomous logistic regression and a decision tree classifier using the QUEST algorithm with linear splits have the second lowest average error rates and are about 50 times faster than POLYCLASS. Among decision tree classifiers with univariate splits, the classifiers based on the C4.5, IND-CART, and QUEST algorithms have the best combination of error rate and speed, although…
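The error rates being compared above are estimates: for the smaller datasets the study obtains them by 10-fold cross-validation. The sketch below illustrates that estimation protocol only; the deliberately trivial majority-class classifier and the made-up data are not from the study.

```python
import random

def kfold_error_rate(fit, predict, X, y, k=10, seed=0):
    """Estimate a classifier's error rate by k-fold cross-validation."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k disjoint held-out folds
    errors = 0
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        model = fit([X[i] for i in train], [y[i] for i in train])
        errors += sum(predict(model, X[i]) != y[i] for i in fold)
    return errors / len(X)

# Deliberately trivial classifier: always predict the majority training class.
def fit_majority(X_train, y_train):
    return max(set(y_train), key=y_train.count)

def predict_majority(model, x):
    return model

# Made-up data: 60 samples, classes 0 and 1 in a 2:1 ratio.
X = [[float(i)] for i in range(60)]
y = [0] * 40 + [1] * 20
print(kfold_error_rate(fit_majority, predict_majority, X, y))  # 1/3: every class-1 sample is misclassified
```

Repeating such an estimate for every classifier on every dataset yields the error-rate table on which the paper's significance comparisons are based.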

### Citations

4829 | Neural Networks for Pattern Recognition - Bishop - 1995 |

3245 | Self-Organizing Maps - Kohonen - 1995 |

2868 | UCI repository of machine learning databases [http://www.ics.uci.edu/~mlearn/mlrepository.html] - Blake, Merz - 1998 |

Citation Context: ... classical and modern statistical classifiers, and two neural network classifiers. Many of the datasets were taken from the University of California, Irvine, Repository of Machine Learning Databases (Merz and Murphy, 1996). Fourteen of the datasets were from real-life domains and two were artificially constructed. Five of the datasets were also used in the StatLog project. We doubled the number of datasets by adding n... |

1203 | Categorical Data Analysis - Agresti - 1990 |

777 | C4.5: Programs for Machine Learning - Quinlan - 1993 |

438 | Very simple classification rules perform well on most commonly used datasets - Holte - 1993 |

Citation Context: ...implementation alone. 2. We evaluate some decision tree classifiers that were not included in the StatLog project, such as LMDT (Brodley and Utgoff, 1995), OC1 (Murthy, Kasif and Salzberg, 1994), T1 (Holte, 1993; Auer, Holte and Maass, 1995), and QUEST (Loh and Shih, 1997). QUEST is unique among decision trees in that it has negligible selection bias in its splits. 3. We include some of the newest and highly... |

310 | Statistics for Experimenters - Box, Hunter, et al. - 1978 |

250 | A system for induction of oblique decision trees - Murthy, Kasif, et al. - 1994 |

233 | Multivariate adaptive regression splines (with discussion), The Annals of Statistics 19 - Friedman - 1991 |

Citation Context: ...od. Therefore any multi-response regression technique can be postprocessed to improve classification performance. Only one adaptive nonparametric regression procedure is compared in this study: MARS (Friedman, 1991). We use the S-Plus (http://www.mathsoft.com/splus.html) function fda from the mda library of the StatLib S Archive. Two models are used: the additive model (degree=1, denoted by FM1) and the model c... |

200 | Improved Use of Continuous Attributes in C4.5 - Quinlan - 1996 |

Citation Context: ...d Ripley, 1997) from the StatLib S Archive at http://lib.stat.cmu.edu/S/. The 0-SE and 1-SE trees are denoted by ST0 and ST1 respectively. C4.5: We use Release 8 (Quinlan, 1996) (http://www.cs.su.oz.au/~quinlan/) with the default settings, which include pruning. The algorithm is described in Quinlan (1993). After a tree is constructed, the C4.5 rules induction program is us... |

188 | Construction and Assessment of Classification Rules - Hand - 1997 |

Citation Context: ...hat no classifier is uniformly most accurate over the datasets studied. Instead, many classifiers possess comparable accuracy. For such classifiers, computational speed may be an important criterion (Hand, 1997). The purpose of our paper is to extend the results of the StatLog project in the following ways: 1. In addition to classification accuracy and size of trees, we compare the relative computational sp... |

150 | Discriminant analysis by Gaussian mixtures - Hastie, Tibshirani - 1996 |

Citation Context: ...oblem is cast into a penalized regression framework via optimal scoring. PDA is implemented in S-Plus using the function fda with method=gen.ridge. MDA: This stands for mixture discriminant analysis (Hastie and Tibshirani, 1996). It fits Gaussian mixture density functions to each class to effect classification. MDA is implemented in S-Plus using the library mda. POL: This is the POLYCLASS algorithm due to Kooperberg, Bose a... |

131 | Sparse discriminant analysis - Clemmensen, Hastie, et al. - 2011 |

115 | Cancer diagnosis via linear programming - Mangasarian, Wolberg - 1990 |

Citation Context: ...fier is reported in Wolberg, Tanner, Loh and Vanichsetakul (1987), Wolberg, Tanner and Loh (1988), and Wolberg, Tanner and Loh (1989). The data has also been analyzed with linear programming methods (Mangasarian and Wolberg, 1990). Contraceptive method choice (cmc). The dataset is a subset of the 1987 National Indonesia Contraceptive Prevalence Survey. The samples are married women who were either not pregnant or do not know ... |

112 | Flexible discriminant analysis by optimal scoring - Hastie, Tibshirani, et al. - 1994 |

99 | Neural networks and statistical models - Sarle - 1994 |

91 | Hedonic prices and the demand for clean air - Harrison, Rubinfeld - 1978 |

Citation Context: ...tions. We did not incorporate the cost matrix in our analyses. The error rates are estimated using 10-fold cross validation. Boston housing (bos). This dataset gives housing values in Boston suburbs (Harrison and Rubinfeld, 1978). There are 3 classes, 12 numerical attributes, 1 binary attribute, and 506 observations. The dataset can also be obtained from the UCI repository. Following Loh and Vanichsetakul (1988), the classes... |

82 | The New S Language - Becker, Chambers, et al. - 1988 |

81 | Tree-based models - Clark, Pregibon - 1992 |

80 | Hazard regression - Kooperberg, Stone, et al. - 1995 |

75 | Theory and applications of agnostic PAC-learning with small decision trees - Auer, Holte, et al. - 1995 |

Citation Context: ...es in the training set using the chosen parameter values. It is denoted by CAL. T1: This is a decision tree that classifies examples on the basis of only one split on a single attribute (Holte, 1993; Auer et al., 1995). It is therefore a 1-level decision tree. The tree is denoted by T1. The software is obtained from http://www.csi.uottawa.ca/~holte/Learning/other-sites.html. 2.2 Statistical classifiers When a clas... |

70 | SAS/STAT User's Guide - SAS Institute - 1992 |

66 | The effects of training set size on decision tree complexity - Oates, Jensen - 1997 |

53 | Tree-structured classification via generalized discriminant analysis - Loh, Vanichsetakul - 1988 |

50 | Classification and Regression Trees - Breiman, Friedman, et al. - 1993 |

41 | Split selection methods for classification trees, Statistica Sinica 7: 815-840 - Loh, Shih - 1997 |

Citation Context: ...e classifiers that were not included in the StatLog project, such as LMDT (Brodley and Utgoff, 1995), OC1 (Murthy, Kasif and Salzberg, 1994), T1 (Holte, 1993; Auer, Holte and Maass, 1995), and QUEST (Loh and Shih, 1997). QUEST is unique among decision trees in that it has negligible selection bias in its splits. 3. We include some of the newest and highly accurate spline-based statistical classifiers. The classific... |

30 | Multivariate versus univariate decision trees - Brodley, Utgoff - 1992 |

21 | Neural networks, decision tree induction and discriminant analysis: an empirical comparison - Curram, Mingers - 1994 |

18 | A comparison of decision tree classifiers with backpropagation neural networks for multimodal classification problems - Brown, Corruble, et al. - 1993 |

15 | Increasing the efficiency of data mining algorithms with breadth-first marker propagation. KDD-97 - Aronis, Provost - 1997 |

12 | Automatic construction of decision trees for classification - Muller, Wysotzki - 1994 |

Citation Context: ...values in the software obtained from http://yake.ecn.purdue.edu/~brodley/software/lmdt.html. CAL5: This is developed by the Fraunhofer Society, Institute for Information and Data Processing, Germany (Muller and Wysotzki, 1994; Muller and Wysotzki, 1997). We use version 2 due to W. Mueller (wmueller@epo.iitb.fhg.de). CAL5 is designed specifically for numerical valued attributes. However, it has a procedure to handle catego... |

10 | The decision-tree algorithm CAL5 based on a statistical approach to its splitting algorithm - Muller, Wysotzki - 1997 |

Citation Context: ...ined from http://yake.ecn.purdue.edu/~brodley/software/lmdt.html. CAL5: This is developed by the Fraunhofer Society, Institute for Information and Data Processing, Germany (Muller and Wysotzki, 1994; Muller and Wysotzki, 1997). We use version 2 due to W. Mueller (wmueller@epo.iitb.fhg.de). CAL5 is designed specifically for numerical valued attributes. However, it has a procedure to handle categorical attributes so that mi... |

8 | Simplifying decision trees: A survey, The Knowledge Engineering Review 12(1) - Breslow, Aha - 1997 |

8 | Introduction to IND Version 2.1 - Buntine, Caruana - 1992 |

Citation Context: ...ly a short description of each algorithm is given. Details may be found in the cited references. 2.1 Trees and rules CART: We use the version of CART implemented in the cart style of the IND package (Buntine and Caruana, 1992) with the Gini index of diversity as the splitting criterion. The trees based on the 0-SE and 1-SE pruning rules are denoted by IC0 and IC1 respectively. S-Plus tree: This is a variant of the CART al... |

7 | Symbolic and neural learning algorithms: an empirical comparison - Shavlik, Mooney, et al. - 1991 |

7 | Diagnostic Schemes for Fine Needle Aspirates of Breast Masses, Analytical and Quantitative Cytology and Histology 10 - Wolberg, Tanner, et al. - 1988 |

6 | Analysis of attitudes toward workplace smoking restrictions - Bull - 1994 |

Citation Context: ...islation Survey--Metropolitan Toronto 1988, which was funded by NHRDP (Health and Welfare Canada). It was collected by L. Pederson and S. Bull at the Institute for Social Research at York University (Bull, 1994). It was obtained from http://lib.stat.cmu.edu/datasets/csb/. The problem is to predict attitude toward restrictions on smoking in the workplace (prohibited, restricted, or unrestricted) based on byl... |

4 | Fast effective rule induction, in A. Prieditis and S. Russell (eds.) - Cohen - 1995 |

4 | Simultaneous Statistical Inference, second edn - Miller - 1981 |

Citation Context: ...s between datasets (called "blocks") (Box, Hunter and Hunter, 1978, p. 209). Simultaneous confidence intervals for differences between average error rates can then be obtained using the Tukey method (Miller, 1981, p. 71). According to this procedure, a difference between the average error rates of two classifiers is statistically significant at the 10% level if they differ by more than 0.0584. To visualize th... |

3 | The determinants of contraceptive method and service point choice, Secondary Analysis - Lerman, Molyneaux, et al. - 1991 |

3 | Modern Applied Statistics With S-Plus, 2nd edn - Venables, Ripley - 1997 |

Citation Context: ...ee as a probability model and employs deviance as the splitting criterion. The best tree is chosen by 10-fold cross validation. Pruning is performed with the p.tree() function in the treefix library (Venables and Ripley, 1997) from the StatLib S Archive at http://lib.stat.cmu.edu/S/. The 0-SE and 1-SE trees are denoted by ST0 and ST1 respectively. C4.5: We use Release 8 (Quinlan, 1996)... |

3 | Fine needle aspiration for breast mass diagnosis - Wolberg, Tanner, et al. - 1989 |

2 | Multivariate decision trees, Machine Learning 19: 45--77 - Brodley, Utgoff - 1995 |

Citation Context: ... classification methods that these differences cannot be attributed to implementation alone. 2. We evaluate some decision tree classifiers that were not included in the StatLog project, such as LMDT (Brodley and Utgoff, 1995), OC1 (Murthy, Kasif and Salzberg, 1994), T1 (Holte, 1993; Auer, Holte and Maass, 1995), and QUEST (Loh and Shih, 1997). QUEST is unique among decision trees in that it has negligible selection bias ... |

1 | Learning classification trees, Statistics and Computing 2: 63--73 - Buntine - 1992 |