## A Comparison of Prediction Accuracy, Complexity, and Training Time of Thirty-three Old and New Classification Algorithms (2000)

### Download Links

- [www.stat.wisc.edu]
- [www.cs.wisc.edu]
- [neuron.tuke.sk]
- [sci2s.ugr.es]
- DBLP

Citations: 186 (7 self)

### BibTeX

@MISC{Lim00acomparison,

author = {Tjen-Sien Lim and Wei-Yin Loh and Yu-Shan Shih},

title = {A Comparison of Prediction Accuracy, Complexity, and Training Time of Thirty-three Old and New Classification Algorithms},

year = {2000}

}

### Abstract

Twenty-two decision tree, nine statistical, and two neural network algorithms are compared on thirty-two datasets in terms of classification accuracy, training time, and (in the case of trees) number of leaves. Classification accuracy is measured by mean error rate and mean rank of error rate. Both criteria place a statistical, spline-based algorithm called Polyclass at the top, although it is not statistically significantly different from twenty other algorithms. Another statistical algorithm, logistic regression, is second with respect to the two accuracy criteria. The most accurate decision tree algorithm is Quest with linear splits, which ranks fourth and fifth, respectively. Although spline-based statistical algorithms tend to have good accuracy, they also require relatively long training times. Polyclass, for example, is third from last in terms of median training time. It often requires hours of training compared to seconds for other algorithms. The Quest and logistic regression algor...
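The two accuracy criteria named in the abstract can be sketched in a few lines. This is a minimal illustration, not the paper's code: the error rates below are made up, and the rank computation ignores ties.

```python
# Hypothetical error rates: rows = datasets, columns = algorithms.
# (The paper compares 33 algorithms on 32 datasets; this sketch uses 4 and 3.)
error_rates = [
    [0.10, 0.12, 0.08, 0.15],
    [0.20, 0.18, 0.19, 0.25],
    [0.05, 0.07, 0.06, 0.09],
]
n_alg = len(error_rates[0])

# Criterion 1 -- mean error rate: average each algorithm's error over datasets.
mean_error = [sum(row[j] for row in error_rates) / len(error_rates)
              for j in range(n_alg)]

def ranks(row):
    """Rank algorithms within one dataset, 1 = lowest error (ties ignored)."""
    order = sorted(range(len(row)), key=row.__getitem__)
    r = [0] * len(row)
    for rank, j in enumerate(order, start=1):
        r[j] = rank
    return r

# Criterion 2 -- mean rank of error rate: average the within-dataset ranks.
mean_rank = [sum(ranks(row)[j] for row in error_rates) / len(error_rates)
             for j in range(n_alg)]

print(mean_error)
print(mean_rank)
```

With these made-up numbers the third algorithm is best under both criteria; in the paper the two criteria can and do produce slightly different orderings (Polyclass first under both, Quest with linear splits fourth and fifth).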

### Citations

5438 |
C4.5: Programs for Machine Learning
- Quinlan
- 1993
Citation Context ...ith the p.tree() function in the treefix library [47] from the StatLib S Archive at http://lib.stat.cmu.edu/S/. The 0-se and 1-se trees are denoted by ST0 and ST1 respectively. C4.5: We use Release 8 [41, 42] with the default settings including pruning (http://www.cse.unsw.edu.au/~quinlan/). After a tree is constructed, the C4.5 rule induction program is used to produce a set of rules. The trees are denot... |

5369 |
Neural Networks for Pattern Recognition
- Bishop
- 1995
Citation Context ...gorithms rank in order of decreasing number of p-marks (in parentheses) as: POL(15), LOG(13), QL0(10), LDA(10), PDA(10), QL1(9), OCU(9), (1) QU0(8), QU1(8), C4R(8), IBO(8), RBF(8), C4T(7), IMO(6), IM(5), IC1(5), ST0(5), FTU(4), IC0(4), CAL(4), IB(3), LMT(1). The top four algorithms in (1) also rank among the top five in the upper half of Table 5. COMPARISON OF CLASSIFICATION ALGORITHMS 13 Table 4. Min... |

4457 |
Classification and Regression Trees
- Breiman, Friedman, et al.
- 1984
Citation Context ... algorithm requires class prior probabilities, they are made proportional to the training sample sizes. COMPARISON OF CLASSIFICATION ALGORITHMS 3 2.1. Trees and rules CART: We use the version of Cart [6] implemented in the cart style of the Ind package [13] with the Gini index of diversity as the splitting criterion. The trees based on the 0-se and 1-se pruning rules are denoted by IC0 and IC1 respec... |

3732 | Self-organizing maps - Kohonen - 1997 |

3085 |
UCI repository of machine learning databases
- Blake, Merz
- 1998
Citation Context ...ne classical and modern statistical algorithms, and two neural network algorithms. Several datasets are taken from the University of California, Irvine, Repository of Machine Learning Databases (UCI) [33]. Fourteen of the datasets are from real-life domains and two are artificially constructed. Five of the datasets were used in the StatLog Project. To increase the number of datasets and to study the e... |

1715 |
Categorical data analysis
- Agresti
- 2002
Citation Context ...eristics of the datasets. The last three columns give the number and type of added noise attributes for each dataset. The number of values taken by the class attribute is denoted by J. The notation "N(0,1)" denotes the standard normal distribution, "UI(m,n)" a uniform distribution over the integers m through n inclusive, and "U(0,1)" a uniform distribution over the unit interval. The abbreviation C(k) ... |

1212 | Pattern Recognition and Neural Networks - Ripley - 1996 |

1057 | Fast Effective Rule Induction - Cohen |

858 | C4.5: Programs for - Quinlan - 1993 |

778 | Applied multivariate statistical analysis - Johnson, Wichern - 2002 |

496 |
Nonparametric statistical methods
- Hollander, Wolfe
Citation Context .... Therefore the null hypothesis that the algorithms are equally accurate on average is again rejected. Further, a difference in mean ranks greater than 8.7 is statistically significant at the 10% level [24]. Thus POL is not statistically significantly different from the twenty other algorithms that have mean rank less than or equal to 17.0. Figure 2(a) shows a plot of median training time versus mean rank... |

474 | Very Simple Classification Rules Perform Well on Most Commonly Used Datasets
- Holte
- 1993
Citation Context ...ibuted to implementation alone. 2. We include some decision tree algorithms that are not included in the STATLOG Project, such as S-PLUS tree (Clark & Pregibon, 1993), T1 (Auer, Holte, & Maass, 1995; Holte, 1993), OC1 (Murthy, Kasif, & Salzberg, 1994), LMDT (Brodley & Utgoff, 1995), and QUEST (Loh & Shih, 1997). 3. We also include several of the newest spline-based statistical algorithms. Their classificatio... |

469 |
Modern Applied Statistics with S-Plus
- Venables, Ripley
- 1994
Citation Context ...Clark and Pregibon (1993). It employs deviance as the splitting criterion. The best tree is chosen by ten-fold cross-validation. Pruning is performed with the p.tree() function in the treefix library [47] from the StatLib S Archive at http://lib.stat.cmu.edu/S/. The 0-se and 1-se trees are denoted by ST0 and ST1 respectively. C4.5: We use Release 8 [41, 42] with the default settings including pruning ... |

449 |
Applied Linear Statistical Models
- Neter, Kutner, et al.
- 1996
Citation Context ...ed effects analysis of variance can be used to test the simultaneous statistical significance of differences between mean error rates of the algorithms, while controlling for differences between datasets [39]. Although it makes the assumption that the effects of the datasets act like a random sample from a normal distribution, it is quite robust against violation of the assumption. For our data, the proced... |

268 | A system for induction of oblique decision trees
- Murthy, Kasif, et al.
- 1994
Citation Context ...) that the differences cannot be attributed to implementation alone. 2. We include some decision tree algorithms that are not included in the StatLog Project, such as S-Plus tree [14], T1 [3, 25], Oc1 [38], Lmdt [9], and Quest [30]. 3. We also include several of the newest spline-based statistical algorithms. Their classification accuracy may be used as benchmarks for comparison with other algorithms in... |

258 |
Multivariate Adaptive Regression Splines (with discussion
- Friedman
- 1991
Citation Context ....edu/~limt/logdiscr/). FDA: This is flexible discriminant analysis [23], a generalization of linear discriminant analysis that casts the classification problem as one involving regression. Only the Mars [17] nonparametric regression procedure is studied here. We use the S-Plus function fda from the mda library of the StatLib S Archive. Two models are used: an additive model (degree=1, denoted by FM1) and... |

217 | Improved use of continuous attributes in c4.5
- Quinlan
- 1996
Citation Context ...ith the p.tree() function in the treefix library [47] from the StatLib S Archive at http://lib.stat.cmu.edu/S/. The 0-se and 1-se trees are denoted by ST0 and ST1 respectively. C4.5: We use Release 8 [41, 42] with the default settings including pruning (http://www.cse.unsw.edu.au/~quinlan/). After a tree is constructed, the C4.5 rule induction program is used to produce a set of rules. The trees are denot... |

211 |
Construction and Assessment of Classification Rules
- Hand
- 1997
Citation Context ...t finds that no algorithm is uniformly most accurate over the datasets studied. Instead, many algorithms possess comparable accuracy. For such algorithms, excessive training times may be undesirable (Hand, 1997). The purpose of our paper is to extend the results of the STATLOG Project in the following ways: 1. In addition to classification accuracy and size of trees, we compare the training times of the alg... |

180 |
Simultaneous Statistical Inference
- Miller
- 1966
Citation Context ... hypothesis that all the algorithms have the same mean error rate is strongly rejected. Simultaneous confidence intervals for differences between mean error rates can be obtained using the Tukey method [35]. According to this procedure, a difference between the mean error rates of two algorithms is statistically significant at the 10% level if they differ by more than 0.058. COMPARISON OF CLASSIFICATION AL... |
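The Tukey-method criterion quoted in the context above reduces to a simple pairwise check once the critical difference (0.058 at the 10% level) is known. A minimal sketch, with hypothetical mean error rates standing in for the paper's Table values:

```python
# Hypothetical mean error rates per algorithm (illustrative values only;
# the algorithm codes match the paper, the numbers do not).
mean_error = {"POL": 0.195, "LOG": 0.201, "T1": 0.320, "NN": 0.270}

# Critical difference from the quoted context: a gap larger than this
# is statistically significant at the 10% level by the Tukey method.
THRESHOLD = 0.058

algs = list(mean_error)
significant_pairs = [
    (a, b)
    for i, a in enumerate(algs)
    for b in algs[i + 1:]
    if abs(mean_error[a] - mean_error[b]) > THRESHOLD
]
print(significant_pairs)
```

The same one-threshold logic underlies the paper's statement that Polyclass is not significantly different from the twenty algorithms whose mean ranks fall within the corresponding critical gap.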

178 | SAS/STAT user's guide - Institute - 1990 |

170 | The Use of Ranks to Avoid the Assumption of Normality Implicit in the Analysis of Variance - Friedman - 1937 |

159 | Discriminant Analysis by Gaussian Mixtures
- Hastie, Tibshirani
- 2007
Citation Context ...roblem is cast into a penalized regression framework via optimal scoring. PDA is implemented in S-Plus using the function fda with method=gen.ridge. MDA: This stands for mixture discriminant analysis [22]. It fits Gaussian mixture density functions to each class to produce a classifier. MDA is implemented in S-Plus using the library mda. POL: This is the Polyclass algorithm [28]. It fits a polytomous logis... |

152 | Penalized discriminant analysis
- Hastie, Buja, et al.
- 1995
Citation Context ...rchive. Two models are used: an additive model (degree=1, denoted by FM1) and a model containing first-order interactions (degree=2 with penalty=3, denoted by FM2). PDA: This is a form of penalized LDA [21]. It is designed for situations in which there are many highly correlated attributes. The classification problem is cast into a penalized regression framework via optimal scoring. PDA is implemented in... |

133 | Learning classification trees - Buntine - 1992 |

124 |
Cancer diagnosis via linear programming
- Mangasarian, Wolberg
- 1990
Citation Context ...data using the Fact algorithm is reported in Wolberg, Tanner, Loh and Vanichsetakul (1987) and Wolberg, Tanner and Loh (1988, 1989). The dataset has also been analyzed with linear programming methods [32]. Contraceptive method choice (cmc). The data are taken from the 1987 National Indonesia Contraceptive Prevalence Survey. The samples are married women who were either not pregnant or did not know if ... |

123 | Multivariate decision trees
- Brodley, Utgoff
- 1995
Citation Context ...differences cannot be attributed to implementation alone. 2. We include some decision tree algorithms that are not included in the StatLog Project, such as S-Plus tree [14], T1 [3, 25], Oc1 [38], Lmdt [9], and Quest [30]. 3. We also include several of the newest spline-based statistical algorithms. Their classification accuracy may be used as benchmarks for comparison with other algorithms in the futur... |

118 | Flexible discriminant analysis by optimal scoring
- Hastie, Tibshirani, et al.
- 1994
Citation Context ...ned with a polytomous logistic regression (see, e.g., Agresti, 1990) Fortran 90 routine written by the first author (http://www.stat.wisc.edu/~limt/logdiscr/). FDA: This is flexible discriminant analysis [23], a generalization of linear discriminant analysis that casts the classification problem as one involving regression. Only the Mars [17] nonparametric regression procedure is studied here. We use the S... |

114 | Neural networks and statistical models
- Sarle
- 1994
Citation Context ...nd 0.03 for , the learning rate parameter, in olvq1 and lvq1, respectively. 6 T.-S. LIM, W.-Y. LOH AND Y.-S. SHIH RBF: This is the radial basis function network implemented in the Sas tnn3.sas macro [44] for feedforward neural networks (http://www.sas.com). The network architecture is specified with the ARCH=RBF argument. In this study, we construct a network with only one hidden layer. The number of ... |

108 |
model
- Clark
Citation Context ...s (seconds versus days) that the differences cannot be attributed to implementation alone. 2. We include some decision tree algorithms that are not included in the StatLog Project, such as S-Plus tree [14], T1 [3, 25], Oc1 [38], Lmdt [9], and Quest [30]. 3. We also include several of the newest spline-based statistical algorithms. Their classification accuracy may be used as benchmarks for comparison wi... |

95 |
Hedonic prices and the demand for clean air
- Harrison, Rubinfeld
- 1978
Citation Context ...s here because some algorithms do not allow unequal costs. The error rates are estimated using ten-fold cross-validation. Boston housing (bos). This UCI dataset gives housing values in Boston suburbs [20]. There are three classes, twelve numerical attributes, one binary attribute, and 506 records. Following Loh and Vanichsetakul (1988), the classes are created from the attribute median value of owner-... |

94 | Hazard regression
- Kooperberg, Stone, et al.
- 1995
Citation Context ...ure discriminant analysis [22]. It fits Gaussian mixture density functions to each class to produce a classifier. MDA is implemented in S-Plus using the library mda. POL: This is the Polyclass algorithm [28]. It fits a polytomous logistic regression model using linear splines and their tensor products. It provides estimates for conditional class probabilities which can then be used to predict class labels.... |

94 | Split selection methods for classification trees
- Loh, Shih
- 1997
Citation Context ...ed in the STATLOG Project, such as S-PLUS tree (Clark & Pregibon, 1993), T1 (Auer, Holte, & Maass, 1995; Holte, 1993), OC1 (Murthy, Kasif, & Salzberg, 1994), LMDT (Brodley & Utgoff, 1995), and QUEST (Loh & Shih, 1997). 3. We also include several of the newest spline-based statistical algorithms. Their classification accuracy may be used as benchmarks for comparison with other algorithms in the future. 4. We study... |

92 | The new S language
- Becker, Chambers, et al.
- 1988
Citation Context ...ively. The software is obtained from the http address: ic-www.arc.nasa.gov/ic/projects/bayes-group/ind/IND-program.html. S-Plus tree: This is a variant of the Cart algorithm written in the S language [4]. It is described in Clark and Pregibon (1993). It employs deviance as the splitting criterion. The best tree is chosen by ten-fold cross-validation. Pruning is performed with the p.tree() function in... |

76 | Theory and application of agnostic PAC-learning with small decision trees
- Auer, Holte, et al.
- 1995
Citation Context ...s versus days) that the differences cannot be attributed to implementation alone. 2. We include some decision tree algorithms that are not included in the StatLog Project, such as S-Plus tree [14], T1 [3, 25], Oc1 [38], Lmdt [9], and Quest [30]. 3. We also include several of the newest spline-based statistical algorithms. Their classification accuracy may be used as benchmarks for comparison with other alg... |

73 |
Fast effective rule induction
- Cohen
- 1995
Citation Context ...FTL(1), OCM(1), ST1(1), FM2(1), MDA(1), FM1(2), OCL(3), QDA(3), NN(4), LVQ(4), T1(11). Excluding these, the remaining algorithms rank in order of decreasing number of p-marks (in parentheses) as: POL(15), LOG(13), QL0(10), LDA(10), PDA(10), QL1(9), OCU(9), (1) QU0(8), QU1(8), C4R(8), IBO(8), RBF(8), C4T(7), IMO(6), IM(5), IC1(5), ST0(5), FTU(4), IC0(4), CAL(4), IB(3), LMT(1). The top four algorithms ... |

72 | Applied Multivariate Statistical Analysis. 3rd ed - Johnson, Wichern - 1992 |

68 | The effects of training set size on decision tree complexity - Oates, Jensen - 1997 |

59 | Tree-structured Classification via Generalized Discriminant Analysis, JAm Stat Assoc - LOH, VANICHSETAKUL - 1988 |

55 | Classification and Regression Trees, Chapman - Breiman, Friedman, et al. - 1984 |

51 | Modern Applied Statistics with S-PLUS, 2nd edn - Venables, Ripley - 1997 |

47 | Split selection methods for classification trees. Statistica Sinica, 7: 815-840 - LOH, SHIH - 1997 |

42 | Simplifying decision trees: A survey
- Breslow, Aha
- 1997
Citation Context ...he remaining algorithms rank in order of decreasing number of p-marks (in parentheses) as: POL(15), LOG(13), QL0(10), LDA(10), PDA(10), QL1(9), OCU(9), (1) QU0(8), QU1(8), C4R(8), IBO(8), RBF(8), C4T(7), IMO(6), IM(5), IC1(5), ST0(5), FTU(4), IC0(4), CAL(4), IB(3), LMT(1). The top four algorithms in (1) also rank among the top five in the upper half of Table 5. COMPARISON OF CLASSIFICATION ALGORITHMS ... |

32 | Multivariate versus univariate decision trees
- Brodley, Utgoff
- 1992
Citation Context ...N(4), LVQ(4), T1(11). Excluding these, the remaining algorithms rank in order of decreasing number of p-marks (in parentheses) as: POL(15), LOG(13), QL0(10), LDA(10), PDA(10), QL1(9), OCU(9), (1) QU0(8), QU1(8), C4R(8), IBO(8), RBF(8), C4T(7), IMO(6), IM(5), IC1(5), ST0(5), FTU(4), IC0(4), CAL(4), IB(3), LMT(1). The top four algorithms in (1) also rank among the top five in the upper half of Table 5. ... |

27 | Neural networks, decision tree induction and discriminant analysis: an empirical comparison - Curram, Mingers - 1994 |

19 | A comparison of decision tree classifiers with backpropagation neural networks for multimodal classification problems - Brown, Corruble, et al. - 1993 |

19 | Nonparametric Statistical Methods, 2nd edn - Hollander, Wolfe - 1999 |

19 |
Very simple classification rules perform well on most commonly used datasets
- Holte
- 1993
Citation Context ...s versus days) that the differences cannot be attributed to implementation alone. 2. We include some decision tree algorithms that are not included in the StatLog Project, such as S-Plus tree [14], T1 [3, 25], Oc1 [38], Lmdt [9], and Quest [30]. 3. We also include several of the newest spline-based statistical algorithms. Their classification accuracy may be used as benchmarks for comparison with other alg... |

16 |
Introduction to IND Version 2.1 and Recursive Partitioning
- Buntine, Caruana
- 1992
Citation Context ...re made proportional to the training sample sizes. COMPARISON OF CLASSIFICATION ALGORITHMS 3 2.1. Trees and rules CART: We use the version of Cart [6] implemented in the cart style of the Ind package [13] with the Gini index of diversity as the splitting criterion. The trees based on the 0-se and 1-se pruning rules are denoted by IC0 and IC1 respectively. The software is obtained from the http address... |

15 | Increasing the efficiency of data mining algorithms with breadth-first marker propagation - Aronis, Provost - 1997 |

14 |
Automatic Construction of Decision Trees for Classification
- Müller, Wysotzki
- 1994
Citation Context ...e use the default values in the software from http://yake.ecn.purdue.edu/~brodley/software/lmdt.html. CAL5: This is from the Fraunhofer Society, Institute for Information and Data Processing, Germany [36, 37]. We use version 2. Cal5 is designed specifically for numerical-valued attributes. However, it has a procedure to handle categorical attributes so that mixed attributes (numerical and categorical) ca... |