## The Power of Decision Tables (1995)

Venue: Proceedings of the European Conference on Machine Learning

Citations: 100 (5 self)

### BibTeX

```bibtex
@INPROCEEDINGS{Kohavi95thepower,
  author    = {Ron Kohavi},
  title     = {The Power of Decision Tables},
  booktitle = {Proceedings of the European Conference on Machine Learning},
  year      = {1995},
  pages     = {174--189},
  publisher = {Springer Verlag}
}
```

### Abstract

We evaluate the power of decision tables as a hypothesis space for supervised learning algorithms. Decision tables are one of the simplest hypothesis spaces possible, and they are usually easy to understand. Experimental results show that on artificial and real-world domains containing only discrete features, IDTM, an algorithm that induces decision tables, can sometimes outperform state-of-the-art algorithms such as C4.5. Surprisingly, performance is quite good on some datasets with continuous features, indicating that many datasets used in machine learning either do not require these features or contain features with few values. We also describe an incremental method for performing cross-validation that is applicable to incremental learning algorithms, including IDTM. Using incremental cross-validation, it is possible to cross-validate a given dataset and IDTM in time that is linear in the number of instances, the number of features, and the number of label values. The time for incre...
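The decision-table hypothesis the abstract describes can be pictured as a lookup structure: project each training instance onto a selected feature subset (the schema) and store the majority label per projected entry, with unmatched instances falling back to the overall majority label. The sketch below is an illustrative reconstruction, not Kohavi's IDTM code; the function names are mine, and the schema is assumed to be given rather than searched for.

```python
from collections import Counter, defaultdict

def train_decision_table(instances, labels, schema):
    # Project each training instance onto the schema (selected feature
    # subset) and record label counts per projected key.
    table = defaultdict(Counter)
    for x, y in zip(instances, labels):
        key = tuple(x[f] for f in schema)
        table[key][y] += 1
    # Each entry predicts its majority label; instances that match no
    # entry fall back to the overall majority label.
    entries = {k: c.most_common(1)[0][0] for k, c in table.items()}
    default = Counter(labels).most_common(1)[0][0]
    return entries, default

def predict(model, schema, x):
    entries, default = model
    return entries.get(tuple(x[f] for f in schema), default)

# Toy data: two discrete features; the schema keeps only feature 0.
X = [{0: 1, 1: 0}, {0: 1, 1: 1}, {0: 0, 1: 0}]
y = ["pos", "pos", "neg"]
model = train_decision_table(X, y, schema=[0])
print(predict(model, [0], {0: 1, 1: 1}))  # matches an entry -> "pos"
print(predict(model, [0], {0: 2, 1: 0}))  # no entry -> majority "pos"
```

In IDTM proper, the schema itself is chosen to maximize a cross-validation accuracy estimate; here it is fixed for illustration.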

### Citations

3909 | Classification and Regression Trees - Breiman, Friedman, et al. - 1984
Citation Context: ...ry features, 2,000 training instances, and 1,186 test instances, is higher than many other state-of-the-art induction algorithms reported for this dataset in (Taylor et al. 1994). For example, CART (Breiman et al. 1984) achieves 91.5% accuracy, Backprop (Rumelhart, Hinton & Williams 1986) achieves 91.2% accuracy, CN2 (Clark & Niblett 1989) achieves 90.5% accuracy, and k-nearest neighbor achieves 84.5% accuracy. Tab...

3354 | Induction of Decision Trees - Quinlan - 1986
Citation Context: ...from a table) is NP-complete. Hartmann, Varshney, Mehrotra & Gerberich (1982) show how to convert a decision table into a decision tree using mutual information. The algorithm is very similar to ID3 (Quinlan 1986). All these approaches, however, dealt with conversions that are information preserving, i.e., all entries in the table are correctly classified and the structures are not used for making predictions...

2723 | Learning internal representations by error propagation - Rumelhart, Hinton, et al. - 1986

777 | C4.5: Programs for Machine Learning - Quinlan - 1993
Citation Context: ...res. To determine how weak the performance of IDTM is on datasets with continuous features, we also report on such experiments. Surprisingly, performance is not significantly worse than that of C4.5 (Quinlan 1993) in some cases. On those datasets where performance is not significantly worse than C4.5, the algorithm ignores the continuous features or uses those features that have few values. The paper is organized as fo...

747 | The CN2 induction algorithm - Clark, Niblett - 1989

741 | UCI repository of machine learning databases (http://www.ics.uci.edu/~mlearn/MLRepository.html) - Murphy, Aha - 1992

721 | Cross-Validatory Choices and Assessment of Statistical Prediction (with Discussion) - Stone - 1974
Citation Context: ...cy than the current best estimate. To estimate future prediction accuracy, cross-validation, a standard accuracy estimation technique (Weiss & Kulikowski 1991, Breiman, Friedman, Olshen & Stone 1984, Stone 1974), is used. Given an induction algorithm and a dataset, k-fold cross-validation splits the data into k approximately equally sized partitions, or folds. The induction algorithm is executed k times; eac...
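The k-fold procedure this context describes can be written out directly. The routine below is a generic sketch, not code from the paper: `fit` and `predict` are placeholders for any induction algorithm, and the majority-class learner in the demo is only a stand-in.

```python
from collections import Counter

def k_fold_accuracy(X, y, k, fit, predict):
    # Split indices into k approximately equally sized folds, train on
    # the other k-1 folds, test on the held-out fold, and average the
    # k resulting accuracies.
    n = len(X)
    folds = [list(range(i, n, k)) for i in range(k)]
    accuracies = []
    for held_out in folds:
        held = set(held_out)
        X_train = [X[i] for i in range(n) if i not in held]
        y_train = [y[i] for i in range(n) if i not in held]
        model = fit(X_train, y_train)
        correct = sum(predict(model, X[i]) == y[i] for i in held_out)
        accuracies.append(correct / len(held_out))
    return sum(accuracies) / k

# Demo with a trivial majority-class learner.
fit = lambda X, y: Counter(y).most_common(1)[0][0]
predict = lambda model, x: model
print(k_fold_accuracy([0] * 6, [1, 1, 1, 1, 1, 0], 3, fit, predict))
```

The incremental variant in the paper gets the same estimate faster by deleting and re-adding instances rather than retraining from scratch; that bookkeeping is specific to IDTM's counters and is not shown here.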

595 | Irrelevant Features and the Subset Selection Problem - John, Kohavi, et al. - 1994

444 | Rough sets - Pawlak - 1982
Citation Context: ...he table are correctly classified and the structures are not used for making predictions. The rough sets community has been using the hypothesis space of decision tables for a few years (Pawlak 1987, Pawlak 1991, Slowinski 1992). Researchers in the field of rough sets suggest using the degree of dependency of a feature on the label (called γ) to determine which features should be included in a decision tab...

438 | Very simple classification rules perform well on most commonly used datasets - Holte - 1993

365 | Computer Systems that Learn - Weiss, Kulikowski - 1991

212 | Learning with Many Irrelevant Features - Almuallim, Dietterich - 1991

211 | Estimating the error rate of a prediction rule: improvement on cross-validation - Efron - 1983
Citation Context: ... for the incremental operations. 3.3 Choosing the Number of Folds The time to incrementally cross-validate an IDTM and a dataset for any number of folds is the same. Leave-one-out is almost unbiased (Efron 1983) and was commonly considered the preferred method for cross-validation. Recently Zhang (1992) and Shao (1993) proved that, for linear models, using leave-one-out cross-validation for model selection i...

211 | Induction of selective Bayesian classifiers - Langley, Sage - 1994

186 | Greedy attribute selection - Caruana, Freitag - 1994

184 | Constructing optimal binary decision trees is NP-complete - Hyafil, Rivest - 1976

146 | A Conservation Law for Generalization Performance - Schaffer - 1994
Citation Context: ...e (1993), although he used a different algorithm. The IDTM algorithm described here performs better than Holte's algorithm and sometimes outperforms C4.5. Generalization without a bias is impossible (Schaffer 1994, Wolpert 1994). IDTM is biased to select a feature subset maximizing cross-validation accuracy estimates. When the estimates are good, IDTM should choose a feature subset that leads to high predictio...

128 | Efficient algorithms for minimizing cross validation error - Moore, Lee - 1994

105 | A comparative evaluation of sequential feature selection algorithms - Aha, Bankert - 1996
Citation Context: ... lines.) 4 Experiments with IDTM We now describe experiments conducted with IDTM, the induction algorithm for DTMs. The experiments were done on all the datasets at the UC Irvine repository (Murphy & Aha 1994) and StatLog repository (Taylor, Michie & Spiegelhalter 1994) that contain only discrete features. To test the performance on datasets with continuous features, we chose the rest of the StatLog datas...

101 | Hoeffding races: Accelerating model selection search for classification and function approximation - Maron, Moore - 1994

73 | Introduction to Algorithms - Cormen, Leiserson, et al. - 1990

46 | Feature Selection Using Rough Sets Theory - Modrzejewski - 1993
Citation Context: .... Researchers in the field of rough sets suggest using the degree of dependency of a feature on the label (called γ) to determine which features should be included in a decision table (Ziarko 1991, Modrzejewski 1993). Another suggestion was to use normalized entropy (Pawlak, Wong & Ziarko 1988), which is similar to the information gain measure of ID3. These approaches ignore the utility of the specific features ...

46 | Rough sets: probabilistic versus deterministic approach - Pawlak, Wong, et al. - 1988

45 | Bottom-up Induction of Oblivious Read-Once Decision Diagrams - Kohavi - 1994
Citation Context: ..., Moore & Lee 1994, Caruana & Freitag 1994, Kohavi & Frasca 1994, Langley & Sage 1994, Aha & Bankert 1994). Decision tables have a bias similar to that of oblivious read-once decision graphs (OODGs) (Kohavi 1994a, Kohavi 1994b): all of the features chosen for the schema are tested during classification. This implies that it is easy to convert a decision table into an OODG, perhaps making it more comprehensib...

45 | An improved algorithm for incremental induction of decision trees - Utgoff - 1994

42 | An Empirical Investigation of Brute Force to choose Features, Smoothers and Function Approximators - Moore, Hill, et al. - 1992

40 | The relationship between PAC, the statistical physics framework, the Bayesian framework, and the VC framework. In The Mathematics of Generalization, edited by D. Wolpert - Wolpert - 1995
Citation Context: ...ugh he used a different algorithm. The IDTM algorithm described here performs better than Holte's algorithm and sometimes outperforms C4.5. Generalization without a bias is impossible (Schaffer 1994, Wolpert 1994). IDTM is biased to select a feature subset maximizing cross-validation accuracy estimates. When the estimates are good, IDTM should choose a feature subset that leads to high prediction accuracy. Ou...

37 | Feature subset selection as search with probabilistic estimates - Kohavi - 1994
Citation Context: ..., Moore & Lee 1994, Caruana & Freitag 1994, Kohavi & Frasca 1994, Langley & Sage 1994, Aha & Bankert 1994). Decision tables have a bias similar to that of oblivious read-once decision graphs (OODGs) (Kohavi 1994a, Kohavi 1994b): all of the features chosen for the schema are tested during classification. This implies that it is easy to convert a decision table into an OODG, perhaps making it more comprehensib...

34 | Optimal binary identification procedures - Garey - 1972
Citation Context: ... measure of optimality using branch and bound procedures (Reinwald & Soland 1966, Reinwald & Soland 1967). In the early seventies, these procedures were improved using dynamic programming techniques (Garey 1972, Schumacher & Sevcik 1976). Hyafil & Rivest (1976) showed that building an optimal decision tree from instances (or from a table) is NP-complete. Hartmann, Varshney, Mehrotra & Gerberich (1982) show ...

26 | Useful Feature Subsets and Rough Set Reducts - Kohavi, Frasca - 1994
Citation Context: ..., Moore & Lee 1994, Caruana & Freitag 1994, Kohavi & Frasca 1994, Langley & Sage 1994, Aha & Bankert 1994). Decision tables have a bias similar to that of oblivious read-once decision graphs (OODGs) (Kohavi 1994a, Kohavi 1994b): all of the features chosen for the schema are tested during classification. This implies that it is easy to convert a decision table into an OODG, perhaps making it more comprehensib...

17 | Intelligent Decision Support: Handbook of Applications and Advances of the Rough Sets Theory - Slowinski - 1992
Citation Context: ...correctly classified and the structures are not used for making predictions. The rough sets community has been using the hypothesis space of decision tables for a few years (Pawlak 1987, Pawlak 1991, Slowinski 1992). Researchers in the field of rough sets suggest using the degree of dependency of a feature on the label (called γ) to determine which features should be included in a decision table (Ziarko 1991,...

15 | Application of information theory to the construction of efficient decision trees - Hartmann, Varshney, et al. - 1982

11 | On the distributional properties of model selection criteria - Zhang - 1992

10 | Conversion of limited-entry decision tables to optimal computer programs I: Minimum average processing time - Reinwald, Soland - 1966

9 | Optimal Subset Selection - Boyce, Farhi, et al. - 1974

8 | Small sample error rate estimation for k-nearest neighbor classifiers - Weiss - 1991

5 | Cross-validated C4.5: Using error estimation for automatic parameter selection - John - 1994

3 | The synthetic approach to decision table conversion - Schumacher, Sevcik - 1976
Citation Context: ...ing branch and bound procedures (Reinwald & Soland 1966, Reinwald & Soland 1967). In the early seventies, these procedures were improved using dynamic programming techniques (Garey 1972, Schumacher & Sevcik 1976). Hyafil & Rivest (1976) showed that building an optimal decision tree from instances (or from a table) is NP-complete. Hartmann, Varshney, Mehrotra & Gerberich (1982) show how to convert a decision ...

3 | The MONK's problems: A performance comparison of different learning algorithms - Thrun, et al. - 1991

2 | Decision tables --- a rough sets approach - Pawlak - 1987
Citation Context: ... entries in the table are correctly classified and the structures are not used for making predictions. The rough sets community has been using the hypothesis space of decision tables for a few years (Pawlak 1987, Pawlak 1991, Slowinski 1992). Researchers in the field of rough sets suggest using the degree of dependency of a feature on the label (called γ) to determine which features should be included in a...

1 | On learning more concepts - Dietterich - 1992

1 | Linear model selection via cross-validation - Shao - 1993