## On the Boosting Ability of Top-Down Decision Tree Learning Algorithms (1995)


### Download Links

- [ftp.math.tau.ac.il]
- [www.research.att.com]
- [www.math.tau.ac.il]
- [www.cs.tau.ac.il]
- [www.cis.upenn.edu]
- DBLP

### Other Repositories/Bibliography

Venue: Proceedings of the Twenty-Eighth Annual ACM Symposium on the Theory of Computing

Citations: 89 (6 self)

### BibTeX

@INPROCEEDINGS{Kearns95onthe,
  author    = {Michael Kearns and Yishay Mansour},
  title     = {On the Boosting Ability of Top-Down Decision Tree Learning Algorithms},
  booktitle = {Proceedings of the Twenty-Eighth Annual ACM Symposium on the Theory of Computing},
  year      = {1995},
  pages     = {459--468},
  publisher = {ACM Press}
}



### Abstract

We analyze the performance of top-down algorithms for decision tree learning, such as those employed by the widely used C4.5 and CART software packages. Our main result is a proof that such algorithms are boosting algorithms. By this we mean that if the functions used to label the internal nodes of the decision tree can weakly approximate the unknown target function, then the top-down algorithms we study will amplify this weak advantage to build a tree achieving any desired level of accuracy. The bounds we obtain for this amplification show an interesting dependence on the splitting criterion function G used by the top-down algorithm. More precisely, if the functions used to label the internal nodes have error 1/2 − γ as approximations to the target function, then for the splitting criteria used by CART and C4.5, trees of size (1/ε)^O(1/(γ²ε²)) and (1/ε)^O(log(1/ε)/γ²) (respectively) suffice to drive the error below ε. Thus, small constant advantage over...
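The abstract's size bounds turn on the shape of the splitting criterion G. The following is a minimal sketch (our own illustration, not code from the paper; function names are ours) of the three criteria discussed in this literature: the Gini index used by CART, the binary entropy used by C4.5, and the 2·sqrt(q(1−q)) criterion the paper's analysis suggests, together with the weighted drop in G produced by splitting a leaf.

```python
import math

# Permissible splitting criteria G(q): concave, symmetric about q = 1/2,
# with G(0) = G(1) = 0; scaled here so that G(1/2) = 1.

def gini(q):
    """Gini index, the criterion used by CART: 4q(1 - q)."""
    return 4.0 * q * (1.0 - q)

def entropy(q):
    """Binary entropy, the criterion used by C4.5."""
    if q <= 0.0 or q >= 1.0:
        return 0.0
    return -q * math.log2(q) - (1.0 - q) * math.log2(1.0 - q)

def km_criterion(q):
    """The criterion suggested by the paper's analysis: 2*sqrt(q(1 - q))."""
    return 2.0 * math.sqrt(q * (1.0 - q))

def local_drop(G, q, p, r, tau):
    """Decrease in G when a leaf with positive-label fraction q is split
    into children with fractions p and r, where a tau fraction of the
    leaf's examples reach the r-child (so q = (1 - tau)*p + tau*r):
        G(q) - (1 - tau)*G(p) - tau*G(r).
    Concavity of G makes this nonnegative for any valid split."""
    return G(q) - (1.0 - tau) * G(p) - tau * G(r)
```

For example, splitting a q = 1/2 leaf evenly (tau = 1/2) into children with p = 0.3 and r = 0.7 gives a Gini drop of 0.16. The paper's contribution is to show how a weak-hypothesis advantage γ guarantees such local drops at every split, and how fast they compound for each choice of G.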

### Citations

5232 | C4.5: Programs for Machine Learning - Quinlan - 1993

Citation context: ...g" the data reaching this leaf into two new leaves, and reducing the empirical error on the given sample. The tremendous popularity of such programs (which include the software packages C4.5 and CART [13, 3]) is due to their efficiency and simplicity, the advantages of using decision trees (such as potential interpretability to humans), and of course, to their success in generating trees with good genera...

4172 | Classification and Regression Trees - Breiman, Friedman, et al. - 1984

Citation context: ...g" the data reaching this leaf into two new leaves, and reducing the empirical error on the given sample. The tremendous popularity of such programs (which include the software packages C4.5 and CART [13, 3]) is due to their efficiency and simplicity, the advantages of using decision trees (such as potential interpretability to humans), and of course, to their success in generating trees with good genera...

2426 | A Decision-Theoretic Generalization of Online Learning and an Application to Boosting - Freund, Schapire - 1997

Citation context: ...f P, setting the probability of this subset to zero, and renormalizing. Thus, the resulting class of filtered distributions is in some sense simpler than those generated by other boosting algorithms [14, 7, 8]. Our goal now is to obtain for each G a lower bound on the local drop G(q) − (1 − τ)G(p) − τG(r) to G_t under the condition τ(1 − τ)δ ≥ γq(1 − q) given by Lemma 5.1. We emphasiz...

1734 | A theory of the learnable - Valiant - 1984

1719 | Experiments with a new boosting algorithm - Freund, Schapire - 1996

687 | The strength of weak learnability - Schapire - 1990

Citation context: ...5 and CART. In this paper, we attempt to remedy this situation by examining top-down decision tree learning algorithms in the model of weak learning that has been the focus of several previous papers [14, 8, 7]. In the language of weak learning, we prove here that the standard top-down decision tree algorithms are in fact boosting algorithms. By this we mean that if we make a favorable and apparently necess...

434 | Boosting a weak learning algorithm by majority - Freund - 1995

Citation context: ...5 and CART. In this paper, we attempt to remedy this situation by examining top-down decision tree learning algorithms in the model of weak learning that has been the focus of several previous papers [14, 8, 7]. In the language of weak learning, we prove here that the standard top-down decision tree algorithms are in fact boosting algorithms. By this we mean that if we make a favorable and apparently necess...

308 | Cryptographic Limitations on Learning Boolean Formulae and Finite Automata - Kearns, Valiant - 1994

Citation context: ...say something about the relationship between F and the target function f. In the next section, we adopt the Weak Hypothesis Assumption (motivated by and closely related to the model of Weak Learning [10, 14, 8]) to quantify this relationship. We defer detailed discussion of the choice of the permissible splitting criterion G, since one of the main results of our analysis is a rather precise reason why some...

199 | Efficient distribution-free learning of probabilistic concepts - Kearns, Schapire - 1994

Citation context: ...he statement holds for any P ∈ P. We call the parameter γ the advantage. It is worth mentioning that this definition can be extended to the case where F is a class of probabilistic boolean functions [12]. All of our results hold for this more general setting. Note that if F actually contains the function f, then f trivially 1/2-satisfies the Weak Hypothesis Assumption with respect to F. If F does n...


188 | Learning decision trees using the fourier spectrum - Kushilevitz, Mansour - 1993

Citation context: ...variants of this approach that have been proposed to date) [2]. The positive results for efficient decision tree learning in computational learning theory all make extensive use of membership queries [11, 5, 4, 9], which provide the learning algorithm with black-box access to the target function (experimentation), rather than only an oracle for random examples. Clearly, the need for membership queries severely...

174 | An empirical comparison of selection measures for decision tree induction - Mingers - 1989

Citation context: ...enced by the fact that the two most popular decision tree learning packages (C4.5 and CART) use different choices for G. There have also been a number of experimental papers examining various choices [12, 6]. Perhaps the insights in this paper most relevant to the practice of machine learning are those regarding the behavior of TopDown_{F,G} as a function of G. 3 The Weak Hypothesis Assumption We now quant...

165 | An efficient membership-query algorithm for learning dnf with respect to the uniform distribution - Jackson - 1995

Citation context: ...variants of this approach that have been proposed to date) [2]. The positive results for efficient decision tree learning in computational learning theory all make extensive use of membership queries [11, 5, 4, 9], which provide the learning algorithm with black-box access to the target function (experimentation), rather than only an oracle for random examples. Clearly, the need for membership queries severely...

123 | Weakly learning DNF and characterizing statistical query learning using Fourier analysis - Blum, Furst, et al. - 1994

102 | A Further Comparison of Splitting Rules for Decision Tree Induction - Buntine, Niblett - 1992

Citation context: ...enced by the fact that the two most popular decision tree learning packages (C4.5 and CART) use different choices for G. There have also been a number of experimental papers examining various choices [12, 6]. Perhaps the insights in this paper most relevant to the practice of machine learning are those regarding the behavior of TopDown_{F,G} as a function of G. 3 The Weak Hypothesis Assumption We now quant...

82 | Exact learning via the monotone theory - Bshouty - 1995

Citation context: ...variants of this approach that have been proposed to date) [2]. The positive results for efficient decision tree learning in computational learning theory all make extensive use of membership queries [11, 5, 4, 9], which provide the learning algorithm with black-box access to the target function (experimentation), rather than only an oracle for random examples. Clearly, the need for membership queries severely...

47 | Applying the weak learning framework to understand and improve C4.5 - Dietterich, Kearns, et al. - 1996

Citation context: ...r G by C4.5 and CART, and indicates that both may be inferior to a new choice for G suggested by our analysis. (Preliminary experiments supporting this view are reported in our recent follow-up paper [7].) In addition to providing a nontrivial analysis of the performance of top-down decision tree learning, the proof of our results gives a number of specific technical insights into the success and lim...

17 | Simple learning algorithms for decision trees and multivariate polynomials - Bshouty, Mansour - 1995


3 | Experiments with a new boosting algorithm. Unpublished manuscript - Freund, Schapire - 1996

Citation context: ...here that despite the fact that existing boosting algorithms enjoy significantly better bounds than those given here for top-down heuristics, the algorithms are actually fairly comparable in practice [8, 7]. In a recent experimental paper investigating some of the theoretical issues raised here, we argue that this disparity between theory and experiment can largely be explained by regarding the advantag...
