### Citations

6603 | C4.5: Programs for Machine Learning
- Quinlan
- 1993
Citation Context: ...split at each node. The choice of the criterion is the main difference between the various tree-growing methods that have been proposed in the literature, of which CHAID [Kas80], CART [BFOS84] and C4.5 [Qui93] are perhaps the most popular. A leaf is a terminal node. There are 4 leaves in Figure 1. In the machine learning community, predictors are also called attributes and the outcome variable the predicted...

5965 | Classification and Regression Trees
- Breiman, Friedman, et al.
- 1984
Citation Context: ...Special attention is devoted to computational aspects. Key words: Classification tree, Deviance, Goodness-of-fit, Chi-square statistics, BIC. 1 Introduction: Induced decision trees have become, since [BFOS84] and [Qui86], popular multivariate tools for predicting continuous dependent variables and for classifying categorical ones from a set of predictors. They are called regression trees when the outcome...

4373 | Induction of decision trees
- Quinlan
- 1986
Citation Context: ...attention is devoted to computational aspects. Key words: Classification tree, Deviance, Goodness-of-fit, Chi-square statistics, BIC. 1 Introduction: Induced decision trees have become, since [BFOS84] and [Qui86], popular multivariate tools for predicting continuous dependent variables and for classifying categorical ones from a set of predictors. They are called regression trees when the outcome is quantitative...

3478 | The Elements of Statistical Learning
- Hastie, Tibshirani, et al.
- 2001
Citation Context: ...do not mention it, and, as far as this model assessment issue is concerned, statistical learning focuses almost exclusively on the statistical properties of the classification error rate (see for example [HTF01], chap. 7). In statistical modeling, e.g. linear regression, logistic regression or, more generally, generalized linear models (GLM), the goodness-of-fit is usually assessed by two kinds of measures. On...

3141 | Data Mining: Concepts and Techniques
- Han, Kamber, et al.
- 2006
Citation Context: ...t issue very similar to that encountered in the statistical modeling of multiway cross tables. To our knowledge, however, it has not been addressed so far for induced trees. Textbooks, like [HK01] or [HMS01] for example, do not mention it, and, as far as this model assessment issue is concerned, statistical learning focuses almost exclusively on the statistical properties of the classification...

2953 | Categorical Data Analysis
- Agresti
- 1990
Citation Context: ...(see for instance [MN89]), this is measured by minus twice the log-likelihood of the model (−2 LogLik(m)) and is just the log-likelihood ratio Chi-square in the modeling of multiway contingency tables [Agr90]. For a two-way r × c table, it reads for instance

    D(m) = 2 Σ_{i=1}^{r} Σ_{j=1}^{c} n_ij ln(n_ij / n̂_ij),    (1)

where n̂_ij is the estimation of the expected count provided by the model for cell (i, j). The likelihood is...
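As a quick illustration of equation (1), the deviance of a two-way table can be computed from observed counts and model-predicted counts. A minimal sketch in Python, with hypothetical counts and the independence model supplying the expected values:

```python
import math

def deviance(observed, expected):
    """Log-likelihood ratio Chi-square of a model for an r x c table:
    D(m) = 2 * sum_ij n_ij * ln(n_ij / n_hat_ij).
    Cells with n_ij = 0 contribute 0 (the limit of x * ln x as x -> 0)."""
    return 2 * sum(
        n * math.log(n / e)
        for obs_row, exp_row in zip(observed, expected)
        for n, e in zip(obs_row, exp_row)
        if n > 0
    )

# Hypothetical 2 x 2 table; expected counts from the independence model
obs = [[30, 10], [20, 40]]
n = sum(map(sum, obs))
rows = [sum(r) for r in obs]                         # row margins n_i.
cols = [sum(c) for c in zip(*obs)]                   # column margins n_.j
exp = [[ri * cj / n for cj in cols] for ri in rows]  # n_hat_ij = n_i. n_.j / n
print(round(deviance(obs, exp), 3))  # -> 17.261
```

A saturated model reproduces the observed counts exactly and therefore has deviance 0.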

584 | Bayesian model selection in social research (with discussion)
- Raftery, Newton, et al.
- 1995
Citation Context: ...constant, where n is the number of cases and d the degrees of freedom in the tree m. The constant is arbitrary, which means that only differences in BIC values matter. Recall that, according to Raftery [Raf95], a difference in BIC values greater than 10 provides strong evidence for the superiority of the model with the smaller BIC, in terms of trade-off between fit and complexity. 4 Computational aspects...
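The trade-off described above is easy to sketch numerically. The snippet below assumes the common deviance-based form BIC(m) = D(m) + d·ln(n), up to an additive constant; the deviances and degrees of freedom are hypothetical, not values taken from the paper:

```python
import math

def bic(dev, d, n):
    """Deviance-based BIC, up to an arbitrary additive constant:
    BIC(m) = D(m) + d * ln(n), with n cases and d degrees of freedom.
    Only differences between BIC values are meaningful."""
    return dev + d * math.log(n)

# Hypothetical deviances and degrees of freedom for two candidate trees
n = 500
bic_small = bic(dev=420.0, d=6, n=n)    # smaller tree, worse fit
bic_large = bic(dev=360.0, d=14, n=n)   # larger tree, better fit
delta = bic_small - bic_large
# Raftery's rule of thumb: a difference greater than 10 is strong
# evidence for the model with the smaller BIC.
verdict = "strong" if abs(delta) > 10 else "weak"
print(round(delta, 2), verdict)
```

Here the larger tree buys a 60-point drop in deviance for 8 extra degrees of freedom, and the BIC difference (about 10.3) just crosses Raftery's threshold.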

337 | Measures of Association for Cross-Classifications
- Goodman, Kruskal
- 1954
Citation Context: ...bution for the tree as compared with the root node. The uncertainty coefficient u of [The70], which reads u = D(m0|m) / (−2 Σ_i n_i· ln(n_i·/n)) in terms of the deviance, and the association measure τ of [GK54] are two such measures. The first is the proportion of reduction in Shannon's entropy and the second in quadratic entropy. These two indexes always produce very close values. They evolve almost in a q...
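Both indexes can be computed directly from the leaves-by-classes contingency table. A minimal sketch with hypothetical counts (rows standing for tree leaves, columns for outcome classes); `theil_u` and `goodman_kruskal_tau` are illustrative helper names, not functions from the cited works:

```python
import math

def theil_u(table):
    """Uncertainty coefficient of [The70]: proportional reduction in
    Shannon entropy of the column variable given the row variable."""
    n = sum(map(sum, table))
    cols = [sum(c) for c in zip(*table)]
    h_y = -sum(cj / n * math.log(cj / n) for cj in cols if cj)
    h_y_x = -sum(nij / n * math.log(nij / sum(row))
                 for row in table for nij in row if nij)
    return (h_y - h_y_x) / h_y

def goodman_kruskal_tau(table):
    """Tau of [GK54]: proportional reduction in quadratic (Gini) entropy."""
    n = sum(map(sum, table))
    cols = [sum(c) for c in zip(*table)]
    g_y = 1 - sum((cj / n) ** 2 for cj in cols)
    g_y_x = sum(sum(row) / n * (1 - sum((nij / sum(row)) ** 2 for nij in row))
                for row in table)
    return (g_y - g_y_x) / g_y

tab = [[45, 5], [10, 40]]  # hypothetical leaves-by-classes counts
u, tau = theil_u(tab), goodman_kruskal_tau(tab)
print(round(u, 3), round(tau, 3), round(math.sqrt(u), 2), round(math.sqrt(tau), 2))
```

Both indexes are 0 under independence and 1 when the leaves predict the class perfectly and, as the quoted context notes, they typically stay close to each other in between.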

321 | Discrete Multivariate Analysis
- Bishop, Fienberg, et al.
- 1975
Citation Context: ...ed count provided by the model for cell (i, j). The likelihood is obtained assuming simply a multinomial distribution, which is in no way restrictive. Under some regularity conditions (see for instance [BFH75], chap. 4), the log-likelihood ratio statistic has an approximate Chi-square distribution when the model is correct. The degrees of freedom d are given by the difference between the number of cells and...

315 | An Exploratory Technique for Investigating Large Quantities of Categorical Data
- Kass
- 1980
Citation Context: ...erion to determine the “best” split at each node. The choice of the criterion is the main difference between the various tree-growing methods that have been proposed in the literature, of which CHAID [Kas80], CART [BFOS84] and C4.5 [Qui93] are perhaps the most popular. A leaf is a terminal node. There are 4 leaves in Figure 1. In the machine learning community, predictors are also called attributes and t...

314 | The Measurement of Urban Travel Demand
- McFadden
- 1974
Citation Context: ...rmation in relative terms. Pseudo-R²'s, for instance, represent the proportion of reduction in the root-node deviance that can be achieved with the tree. Such pseudo-R²'s come in different flavors. [McF74] proposed simply (D(m0) − D(m)) / D(m0). A better choice is the improvement of [CS89]'s proposition suggested by [Nag91]:

    R²_Nagelkerke = (1 − exp{−(2/n)(D(m0) − D(m))}) / (1 − exp{−(2/n) D(m0)})

The McFadden...
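Given the root-node deviance D(m0) and the tree deviance D(m), both pseudo-R²'s reduce to a few lines. A hedged sketch with hypothetical deviances; the Nagelkerke version follows the deviance-based form quoted above (exponent conventions vary across presentations):

```python
import math

def mcfadden_r2(d0, dm):
    """McFadden pseudo-R^2: proportional reduction in root-node deviance."""
    return (d0 - dm) / d0

def nagelkerke_r2(d0, dm, n):
    """Nagelkerke's correction of the Cox-Snell pseudo-R^2, expressed in
    deviances; n is the number of cases. Equals 0 when the tree does not
    improve on the root node and 1 for a perfect fit."""
    num = 1 - math.exp(-(2 / n) * (d0 - dm))
    den = 1 - math.exp(-(2 / n) * d0)
    return num / den

# Hypothetical deviances for a root node and an induced tree, n = 500
d0, dm, n = 1200.0, 12.0, 500
print(round(mcfadden_r2(d0, dm), 2))  # -> 0.99
print(round(nagelkerke_r2(d0, dm, n), 2))
```

With such a large drop in deviance both indexes are close to 1, mirroring the 0.99 and 0.98 values reported in the quoted context.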

224 | Automatic construction of decision trees from data: A multi-disciplinary survey
- Murthy
- 1998
Citation Context: ...tifying local structures in data sets, as well as alternatives to statistical descriptive methods like linear or logistic regression, discriminant analysis, and other mathematical modeling approaches [Mur98]. As descriptive tools, their attractiveness lies mainly in the ease with which end users can visualize and interpret a tree structure. This is much more immediate than interpreting, for instance, the v...

54 | Generalized Linear Models
- McCullagh, Nelder
- 1989
Citation Context: ...deviance of a statistical model m is to measure how far the model is from the target, or more specifically how far the values predicted by the model are from the target. In general (see for instance [MN89]), this is measured by minus twice the log-likelihood of the model (−2 LogLik(m)) and is just the log-likelihood ratio Chi-square in the modeling of multiway contingency tables [Agr90]. For a two-way r × ...

31 | On the Estimation of Relationships Involving Qualitative Variables
- Theil
- 1970
Citation Context: ...ith the Nagelkerke formula we get 0.98. We may also consider the percent reduction in uncertainty of the outcome distribution for the tree as compared with the root node. The uncertainty coefficient u of [The70], which reads u = D(m0|m) / (−2 Σ_i n_i· ln(n_i·/n)) in terms of the deviance, and the association measure τ of [GK54] are two such measures. The first is the proportion of reduction in Shannon's entropy...

18 | Analysis of Binary Data (2nd edn)
- Cox, Snell
- 1989
Citation Context: ...duction in the root-node deviance that can be achieved with the tree. Such pseudo-R²'s come in different flavors. [McF74] proposed simply (D(m0) − D(m)) / D(m0). A better choice is the improvement of [CS89]'s proposition suggested by [Nag91]:

    R²_Nagelkerke = (1 − exp{−(2/n)(D(m0) − D(m))}) / (1 − exp{−(2/n) D(m0)})

The McFadden pseudo-R² is 0.99, and with the Nagelkerke formula we get 0.98. We may also consider t...

6 | Goodness-of-fit measures for induction trees
- Ritschard, Zighed
- 2003
Citation Context: ...of error rate over the root node will be null, the most frequent value remaining the same for both groups. We reconsider in this paper the deviance, which we showed can be applied to trees in [RZ03]. The deviance usefully complements the error rate and permits some statistical inference with trees. Firstly, we give a new presentation of how the deviance that is abundantly used in statist...

2 | A note on a general definition of the coefficient of determination
- Nagelkerke
- 1991
Citation Context: ...that can be achieved with the tree. Such pseudo-R²'s come in different flavors. [McF74] proposed simply (D(m0) − D(m)) / D(m0). A better choice is the improvement of [CS89]'s proposition suggested by [Nag91]:

    R²_Nagelkerke = (1 − exp{−(2/n)(D(m0) − D(m))}) / (1 − exp{−(2/n) D(m0)})

The McFadden pseudo-R² is 0.99, and with the Nagelkerke formula we get 0.98. We may also consider the percent reduction in uncertainty...

1 | The behaviour of nominal and ordinal partial association measures
- Olszak, Ritschard
- 1995
Citation Context: ...eduction in Shannon's entropy and the second in quadratic entropy. These two indexes always produce very close values. They evolve almost in a quadratic way from no association to perfect association [OR95]. Their square root is therefore more representative of the position between these two extreme situations. For our induced tree, we have √u = 0.56 and √τ = 0.60, indicating that we are a bit more t...