Results 1 - 6 of 6
Data Mining in Social Networks
 In National Academy of Sciences Symposium on Dynamic Social Network Modeling and Analysis
, 2002
Abstract

Cited by 33 (1 self)
Abstract. Several techniques for learning statistical models have been developed recently by researchers in machine learning and data mining. All of these techniques must address a similar set of representational and algorithmic choices and must face a set of statistical challenges unique to learning from relational data.
Using a Permutation Test for Attribute Selection in Decision Trees
 International Conference on Machine Learning
, 1998
Abstract

Cited by 21 (2 self)
Most techniques for attribute selection in decision trees are biased towards attributes with many values, and several ad hoc solutions to this problem have appeared in the machine learning literature. Statistical tests for the existence of an association with a prespecified significance level provide a well-founded basis for addressing the problem. However, many statistical tests are computed from a chi-squared distribution, which is only a valid approximation to the actual distribution in the large-sample case, and this patently does not hold near the leaves of a decision tree. An exception is the class of permutation tests. We describe how permutation tests can be applied to this problem. We choose one such test for further exploration, and give a novel two-stage method for applying it to select attributes in a decision tree. Results on practical datasets compare favorably with other methods that also adopt a pre-pruning strategy.
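The permutation-test idea in this abstract can be illustrated with a minimal sketch: compute the chi-squared statistic for the attribute/class contingency table, then estimate its p-value empirically by repeatedly shuffling the class labels rather than consulting the chi-squared distribution, which is unreliable in the small samples near a tree's leaves. The function names and the add-one smoothing are illustrative assumptions, not the paper's implementation.

```python
import random
from collections import Counter

def chi_squared(attr, cls):
    """Chi-squared statistic of the attribute/class contingency table."""
    n = len(attr)
    attr_counts = Counter(attr)
    cls_counts = Counter(cls)
    joint = Counter(zip(attr, cls))
    stat = 0.0
    for a in attr_counts:
        for c in cls_counts:
            expected = attr_counts[a] * cls_counts[c] / n
            observed = joint[(a, c)]
            stat += (observed - expected) ** 2 / expected
    return stat

def permutation_p_value(attr, cls, trials=1000, seed=0):
    """Empirical p-value: the fraction of random label permutations whose
    statistic is at least as large as the observed one."""
    rng = random.Random(seed)
    observed = chi_squared(attr, cls)
    cls = list(cls)
    hits = 0
    for _ in range(trials):
        rng.shuffle(cls)  # break any real association, keep the marginals
        if chi_squared(attr, cls) >= observed:
            hits += 1
    return (hits + 1) / (trials + 1)  # add-one smoothing for a valid p-value
```

Because the null distribution is built from the data itself, the test remains valid at any sample size; the cost is the repeated statistic computation, which motivates two-stage schemes like the one the paper proposes.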
Reduced-Error Pruning With Significance Tests
 Available: http://libra.msra.cn/paperdetail.aspx?id=305368
, 1998
Abstract

Cited by 6 (0 self)
When building classification models, it is common practice to prune them to counter spurious effects of the training data: this often improves performance and reduces model size. "Reduced-error pruning" is a fast pruning procedure for decision trees that is known to produce small and accurate trees. Apart from the data from which the tree is grown, it uses an independent "pruning" set, and pruning decisions are based on the model's error rate on this fresh data. Recently it has been observed that reduced-error pruning overfits the pruning data, producing unnecessarily large decision trees. This paper investigates whether standard statistical significance tests can be used to counter this phenomenon. The problem of overfitting to the pruning set highlights the need for significance testing. We investigate two classes of test, "parametric" and "non-parametric." The standard chi-squared statistic can be used both in a parametric test and as the basis for a non-parametric permutation test ...