Results 1 - 10
of
27
Decision Tree Induction Based on Efficient Tree Restructuring
- Machine Learning
, 1996
"... . The ability to restructure a decision tree efficiently enables a variety of approaches to decision tree induction that would otherwise be prohibitively expensive. Two such approaches are described here, one being incremental tree induction (ITI), and the other being non-incremental tree induction ..."
Abstract
-
Cited by 110 (5 self)
- Add to MetaCart
. The ability to restructure a decision tree efficiently enables a variety of approaches to decision tree induction that would otherwise be prohibitively expensive. Two such approaches are described here, one being incremental tree induction (ITI), and the other being non-incremental tree induction using a measure of tree quality instead of test quality (DMTI). These approaches and several variants offer new computational and classifier characteristics that lend themselves to particular applications. Keywords: decision tree, incremental induction, direct metric, binary test, example incorporation, missing value, tree transposition, installed test, virtual pruning, update cost. 1. Introduction Decision tree induction offers a highly practical method for generalizing from examples whose class membership is known. The most common approach to inducing a decision tree is to partition the labelled examples recursively until a stopping criterion is met. The partition is defined by selectin...
On Biases in Estimating Multi-Valued Attributes
, 1995
"... We analyse the biases of eleven measures for estimating the quality of the multi-valued attributes. The values of information gain, J- measure, gini-index, and relevance tend to linearly increase with the number of values of an attribute. The values of gain-ratio, distance measure, Relief , and the ..."
Abstract
-
Cited by 70 (4 self)
- Add to MetaCart
We analyse the biases of eleven measures for estimating the quality of the multi-valued attributes. The values of information gain, J- measure, gini-index, and relevance tend to linearly increase with the number of values of an attribute. The values of gain-ratio, distance measure, Relief , and the weight of evidence decrease for informative attributes and increase for irrelevant attributes. The bias of the statistic tests based on the chi-square distribution is similar but these functions are not able to discriminate among the attributes of different quality. We also introduce a new function based on the MDL principle whose value slightly decreases with the increasing number of attribute's values. 1 Introduction In top down induction of decision trees various impurity functions are used to estimate the quality of attributes in order to select the "best" one to split on. However, various heuristics tend to overestimate the multi-valued attributes. One possible approach to this proble...
Fast Sequential and Parallel Algorithms for Association Rule Mining: A Comparison
, 1995
"... The field of knowledge discovery in databases, or "Data Mining", has received increasing attention during recent years as large organizations have begun to realize the potential value of the information that is stored implicitly in their databases. One specific data mining task is the mining of Asso ..."
Abstract
-
Cited by 61 (0 self)
- Add to MetaCart
The field of knowledge discovery in databases, or "Data Mining", has received increasing attention during recent years as large organizations have begun to realize the potential value of the information that is stored implicitly in their databases. One specific data mining task is the mining of Association Rules, particularly from retail data. The task is to determine patterns (or rules) that characterize the shopping behavior of customers from a large database of previous consumer transactions. The rules can then be used to focus marketing efforts such as product placement and sales promotions. Because early algorithms required an unpredictably large number of IO operations, reducing IO cost has been the primary target of the algorithms presented in the literature. One of the most recent proposed algorithms, called PARTITION, uses a new TID-list data representation and a new partitioning technique. The partitioning technique reduces IO cost to a constant amount by processing one datab...
Efficient Learning of Selective Bayesian Network Classifiers
, 1995
"... In this paper, we present a computationally efficient method for inducing selective Bayesian network classifiers. Our approach is to use informationtheoretic metrics to efficiently select a subset of attributes from which to learn the classifier. We explore three conditional, information-theoretic ..."
Abstract
-
Cited by 42 (3 self)
- Add to MetaCart
In this paper, we present a computationally efficient method for inducing selective Bayesian network classifiers. Our approach is to use informationtheoretic metrics to efficiently select a subset of attributes from which to learn the classifier. We explore three conditional, information-theoretic metrics that are extensions of metrics used extensively in decision tree learning, namely Quinlan's gain and gain ratio metrics and Mantaras's distance metric. We experimentally show that the algorithms based on gain ratio and distance metric learn selective Bayesian networks that have predictive accuracies as good as or better than those learned by existing selective Bayesian network induction approaches (K2-AS), but at a significantly lower computational cost. We prove that the subset-selection phase of these information-based algorithms has polynomial complexity as compared to the worst-case exponential time complexity of the corresponding phase in K2-AS. We also compare the performance o...
Classification trees with unbiased multiway splits
- Journal of the American Statistical Association
, 2001
"... Two univariate split methods and one linear combination split method are proposed for the construction of classification trees with multiway splits. Examples are given where the trees are more compact and hence easier to interpret than binary trees. A major strength of the univariate split methods i ..."
Abstract
-
Cited by 35 (6 self)
- Add to MetaCart
Two univariate split methods and one linear combination split method are proposed for the construction of classification trees with multiway splits. Examples are given where the trees are more compact and hence easier to interpret than binary trees. A major strength of the univariate split methods is that they have negligible bias in variable selection, both when the variables differ in the number of splits they offer and when they differ in number of missing values. This is an advantage because inferences from the tree structures can be adversely affected by selection bias. The new methods are shown to be highly competitive in terms of computational speed and classification accuracy of future observations. Key words and phrases: Decision tree, linear discriminant analysis, missing value, selection bias. 1
General and Efficient Multisplitting of Numerical Attributes
, 1999
"... . Often in supervised learning numerical attributes require special treatment and do not fit the learning scheme as well as one could hope. Nevertheless, they are common in practical tasks and, therefore, need to be taken into account. We characterize the well-behavedness of an evaluation function, ..."
Abstract
-
Cited by 31 (7 self)
- Add to MetaCart
. Often in supervised learning numerical attributes require special treatment and do not fit the learning scheme as well as one could hope. Nevertheless, they are common in practical tasks and, therefore, need to be taken into account. We characterize the well-behavedness of an evaluation function, a property that guarantees the optimal multi-partition of an arbitrary numerical domain to be defined on boundary points. Well-behavedness reduces the number of candidate cut points that need to be examined in multisplitting numerical attributes. Many commonly used attribute evaluation functions possess this property; we demonstrate that the cumulative functions Information Gain and Training Set Error as well as the non-cumulative functions Gain Ratio and Normalized Distance Measure are all well-behaved. We also devise a method of finding optimal multisplits efficiently by examining the minimum number of boundary point combinations that is required to produce partitions which are optimal wit...
An Exact Probability Metric for Decision Tree Splitting
- Machine Learning
, 1997
"... ID3's information gain heuristic is well-known to be biased towards multi-valued attributes. This bias is only partially compensated by the gain ratio used in C4.5. Several alternatives have been proposed, notably orthogonality and Beta. Gain ratio and orthogonality are strongly correlated, and all ..."
Abstract
-
Cited by 27 (3 self)
- Add to MetaCart
ID3's information gain heuristic is well-known to be biased towards multi-valued attributes. This bias is only partially compensated by the gain ratio used in C4.5. Several alternatives have been proposed, notably orthogonality and Beta. Gain ratio and orthogonality are strongly correlated, and all of the metrics share a common bias towards splits with one or more small expected values, under circumstances where the split likely ocurred by chance. Both classical and Bayesian statistics lead to the multiple hypergeometric distribution as the posterior probability of the null hypothesis. Both gain and the chi-squared significance test are shown to arise in asymptotic approximations to the hypergeometric, revealing similar criteria for admissibility and showing the nature of their biases. Previous failures to find admissible stopping rules are traced to coupling these biased approximations with one another or with arbitrary thresholds; problems which are overcome by the hypergeometric. Em...
Memory-Based Grammatical Relation Finding
, 2002
"... This memory is called the instance base. For testing, those training instances that are most similar to the test instance are retrieved from memory and their labels are used to assign a label to the test instance. This direct use of all training instances (lazy learning) contrasts with the eager lea ..."
Abstract
-
Cited by 25 (0 self)
- Add to MetaCart
This memory is called the instance base. For testing, those training instances that are most similar to the test instance are retrieved from memory and their labels are used to assign a label to the test instance. This direct use of all training instances (lazy learning) contrasts with the eager learning of e.g. decision tree or rule learning algorithms which derive an abstract representation from the training instances and then use only this representation when processing test instances
Techniques for Dealing with Missing Values in Classification
, 1997
"... . A brief overview of the history of the development of decision tree induction algorithms is followed by a review of techniques for dealing with missing attribute values in the operation of these methods. The technique of dynamic path generation is described in the context of treebased classificati ..."
Abstract
-
Cited by 19 (0 self)
- Add to MetaCart
. A brief overview of the history of the development of decision tree induction algorithms is followed by a review of techniques for dealing with missing attribute values in the operation of these methods. The technique of dynamic path generation is described in the context of treebased classification methods. The waste of data which can result from casewise deletion of missing values in statistical algorithms is discussed and alternatives proposed. Keywords: Missing values, Dynamic path generation, Intelligent data analysis, Inductive learning, Knowledge discovery, Data mining, Machine learning. 1 Introduction In the information age, data is generated almost everywhere: satellites orbiting the moons of Jupiter; submarines in the deepest ocean trench; even electronic point of sale machines in the high street produce data. All of these systems generate millions of megabytes of data every day. Some of these data contain information that could lead to important discoveries in science; s...

