Results 1-10 of 33
Decision Tree Induction Based on Efficient Tree Restructuring
Machine Learning, 1996
Abstract

Cited by 119 (5 self)
The ability to restructure a decision tree efficiently enables a variety of approaches to decision tree induction that would otherwise be prohibitively expensive. Two such approaches are described here, one being incremental tree induction (ITI), and the other being nonincremental tree induction using a measure of tree quality instead of test quality (DMTI). These approaches and several variants offer new computational and classifier characteristics that lend themselves to particular applications.
Keywords: decision tree, incremental induction, direct metric, binary test, example incorporation, missing value, tree transposition, installed test, virtual pruning, update cost.
1. Introduction. Decision tree induction offers a highly practical method for generalizing from examples whose class membership is known. The most common approach to inducing a decision tree is to partition the labelled examples recursively until a stopping criterion is met. The partition is defined by selectin...
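The recursive partitioning scheme this abstract describes can be sketched in a few lines. This is a generic illustration, not the paper's ITI or DMTI algorithms: it splits on attributes in a fixed order rather than by any quality measure, and the data and attribute names are hypothetical.

```python
def induce_tree(examples, attributes):
    """Minimal recursive-partitioning sketch: split on the next attribute
    until the examples are pure or attributes run out, then emit a
    majority-class leaf. Examples are (feature_dict, label) pairs."""
    labels = {y for _, y in examples}
    if len(labels) == 1 or not attributes:
        counts = {}
        for _, y in examples:
            counts[y] = counts.get(y, 0) + 1
        return max(counts, key=counts.get)  # leaf: majority class
    attr, rest = attributes[0], attributes[1:]
    branches = {}
    for x, y in examples:
        branches.setdefault(x[attr], []).append((x, y))
    return {attr: {v: induce_tree(sub, rest) for v, sub in branches.items()}}

# Hypothetical toy data.
data = [({"outlook": "sun", "windy": 0}, "play"),
        ({"outlook": "sun", "windy": 1}, "play"),
        ({"outlook": "rain", "windy": 0}, "stay"),
        ({"outlook": "rain", "windy": 1}, "stay")]
print(induce_tree(data, ["outlook", "windy"]))
```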
On biases in estimating multivalued attributes
In Proceedings of the 14th International Joint Conference on Artificial Intelligence, 1995
Abstract

Cited by 76 (5 self)
We analyse the biases of eleven measures for estimating the quality of multivalued attributes. The values of information gain, J-measure, gini-index, and relevance tend to increase linearly with the number of values of an attribute. The values of gain ratio, distance measure, Relief, and the weight of evidence decrease for informative attributes and increase for irrelevant attributes. The bias of statistical tests based on the chi-square distribution is similar, but these functions are not able to discriminate among attributes of different quality. We also introduce a new function based on the MDL principle whose value slightly decreases with an increasing number of attribute values.
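The bias this abstract reports is easy to reproduce: information gain computed on purely random attributes should be near zero, yet it tends to grow with the number of attribute values. A minimal sketch, using the standard entropy-based definitions on synthetic data rather than the paper's exact setup:

```python
import math
import random

def entropy(labels):
    """Shannon entropy (bits) of a label multiset."""
    n = len(labels)
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def information_gain(values, labels):
    """Entropy of the labels minus the weighted entropy after
    splitting on the attribute's values."""
    n = len(labels)
    parts = {}
    for v, y in zip(values, labels):
        parts.setdefault(v, []).append(y)
    remainder = sum(len(p) / n * entropy(p) for p in parts.values())
    return entropy(labels) - remainder

random.seed(0)
labels = [random.randint(0, 1) for _ in range(200)]
# Purely random attributes: the "true" gain is zero, but the estimate
# tends to grow as the number of distinct values k increases.
for k in (2, 8, 32):
    attr = [random.randrange(k) for _ in range(200)]
    print(k, round(information_gain(attr, labels), 3))
```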
Fast Sequential and Parallel Algorithms for Association Rule Mining: A Comparison
1995
Abstract

Cited by 69 (0 self)
The field of knowledge discovery in databases, or "Data Mining", has received increasing attention during recent years as large organizations have begun to realize the potential value of the information that is stored implicitly in their databases. One specific data mining task is the mining of Association Rules, particularly from retail data. The task is to determine patterns (or rules) that characterize the shopping behavior of customers from a large database of previous consumer transactions. The rules can then be used to focus marketing efforts such as product placement and sales promotions. Because early algorithms required an unpredictably large number of I/O operations, reducing I/O cost has been the primary target of the algorithms presented in the literature. One of the most recently proposed algorithms, called PARTITION, uses a new TID-list data representation and a new partitioning technique. The partitioning technique reduces I/O cost to a constant amount by processing one datab...
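The underlying support/confidence framework for association rules (not the PARTITION algorithm itself) can be sketched as follows; the transaction data are hypothetical:

```python
# Toy transaction database (hypothetical retail baskets).
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"beer", "bread"},
    {"beer", "milk"},
    {"bread", "butter", "milk"},
]

def support(itemset):
    """Fraction of transactions containing every item in the itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

# Rule {bread} -> {milk}: confidence = support(bread and milk) / support(bread).
conf = support({"bread", "milk"}) / support({"bread"})
print(round(support({"bread", "milk"}), 2), round(conf, 2))  # 0.6 0.75
```

A mining algorithm then enumerates all itemsets whose support exceeds a threshold and derives rules whose confidence exceeds another; the algorithms the abstract compares differ mainly in how they organize that enumeration to reduce I/O.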
Efficient Learning of Selective Bayesian Network Classifiers
1995
Abstract

Cited by 49 (4 self)
In this paper, we present a computationally efficient method for inducing selective Bayesian network classifiers. Our approach is to use information-theoretic metrics to efficiently select a subset of attributes from which to learn the classifier. We explore three conditional, information-theoretic metrics that are extensions of metrics used extensively in decision tree learning, namely Quinlan's gain and gain ratio metrics and Mantaras's distance metric. We experimentally show that the algorithms based on the gain ratio and distance metrics learn selective Bayesian networks whose predictive accuracies are as good as or better than those learned by existing selective Bayesian network induction approaches (K2AS), but at a significantly lower computational cost. We prove that the subset-selection phase of these information-based algorithms has polynomial complexity, as compared to the worst-case exponential time complexity of the corresponding phase in K2AS. We also compare the performance o...
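A rough sketch of ranking attributes by Quinlan's gain ratio, as a stand-in for the subset-selection phase the abstract describes; the data, attribute names, and the rank-and-keep strategy are illustrative assumptions, not the paper's procedure:

```python
import math

def entropy(labels):
    """Shannon entropy (bits) of a label multiset."""
    n = len(labels)
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def gain_ratio(values, labels):
    """Quinlan's gain ratio: information gain normalized by the split
    information (the entropy of the attribute's own value distribution)."""
    n = len(labels)
    parts = {}
    for v, y in zip(values, labels):
        parts.setdefault(v, []).append(y)
    gain = entropy(labels) - sum(len(p) / n * entropy(p) for p in parts.values())
    split_info = entropy(values)
    return gain / split_info if split_info else 0.0

# Hypothetical data: rank attributes, then a selective classifier would be
# learned over only the top-ranked ones.
data = {
    "outlook": ["sun", "sun", "rain", "rain"],
    "windy":   [0, 1, 0, 1],
}
labels = ["yes", "yes", "no", "no"]
ranking = sorted(data, key=lambda a: gain_ratio(data[a], labels), reverse=True)
print(ranking)  # 'outlook' predicts the label perfectly, 'windy' not at all
```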
Classification trees with unbiased multiway splits
Journal of the American Statistical Association, 2001
Abstract

Cited by 42 (8 self)
Two univariate split methods and one linear combination split method are proposed for the construction of classification trees with multiway splits. Examples are given where the trees are more compact and hence easier to interpret than binary trees. A major strength of the univariate split methods is that they have negligible bias in variable selection, both when the variables differ in the number of splits they offer and when they differ in number of missing values. This is an advantage because inferences from the tree structures can be adversely affected by selection bias. The new methods are shown to be highly competitive in terms of computational speed and classification accuracy of future observations.
Key words and phrases: Decision tree, linear discriminant analysis, missing value, selection bias.
General and Efficient Multisplitting of Numerical Attributes
1999
Abstract

Cited by 40 (7 self)
Often in supervised learning, numerical attributes require special treatment and do not fit the learning scheme as well as one could hope. Nevertheless, they are common in practical tasks and therefore need to be taken into account. We characterize the well-behavedness of an evaluation function, a property that guarantees the optimal multipartition of an arbitrary numerical domain to be defined on boundary points. Well-behavedness reduces the number of candidate cut points that need to be examined in multisplitting numerical attributes. Many commonly used attribute evaluation functions possess this property; we demonstrate that the cumulative functions Information Gain and Training Set Error as well as the non-cumulative functions Gain Ratio and Normalized Distance Measure are all well-behaved. We also devise a method of finding optimal multisplits efficiently by examining the minimum number of boundary point combinations that is required to produce partitions which are optimal wit...
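One reading of the boundary-point idea: when examples are sorted by a numerical attribute, only midpoints between adjacent value blocks whose class content differs need be considered as cut points. A simplified sketch, not the paper's algorithm; ties are grouped by value, and blocks containing mixed classes are treated as boundaries on both sides:

```python
def boundary_points(values, labels):
    """Candidate cut points for a numerical attribute: midpoints between
    adjacent (sorted) value blocks whose class sets differ or are impure."""
    pairs = sorted(zip(values, labels))
    # Group the labels seen at each distinct attribute value.
    groups = []
    for v, y in pairs:
        if groups and groups[-1][0] == v:
            groups[-1][1].add(y)
        else:
            groups.append((v, {y}))
    cuts = []
    for (v1, s1), (v2, s2) in zip(groups, groups[1:]):
        if s1 != s2 or len(s1) > 1 or len(s2) > 1:
            cuts.append((v1 + v2) / 2)
    return cuts

# Hypothetical data: only 2 of the 5 possible cut points are boundaries.
vals = [1, 2, 3, 4, 5, 6]
labs = ["a", "a", "b", "b", "b", "c"]
print(boundary_points(vals, labs))  # [2.5, 5.5]
```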
An Exact Probability Metric for Decision Tree Splitting
Machine Learning, 1997
Abstract

Cited by 32 (3 self)
ID3's information gain heuristic is well known to be biased towards multivalued attributes. This bias is only partially compensated by the gain ratio used in C4.5. Several alternatives have been proposed, notably orthogonality and Beta. Gain ratio and orthogonality are strongly correlated, and all of the metrics share a common bias towards splits with one or more small expected values, under circumstances where the split likely occurred by chance. Both classical and Bayesian statistics lead to the multiple hypergeometric distribution as the posterior probability of the null hypothesis. Both gain and the chi-squared significance test are shown to arise in asymptotic approximations to the hypergeometric, revealing similar criteria for admissibility and showing the nature of their biases. Previous failures to find admissible stopping rules are traced to coupling these biased approximations with one another or with arbitrary thresholds; these problems are overcome by the hypergeometric. Em...
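For a binary split of a two-class sample, the hypergeometric null probability the abstract appeals to reduces to the single-table probability of Fisher's exact test. A sketch for the 2x2 case only (the multiple hypergeometric generalizes this to larger tables):

```python
from math import comb

def hypergeom_prob(table):
    """Probability of a 2x2 contingency table under fixed margins,
    i.e. the chance the observed split arose by random assignment."""
    (a, b), (c, d) = table
    n = a + b + c + d
    return comb(a + b, a) * comb(c + d, c) / comb(n, a + c)

# A split sending 8 of 10 positives left and only 2 of 10 negatives left
# is improbable under the null, so it is unlikely to be chance:
print(hypergeom_prob([[8, 2], [2, 8]]))  # ~0.011
# A split that mirrors the margins is far more probable under the null:
print(hypergeom_prob([[5, 5], [5, 5]]))  # ~0.344
```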
Memory-Based Grammatical Relation Finding
2002
Abstract

Cited by 30 (0 self)
This memory is called the instance base. For testing, those training instances that are most similar to the test instance are retrieved from memory and their labels are used to assign a label to the test instance. This direct use of all training instances (lazy learning) contrasts with the eager learning of, e.g., decision tree or rule learning algorithms, which derive an abstract representation from the training instances and then use only this representation when processing test instances.
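Lazy, memory-based classification as described here can be sketched as k-nearest-neighbor retrieval over the instance base. The feature-overlap distance and the toy instances below are assumptions for illustration, not the paper's actual similarity metric or features:

```python
def knn_label(instance_base, query, k=3):
    """Memory-based (lazy) classification: keep every training instance,
    and at test time label the query by majority vote among the k stored
    instances nearest to it under a feature-overlap distance."""
    def dist(x, y):
        return sum(a != b for a, b in zip(x, y))  # count mismatched features
    nearest = sorted(instance_base, key=lambda ex: dist(ex[0], query))[:k]
    votes = {}
    for _, label in nearest:
        votes[label] = votes.get(label, 0) + 1
    return max(votes, key=votes.get)

# Hypothetical instance base of (features, label) pairs.
base = [(("det", "noun"), "NP"), (("det", "adj"), "NP"),
        (("verb", "noun"), "VP"), (("verb", "adv"), "VP")]
print(knn_label(base, ("det", "adj"), k=1))  # NP
```

Note there is no training step beyond storing the instances; all work happens at retrieval time, which is exactly the lazy/eager contrast the passage draws.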
Predicting Rare Events in Temporal Domains
Proc. of IEEE Intl. Conf. on Data Mining, 2002
Abstract

Cited by 29 (1 self)
Temporal data mining aims at finding patterns in historical data. Our work proposes an approach to extract temporal patterns from data to predict the occurrence of target events, such as computer attacks on host networks or fraudulent transactions in financial institutions. Our problem formulation exhibits two major challenges: 1) we assume events are characterized by categorical features and display uneven inter-arrival times, an assumption that falls outside the scope of classical time-series analysis; 2) we assume target events are highly infrequent, so predictive techniques must deal with the class-imbalance problem. We propose an efficient algorithm that tackles the challenges above by transforming the event prediction problem into a search for all frequent eventsets preceding target events. The class-imbalance problem is overcome by a search for patterns on the minority class exclusively; the discrimination power of patterns is then validated against other classes. Patterns are then combined into a rule-based model for prediction. Our experimental analysis indicates the types of event sequences where target events can be accurately predicted.
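The search for frequent eventsets preceding target events might be sketched as follows. The window semantics, event log, and thresholds are illustrative assumptions rather than the paper's algorithm:

```python
from itertools import combinations

def preceding_eventsets(events, target, window, min_count):
    """Count the eventsets observed in the time window before each target
    occurrence; those meeting min_count become candidate predictive
    patterns. `events` is a list of (timestamp, event_type) pairs."""
    targets = [t for t, e in events if e == target]
    counts = {}
    for t0 in targets:
        kinds = {e for t, e in events if t0 - window <= t < t0 and e != target}
        for r in range(1, len(kinds) + 1):
            for s in combinations(sorted(kinds), r):
                counts[s] = counts.get(s, 0) + 1
    return {s: c for s, c in counts.items() if c >= min_count}

# Hypothetical event log: 'attack' is the rare target event.
log = [(1, "scan"), (2, "login_fail"), (3, "attack"),
       (6, "scan"), (7, "login_fail"), (8, "attack"),
       (11, "backup"), (12, "attack")]
print(preceding_eventsets(log, "attack", window=2, min_count=2))
```

Mining only windows that precede the target is what sidesteps the class imbalance: patterns are sought on the minority class alone, then their discrimination power would be checked against windows not followed by the target.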